Re: Cacheblocksonwrite not working during compaction?

2019-09-20 Thread Vladimir Rodionov
ill in a BLOCKED state on the same input > stream object an hour later! This is after a 3.0 GB compaction was already > done. If prefetching was happening, then something seems wrong if it takes > an hour to populate 3.0 GB worth of data in a local disk cache from S3. > > I appreciate

Re: Cacheblocksonwrite not working during compaction?

2019-09-20 Thread Vladimir Rodionov
>>- Why is the hbase.rs.cacheblocksonwrite not seeming to work? Does it only work for flushing and not for compaction? I can see from the logs that the file is renamed >>after being written. Does that have something to do with why cacheblocksonwrite isn't working? Generally, it is a very bad idea

Re: Need to apply patch HBASE-8163 on Hbase 1.4.10

2019-07-02 Thread Vladimir Rodionov
HBase 1.4.10 should have MSLAB already, I presume. I personally have never used 1.3.x and 1.4.x, but I think if 1.1.x has this feature (it's probably been there since 0.96?) 1.3 and 1.4 must have it as well. -Vlad On Mon, Jul 1, 2019 at 10:20 PM Syni Guo wrote: > > Hi , > > I want to use MSLAB feature

Re: Does TTL works on Column Qualifier and not Column Family level.

2019-03-13 Thread Vladimir Rodionov
on qualifier level and not family > level. > > On Wed 13 Mar, 2019, 23:17 Vladimir Rodionov, > wrote: > > > I do not see a difference between " TTL acts per column qualifier level " > > and "TTL per column qualifier" > > So, the answer is still

Re: Does TTL works on Column Qualifier and not Column Family level.

2019-03-13 Thread Vladimir Rodionov
ts per column qualifier level and not at family > level. > > The qualifier that is added last is deleted last even if its a part of same > family. > > How can i get rid of all columns based on the first addition ? > > On Wed 13 Mar, 2019, 23:10 Vladimir Rodionov, > wrote: >

Re: Aggregation

2019-03-13 Thread Vladimir Rodionov
Hi, Jean-Marc I am not aware of an implementation of #2 in HBase. In RocksDB there is a Merge operator which does exactly what you need. It can be done in HBase as well with the help of a specialized coprocessor. RocksDB Merge: https://github.com/facebook/rocksdb/wiki/Merge-Operator -Vlad On Wed,

Re: Does TTL works on Column Qualifier and not Column Family level.

2019-03-13 Thread Vladimir Rodionov
No, columns are not part of HBase table schema , so there is no way to set TTL per column - only per column family. -Vlad On Wed, Mar 13, 2019 at 4:33 AM Vikash Agarwal wrote: > TTL for Hbase Tables works on Column Qualifier and does not delete updated > Columns > https://stackoverflow.com/q/55

Re: question on snapshot and export utility

2018-09-05 Thread Vladimir Rodionov
No, it is not, to the best of my knowledge. ExportSnapshot just moves files to a new destination using an M/R job. But you can do the custom filtering yourself. Look at the ExportSnapshot implementation. All you need is a new Mapper which does the required filtering of an HFile before moving data to a destination. -Vl

Re: Extremely high CPU usage after upgrading to Hbase 1.4.4

2018-09-04 Thread Vladimir Rodionov
Hi, Srinidhi Next time you see this issue, take a jstack of the RS several times in a row. Without stack traces it is hard to tell what was going on with your cluster after the upgrade. -Vlad On Tue, Sep 4, 2018 at 3:50 PM Srinidhi Muppalla wrote: > Hello all, > > We are currently running Hbase 1.3

Re: how to get random rows from a big hbase table faster

2018-04-12 Thread Vladimir Rodionov
1% of 1B is 10M. 10M random reads are doable if: a. the cluster is sufficiently large, b. it is equipped with SSDs, c. you run multiple clients in parallel to retrieve these rows. You need to know in advance the min/max rows in the table, then randomly generate a start row and open a scanner with this start row, then
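The sampling idea in this reply (know the min/max rows, pick a random start row, open a scanner there) might be sketched as follows. This is only an illustration under an assumption the post does not state: row keys are taken to be fixed-width 8-byte big-endian non-negative longs. `RandomStartRow` is a hypothetical helper, not an HBase class; the bytes it returns would be used as the start row of a scan.

```java
import java.nio.ByteBuffer;
import java.util.Random;

// Hypothetical helper for illustration; not part of HBase.
final class RandomStartRow {

    // Returns a random 8-byte big-endian key in [minRow, maxRow].
    // Assumes 0 <= minRow <= maxRow (an assumption about the key layout).
    static byte[] pick(long minRow, long maxRow, Random rnd) {
        long span = maxRow - minRow + 1;
        if (minRow < 0 || span <= 0) {
            throw new IllegalArgumentException("expected 0 <= minRow <= maxRow");
        }
        // floorMod keeps the offset in [0, span); the slight modulo bias
        // is acceptable for a sampling sketch.
        long offset = Math.floorMod(rnd.nextLong(), span);
        return ByteBuffer.allocate(Long.BYTES).putLong(minRow + offset).array();
    }
}
```

Running several clients in parallel, each drawing its own start rows this way, matches point c in the reply.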

Re: Should Taking A Snapshot Work Even If Balancer Is Moving A Few Regions Around?

2018-03-21 Thread Vladimir Rodionov
>>So my question is whether taking a snapshot is supposed to work even with >>regions being moved around. In our case it is usually only a couple here >>and there. No, if region was moved, split or merged during snapshot operation - snapshot will fail. This is why taking snapshots on a large table

Re: OutOfMemoryError: Direct buffer memory on PUT

2017-10-10 Thread Vladimir Rodionov
x.htm : > > > > java -XX:MaxDirectMemorySize=2g myApp > > > > Default Value > > > > The default value is zero, which means the maximum direct memory is > > unbounded. > > > > On Tue, Oct 10, 2017 at 11:04 AM, Vladimir Rodionov < > > vladrod

Re: OutOfMemoryError: Direct buffer memory on PUT

2017-10-10 Thread Vladimir Rodionov
Size is set to the default 0, which means unlimited as far > as I can tell. > Thanks, > Daniel > > 2017-10-09 19:30 GMT+02:00 Vladimir Rodionov : > > > Have you tried to increase direct memory size for server process? > > -XX:MaxDirectMemorySize=? > > > >

Re: OutOfMemoryError: Direct buffer memory on PUT

2017-10-09 Thread Vladimir Rodionov
Have you tried to increase direct memory size for the server process? -XX:MaxDirectMemorySize=? On Mon, Oct 9, 2017 at 2:12 AM, Daniel Jeliński wrote: > Hello, > I'm running an application doing a lot of Puts (size anywhere between 0 and > 10MB, one cell at a time); occasionally I'm getting an error li

Re: HBase Lease Exceptions

2017-10-04 Thread Vladimir Rodionov
The first is just INFO message (no impact) responseTooSlow indicates that your RS is under high load or experiences large GC pauses. Check JvmPauseMonitor messages in RS log file. On Wed, Oct 4, 2017 at 11:48 AM, alwin james wrote: > Hi Experts - > > I see lot of following messages in my region

Re: Slow HBase write across data center

2017-06-29 Thread Vladimir Rodionov
SND/RECEIVE buffer sizes (TCP/IP) must be increased from default 8KB for WAN communications. That is TCP/IP feature Google "TCP/IP send receive buffer size for WAN" -Vlad On Thu, Jun 29, 2017 at 11:27 AM, Steve Howard wrote: > 2 seconds for 100 puts that aggregate 50mb is 20 milliseconds per p

Re: Setting TTL at the row level

2017-06-21 Thread Vladimir Rodionov
Should work On Wed, Jun 21, 2017 at 11:31 AM, wrote: > Hi all, > > I know it is possible to set TTL in HBase at the column family level - > which makes HBase delete rows in the column family when they reach a > certain age. > > Rather than expire a row after it's reached a certain age, I would l

Re: Regions in Transition: FAILED_CLOSE status

2017-05-23 Thread Vladimir Rodionov
017 at 5:20 PM, Vladimir Rodionov > > wrote: > > > When Master attempt to assign region to RS and assignment fails, there > > should be something in RS log file (check errors), > > that explains reason of a failure. > > > > How many not-assigned region do you h

Re: Regions in Transition: FAILED_CLOSE status

2017-05-23 Thread Vladimir Rodionov
wrote: > Are dead region servers to blame? Is this possibly stale information in > the ZK? > > ____ > From: Vladimir Rodionov > Sent: Tuesday, May 23, 2017 12:20:16 PM > To: user@hbase.apache.org > Subject: Re: Regions in Transition: FAILED_CLO

Re: Regions in Transition: FAILED_CLOSE status

2017-05-23 Thread Vladimir Rodionov
You should check RS logs to see why regions can not be assigned. Get RS name from master log and check RS log -Vlad On Tue, May 23, 2017 at 11:47 AM, jeff saremi wrote: > Our write code throws exceptions like the following: > > org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException

Re: Compaction monitoring

2017-05-06 Thread Vladimir Rodionov
The major issue with HBase compactions is not excessive CPU or IO usage but excessive creation of temporary (garbage) objects, which results in more frequent GC failures and, in some cases, RS shutdowns due to long GC pauses. That is why it is so important to keep compactions under control: disable

Re: HBase as a file repository

2017-04-03 Thread Vladimir Rodionov
t could work, it seems > overly complicated compared to implementing a custom hbase client for what > HBase offers already. > Thanks, > Daniel > > 2017-03-31 19:25 GMT+02:00 Vladimir Rodionov : > > > Use HBase as a file system meta storage (index), keep files in a

Re: HBase as a file repository

2017-03-31 Thread Vladimir Rodionov
Use HBase as a file system meta storage (index), keep files in a large blobs on hdfs, have periodic compaction/cleaning M/R job to purge deleted files. You can even keep multiple versions of files. -Vlad On Thu, Mar 30, 2017 at 11:22 PM, Jingcheng Du wrote: > Hi Daniel, > > I think it is becaus

Re: Question in WALEdit

2017-03-22 Thread Vladimir Rodionov
a) HBase does not support transactions - it only guarantees that a single mutation to a row-key is atomic. WALEdit can contain cells (mutations) from different rows (for example when you do batchMutate all operations go to the same WALEdit afaik) b) I could not find postWALEdit() in RegionObserver

Re: Need guidance on Custom Compaction Policy

2017-03-22 Thread Vladimir Rodionov
Older files will be purged by default HBase compactor if all data inside expired (you have TTL for data?) As for custom compaction policy you can refer to FIFOCompactionPolicy class to get the idea how custom compaction works. -Vlad On Wed, Mar 22, 2017 at 12:29 PM, jeff saremi wrote: > I men

Re: how to optimize for heavy writes scenario

2017-03-17 Thread Vladimir Rodionov
>> In my opinion, 1M/s input data will result in only 70MByte/s write Times 3 (default HDFS replication factor) Plus ... Do not forget about compaction read/write amplification. If you flush 10 MB and your max region size is 10 GB, with default min file to compact (3) your amplification is 6-7

Re: Hbase on HDFS versus Cassandra

2016-11-30 Thread Vladimir Rodionov
Mich, this is a wrong group for your question. We are not Cassandra experts either and even if we are - we love HBase more :) -Vlad On Wed, Nov 30, 2016 at 7:02 AM, Mich Talebzadeh wrote: > Hi Guys, > > Used Hbase on HDFS reasonably well. Happy to to stick with it and more with > Hive/Phoenix

Re: [RegionServer Dead] Identify HBase Table Cause RegionServer Dead(Version 1.0.0-cdh5.5.2)

2016-09-22 Thread Vladimir Rodionov
Your RS was declared dead because of a long GC. What you can do: 1. Tweak CMS config: -XX:CMSInitiatingOccupancyFraction=50 -XX:+UseCMSInitiatingOccupancyOnly. Plus increase heap size accordingly to accommodate the decrease of the working set size (now CMS starts when 50% of heap is occupied). I

Re: Increased response time of hbase calls

2016-09-21 Thread Vladimir Rodionov
m7 is vendor - specific version of HBase (MapR) You better ask this question on MapR user list. -Vlad On Wed, Sep 21, 2016 at 10:28 PM, Ted Yu wrote: > Which hbase release are you using ? > > Can you tell us the values for handler related config such as > hbase.regionserver.handler.count ? > >

Re: what causes hbase client to open large number of connectoins?

2016-09-09 Thread Vladimir Rodionov
They are 30-40 sec apart. Can you tell us how did you come up with 14K connections? -Vlad On Fri, Sep 9, 2016 at 4:52 PM, Frank Luo wrote: > I have observed a very weird behavior that crashes zookeepers. > > I have one job that uses 100 reducers to perform puts. The same code works > fine with

Re: Hbase Heap Size problem and Native API response is slow

2016-08-27 Thread Vladimir Rodionov
>> Problem is its very slow rows are not indexed by column qualifier, and you need to scan all of them. I suggest you consider different row-key design or add additional index-table for your table. -Vlad On Sat, Aug 27, 2016 at 4:12 AM, Manjeet Singh wrote: > Hi All, > > can anybody suggest me

Re: How to Speed up Prefix scan on column qualifier

2016-08-24 Thread Vladimir Rodionov
If you are on HBase 1.+ you can use Scan API: setRowPrefixFilter(byte[] rowPrefix) -Vlad On Wed, Aug 24, 2016 at 5:28 AM, Ted Yu wrote: > Please use the following API to set start row before calling > hTable.getScanner(scan): > > public Scan setStartRow(byte [] startRow) { > > On Wed, Aug

Re: Hbase federated cluster for messages

2016-08-19 Thread Vladimir Rodionov
e is particular reason. How facebook can handle > such a huge amount of ops without federation? I don't think that they just > have one namenode server and one standby namenode server. It isn't > possible. I am sure that they use federation. > > On Fri, Aug 19, 2016 at 10

Re: Hbase federated cluster for messages

2016-08-19 Thread Vladimir Rodionov
>> I am not sure how to do it but I have to configure federated cluster with >> hbase to store huge amount of messages (client to client) (40% writes, 60% >> reads). Any particular reason for federated cluster? How huge is huge amount and what is the message size? -Vladimir On Fri, Aug 19, 2016

Re: get first row of every region

2016-08-01 Thread Vladimir Rodionov
it means that for some regions you do not have any data. -Vlad On Mon, Aug 1, 2016 at 7:21 PM, jinhong lu wrote: > Thanks. Here is my code, but in most case, r is null? why this happened? > > byte[] startRowkey = > regionInfo.getStartKey(); >

Re: Intermittent flush delay warning in log

2016-07-26 Thread Vladimir Rodionov
That is normal and log level is INFO, btw. Flush request is put into a queue with a random delay. -Vlad On Tue, Jul 26, 2016 at 9:04 PM, Rural Hunter wrote: > We are using hbase 0.98.16. I noticed these flush delay warning in logs: > 2016-07-27 07:56:54,965 INFO [regionserver60020.periodicFlus

Re: what causes "MemstoreFlusherChore requesting flush" messages?

2016-06-17 Thread Vladimir Rodionov
This is done to prevent runaway WAL files. WAL file can't be deleted if some unflushed edits from this file exist in RS memstore. In case of a high load this may lead to accumulation of a large number of WAL files in a file system. -Vlad On Fri, Jun 17, 2016 at 12:49 PM, Vladimir Rod

Re: what causes "MemstoreFlusherChore requesting flush" messages?

2016-06-17 Thread Vladimir Rodionov
>> has an old edit Yes, this is what PeriodicMemstoreFlusher does: it checks whether there are edits in the memstore older than a threshold (1h, but configurable) and flushes that memstore if it finds old edits. -Vlad On Fri, Jun 17, 2016 at 11:49 AM, Ted wrote: > Wondering if anyone knows the cause of messages like :

Re: dfs.block.size recommendations for HBase

2016-06-01 Thread Vladimir Rodionov
>> Does >> the datanode need to read that entire block when HBase tries to fetch data >> from it? No. -Vlad On Wed, Jun 1, 2016 at 8:51 AM, Bryan Beaudreault wrote: > Hello, > > There is very little information that I can find online with regards to > recommended dfs.block.size setting for H

Re: Snapshot performance and helper script

2016-05-18 Thread Vladimir Rodionov
Snapshots are light when you take them, but not that light when you export them. If you do not do export and only need to protect against user errors - fine, otherwise, bear in mind that export snapshot is M/R job and it materializes (copies) all your data to another location Another possible prob

Re: HBase Write Performance Under Auto-Split

2016-04-27 Thread Vladimir Rodionov
Every split results in major compactions for both daughter regions. Concurrent major compactions across a cluster are bad. I recommend setting DisabledRegionSplitPolicy on your table(s) and running splits manually - you will have control over what should be split and when. The same is true for major co

Re: region server crashed several times in a week

2016-04-22 Thread Vladimir Rodionov
I have two questions: >>1. Why JVM pause while NO GCs? >>2016-04-22 14:32:54,330 WARN [JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately >>37360ms >>No GCs detected Your system probably swaps >> 2. We have mapreduce jobs running on the c

Re: Rows per second for RegionScanner

2016-04-21 Thread Vladimir Rodionov
Try disabling block encoding - you will get better numbers. >> I mean per region scan speed, Scan performance depends on # of CPU cores, the more cores you have the more performance you will get. Your servers are pretty low end (4 virtual CPU cores is just 2 hardware cores). With 32 cores per no

Re: Retiring empty regions

2016-04-20 Thread Vladimir Rodionov
use it as their source of authority. > > Thanks for the thoughts folks. > -n > > On Fri, Apr 1, 2016 at 10:52 AM, Jean-Marc Spaggiari < > jean-m...@spaggiari.org> wrote: > > > ;) That was not the question ;) > > > > So Nick, merge on 1.1 is not recommended??

Re: Balancing reads and writes

2016-04-16 Thread Vladimir Rodionov
There are separate RPC queues for read and writes in 1.0+ (not sure about 0.98). You need to set sizes of these queues accordingly. -Vlad On Sat, Apr 16, 2016 at 4:23 PM, Kevin Bowling wrote: > Hi, > > Using OpenTSDB 2.2 with its "appends" feature, I see significant impact on > read performance

Re: To Store Large Number of Video and Image files

2016-04-16 Thread Vladimir Rodionov
>> have a project that needs to store large number of image and video files, >>the file size varies from 10MB to 10GB, the initial number of files will be >>0.1 billion and would grow over 1 billion, what will be the practical >>recommendations to store and view these files? >> Files are immutable

Re: Major compaction

2016-04-04 Thread Vladimir Rodionov
>> Why I am trying to understand this is because Hbase also sets it to 24 hour default (for time based compaction) and I am looking to lower it to say >> 20 mins to reduce stress by spreading the load. The more frequently you run major compaction the more IO (disk/network) you consume. Usually, i

Re: Retiring empty regions

2016-04-01 Thread Vladimir Rodionov
>> This is something >> which makes it far less useful for time-series databases with short TTL on >> the tables. With a right row-key design you will never have empty regions due to TTL. -Vlad On Thu, Mar 31, 2016 at 10:31 PM, Mikhail Antonov wrote: > Crazy idea, but you might be able to take

Re: deleting row for aging purpose

2016-03-12 Thread Vladimir Rodionov
>> Is that crazy idea? kind of :) Why do not use cell's timestamps and column family TTL for that purpose? Set explicitly KV's timestamp for every data point you insert to be equals to a data point's timestamp. Your data will be automatically purged during compaction on TTL expiration. -Vlad
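To illustrate why the reply suggests setting each cell's timestamp to the data point's own time, here is a minimal sketch of the expiry rule, assuming a cell becomes eligible for purge once its timestamp is older than the current time minus the column family TTL. `TtlCheck` and `isExpired` are hypothetical names for illustration, not HBase classes; the actual purge is done by HBase during compaction.

```java
// Hypothetical helper for illustration; not part of HBase.
final class TtlCheck {

    // True when the cell's timestamp is older than (now - TTL).
    // The TTL is given in seconds, timestamps in milliseconds.
    static boolean isExpired(long cellTimestampMillis, long ttlSeconds, long nowMillis) {
        return cellTimestampMillis < nowMillis - ttlSeconds * 1000L;
    }
}
```

Under this rule, a data point that is already ten days old when it is written counts as expired under a 7-day TTL, because its age is measured from its own timestamp rather than from the time of insertion.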

Re: Anyone using HBase-1.1.3 with HBaseWD

2016-03-12 Thread Vladimir Rodionov
Ultimate thing for salting is Apache Phoenix :) https://phoenix.apache.org -Vlad On Fri, Mar 11, 2016 at 11:52 PM, Parsian, Mahmoud wrote: > I checked out HBaseWD from https://github.com/sematext/HBaseWD. > It is written over 3 years ago against H

Re: Hourly performance degradation on HBase cluster

2016-03-11 Thread Vladimir Rodionov
ng around that time. > > Matteo > > > On Fri, Mar 11, 2016 at 3:25 PM, Vladimir Rodionov > > wrote: > > > HBASE-13202 > > > > and Matteo Bertozzi can answer your question? > > > > -Vlad > > > > > > > > > > >

Re: Hourly performance degradation on HBase cluster

2016-03-11 Thread Vladimir Rodionov
HBASE-13202 and Matteo Bertozzi can answer your question? -Vlad On Fri, Mar 11, 2016 at 3:02 PM, Ganesh V wrote: > Hello- > > I am evaluating HBase for storage of time-series data and see periodic > degradation of the cluster happening at the start of the hour, every hour, > for

Re: HBase poor write performance

2016-03-08 Thread Vladimir Rodionov
HBase has too many knobs to tune, and selection of the right ones depends on the use case (heavy continuous writes, heavy writes in burst mode, heavy writes/reads etc) and available hardware. General recommendations: 1. Try to load data in bulk 2. Presplit tables in advance and avoid splitting after that (s

Re: disable major compaction per table

2016-02-16 Thread Vladimir Rodionov
1.does major compaction in hbase runs per table basis. Per Region 2.By default every 24 hours? In older versions - yes. Current (1.x+) - 7 days 3.Can I disable automatic major compaction for few tables while keep it enable for rest of tables? yes, you can. You can set hbase.hregion.majorcomp

Re: conditional atomic operations

2016-01-26 Thread Vladimir Rodionov
It is worth taking look at: https://issues.apache.org/jira/browse/HBASE-10390 -Vlad On Tue, Jan 26, 2016 at 11:26 AM, Yakubovich, Alexey < alexey.yakubov...@searshc.com> wrote: > I notices an unanswered question: “Check and Put method used with > a long)”. I would like to ask it in a slightly d

Re: data write/read consistency issue

2016-01-25 Thread Vladimir Rodionov
Correction: not always visible. To make sure that they are visible - call the table flush or setAutoFlush(true) (not recommended) -Vlad On Mon, Jan 25, 2016 at 11:21 AM, Vladimir Rodionov wrote: > Writes are not visible until client calls flush on a table. > > -Vlad > > On Mo

Re: data write/read consistency issue

2016-01-25 Thread Vladimir Rodionov
Writes are not visible until client calls flush on a table. -Vlad On Mon, Jan 25, 2016 at 9:18 AM, Stack wrote: > Try some more basic and see if you can reproduce the failure to > read-your-own-writes. Do a put, flush of the writes, then a get. Past too > your addTagColumn code. Thanks. > St.Ac

Re: [CIS-CMMI-3] IllegalArgumentException: Row length 41221 is > 32767

2016-01-21 Thread Vladimir Rodionov
The maximum length of a row key in HBase is 32767 bytes, and your application is trying to use row keys which exceed this limit. You had better ask your question in the gora/nutch user group. -Vlad On Thu, Jan 21, 2016 at 5:39 AM, Kshitij Shukla wrote: > Hello everyone, > > Software stack is *nutch-branch-2.3.1, go
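The 32767 limit quoted in the exception is the largest value of a Java `short`. A client-side guard could look like the sketch below; `RowKeyLength` is a hypothetical class written for illustration, and its message simply mirrors the wording from the subject line rather than reproducing HBase's own code.

```java
// Hypothetical helper for illustration; not part of HBase.
final class RowKeyLength {

    // 32767, the limit quoted in the exception from the subject line.
    static final int MAX_ROW_LENGTH = Short.MAX_VALUE;

    // Rejects row keys that are longer than the limit.
    static void check(byte[] row) {
        if (row.length > MAX_ROW_LENGTH) {
            throw new IllegalArgumentException(
                "Row length " + row.length + " is > " + MAX_ROW_LENGTH);
        }
    }
}
```

Checking key length before building a request lets the application report the offending key itself instead of failing deep inside the client library.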

Re: Java API vs Hbase Thrift

2016-01-14 Thread Vladimir Rodionov
>> I have to access hbase using Java API will it be fast like thrift. Bear in mind that when you use Thrift Gateway/Thrift API you access HBase RegionServer through the single gateway server, when you use Java API - you access Region Server directly. Java API is much more scalable. -Vlad On Tue,

Re: When compactions become major ones

2016-01-05 Thread Vladimir Rodionov
>>And I still dont understand how the store files resulting after memstore >>flushes are having a size of 40MB. Does it hove smth to do with memstore >>upper limit and these 42MB are the result of forcing the memstore to be >>flushed? The problem is that all the newly store files added to HDFS are

Re: Disk usage drops after RegionServer restart? (0.98)

2015-12-01 Thread Vladimir Rodionov
I think this is because some files with open handles get deleted. The space can be reclaimed only when the process exits. This is a known "feature" of Linux. -Vlad On Mon, Nov 30, 2015 at 10:56 PM, Stack wrote: > Thanks for writing back Otis. What was your CP doing? > St.Ack > > On Sat, Nov 28, 2015 a

Re: Slow reads coinciding with higher compaction time avg time

2015-11-02 Thread Vladimir Rodionov
. -Vlad On Mon, Nov 2, 2015 at 4:20 PM, Girish Joshi wrote: > Thanks. Do you have any specific suggestions to avoid swapping during hbase > compactions. > > Thanks, > > Girish. > > On Sun, Nov 1, 2015 at 6:25 PM, Vladimir Rodionov > wrote: > > > >>- T

Re: Slow reads coinciding with higher compaction time avg time

2015-11-01 Thread Vladimir Rodionov
>>- There is a spike in compaction time avg time metric. At the same time the >>swap bytes in and swap bytes out also have higher value. Swapping is bad. You have to avoid it. -Vlad On Sun, Nov 1, 2015 at 10:24 AM, Girish Joshi wrote: > Hello > > In my hbase cluster, I observe the following co

Re: Slow region moves

2015-10-22 Thread Vladimir Rodionov
Good to know :) -Vlad On Thu, Oct 22, 2015 at 9:40 AM, Randy Fox wrote: > Hi Vlad, > > So far patch seems to work perfectly. > > -randy > > > > > On 10/21/15, 12:52 PM, "Vladimir Rodionov" wrote: > > >Randy, > > > >You can try

Re: Slow region moves

2015-10-21 Thread Vladimir Rodionov
Randy, You can try patch I just submitted. It is for master but I verified it on 1.0 branch as well. -Vlad On Wed, Oct 21, 2015 at 11:40 AM, Randy Fox wrote: > https://issues.apache.org/jira/browse/HBASE-14663 > > -r > > > > On 10/21/15, 10:35 AM, "Vladimir Rodi

Re: Slow region moves

2015-10-21 Thread Vladimir Rodionov
Reader(true); > return null; > } > }); > } > > > Where does that setting come into play? > > -r > > > > > > On 10/21/15, 8:14 AM, "Vladimir Rodionov" wrote: > > >I wonder why disabling cache eviction on close does not work in a case of > a > >

Re: Slow region moves

2015-10-21 Thread Vladimir Rodionov
enabled - it took 19 minutes. To > turn > >> block cache back on took 4.3 seconds. > >> > >> Let me know if there is anything else to try. This issue is really > >> hurting our day to day ops. > >> > >> Thanks, > >> > >>

Re: Slow region moves

2015-10-15 Thread Vladimir Rodionov
Hey, Randy You can verify your hypothesis by setting hbase.rs.evictblocksonclose to false for your tables. -Vlad On Thu, Oct 15, 2015 at 1:06 PM, Randy Fox wrote: > Caveat - we are trying to tune the BucketCache (probably a new thread - as > we are not sure we are getting the most out of it) >

Re: Slow region moves

2015-10-15 Thread Vladimir Rodionov
That is not a 0.94 vs 1.0 issue - it's a BucketCache vs LRUCache issue. It seems that BucketCache.freeBlock is expensive (and a source of contention). That is the issue. Please open a JIRA. -Vlad On Thu, Oct 15, 2015 at 11:29 AM, Randy Fox wrote: > Here is the region server log distilled down to events

Re: Minor compacts seem blocked on major compacts

2015-10-06 Thread Vladimir Rodionov
>> If I read this correct the size is 4.8G and the throttle is 2.5G so it should have been put into the Large compaction pool. You answered your question yourself. Minor compaction in the same pool (1 thread default) will be waiting until major is finished. -Vlad On Tue, Oct 6, 2015 at 3:59 PM,

Re: Question about reading data very recently written to hbase

2015-10-05 Thread Vladimir Rodionov
HBase writes are consistent. Writes are available immediately only after table's flush on a client side. (HTable.flushCommits()) -Vlad On Mon, Oct 5, 2015 at 1:30 PM, Mike Thomsen wrote: > My team has a set of web services that read data from HBase and prepare it > to be exported as a repor

Re: Large number of column qualifiers

2015-09-23 Thread Vladimir Rodionov
767 bytes. > >* @return The array containing the qualifier bytes. > >*/ > > byte[] getQualifierArray(); > > > > On Wed, Sep 23, 2015 at 11:43 PM, Vladimir Rodionov < > > vladrodio...@gmail.com> wrote: > > > >> Check KeyValue class (Cell&

Re: Large number of column qualifiers

2015-09-23 Thread Vladimir Rodionov
Check KeyValue class (Cell's implementation). getQualifierArray() returns kv's backing array. There is no SHORT limit on a size of this array, but there are other limits in HBase - maximum KV size, for example, which is configurable, but, by default, is 1MB. Having 50K qualifiers is a bad idea. Co

Re: hbase thrift connection drops / refuses after warning

2015-09-18 Thread Vladimir Rodionov
Can it be https://issues.apache.org/jira/browse/HBASE-14196? Do you idle your Thrift server for more than 10 min sometimes? -Vlad On Fri, Sep 18, 2015 at 4:23 PM, Nick Dimiduk wrote: > The thrift deamon just stalls? You don't see this on 0.98.6 -- do you ever > see retries exhausted exceptions

Re: Multiwal performance with HBase 1.x

2015-09-18 Thread Vladimir Rodionov
Hi, Jingcheng You postpone compaction until your test completes by setting number of blocking stores to 120. That is kind of cheating :) As I said previously, in a long run, compaction rules the world - not number of wal files. In a real production setting, the existing write performance is more t

Re: Multiwal performance with HBase 1.x

2015-09-17 Thread Vladimir Rodionov
Compaction is no longer a serious problem? Close to 850GB/h ingest rate for 3 nodes cluster? Effect of 8 SSDs? Looks great, but ... This is 78MB/sec load rate per server, taking into account compaction write amplification looks unbelievable even for 8 SSD. Can you publish test/hardware configura
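The 78 MB/sec figure in this reply follows from the quoted ingest rate. As a quick check, assuming decimal units (1 GB = 1000 MB) and an even spread of the load across the 3 servers, the conversion can be written out as below; `IngestRate` is a hypothetical helper named here only for illustration.

```java
// Hypothetical helper for illustration only.
final class IngestRate {

    // Converts a cluster-wide rate in GB/hour to MB/second per server,
    // using decimal units (1 GB = 1000 MB, an assumption).
    static double mbPerSecPerServer(double clusterGbPerHour, int servers) {
        return clusterGbPerHour * 1000.0 / 3600.0 / servers;
    }
}
```

For 850 GB/h on 3 servers this gives roughly 78.7 MB/sec per server, before any compaction write amplification is counted.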

Re: Multiwal performance with HBase 1.x

2015-09-16 Thread Vladimir Rodionov
Yes, sure you can pre-split tables to avoid splitting during test. You will need at least 200 regions with default max size 10GB and ConstantSizeRegionSplitPolicy. Its 70 regions per server. I would create 300 regions with default size, or 150 with double size to guarantee splitless run - it is up

Re: Multiwal performance with HBase 1.x

2015-09-15 Thread Vladimir Rodionov
Please load 10TB and publish performance numbers. Should be pretty close. -Vlad On Tue, Sep 15, 2015 at 7:44 PM, Jingcheng Du wrote: > I did some tests on multiple wal, but with a much smaller data size, only > 100G. > I have 1 HMaster and 3 RegionServers. Each node has 8 SSD. > When I ran the

Re: Defining WAL location within HDFS outside of rootdir

2015-09-14 Thread Vladimir Rodionov
Hi, Anthony Short answer - it is not possible. Feel free to submit a patch. Make sure you find all places in the code where log / old directory path is constructed. I think Matteo Bertozzi is working on a unified RS file system so you will need to align your patch with his work. -Vlad On Mon, Se

Re: Wide table vs narrower table with blob

2015-09-10 Thread Vladimir Rodionov
It depends on your read pattern. If you mostly read small subset of columns (you have a lot of them) both approaches are bad. You will need to scan all your columns and deserialize blobs to extract only few of them (that is 5MB at least). Consider adding more data (columns) to rowkey and using Fuzz

Re: Multiwal performance with HBase 1.x

2015-09-03 Thread Vladimir Rodionov
Clark wrote: > On Thu, Sep 3, 2015 at 1:53 PM, Vladimir Rodionov > wrote: > > > In a long run, it does not matter how many WAL files per RS do you have > > > That's not been my experience in production. We got more write throughput > when adding in more regions

Re: Multiwal performance with HBase 1.x

2015-09-03 Thread Vladimir Rodionov
In the long run, it does not matter how many WAL files per RS you have, unless you disable compaction completely. Your load rate will decrease over time as 1/LOG(S), where S is the RS data size and the base of LOG is something between min/max files for compaction (3 - 10). This is because of compaction. Comp

Re: HBase schema design

2015-08-27 Thread Vladimir Rodionov
_ is better (Long.MAX_VALUE - time) - most recent events will come first during scan. This will allow you to do efficient time range queries by user_id and start and end time. -Vlad On Thu, Aug 27, 2015 at 11:58 AM, Buntu Dev wrote: > I'm planning on writing a time series of user action events
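The reversed-timestamp idea in this reply might look like the sketch below. The exact key layout is an assumption, since the snippet is truncated: here the key is the user id, an underscore, then `Long.MAX_VALUE - time` as 8 big-endian bytes. `ReverseTimeKey` is a hypothetical helper, and its `compare` method only mimics unsigned lexicographic byte ordering to show the effect on sort order.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of HBase.
final class ReverseTimeKey {

    // Builds userId + '_' + (Long.MAX_VALUE - eventTimeMillis) as bytes.
    // Assumes eventTimeMillis >= 0 so the reversed value stays non-negative.
    static byte[] rowKey(String userId, long eventTimeMillis) {
        byte[] user = userId.getBytes(StandardCharsets.UTF_8);
        long reversed = Long.MAX_VALUE - eventTimeMillis;
        return ByteBuffer.allocate(user.length + 1 + Long.BYTES)
                .put(user)
                .put((byte) '_')
                .putLong(reversed)
                .array();
    }

    // Unsigned lexicographic byte comparison, mimicking row key ordering.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }
}
```

With this layout a newer event for the same user produces a smaller key than an older one, so a scan over that user's prefix returns the most recent events first.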

Re: optimal size for Hbase.hregion.memstore.flush.size and its impact

2015-08-24 Thread Vladimir Rodionov
to > > stay with 512MB. I don't think 800MB is a good idea... > > > > JM > > > > 2015-08-24 13:23 GMT-04:00 Vladimir Rodionov : > > > > > 1. How many regions per RS? > > > 2. What is your dfs.block.size? > > > 3. What is your h

Re: optimal size for Hbase.hregion.memstore.flush.size and its impact

2015-08-24 Thread Vladimir Rodionov
1. How many regions per RS? 2. What is your dfs.block.size? 3. What is your hbase.regionserver.maxlogs? Flush can be requested when: 1. Region size exceeds hbase.hregion.memstore.flush.size 2. Region's memstore is too old (periodic memstore flusher checks the age of memstore, default is 1hour) Co

Re: Major compaction skipping for older regions

2015-08-20 Thread Vladimir Rodionov
Hey, this looks like a BUG. You may try this little hack (hbase-site.xml): hbase.hstore.min.locality.to.skip.major.compact=1.1 Any value above 1.0 should work -Vlad On Thu, Aug 20, 2015 at 5:04 AM, mukund murrali wrote: > Any update from anyone on this? We are invoking major compaction manua

Re: mornitor the hbase

2015-08-20 Thread Vladimir Rodionov
Upgrade first. Current is 1.1.1 Apache, 1.1.1 HDP 2.3 -Vlad On Thu, Aug 20, 2015 at 5:42 AM, Ted Yu wrote: > You can use opentsdb / ganglia > > Cheers > > > > > On Aug 20, 2015, at 5:25 AM, jackiehbaseuser > wrote: > > > > Hi > > > > I want to the mornitor the Hbase(based hbase 0.96.2),which

Re: Hbase compound filter support

2015-08-17 Thread Vladimir Rodionov
FilterList supports both operators: AND and OR. -Vlad On Mon, Aug 17, 2015 at 1:26 PM, Shahab Yunus wrote: > As far as I know, yes. I think this is more flexible and gives you > arbitrarily more combinations. > > Regards, > Shahab > > On Mon, Aug 17, 2015 at 4:24 PM, wrote: > > > Thanks for th

Re: PerformanceEvaluation randomWrite and custom "put" test

2015-08-13 Thread Vladimir Rodionov
--autoFlush=true is ignored in PE. Make sure you run your own tests with autoFlush=false in HTable. -Vlad On Thu, Aug 13, 2015 at 1:22 AM, Serega Sheypak wrote: > Hi. I used PerformanceEvaluation randomWrite for perf measurement. > > Here are my metrics: > > -- Timers > --

Re: Spikes when writing data to HBase

2015-08-12 Thread Vladimir Rodionov
93d13e56e > > 2015-08-12 4:25 GMT+02:00 Vladimir Rodionov : > > > Can you post a code snippet? A Pastebin link is fine. > > > > -Vlad > > > > On Tue, Aug 11, 2015 at 4:03 PM, Serega Sheypak < > serega.shey...@gmail.com> > > wrote: > > >

Re: Spikes when writing data to HBase

2015-08-11 Thread Vladimir Rodionov
nectionManager is > > multithreaded env, when same servlet instance shared across many threads > > 4. some mystic process somewhere in the cluster > > > > >Is the issue reproducible? or you got it first time? > > always. Spikes disappear during night, but RPM doe

Re: Spikes when writing data to HBase

2015-08-11 Thread Vladimir Rodionov
erLimit=0.38 > RS heapsize=8GB > > >*Do you see any region splits? * > no, never happened since tables are pre-splitted > > 2015-08-11 18:54 GMT+02:00 Vladimir Rodionov : > > > *Common questions:* > > > > > >1. How large is your single write? &g

Re: Spikes when writing data to HBase

2015-08-11 Thread Vladimir Rodionov
Monitor GC events (application stop time). Your RS may have non-optimal HotSpot GC settings. Search the Internet for how to tune GC for large heaps. -Vlad On Tue, Aug 11, 2015 at 9:54 AM, Vladimir Rodionov wrote: > *Common questions:* > > >1. How large is your single write? >2.

Re: Spikes when writing data to HBase

2015-08-11 Thread Vladimir Rodionov
*Common questions:* 1. How large is your single write? 2. Do you see any RegionTooBusyException in the client log files? 3. How large is your table ( # of regions, # of column families) 4. RS memory related config: Max heap 5. memstore size (if not default - 0.4) Memstore flush hba

Re: understanding the executorService

2015-08-10 Thread Vladimir Rodionov
You can use a single (default) pool for the connection in your application: ConnectionFactory.createConnection(Configuration conf); or, provide a separate pool per connection: ConnectionFactory.createConnection(Configuration conf, ExecutorService pool); or, provide a dedicated pool for a specific table:

Re: Get using `.addColumn()` and `.setFilter()`

2015-08-03 Thread Vladimir Rodionov
s more filtration. I hadn't seen > > MultipleColumnPrefixFilter but I have seen FilterList which I found can > be > > used to combine ColumnPrefixFilter and QualifierFilter for the desired > > effect. But this uses filtration. Does this filtration scale? > > > &g

Re: Get using `.addColumn()` and `.setFilter()`

2015-08-03 Thread Vladimir Rodionov
Get get = new Get(row) .addFamily(FAMILY) .setFilter(new ColumnPrefixFilter(Bytes.toBytes("sess"))); should work. -Vlad On Mon, Aug 3, 2015 at 2:41 PM, Ted Yu wrote: > Is there column with prefix 'name' whose column name is longer than 'name' > (such as 'name0') ? > > I

Re: scan column families with different time ranges

2015-08-01 Thread Vladimir Rodionov
I think TimeRange is handled higher, when the region scanner is created. With the data size in B 100x smaller than in A, I do not understand where the source of the IO bottleneck is. On Aug 1, 2015 9:16 AM, "Andrew Purtell" wrote: > Hi Dave, > > > Would HBase be willing to accept updating Scan to have differe

Re: JvmPauseMonitor

2015-07-24 Thread Vladimir Rodionov
Hi, jeevi Is there any reason you are testing an ancient, unsupported version of HBase? -Vlad On Fri, Jul 24, 2015 at 3:23 AM, jeevi tesh wrote: > I'm aware Gc is a common event i have checked in logs several times it has > happened before. But few time when it happens i also get this following >

Re: How can I tell when a client is connected and ready to go?

2015-07-17 Thread Vladimir Rodionov
>> - Under what conditions is the IOException catch actually reached? ConnectionFactory just instantiates and initiates connection implementation, I do not think it does anything beyond that (the only way to get cluster status is to connect to master, use HBaseAdmin instance, and wait until rpcTime
