Re: Slow full-table scans

2012-08-12 Thread Stack
On Sun, Aug 12, 2012 at 7:04 AM, Gurjeet Singh wrote: > Am I missing something? Is there a way to optimize this? > You've checked out the perf section of the refguide? http://hbase.apache.org/book.html#performance And have you read the postings by the GBIF lads starting with this one: http:/

Re: column based or row based storage for HBase?

2012-08-12 Thread Lin Ma
Hi Jason, This is a very good reference. I read it from beginning to end and learned a lot. Thanks and have a good weekend. regards, Lin On Tue, Aug 7, 2012 at 2:00 AM, Jason Frantz wrote: > Lin, > > Looks like your questions may already be answered, but you might find the > following link compa

Re: is there anyway to turn off compaction in hbase

2012-08-12 Thread Harsh J
Richard, The property disables major compactions from happening automatically. However, if you choose to do this, you should ensure you have a cron job that triggers major_compact on all tables, since compaction is still necessary; you just do not want it to happen at any time it likes to
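A minimal sketch (not from the thread) of what such an external trigger could look like with the standard HBaseAdmin client, assuming automatic major compactions were disabled by setting hbase.hregion.majorcompaction to 0 in hbase-site.xml; the table names are placeholders. A cron entry would simply launch this class (or the equivalent major_compact shell command) at a quiet hour.

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HBaseAdmin;

  // Sketch: trigger major compactions explicitly (e.g. from a nightly
  // cron-launched job) once automatic major compactions have been disabled.
  public class NightlyCompactor {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      HBaseAdmin admin = new HBaseAdmin(conf);
      for (String table : new String[] { "table_one", "table_two" }) {
        admin.majorCompact(table);  // asynchronous request; the region servers do the actual work
      }
    }
  }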

Re: Printing integers in the Hbase shell

2012-08-12 Thread David Koch
Hi Anil, Thank you for your advice. On Sat, Aug 11, 2012 at 10:12 PM, anil gupta wrote: > Hi David, > > As I understand it, you want to print the Integer values as Strings in the > HBase shell. There are two ways to do it: > 1. You can write a ruby script to interpret the value as bytes. This migh
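For context, a small sketch (not part of the thread) of the byte interpretation anil is describing: the shell prints the raw stored bytes, and a call such as Bytes.toInt (whether from client code as below, or from a shell-side Ruby snippet) recovers the number.

  import org.apache.hadoop.hbase.util.Bytes;

  public class IntValuePrinting {
    public static void main(String[] args) {
      byte[] stored = Bytes.toBytes(42);                 // what HBase actually stores
      System.out.println(Bytes.toStringBinary(stored));  // what the shell shows: \x00\x00\x00*
      System.out.println(Bytes.toInt(stored));           // the readable value: 42
    }
  }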

Re: Slow full-table scans

2012-08-12 Thread Gurjeet Singh
Thanks for the reply Stack. My comments are inline. > You've checked out the perf section of the refguide? > > http://hbase.apache.org/book.html#performance Yes. HBase has 8GB RAM on both my cluster and my dev machine. Both configurations are backed by SSDs and HBase options are set to HB

Secondary indexes suggestions

2012-08-12 Thread Lukáš Drbal
Hi all, I am a new user of HBase and I need help with secondary indexes. For example, I have messages and users. Each user has many messages. The data structure will be like this: Message: - String id - Long sender_id - Long recipient_id - String text - Timestamp created_at [...] User: - Long id - Stri
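A hedged sketch of one common manual approach (the thread has not settled on an answer, and every table/column name below is invented): keep a second table keyed by the indexed value plus the message id, and write an index row alongside every message so that all messages from one sender can be found with a prefix scan.

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.util.Bytes;

  // Sketch of a manually maintained secondary index table.
  public class ManualSecondaryIndex {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      HTable messages = new HTable(conf, "message");
      HTable bySender = new HTable(conf, "message_by_sender");

      String messageId = "m-0001";
      long senderId = 42L;

      // Main row, keyed by message id.
      Put msg = new Put(Bytes.toBytes(messageId));
      msg.add(Bytes.toBytes("d"), Bytes.toBytes("sender_id"), Bytes.toBytes(senderId));
      msg.add(Bytes.toBytes("d"), Bytes.toBytes("text"), Bytes.toBytes("hello"));
      messages.put(msg);

      // Index row: senderId prefix + message id, pointing back at the main row.
      byte[] indexKey = Bytes.add(Bytes.toBytes(senderId), Bytes.toBytes(messageId));
      Put idx = new Put(indexKey);
      idx.add(Bytes.toBytes("d"), Bytes.toBytes("ref"), Bytes.toBytes(messageId));
      bySender.put(idx);

      messages.close();
      bySender.close();
    }
  }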

Re: Slow full-table scans

2012-08-12 Thread Ted Yu
Gurjeet: Can you tell us which HBase version you are using? Thanks On Sun, Aug 12, 2012 at 5:32 AM, Gurjeet Singh wrote: > Thanks for the reply Stack. My comments are inline. > > > You've checked out the perf section of the refguide? > > > > http://hbase.apache.org/book.html#performance > > Ye

Re: Slow full-table scans

2012-08-12 Thread Gurjeet Singh
Hi Ted, Yes, I am using the Cloudera distribution 3 (CDH3). Gurjeet Sent from my iPad On Aug 12, 2012, at 7:11 AM, Ted Yu wrote: > Gurjeet: > Can you tell us which HBase version you are using? > > Thanks > > On Sun, Aug 12, 2012 at 5:32 AM, Gurjeet Singh wrote: > >> Thanks for the reply Stack.

Re: an empty region

2012-08-12 Thread Harsh J
You're right there - compactions do not merge region boundaries. They just merge the accumulated (flushed, etc.) storefiles belonging to each region, for every region that's fragmented over time. On Fri, Aug 10, 2012 at 2:47 PM, J Mohamed Zahoor wrote: > No. I just now learnt that compactions do

Re: HBase won't run on OSX 10.8

2012-08-12 Thread Harsh J
Bryan, I believe running with "-Djava.net.preferIPv4Stack=true" should work just fine. On Thu, Aug 9, 2012 at 1:17 AM, Bryan Beaudreault wrote: > Did this fix end up working? I'm hesitant to upgrade to 10.8 if I'm going > to run into this issue. I'm running the CDH3 jars locally to mirror my >

Re: Slow full-table scans

2012-08-12 Thread Jacques
Something to consider is that HBase stores and retrieves the row key (8 bytes in your case) + timestamp (8 bytes) + column qualifier (?) for every single value. The schemaless nature of HBase generally means that this data has to be stored for each row (certain kinds of newer block level compressi
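To make that overhead concrete, here is a small illustrative sketch (not from the thread; the row, family, and qualifier names are made up) that serializes a single cell and prints its total size, which is far larger than the 8-byte payload it carries.

  import org.apache.hadoop.hbase.KeyValue;
  import org.apache.hadoop.hbase.util.Bytes;

  // Rough illustration of per-cell overhead: every stored value carries its
  // full key (row, family, qualifier, timestamp, type) plus length fields.
  public class CellOverhead {
    public static void main(String[] args) {
      KeyValue kv = new KeyValue(
          Bytes.toBytes(12345L),        // 8-byte row key
          Bytes.toBytes("d"),           // 1-byte family
          Bytes.toBytes("col_000042"),  // 10-byte qualifier
          System.currentTimeMillis(),   // 8-byte timestamp
          Bytes.toBytes(3.14));         // 8-byte value
      System.out.println("serialized bytes for one cell: " + kv.getLength());
    }
  }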

Re: Extremely long flush times

2012-08-12 Thread lars hofhansl
I filed HBASE-6561 for this (Jira is back). - Original Message - From: lars hofhansl To: "d...@hbase.apache.org" ; "user@hbase.apache.org" Cc: Sent: Saturday, August 11, 2012 12:42 AM Subject: Re: Extremely long flush times A possible solution is to have the MemStoreScanner reseek e

Re: Slow full-table scans

2012-08-12 Thread lars hofhansl
Do you really have to retrieve all 200,000 each time? Scan.setBatch(...) makes no difference?! (note that batching is different and separate from caching). Also note that the scanner contract is to return sorted KVs, so a single scan cannot be parallelized across RegionServers (well not entirely
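A brief sketch of the two knobs Lars is distinguishing (the table/family names and the numbers are placeholders, not recommendations): setBatch caps how many columns of a wide row come back in one Result, while setCaching controls how many Results are transferred per RPC.

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.client.ResultScanner;
  import org.apache.hadoop.hbase.client.Scan;

  // Sketch: batching vs. caching on a wide-row scan.
  public class BatchVsCaching {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      HTable table = new HTable(conf, "wide_table");
      Scan scan = new Scan();
      scan.setBatch(1000);   // chunk wide rows into 1000 columns per Result
      scan.setCaching(10);   // fetch 10 Results per round trip
      ResultScanner scanner = table.getScanner(scan);
      try {
        for (Result partialRow : scanner) {
          // With batching, several consecutive Results may share the same row key.
          // process(partialRow);
        }
      } finally {
        scanner.close();
        table.close();
      }
    }
  }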

Re: Slow full-table scans

2012-08-12 Thread Gurjeet Singh
Hi Jacques, I did consider that. So, this increases the on-disk size of my data by 3-4x (= 600-800MB). That still does not explain why reading 1 row (~4MB with overhead) takes 5 sec. About serialization/deserialization on the client side - it happens on a different thread out of a buffer and most of

Re: Slow full-table scans

2012-08-12 Thread Mohammad Tariq
Hello experts, Would it be feasible to create a separate thread for each region? I mean, we can determine the start and end key of each region and issue a scan for each region in parallel. Regards, Mohammad Tariq On Mon, Aug 13, 2012 at 3:54 AM, lars hofhansl wrote: > Do you really hav

Re: Slow full-table scans

2012-08-12 Thread Gurjeet Singh
Hi Lars, Yes, I need to retrieve all the values for a row at a time. That said, I did experiment with different batch sizes and that made no difference whatsoever. (Caching, on the other hand, did make some difference: ~2-3% faster with a larger cache.) I see your point about scanners returning sorted K

Re: Slow full-table scans

2012-08-12 Thread Gurjeet Singh
Hi Mohammad, This is a great idea. Is there an API call to determine the start/end key for each region? Thanks, Gurjeet On Sun, Aug 12, 2012 at 3:49 PM, Mohammad Tariq wrote: > Hello experts, > > Would it be feasible to create a separate thread for each region? I > mean we can determine

Re: Slow full-table scans

2012-08-12 Thread lars hofhansl
You can use HTable.{getStartEndKeys|getEndKeys|getStartKeys} to get the current region demarcations for your table. If you wanted to group threads by RegionServer (which you should) you get that information via HTable.getRegionLocation{s} -- Lars - Original Message - From: Gurjeet Sin
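Pulling the thread's suggestion together, here is a hedged sketch (the table name and pool sizing are placeholders) that reads the current region boundaries with getStartEndKeys() and submits one bounded scan per region. Each task opens its own HTable, since HTable instances are not safe to share across threads.

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.client.ResultScanner;
  import org.apache.hadoop.hbase.client.Scan;
  import org.apache.hadoop.hbase.util.Pair;
  import java.util.concurrent.ExecutorService;
  import java.util.concurrent.Executors;

  // Sketch: one bounded scan per region, run in parallel.
  public class PerRegionScan {
    public static void main(String[] args) throws Exception {
      final Configuration conf = HBaseConfiguration.create();
      final String tableName = "wide_table";
      HTable meta = new HTable(conf, tableName);
      Pair<byte[][], byte[][]> keys = meta.getStartEndKeys();
      meta.close();

      ExecutorService pool = Executors.newFixedThreadPool(keys.getFirst().length);
      for (int i = 0; i < keys.getFirst().length; i++) {
        final byte[] start = keys.getFirst()[i];
        final byte[] stop = keys.getSecond()[i];   // empty for the last region = scan to end
        pool.submit(new Runnable() {
          public void run() {
            try {
              HTable table = new HTable(conf, tableName);  // one HTable per thread
              ResultScanner scanner = table.getScanner(new Scan(start, stop));
              for (Result r : scanner) {
                // process(r);
              }
              scanner.close();
              table.close();
            } catch (Exception e) {
              e.printStackTrace();
            }
          }
        });
      }
      pool.shutdown();
    }
  }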

Re: Slow full-table scans

2012-08-12 Thread Mohammad Tariq
The getStartKey and getEndKey methods provided by the HRegionInfo class can be used for that purpose. Also, please make sure no HTable instance is left open once you are done with reads. Regards, Mohammad Tariq On Mon, Aug 13, 2012 at 4:22 AM, Gurjeet Singh wrote: > Hi Mohammad, > > This

Re: Slow full-table scans

2012-08-12 Thread Jacques
HTable.getRegionLocations() I didn't realize the KeyValue serialization/deserialization happened on a separate thread in the HBase client code. J On Sun, Aug 12, 2012 at 3:52 PM, Gurjeet Singh wrote: > Hi Mohammad, > > This is a great idea. Is there an API call to determine the start/end > k

Re: Slow full-table scans

2012-08-12 Thread Jacques
I think the first question is where the time is spent. Does your analysis show that all the time is spent on the regionservers, or is a portion of the bottleneck on the client side? Jacques On Sun, Aug 12, 2012 at 4:00 PM, Mohammad Tariq wrote: > Methods getStartKey and getEndKey provided by

Re: Slow full-table scans

2012-08-12 Thread Mohammad Tariq
Also, give it a shot using HTablePools and see if it makes any significant difference. Regards, Mohammad Tariq On Mon, Aug 13, 2012 at 4:43 AM, Jacques wrote: > I think the first question is where is the time spent. Does your analysis > show that all the time spent is on the regionserve
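A minimal sketch of that suggestion (the table name and pool size are placeholders): check a table handle out of a shared pool per operation instead of sharing one HTable across threads. On newer clients, closing the pooled handle returns it to the pool instead of calling putTable.

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.Get;
  import org.apache.hadoop.hbase.client.HTableInterface;
  import org.apache.hadoop.hbase.client.HTablePool;

  // Sketch: per-operation table handles from a shared HTablePool.
  public class PooledAccess {
    private static final HTablePool POOL =
        new HTablePool(HBaseConfiguration.create(), 10);

    public static void readRow(byte[] row) throws Exception {
      HTableInterface table = POOL.getTable("wide_table");
      try {
        table.get(new Get(row));
      } finally {
        POOL.putTable(table);  // return the handle to the pool
      }
    }
  }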

Coprocessor tests under busy insertions

2012-08-12 Thread Henry JunYoung KIM
Hi, HBase users. Now I am testing coprocessors to create secondary indexes in the background. The coprocessor itself is packaged with HBase 0.92.1, which I am using. The scenario I want to describe is this one: the main table is 'blog', which has a field named 'userId'. From this field I want to create

RE: Coprocessor tests under busy insertions

2012-08-12 Thread Anoop Sam John
Can you paste your CP implementation here [prePut/postPut]? Are you doing a check for the table in the CP hook? You need to handle the hooks only while they are being called for your table. Remember that your index table also has these same hooks. -Anoop- From: Henry
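A hedged sketch of the guard Anoop describes, loosely modeled on the schema mentioned earlier in the thread (the 'blog' table with a 'userId' field): bail out of prePut unless the hook is firing for the main table, so the observer attached to the index table's regions does not try to index its own writes. The family, qualifier, and index table names are assumptions, as is the use of the environment's getTable() helper to write the index row.

  import java.io.IOException;
  import java.util.List;
  import org.apache.hadoop.hbase.KeyValue;
  import org.apache.hadoop.hbase.client.HTableInterface;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
  import org.apache.hadoop.hbase.coprocessor.ObserverContext;
  import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
  import org.apache.hadoop.hbase.regionserver.wal.WALEdit;
  import org.apache.hadoop.hbase.util.Bytes;

  // Sketch: index only writes to the main table, ignore every other table.
  public class UserIdIndexObserver extends BaseRegionObserver {
    @Override
    public void prePut(ObserverContext<RegionCoprocessorEnvironment> e,
                       Put put, WALEdit edit, boolean writeToWAL) throws IOException {
      String tableName =
          e.getEnvironment().getRegion().getRegionInfo().getTableNameAsString();
      if (!"blog".equals(tableName)) {
        return;  // skip the index table (and any other table) entirely
      }
      List<KeyValue> kvs = put.get(Bytes.toBytes("info"), Bytes.toBytes("userId"));
      if (kvs.isEmpty()) {
        return;  // nothing to index in this Put
      }
      byte[] userId = kvs.get(0).getValue();
      HTableInterface index = e.getEnvironment().getTable(Bytes.toBytes("blog_by_user"));
      try {
        Put idx = new Put(Bytes.add(userId, put.getRow()));
        idx.add(Bytes.toBytes("ref"), Bytes.toBytes("blogRow"), put.getRow());
        index.put(idx);
      } finally {
        index.close();
      }
    }
  }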

Re: Slow full-table scans

2012-08-12 Thread Gurjeet Singh
It seems like the client code just sits idle, waiting for data from the regionservers. Gurjeet On Sun, Aug 12, 2012 at 4:13 PM, Jacques wrote: > I think the first question is where is the time spent. Does your analysis > show that all the time spent is on the regionservers or is a portion of th

HBase write performance issue

2012-08-12 Thread 某因幡
Hi, I'm new to HBase. I'm working with Hadoop-1.0.3 and HBase 0.92.1. I have 8 data nodes which also work as region servers. And I'm trying to import my data into HBase. I wrote two programs: one using the HBase client API (auto flush is off, WAL is on, multi-threaded) and the other using HBase MR
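For reference, a sketch (not the poster's actual program; the table/column names and sizes are placeholders) of the client-API write path being described: auto-flush off so puts are buffered on the client, an explicit write buffer size, and the WAL left on.

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.util.Bytes;

  // Sketch: buffered bulk writes through the client API.
  public class BufferedWriter {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      HTable table = new HTable(conf, "imported_data");
      table.setAutoFlush(false);                 // buffer puts on the client
      table.setWriteBufferSize(8 * 1024 * 1024); // flush roughly every 8 MB

      for (long i = 0; i < 1000000; i++) {
        Put put = new Put(Bytes.toBytes(i));
        put.setWriteToWAL(true);                 // keep the WAL for durability
        put.add(Bytes.toBytes("d"), Bytes.toBytes("v"), Bytes.toBytes("value-" + i));
        table.put(put);
      }
      table.flushCommits();                      // push any remaining buffered puts
      table.close();
    }
  }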

Re: Slow full-table scans

2012-08-12 Thread Gurjeet Singh
Thanks Lars! One final question: is it advisable to issue multiple threads against a single HTable instance, like so: HTable table = ... for (int i = 0; i < 10; i++) { new ScanThread(table, startRow, endRow, rowProcessor).start(); } class ScanThread implements Runnable { public void run

Re: Coprocessor tests under busy insertions

2012-08-12 Thread Henry JunYoung KIM
Hi, Anoop. This is my implementation using a coprocessor RegionObserver. … @Override public void prePut(ObserverContext e, Put put, WALEdit edit, boolean writeToWAL) throws IOException { String tableName = e.getEnvironment().getRegion().getRegionInfo().getTableNameAsStrin

Re: HBase write performance issue

2012-08-12 Thread J Mohamed Zahoor
Hi, Can you see if this helps: http://hbase.apache.org/book/performance.html ./zahoor On Mon, Aug 13, 2012 at 10:28 AM, 某因幡 wrote: > Hi, I'm new to HBase. > I'm working with Hadoop-1.0.3 and HBase 0.92.1. > I have 8 data nodes which also work as region servers. > And I'm trying to import my d

Re: Hbase- Hadoop DFS size not decreased even after deleting a column family

2012-08-12 Thread J Mohamed Zahoor
HBASE-6564. I will try to take a stab at it this weekend. ./zahoor On Fri, Aug 10, 2012 at 12:47 PM, J Mohamed Zahoor wrote: > Hi Lars, > > Will file it... > > ./Zahoor > > > On Fri, Aug 10, 2012 at 12:00 AM, lars hofhansl wrote: > >> Hi zahoor, >> >> could you file a jira with what you found?