Re: Row distribution

2012-07-24 Thread Adrien Mogenet
From the web interface, you can see such statistics when viewing the details of a table. You can also develop your own "balance viewer" through the HBase API (list of RS, regions, storeFiles, their size, etc.) On Wed, Jul 25, 2012 at 7:32 AM, Mohit Anchlia wrote: > Is there an easy way to tell
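
A minimal sketch of such a "balance viewer", assuming the 0.92-era client API (ClusterStatus/HServerLoad); it simply prints per-server region counts and per-region store file sizes, and the class name is illustrative:

    import java.util.Map;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.ClusterStatus;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HServerLoad;
    import org.apache.hadoop.hbase.ServerName;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.util.Bytes;

    public class BalanceViewer {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        ClusterStatus status = admin.getClusterStatus();
        for (ServerName server : status.getServers()) {
          HServerLoad load = status.getLoad(server);
          System.out.println(server + ": " + load.getNumberOfRegions() + " regions");
          // Per-region store file counts and sizes show how evenly data is spread.
          for (Map.Entry<byte[], HServerLoad.RegionLoad> e : load.getRegionsLoad().entrySet()) {
            HServerLoad.RegionLoad rl = e.getValue();
            System.out.println("  " + Bytes.toStringBinary(e.getKey())
                + " storefiles=" + rl.getStorefiles()
                + " storefileSizeMB=" + rl.getStorefileSizeMB());
          }
        }
      }
    }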

Row distribution

2012-07-24 Thread Mohit Anchlia
Is there an easy way to tell how my nodes are balanced and how the rows are distributed in the cluster?

Re: Enabling compression

2012-07-24 Thread Asaf Mesika
You also need to install Snappy - the Shared Object. I've done it using "yum install snappy" on Fedora Core. Sent from my iPad On 25 July 2012, at 04:40, Dhaval Shah wrote: Yes you need to add the snappy libraries to hbase path (i think the variable to set is called HBASE_LIBRARY_PATH) -

Re: Presplitting regions + Bulk import data into table

2012-07-24 Thread Ioakim Perros
Excuse me if I mis-expressed the problem, but this (what you propose) is what I do. The problem is that although the output of my job has an ImmutableBytesWritable object as its key, the function used to define the split points of the table is the following: public void

Re: MR hbase export is failing

2012-07-24 Thread Ooh Rong
We had a similar issue on CDH3u4(hbase 0.90) and upgrading to CDH4(hbase 0.92) solved the problem for us. If it's an option, I recommend upgrading HBase. On Wed, Jul 25, 2012 at 4:04 AM, Paul Mackles wrote: > I've seen this when writing when exporting to s3 and assumed it was > related to write

Re: Presplitting regions + Bulk import data into table

2012-07-24 Thread Bryan Beaudreault
Change the output of your job (or whatever you are using to seed this reducer -- mapper, whatever), to output ImmutableBytesWritable as the key. Then wrap your bytes in the writable. Basically, Bytes.toBytes() only returns a raw byte[] object. You need an object that implements WritableComparabl
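
A hedged sketch of what this looks like in practice; the class, column family and qualifier names are hypothetical:

    import java.io.IOException;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // The raw row-key bytes are wrapped in ImmutableBytesWritable (a
    // WritableComparable), so the partitioner can compare them against the
    // table's region split points.
    public class ExampleMapper extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
      @Override
      protected void map(LongWritable offset, Text line, Context context)
          throws IOException, InterruptedException {
        byte[] row = Bytes.toBytes(line.toString());
        Put put = new Put(row);
        put.add(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes("value"));
        context.write(new ImmutableBytesWritable(row), put);
      }
    }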

Re: Enabling compression

2012-07-24 Thread Dhaval Shah
Yes you need to add the snappy libraries to hbase path (i think the variable to set is called HBASE_LIBRARY_PATH) -- On Wed 25 Jul, 2012 3:46 AM IST Mohit Anchlia wrote: >On Tue, Jul 24, 2012 at 2:04 PM, Dhaval Shah >wrote: > >> I bet that your compression librarie

Re: Insert blocked

2012-07-24 Thread lars hofhansl
You share it for as many operations as it makes sense to keep the HTable. In an AppServer it would be for the duration of an incoming request, for example (i.e. each thread creates and destroys its own HTables as needed). The HTable is then just used as a functional construct to execute RPCs and
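
For illustration, a request-scoped HTable along these lines might look like the sketch below (table, family and value names are hypothetical; the heavy state lives in the HConnection cached behind the shared Configuration):

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class RequestHandler {
      private final Configuration sharedConf;  // created once for the whole application

      public RequestHandler(Configuration sharedConf) {
        this.sharedConf = sharedConf;
      }

      public void handleRequest(byte[] row) throws IOException {
        // Cheap to create: ZK session and region locations are shared via the connection.
        HTable table = new HTable(sharedConf, "mytable");
        try {
          Put put = new Put(row);
          put.add(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes("value"));
          table.put(put);
        } finally {
          table.close();
        }
      }
    }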

Re: UnknownRowLockException

2012-07-24 Thread Jean-Daniel Cryans
On Tue, Jul 24, 2012 at 3:55 PM, Marco Gallotta wrote: > > I have thousands of columns per row, so I doubt that's still true? No, since every value is stored along with its key (row key + family + qualifier + timestamp + length of each). That's why compression is pretty much always recommended. J-

Re: UnknownRowLockException

2012-07-24 Thread Marco Gallotta
On Tuesday 24 July 2012 at 3:51 PM, Jean-Daniel Cryans wrote: > The space saving is a good question, are you really that short on hard > drives? The key is still the bigger part of your whole row by an order > of magnitude and this is usually where people try to shave a few bytes > off. > > I h

Re: UnknownRowLockException

2012-07-24 Thread Jean-Daniel Cryans
On Tue, Jul 24, 2012 at 3:44 PM, Marco Gallotta wrote: > Oh, oops, I misread the increment documentation. I'm fine with that. The only > problem now is that it only operates on longs, but I'm storing ints and > shorts to save space. Perhaps the space saving isn't worth the cost of not > using

Re: UnknownRowLockException

2012-07-24 Thread Marco Gallotta
Oh, oops, I misread the increment documentation. I'm fine with that. The only problem now is that it only operates on longs, but I'm storing ints and shorts to save space. Perhaps the space saving isn't worth the cost of not using this, though. Marco -- Marco Gallotta | Mountain View, Calif

Re: UnknownRowLockException

2012-07-24 Thread Jean-Daniel Cryans
On Tue, Jul 24, 2012 at 3:29 PM, Marco Gallotta wrote: > I acquire the lock before the get. I only setup the get before acquiring the > lock. Yes, but let's say you want to increment "f:a" and "f:b" in the row "example". Your code requires 2 calls to increment() and would be seen as two differen
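
One way to make both column increments atomic without an explicit row lock is the multi-column Increment in the 0.92+ client; a hedged sketch (family/qualifier names are illustrative, `table` is an already-open HTable):

    import java.io.IOException;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Increment;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class AtomicIncrements {
      // Increments f:a and f:b of row "example" atomically in a single RPC,
      // instead of two separate increment() calls guarded by a row lock.
      public static Result incrementBoth(HTable table) throws IOException {
        Increment inc = new Increment(Bytes.toBytes("example"));
        inc.addColumn(Bytes.toBytes("f"), Bytes.toBytes("a"), 1L);
        inc.addColumn(Bytes.toBytes("f"), Bytes.toBytes("b"), 1L);
        return table.increment(inc);
      }
    }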

Re: UnknownRowLockException

2012-07-24 Thread Marco Gallotta
I acquire the lock before the get. I only setup the get before acquiring the lock. I'm also looking into using checkAndPut. Marco -- Marco Gallotta | Mountain View, California Software Engineer, Infrastructure | Loki Studios fb.me/marco.gallotta | twitter.com/marcog ma...@gallotta.co.za | +1
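
A hedged sketch of the checkAndPut route mentioned here, for an int counter stored in f:counter (row, family and qualifier names are illustrative):

    import java.io.IOException;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class CheckAndPutCounter {
      // Reads the current value and writes current+1, but only if the stored
      // value has not changed in between. Returns false if another writer won
      // the race, in which case the caller re-reads and retries.
      public static boolean incrementOnce(HTable table, byte[] row) throws IOException {
        byte[] fam = Bytes.toBytes("f");
        byte[] qual = Bytes.toBytes("counter");
        Result r = table.get(new Get(row).addColumn(fam, qual));
        byte[] current = r.getValue(fam, qual);
        int next = (current == null ? 0 : Bytes.toInt(current)) + 1;
        Put put = new Put(row);
        put.add(fam, qual, Bytes.toBytes(next));
        return table.checkAndPut(row, fam, qual, current, put);
      }
    }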

Re: Enabling compression

2012-07-24 Thread Mohit Anchlia
On Tue, Jul 24, 2012 at 2:04 PM, Dhaval Shah wrote: > I bet that your compression libraries are not available to HBase.. Run the > compression test utility and see if it can find LZO > > That seems to be the case for SNAPPY. However, I do have snappy installed and it works with hadoop just fine an

Re: UnknownRowLockException

2012-07-24 Thread Jean-Daniel Cryans
On Tue, Jul 24, 2012 at 3:08 PM, Marco Gallotta wrote: > HBase's increment method says "readers do not take row locks so get and scan > operations can see this operation partially completed", which could cause > problems. As far as I can tell your code suffers from the same issue eg if you need

Re: UnknownRowLockException

2012-07-24 Thread Marco Gallotta
I was unaware of locks not moving with regions/splits. Hmm…I just came across http://jerryjcw.blogspot.com/2009/10/hbase-notes-casual-remark-about-row.html which I'm going to try. HBase's increment method says "readers do not take row locks so get and scan operations can see this operation part

Re: UnknownRowLockException

2012-07-24 Thread Jean-Daniel Cryans
Row locks don't move with regions or splits... are such things happening frequently? Also, any reason to not use HBase's own increment method that's much more efficient? http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#incrementColumnValue(byte[], byte[], byte[], long)
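
For reference, a minimal use of that method (row, family and qualifier names are illustrative; `table` is an already-open HTable):

    import java.io.IOException;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.util.Bytes;

    public class CounterExample {
      // Atomic server-side increment of a long counter; no client-side
      // lock / get / put cycle is required.
      public static long bump(HTable table) throws IOException {
        return table.incrementColumnValue(
            Bytes.toBytes("example"),  // row
            Bytes.toBytes("f"),        // column family
            Bytes.toBytes("a"),        // qualifier
            1L);                       // amount to add
      }
    }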

UnknownRowLockException

2012-07-24 Thread Marco Gallotta
Hi there I am getting an UnknownRowLockException when adding locks to the increment() function below. The full stack trace is at the end of this message. There are a few other places I am acquiring locks, but I only ever acquire a single lock for one piece of code and the rest go through fine.

Re: cannot invoke coprocessor in trunk

2012-07-24 Thread Yin Huai
I have a question regarding loading an endpoint from hdfs. I used the table descriptor to load my endpoint from hdfs and it seems that the region server successfully loaded the jar (i can see the name of my endpoint through "status 'detailed'"). However, when I tried to invoke the coprocessor (I used $HADO
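
For context, attaching an endpoint from a jar on HDFS through the table descriptor might look roughly like the sketch below, assuming the 0.92-era admin API; the endpoint class name, jar path and table name are hypothetical:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.Coprocessor;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.util.Bytes;

    public class LoadEndpoint {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        byte[] table = Bytes.toBytes("mytable");
        admin.disableTable(table);
        HTableDescriptor htd = admin.getTableDescriptor(table);
        // Region servers pull the jar from HDFS and load the named class.
        htd.addCoprocessor("com.example.MyEndpoint",            // hypothetical endpoint class
            new Path("hdfs:///user/hbase/my-endpoint.jar"),     // hypothetical jar location
            Coprocessor.PRIORITY_USER, null);
        admin.modifyTable(table, htd);
        admin.enableTable(table);
      }
    }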

Re: Enabling compression

2012-07-24 Thread Dhaval Shah
I bet that your compression libraries are not available to HBase.. Run the compression test utility and see if it can find LZO Regards, Dhaval - Original Message - From: Mohit Anchlia To: user@hbase.apache.org Cc: Sent: Tuesday, 24 July 2012 4:39 PM Subject: Re: Enabling compression

Re: Enabling compression

2012-07-24 Thread Mohit Anchlia
Thanks! I was trying it out and I see this message when I use COMPRESSION, but it works when I don't use it. Am I doing something wrong? hbase(main):012:0> create 't2', {NAME => 'f1', VERSIONS => 1, COMPRESSION => 'LZO'} ERROR: org.apache.hadoop.hbase.client.RegionOfflineException: Only 0 of 1 r

Re: Enabling compression

2012-07-24 Thread Jean-Daniel Cryans
On Tue, Jul 24, 2012 at 1:34 PM, Jean-Marc Spaggiari wrote: > Also, if I understand it correctly, this will enable the compression > for new puts but will not compress the cells already stored, > right? For that, we need to run a major compaction of the table which > will rewrite all the

Re: Enabling compression

2012-07-24 Thread Jean-Marc Spaggiari
Also, if I understand it correctly, this will enable the compression for new puts but will not compress the cells already stored, right? For that, we need to run a major compaction of the table, which will rewrite all the cells and so compact them? I'm not 100% sure about that, so it's ha

Re: Enabling compression

2012-07-24 Thread Rob Roland
Yes. You'll need to disable the table, then alter it. disable 'my_table' alter 'my_table', {NAME => 'my_column_family', COMPRESSION => 'snappy'} enable 'my_table' You don't enable compression for the whole table - you enable it per column family. (At least this is the case on CDH3's HBase) On
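
For reference, the same change through the Java admin API (a sketch only; compression is a column-family property, the table/family names match the shell example above, and Snappy is assumed to be installed on the region servers):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.io.hfile.Compression;
    import org.apache.hadoop.hbase.util.Bytes;

    public class EnableCompression {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        admin.disableTable("my_table");
        HTableDescriptor htd = admin.getTableDescriptor(Bytes.toBytes("my_table"));
        HColumnDescriptor hcd = htd.getFamily(Bytes.toBytes("my_column_family"));
        hcd.setCompressionType(Compression.Algorithm.SNAPPY);
        admin.modifyColumn("my_table", hcd);
        admin.enableTable("my_table");
        // Existing store files stay uncompressed until a major compaction rewrites them.
      }
    }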

Re: Enabling compression

2012-07-24 Thread Jean-Daniel Cryans
See http://hbase.apache.org/book.html#changing.compression J-D On Tue, Jul 24, 2012 at 1:28 PM, Mohit Anchlia wrote: > Is it possible to enable compression on the table on a already existing > table?

Re: Insert blocked

2012-07-24 Thread Mohit Anchlia
On Tue, Jul 24, 2012 at 12:55 PM, Elliott Clark wrote: > Thanks I hadn't seen that before > Do you mean in your code you close the HTableInterface after each put/get/scan operation? > > On Mon, Jul 23, 2012 at 10:29 PM, lars hofhansl > wrote: > > > Or you can pre-create your HConnection and Threa

Re: Insert blocked

2012-07-24 Thread Elliott Clark
Thanks I hadn't seen that before On Mon, Jul 23, 2012 at 10:29 PM, lars hofhansl wrote: > Or you can pre-create your HConnection and Threadpool and use the HTable > constructor that takes these as arguments. > That is faster and less "byzantine" compared to the HTablePool "monster". > > Also see
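
A sketch of the pattern lars describes, assuming the 0.94-era client API; the connection factory method name varies across versions, so HConnectionManager.createConnection, the table name and the pool size are assumptions here:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HConnection;
    import org.apache.hadoop.hbase.client.HConnectionManager;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class SharedConnectionExample {
      public static void main(String[] args) throws Exception {
        // Long-lived, shared resources: created once per application.
        Configuration conf = HBaseConfiguration.create();
        HConnection connection = HConnectionManager.createConnection(conf);
        ExecutorService pool = Executors.newFixedThreadPool(10);

        // Per-request / per-thread: a lightweight HTable on top of the shared pieces.
        HTable table = new HTable(Bytes.toBytes("mytable"), connection, pool);
        try {
          Put put = new Put(Bytes.toBytes("row1"));
          put.add(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes("v"));
          table.put(put);
        } finally {
          table.close();  // does not shut down the shared connection or pool
        }
      }
    }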

Presplitting regions + Bulk import data into table

2012-07-24 Thread Ioakim Perros
Hi, I am bulk importing data through code and presplitting the regions of a table - though I see all the data going to the first region server. The byte objects used for comparison (to decide which region each reducer's output should go to) are of the form: Bytes.toBytes(String.valueOf(#some
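
For reference, a pre-split table might be created as in the hedged sketch below (table/family names and split points are illustrative). Split points and row keys are compared byte-wise, so numeric keys usually need fixed-width, zero-padded formatting on both sides, otherwise most rows sort into the first region:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.util.Bytes;

    public class PresplitTable {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        HTableDescriptor htd = new HTableDescriptor("mytable");
        htd.addFamily(new HColumnDescriptor("f"));
        // Fixed-width, zero-padded split points so that byte-wise comparison of
        // the job's output keys lines up with these region boundaries.
        byte[][] splits = new byte[][] {
            Bytes.toBytes("0000250000"),
            Bytes.toBytes("0000500000"),
            Bytes.toBytes("0000750000"),
        };
        admin.createTable(htd, splits);
      }
    }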

Re: Insert blocked

2012-07-24 Thread Mohit Anchlia
I removed the close call and it works. So it looks like close should only be called at the end. But then how does the pool know that the object is available if it's not returned to the pool explicitly? On Tue, Jul 24, 2012 at 10:00 AM, Mohit Anchlia wrote: > > > On Tue, Jul 24, 2012 at 3:09
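
For reference, the intended HTablePool usage looks roughly like this; the behaviour is version-dependent (on 0.90-era clients the table is handed back with putTable(), while on newer clients the pooled wrapper's close() returns it to the pool instead of destroying it), and the table/row names are illustrative:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTableInterface;
    import org.apache.hadoop.hbase.client.HTablePool;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class PoolUsage {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTablePool pool = new HTablePool(conf, 10);
        HTableInterface table = pool.getTable("mytable");
        try {
          Put put = new Put(Bytes.toBytes("row1"));
          put.add(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes("v"));
          table.put(put);
        } finally {
          pool.putTable(table);  // older clients; newer clients use table.close() instead
        }
      }
    }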

Re: Coprocessors vs MapReduce?

2012-07-24 Thread Andrew Purtell
On Tue, Jul 24, 2012 at 7:59 AM, Bertrand Dechoux wrote: > First, I thought coprocessors needed a restart but it seems a shell can be > used to add/remove them without requiring a restart. However, at the moment > the coprocessors are defined within jar and can not be dynamically created. > Could

Re: MR hbase export is failing

2012-07-24 Thread Paul Mackles
I've seen this when exporting to s3 and assumed it was related to write performance. We set hbase.regionserver.lease.period to the same value as the task timeout and it helped reduce the # of failures, though we still get occasional task timeouts. I haven't seen this when writing to lo

Re: MR hbase export is failing

2012-07-24 Thread Jimmy Xiang
It could also be caused by the MR job taking too long to process a batch of data before coming back for another batch. Thanks, Jimmy On Tue, Jul 24, 2012 at 11:52 AM, Jeff Whiting wrote: > What would cause a scanner timeout exception? Is hdfs too slow? Do I just > increase the scanner timeout or is

MR hbase export is failing

2012-07-24 Thread Jeff Whiting
What would cause a scanner timeout exception? Is hdfs too slow? Do I just increase the scanner timeout or is there a better approach. Thanks, ~Jeff running: hadoop jar /usr/lib/hbase/hbase-0.90.1-CDH3B4.jar export -D dfs.replication=2 -D mapred.output.compress=true -D mapred.output.compress

Re: Coprocessors vs MapReduce?

2012-07-24 Thread Ted Yu
Bertrand: Your questions are quite common ones. Let me try clarifying a few. Andy or Gary should be able to give better answers. For #1, there is no support for coprocessors if your code is not compiled and built into a jar. Need to get a bit more familiar with Cascading :-) For #2, can you give us so

Re: Insert blocked

2012-07-24 Thread Mohit Anchlia
On Tue, Jul 24, 2012 at 3:09 AM, Lyska Anton wrote: > Hi, > > after the first insert you are closing your table in the finally block. That's why > the thread hangs > I thought I need to close the HTableInterface to return it back to the pool. Is that not the case? > > 24.07.2012 3:41, Mohit Anchlia wrote: > >>

RES: Schema for sorted results

2012-07-24 Thread Cristofer Weber
Hi Hari, Using date as column qualifier is nice, but I experienced a drawback in a scenario where I left the window open: I kept a large range of dates per RowKey and the amount of rows per region became lower and lower as I started to split regions. You can manage this with TTL if you don't

Coprocessors vs MapReduce?

2012-07-24 Thread Bertrand Dechoux
Hello, I am learning about coprocessors and would like to know more about how to choose between coprocessors and MapReduce. First, I thought coprocessors needed a restart but it seems a shell can be used to add/remove them without requiring a restart. However, at the moment the coprocessors are d

Modify rowKey in prePut hook

2012-07-24 Thread Daniel Gorgan - SKIN
Hello, I'm trying to implement something like autoIncrement in hbase's coprocessors. If the rowKey I read in prePut is empty, I will generate a new one, make sure that it doesn't exist, and use that one. Also, the new key should be returned to the client. I'm trying to do this using coprocessors, I know

Re: Schema for sorted results

2012-07-24 Thread Hari Prasanna
JM - I am searching for the top N urls in date+category, so this rowkey does work well for my purpose. Cristofer - I realize that having the raw date at the beginning of the rowkey makes all the writes in a day rush to the same region server. Maybe I could have the rowkey start with the category (wh

Re: Schema for sorted results

2012-07-24 Thread Jean-Marc Spaggiari
Hi Hari, Why do you think it's wasteful? Let's imagine this situation. Key=||| Value = nothing. And this one: Key= Value = || Both situations will, at the end, represent almost the same size in the database. You can also do something like this: Key= ColumnFamilyName= Value=| Just that the firs

RES: Schema for sorted results

2012-07-24 Thread Cristofer Weber
Hello Hari! Just for the sake of maintaining sorted results, that's it. You have to keep it in lexicographic order. An alternative, for example, could be to maintain date|category as the RowKey and store your N URLs as members of a Column Family, where padded_visits could be the Column Qualifier and

Re: Schema for sorted results

2012-07-24 Thread Minh Duc Nguyen
Hari, According to the HBase book: http://hbase.apache.org/book.html#dm.sort All data model operations HBase return data in sorted order. First by row, then by ColumnFamily, followed by column qualifier, and finally timestamp (sorted in reverse, so newest records are returned first). ~ Mi

Schema for sorted results

2012-07-24 Thread Hari Prasanna
Hello - I'm using HBase for web server log processing and I'm trying to save the top N urls per category per day in sorted order. From what I've read, the only sortable structure that HBase offers is the lexicographic sort of the row keys. So, here is the rowkey format I'm currently us
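
One common way to get that ordering, sketched here with an illustrative key layout (date | category | inverted, zero-padded visit count | url) so that higher visit counts sort first lexicographically; the exact layout used in this thread is truncated above, so this is only an assumption:

    import org.apache.hadoop.hbase.util.Bytes;

    public class TopUrlRowKey {
      // Zero-padding keeps lexicographic order consistent with numeric order,
      // and subtracting from Long.MAX_VALUE makes the most-visited URLs sort first.
      public static byte[] build(String date, String category, long visits, String url) {
        String invertedVisits = String.format("%019d", Long.MAX_VALUE - visits);
        return Bytes.toBytes(date + "|" + category + "|" + invertedVisits + "|" + url);
      }
    }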

Master down log.

2012-07-24 Thread Jean-Marc Spaggiari
Hi, My cluster ran into some trouble last night and in the end, all the servers went down. Hadoop is still running, but HBase is not. I have no clue what the root cause is. I looked at the logs on the master side, and the first line when it started to go down was: 2012-07-24 01:20:13,227 INFO org.apac

Re: Insert blocked

2012-07-24 Thread Lyska Anton
Hi, after the first insert you are closing your table in the finally block. That's why the thread hangs. 24.07.2012 3:41, Mohit Anchlia wrote: I am now using HTablePool but still the call hangs at "put". My code is something like this: hTablePool = new HTablePool(config, MAX_POOL_SIZE); result = new