Re: Client receives SocketTimeoutException (CallerDisconnected on RS)

2012-08-27 Thread Adrien Mogenet
On Fri, Aug 24, 2012 at 6:52 PM, N Keywal wrote: > Hi Adrien, > >> What do you think about that hypothesis ? > > Yes, there is something fishy to look at here. Difficult to say > without more logs as well. > Are your gets totally random, or are you doing gets on rows that do > exist? That would e

Re: setTimeRange and setMaxVersions seem to be inefficient

2012-08-27 Thread lars hofhansl
First off, regarding "inefficiency"... If version counting happened first and the filters were executed afterwards, we'd have folks "complaining" about inefficiencies as well: ("Why does the code have to go through the versioning stuff when my filter filters the row/column/version anyway?")  ;-) For yo

Re: PIG with HBase Coprocessors

2012-08-27 Thread 某因幡
Thanks for your quick reply. The co-processor looks like: public void postGet(final ObserverContext e, final Get get, final List results) { if table is X get some columns from table Y add these columns to results } And similar for postScannerNext(). This works in
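For reference, a minimal sketch of a coprocessor along those lines, assuming the 0.94-era observer API (List<KeyValue>, byte[] table names); the table names, family handling, and class name are hypothetical, and newer releases use List<Cell> and TableName instead:

import java.io.IOException;
import java.util.List;

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.util.Bytes;

// Hypothetical observer that joins in columns from table Y when table X is read.
public class JoinColumnsObserver extends BaseRegionObserver {
  private static final byte[] TABLE_Y = Bytes.toBytes("Y");

  @Override
  public void postGet(ObserverContext<RegionCoprocessorEnvironment> e,
                      Get get, List<KeyValue> results) throws IOException {
    // Only act on reads of table X.
    String table = e.getEnvironment().getRegion().getTableDesc().getNameAsString();
    if (!"X".equals(table)) {
      return;
    }
    // Fetch the extra columns from table Y and append them to the result.
    HTableInterface y = e.getEnvironment().getTable(TABLE_Y);
    try {
      Result extra = y.get(new Get(get.getRow()));
      for (KeyValue kv : extra.raw()) {
        results.add(kv);
      }
    } finally {
      y.close();
    }
  }
}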

Re: client cache for all region server information?

2012-08-27 Thread Lin Ma
Thanks Harsh, Two more comments / thoughts: 1. For the mapper: the mapper normally runs on the same region server which owns the row-key range for the mapper input, because of locality reasons (I am not 100% confident whether it is always true that the mapper runs on the same region server, please feel

Re: Disk space usage of HFilev1 vs HFilev2

2012-08-27 Thread anil gupta
Hi All, Here are the steps I followed to load the table with the HFilev1 format: 1. Set the property hfile.format.version to 1. 2. Updated the conf across the cluster. 3. Restarted the cluster. 4. Ran the bulk loader. The table has 34 million records and one column family. Results: HDFS space for one rep
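For reference, a sketch of the property override in step 1 applied programmatically to a client/job Configuration (in the test above it was set in hbase-site.xml and pushed to the whole cluster, which also covers flushes and compactions on the region servers):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class HFileVersionOverride {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    // Programmatic equivalent of the hbase-site.xml override described above:
    // HFiles written with this configuration use format version 1 instead of the default 2.
    conf.setInt("hfile.format.version", 1);
    System.out.println("hfile.format.version = " + conf.get("hfile.format.version"));
  }
}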

Re: Retrieving 2 separate timestamps' values

2012-08-27 Thread Ioakim Perros
Unfortunately the way I am reading/writing data from/to parts of my table would be incompatible with this solution. In any case, thank you very much for your time. On Aug 28, 2012, at 4:10, Mohit Anchlia wrote: > Have you thought of making your row key as key+timestamp? And then you can > do

Re: Retrieving 2 separate timestamps' values

2012-08-27 Thread Mohit Anchlia
Have you thought of making your row key as key+timestamp? And then you can do a scan on the columns themselves? On Mon, Aug 27, 2012 at 5:53 PM, Ioakim Perros wrote: > Of course, thank you for responding. > > I have an iterative procedure where I get and put data from/to an HBase > table, and I am set
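A sketch of the composite-key idea (row key = logical key + iteration number), against the 0.94-style client API; table layout, family, and values are illustrative:

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class CompositeRowKey {
  // Row key = logical key + big-endian iteration number (sorts correctly for iterations >= 0).
  static byte[] rowKey(String key, long iteration) {
    return Bytes.add(Bytes.toBytes(key), Bytes.toBytes(iteration));
  }

  public static void main(String[] args) {
    // Write iteration 7 of "row-42" under its own row.
    Put put = new Put(rowKey("row-42", 7L));
    put.add(Bytes.toBytes("f"), Bytes.toBytes("value"), Bytes.toBytes(1.0d));

    // All iterations of "row-42" then form a contiguous key range that one Scan can cover.
    Scan scan = new Scan(rowKey("row-42", 0L), rowKey("row-42", Long.MAX_VALUE));
  }
}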

Re: setTimeRange and setMaxVersions seem to be inefficient

2012-08-27 Thread Jerry Lam
Hi Lars: Thanks for confirming the inefficiency of the implementation for this case. In my case, a column can have more than 10K versions, so I need a quick way to stop the scan from digging into the column once there is a match (ReturnCode.INCLUDE). It would be nice to have a ReturnCode that can noti
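For what it's worth, Filter.ReturnCode already has INCLUDE_AND_NEXT_COL and NEXT_COL values that move the scan on to the next column, which may cover the "stop digging into this column after a match" case; a sketch against the 0.94-era filter API (the class and the match test are hypothetical):

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.filter.FilterBase;
import org.apache.hadoop.hbase.util.Bytes;

// Keep the first matching version of each column, then jump straight to the next
// column instead of walking the remaining (possibly 10K+) versions.
public class FirstMatchPerColumnFilter extends FilterBase {
  private byte[] target;

  public FirstMatchPerColumnFilter() {
    // required for deserialization on the region servers
  }

  public FirstMatchPerColumnFilter(byte[] target) {
    this.target = target;
  }

  @Override
  public ReturnCode filterKeyValue(KeyValue kv) {
    if (Bytes.equals(kv.getValue(), target)) {
      return ReturnCode.INCLUDE_AND_NEXT_COL;  // include this cell, skip the rest of the column
    }
    return ReturnCode.SKIP;  // not a match: try the next version
  }

  // Writable plumbing so the filter can be shipped to the region servers (0.94-style RPC).
  @Override
  public void write(DataOutput out) throws IOException {
    Bytes.writeByteArray(out, target);
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    target = Bytes.readByteArray(in);
  }
}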

Re: Retrieving 2 separate timestamps' values

2012-08-27 Thread Ioakim Perros
Of course, thank you for responding. I have an iterative procedure where I get and put data from/to an HBase table, and I am setting the timestamp of each Put equal to the iteration's number, as it is efficient to check for convergence this way (by just retrieving the 2 last versions of my
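The write side of that scheme, as a small sketch against the 0.94-style API (family and qualifier names are placeholders):

import java.io.IOException;

import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class IterationPut {
  // Store the value for this iteration, using the iteration number as the cell timestamp.
  static void writeIteration(HTable table, byte[] row, long iteration, double value)
      throws IOException {
    Put put = new Put(row);
    put.add(Bytes.toBytes("f"), Bytes.toBytes("value"), iteration, Bytes.toBytes(value));
    table.put(put);
  }
}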

Re: Retrieving 2 separate timestamps' values

2012-08-27 Thread Mohit Anchlia
Your timestamp as in version? Can you describe your scenario with a more concrete example? On Mon, Aug 27, 2012 at 5:01 PM, Ioakim Perros wrote: > Hi, > > Is there any way of retrieving two values with totally different > timestamps from a table? > > I am using timestamps as iteration counts, and I

Re: setTimeRange and setMaxVersions seem to be inefficient

2012-08-27 Thread lars hofhansl
Currently filters are evaluated before we do version counting. Here's a comment from ScanQueryMatcher.java: /** Filters should be checked before checking column trackers. If we do otherwise, as was previously being done, ColumnTracker may increment its counter for even tha

Retrieving 2 separate timestamps' values

2012-08-27 Thread Ioakim Perros
Hi, Is there any way of retrieving two values with totally different timestamps from a table? I am using timestamps as iteration counts, and I would like to be able to get at each iteration (besides the previous iteration's results from the table) some pre-computed amounts I save in some columns w
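One way to fetch two specific iteration timestamps in a single round trip, sketched with the 0.94-era client API (row and timestamps follow whatever the iteration scheme dictates):

import java.io.IOException;
import java.util.Arrays;

import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;

public class TwoTimestampRead {
  // Returns one Result per requested timestamp, fetched in a single batched call.
  static Result[] readTwoIterations(HTable table, byte[] row, long tsA, long tsB)
      throws IOException {
    Get a = new Get(row);
    a.setTimeStamp(tsA);  // only cells written at exactly tsA
    Get b = new Get(row);
    b.setTimeStamp(tsB);  // only cells written at exactly tsB
    return table.get(Arrays.asList(a, b));
  }
}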

Re: MemStore and prefix encoding

2012-08-27 Thread lars hofhansl
Also confirmed via experiment (in the memstore, store files, mixed store files, mixed store files and memstore). -- Lars - Original Message - From: Lars H To: user@hbase.apache.org Cc: Sent: Monday, August 27, 2012 3:52 PM Subject: Re: MemStore and prefix encoding Oops. The KVs are

Re: MemStore and prefix encoding

2012-08-27 Thread Lars H
Oops. The KVs are sorted in reverse chronological order. So I was wrong. It'll return the newest version. Sorry about the confusion. The book is correct. -- Lars Tom Brown wrote: >Lars, > >I have been relying on the expected behavior (if I write another cell >with the same {key, family, qual

Re: Column Value Reference Timestamp Filter

2012-08-27 Thread Jerry Lam
Hi Alex: We decided to use setTimeRange and setMaxVersions, and remove the column with a reference timestamp (i.e. we don't put this column into HBase anymore). This behavior is what we would like, but it seems very inefficient because all versions are processed before the setMaxVersions takes effe
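Roughly the read pattern being described, as a sketch with the 0.94-style API (family and qualifier are placeholders):

import java.io.IOException;

import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class LatestBeforeReferenceTs {
  // Return the newest version of f:q written at or before referenceTs.
  static Result latestBefore(HTable table, byte[] row, long referenceTs) throws IOException {
    Get get = new Get(row);
    get.addColumn(Bytes.toBytes("f"), Bytes.toBytes("q"));
    get.setTimeRange(0, referenceTs + 1);  // upper bound is exclusive
    get.setMaxVersions(1);                 // keep only the newest version in that range
    return table.get(get);
  }
}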

Re: Disk space usage of HFilev1 vs HFilev2

2012-08-27 Thread Kevin O'dell
Anil, Please let us know how well this works. On Mon, Aug 27, 2012 at 4:19 PM, anil gupta wrote: > Hi Guys, > > I was digging through the hbase-default.xml file and I found this property > related to HFile handling: > > > hfile.format.version > 2 > > The HFile

Re: MemStore and prefix encoding

2012-08-27 Thread Stack
On Mon, Aug 27, 2012 at 9:20 AM, Tom Brown wrote: > Lars, > > I have been relying on the expected behavior (if I write another cell > with the same {key, family, qualifier, version} it won't return the > previous one) so your answer was confusing to me. I did more > research and I found that the

Re: Disk space usage of HFilev1 vs HFilev2

2012-08-27 Thread anil gupta
Hi Guys, I was digging through the hbase-default.xml file and I found this property related to HFile handling: hfile.format.version 2 The HFile format version to use for new files. Set this to 1 to test backwards-compatibility. The default value of this op

Re: Pig, HBaseStorage, HBase, JRuby and Sinatra

2012-08-27 Thread Stack
On Mon, Aug 27, 2012 at 11:04 AM, Doug Meil wrote: > > I think somewhere in here in the RefGuide would work… > > http://hbase.apache.org/book.html#other.info.sites > > That looks good. We don't have a Pig section in the refguide? You up for adding a paragraph, Russell? Could link to your blog i

Re: Pig, HBaseStorage, HBase, JRuby and Sinatra

2012-08-27 Thread Doug Meil
I think somewhere in here in the RefGuide would work… http://hbase.apache.org/book.html#other.info.sites On 8/27/12 1:20 PM, "Stack" wrote: >On Mon, Aug 27, 2012 at 6:32 AM, Russell Jurney > wrote: >> I wrote a tutorial around HBase, JRuby and Pig that I thought would be >>of >> interest

Re: Hbase Bulk Load Java Sample Code

2012-08-27 Thread Doug Meil
Hi there, in addition there is a fair amount of documentation about bulk loads and importtsv in the HBase RefGuide. http://hbase.apache.org/book.html#importtsv On 8/27/12 9:34 AM, "Ioakim Perros" wrote: >On 08/27/2012 04:18 PM, o brbrs wrote: >> Hi, >> >> I'm new at HBase and I want to make

Re: Pig, HBaseStorage, HBase, JRuby and Sinatra

2012-08-27 Thread Stack
On Mon, Aug 27, 2012 at 10:31 AM, Russell Jurney wrote: > Yes, and if possible the HBase and JRuby page needs to be updated. If you > can grant me wiki access, I can edit it myself. > > http://wiki.apache.org/hadoop/Hbase/JRuby > I added access for a login of RussellJurney (Sorry. I believe thi

Re: Pig, HBaseStorage, HBase, JRuby and Sinatra

2012-08-27 Thread Russell Jurney
Yes, and if possible the HBase and JRuby page needs to be updated. If you can grant me wiki access, I can edit it myself. http://wiki.apache.org/hadoop/Hbase/JRuby On Mon, Aug 27, 2012 at 10:20 AM, Stack wrote: > On Mon, Aug 27, 2012 at 6:32 AM, Russell Jurney > wrote: > > I wrote a tutorial

Re: Pig, HBaseStorage, HBase, JRuby and Sinatra

2012-08-27 Thread Stack
On Mon, Aug 27, 2012 at 6:32 AM, Russell Jurney wrote: > I wrote a tutorial around HBase, JRuby and Pig that I thought would be of > interest to the HBase users list: > http://hortonworks.com/blog/pig-as-hadoop-connector-part-two-hbase-jruby-and-sinatra/ > Thanks Russell. Should we add a link in

Re: MemStore and prefix encoding

2012-08-27 Thread Tom Brown
Lars, I have been relying on the expected behavior (if I write another cell with the same {key, family, qualifier, version} it won't return the previous one) so your answer was confusing to me. I did more research and I found that the HBase guide specifies that behavior (see section 5.8.1 of htt
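A sketch of the behavior under discussion, using the 0.94-style client API (per section 5.8.1 of the guide and Lars's follow-up, the read returns the value written last; row, family, and values are illustrative):

import java.io.IOException;

import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class SameCoordinateOverwrite {
  static void demo(HTable table) throws IOException {
    byte[] row = Bytes.toBytes("r");
    byte[] fam = Bytes.toBytes("f");
    byte[] qual = Bytes.toBytes("q");
    long ts = 1000L;

    // Two puts to the identical {row, family, qualifier, timestamp} coordinate.
    Put first = new Put(row);
    first.add(fam, qual, ts, Bytes.toBytes("old"));
    table.put(first);

    Put second = new Put(row);
    second.add(fam, qual, ts, Bytes.toBytes("new"));
    table.put(second);

    // Reading that coordinate back returns the value written last ("new").
    Get get = new Get(row);
    get.setTimeStamp(ts);
    System.out.println(Bytes.toString(table.get(get).getValue(fam, qual)));
  }
}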

Re: client cache for all region server information?

2012-08-27 Thread Harsh J
Not necessarily consecutive, unless the request itself is so. It only returns 500 rows that match the user's request. The user's request for a specific row range and filters is usually embedded into the Scan object sent to the RS. Whatever is accumulated as the result of the Scan operation (server-si
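The batching Harsh describes is the client-side scanner caching on the Scan; a minimal sketch (the 500 mirrors the figure used in this thread, and only rows matching the Scan's range and filters are shipped):

import java.io.IOException;

import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class ScannerCachingExample {
  static void scanWithCaching(HTable table) throws IOException {
    Scan scan = new Scan(Bytes.toBytes("row-100"));  // start row; only matching rows come back
    scan.setCaching(500);  // up to 500 matching rows are shipped to the client per RPC
    ResultScanner scanner = table.getScanner(scan);
    try {
      for (Result r : scanner) {
        // process r; further rows are fetched lazily, one batch of <= 500 at a time
      }
    } finally {
      scanner.close();
    }
  }
}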

Re: client cache for all region server information?

2012-08-27 Thread Lin Ma
Hi Harsh, I read through the document you referred to, and I am confused by the comment below. The major confusion is: does it mean HBase will transfer 500 consecutive rows to the client (supposing the client mapper wants the row with row-key 100, HBase will return rows with keys 100 to 600 at one time to the client, similar

fast way to do random getRowOrAfter reads

2012-08-27 Thread Ferdy Galema
I want to do a lot of random reads, but I need to get the first row after the requested key. I know I can make a scanner every time (with a specified startrow) and close it after a single result is fetched, but this seems like a lot of overhead. Something like HTable's getRowOrBefore method, but then
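A sketch of the scanner-per-read workaround Ferdy describes (open a scanner at the requested key, take the first result, close it); whether this overhead is acceptable is exactly the open question:

import java.io.IOException;

import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class GetRowOrAfter {
  // Return the first row whose key is >= startKey, or null if none exists.
  static Result rowOrAfter(HTable table, byte[] startKey) throws IOException {
    Scan scan = new Scan(startKey);
    scan.setCaching(1);  // only one row is wanted
    ResultScanner scanner = table.getScanner(scan);
    try {
      return scanner.next();
    } finally {
      scanner.close();
    }
  }
}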

Re: Hbase Bulk Load Java Sample Code

2012-08-27 Thread Ioakim Perros
On 08/27/2012 04:18 PM, o brbrs wrote: Hi, I'm new at HBase and I want to make a bulk load from HDFS to HBase with Java. Is there any sample code which includes the importtsv and completebulkload libraries in Java? Thanks. Hi, Here is a sample configuration of a bulk loading job consisting only of
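A generic sketch of that kind of job against the 0.94-era APIs: a mapper emitting Puts, HFileOutputFormat.configureIncrementalLoad wiring up the reduce side, and LoadIncrementalHFiles performing the completebulkload step. The class names, column family, and argument layout are placeholders:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BulkLoadDriver {

  // Placeholder mapper: turn one TSV line into a Put (column layout is illustrative).
  static class TsvToPutMapper extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
    @Override
    protected void map(LongWritable offset, Text line, Context context)
        throws IOException, InterruptedException {
      String[] fields = line.toString().split("\t");
      byte[] row = Bytes.toBytes(fields[0]);
      Put put = new Put(row);
      put.add(Bytes.toBytes("cf"), Bytes.toBytes("c1"), Bytes.toBytes(fields[1]));
      context.write(new ImmutableBytesWritable(row), put);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "bulk-load");
    job.setJarByClass(BulkLoadDriver.class);
    job.setMapperClass(TsvToPutMapper.class);
    job.setMapOutputKeyClass(ImmutableBytesWritable.class);
    job.setMapOutputValueClass(Put.class);

    FileInputFormat.addInputPath(job, new Path(args[0]));  // TSV input on HDFS
    Path hfileDir = new Path(args[1]);                     // where the HFiles get written
    FileOutputFormat.setOutputPath(job, hfileDir);

    // Wires up the reducer, total-order partitioner and HFileOutputFormat
    // according to the table's current region boundaries.
    HTable table = new HTable(conf, args[2]);
    HFileOutputFormat.configureIncrementalLoad(job, table);

    if (job.waitForCompletion(true)) {
      // The completebulkload step: move the generated HFiles into the regions.
      new LoadIncrementalHFiles(conf).doBulkLoad(hfileDir, table);
    }
  }
}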

Re: PIG with HBase Coprocessors

2012-08-27 Thread Ted Yu
Allow me to refer to previous discussion: http://mail-archives.apache.org/mod_mbox/hbase-user/201203.mbox/%3CCABsY1jQ8+OiLh7SYkXZ8iO=nosy8khz7iys+6w4u6sxcpj5...@mail.gmail.com%3E If the above doesn't answer your question, please give us more details about the versions of HBase and PIG you're using

PIG with HBase Coprocessors

2012-08-27 Thread 某因幡
Hi, I've created a co-processor which inserts more columns into the result by overriding preGet and postScannerNext. It is registered as a system co-processor and works with the hbase shell, for both get and scan. When I try to access those columns in PIG, it simply returns null for the column. Is the