Re: Client receives SocketTimeoutException (CallerDisconnected on RS)

2012-08-28 Thread N Keywal
> > > Totally random (even on keys that do not exist). It's worth checking whether that matches your real use cases. I expect that reads by row key hit existing rows most of the time (as in a traditional DB relationship, or UI- or workflow-driven lookups), even if I'm sure it's possible to have something t

Re: client cache for all region server information?

2012-08-28 Thread Lin Ma
Thanks for the detailed reply, Harsh. Some further comments / thoughts: 1. For the Scan function used in a mapper/reducer, supposing we configure a caching size of 500, I am not sure whether the 500 items returned in one batch call must come from one region server, or whether they could come from multiple region servers
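Part of what drives this question is how the client maps row keys to regions. A minimal pure-Java sketch of that lookup, assuming a sorted cache of region start keys (the class and region names here are hypothetical stand-ins for the real client's cached `HRegionLocation` entries learned from META):

```java
import java.util.TreeMap;

public class RegionLocator {
    // Simplified stand-in for the client's cached region map:
    // region start key -> region name.
    private final TreeMap<String, String> regionsByStartKey = new TreeMap<>();

    public void cacheRegion(String startKey, String regionName) {
        regionsByStartKey.put(startKey, regionName);
    }

    // A row belongs to the region with the greatest start key <= row key.
    public String locate(String rowKey) {
        return regionsByStartKey.floorEntry(rowKey).getValue();
    }

    public static void main(String[] args) {
        RegionLocator locator = new RegionLocator();
        locator.cacheRegion("", "region-1");   // first region has an empty start key
        locator.cacheRegion("m", "region-2");
        locator.cacheRegion("t", "region-3");
        System.out.println(locator.locate("apple"));  // region-1
        System.out.println(locator.locate("mango"));  // region-2
        System.out.println(locator.locate("zebra"));  // region-3
    }
}
```

Because each row maps to exactly one region, a scanner iterates regions one at a time, which is why a single cached batch is served by a single region.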

Re: setTimeRange and setMaxVersions seem to be inefficient

2012-08-28 Thread Jerry Lam
Hi Lars: Thanks for the reply. I need to understand whether I misunderstood the perceived inefficiency, because it seems you don't see it quite the same way. Let's say, as an example, we have 1 row with 2 columns (col-1 and col-2) in a table, and each column has 1000 versions. Using the following code (the cod

Re: Disk space usage of HFilev1 vs HFilev2

2012-08-28 Thread Stack
On Mon, Aug 27, 2012 at 8:30 PM, anil gupta wrote: > Hi All, > > Here are the steps I followed to load the table with HFilev1 format: > 1. Set the property hfile.format.version to 1. > 2. Updated the conf across the cluster. > 3. Restarted the cluster. > 4. Ran the bulk loader. > > Table has 34 mi

Re: Hadoop or HBase

2012-08-28 Thread Marcos Ortiz
Regards to all the list. Well, you should ask the Tumblr folks: they use a combination of MySQL and HBase for their blogging platform. They talked about this topic at the last HBaseCon. Here is the link: http://www.hbasecon.com/sessions/growing-your-inbox-hbase-at-tumblr/ Blake Mathen

bulk loading problem

2012-08-28 Thread Oleg Ruchovets
Hi, I am in the process of writing my first bulk loading job. I use Cloudera CDH3U3 with HBase 0.90.4. Executing the job, I see HFiles that were created after the job finished, but there were no entries in HBase: hbase shell >> count 'uu_bulk' returns 0. Here is my job configuration: Configuration

Re: MemStore and prefix encoding

2012-08-28 Thread Joe Pallas
On Aug 25, 2012, at 2:57 PM, lars hofhansl wrote: > Each column family is its own store. All stores are flushed together, so having > many adds overhead (especially if a few tend to hold a lot of data, but the > others don't, leading to very many small store files that need to be > compacted). I

Re: bulk loading problem

2012-08-28 Thread Igal Shilman
Hi, You need to complete the bulk load. Check out http://hbase.apache.org/book/arch.bulk.load.html 9.8.2 Igal. On Tue, Aug 28, 2012 at 7:29 PM, Oleg Ruchovets wrote: > Hi , >I am on process to write my first bulk loading job. I use Cloudera > CDH3U3 with hbase 0.90.4 > > Executing a job I se

Re: MemStore and prefix encoding

2012-08-28 Thread Stack
On Tue, Aug 28, 2012 at 9:59 AM, Joe Pallas wrote: > > On Aug 25, 2012, at 2:57 PM, lars hofhansl wrote: > >> Each column family is its own store. All stores are flushed together, so >> having many adds overhead (especially if a few tend to hold a lot of data, but >> the others don't, leading to ve

Re: setTimeRange and setMaxVersions seem to be inefficient

2012-08-28 Thread lars hofhansl
What I was saying was: It depends. :) First off, how do you get to 1000 versions? In 0.94++ older versions are pruned upon flush, so you need 333 flushes (assuming 3 versions on the CF) to get 1000 versions. By that time some compactions will have happened, and you're back to close to 3 versions
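The flush arithmetic above can be checked with a quick back-of-the-envelope computation, assuming each flush persists at most the configured number of versions for the column and that versions accumulate across store files until a compaction runs:

```java
public class VersionPruningMath {
    public static void main(String[] args) {
        int maxVersionsPerFlush = 3;  // VERSIONS=3 on the column family
        int flushes = 333;
        // Each flush can carry at most maxVersions surviving cells for the
        // column, so versions pile up across store files between compactions.
        int versionsOnDisk = flushes * maxVersionsPerFlush;
        System.out.println(versionsOnDisk);  // 999, i.e. roughly 1000
    }
}
```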

Re: Disk space usage of HFilev1 vs HFilev2

2012-08-28 Thread Matt Corgan
Could it be the addition of the memstoreTS? I forget whether that is in v1 as well. Matt On Tue, Aug 28, 2012 at 7:37 AM, Stack wrote: > On Mon, Aug 27, 2012 at 8:30 PM, anil gupta wrote: > > Hi All, > > > > Here are the steps I followed to load the table with HFilev1 format: > > 1. Set the proper

Re: Disk space usage of HFilev1 vs HFilev2

2012-08-28 Thread lars hofhansl
Are we terribly concerned about 3.5% of extra disk usage? HFileV2 was designed to be more main-memory efficient, which is in much shorter supply than disk space (bloom filters and index blocks are interspersed with data blocks and loaded only when needed, etc.). The stored MemstoreTS was introduced in

Re: Disk space usage of HFilev1 vs HFilev2

2012-08-28 Thread Stack
On Tue, Aug 28, 2012 at 11:42 AM, lars hofhansl wrote: > Are we terribly concerned about 3.5% of extra disk usage? > HFileV2 was designed to be more main memory efficient, which is in much > shorter supply than disk space (bloom filters and index blocks are > interspersed with data blocks and lo

Re: Disk space usage of HFilev1 vs HFilev2

2012-08-28 Thread lars hofhansl
I think the memstoreTS is stored with each KV (until it can be proven to be no longer needed because there are no older open scanners, in which case it is not written during the next compaction and assumed to be 0). "Mild passing interest" :)  Yep. From: Stack To: user@hbase.apac

Re: Thrift2 interface

2012-08-28 Thread Joe Pallas
Thanks for the info, Karthik (and sorry that I didn’t see it for so long, it got auto-filed). I think the reasoning behind the native client approach makes sense. I don’t know how much of the extra hop overhead is network and how much is serialization/deserialization, so for now I have been ho

Re: MemStore and prefix encoding

2012-08-28 Thread Enis Söztutar
I would still caution relying on the sorting order between values of the same cf, qualifier and timestamp. If for example, there is a Delete, it will eclipse subsequent Puts given the same timestamp, even though Put happened after Delete. Enis On Mon, Aug 27, 2012 at 9:20 AM, Tom Brown wrote: >

Re: bulk loading problem

2012-08-28 Thread Oleg Ruchovets
Hi Igal, thank you for the quick response. Can I execute this step programmatically? From the link you sent: 9.8.5. Advanced Usage Although the importtsv tool is useful in many cases, advanced users may want to generate data programmatically, or import data from other formats. To get started

Re: setTimeRange and setMaxVersions seem to be inefficient

2012-08-28 Thread Jerry Lam
Hi Lars: I see. Please refer to the inline comment below. Best Regards, Jerry On Tue, Aug 28, 2012 at 2:21 PM, lars hofhansl wrote: > What I was saying was: It depends. :) > > First off, how do you get to 1000 versions? In 0.94++ older version are > pruned upon flush, so you need 333 flushes

Re: bulk loading problem

2012-08-28 Thread Igal Shilman
As suggested by the book, take a look at the org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles class. This tool expects two arguments: (1) the path to the generated HFiles (in your case it's outputPath) and (2) the target table. To use it programmatically, you can either invoke it via the ToolRunner
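The ToolRunner route described above might look roughly like the sketch below. This is not runnable without the HBase and Hadoop client jars and a live cluster, and the HFile output path is a hypothetical placeholder; only the `uu_bulk` table name comes from the thread:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
import org.apache.hadoop.util.ToolRunner;

public class CompleteBulkLoad {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Argument 1: the HFile output directory of the bulk-load job
        //             (placeholder path, substitute your job's outputPath).
        // Argument 2: the target table name.
        int exitCode = ToolRunner.run(conf,
                new LoadIncrementalHFiles(conf),
                new String[] { "/user/oleg/uu_bulk_output", "uu_bulk" });
        System.exit(exitCode);
    }
}
```

Until this step runs, the HFiles sit in the job output directory and the table stays empty, which matches the `count` returning 0 in the original report.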

sorting by value

2012-08-28 Thread Pamecha, Abhishek
Hi, I probably know the usual answer, but are there any tricks to do some sort of sort-by-value in HBase? The only option I know of is to somehow embed the value in the key part. The value is not a timestamp but a normal number. I want to find, say, the top 10 from a range of columns. The range could be
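The "embed the value in the key" trick can be sketched in pure Java, assuming values are non-negative longs: invert the value and render it fixed-width so that lexicographic ordering of the derived keys equals descending numeric order, making "top 10" a plain leading scan. The class and row names here are hypothetical:

```java
import java.util.TreeMap;

public class TopNByValue {
    // Derive an index key whose lexicographic order is the descending
    // numeric order of the value: invert the value, then fix the width
    // with zero-padded hex (hex digits sort in numeric order).
    static String indexKey(long value, String rowKey) {
        long inverted = Long.MAX_VALUE - value;
        return String.format("%016x-%s", inverted, rowKey);
    }

    public static void main(String[] args) {
        // Stand-in for a secondary index table keyed by indexKey.
        TreeMap<String, Long> index = new TreeMap<>();
        index.put(indexKey(42, "rowA"), 42L);
        index.put(indexKey(7, "rowB"), 7L);
        index.put(indexKey(9001, "rowC"), 9001L);

        // "Top 2" is now just the first 2 entries of a plain scan.
        index.values().stream().limit(2).forEach(System.out::println);
        // prints 9001 then 42
    }
}
```

In HBase this usually means maintaining a second (index) table with these derived row keys, accepting that the index must be updated whenever the value changes.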

Re: Timeseries data

2012-08-28 Thread Marcos Ortiz
Study OpenTSDB at StumbleUpon, described by Benoit "tsuna" Sigoure (ts...@stumbleupon.com) in the HBaseCon talk called "Lessons Learned from OpenTSDB". His team has done a great job working with time-series data, and he gave a lot of great advice on working with this kind of data in HBase:

Re: Timeseries data

2012-08-28 Thread Mohit Anchlia
How does it deal with multiple writes in the same milliseconds for the same rowkey/column? I can't see that info. On Tue, Aug 28, 2012 at 5:33 PM, Marcos Ortiz wrote: > Study the OpenTSDB at StumbleUpon described by Benoit "tsuna" Sigoure ( > ts...@stumbleupon.com) in the > HBaseCon talk called

Re: PIG with HBase Coprocessors

2012-08-28 Thread 某因幡
Ping? 2012/8/28 某因幡 : > Thanks for your quick reply. > The co-processor looks like: > public void postGet(final ObserverContext e, > final Get get, final List results) > { > if table is X > get some columns from table Y > add these columns to results > } > And s

Re: Timeseries data

2012-08-28 Thread Amandeep Khurana
Can you give an example of what you are trying to do, how you would use both writes coming in at the same instant for the same cell, and why you say the nanosecond approach is tricky? On Aug 28, 2012, at 5:54 PM, Mohit Anchlia wrote: > How does it deal with multiple writes in the s
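One common workaround for same-millisecond collisions (not necessarily what OpenTSDB itself does) is to pack a per-process sequence counter into the low bits of the timestamp, so concurrent writes to the same cell still get distinct, ordered versions. A minimal sketch with hypothetical names:

```java
import java.util.concurrent.atomic.AtomicLong;

public class UniqueTimestamps {
    private static final int SEQ_BITS = 10;  // up to 1024 events per millisecond
    private static final AtomicLong seq = new AtomicLong();

    // Pack (millis, per-process counter) into one long so that two writes
    // in the same millisecond still get distinct, ordered pseudo-timestamps.
    static long next(long millis) {
        long s = seq.getAndIncrement() & ((1L << SEQ_BITS) - 1);
        return (millis << SEQ_BITS) | s;
    }

    public static void main(String[] args) {
        long t = 1_346_200_000_000L;  // some fixed millisecond
        long a = next(t);
        long b = next(t);
        System.out.println(a != b);  // true: distinct versions
        System.out.println(b > a);   // true: arrival order preserved
    }
}
```

The trade-off is that the stored "timestamp" is no longer wall-clock millis; readers must shift right by SEQ_BITS to recover the real time.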

Re: PIG with HBase Coprocessors

2012-08-28 Thread yuzhihong
Did you find any clues in the region server logs? You can pastebin a snippet and show the link here. On Aug 28, 2012, at 6:54 PM, 某因幡 wrote: > Ping? > > 2012/8/28 某因幡 : >> Thanks for your quick reply. >> The co-processor looks like: >> public void postGet(final ObserverContext e, >>

RE: Inconsistent scan performance with caching set to 1

2012-08-28 Thread Ramkrishna.S.Vasudevan
Hi Jay, I am not quite clear on exactly what the problem is, because I am not able to find much difference. How are you measuring the time taken? When there are multiple scanners going in parallel, there is a chance for the client to be a bottleneck, as it may not be able to handle so many req

Re: setTimeRange and setMaxVersions seem to be inefficient

2012-08-28 Thread lars hofhansl
Hi Jerry, my answer will be the same again: some folks will want the max versions set by the client to be applied before filters, and some folks will want it to restrict the end result. It's not possible to have it both ways. Your filter needs to do the right thing. There's a lot of discussion around th

the difference between hbase-0.94.1 and hbase-0.94.1-security

2012-08-28 Thread Everist
Hi, what is the difference between hbase-0.94.1 and hbase-0.94.1-security? Regards

Re: the difference between hbase-0.94.1 and hbase-0.94.1-security

2012-08-28 Thread shixing
0.94.1-security includes the Security and AccessController features, if you configure them. With these, the HBase administrator can manage table permissions (read/write/admin), much like MySQL. On Wed, Aug 29, 2012 at 11:35 AM, Everist wrote: > Hi > > > > The difference between hbase-0.94.1 and hbase-