Re: [ANNOUNCE] Secondary Index in HBase - from Huawei

2013-08-12 Thread Jesse Yates
Cool stuff, guys! Looking forward to reading through the code. On Monday, August 12, 2013, Nicolas Liochon wrote: > Well done, Rajesh! > > > On Tue, Aug 13, 2013 at 8:44 AM, Anoop John > > > wrote: > > > Good to see this Rajesh. Thanks a lot to Huawei HBase team! > > > > -Anoop- > > > > On Tue,

Re: Possiblities of importing into hbase table

2013-08-12 Thread Shengjie Min
Have you ruled out Sqoop as well? :) Shengjie On 13 August 2013 14:46, manish dunani wrote: > I have generally seen that we manually "put" data into HBase, and with the > HBase Java client we can do all the same things like "put", "get", "scan". > My question is: how can I import data into an HBase ta

Re: [ANNOUNCE] Secondary Index in HBase - from Huawei

2013-08-12 Thread Nicolas Liochon
Well done, Rajesh! On Tue, Aug 13, 2013 at 8:44 AM, Anoop John wrote: > Good to see this Rajesh. Thanks a lot to Huawei HBase team! > > -Anoop- > > On Tue, Aug 13, 2013 at 11:49 AM, rajeshbabu chintaguntla < > rajeshbabu.chintagun...@huawei.com> wrote: > > > Hi, > > > > We have been working on

Possibilities of importing into hbase table

2013-08-12 Thread manish dunani
I have generally seen that we manually "put" data into HBase, and with the HBase Java client we can do all the same things like "put", "get", "scan". My question is: how can I import data into an HBase table using Java? If so, can you show me how to do this? (Note: not using HBase and map reduc
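A minimal sketch of the plain-Java-client route, for reference: it assumes a hypothetical CSV of "rowkey,value" lines and an existing table "mytable" with a family "cf"; all names and paths are illustrative only.

    import java.io.BufferedReader;
    import java.io.FileReader;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class CsvImporter {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "mytable");
        BufferedReader in = new BufferedReader(new FileReader("import.csv"));
        String line;
        while ((line = in.readLine()) != null) {
          String[] parts = line.split(",", 2);   // rowkey,value
          Put put = new Put(Bytes.toBytes(parts[0]));
          put.add(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes(parts[1]));
          table.put(put);                        // one Put per input line
        }
        in.close();
        table.close();
      }
    }

For larger loads, the bundled ImportTsv/bulk-load tooling (or Sqoop, as suggested in the reply above) will be much faster than issuing single Puts.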

Re: [ANNOUNCE] Secondary Index in HBase - from Huawei

2013-08-12 Thread Anoop John
Good to see this Rajesh. Thanks a lot to Huawei HBase team! -Anoop- On Tue, Aug 13, 2013 at 11:49 AM, rajeshbabu chintaguntla < rajeshbabu.chintagun...@huawei.com> wrote: > Hi, > > We have been working on implementing secondary index in HBase, and had > shared an overview of our design in the 20

[ANNOUNCE] Secondary Index in HBase - from Huawei

2013-08-12 Thread rajeshbabu chintaguntla
Hi, We have been working on implementing secondary index in HBase, and had shared an overview of our design in the 2012 Hadoop Technical Conference at Beijing(http://bit.ly/hbtc12-hindex). We are pleased to open source it today. The project is available on github. https://github.com/Huawei-H

Re: Bulkloading impacts to block locality (0.94.6)

2013-08-12 Thread Elliott Clark
On Mon, Aug 12, 2013 at 9:58 PM, lars hofhansl wrote: > For example we could add an RPC to the regionserver and have the regionserver > who would own the region copy the appropriate part of the file (then the data > would be local). Or even simpler, instead of actually copying the files we > co

Re: Bulkloading impacts to block locality (0.94.6)

2013-08-12 Thread lars hofhansl
Now that I wrote this, I think we should improve that. For example we could add an RPC to the regionserver and have the regionserver who would own the region copy the appropriate part of the file (then the data would be local). Or even simpler, instead of actually copying the files we could just

Re: PrefixFilter

2013-08-12 Thread lars hofhansl
What Anil said. Filters are executed per Store (i.e. per region per column family), so each filter in each store would need to seek to the start row. It is more efficient to let the scanner do that ahead of time by setting the startrow to the prefix. We should document that if we haven't. -- Lars
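A minimal sketch of what Lars suggests, assuming an open HTable named table and a hypothetical prefix "user123|" (standard 0.94-era client and filter classes):

    byte[] prefix = Bytes.toBytes("user123|");
    Scan scan = new Scan();
    scan.setStartRow(prefix);                  // the scanner seeks straight to the prefix
    scan.setFilter(new PrefixFilter(prefix));  // the filter then only has to end the scan early
    ResultScanner results = table.getScanner(scan);

With the start row set, each store no longer has to read from the beginning of the region before the filter starts passing rows.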

Re: Bulkloading impacts to block locality (0.94.6)

2013-08-12 Thread lars hofhansl
A write in HDFS (by default) places one copy on the local datanode, a second on a node in a different rack (when applicable), and a third on another node in the same rack as the second. HBase gets data locality by being co-located with the data nodes, so after a compaction all blocks of the compacted HFile(s

Re: PrefixFilter

2013-08-12 Thread anil gupta
Hi Sudarshan, While using the prefix filter, you also have to set the startRow and stopRow for the behavior that you are expecting. This kind of discussion has come up previously on the mailing list, yet no changes have been made to the behavior of PrefixFilter. Setting the startRow(Prefix3) will mak
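A sketch of bounding the scan on both ends: the helper below is hypothetical (not part of the HBase API) and computes the exclusive stop row for a prefix by incrementing its last non-0xFF byte; the usual client imports plus org.apache.hadoop.hbase.HConstants are assumed.

    static byte[] stopRowForPrefix(byte[] prefix) {
      byte[] stop = java.util.Arrays.copyOf(prefix, prefix.length);
      for (int i = stop.length - 1; i >= 0; i--) {
        if (stop[i] != (byte) 0xFF) {
          stop[i]++;                                    // smallest key sorting after every prefixed key
          return java.util.Arrays.copyOf(stop, i + 1);  // drop the trailing bytes
        }
      }
      return HConstants.EMPTY_END_ROW;                  // prefix is all 0xFF: scan to end of table
    }

    Scan scan = new Scan(prefix, stopRowForPrefix(prefix));  // covers [startRow, stopRow)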

Re: PrefixFilter

2013-08-12 Thread Sudarshan Kadambi (BLOOMBERG/ 731 LEXIN)
I'm willing to be told I'm completely wrong here, but it seems like the prefix filter should be capable of using the same mechanism used in a row-key lookup or a scan with a start and stop row. If HBase were to be like a hash table with no notion of sorted-ness, I can understand a partial-key l

Re: PrefixFilter

2013-08-12 Thread Ted Yu
Adding back user@. bq. does it jump directly to Prefix3? I don't think so. Are your prefixes of fixed length? If so, take a look at FuzzyRowFilter. Cheers On Mon, Aug 12, 2013 at 11:33 AM, Sudarshan Kadambi (BLOOMBERG/ 731 LEXIN) < skada...@bloomberg.net> wrote: > Ted: Thanks for looking that
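A sketch of the FuzzyRowFilter suggestion for fixed-length keys, using a hypothetical 4-byte row key whose first two bytes are the known prefix (uses org.apache.hadoop.hbase.util.Pair and java.util.Collections):

    byte[] pattern = new byte[] { 'A', 'B', 0, 0 };  // known bytes plus placeholders
    byte[] mask    = new byte[] {  0,   0, 1, 1 };   // 0 = byte must match, 1 = byte may be anything
    Scan scan = new Scan();
    scan.setFilter(new FuzzyRowFilter(
        Collections.singletonList(new Pair<byte[], byte[]>(pattern, mask))));

Unlike PrefixFilter alone, FuzzyRowFilter can hand the scanner a hint for the next possible matching key, so it can skip ahead instead of reading every row.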

Re: Client Get vs Coprocessor scan performance

2013-08-12 Thread Kiru Pakkirisamy
James, We actually planned to use Phoenix for this project. But we did not have much time to design on top of Phoenix.  Also, this app is more like a 'search' app and I wanted it to be doing just "key lookups". There is no write and everything is in block cache. Still, yes, let me take a look at

Re: Hbase update use case

2013-08-12 Thread Asaf Mesika
If you can mark a row by adding a column qualifier whose existence serves as your flag, and whose name sorts lexicographically first, then it won't be as slow as the filter approach you describe below. On Monday, August 12, 2013, ccalugaru wrote: > Hi all, > I have the following hbase use case: >
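A sketch of that flag-column idea, with a hypothetical family "cf" and a qualifier named so it sorts before the data columns (assumes an open HTable named table and the usual imports):

    // Mark a row: the mere existence of the qualifier is the flag.
    Put put = new Put(rowKey);
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("!flag"), HConstants.EMPTY_BYTE_ARRAY);
    table.put(put);

    // Retrieve only flagged rows: rows without the qualifier produce no Result.
    Scan scan = new Scan();
    scan.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("!flag"));
    ResultScanner flagged = table.getScanner(scan);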

Re: PrefixFilter

2013-08-12 Thread Ted Yu
In filterAllRemaining() method: public boolean filterAllRemaining() { return passedPrefix; } In filterRowKey(): // if they are equal, return false => pass row // else return true, filter row // if we are passed the prefix, set flag int cmp = Bytes.compareTo(buffer, o

PrefixFilter

2013-08-12 Thread Sudarshan Kadambi (BLOOMBERG/ 731 LEXIN)
Anyone know if the prefix filter[1] does a full table scan? 1 - http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/PrefixFilter.html

Is repeatedly seeing ZK EndOfStreamException normal?

2013-08-12 Thread Dongcai Shen / Xiaoli Shen
Hi, there. I ran an HBase workload and the job completed normally. The HBase cluster uses an external ZK service. However, I saw the ZK exception below showing up repeatedly. Is this a normal phenomenon? Many thanks. EndOfStreamException: Unable to read additional data from client sessionid 0xXXX, likely clien

Re: Performance Are Affected? - Table and Family

2013-08-12 Thread Stas Maksimov
Hi Bing, Generally it is not advised to have more than 2-3 column families, unless you are using them absolutely separately from each other. Please see here: http://hbase.apache.org/book/number.of.cfs.html Thanks, Stas On 12 August 2013 18:00, Bing Li wrote: > Dear all, > > I have one addition

Re: Bulkloading impacts to block locality (0.94.6)

2013-08-12 Thread Scott Kuehn
Hi JM, After forcing major compactions on all tables, the locality index crept up to ~100%. This means the table I suspected to be problematic was actually fine, and some of the legacy tables on the cluster had a high percentage of non-local blocks. A per-table version of hdfsBlocksLocalityIndex
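For reference, a minimal way to trigger the major compactions mentioned above from the Java client (table name hypothetical; the HBase shell's major_compact command does the same):

    HBaseAdmin admin = new HBaseAdmin(conf);  // conf: an existing HBaseConfiguration
    admin.majorCompact("legacy_table");       // asynchronous request; regions rewrite their HFiles locally, restoring locality
    admin.close();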

Performance Are Affected? - Table and Family

2013-08-12 Thread Bing Li
Dear all, I have one additional question about tables and families. Is a table with fewer families faster than one with more families, given the same total amount of data? In other words, is it a higher-performance design to put fewer families into a table? Thanks so much! Bes

Re: Client Get vs Coprocessor scan performance

2013-08-12 Thread James Taylor
Hey Kiru, Another option for you may be to use Phoenix ( https://github.com/forcedotcom/phoenix). In particular, our skip scan may be what you're looking for: http://phoenix-hbase.blogspot.com/2013/05/demystifying-skip-scan-in-phoenix.html. Under-the-covers, the skip scan is doing a series of paral

Re: Table and Family

2013-08-12 Thread Stas Maksimov
Hi there, On your second point, I don't think a column family can ever be an optional parameter, so I'm not sure this understanding is correct. Regards, Stas. On 12 August 2013 17:22, Bing Li wrote: > Hi, all, > > My understandings about HBase table and its family are as follows. > > 1) Each tab

Table and Family

2013-08-12 Thread Bing Li
Hi, all, My understandings about HBase table and its family are as follows. 1) Each table can consist of multiple families; 2) When retrieving with SingleColumnValueFilter, if the family is specified, other families contained in the same table are not affected. Are these claims right? But I got
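A sketch relating to point 2, with hypothetical families "cf1"/"cf2" and the usual client and filter imports: restricting the scan to the family the filter reads is what keeps the other families from being touched at all.

    Scan scan = new Scan();
    scan.addFamily(Bytes.toBytes("cf1"));  // only cf1's store is read; cf2 is left alone
    scan.setFilter(new SingleColumnValueFilter(
        Bytes.toBytes("cf1"), Bytes.toBytes("q"),
        CompareFilter.CompareOp.EQUAL, Bytes.toBytes("v")));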

Re: HBase Test issue

2013-08-12 Thread Jean-Marc Spaggiari
Hi, You will most probably need to give more detail if you are expecting some help from the community. Like: "Hi, I tried HBase version XXX. I did 'this' 'that' and 'that' and doing it I faced the issue below. Can you please let me know where I should start to look? Thanks a lot for your help.

Re: HBase/HDFS Data Nodes Management

2013-08-12 Thread Jean-Marc Spaggiari
Hi Oussama. 1) That's the whole goal of Hadoop and HBase ;) You might want to read Hadoop: The Definitive Guide and HBase: The Definitive Guide... 2) HBase is built on Hadoop and takes advantage of its replication process. 3) There is also a way to back up the data manually or to configure replicatio

HBase/HDFS Data Nodes Management

2013-08-12 Thread Oussama Jilal
Hello everyone, I have some questions about how HBase and HDFS manage the data nodes. Q1- Can I remove a node from a cluster without losing data? Q2- If yes (Q1), does that depend on the replication of data between nodes, or do I not need to worry about it e