Re: Null rowkey with empty get operation

2012-05-29 Thread Ben Kim
Maybe I showed you a bad example. This makes more sense when it comes to using a List<Get>. For instance: List<Get> gets = new ArrayList<Get>(); for (String rowkey : rowkeys) { Get get = new Get(Bytes.toBytes(rowkey)); get.addFamily(family); Filter filter = new QualifierFilter(CompareOp.NOT_EQUAL, new BinaryCom
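The snippet above is cut off mid-expression; a hedged reconstruction of the multi-get pattern being discussed might look like the following (`table`, `rowkeys`, `family`, and the qualifier name are placeholders, and `BinaryComparator` is only a plausible completion of the truncated "new BinaryCom..."):

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.filter.BinaryComparator;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.QualifierFilter;
import org.apache.hadoop.hbase.util.Bytes;

// Build one Get per rowkey and submit them as a single batch.
List<Get> gets = new ArrayList<Get>();
for (String rowkey : rowkeys) {
    Get get = new Get(Bytes.toBytes(rowkey));
    get.addFamily(family);
    Filter filter = new QualifierFilter(CompareOp.NOT_EQUAL,
            new BinaryComparator(Bytes.toBytes("someQualifier")));
    get.setFilter(filter);
    gets.add(get);
}
Result[] results = table.get(gets); // one batched call instead of N single Gets
```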

Re: Null rowkey with empty get operation

2012-05-29 Thread lars hofhansl
Hi Ben, check with Result.isEmpty(). If that returns true the Result object has no results. -- Lars From: Ben Kim To: user@hbase.apache.org Sent: Tuesday, May 29, 2012 12:34 AM Subject: Re: Null rowkey with empty get operation Maybe I showed you a bad exa
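A minimal sketch of Lars's suggestion (table and row name are placeholders): a Get on a missing row still returns a non-null Result, so the check is on isEmpty() rather than on null.

```java
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

Result result = table.get(new Get(Bytes.toBytes("some-row")));
if (result.isEmpty()) {
    // the Get matched nothing; getRow() may return null here
} else {
    byte[] row = result.getRow(); // safe: at least one KeyValue is present
}
```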

RE: Null rowkey with empty get operation

2012-05-29 Thread Anoop Sam John
Hi Now it is a bit different case of consideration. As of now you need to deal with your input rowkeys and rowkeys that you get from the Results. empty rows = input rowkeys - rowkeys from result As Lars said you might need to check with isEmpty() on every Result or null check on result.ge

Re: Hbase master doesn't start

2012-05-29 Thread Mohammad Tariq
disable your IPv6 settings..problem with IPv6 is that using 0.0.0.0 for various networking-related Hadoop configuration options will result in Hadoop binding to the IPv6 addresses..add the following lines at the end of your /etc/sysctl.conf - net.ipv6.conf.all.disable_ipv6 = 1 net.ipv6.conf.defaul
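The quoted snippet is truncated; the commonly recommended sysctl lines for disabling IPv6 on Linux (an assumption based on the cut-off text, so verify against your distribution) are:

```
# /etc/sysctl.conf
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
```

Reload with `sudo sysctl -p`, then check `cat /proc/sys/net/ipv6/conf/all/disable_ipv6` (it should print 1).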

Re: Null rowkey with empty get operation

2012-05-29 Thread Ben Kim
It looks like Result is returning the rowkey of the first keyvalue. That's why I was getting null when getting a Result's rowkey. in Result.getRow() method it is returning this.kvs[0].getRow() this probably is to save the cost of a Get RPC call, but i still think that Result should have its own r

Re: Null rowkey with empty get operation

2012-05-29 Thread N Keywal
There is a one to one mapping between the result and the get arrays; so the result for rowkeys[i] is in results[i]. That's not what you want? On Tue, May 29, 2012 at 9:34 AM, Ben Kim wrote: > Maybe I showed you a bad example. This makes more sense when it comes to > using List > For instance, > >
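A sketch of the index-based lookup N Keywal describes (`table`, `rowkeys`, and `family` are assumed to exist):

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

List<Get> gets = new ArrayList<Get>();
for (String rowkey : rowkeys) {
    Get get = new Get(Bytes.toBytes(rowkey));
    get.addFamily(family);
    gets.add(get);
}
Result[] results = table.get(gets);
// results[i] corresponds to gets.get(i), so the input list tells you
// which rows came back empty without consulting Result.getRow():
for (int i = 0; i < results.length; i++) {
    if (results[i].isEmpty()) {
        System.out.println("no data for " + rowkeys.get(i));
    }
}
```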

Re: HBase (BigTable) many to many with students and courses

2012-05-29 Thread Michel Segel
Depends... Try looking at a hierarchical model rather than a relational model... One thing to remember is that joins are expensive in HBase. Sent from a remote device. Please excuse any typos... Mike Segel On May 28, 2012, at 12:50 PM, Em wrote: > Hello list, > > I have some time now to tr

Re: HBase (BigTable) many to many with students and courses

2012-05-29 Thread shashwat shriparv
Check out this link may be it will help you somewhat: http://www.slideshare.net/hmisty/20090713-hbase-schema-design-case-studies On Tue, May 29, 2012 at 4:09 PM, Michel Segel wrote: > Depends... > Try looking at a hierarchical model rather than a relational model... > > One thing to remember is

RE: ColumnCountGetFilter not working with FilterList

2012-05-29 Thread Anoop Sam John
Hi Ben, Is this the same code you are testing? Which version of HBase are you testing with? I have checked with 0.94 and it works fine (with FilterList also) -Anoop- From: Ben Kim [benkimkim...@gmail.com] Sent: Tuesday, May 29, 2012 5:04 P

RE: ColumnCountGetFilter not working with FilterList

2012-05-29 Thread Anoop Sam John
The problem is when we have more KVs in that row than the limit [limit in ColumnCountGetFilter].. In this case with FilterList the problem is coming. @Ben can u file a Jira? -Anoop- From: Anoop Sam John [anoo...@huawei.com] Sent: Tuesday, May 29, 2012 5
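A minimal sketch of the two code paths under discussion (table, row, and the limit of 5 are placeholders; the standalone filter works, and the FilterList-wrapped form is the behavior reported as broken when the row holds more columns than the limit):

```java
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.filter.ColumnCountGetFilter;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.util.Bytes;

Get get = new Get(Bytes.toBytes("row1"));

// Standalone: returns at most 5 columns of the row.
get.setFilter(new ColumnCountGetFilter(5));

// Wrapped in a FilterList: the combination reported as buggy in this thread.
FilterList list = new FilterList(FilterList.Operator.MUST_PASS_ALL);
list.addFilter(new ColumnCountGetFilter(5));
get.setFilter(list);

Result result = table.get(get);
```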

Problems with scan after lot of Puts

2012-05-29 Thread Ondřej Stašek
My program writes changes to HBase table by issuing lots of Puts (autoCommit turned off, flush on end) and afterwards uses ResultScanner on whole table to read all rows and act upon them. My problem is that on several occasions scan does not return expected rows. Either scan does not start on t
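A sketch of the write pattern described, under the assumption that "autoCommit turned off, flush on end" maps to the 0.9x client's write buffer (table name and `puts` are placeholders):

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

HTable table = new HTable(HBaseConfiguration.create(), "mytable");
table.setAutoFlush(false);      // buffer Puts client-side ("autoCommit off")
for (Put put : puts) {
    table.put(put);             // accumulates in the client write buffer
}
table.flushCommits();           // push buffered edits before scanning
ResultScanner scanner = table.getScanner(new Scan());
```

flushCommits() only returns once the buffered edits have been sent to the region servers, so a scan opened afterwards by the same client should see them.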

java.lang.ClassNotFoundException: com.google.protobuf.Message

2012-05-29 Thread Amit Sela
Hi all, I just upgraded from HBase 0.90.2 to 0.94 (running on hadoop 0.20.3). It seems like the cluster is up and running. I tried running an old MR job that writes into HBase, and after Map is complete (map 100%) and before Reduce begins (reduce 0%) - I got the following Exception: 12/05/29 12

Re: Issues with Java sample for connecting to remote Hbase

2012-05-29 Thread Mohammad Tariq
change the name from "localhost" to something else in the line "10.78.32.131 honeywel-4a7632 localhost" and see if it works Regards, Mohammad Tariq On Tue, May 29, 2012 at 6:59 PM, AnandaVelMurugan Chandra Mohan wrote: > I have HBase version 0.92.1 running in standalone mode. I create

Re: java.lang.ClassNotFoundException: com.google.protobuf.Message

2012-05-29 Thread Marcos Ortiz
Are you sure that 0.94 is compatible with Hadoop 0.20.3? On 05/29/2012 09:13 AM, Amit Sela wrote: Hi all, I just upgraded from HBase 0.90.2 to 0.94 (running on hadoop 0.20.3). It seems like the cluster is up and running. I tried running an old MR job that writes into HBase, and after Map is

Re: Issues with Java sample for connecting to remote Hbase

2012-05-29 Thread AnandaVelMurugan Chandra Mohan
Thanks for the response. It still errors out. On Tue, May 29, 2012 at 7:05 PM, Mohammad Tariq wrote: > change the name from "localhost" to something else in the line > "10.78.32.131 honeywel-4a7632 localhost" and see if it works > > Regards, > Mohammad Tariq > > > On Tue, May 29, 2012

RE: ColumnCountGetFilter not working with FilterList

2012-05-29 Thread Ramkrishna.S.Vasudevan
Discussing with Anoop I think PageFilter also may not work when used with FilterList. Need to check this. Please file a JIRA for the same we can look into all possibilities. Regards Ram > -Original Message- > From: Anoop Sam John [mailto:anoo...@huawei.com] > Sent: Tuesday, May 29, 2012

Re: Issues with Java sample for connecting to remote Hbase

2012-05-29 Thread N Keywal
From http://hbase.apache.org/book/os.html: HBase expects the loopback IP address to be 127.0.0.1. Ubuntu and some other distributions, for example, will default to 127.0.1.1 and this will cause problems for you. It's worth reading the whole section ;-). You also don't need to set the master addre
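A sketch of the recommended /etc/hosts layout (the IP and hostname are taken from this thread; adjust for your machine):

```
# /etc/hosts — keep loopback on 127.0.0.1 and do not alias the machine's
# hostname to 127.0.1.1 (the Ubuntu default, which breaks HBase).
127.0.0.1      localhost
10.78.32.131   honeywel-4a7632
```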

Re: java.lang.ClassNotFoundException: com.google.protobuf.Message

2012-05-29 Thread Amit Sela
I'm not sure Hadoop 0.20.3 is compatible with HBase 0.94 but I can't find any documentation about it On Tue, May 29, 2012 at 4:40 PM, Marcos Ortiz wrote: > Are you sure that 0.94 is compatible with Hadoop 0.20.3? > > > > On 05/29/2012 09:13 AM, Amit Sela wrote: > >> Hi all, >> >> I just upgraded

Re: understanding the client code

2012-05-29 Thread N Keywal
Hi, If you're speaking about preparing the query, it's in HTable and HConnectionManager. If you're on the pure network level then, on trunk, it's now done with a third-party library called protobuf. See the code from HConnectionManager#createCallable to see how it's used. Cheers, N. On Tue, May 29, 20

Re: HBase (BigTable) many to many with students and courses

2012-05-29 Thread Em
Hi, thanks for your help. Yes, I know these slides. However I can not find an answer to how to access such schemas efficiently. In case of the given schema for students and courses as in those slides, they say that each column contains the student's id / course's id. However, when you want to buil

Re: understanding the client code

2012-05-29 Thread S Ahmed
So how does thrift and avro fit into the picture? (I believe I saw references to that somewhere, are those alternate connection libs?) I know protobuf is just generating types for various languages... On Tue, May 29, 2012 at 10:26 AM, N Keywal wrote: > Hi, > > If you're speaking about preparin

Re: java.lang.ClassNotFoundException: com.google.protobuf.Message

2012-05-29 Thread Amit Sela
could anyone recommend a compatible Hadoop version for HBase 0.94 ? Should I also upgrade zookeeper (3.3.2) ? Thanks. On Tue, May 29, 2012 at 5:10 PM, Amit Sela wrote: > I'm not sure Hadoop 0.20.3 is compatible with HBase 0.94 but I can't find > any documentation about it > > > On Tue, May 29,

Re: java.lang.ClassNotFoundException: com.google.protobuf.Message

2012-05-29 Thread Marcos Ortiz
You have to check the docs for your version, in section 2.3 (Hadoop) of the HBase book, for the compatible versions: http://hbase.apache.org/book/book.html#hadoop Regards. On 05/29/2012 10:10 AM, Amit Sela wrote: I'm not sure Hadoop 0.20.3 is compatible with HBase 0.94 but I can't find any docu

Re: java.lang.ClassNotFoundException: com.google.protobuf.Message

2012-05-29 Thread Marcos Ortiz
On 05/29/2012 10:35 AM, Amit Sela wrote: could anyone recommend a compatible Hadoop version for HBase 0.94 ? Look in the link that I sent you before: http://hbase.apache.org/book/book.html#hadoop Should I also upgrade zookeeper (3.3.2) ? Yes, you should upgrade Zookeeper to 3.4.3 Thanks.

hosts unreachables

2012-05-29 Thread Cyril Scetbon
Hi, I've installed hbase on the following configuration : 12 x (rest hbase + regionserver hbase + datanode hadoop) 2 x (zookeeper + hbase master) 1 x (zookeeper + hbase master + namenode hadoop) OS used is ubuntu lucid (10.04) The issue is that when I try to load data using rest api, some host

HTablePool and the deprecated putTable in HBase 0.90.4

2012-05-29 Thread Jeroen Hoek
Hello, In the documentation for the current HBase version HTablePool's method putTable() is deprecated, and we should use HTable.close() instead. We are currently using Cloudera's HBase distribution, version 0.90.4-cdh3u2, and are still using putTable(). The Javadoc for this release does not depr

Re: hosts unreachables

2012-05-29 Thread Stack
On Tue, May 29, 2012 at 7:25 AM, Cyril Scetbon wrote: > 07:05:01 PM       1      0.04      0.00      0.01      0.09      0.03 > 99.83 <--- last measure before host becomes reachable > 07:40:07 PM     all     14.72      0.00     17.93      0.02     13.31 > 54.02 <--- new measure after host becomes

Re: HTablePool and the deprecated putTable in HBase 0.90.4

2012-05-29 Thread Stack
On Tue, May 29, 2012 at 7:15 AM, Jeroen Hoek wrote: > Hello, > > In the documentation for the current HBase version HTablePool's method > putTable() is deprecated, and we should use HTable.close() instead. > > We are currently using Cloudera's HBase distribution, version > 0.90.4-cdh3u2, and are s

Re: understanding the client code

2012-05-29 Thread N Keywal
There are two levels: - communication between hbase client and hbase cluster: this is the code you have in hbase client package. As an end user you don't really care, but you care if you want to learn hbase internals. - communication between customer code and hbase as a whole if you don't want to us

Re: understanding the client code

2012-05-29 Thread S Ahmed
I don't really want any, I just want to learn the internals :) So why would someone not want to use the client, for data intensive tasks like mapreduce etc. where they want direct access to the files? On Tue, May 29, 2012 at 11:00 AM, N Keywal wrote: > There are two levels: > - communication be

Re: HBase (BigTable) many to many with students and courses

2012-05-29 Thread Ian Varley
Em, What you're describing is a classic relational database nested loop or hash join; the only difference is that relational databases have this feature built in, and can do it very efficiently because they typically run on a single machine, not a distributed cluster. By moving to HBase, you're

Re: understanding the client code

2012-05-29 Thread N Keywal
So it's the right place for the internals :-). The main use case for the thrift api is when you have non java client code. On Tue, May 29, 2012 at 5:07 PM, S Ahmed wrote: > I don't really want any, I just want to learn the internals :) > > So why would someone not want to use the client, for data

Re: java.lang.ClassNotFoundException: com.google.protobuf.Message

2012-05-29 Thread Stack
On Tue, May 29, 2012 at 6:13 AM, Amit Sela wrote: >> 12/05/29 12:32:02 INFO mapred.JobClient: Task Id : > attempt_201205291226_0001_r_00_0, Status : FAILED > Error: java.lang.ClassNotFoundException: com.google.protobuf.Message You have protobuf on your CLASSPATH when the job runs? St.Ack

Re: java.lang.ClassNotFoundException: com.google.protobuf.Message

2012-05-29 Thread Amit Sela
I do. I tried with protobuf-java-2.4.0a.jar that ships with HBase 0.94, and also with the older version I had - protobuf-java-2.3.0.jar. Both times I got the same error.. On Tue, May 29, 2012 at 6:28 PM, Stack wrote: > On Tue, May 29, 2012 at 6:13 AM, Amit Sela wrote: > >> 12/05/29 12:32:02 I

Re: HBase (BigTable) many to many with students and courses

2012-05-29 Thread Em
Ian, thanks for your detailed response! Let me give you feedback to each point: > 1. You could denormalize the additional information (e.g. course > name) into the students table. Then, you're simply reading the > student row, and all the info you need is there. That places an extra > burden of w

Re: HBase (BigTable) many to many with students and courses

2012-05-29 Thread N Keywal
Hi, For the multiget, if it's small enough, it will be: - parallelized on all region servers concerned. i.e. you will be as fast as the slowest region server. - there will be one query per region server (i.e. gets are grouped by region server). If there are too many gets, it will be split in smal

Re: java.lang.ClassNotFoundException: com.google.protobuf.Message

2012-05-29 Thread Stack
On Tue, May 29, 2012 at 8:39 AM, Amit Sela wrote: > I do. > > I tried with protobuf-java-2.4.0a.jar that ships with HBase 0.94, and also > with the older version I had - protobuf-java-2.3.0.jar. > > Both times I got the same error.. > I'd suggest you keep on digging down this vein. ClassNotFound
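Two hypothetical ways to dig down this vein (the jar name comes from the thread; the paths and the use of `-libjars` are assumptions, and `-libjars` only works if the job parses options via Tool/GenericOptionsParser):

```
# Put the protobuf jar on the client/task classpath:
export HADOOP_CLASSPATH=$HBASE_HOME/lib/protobuf-java-2.4.0a.jar:$HADOOP_CLASSPATH

# Or ship it with the job itself:
hadoop jar myjob.jar MyJobClass -libjars $HBASE_HOME/lib/protobuf-java-2.4.0a.jar
```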

Re: hbase data

2012-05-29 Thread Josh Patterson
unless you need low latency access to all of this time series, it might be a more cost efficient path to store large archives of the data in plain HDFS. The scanning can be done more efficiently in a lot of cases in MapReduce + HDFS. Some links: OSCON-data presentation (good TVA story here): ht

Re: HBase (BigTable) many to many with students and courses

2012-05-29 Thread Ian Varley
A few more responses: On May 29, 2012, at 10:54 AM, Em wrote: > In fact, everything you model with a Key-Value-storage like HBase, > Cassandra etc. can be modeled as an RDMBS-scheme. > Since a lot of people, like me, are coming from that edge, we must > re-learn several basic things. > It starts

Re: HBase (BigTable) many to many with students and courses

2012-05-29 Thread Em
Hi Ian, great to hear your thoughts! Before I am going to give more feedback to your whole post, I want to take your own example and try to get an image of a hbase-approach to do that. > But you're trading time & space at write time for extremely fast > speeds at write time. You ment "extremely f

Data import from Distributed Hbase cluster to Pseudo Distributed Hbase cluster

2012-05-29 Thread arun sirimalla
Hi, I want to copy a table from Distributed Hbase cluster to Pseudo Distributed Hbase Cluster. Here the Pseudo Distributed Hbase cluster uses Linux Filesystem as storage. How can i import data from Distributed Hbase cluster to Pseudo Distributed Hbase Cluster. Thanks Arun Cloudwick Technologies

Re: Data import from Distributed Hbase cluster to Pseudo Distributed Hbase cluster

2012-05-29 Thread Amandeep Khurana
Assuming that your data fits into the pseudo dist cluster and both clusters can talk to each other, the CopyTable job that comes bundled with HBase should work. -ak On Tuesday, May 29, 2012 at 11:42 AM, arun sirimalla wrote: > Hi, > > I want to copy a table from Distributed Hbase cluster to
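A sketch of the bundled CopyTable invocation (the ZooKeeper quorum host, client port, znode parent, and table name are all placeholders to adapt to the destination cluster):

```
hbase org.apache.hadoop.hbase.mapreduce.CopyTable \
    --peer.adr=dest-zk-host:2181:/hbase \
    tablename
```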

Re: Data import from Distributed Hbase cluster to Pseudo Distributed Hbase cluster

2012-05-29 Thread arun sirimalla
Hi Amandeep, Thanks for your reply. I tried the copytable tool to copy the table from Distributed Hbase cluster to Pseudo Distributed Hbase Cluster using the below command hadoop jar hbase-0.90.3-cdh3u1.jar copytable --peer.adr=sand001.rssand001:2181:/storage/hbase test It initializes the job an

Re: HBase (BigTable) many to many with students and courses

2012-05-29 Thread Ian Varley
On May 29, 2012, at 1:24 PM, Em wrote: >> But you're trading time & space at write time for extremely fast >> speeds at write time. > You ment "extremely fast speeds at read time", don't you? Ha, yes, thanks. That's what I meant. > However this means that Sheldon has to do at least two requests

Re: HBase (BigTable) many to many with students and courses

2012-05-29 Thread Em
Hi Ian, answers between the lines: Am 29.05.2012 21:26, schrieb Ian Varley: >> However this means that Sheldon has to do at least two requests to fetch >> his latest tweets. >> First: Get the latest columns (aka tweets) of his row and second do a >> multiget to fetch their content. Okay, that's m

Re: Problems with scan after lot of Puts

2012-05-29 Thread Jean-Daniel Cryans
Care to share that TestPutScan? Just attach it in a pastebin Thx, J-D On Tue, May 29, 2012 at 6:13 AM, Ondřej Stašek wrote: > My program writes changes to HBase table by issuing lots of Puts (autoCommit > turned off, flush on end) and afterwards uses ResultScanner on whole table > to read all r

Re: Data import from Distributed Hbase cluster to Pseudo Distributed Hbase cluster

2012-05-29 Thread Andrew Purtell
As this is a quite old version of HBase and a CDH version, you should ask on the CDH lists. - Andy On Tue, May 29, 2012 at 12:13 PM, arun sirimalla wrote: > Hi Amandeep, > > Thanks for your reply. I tried the copytable tool to copy the table from > Distributed Hbase cluster to Pseudo Distrib

Re: Disable timestamp in HBase Table a.k.a Disable Versioning in HBase Table

2012-05-29 Thread anil gupta
Hi All, Sorry for the late reply as I got stuck in another task at work on Friday and skimming through HBASE-4676 took me a while. HBASE-6093 seems to be very close to my suggestion. The only difference is that Matt mentioned in the description that it can only be used when all inserts are type=Put

Re: HBase (BigTable) many to many with students and courses

2012-05-29 Thread Ian Varley
On May 29, 2012, at 3:25 PM, Em wrote: Yup, unless you denormalize the tweet bodies as well--then you just read the current user's record and you have everything you need (with the downside of massive data duplication). Well, I think this would be bad practice for editable stuff like tweets. Th

Re: Disable timestamp in HBase Table a.k.a Disable Versioning in HBase Table

2012-05-29 Thread Matt Corgan
> > Is this feature going to be part of any future release of HBase? i couldn't get it finished in time for 0.94, but i think it's very likely to be in 0.96, possibly with a backport to .94. Scan speed should improve if i have time to optimize the cell comparators and collators On Tue, May 29,

Re: ColumnCountGetFilter not working with FilterList

2012-05-29 Thread Ben Kim
Ram, you are right PageFilter won't work either I filed hbase jira at https://issues.apache.org/jira/browse/HBASE-6132 On Tue, May 29, 2012 at 10:49 PM, Ramkrishna.S.Vasudevan < ramkrishna.vasude...@huawei.com> wrote: > Discussing with Anoop I think PageFilter also may not work when used with >

performance of a hbase map/reduce job

2012-05-29 Thread Ey-Chih chow
Hi, We have some performance issue with a hbase map-only job, that copies data from hbase to hdfs. We profiled the job and what follows were the top 6 ranked methods, in terms of CPU usages, and the corresponding traces. ===

Re: performance of a hbase map/reduce job

2012-05-29 Thread Ted Yu
Can you describe your zookeeper setup a little more ? Thanks On Tue, May 29, 2012 at 6:52 PM, Ey-Chih chow wrote: > Hi, > > We have some performance issue with a hbase map-only job, that copies data > from hbase to hdfs. We profiled the job and what follows were the top 6 > ranked methods, in

RE: Disable timestamp in HBase Table a.k.a Disable Versioning in HBase Table

2012-05-29 Thread Anoop Sam John
Hi Anil, As HBASE-4676 is not available as of now, maybe you can check the other encoders, DiffKeyDeltaEncoder or FastDiffDeltaEncoder. Pls go through the javadoc of these and see what they do apart from compressing the timestamp parts. These do other nice stuff too which will make your data
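A sketch of enabling one of those delta encodings on a column family in 0.94 (the family name, table name, and `admin` instance are placeholders):

```java
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.io.encoding.DataBlockEncoding;

HColumnDescriptor family = new HColumnDescriptor("cf");
family.setDataBlockEncoding(DataBlockEncoding.FAST_DIFF); // or DataBlockEncoding.DIFF
HTableDescriptor desc = new HTableDescriptor("mytable");
desc.addFamily(family);
admin.createTable(desc); // admin: an existing HBaseAdmin instance
```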

Re: Problems with scan after lot of Puts

2012-05-29 Thread Ondřej Stašek
Here it is: http://pastebin.com/0AgsQjur On 29.5.2012 22:44, Jean-Daniel Cryans wrote: Care to share that TestPutScan? Just attach it in a pastebin Thx, J-D On Tue, May 29, 2012 at 6:13 AM, Ondřej Stašek wrote: My program writes changes to HBase table by issuing lots of Puts (autoCommit t

Re: performance of a hbase map/reduce job

2012-05-29 Thread Ey-Chih chow
It's a 3 node cluster running zookeeper-3.3.3 no modifications have been made by us. This is separate and dedicated only for hbase. We were looking at the ganglia graphs, not much load almost none, traffic is 1-2kb/sec. Thanks. Ey-Chih Chow On May 29, 2012, at 7:39 PM, Ted Yu wrote: > Can you