Re: repetita iuvant?

2012-10-25 Thread surfer
On 10/25/2012 07:44 AM, Anoop Sam John wrote: Hi, can you give more details? How much data is your scan going to retrieve? It's a full scan of 1.7 TB of data on 62 regionserver+master and ZK quorum machines. I hoped that block caching might slightly improve read performance. hbase

resource usage of ResultScanner's Iterator<Result>

2012-10-25 Thread Oliver Meyn (GBIF)
Hi all, I'm on cdh3u3 (hbase 0.90.4) and I need to provide a bunch of row keys based on a column value (e.g. give me all keys where column dataset = 1234). That's straightforward using a scan and filter. The trick is that I want to return an Iterator over my key type (Integer) rather than
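The approach described above, handing callers an Iterator over Integer keys, can be sketched with a small adapter. This is only an illustration under stated assumptions: the names MappingIterator and RowKeys are hypothetical, the row keys are assumed to be 4-byte big-endian integers, and a plain Iterator over byte[] stands in for the keys that a filtered scan would yield. No HBase classes are used.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Function;

/** Hypothetical adapter: decodes each element of a source iterator lazily. */
final class MappingIterator<S, T> implements Iterator<T> {
    private final Iterator<S> source;
    private final Function<S, T> decode;

    MappingIterator(Iterator<S> source, Function<S, T> decode) {
        this.source = source;
        this.decode = decode;
    }

    @Override
    public boolean hasNext() {
        return source.hasNext();
    }

    @Override
    public T next() {
        // Decode one key at a time, so nothing is buffered beyond the source.
        return decode.apply(source.next());
    }
}

/** Hypothetical helpers; assumes row keys are 4-byte big-endian ints. */
final class RowKeys {
    static Integer toInt(byte[] rowKey) {
        if (rowKey.length != Integer.BYTES) {
            throw new IllegalArgumentException("expected a 4-byte row key");
        }
        return ByteBuffer.wrap(rowKey).getInt();
    }

    static byte[] fromInt(int value) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(value).array();
    }

    public static void main(String[] args) {
        // Stand-in for the row keys returned by a filtered scan.
        Iterator<byte[]> raw = Arrays.asList(fromInt(7), fromInt(1234)).iterator();
        Iterator<Integer> keys = new MappingIterator<>(raw, RowKeys::toInt);
        while (keys.hasNext()) {
            System.out.println(keys.next()); // prints 7, then 1234
        }
    }
}
```

In this sketch the adapter holds no resources of its own, so releasing whatever backs the source iterator (for example, closing a scanner) is left to the caller.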

How do I test SampleRegionWALObserver?

2012-10-25 Thread Michael Spiegle
I'm using CDH4, but I'd imagine that the steps would be similar for upstream HBase. I've already got the tests JAR file loaded as part of my classpath. I also have the following in my hbase-site.xml: <property> <name>hbase.coprocessor.region.classes</name>

Re: Coprocessor end point vs MapReduce?

2012-10-25 Thread Jean-Marc Spaggiari
Hi all, First, sorry about my slowness to reply to this thread, but it went to my spam folder and I lost sight of it. I don’t have good knowledge of RDBMS, and so I don’t have good knowledge of triggers either. That’s why I looked at the endpoints too, because they are pretty new to me. First, I

Re: Delete by timestamp?

2012-10-25 Thread Jean-Marc Spaggiari
Hi Ted, Sorry, I totally missed this email too :( Many of my HBase list emails went into the junk folder these last few weeks and I only figured that out yesterday. So I have a LOT of reading to do. I looked at the whole HBASE-6942 thread and it seems it's now committed in 0.94.3. I will wait for this

Re: Delete by timestamp?

2012-10-25 Thread Ted Yu
Glad that you like the approach, Jean-Marc. Tell us your experience when 0.94.3 comes out next month. On Thu, Oct 25, 2012 at 6:51 AM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Hi Ted, Sorry, I totally missed this email too :( Many of my HBase list emails went into the junk folder

Hive Hbase 0.94 ClassNotFoundException com.google.protobuf.Message

2012-10-25 Thread Nick maillard
Hi everyone, Continuing on my journey in the Hadoop world. I have installed Hive 0.9 on an HBase 0.94.2 cluster atop a Hadoop 1.0.3 cluster. When I run a simple query like select * everything is fine. When I try select foo from myTestTable I see the MR start on my Hadoop monitor UI and it fails with:

Re: Hive Hbase 0.94 ClassNotFoundException com.google.protobuf.Message

2012-10-25 Thread Jean-Daniel Cryans
Hi Nick, Have you tried what's in this documentation? https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration#HBaseIntegration-Usage It seems to me that you should add guava in the auxpath. J-D On Thu, Oct 25, 2012 at 7:52 AM, Nick maillard nicolas.maill...@fifty-five.com wrote: Hi

Re: Hive Hbase 0.94 ClassNotFoundException com.google.protobuf.Message

2012-10-25 Thread Nick maillard
Hi Jean-Daniel, Thanks for the quick reply. Guava is already in my auxpath. I guess I have found a solution by adding protobuf to the auxpath. Basically instead of: ./bin/hive --auxpath $HIVE_HOME/lib/hive-hbase-handler-0.9.0.jar, $HBASE_HOME/hbase-0.94.2.jar,

Re: Hive Hbase 0.94 ClassNotFoundException com.google.protobuf.Message

2012-10-25 Thread Jean-Daniel Cryans
Oh I meant to say protobuf, not guava. And yes that's how it should be done. You can also set your auxpath in your hive-env so that you don't need to keep a big command line around. J-D On Thu, Oct 25, 2012 at 8:06 AM, Nick maillard nicolas.maill...@fifty-five.com wrote: Hi Jean-Daniel Thanks

Re: Hive Hbase 0.94 ClassNotFoundException com.google.protobuf.Message

2012-10-25 Thread Nick maillard
Hi Jean-Daniel, OK, I'll set it in the env, thanks for the advice. Are there other libs I might need to add? Could I just tell Hive to use its lib directory or HBase's lib directory in its classpath in some way? I could just set it in the bashrc but that's not very elegant. Another thing I am

Re: Hive Hbase 0.94 ClassNotFoundException com.google.protobuf.Message

2012-10-25 Thread Nick maillard
Sorry, typo: my query was 'select count(*) from myTestTable'

Re: Hive Hbase 0.94 ClassNotFoundException com.google.protobuf.Message

2012-10-25 Thread Jean-Daniel Cryans
On Thu, Oct 25, 2012 at 8:31 AM, Nick maillard nicolas.maill...@fifty-five.com wrote: Hi Jean-Daniel, OK, I'll set it in the env, thanks for the advice. Are there other libs I might need to add? The usual client libs... doesn't seem like we documented them anywhere... it's pretty much what you

Re: Hbase import Tsv performance (slow import)

2012-10-25 Thread Jonathan Bishop
Nicolas, I just went through the same exercise. There are many ways to get this to go faster, but eventually I decided that bulk loading is the best solution, as run times scaled with the number of machines in my cluster when I used that approach. One thing you can try is to turn off HBase's write

Re: Hive Hbase 0.94 ClassNotFoundException com.google.protobuf.Message

2012-10-25 Thread Nick maillard
Hi Jean-Daniel Again thanks for the quick reply and for the env detail I'll get to it. Of course select count (*) is not what I want to optimize. My more regular queries will have an Hbase schema designed for them using the rowkeys and potentially column families etc... I'm guessing Hive uses

Re: Query regarding HBase Mapreduce

2012-10-25 Thread Bertrand Dechoux
Hi Amit, You might want to add details to your question. 1) Lots of small files is a known 'problem' for Hadoop MapReduce, and you will find information on it by searching. http://blog.cloudera.com/blog/2009/02/the-small-files-problem/ I assume you have a more specific issue, what is it? 2) I am

Re: Query regarding HBase Mapreduce

2012-10-25 Thread Nick maillard
Hi Amit, I am starting with HBase and MR, so my opinion is more about what I have read than real-world experience. However, the documentation says Hadoop will deal better with a set of large files than a lot of small ones. regards amit bohra bohra.a@... writes:

Re: Hive Hbase 0.94 ClassNotFoundException com.google.protobuf.Message

2012-10-25 Thread Jean-Daniel Cryans
On Thu, Oct 25, 2012 at 9:00 AM, Nick maillard nicolas.maill...@fifty-five.com wrote: Hi Jean-Daniel Again thanks for the quick reply and for the env detail I'll get to it. Of course select count (*) is not what I want to optimize. My more regular queries will have an Hbase schema designed

Re: Hive Hbase 0.94 ClassNotFoundException com.google.protobuf.Message

2012-10-25 Thread Nick maillard
Hi Jean-Daniel, We are trying different types of software to solve our size issue. We aggregate data on website interactions and this can easily go up to a couple of million lines with a couple of tens of interaction types per day. We would like to keep a rolling 2-year history. We expect this

Re: Coprocessor end point vs MapReduce?

2012-10-25 Thread Anoop John
What I still don’t understand is, since both CP and MR are running on the region side, why is the MR better than the CP? For the bulk delete case alone, CP (Endpoint) will be better than MR for sure. Considering your overall need, people were suggesting MR as the better option. You need a scan and move

Re: Query regarding HBase Mapreduce

2012-10-25 Thread lohit
When you say small files, do you mean to say those are stored within HBase columns? If so, you need not worry as HBase would eventually write bigger HFile on disk (or HDFS). If you are storing lot of small files on HDFS itself, then you will have scalability problems as single NameNode cannot

Re: Hbase import Tsv performance (slow import)

2012-10-25 Thread anil gupta
Hi Nicolas, In my experience you won't get good performance if you run 3 map tasks simultaneously on one hard drive. That seems like a lot of I/O on one disk. HBase performs well when you have at least 5 nodes in the cluster. So, running HBase on 3 nodes is not something you would do in prod.

Re: Hbase import Tsv performance (slow import)

2012-10-25 Thread anil gupta
@Jonathan, As per Anoop and Ram, the WAL is not used with bulk loading, so turning off the WAL won't have any impact on performance. On Thu, Oct 25, 2012 at 1:33 PM, anil gupta anilgupt...@gmail.com wrote: Hi Nicolas, In my experience you won't get good performance if you run 3 map tasks

Re: Coprocessor end point vs MapReduce?

2012-10-25 Thread Jerry Lam
Hi JM: There was a thread discussing M/R bulk delete vs. Coprocessor bulk delete. The thread subject is Bulk Delete. The guy in that post suggested writing an HFile which contains all the delete markers and then using the bulk incremental load facility to actually move all the delete markers to the

Re: Best technique for doing lookup with Secondary Index

2012-10-25 Thread anil gupta
Anoop: In the prePut hook you call HTable#put()? Anil: Yes, I call HTable#put() in prePut. Is there a better way of doing it? Anoop: Why use network calls from the server side here then? Anil: I thought this was a cleaner approach since I am using BulkLoader. I decided not to run two jobs since I am

RE: Best technique for doing lookup with Secondary Index

2012-10-25 Thread Anoop Sam John
Hi Anil, Some confusion after seeing your reply. You use bulk loading? You created your own mapper? You call HTable#put() from mappers? I think there is confusion in another thread also.. I was referring to the HFileOutputReducer.. There is a TableOutputFormat also... In

RE: Hbase import Tsv performance (slow import)

2012-10-25 Thread Anoop Sam John
As per Anoop and Ram, the WAL is not used with bulk loading, so turning off the WAL won't have any impact on performance. This is if HFileOutputFormat is being used.. There is a TableOutputFormat which can also be used as the OutputFormat for MR.. Here the write to the WAL is applicable. This one, instead of

RE: problem with fliter in scan

2012-10-25 Thread Anoop Sam John
Use SingleColumnValueFilter#setFilterIfMissing(true) s.setBatch(10); How many total columns are in the schema? When using the SingleColumnValueFilter, setBatch() might not always work.. FYI -Anoop- From: jian fan [xiaofanhb...@gmail.com] Sent: Friday,
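Why the filterIfMissing setting matters can be shown with a small stand-alone model. This is a rough sketch, not HBase code: the class MissingColumnModel and its keep method are hypothetical, a row is modeled as a plain Map from column name to value, and only an equality comparison is modeled. It mirrors the documented behavior that rows lacking the tested column pass the filter unless filterIfMissing is set to true.

```java
import java.util.Collections;
import java.util.Map;

/** Hypothetical model of a single-column value check on one row. */
final class MissingColumnModel {
    /**
     * Returns true when the row should be kept.
     * A row without the column is kept unless filterIfMissing is true.
     */
    static boolean keep(Map<String, String> row, String column,
                        String expected, boolean filterIfMissing) {
        String actual = row.get(column);
        if (actual == null) {
            return !filterIfMissing;
        }
        return actual.equals(expected);
    }

    public static void main(String[] args) {
        Map<String, String> match = Collections.singletonMap("dataset", "1234");
        Map<String, String> missing = Collections.emptyMap();

        System.out.println(keep(match, "dataset", "1234", false));   // true
        System.out.println(keep(missing, "dataset", "1234", false)); // true: missing column passes
        System.out.println(keep(missing, "dataset", "1234", true));  // false: missing column filtered
    }
}
```

With the default setting, a scan can return rows that never had the column at all, which is easy to mistake for the filter not working.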

RE: Best technique for doing lookup with Secondary Index

2012-10-25 Thread Ramkrishna.S.Vasudevan
Is it a good idea to create an HTable instance on B and do the put in my mapper? I might try this idea. Yes, you can do this.. Maybe in the same mapper you can do a put for table B. This was how we tried loading data to another table by using the main table A Puts. Now your main question is

RE: Best technique for doing lookup with Secondary Index

2012-10-25 Thread Anoop Sam John
Anil, Have a look at MultiTableOutputFormat (I am referring to the 0.94 code base; not sure whether it is available in older versions) -Anoop- From: Ramkrishna.S.Vasudevan [ramkrishna.vasude...@huawei.com] Sent: Friday, October 26, 2012 9:50 AM To:

Re: Best technique for doing lookup with Secondary Index

2012-10-25 Thread anil gupta
Hi Anoop, Yes, I use bulk loading for loading table A. I wrote my own mapper as Importtsv won't meet my requirements. :) No, I don't call HTable#put() from my mapper. I was thinking about trying out calling HTable#put() from my mapper to see the outcome. I meant to say that when we use MR job