Re: repetita iuvant?

2012-10-24 Thread surfer
On 10/25/2012 07:44 AM, Anoop Sam John wrote: > Hi > Can you tell more details? How much data your scan is going to retrieve? It's a full scan of 1.7TB of data on 62 regionserver+master and ZK quorum machines. I hoped that block caching might at least slightly improve read performance. hbase v

RE: repetita iuvant?

2012-10-24 Thread Anoop Sam John
Hi, can you give more details? How much data is your scan going to retrieve? How long does each attempt take? Can you observe the cache hit ratio? How much memory is available on the RS? Also, please share the cluster details and regions. -Anoop- From: surfer [sur..
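Anoop's question about the cache hit ratio points at a likely cause: a sequential full scan over far more data than the block cache can hold evicts each block before the next scan comes back to it. A minimal sketch of that effect, assuming a plain LRU cache built on java.util.LinkedHashMap (HBase's LruBlockCache actually splits blocks into single-access, multi-access and in-memory priority buckets, so this is a simplification) and hypothetical block counts:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified model of a block cache: pure LRU over block IDs.
class LruScanSim {

    static final class Lru<K> extends LinkedHashMap<K, Boolean> {
        private final int capacity;

        Lru(int capacity) {
            super(16, 0.75f, true); // accessOrder = true: get() refreshes recency
            this.capacity = capacity;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, Boolean> eldest) {
            return size() > capacity; // evict the least recently used block
        }
    }

    // Scans blocks 0..totalBlocks-1 in order twice through a cache holding
    // cacheBlocks entries; returns the number of hits on the second pass.
    static int hitsOnSecondPass(int totalBlocks, int cacheBlocks) {
        Lru<Integer> cache = new Lru<>(cacheBlocks);
        for (int b = 0; b < totalBlocks; b++) {
            cache.put(b, Boolean.TRUE); // first scan: cold cache, every read misses
        }
        int hits = 0;
        for (int b = 0; b < totalBlocks; b++) {
            if (cache.get(b) != null) {
                hits++;
            } else {
                cache.put(b, Boolean.TRUE); // miss: read from disk, then cache
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Data set ten times larger than the cache, as with a 1.7TB full scan.
        System.out.println(hitsOnSecondPass(10_000, 1_000)); // prints 0
        // Data set that fits in the cache: every block hits on the second pass.
        System.out.println(hitsOnSecondPass(500, 1_000)); // prints 500
    }
}
```

Under pure LRU, a data set even one block larger than the cache gets zero hits on a repeated sequential pass, because the last blocks read keep pushing out the first ones; only a data set that fits entirely in the cache benefits on the second run.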

repetita iuvant?

2012-10-24 Thread surfer
Hi, I ran the same scan on my table data twice. I expected the time to improve, but that was not the case. What am I doing wrong? I set "scan.setCacheBlocks(true);" before the first scanning job to put, if not all, at least some blocks in memory. Thank you, surfer

RE: Best technique for doing lookup with Secondary Index

2012-10-24 Thread Ramkrishna.S.Vasudevan
Just out of curiosity: > The secondary index is stored in table "B" as rowkey B --> > family: A > What is rowkey B here? > 1. Scan the secondary table by using prefix filter and startRow. How is the startRow determined for every query? Regards, Ram > -----Original Message----- > From: Anoop Sam
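Ram's question about how startRow is chosen can be made concrete. In a table sorted by row key, a prefix lookup can start at startRow = the prefix itself and stop at the first row that no longer matches, which is the role startRow and the prefix filter play in the quoted approach. The sketch below models index table "B" as a java.util.TreeMap; the `<value>|<primaryRowKey>` layout of rowkey B and the class name are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// In-memory stand-in for index table "B". TreeMap keeps keys sorted like an
// HBase table (HBase compares bytes; for ASCII keys String order matches).
// Hypothetical layout: row key = <indexedValue> + "|" + <primaryRowKey>,
// and the stored value is the primary row key (the "family: A" column).
class PrefixIndexLookup {

    static List<String> lookup(NavigableMap<String, String> index, String value) {
        // The delimiter keeps "red" from matching "redwood".
        String prefix = value + "|";
        List<String> primaryKeys = new ArrayList<>();
        // startRow = prefix: seek directly to the first candidate row.
        for (Map.Entry<String, String> e : index.tailMap(prefix, true).entrySet()) {
            // Rows are sorted, so the first non-matching key ends the range.
            if (!e.getKey().startsWith(prefix)) {
                break;
            }
            primaryKeys.add(e.getValue());
        }
        return primaryKeys;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> index = new TreeMap<>();
        index.put("blue|row7", "row7");
        index.put("red|row1", "row1");
        index.put("red|row4", "row4");
        index.put("redwood|row9", "row9");
        System.out.println(lookup(index, "red")); // prints [row1, row4]
    }
}
```

Each primary key returned would then be fetched from the primary table with a Get; the index scan itself only touches the contiguous range of rows sharing the prefix.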

RE: Best technique for doing lookup with Secondary Index

2012-10-24 Thread Anoop Sam John
> I build the secondary table "B" using a prePut RegionObserver. Anil, in the prePut hook do you call HTable#put()? Why make network calls from the server side there? Can you not handle it from the client alone? You can have a look at the Lily project. Thoughts after seeing your idea on put and scan... -An
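Anoop's suggestion is to maintain the index from the client rather than issuing HTable#put() from a prePut hook inside the RegionServer. A minimal sketch of that idea, with both tables modelled as in-memory sorted maps (a real client would send one Put to each table); the class name, method name and `<value>|<rowKey>` index layout are hypothetical:

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical client-side index maintenance: the client writes both the
// index row and the data row itself, so no RegionServer has to call
// HTable#put() from inside a prePut hook.
class ClientSideIndexWriter {
    // Table "A": primary row key -> value of the indexed column.
    final NavigableMap<String, String> dataTable = new TreeMap<>();
    // Table "B": <indexedValue>|<primaryRowKey> -> primary row key.
    final NavigableMap<String, String> indexTable = new TreeMap<>();

    void put(String rowKey, String indexedValue) {
        String old = dataTable.get(rowKey); // needed to clean up a stale entry
        // Index first: if the data write then fails, the leftover index
        // entry can be detected by re-reading the data row.
        indexTable.put(indexedValue + "|" + rowKey, rowKey);
        dataTable.put(rowKey, indexedValue);
        if (old != null && !old.equals(indexedValue)) {
            indexTable.remove(old + "|" + rowKey); // value changed: drop old entry
        }
    }

    public static void main(String[] args) {
        ClientSideIndexWriter client = new ClientSideIndexWriter();
        client.put("row1", "red");
        client.put("row1", "blue");
        System.out.println(client.indexTable); // prints {blue|row1=row1}
    }
}
```

The two writes are not atomic across tables, and reading the old value would cost a Get in practice. Writing the index row first, so that a failure leaves at worst a detectable dangling entry, is a design choice of this sketch rather than something proposed in the thread.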

Re: Hbase import Tsv performance (slow import)

2012-10-24 Thread lars hofhansl
This is good advice, Kevin; we should add this to the HBase Reference Guide. From: Kevin O'dell To: user@hbase.apache.org Sent: Tuesday, October 23, 2012 10:47 AM Subject: Re: Hbase import Tsv performance (slow import) You will want to make sure your table is

Re: Hbase import Tsv performance (slow import)

2012-10-24 Thread nick maillard
Hi Anil, I have one hard drive per slave. I have tested with 3 and with 28 concurrent mappers per slave. Both times the total time was about 1 hour; the only difference was the time each map took, respectively 40min and 1h10min. I have turned off speculative execution. I'll

Re: Hbase import Tsv performance (slow import)

2012-10-24 Thread anil gupta
Hi Nick, how many hard drives do your slaves have? What is their RPM? How many mappers run concurrently on a node? Did you turn off speculative execution? Have a look at disk I/O to see whether it is a bottleneck. MR is disk I/O bound, so if you only have one disk on a slave and you are running 5
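Anil's point that MR is disk I/O bound is consistent with Nick's numbers: about 1 hour in total with either 3 or 28 mappers per slave, but each map taking longer with more mappers. A back-of-the-envelope model, assuming the single disk per node is the only bottleneck and its throughput is split evenly among concurrent tasks; the 50 MB/s figure and data sizes are hypothetical:

```java
// Back-of-the-envelope model, an assumption rather than a measurement:
// each node has one disk, the disk is the only bottleneck, and its
// throughput is shared evenly by the map tasks running on that node.
class DiskBoundModel {

    // Minutes for one task that must move taskMB while sharing the disk
    // with (concurrent - 1) other tasks.
    static double perTaskMinutes(double taskMB, int concurrent, double diskMBps) {
        return taskMB * concurrent / diskMBps / 60.0;
    }

    // Minutes for the whole job: total data over the aggregate disk
    // throughput of the cluster. Mapper concurrency does not appear.
    static double jobMinutes(double totalMB, int nodes, double diskMBps) {
        return totalMB / (nodes * diskMBps) / 60.0;
    }

    public static void main(String[] args) {
        double diskMBps = 50.0; // hypothetical sustained throughput of one disk
        // 28 concurrent tasks on a node each take ~9x longer than with 3...
        System.out.println(perTaskMinutes(600, 3, diskMBps));
        System.out.println(perTaskMinutes(600, 28, diskMBps));
        // ...but the job still needs the same total disk time.
        System.out.println(jobMinutes(150_000, 3, diskMBps));
    }
}
```

In this model, adding mappers only stretches each task; the job total changes only with more disks (or nodes) or less data to move.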

Re: Hbase import Tsv performance (slow import)

2012-10-24 Thread Nick maillard
Hello Kevin, I'm using: Hadoop 1.0.3, HBase 0.94.2, OS: Ubuntu 12.04

Re: Hbase import Tsv performance (slow import)

2012-10-24 Thread Kevin O'dell
Nick, what versions are you using: HDFS, HBase, OS? On Oct 24, 2012 10:36 AM, "Nick maillard" wrote: > Hello everyone > > Still looking into the issue. > I have tried different tests and the results are surprising. > If I put mapred.tasktracker.map.tasks.maximum: 28 > I get a total of 84 tasks on

Re: Hbase import Tsv performance (slow import)

2012-10-24 Thread Nick maillard
Hello everyone, still looking into the issue. I have tried different tests and the results are surprising. If I set mapred.tasktracker.map.tasks.maximum to 28, I get a total of 84 tasks on my cluster and the process takes about 1h15min, with each task taking 1h10min. The whole file being cut down in

Re: Hbase import Tsv performance (slow import)

2012-10-24 Thread Nick maillard
Looking at my task logs, there is a big gap in time I do not understand. The task connects to ZooKeeper, creates the entries, and from 2012-10-24 12:25:24 to 2012-10-24 13:08:03 logs nothing (doing the MapReduce work, I guess). 2012-10-24 12:25:23,323 INFO org.apache.zookeeper.ClientCnxn: Session establishment

Re: Hbase import Tsv performance (slow import)

2012-10-24 Thread Sonal Goyal
Hi Nick, Do you see anything in your tasktracker or datanode logs? Best Regards, Sonal Crux: Reporting for HBase Nube Technologies On Wed, Oct 24, 2012 at 3:45 PM, Nick maillard < nicolas.ma

Re: Hbase import Tsv performance (slow import)

2012-10-24 Thread Nick maillard
As I wrote in a reply above (it is kind of lost in the thread): I have set dfs.replication to 2, but the processing time has not changed at all. How could I change my configuration to avoid the hotspot issue you talked about? As Kevin advised, I have also raised: hbase.hstore.block
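On the hotspot question: one common technique is to salt row keys with a bucket prefix and pre-split the table on the bucket boundaries, so that sequential keys from the import spread across regions instead of all landing on one. A minimal sketch; the `NN-` key layout, bucket count and class name are hypothetical, and the split keys would be handed to table creation (for example `HBaseAdmin#createTable(HTableDescriptor, byte[][])`):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical row-key salting: prefix each key with a two-digit bucket so
// sequential keys spread over pre-split regions. Not part of HBase or
// ImportTsv; the layout "NN-<originalKey>" is an illustrative assumption.
class SaltedKeys {

    // Two-digit prefixes only sort correctly for up to 100 buckets.
    private static void check(int buckets) {
        if (buckets < 1 || buckets > 100) {
            throw new IllegalArgumentException("buckets must be in 1..100");
        }
    }

    static String salt(String rowKey, int buckets) {
        check(buckets);
        // Mask the sign bit so the bucket is never negative.
        int bucket = (rowKey.hashCode() & 0x7fffffff) % buckets;
        return String.format("%02d-%s", bucket, rowKey);
    }

    // Split points "01", "02", ...: buckets - 1 keys give one region per bucket.
    static List<String> splitKeys(int buckets) {
        check(buckets);
        List<String> splits = new ArrayList<>();
        for (int b = 1; b < buckets; b++) {
            splits.add(String.format("%02d", b));
        }
        return splits;
    }

    public static void main(String[] args) {
        System.out.println(splitKeys(4)); // prints [01, 02, 03]
        for (String k : new String[] {"row0001", "row0002", "row0003"}) {
            System.out.println(salt(k, 4));
        }
    }
}
```

The trade-off is on the read side: a scan over a range of original keys must be issued once per bucket and the results merged.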

Re: Hbase import Tsv performance (slow import)

2012-10-24 Thread Nick maillard
Hi John, I have a capacity of 42 map tasks and an average of 28 running tasks per node. When I check the map job details there are 80 tasks to complete. As I drill down into the individual map tasks in the task details, they all take a very long time (26 minutes) to complete. A lot of them fail as well. Fail info is "fai

Re: Hbase import Tsv performance (slow import)

2012-10-24 Thread Nick maillard
Thanks for your help. I have taken my replication down to 2, but if I am not mistaken, replication also has the benefit of making the cluster more fault tolerant by duplicating data on different nodes, so that if one goes down the data is not necessarily lost. In that case I would like to keep it at least at 2.