help! hbase rev-792389, scan speed is as slow as randomRead!

2009-07-09 Thread Qingyan(Evan) Liu
Dears, I'm new to HBase. I just checked out HBase trunk rev-792389 and tested its performance by means of org.apache.hadoop.hbase.PerformanceEvaluation (detailed testing results are listed below). It's strange that the scan speed is as slow as randomRead. I haven't changed any configuration parameters…

MapReduce newbie questions

2009-07-09 Thread Michael Hauck
Hi, I'm new to HBase MapReduce and want to do the following: create daily statistics with SQL queries against a SQL database, store the statistic results in HBase, and run a daily MapReduce job on those results to compute monthly statistics. I stored this data in the HBase table 'route_conversion_statistics'. My…

TSocket: timed out reading 4 bytes from

2009-07-09 Thread Hegner, Travis
Hi all, I am testing 0.20.0-alpha, r785472, and am running into an issue I can't seem to figure out. I am accessing HBase from PHP via Thrift. The PHP script is pulling data from our pgsql server and dumping it into HBase. HBase is running on a 6-node Hadoop cluster (0.20.0-plus4681, r767961)…

Re: help! hbase rev-792389, scan speed is as slow as randomRead!

2009-07-09 Thread Jean-Daniel Cryans
Evan, The scan probably warmed the cache here. Do the same experiment with a fresh HBase for the scan and the random reads. J-D On Thu, Jul 9, 2009 at 5:14 AM, Qingyan(Evan) Liu wrote: > dears, > > I'm fresh to hbase. I just checkout hbase trunk rev-792389, and test its > performance by means of…

Re: help! hbase rev-792389, scan speed is as slow as randomRead!

2009-07-09 Thread Qingyan(Evan) Liu
Dear J-D, Here are another two tests. I changed the order of the tests. Before each test, I restarted both HBase and Hadoop. All tests use 50,000 rows of 1KB each. (1) randomWrite-randomRead-randomRead-scan-scan-randomRead: 7117ms-15966ms-16678ms-10429ms-10730ms-15641ms (2) randomWrite-scan-scan-ran…

Re: help! hbase rev-792389, scan speed is as slow as randomRead!

2009-07-09 Thread Jonathan Gray
Not every test is created equal; different tests are testing different things, and different environments/setups/configurations can yield different results. I posted the utility (HBench) I used to generate the statistics from those slides up in a JIRA. You can grab it and try it out to see wh…

Re: MapReduce newbie questions

2009-07-09 Thread Jonathan Gray
First, I recommend upgrading to the latest HBase 0.19 release, 0.19.3. You have a few choices, but in short you want to use filters. http://hadoop.apache.org/hbase/docs/r0.19.3/api/org/apache/hadoop/hbase/filter/package-summary.html Specifically, you should look at the RegExpRowFilter: http://
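JG's pointer to RegExpRowFilter is about server-side row-key filtering: only rows whose keys match a regular expression are returned to the client. The matching rule it applies can be sketched standalone with java.util.regex — no HBase dependency; the row keys below are made up for illustration and the real filter runs inside the region server:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Standalone sketch of the idea behind RegExpRowFilter: keep only rows
// whose key matches a regular expression. This only illustrates the
// matching rule; it is not the HBase implementation.
public class RowKeyRegexSketch {
    public static List<String> filterRows(List<String> rowKeys, String regex) {
        Pattern p = Pattern.compile(regex);
        List<String> kept = new ArrayList<>();
        for (String key : rowKeys) {
            if (p.matcher(key).matches()) {
                kept.add(key);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical key layout: "route_<date>" statistics rows.
        List<String> keys = List.of("route_2009-06-01", "route_2009-07-01", "other_row");
        System.out.println(filterRows(keys, "route_2009-06-.*"));
    }
}
```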

Re: TSocket: timed out reading 4 bytes from

2009-07-09 Thread Travis Hegner
Hi Again, Since the tests mentioned below, I have finally figured out how to build and run from the trunk. I have re-created my hbase install from svn, configured it, updated my thrift client library, and my current import has been through more than 5 region splits without failing. Next step, wri

Re: TSocket: timed out reading 4 bytes from

2009-07-09 Thread Travis Hegner
Of course, as luck should have it... I spoke too soon. I am still suffering from that region split problem, but it doesn't seem to happen on every region split. I do know for sure that with the final split, the new daughter regions were re-assigned to the original parent's server. It made it throu

Re: TSocket: timed out reading 4 bytes from

2009-07-09 Thread Jonathan Gray
My recommendation would be to not use thrift for bulk imports. Travis Hegner wrote: Of course, as luck should have it... I spoke too soon. I am still suffering from that region split problem, but it doesn't seem to happen on every region split. I do know for sure that with the final split, the

Re: TSocket: timed out reading 4 bytes from

2009-07-09 Thread Travis Hegner
I am not extremely Java-savvy quite yet... is there an alternative way to access HBase from PHP? I have read about the REST libraries, but haven't tried them yet. Are they sufficient for bulk import? Or is a bulk import something that simply must be done from Java, without exception? Thanks for t…

ClassLoader issue - class not found yet classloader can load it

2009-07-09 Thread Saptarshi Guha
Hello, I'm using HBase (trying to) from another language (R), which has its own classloader (call it RCL). RCL takes its own classpath (in which I've included all HBase JARs and the conf folder), and I did the following: cfg <- .jnew("org/apache/hadoop/hbase/HBaseConfiguration") ## Create a new Java objec…

Re: ClassLoader issue - class not found yet classloader can load it

2009-07-09 Thread Ryan Rawson
You need to include everything in lib/*.jar; this includes the Hadoop jar... The problem might be that RCL isn't registering itself, so the recursive classloaders aren't using RCL? On Jul 9, 2009 2:15 PM, "Saptarshi Guha" wrote: Hello, I'm using HBase(trying to) from another language (R) which…

Re: TSocket: timed out reading 4 bytes from

2009-07-09 Thread Jonathan Gray
It's not that it must be done from Java; it's just that the other interfaces add a great deal of overhead and also do not let you do the same kind of batching that helps significantly with performance. If you don't care about the time it takes, then you could stick with Thrift. Try to throttl…
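JG's batching point can be made concrete with simple arithmetic: committing rows in batches of N replaces one round trip per row with one per batch. A minimal sketch — the row counts are illustrative, not taken from Travis's cluster:

```java
// Sketch of why batching matters for bulk import: committing rows in
// batches of N turns one RPC round trip per row into one per batch.
// The numbers below are illustrations, not measurements.
public class BatchingSketch {
    // Number of commits needed: one per full or partial batch (ceiling division).
    public static int roundTrips(int totalRows, int batchSize) {
        return (totalRows + batchSize - 1) / batchSize;
    }

    public static void main(String[] args) {
        int rows = 50_000;
        System.out.println("unbatched:  " + roundTrips(rows, 1) + " round trips");
        System.out.println("batch=1000: " + roundTrips(rows, 1000) + " round trips");
    }
}
```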

Re: ClassLoader issue - class not found yet classloader can load it

2009-07-09 Thread Saptarshi Guha
Hello, Thanks for the tip. I have added all the jar files in HBASE_HOME and HBASE_HOME/lib, and HBASE_HOME/conf is in the classpath. Going through the code of Configuration.java and HBaseConfiguration.java, the latter is a simple subclass, and setClassLoader replaces the classloader with a user-suppl…

Delete issue in HBase 0.20 alpha

2009-07-09 Thread Bryan Keller
I am having an issue after deleting a row in the HBase 0.20 alpha. After deleting the row using the Delete object, I cannot put a row back that uses the same key as the deleted row. No exceptions occur in my code. E.g. HBaseConfiguration config = new HBaseConfiguration(); HTable table = new HTabl

Re: ClassLoader issue - class not found yet classloader can load it

2009-07-09 Thread Ryan Rawson
The HBase code calls either Class.forName or uses the system-implied classloader when it refers to other classes. Maybe there is something there? It's using the default Java classloader (which doesn't have your classpath), maybe? On Jul 9, 2009 2:38 PM, "Saptarshi Guha" wrote: Hello, Thanks for the…

Re: Delete issue in HBase 0.20 alpha

2009-07-09 Thread Ryan Rawson
Can you try with the latest trunk? Many bugs, including delete bugs, were fixed. On Jul 9, 2009 2:43 PM, "Bryan Keller" wrote: I am having an issue after deleting a row in the HBase 0.20 alpha. After deleting the row using the Delete object, I cannot put a row back that uses the same key as the

Using DBInputFormat in Map/Reduce

2009-07-09 Thread llpind
I'm using DBInputFormat to upload data into an HBase table. The query takes a while to run, meaning the split is taking a while. I upped the timeout like so: mapred.task.timeout 180… This kept my map tasks from being killed. I have 50 map tasks and 9 reduce tasks on a 5…
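For reference, the timeout the poster describes is a job/site configuration property in that Hadoop generation, set in milliseconds. A sketch of the override — the value shown is illustrative, not the poster's actual setting:

```xml
<!-- Illustrative only: raise the task timeout so long-running
     DBInputFormat splits are not killed. Value is in milliseconds. -->
<property>
  <name>mapred.task.timeout</name>
  <value>1800000</value> <!-- 30 minutes -->
</property>
```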

Re: Using DBInputFormat in Map/Reduce

2009-07-09 Thread llpind
The other 9 show status as "initializing" for a long time, as the percentage of one task continues to increase. llpind wrote: > > I'm using DBInputFormat to upload data into HBase table. The query takes > a while to run, meaning the split is taking a while. I upped the timeout > like so: >

Re: Delete issue in HBase 0.20 alpha

2009-07-09 Thread Bryan Keller
I can't get the trunk to run. Looks like the way ZooKeeper starts has changed, and it tries to map my DHCP-assigned IP address to the list of quorum servers in the hbase-default.xml file (which is only "localhost" by default) and complains it can't be found. I tried adding my IP there but still can'…

Re: Delete issue in HBase 0.20 alpha

2009-07-09 Thread Nitay
Hi Bryan, For the latest trunk, are you using your own zoo.cfg, or overwriting the options from hbase-default.xml? -n On Thu, Jul 9, 2009 at 4:20 PM, Bryan Keller wrote: > I can't get the trunk to run. > Looks like the way Zookeeper starts has change, and it tries to map my > DHCP-assigned IP

Re: Delete issue in HBase 0.20 alpha

2009-07-09 Thread Bryan Keller
I'm not changing anything, just using the default hbase-default.xml as-is (which appears to be set up for standalone mode), and not creating a zoo.cfg. The only thing I am changing is JAVA_HOME in hbase-env.sh. On Thu, Jul 9, 2009 at 4:36 PM, Nitay wrote: > Hi Bryan, > > For the latest trunk,…

HTablePool question

2009-07-09 Thread Vaibhav Puranik
Hi, It looks like HTablePool is designed to have one instance of HTablePool per table. I am confused by the static map inside the HTablePool class. If we can instantiate one HTablePool per table, what's the use of the map? Furthermore, the map is static, and there is no way to add multiple tables to i…

Re: help! hbase rev-792389, scan speed is as slow as randomRead!

2009-07-09 Thread Qingyan(Evan) Liu
Thanks a lot, JG! I've just run svn update and tested the new code, which uses setScannerCaching(30). Scan performance is now very high: 5460ms at offset 0 for 1 rows. So the conclusion is clear: switching on prefetch greatly boosts scan speed. Thank you all, kind guys. sincerely, Evan

Re: help! hbase rev-792389, scan speed is as slow as randomRead!

2009-07-09 Thread Ryan Rawson
On large map-reduce runs with small rows, I set scanner caching to 1000-3000 rows. This seemingly minor change allows me to reach 4.5m row reads/sec (~ 40 bytes per row). Without that, single row fetch is stupid slow. I don't think we can set a reasonable value here for 2 reasons: - for those wh
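Ryan's numbers follow from a simple cost model: each scanner next() RPC pays a fixed round-trip latency, and scanner caching amortizes that latency over many rows fetched per call. A toy model with made-up latency figures (not benchmarks of any real cluster):

```java
// Toy latency model for scanner caching: each RPC costs a fixed
// round-trip time, amortized over `caching` rows fetched per call.
// All figures below are illustrative assumptions.
public class ScannerCachingModel {
    // Estimated total scan time in milliseconds.
    public static double scanMillis(int rows, int caching, double rpcMillis, double perRowMillis) {
        int rpcs = (rows + caching - 1) / caching; // one RPC per batch of rows
        return rpcs * rpcMillis + rows * perRowMillis;
    }

    public static void main(String[] args) {
        int rows = 50_000;
        System.out.printf("caching=1:    %.0f ms%n", scanMillis(rows, 1, 1.0, 0.01));
        System.out.printf("caching=1000: %.0f ms%n", scanMillis(rows, 1000, 1.0, 0.01));
    }
}
```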

Question about HBase

2009-07-09 Thread zsongbo
Hi all, 1. In this configuration property: hbase.hstore.compactionThreshold (default: 3). If more than this number of HStoreFiles exist in any one HStore (one HStoreFile is written per flush of memcache), then a compaction is run to rewrite all HStoreFiles as one. Larger numbers…
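For reference, a threshold like this would be overridden in hbase-site.xml rather than by editing hbase-default.xml. A sketch — the value 5 is an arbitrary example, not a recommendation:

```xml
<!-- Illustrative hbase-site.xml override: compact once an HStore has
     more than 5 flushed HStoreFiles instead of the default 3. -->
<property>
  <name>hbase.hstore.compactionThreshold</name>
  <value>5</value>
</property>
```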

Re: Question about HBase

2009-07-09 Thread Ryan Rawson
Re #2: in fact we don't know that... I know that I can run 200-400 regions on a regionserver with a heap size of 4-5GB. More, even. I bet I could have 1000 regions open on 4GB RAM. Each region is ~1MB of all-the-time data, so there we go. As for compactions, they are fairly fast, 0-30s or so d…

Re: Question about HBase

2009-07-09 Thread zsongbo
Hi Ryan, Thanks. If your region size is about 250MB, then 400 regions can store 100GB of data on each regionserver. Now, if you have 100TB of data, then you need 1000 regionservers. We are not Google or Yahoo, who have so many nodes. Schubert On Fri, Jul 10, 2009 at 12:29 PM, Ryan Rawson wrote: > re:…

Re: Question about HBase

2009-07-09 Thread Ryan Rawson
That size is not memory-resident, so the total data size is not an issue. The index size is what limits you with RAM, and it's about 1 MB per region (256MB region). -ryan On Thu, Jul 9, 2009 at 9:51 PM, zsongbo wrote: > Hi Ryan, > > Thanks. > > If your regionsize is about 250MB, than 400 regions…
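Ryan's point can be checked with back-of-the-envelope arithmetic: only the ~1 MB per-region index is memory-resident, so heap bounds the region count, not the total data size. A sketch with rough, illustrative figures taken from the thread:

```java
// Back-of-the-envelope check: only the per-region index (~1 MB per
// 256 MB region) lives in RAM, so heap limits the region *count*,
// not the data served. All figures are rough illustrations.
public class RegionRamSketch {
    // Regions whose indexes fit in the given heap budget (MB).
    public static long regionsForHeap(long heapMb, long indexMbPerRegion) {
        return heapMb / indexMbPerRegion;
    }

    // Total data served, in GB, for a region count and region size in MB.
    public static long dataServedGb(long regions, long regionSizeMb) {
        return regions * regionSizeMb / 1024;
    }

    public static void main(String[] args) {
        long regions = regionsForHeap(4096, 1); // 4 GB heap, ~1 MB index each
        System.out.println(regions + " regions -> " + dataServedGb(regions, 256) + " GB served");
    }
}
```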

Re: Delete issue in HBase 0.20 alpha

2009-07-09 Thread Bryan Keller
Something checked in yesterday (7/9/09) caused the startup problem for me. I rolled back (svn update -r {20090709}), and HBase started up. The delete problem I was having is fixed. So I'll use that rev (r792422) for now. Thanks, Bryan On Thu, Jul 9, 2009 at 4:41 PM, Bryan Keller wrote: