Table recovery options

2009-09-23 Thread elsif
We have a couple of clusters running with LZO compression. When testing the new 0.20.1 release, I set up a single-node cluster and reused the compression jar and native libraries from the 0.20.0 release. The following session log shows a table being created with the lzo option and some rows being add

Re: Processing a large quantity of smaller XML files?

2009-09-23 Thread Ryan Rawson
Random writes are exactly the kind of workload HBase is great at. Like you said, you can't just rewrite large sequence files every time you have a change, so you would consider using a system like HBase to help with that. I'd say if you want some MapReduce capacity, HDFS, and HBase, you will need 4 CPUs per

Re: Processing a large quantity of smaller XML files?

2009-09-23 Thread stack
On Wed, Sep 23, 2009 at 2:01 PM, Andrzej Jan Taramina wrote: > I asked about the best way to process a large quantity of smaller XML files > using Hadoop mapred, on the main Hadoop > mailing list, and was advised that HBase would be a good alternative to > handle this. > > ... > > What I would li

Re: Optional memstore flush

2009-09-23 Thread stack
Looks like it was removed before the release of 0.19.0 by HBASE-728 (do svn diff -r705770:707247 conf/hbase-default.xml), so it hasn't been working for a while? St.Ack On Wed, Sep 23, 2009 at 2:09 PM, Clint Morgan wrote: > hbase.regionserver.optionalcacheflushinterval 180

Re: Optional memstore flush

2009-09-23 Thread Clint Morgan
hbase.regionserver.optionalcacheflushinterval (value 180): Amount of time to wait since the last time a region was flushed before invoking an optional cache flush (an optional cache flush is a flush even though the memcache is not at memcache.flush.size). Default: 30 minu

Processing a large quantity of smaller XML files?

2009-09-23 Thread Andrzej Jan Taramina
I asked about the best way to process a large quantity of smaller XML files using Hadoop mapred, on the main Hadoop mailing list, and was advised that HBase would be a good alternative to handle this. More specifically, we need to start by processing about 250K XML files, each of which is in th

Re: Optional memstore flush

2009-09-23 Thread stack
What's the option name, Clint? I just checked out 0.19.0 and had a look in hbase-default to try and jog my memory, but I'm not sure which setting it is (was). To force a flush, you could run the following in a cron job: echo "flush 'TABLENAME'" | ./bin/hbase shell ... or variations thereof. St.Ack On Wed,
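One variation on the cron suggestion above flushes several tables in a single HBase shell session by piping one `flush` command per table into `bin/hbase shell`. This is only a sketch. The `HBASE_HOME` variable, the function names `flush_commands` and `flush_tables`, and passing table names as script arguments are illustrative assumptions and do not come from the thread.

```shell
#!/usr/bin/env bash
# Sketch: force memstore flushes from cron by piping "flush" commands into
# the HBase shell, following the echo | ./bin/hbase shell suggestion.
# Hypothetical assumptions: HBASE_HOME points at the HBase install, and
# table names are passed as command-line arguments.
set -euo pipefail

# Print one HBase shell command per table, e.g.: flush 'mytable'
flush_commands() {
  local table
  for table in "$@"; do
    printf "flush '%s'\n" "$table"
  done
}

# Send all generated commands to a single HBase shell session.
flush_tables() {
  flush_commands "$@" | "${HBASE_HOME:?HBASE_HOME must be set}/bin/hbase" shell
}

# Only do work when table names are given, so running with no args is a no-op.
if [ "$#" -gt 0 ]; then
  flush_tables "$@"
fi
```

Saved as, say, `flush-tables.sh`, it could be scheduled with a crontab entry such as `*/30 * * * * HBASE_HOME=/path/to/hbase /path/to/flush-tables.sh TABLENAME`. Both paths and the 30-minute interval are placeholders. Because cron runs with a minimal environment, `HBASE_HOME` is set explicitly on the entry.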

Optional memstore flush

2009-09-23 Thread Clint Morgan
Is there no optional memstore flush anymore? I recall in 0.19 the memcache would flush every so often and you could configure this period (optional cache flush interval). Digging through now, I don't see it in 0.20. Is this mechanism no longer supported? Due to a couple of mix-ups, our stop cluste

Re: Hbase and linear scaling with small write intensive clusters

2009-09-23 Thread stack
On Wed, Sep 23, 2009 at 9:56 AM, Molinari, Guy wrote: > Hi Stack (and others), > The reason for the small initial region size was intended to force > splits so that the load would be evenly distributed. If I could > pre-define the key ranges for the splits, then I could go to a much > larger

RE: Hbase and linear scaling with small write intensive clusters

2009-09-23 Thread Molinari, Guy
Hi Stack (and others), The reason for the small initial region size was intended to force splits so that the load would be evenly distributed. If I could pre-define the key ranges for the splits, then I could go to a much larger block size. So, say if I have 10 nodes and a 100MB data set,
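Splitting a 100MB data set evenly across 10 nodes works out to about 10MB per node. Pre-defining the key ranges then comes down to choosing 9 boundary keys. The sketch below only computes evenly spaced boundaries. It assumes, hypothetically, that row keys are zero-padded decimal integers spread uniformly over [0, KEYSPACE), so that byte order matches numeric order. `split_points` is an illustrative name and not an HBase command. How the boundaries would be applied to a table is outside this sketch.

```shell
#!/usr/bin/env bash
# Sketch: compute evenly spaced split points for pre-defined key ranges.
# Hypothetical assumption: row keys are zero-padded decimal integers in
# [0, KEYSPACE), so lexicographic order equals numeric order.
set -euo pipefail

# split_points REGIONS KEYSPACE WIDTH
# Prints REGIONS-1 boundary keys, each zero-padded to WIDTH digits.
# Region 1 covers keys below the first boundary; the last region covers
# keys from the last boundary up to KEYSPACE.
split_points() {
  local regions=$1 keyspace=$2 width=$3 i
  for (( i = 1; i < regions; i++ )); do
    printf "%0${width}d\n" $(( i * keyspace / regions ))
  done
}
```

For example, `split_points 10 1000000 7` prints nine boundaries from `0100000` to `0900000`, giving one range per node in a 10-node cluster. This holds only if the keys really are uniform. Skewed keys would need boundaries taken from a sample of the actual data instead.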