Re: fast way to do random getRowOrAfter reads

2012-08-29 Thread Ferdy Galema
Ran some tests, and it seems that single-use Scanner requests are not that bad after all. I guess the important part is to set scanner caching to 1 and to correctly close every scanner afterwards. On Mon, Aug 27, 2012 at 4:33 PM, Ferdy Galema wrote: > I want to do a lot of random reads, but I need
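The lookup being discussed ("first row at or after a key") is a ceiling query over rows kept in unsigned lexicographic byte order, which is how HBase sorts row keys. The sketch below models only those semantics in memory; it is not the HBase client API, and `RowOrAfterModel` / `getRowOrAfter` are hypothetical names borrowed from the thread title. Against a real table the equivalent is the single-use scanner described above: start row set to the key, caching 1, read one result, close.

```java
import java.nio.charset.StandardCharsets;
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;

// In-memory model of "getRowOrAfter": returns the first row whose key is >= the
// requested key, using unsigned lexicographic byte order. Hypothetical helper,
// not an HBase API.
class RowOrAfterModel {
    // Unsigned byte-by-byte comparison; a key that is a prefix of a longer key sorts first.
    static final Comparator<byte[]> UNSIGNED_LEXICOGRAPHIC = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    };

    private final TreeMap<byte[], byte[]> rows = new TreeMap<>(UNSIGNED_LEXICOGRAPHIC);

    void put(String row, String value) {
        rows.put(row.getBytes(StandardCharsets.UTF_8), value.getBytes(StandardCharsets.UTF_8));
    }

    // Same result as a scan that starts at 'key' and reads exactly one row:
    // the ceiling entry, or null when no row sorts at or after the key.
    String getRowOrAfter(String key) {
        Map.Entry<byte[], byte[]> e = rows.ceilingEntry(key.getBytes(StandardCharsets.UTF_8));
        return e == null ? null : new String(e.getKey(), StandardCharsets.UTF_8);
    }
}
```

On a live table the same lookup goes through `HTable.getScanner(Scan)` with `Scan.setCaching(1)`, reading one `Result` and calling `ResultScanner.close()` in a `finally` block.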

fast way to do random getRowOrAfter reads

2012-08-27 Thread Ferdy Galema
I want to do a lot of random reads, but I need to get the first row after the requested key. I know I can make a scanner every time (with a specified startrow) and close it after a single result is fetched, but this seems like a lot of overhead. Something like HTable's getRowOrBefore method, but then

Re: HBase MapReduce - Using multiple tables as source

2012-08-06 Thread Ferdy Galema
Hi, Perhaps you want to take a look at MultipleInputs. I'm not sure if it works for TableInputFormat, but at least you can use it for inspiration. Ferdy. On Mon, Aug 6, 2012 at 3:02 PM, Amlan Roy wrote: > Hi, > > If TableMapper and TableMapReduceUtil.initTableMapperJob() do not support > mul

Re: silently aborted scans when using hbase.client.scanner.max.result.size

2012-07-26 Thread Ferdy Galema
s set to 1241. The remaining size would still be > higher than zero and so would the countdown (its value would be 1). So > it's gonna try to get the nextScanner. If you have just one region it > would stop there. > > But that would be the case if you have 1 region and did not

silently aborted scans when using hbase.client.scanner.max.result.size

2012-07-25 Thread Ferdy Galema
I was experiencing aborted scans under certain conditions. In these cases I was simply missing so many rows that only a fraction of the input was read, without any warning. After lots of testing I was able to pinpoint and reproduce the error when scanning over a single region, single column family, single store fi

simple inputformat to ignore lease and timeout exceptions

2012-07-16 Thread Ferdy Galema
Some mapred jobs running scans on our HBase could not succeed because of the dreaded LeaseException or ScannerTimeoutException, even with hbase.client.scanner.caching set to 1 and long timeout properties. Mind you, no row is ever bigger than 5MB (sure, it's bigger than most use cases but still i
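The post is cut off before its implementation, so the following is only one possible approach and not necessarily the author's: when a scan fails mid-way, reopen it just past the last row already returned. In HBase's byte ordering the smallest key strictly greater than `row` is `row` with a `0x00` byte appended, so reopening there neither repeats nor skips a row (assuming the table is not being modified underneath the scan). `ScanSource` and `ScannerExpired` are hypothetical stand-ins for opening a table scanner and for `LeaseException` / `ScannerTimeoutException`; in a MapReduce job this logic would live in the record reader of a custom InputFormat.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical stand-in for LeaseException / ScannerTimeoutException.
class ScannerExpired extends RuntimeException {
    ScannerExpired(String message) {
        super(message);
    }
}

// Hypothetical stand-in for "open a scanner on the table starting at startRow".
interface ScanSource {
    Iterator<byte[]> open(byte[] startRow);
}

// Iterates over every row from startRow onward, reopening the scan after an
// expiry instead of failing; gives up after maxRestarts reopenings in total.
class RestartingScan implements Iterator<byte[]> {
    private final ScanSource source;
    private final byte[] startRow;
    private final int maxRestarts;
    private int restarts = 0;
    private byte[] lastRow = null;   // last row handed to the caller
    private byte[] buffered = null;  // row fetched by hasNext() but not yet returned
    private Iterator<byte[]> current;

    RestartingScan(ScanSource source, byte[] startRow, int maxRestarts) {
        this.source = source;
        this.startRow = startRow;
        this.maxRestarts = maxRestarts;
        this.current = source.open(startRow);
    }

    // Smallest row key sorting strictly after 'row': the same bytes plus a trailing 0x00.
    static byte[] rowAfter(byte[] row) {
        return Arrays.copyOf(row, row.length + 1);
    }

    @Override
    public boolean hasNext() {
        while (buffered == null) {
            try {
                if (!current.hasNext()) {
                    return false;
                }
                buffered = current.next();
            } catch (ScannerExpired e) {
                if (++restarts > maxRestarts) {
                    throw e;
                }
                // Resume just past the last row already returned.
                current = source.open(lastRow == null ? startRow : rowAfter(lastRow));
            }
        }
        return true;
    }

    @Override
    public byte[] next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        byte[] row = buffered;
        buffered = null;
        lastRow = row;
        return row;
    }
}
```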

Re: Mixing Puts and Deletes in a single RPC

2012-07-06 Thread Ferdy Galema
Does HBASE-3584 also allow buffering of the mutations? (With the 0.90 branch it is only possible to buffer Put operations.) On Fri, Jul 6, 2012 at 1:50 AM, lars hofhansl wrote: > I'll let the Cloudera folks speak, but I had assumed CDH4 would include > HBase 0.94. > > -- Lars

Re: gc pause killing regionserver

2012-03-21 Thread Ferdy Galema
Sure: https://issues.apache.org/jira/browse/HBASE-5607 @Marcos: Thanks, those are useful links. On Tue, Mar 20, 2012 at 6:27 PM, Stack wrote: > 2012/3/20 Ferdy Galema : > > A nice solution server-side would be to dynamically adjust the > > scanner-caching value when the response
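The proposal is cut off mid-sentence, so its details are not recoverable; the sketch below only illustrates the general idea of bounding a scanner response by size rather than by a fixed row count. `AdaptiveCaching.rowsPerRpc` is a hypothetical helper, not an HBase setting or API: it caps the configured caching at however many rows of the estimated size fit into a byte budget, and never drops below one row so the scan always makes progress.

```java
// Hypothetical helper: choose how many rows to return per scanner RPC so that
// the estimated response stays under a byte budget. Illustration only.
class AdaptiveCaching {
    static int rowsPerRpc(int configuredCaching, long avgRowBytes, long maxResponseBytes) {
        if (configuredCaching < 1 || avgRowBytes < 1 || maxResponseBytes < 1) {
            throw new IllegalArgumentException("all arguments must be positive");
        }
        long fit = maxResponseBytes / avgRowBytes;                    // rows that fit in the budget
        long clamped = Math.max(1, Math.min(configuredCaching, fit)); // at least one row per call
        return (int) clamped;
    }
}
```

For example, with rows of about 5 MB and an assumed 64 MB budget, `rowsPerRpc(1000, 5 MB, 64 MB)` returns 12 instead of letting a caching value of 1000 build a multi-gigabyte response.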

Re: gc pause killing regionserver

2012-03-20 Thread Ferdy Galema
at org.apache.hadoop.hbase.ipc.ByteBufferOutputStream.<init>(ByteBufferOutputStream.java:44) at org.apache.hadoop.hbase.ipc.ByteBufferOutputStream.<init>(ByteBufferOutputStream.java:37) Ferdy. 2012/3/16 Ferdy Galema > CPU resources were never a problem; munin shows there is enough idle time.

Re: gc pause killing regionserver

2012-03-16 Thread Ferdy Galema
n my > > experience it's not very likely. > > > > Setting swappiness to 0 just means it's not going to page anything out > > until it really needs to do it, meaning it's possible to swap. The > > only way to guarantee no swapping whatsoever is g

Re: gc pause killing regionserver

2012-03-06 Thread Ferdy Galema
Correction: 2300300300 is the current heap size for regionservers (I had misread another process.) On Tue, Mar 6, 2012 at 11:37 AM, Ferdy Galema wrote: > Thanks for the replies. That is a lot of useful information. I admit that > I'm still running a bit behind when it comes to truly under

Re: gc pause killing regionserver

2012-03-06 Thread Ferdy Galema
ess size. > > This seems to occur most often in lightly loaded installations. It > might > > be interesting to have a look around your processes with pmap or some > tool > > like that. In many situations, you'll probably be able to account for > each > > Java heap

gc pause killing regionserver

2012-03-03 Thread Ferdy Galema
Hi, I'm running regionservers with a 2GB heap and the following tuning options: -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:NewRatio=16 -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly -XX:MaxGCPauseMillis=100 A regionserver aborted (YouAreDeadException) and this was printed in
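The options above determine where CMS starts. `-XX:NewRatio=16` makes the old generation 16 times the young generation, so the young generation gets 1/17 of the heap, and `-XX:CMSInitiatingOccupancyFraction=70` together with `-XX:+UseCMSInitiatingOccupancyOnly` starts a concurrent cycle once the old generation is 70% full. A back-of-the-envelope sketch, assuming "2GB" means `-Xmx2048m` and treating the heap as exactly young (eden plus survivors) plus old:

```java
// Rough CMS trigger point for a HotSpot heap sized with -XX:NewRatio and
// -XX:CMSInitiatingOccupancyFraction (plus -XX:+UseCMSInitiatingOccupancyOnly).
// Sizes in MB; assumes the heap is split only into young and old generations.
class CmsTrigger {
    static double youngGenMb(double heapMb, int newRatio) {
        return heapMb / (newRatio + 1);   // old:young = newRatio:1
    }

    static double oldGenMb(double heapMb, int newRatio) {
        return heapMb - youngGenMb(heapMb, newRatio);
    }

    // Old-generation occupancy at which a concurrent CMS cycle is started.
    static double cmsStartMb(double heapMb, int newRatio, int occupancyPercent) {
        return oldGenMb(heapMb, newRatio) * occupancyPercent / 100.0;
    }
}
```

For a 2048 MB heap this gives a young generation of about 120 MB, an old generation of about 1928 MB, and a CMS start at roughly 1349 MB of old-generation occupancy.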

Re: How to efficiently join HBase tables?

2011-05-31 Thread Ferdy Galema
As far as I can tell there is not yet a built-in mechanism you can use for this. You could implement your own InputFormat, something like a MultiTableInputFormat. If you need different map functions for the two tables, perhaps something similar to Hadoop's MultipleInputs should do the trick. On
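A MultipleInputs-style setup amounts to a reduce-side join: each table gets its own map function that tags its rows with their source, both emit the row key as the map output key, and the reducer sees all tagged values for one key together. The sketch below simulates that map, shuffle and reduce flow in plain Java to show the data movement only; the `Tagged` class, the "L"/"R" tags and the inner-join choice are illustrative assumptions, not Hadoop or HBase APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of a reduce-side join: "map" tags each row with its source
// table, the "shuffle" groups tagged values by row key (sorted), and the "reduce"
// pairs values from the two sources that share a key (inner join).
class ReduceSideJoin {
    static final class Tagged {
        final String source;
        final String value;

        Tagged(String source, String value) {
            this.source = source;
            this.value = value;
        }
    }

    static List<String> join(Map<String, String> left, Map<String, String> right) {
        // Map phase plus shuffle: group tagged values by key.
        TreeMap<String, List<Tagged>> grouped = new TreeMap<>();
        left.forEach((k, v) -> grouped.computeIfAbsent(k, x -> new ArrayList<>()).add(new Tagged("L", v)));
        right.forEach((k, v) -> grouped.computeIfAbsent(k, x -> new ArrayList<>()).add(new Tagged("R", v)));

        // Reduce phase: for each key, emit every left/right combination.
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<Tagged>> e : grouped.entrySet()) {
            List<String> ls = new ArrayList<>();
            List<String> rs = new ArrayList<>();
            for (Tagged t : e.getValue()) {
                (t.source.equals("L") ? ls : rs).add(t.value);
            }
            for (String l : ls) {
                for (String r : rs) {
                    out.add(e.getKey() + ":" + l + "," + r);
                }
            }
        }
        return out;
    }
}
```

Keys present in only one table produce no output here; an outer join would emit them with an empty partner instead.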

Re: Harvesting empty regions

2011-05-31 Thread Ferdy Galema
You can use the merge tool to combine adjacent regions. It requires a bit of manual work because you need to specify the regions by hand. The cluster also needs to be offline (I recommend keeping ZooKeeper running, though). Check with the hbck tool whether the merge succeeded. There are some jira issu
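Because the merge tool needs the regions spelled out by hand, it can help to list candidate pairs first. A minimal sketch, assuming you have already collected each region's name, start key, end key and on-disk size (the `RegionSummary` class and how those numbers are gathered are hypothetical): sort by start key and pair neighbours whose key ranges touch and that are both empty. Keys are plain ASCII strings here, for which `String` order matches HBase's byte order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical region description: key range [startKey, endKey) and size on disk.
class RegionSummary {
    final String name;
    final String startKey;
    final String endKey;
    final long sizeBytes;

    RegionSummary(String name, String startKey, String endKey, long sizeBytes) {
        this.name = name;
        this.startKey = startKey;
        this.endKey = endKey;
        this.sizeBytes = sizeBytes;
    }
}

class MergePlanner {
    // Greedily pairs adjacent regions that are both empty. Each region appears in
    // at most one pair per pass, because a merge changes its neighbour's range.
    static List<String[]> emptyAdjacentPairs(List<RegionSummary> regions) {
        List<RegionSummary> sorted = new ArrayList<>(regions);
        sorted.sort(Comparator.comparing((RegionSummary r) -> r.startKey));
        List<String[]> pairs = new ArrayList<>();
        for (int i = 0; i + 1 < sorted.size(); i++) {
            RegionSummary a = sorted.get(i);
            RegionSummary b = sorted.get(i + 1);
            boolean adjacent = a.endKey.equals(b.startKey);
            if (adjacent && a.sizeBytes == 0 && b.sizeBytes == 0) {
                pairs.add(new String[] {a.name, b.name});
                i++; // skip b so it is not used twice in this pass
            }
        }
        return pairs;
    }
}
```

Each returned pair corresponds to one manual run of the merge tool; rerunning the planner after a pass finds the next round of candidates.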

Re: 0.90.1 HMaster malfunction in pseudo-distributed mode

2011-05-29 Thread Ferdy Galema
2011-05-28 23:23:35,292 INFO org.apache.hadoop.ipc.HbaseRPC: Server at /127.0.0.1:60020 could not be reached after 1 tries, giving up. This means the regionserver could not be reached. Check the regionserver logs to see why. Perhaps it failed to start? Is HDFS fully functional? Ferdy. On 05

Re: LZO Compression

2011-05-18 Thread Ferdy Galema
Not out of the box. I use the following resource for packaging LZO with the Cloudera release: https://github.com/toddlipcon/hadoop-lzo-packager On 05/18/2011 08:35 AM, Pete Haidinyak wrote: Does the Cloudera VM have LZO data compression available? If not, since it's a 32-bit system, what's the be

Re: What is the recommended number of zookeeper server on 11 nodes cluster

2011-05-11 Thread Ferdy Galema
A rowcounter is a scan job, so you should use hbase.client.scanner.caching for better scan performance (depending on your value sizes, set it to 1000 or something like that). For us, 1 ZooKeeper server is able to manage our 15-node cluster perfectly fine. On 05/11/2011 02:40 PM, byambajargal wrote: Hel
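Why caching matters for a scan job like rowcounter is mostly arithmetic: each scanner round trip returns up to `caching` rows, so a scan of N rows needs about ceil(N / caching) round trips, while each response grows to roughly caching times the average row size. A small cost model, ignoring scanner open/close calls and the extra round trip at each region boundary; the row counts and sizes in the example are made-up numbers, not figures from this thread.

```java
// Rough cost model of a full-table scan as a function of scanner caching.
// Ignores scanner open/close calls and the extra round trip at region boundaries.
class ScanCost {
    static long roundTrips(long rows, int caching) {
        if (rows < 0 || caching < 1) {
            throw new IllegalArgumentException("rows must be >= 0 and caching >= 1");
        }
        return (rows + caching - 1) / caching;   // ceiling division
    }

    // Approximate size of one scanner response.
    static long bytesPerResponse(int caching, long avgRowBytes) {
        return caching * avgRowBytes;
    }
}
```

With 10 million rows, caching 1 means about 10 million round trips and caching 1000 about 10 thousand; with 1 KB rows each response at caching 1000 is about 1 MB, which is why very large rows push the setting back down.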

hbck lists a dead server, but it is only restarted

2011-04-08 Thread Ferdy Galema
With HBase 0.90.1 (CDH3B4), whenever a regionserver shuts down because of an error, I restart it using start-hbase.sh. After that the node is up and running again, but the hbck tool still lists a dead server (though it does list the correct number of live servers). Is this a bug? Ferdy

Re: Compressing values before inserting them

2011-04-05 Thread Ferdy Galema
Thanks. This seems very useful. Just to add: in terms of compression/decompression speed, we're getting very good performance with the lzf codec. It is Apache-licensed, pure Java code with no external dependencies. See https://github.com/ning/compress/ Ferdy On 04/05/2011 12:55 AM, Jean-Dan
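Compressing values client-side just means compressing the `byte[]` before it goes into the Put and decompressing it after the Get. The lzf codec mentioned above is an external library, so to stay dependency-free this sketch swaps in the JDK's `java.util.zip` Deflater/Inflater; it shows the round trip, not lzf's speed. Prefixing the original length as 4 bytes is an assumption of this sketch, made so the reader can size the output buffer exactly.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Compresses a cell value before writing and restores it after reading.
// Stored format (assumed here): 4-byte big-endian original length + Deflate data.
class ValueCodec {
    static byte[] compress(byte[] value) {
        Deflater deflater = new Deflater(Deflater.BEST_SPEED); // favour speed over ratio
        try {
            deflater.setInput(value);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            out.write(ByteBuffer.allocate(4).putInt(value.length).array(), 0, 4);
            byte[] buf = new byte[8192];
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native zlib resources
        }
    }

    static byte[] decompress(byte[] stored) throws DataFormatException {
        int length = ByteBuffer.wrap(stored, 0, 4).getInt();
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(stored, 4, stored.length - 4);
            byte[] result = new byte[length];
            int off = 0;
            while (off < length) {
                int n = inflater.inflate(result, off, length - off);
                if (n == 0) { // needs more input or a dictionary: data is truncated or foreign
                    throw new DataFormatException("truncated or unexpected compressed value");
                }
                off += n;
            }
            return result;
        } finally {
            inflater.end();
        }
    }
}
```

In the HBase client, `compress(...)` would wrap the value passed to `Put.add(family, qualifier, value)`, and `decompress(...)` would be applied to `Result.getValue(family, qualifier)`.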

Re: importing dataset, some problems and performance issues

2011-03-22 Thread Ferdy Galema
iel Cryans wrote: I feel like I'm not understanding your need correctly; could you explain what you think HBase should be doing in order to give you a better life? Thx, J-D On Mon, Mar 21, 2011 at 5:22 PM, Ferdy Galema wrote: These methods are certainly helpful, whenever I ever nee

Re: importing dataset, some problems and performance issues

2011-03-21 Thread Ferdy Galema
min.html#createTable(org.apache.hadoop.hbase.HTableDescriptor, byte[][]) - use the bulk loader: http://hbase.apache.org/bulk-loads.html J-D On Fri, Mar 18, 2011 at 5:46 AM, Ferdy Galema wrote: On second thought, removing the obsolete region folders was easily done by hand. This way I can merge regions
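Creating the table with predefined split keys, as suggested above, requires choosing the split points up front. A minimal sketch, assuming row keys are spread uniformly over their first four bytes (true for hashed keys, not for arbitrary ones): divide the 32-bit unsigned prefix space into n equal ranges. `UniformSplits` is a hypothetical helper; its result is the kind of array the `byte[][]` argument of the `createTable` overload referenced above expects.

```java
import java.nio.ByteBuffer;

// Computes n-1 split keys dividing the 4-byte unsigned prefix space
// [0x00000000, 0xFFFFFFFF] into n equally wide regions.
class UniformSplits {
    static byte[][] splitKeys(int regions) {
        if (regions < 2 || regions > (1 << 16)) {
            throw new IllegalArgumentException("regions must be between 2 and 65536");
        }
        long space = 1L << 32;                      // number of distinct 4-byte prefixes
        byte[][] splits = new byte[regions - 1][];
        for (int i = 1; i < regions; i++) {
            long boundary = space * i / regions;    // always < 2^32
            // putInt writes the low 32 bits big-endian, i.e. the unsigned value's bytes.
            splits[i - 1] = ByteBuffer.allocate(4).putInt((int) boundary).array();
        }
        return splits;
    }
}
```

For 4 regions this yields the split keys `0x40000000`, `0x80000000` and `0xC0000000`.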

Re: importing dataset, some problems and performance issues

2011-03-18 Thread Ferdy Galema
On second thought, removing the obsolete region folders was easily done by hand. This way I can merge regions with the merge tool. However, I'm still bothered by the (performance) issues I ran into. Any advice would be helpful. On 03/18/2011 11:06 AM, Ferdy Galema wrote: After export

importing dataset, some problems and performance issues

2011-03-18 Thread Ferdy Galema
After exporting a table of about 30M rows (each row has about 500 columns, totalling 400GB of data), there were several issues when trying to import it again on an empty HBase. (HBase version is 0.90.1-CDH3B4, deployed on 15 nodes. LZO is enabled.) The reason for this export/import is to both

Re: after upgrade, fatal error in regionserver compacter, LzoCompressor, "AbstractMethodError"

2011-03-17 Thread Ferdy Galema
=LZO+Compression St.Ack On Wed, Mar 16, 2011 at 10:28 AM, Ferdy Galema wrote: We upgraded to Hadoop 0.20.1 and HBase 0.90.1 (both CDH3B4). We are using 64-bit machines. Startup goes fine, but right after the first compaction we get this error: Uncaught exception in service thread regionserver6

after upgrade, fatal error in regionserver compacter, LzoCompressor, "AbstractMethodError"

2011-03-16 Thread Ferdy Galema
We upgraded to Hadoop 0.20.1 and HBase 0.90.1 (both CDH3B4). We are using 64-bit machines. Startup goes fine, but right after the first compaction we get this error: Uncaught exception in service thread regionserver60020.compactor java.lang.AbstractMethodError: com.hadoop.compression.lzo.Lz

sometimes more than 1 value stored, even though VERSIONS is 1

2010-07-29 Thread Ferdy Galema
Using HBase 0.20.5 with Hadoop CDH2 0.20.1+169.89, I noticed something very strange. When overwriting a certain column in a column family with VERSIONS set to 1, and later deleting that value (for example after several minutes), the older value still shows up when listing all the KeyValues of the row. Al