Re: Issues/Problems concerning hbase data insertion

2009-09-21 Thread Jonathan Gray
Guillaume, Thanks for providing more detail. So, as I understand it, you are already storing the URL -> Group relationship (1:1), but you need to store Group -> URLs relationship (1:N). My solution would be to have a "urls" family in your GROUPS table. And for each URL within a group, you w
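The preview above is cut off, so the exact layout Jonathan proposes is not visible here. As a rough illustration only, the sketch below models one common way to hold a 1:N Group -> URLs relationship in a single "urls" family: one column per URL, keyed by the URL itself, with an empty value. It uses a plain in-memory dictionary as a stand-in for the GROUPS table, not the HBase client API, and the function names `add_url_to_group` and `urls_in_group` are hypothetical.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for a GROUPS table:
# row key (group id) -> column family -> column qualifier -> value.
groups_table = defaultdict(lambda: defaultdict(dict))


def add_url_to_group(table, group_id, url):
    """Record that `url` belongs to `group_id`.

    Assumption (not confirmed by the truncated message): each URL is stored
    as a column qualifier in the "urls" family, with an empty value.
    Writing the same URL twice simply overwrites the same cell.
    """
    table[group_id]["urls"][url] = b""


def urls_in_group(table, group_id):
    """Return all URLs of a group, sorted, without creating missing rows."""
    row = table.get(group_id, {})
    return sorted(row.get("urls", {}))


add_url_to_group(groups_table, "group-1", "http://example.org/b")
add_url_to_group(groups_table, "group-1", "http://example.org/a")
print(urls_in_group(groups_table, "group-1"))
```

Under this assumed layout, reading every URL of a group is a single row lookup, and the existing URL -> Group (1:1) mapping stays where it already is.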

RE: Issues/Problems concerning hbase data insertion

2009-09-21 Thread guillaume.viland
Hello, I recently posted about some issues with "big" data insertion. I would like to thank everyone who gave such interesting answers. I would like to clarify one point in answer to this question: > What exactly does your data look like / what are you trying to index? > IndexedTab

Re: Issues/Problems concerning hbase data insertion

2009-09-16 Thread Andrew Purtell
Bonjour Guillaume, Your issue #2 looks like two separate issues: 2a) Memcache flusher gating. This is better in 0.20.0. I encourage you to upgrade for this and any number of other reasons. 2b) HDFS-127. See https://issues.apache.org/jira/browse/HDFS-127. Upgrade to HBase 0.20.0 or patc

Re: Issues/Problems concerning hbase data insertion

2009-09-16 Thread stack
I took a look at your attached configuration files. You have very little customization in them. Given you are running 0.19.x, you are missing some critical configuration. See http://wiki.apache.org/hadoop/Hbase/Troubleshooting. In particular, #5, #6, and #7. What about file descriptor count?

Re: Issues/Problems concerning hbase data insertion

2009-09-16 Thread stack
On Wed, Sep 16, 2009 at 8:35 AM, wrote: ... > Our configuration is hadoop 0.19.1 and hbase 0.19.3, both > hadoop-default/site.xml and hbase-default/site.xml are attached, 15 nodes > (16 or 8 GB RAM and 1.3 TB disk, linux kernel 2.6.24-standard, java version > "1.6.0_12"). > As per Jon, please use

Re: Issues/Problems concerning hbase data insertion

2009-09-16 Thread Jonathan Gray
First, I would recommend you try upgrading to HBase 0.20.0. There are a number of significant improvements to performance and stability. Also, you have plenty of memory, so give more of it to the HBase RegionServer (especially if you upgrade to 0.20, give HBase 4GB or more) and you will see s