Guillaume,
Thanks for providing more detail.
So, as I understand it, you are already storing the URL -> Group
relationship (1:1), but you also need to store the Group -> URLs
relationship (1:N).
My solution would be to have a "urls" family in your GROUPS table. And
for each URL within a group, you w
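
The message is cut off above, so the layout below is only an assumption about where it was heading: one column per URL inside the "urls" family of the GROUPS row, with the URL itself as the column qualifier. It is a minimal in-memory sketch in Python (plain dicts standing in for the two tables, not HBase client code), and the names urls_table, groups_table, assign and urls_in are hypothetical.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for the two tables (not real HBase calls).
# URLS:   row key = URL,   value = the group it belongs to (1:1).
# GROUPS: row key = group, "urls" family with one column per member URL (1:N).
urls_table = {}
groups_table = defaultdict(lambda: {"urls": {}})


def assign(url, group):
    """Record url -> group and keep the reverse group -> urls index in sync."""
    previous = urls_table.get(url)
    if previous is not None and previous != group:
        # The URL moved to another group: drop it from the old group's row.
        groups_table[previous]["urls"].pop(url, None)
    urls_table[url] = group
    # Assumed layout: the column qualifier is the URL, the cell value is empty.
    groups_table[group]["urls"][url] = b""


def urls_in(group):
    """Return all URLs of a group by reading a single GROUPS row."""
    if group not in groups_table:
        return []
    return sorted(groups_table[group]["urls"])
```

With this layout, listing a group's URLs is a single-row read on GROUPS, at the cost of writing to two tables whenever a URL is assigned.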
Hello, I recently posted about some issues with "big" data insertion. I
would like to thank everyone who gave very interesting answers.
I would like to clarify one point in answer to this question:
> What exactly does your data look like / what are you trying to index?
> IndexedTab
Bonjour Guillaume,
Your issue #2 looks like two separate issues:
2a) Memcache flusher gating. This is better in 0.20.0. I encourage you to
upgrade for this and any number of other reasons.
2b) HDFS-127. See https://issues.apache.org/jira/browse/HDFS-127. Upgrade to
HBase 0.20.0 or patc
I took a look at your attached configuration files. You have very little
customization in them. Given you are running 0.19.x, you are missing some
critical configuration. See
http://wiki.apache.org/hadoop/Hbase/Troubleshooting. In particular, #5, #6,
and #7. What about file descriptor count?
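
A quick way to answer the file descriptor question is to read the open-file limit from a process started as the same user that runs the Hadoop and HBase daemons. The Python sketch below uses the standard resource module (Unix only). The 32768 threshold is an assumption for illustration, not a value taken from this thread, so check the Troubleshooting page for the figure it recommends.

```python
import resource

# Assumed threshold for illustration only; not taken from this thread.
SUGGESTED_MIN_NOFILE = 32768


def nofile_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def is_limit_low(soft, minimum=SUGGESTED_MIN_NOFILE):
    """True when the soft limit is finite and below the given minimum."""
    return soft != resource.RLIM_INFINITY and soft < minimum


if __name__ == "__main__":
    soft, hard = nofile_limits()
    print(f"open files: soft={soft} hard={hard}")
    if is_limit_low(soft):
        print("soft limit looks low for a datanode or regionserver")
```

Limits are per user and per process, so running this from an interactive shell as a different user may not reflect what the daemons actually get.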
On Wed, Sep 16, 2009 at 8:35 AM, wrote:
...
> Our configuration is hadoop 0.19.1 and hbase 0.19.3, both
> hadoop-default/site.xml and hbase-default/site.xml are attached, 15 nodes
> (16 or 8 GB RAM and 1.3 TB disk, Linux kernel 2.6.24-standard, Java version
> "1.6.0_12").
>
As per Jon, please use
First, I would recommend you try upgrading to HBase 0.20.0. There are a
number of significant improvements to performance and stability. Also,
you have plenty of memory, so give more of it to the HBase Regionserver
(especially if you upgrade to 0.20, give HBase 4GB or more) and you will
see s