I'm guessing HBASE-4222 is not in that version of CDH HBase?
Best regards,
- Andy
Problems worthy of attack prove their worth by hitting back. - Piet Hein (via
Tom White)
- Original Message -
> From: Ted Yu
> To: user@hbase.apache.org
> Cc:
> Sent: Tuesday, February 7, 2012
Thanks Harsh
I had no problem with hbase-0.90.2 and hadoop-0.20.
Before moving to hbase-0.92.0 I just want to confirm that
it works perfectly.
On Tue, Feb 7, 2012 at 1:06 PM, Harsh J wrote:
> Saumita,
>
> You do not necessarily need to change Hadoop version with HBase
> upgrades (Or at least not for 0.9
Hi All,
We are happy to announce the release of Kundera 2.0.5.
Kundera is a JPA 2.0 based Object-Datastore Mapping Library for NoSQL
Datastores. The idea behind Kundera is to make working with NoSQL Databases
drop-dead simple and fun. It currently supports Cassandra, HBase,
MongoDB and relational d
Saumita,
You do not necessarily need to change Hadoop version with HBase
upgrades (or at least not for 0.92 yet).
0.92 will work just fine with CDH3 (which carries 0.20-append). Are
you facing any specific issues?
The guidelines at http://hbase.apache.org/book.html#hadoop would still
apply for 0
Sorry for the last line
Is there anyone who is using hbase-0.92.0 with CDH3? Should I continue
with that?
On Tue, Feb 7, 2012 at 11:37 AM, Saumitra Chowdhury <
saumi...@smartitengineering.com> wrote:
> Dear all,
>
> We are going to setup our hbase cluster with hadoop . We were in test
> fo
On Mon, Feb 6, 2012 at 4:47 PM, Bryan Keller wrote:
> I increased the max region file size to 4gb so I should have fewer than 200
> regions per node now, more like 25. With 2 column families that will be 50
> memstores per node. 5.6gb would then flush files of 112mb. Still not close to
> the me
I increased the max region file size to 4gb so I should have fewer than 200
regions per node now, more like 25. With 2 column families that will be 50
memstores per node. 5.6gb would then flush files of 112mb. Still not close to
the memstore limit but shouldn't I be much better off than before?
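The arithmetic in that message can be checked in a few lines. This is only a sketch of the poster's numbers (25 regions of 2 families per node, roughly 5.6 GB of heap devoted to memstores), not HBase defaults:

```java
// Sketch of the per-node memstore arithmetic from this thread.
// All figures are the poster's, not universal defaults.
public class MemstoreMath {
    public static void main(String[] args) {
        int regionsPerNode = 25;   // after raising max region filesize to 4 GB
        int columnFamilies = 2;
        int memstores = regionsPerNode * columnFamilies;  // 50 memstores per node
        double globalMemstoreMb = 5600;                   // ~5.6 GB of heap for memstores
        double flushMb = globalMemstoreMb / memstores;    // size each memstore reaches
        System.out.println(memstores + " memstores, ~" + Math.round(flushMb)
                + " MB each when the global limit forces a flush");
    }
}
```

So each memstore only fills to about 112 MB before the global limit triggers flushes, well short of a 512 MB per-memstore flush size.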
Good but...
Keep in mind that if you just increase max filesize and memstore size
without changing anything else then you'll be in the same situation
except with 16GB it'll take just a bit more time to get there.
Here's the math:
200 regions of 2 families means 400 memstores to fill. Assuming a
Yes, insert pattern is random, and yes, the compactions are going through the
roof. Thanks for pointing me in that direction. I am going to try increasing
the region max filesize to 4gb (it was set to 512mb) and the memstore flush
size to 512mb (it was 128mb). I'm also going to increase the hea
Ok this helps, we're still missing details about your insert pattern, but I
bet it's pretty random considering what's happening to your cluster.
I'm guessing you didn't set up metrics else you would have told us
that the compaction queues are through the roof during the import, but
at this point I'm pr
This is happening during heavy update. I have a "wide" table with around 4
million rows that have already been inserted. I am adding billions of columns
to the rows. Each row can have 20+k columns.
I perform the updates in batch, i.e. I am using the HTable.put(List) API.
The batch size is 1000
So I restart one of the data nodes and everything continues to work just fine even though the local
one is no longer valid. Additionally I can restart n-1 nodes without any problem and HBase
continues to work. However, as soon as I restart the last data node, RSs start dying. hbck and fsck
say
This is the normal behavior of the sync API (when the first DN in the
pipeline fails, the whole op is failed); correct me if I am wrong.
The rule here, I think, was that you do not want RSes to switch over to
writing to a remote DN because the first one in the pipeline (always the
local one) failed. He
What would "hadoop fsck /" report for that type of problem if there really were no nodes with that data? The
worst I've seen is: Target Replicas is 4 but found 3 replica(s).
~Jeff
On 2/6/2012 12:45 PM, Ted Yu wrote:
In your case Error Recovery wasn't successful because of:
All datanodes 10.49.29.92:500
I've been able to reproduce this on multiple clusters. I'm basically doing a rolling restart of
data nodes with 1 every 5-10+ minutes. However the region servers will just die. "hadoop fsck /"
shows it is healthy, the web interface says all the data nodes are up, and region servers logs seem
q
The number of regions is the first thing to check, then it's about the
actual number of blocks opened. Is the issue happening during a heavy
insert? In this case I guess you could end up with hundreds of opened
files if the compactions are piling up. Setting a bigger memstore
flush size would defin
In your case Error Recovery wasn't successful because of:
All datanodes 10.49.29.92:50010 are bad. Aborting...
On Mon, Feb 6, 2012 at 10:28 AM, Jeff Whiting wrote:
> I was increasing the storage on some of my data nodes and thus had to do a
> restart of the data node. I use cdh3u2 and ran "/etc
I am trying to resolve an issue with my cluster when I am loading a bunch of
data into HBase. I am reaching the "xciever" limit on the data nodes. Currently
I have this set to 4096. The data node is logging "xceiverCount 4097 exceeds
the limit of concurrent xcievers 4096". The regionservers even
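For reference, this ceiling is set in hdfs-site.xml on each datanode, and the property name really is spelled "xcievers" in 0.20-era Hadoop. The value below is the poster's current setting, shown only as an example; raising it requires a datanode restart:

```xml
<!-- hdfs-site.xml on each datanode; restart the datanode after changing -->
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>4096</value>
</property>
```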
If you didn't configure anything more than the heap, PE will by
default create a table with 1 region and a low (albeit default)
memstore size. This means it's spending its time waiting on splits and
it's recompacting your data all the time which wastes a lot of iops.
You didn't tell us which vers
First, check how you configured the region servers, i.e. the host
names listed in the regionservers file. Then check which host name is not
properly resolved in /etc/hosts or DNS.
On Mon, Feb 6, 2012 at 10:30 AM, devrant wrote:
>
> Thanks for the res
Thanks for the response Jimmy. Do you know if this is an error on the server side
(/etc/hosts etc.) or in the config files for HBase (i.e. conf/hbase-site.xml etc.)?
Jimmy Xiang wrote:
>
> It may not be null actually. It is most likely because the hostname
> cannot
> be resolved to an IP address.
>
> Than
I was increasing the storage on some of my data nodes and thus had to do a restart of the data
node. I use cdh3u2 and ran "/etc/init.d/hadoop-0.20-datanode restart" (I don't think this is a cdh
problem). Unfortunately doing the restart caused region servers to go offline. Is this expected
beha
It may not be null actually. It is most likely because the hostname cannot
be resolved to an IP address.
Thanks,
Jimmy
On Mon, Feb 6, 2012 at 10:10 AM, devrant wrote:
>
> I received this error below...does anyone know why the hostname is "null"?
>
> HTTP ERROR 500
>
> Problem accessing /master
I received this error below...does anyone know why the hostname is "null"?
HTTP ERROR 500
Problem accessing /master-status. Reason:
hostname can't be null
Caused by:
java.lang.IllegalArgumentException: hostname can't be null
at java.net.InetSocketAddress.<init>(InetSocketAddress.java:12
On Mon, Feb 6, 2012 at 8:58 AM, Jon Bender wrote:
> When you say it'll sort regions for you, does that mean I'll need to
> identify the regions before dividing up the maps? Or just deal with the
> fact that multiple maps might read from the same regionserver?
>
If you do a multiget on N rows, int
You can try turning on verbose garbage collection logs and see if the slow
times correspond to a GC pause. Cloudera has a series of blog posts regarding
GC pauses in HBase and how to avoid
them: http://www.cloudera.com/blog/2011/02/avoiding-full-gcs-in-hbase-with-memstore-local-allocation-buffer
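As a hedged example, verbose GC logging for the region servers is typically enabled through hbase-env.sh; the flags below are standard HotSpot options of that era, and the log path is an assumption to adjust for your install:

```
# hbase-env.sh: turn on verbose GC logging for region servers
# (log path is an example; point it somewhere writable on each node)
export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/var/log/hbase/gc-hbase.log"
```

With timestamps in the log you can line up long pauses against the slow request times you're seeing.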
Thanks for the responses!
>What percentage of total data is the 300k new rows?
A constantly shrinking percentage--we may retain upwards of 5 years of data
here, so running against the full table will get very expensive going
forward. I think the second approach sounds best.
>If you have the lis
On Sun, Feb 5, 2012 at 8:56 PM, Jon Bender wrote:
> The two alternatives I am exploring are
>
> 1. Running a TableInputFormat MR job that filters for data added in the
> past day (Scan on the internal timestamp range of the cells)
You'll touch all your data when you do this.
What percentage
1. TableInputFormat splits the rows by regionLocation; with multiGet you should
do it yourself.
2. Get is a Scan; multiGet means multiScan? (not sure)
3. With Scan, you can use the batch & caching features.
Gill
-- Original --
From: "Jon Bender";
Date: Mon, Feb
It sounds to me that you are better off using Hive. HBase is suitable for
real time access to specific records. If you want to do batch processing
(Map Reduce) on your data, like you said yourself, then Hive removes all
the HBase overhead and gives you a powerful query language to search
through yo
Hi,
I'm trying to optimize an HBase cluster (on HDFS) with the randomWrite test. I
have 7 nodes: 1 zookeeper/name/hbase-master/jobtracker and 6
region/data/tasktrackers, each with 1 disk, 16G memory, 2 x 4 cores. I know
that I really should have more disks but for the time being I'm trying to do
Not easy to visualize...
Assuming your access path to the data is based on students, then you would
serialize your college data as a column in the student's table.
You need to forget your relational way of thinking.
You need to think not just in terms of data, but how you intend to use the
d
Hi all
Recently I upgraded my cluster from HBase 0.90.1 to 0.90.4 (using
Cloudera, from cdh3u0 to cdh3u2).
Everything was OK till I ran the pig extract on the new cluster; on the old
cluster everything worked well.
Now each time I run the extract in conjunction with other work performed on
the clus