Re: Key formats and very low cardinality leading fields

2012-09-04 Thread Eric Czech
Here's what I don't get -- how is this different than if I allocated a different table for each separate value of the leading field? If I did that and used the second field as the leading prefix instead, I know no one would argue that it's a key that won't distribute well. I don't plan on doing t

About Reloading Coprocessors

2012-09-04 Thread Aaron Wong
Hello all, I have an endpoint coprocessor running in HBase that I would like to modify. I previously loaded this coprocessor via the shell, without having to restart HBase. However, after some experimentation I have not found any way to reload a new version of the coprocessor without restarting

Re: Key formats and very low cardinality leading fields

2012-09-04 Thread Michael Segel
Uhm... This isn't very good. In terms of inserting, you will hit a single region or a small subset of regions. This may not be that bad if you have enough data and the rows are not all inserted into the same region. Since you're hitting an index to pull rows one at a time, you could do this... if you

Fwd: Extremely slow when loading small amount of data from HBase

2012-09-04 Thread 某因幡
+HBase users. -- Forwarded message -- From: Dmitriy Ryaboy Date: 2012/9/4 Subject: Re: Extremely slow when loading small amount of data from HBase To: "u...@pig.apache.org" I think the hbase folks recommend something like 40 regions per node per table, but I might be misrememb

Re: Is there a way to replicate root and meta table in HBase?

2012-09-04 Thread Gen Liu
On 9/4/12 3:07 PM, "Stack" wrote: >On Tue, Sep 4, 2012 at 2:52 PM, Gen Liu wrote: >> We are running into a case that if the region server that serves meta >>table is down, all request will timeouts because region lookup is not >>available. > >Only requests to .META. fail (and most of the time,

Re: Is there a way to replicate root and meta table in HBase?

2012-09-04 Thread Stack
On Tue, Sep 4, 2012 at 2:52 PM, Gen Liu wrote: > We are running into a case that if the region server that serves meta table > is down, all request will timeouts because region lookup is not available. Only requests to .META. fail (and most of the time, .META. info is cached so should be relativ

Re: Is there a way to replicate root and meta table in HBase?

2012-09-04 Thread Stas Maksimov
Just today I saw this mentioned in the docs. They said they deliberately don't replicate those, otherwise "it gets very messy". Stas On Tue, Sep 4, 2012 at 10:52 PM, Gen Liu wrote: > Hi, > > We are running into a case that if the region server that serves meta > table is down, all request will

Is there a way to replicate root and meta table in HBase?

2012-09-04 Thread Gen Liu
Hi, We are running into a case where, if the region server that serves the meta table is down, all requests time out because region lookup is not available. At this time, the master is also not able to update the meta table. It seems that the regions that serve root and meta are a single point of failure i

Re: hbase hbck -fixMeta error:RejectException

2012-09-04 Thread Ted Yu
Looks like you need the fix from HBASE-6018 On Mon, Sep 3, 2012 at 7:37 PM, abloz...@gmail.com wrote: > [zhouhh@h185 ~]$ hbase hbck -fixMeta > ... > Number of Tables: 1731 > Number of live region servers: 4 > Number of dead region servers: 0 > Master: h185,61000,1346659732168 > Number of backup m

Re: hbase hbck -fixMeta error:RejectException

2012-09-04 Thread Jonathan Hsieh
Is there any more stack trace or exception information? Also, what version is this? Jon. On Mon, Sep 3, 2012 at 7:37 PM, abloz...@gmail.com wrote: > [zhouhh@h185 ~]$ hbase hbck -fixMeta > ... > Number of Tables: 1731 > Number of live region servers: 4 > Number of dead region servers: 0 > Master: h185,6100

Re: Fixing badly distributed table manually.

2012-09-04 Thread David Koch
Hello, Thank you for your replies. We are using CDH4 HBase 0.92. Good call on the web interface. The port is blocked so I never really got a chance to test it. As far as manual re-balancing is concerned I will check the book. /David On Tue, Sep 4, 2012 at 5:34 PM, Guillaume Gardey < guillaume.g

Re: batch update question

2012-09-04 Thread Christian Schäfer
Hi Lin, check out the slides about high update workloads and HBaseHUT at: http://blog.sematext.com/?s=hbasehut Maybe you could ask Alex Baranau about the details here on the list so they are shared. Regards, Chris From: Lin Ma To: user@hbase.apache.org; syrious3...@yah

Re: batch update question

2012-09-04 Thread Stack
On Sun, Sep 2, 2012 at 2:13 AM, Lin Ma wrote: > Hello guys, > > I am reading the book "HBase, the definitive guide", at the beginning of > chapter 3, it is mentioned in order to reduce performance impact for > clients to update the same row (lock contention issues for automatic > write), batch upd

Re: Reading in parallel from table's regions in MapReduce

2012-09-04 Thread Stack
On Tue, Sep 4, 2012 at 8:17 AM, Ioakim Perros wrote: > Hello, > > I would be grateful if someone could shed a light to the following: > > Each M/R map task is reading data from a separate region of a table. > From the jobtracker 's GUI, at the map completion graph, I notice that > although data re

Re: connection error to remote hbase node

2012-09-04 Thread Stack
On Sun, Sep 2, 2012 at 6:38 AM, Richard Tang wrote: > Hi, I have a connection problem on setting up hbase on remote node. The > ``hbase`` instance is on a machine ``nodeA``. when I am trying to use hbase > on ``nodeA`` from another machine (say ``nodeB``), it complains > >> Session 0x0 for server

Re: Key formats and very low cardinality leading fields

2012-09-04 Thread Eric Czech
*How does the data flow in to the system? One source at a time?* Generally, it will be one source at a time where these rows are index entries built from MapReduce jobs *The second field. Is it sequential?* No, the index writes from the MapReduce jobs should dump some relatively small number of ro

Re: example - hbase-site.xml - fully distributed

2012-09-04 Thread Elliott Clark
There are several different ways. Running jps as the user that HBase should start as will show you what's running. You should be able to see HMaster or HRegionServer running. If things are running well, the master should have a status HTTP server up. Going to that should tell you that things are

Re: example - hbase-site.xml - fully distributed

2012-09-04 Thread Jean-Marc Spaggiari
Here is mine. But I can't guarantee that it's correct... hbase.rootdir hdfs://node3:9000/hbase The directory shared by RegionServers. hbase.cluster.distributed true The mode the cluster will be in. Possible values are false: standalone and pseudo-distr

example - hbase-site.xml - fully distributed

2012-09-04 Thread Igor Muzetti
hello! would like an example of the file *hbase-site.xml* configured for a fully distributed. carefully. -- [image: terraLab logo] *Igor Muzetti Pereira * TerraLAB - Earth System Modelling and Simulation Laboratory Computer Science Department, UFOP - Federal University of Ouro Preto *Campus Univ

Re: example - hbase-site.xml - fully distributed

2012-09-04 Thread Igor Muzetti
How do I know that HBase is running correctly? 2012/9/4 Igor Muzetti > hello! would like an example of the file *hbase-site.xml* configured for > a fully distributed. > carefully. > > -- > [image: terraLab logo] *Igor Muzetti Pereira * > TerraLAB - Earth System Modelling and Simulation Labo

Re: Key formats and very low cardinality leading fields

2012-09-04 Thread Michael Segel
Eric, So here's the larger question... How does the data flow in to the system? One source at a time? The second field. Is it sequential? If not sequential, is it going to be some sort of incremental larger than a previous value? (Are you always inserting to the left side of the queue? How a

Re: Key formats and very low cardinality leading fields

2012-09-04 Thread Eric Czech
Longer term .. what's really going to happen is more like I'll have a first field value of 1, 2, and maybe 3. I won't know 4 - 10 for a while and the *second *value after each initial value will be, although highly unique, relatively exclusive for a given first value. This means that even if I di

Re: Key formats and very low cardinality leading fields

2012-09-04 Thread Michael Segel
I think you have to understand what happens as a table splits. If you have a composite key where the first field has the value between 0-9 and you pre-split your table, you will have all of your 1's going to the single region until it splits. But both splits will start on the same node until th
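The point about a pre-split table and a 0-9 leading field can be illustrated with a small simulation. This is only a sketch, not HBase code: the `region_for` helper, the `|` separator, and the split points are hypothetical names chosen for illustration. It assumes regions cover start-inclusive, end-exclusive key ranges compared as raw bytes, and shows that every key with leading field 1 lands in the same region until that region itself is split.

```python
from bisect import bisect_right
from collections import Counter


def region_for(row_key: bytes, split_points: list) -> int:
    """Return the index of the region whose [start, end) range holds row_key.

    split_points must be sorted; region 0 is everything below the first
    split point, region i starts at split_points[i - 1].
    """
    return bisect_right(split_points, row_key)


# Hypothetical pre-split: one region boundary per leading digit 1..9,
# which gives ten regions for leading values 0..9.
split_points = [str(d).encode() for d in range(1, 10)]

# Composite keys: the leading field is always "1", the second field varies.
keys = [f"1|{i:06d}".encode() for i in range(1000)]

# Count how many writes each region receives.
hits = Counter(region_for(k, split_points) for k in keys)
print(hits)
```

All 1000 writes fall into one region here, which is the hotspot described in the message above. Spreading the load would require additional split points inside the `1|...` range, something this sketch does not model.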

Re: Key formats and very low cardinality leading fields

2012-09-04 Thread Eric Czech
You're the man Jean-Marc .. info is much appreciated. On Tue, Sep 4, 2012 at 1:22 PM, Jean-Marc Spaggiari wrote: > Hi Eric, > > Yes you can split an existing region. You can do that easily with the > web interface. After the split, at some point, one of the 2 regions > will be moved to another

Re: Key formats and very low cardinality leading fields

2012-09-04 Thread Jean-Marc Spaggiari
Hi Eric, Yes you can split an existing region. You can do that easily with the web interface. After the split, at some point, one of the 2 regions will be moved to another server to balance the load. You can also move it manually. JM 2012/9/4, Eric Czech : > Thanks again, both of you. > > I'll

Re: Key formats and very low cardinality leading fields

2012-09-04 Thread Eric Czech
Thanks again, both of you. I'll look at pre splitting the regions so that there isn't so much initial contention. The issue I'll have though is that I won't know all the prefix values at first and will have to be able to add them later. Is it possible to split regions on an existing table? Or i

Re: connection error to remote hbase node

2012-09-04 Thread Richard Tang
Thanks, Harsh J, but I have checked the /etc/ dir and HBase's root directory, and there is no zoo.cfg file present in either place... I am aware that the HBase client will first check ZooKeeper before contacting HBase itself (for the -ROOT- table and .META. table ...). Is there any way to test if zookeeper can be

Re: Reading in parallel from table's regions in MapReduce

2012-09-04 Thread Ioakim Perros
Jerry thank you very much for the links. Regards, Ioakim On 09/04/2012 08:05 PM, Jerry Lam wrote: Hi Loakim: Here a list of links I would suggest you to read (I know it is a lot to read): HBase Related: - http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableMapReduceUtil.html

Re: Reading in parallel from table's regions in MapReduce

2012-09-04 Thread Jerry Lam
Hi Ioakim: Here is a list of links I would suggest you read (I know it is a lot to read): HBase Related: - http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableMapReduceUtil.html - http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/package-summary.html#package_desc

Re: Reading in parallel from table's regions in MapReduce

2012-09-04 Thread Ioakim Perros
I understood that locking is at a row level (and that my initial hypothesis is hopefully false), but I was trying to clarify if there is some job configuration I am missing. Perhaps you're right and I am misinterpreting the jobtracker's map completion graph. Thanks for answering. On 09/04/2

Re: Reading in parallel from table's regions in MapReduce

2012-09-04 Thread Michael Segel
I think the issue is that you are misinterpreting what you are seeing and what Doug was trying to tell you... The short simple answer is that you're getting one split per region. Each split is assigned to a specific mapper task and that task will sequentially walk through the table finding the
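The "one split per region" behaviour described above can be sketched in a few lines. This is an illustration under stated assumptions, not the actual input format implementation: `splits_for_regions` is a hypothetical helper, and the region start keys are made up. It assumes each region is identified by its start key and that an empty byte string marks an open-ended boundary.

```python
def splits_for_regions(region_start_keys):
    """Build one (start_row, end_row) scan range per region.

    An empty bytes value means "open-ended": the first region has no lower
    bound and the last region has no upper bound.
    """
    starts = sorted(region_start_keys)
    # Each region ends where the next one starts; the last one is open-ended.
    ends = starts[1:] + [b""]
    return list(zip(starts, ends))


# Hypothetical table with three regions starting at "", "g" and "p".
regions = [b"", b"g", b"p"]
splits = splits_for_regions(regions)

# One map task per split; each task scans only its own key range in order.
for task_id, (start, end) in enumerate(splits):
    print(task_id, start, end)
```

The number of map tasks therefore follows the number of regions, and each task reads its own range sequentially while the tasks themselves can run in parallel.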

Re: Reading in parallel from table's regions in MapReduce

2012-09-04 Thread Ioakim Perros
Thank you very much for your response and for the excellent reference. The thing is that I am running jobs on a distributed environment and beyond the TableMapReduceUtil settings, I have just set the scan's caching to the number of rows I expect to retrieve at each map task, and the scan's

Re: Reading in parallel from table's regions in MapReduce

2012-09-04 Thread Jerry Lam
Hi Ioakim: Sorry, your hypothesis doesn't make sense. I would suggest you read "Learning HBase Internals" by Lars Hofhansl at http://www.slideshare.net/cloudera/3-learning-h-base-internals-lars-hofhansl-salesforce-final to understand how HBase locking works. Regarding the issue you are

Re: Reading in parallel from table's regions in MapReduce

2012-09-04 Thread Ioakim Perros
Thank you very much for responding, but this was not exactly what I was looking for. I have understood the splitting process when M/R jobs read from HBase tables (that each M/R task reads from exactly one region). What I would like to clarify if possible is, if there is indeed some "locking"

Re: Fixing badly distributed table manually.

2012-09-04 Thread Guillaume Gardey
Hello, > a) What is the easiest way to get an overview of how a table is distributed > across regions of a cluster? I guess I could search .META. but I haven't > figured out how to use filters from shell. > b) What constitutes a "badly distributed" table and how can I re-balance > manually? > c) I

Re: Reading in parallel from table's regions in MapReduce

2012-09-04 Thread Doug Meil
Hi there- Yes, there is an input split for each region of the source table of a MR job. There is a blurb on that in the RefGuide... http://hbase.apache.org/book.html#splitter On 9/4/12 11:17 AM, "Ioakim Perros" wrote: >Hello, > >I would be grateful if someone could shed a light to the fo

Help with troubleshooting the HBase replication setup

2012-09-04 Thread Stas Maksimov
Hi there, I'm trying to set up replication in master-slave mode between two clusters, and once this works, set up master-master replication. I am following the replication FAQ step by step, but I can't make it work and have no idea how to troubleshoot. There seems to be only one way to find out whe

Re: batch update question

2012-09-04 Thread Lin Ma
Hi Christian, I read through the link you referred. It seems HBaseHUT is exactly the solution I am looking for. Before making the technology choice decision, I want to learn a bit more about its internal design and the general idea of HBaseHUT of how throughput of write is improved. From the discu

RE: Fixing badly distributed table manually.

2012-09-04 Thread Pablo Musa
> a) What is the easiest way to get an overview of how a table is distributed > across regions of a cluster? I usually see by the web interface (host:60010). Click on a table and scroll down. There will be a region count of this table across the cluster. > b) What constitutes a "badly distribut

Reading in parallel from table's regions in MapReduce

2012-09-04 Thread Ioakim Perros
Hello, I would be grateful if someone could shed some light on the following: Each M/R map task is reading data from a separate region of a table. From the jobtracker's GUI, at the map completion graph, I notice that although the data read by the mappers are different, they read data sequentially - li

Re: Fixing badly distributed table manually.

2012-09-04 Thread Ted Yu
Can you tell us the version of HBase you're using? The following feature (per table region balancing) isn't in 0.92.x: https://issues.apache.org/jira/browse/HBASE-3373 On table.jsp page, you should see region count per region server. Cheers On Tue, Sep 4, 2012 at 7:56 AM, David Koch wrote: >

Fixing badly distributed table manually.

2012-09-04 Thread David Koch
Hello, A couple of questions regarding balancing of a table's data in HBase. a) What is the easiest way to get an overview of how a table is distributed across regions of a cluster? I guess I could search .META. but I haven't figured out how to use filters from shell. b) What constitutes a "badly