Re: strange PerformanceEvaluation behaviour

2012-02-15 Thread Oliver Meyn (GBIF)
On 2012-02-15, at 7:32 AM, Stack wrote: On Tue, Feb 14, 2012 at 8:14 AM, Stack st...@duboce.net wrote: 2) With that same randomWrite command line above, I would expect a resulting table with 10 * (1024 * 1024) rows (so 10485760 = roughly 10M rows). Instead what I'm seeing is that the

Re: strange PerformanceEvaluation behaviour

2012-02-15 Thread Oliver Meyn (GBIF)
On 2012-02-15, at 9:09 AM, Oliver Meyn (GBIF) wrote: On 2012-02-15, at 7:32 AM, Stack wrote: On Tue, Feb 14, 2012 at 8:14 AM, Stack st...@duboce.net wrote: 2) With that same randomWrite command line above, I would expect a resulting table with 10 * (1024 * 1024) rows (so 10485760 = roughly

Re: strange PerformanceEvaluation behaviour

2012-02-15 Thread yuzhihong
Oliver: Thanks for digging. Please file Jiras for these issues. On Feb 15, 2012, at 1:53 AM, Oliver Meyn (GBIF) om...@gbif.org wrote: On 2012-02-15, at 9:09 AM, Oliver Meyn (GBIF) wrote: On 2012-02-15, at 7:32 AM, Stack wrote: On Tue, Feb 14, 2012 at 8:14 AM, Stack st...@duboce.net

Re: Improving HBase read performance (based on YCSB)

2012-02-15 Thread Bharath Ravi
Thanks a lot for the help Todd! On 14 February 2012 22:39, Todd Lipcon t...@cloudera.com wrote: Yep, definitely bound on seeks - see the 100% util, and the r/s 100. The bandwidth provided by random IO from a disk is going to be much smaller than the sequential IO you see from hdparm -Todd

Hive hbase handler (0.92.0)

2012-02-15 Thread Kaluskar, Sanjay
I am new to HBase and can't get the Hive handler working. I downloaded the latest Hive (0.8.1), which has a handler for 0.89, and based on the instructions at https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration I recompiled Hive after updating the HBase, ZooKeeper and Guava versions in

Re: multiple partial scans in the row

2012-02-15 Thread NNever
Hi James, I'm new to HBase too. How about this: with a range of orderIds, select the first ID. Step 1: set this ID as startRow, then check out the closest ID (only fetch one). Step 2: with this fetched ID, setStartRow(fetchedID-startTimestamp), setEndRow(fetchedID-endTimestamp). Step 3:

Re: strange PerformanceEvaluation behaviour

2012-02-15 Thread Oliver Meyn (GBIF)
Okie: 10x # of mappers: https://issues.apache.org/jira/browse/HBASE-5401 wrong row count: https://issues.apache.org/jira/browse/HBASE-5402 Oliver On 2012-02-15, at 11:50 AM, yuzhih...@gmail.com wrote: Oliver: Thanks for digging. Please file Jiras for these issues. On Feb 15,

Re: how get() works

2012-02-15 Thread Vamshi Krishna
Thank you for your reply, Doug. That is what I wanted to know. On Tue, Feb 14, 2012 at 9:39 PM, Doug Meil doug.m...@explorysmedical.com wrote: I say basically because inside a Region there are Stores, and for each Store there are StoreFiles. For more info see:

RE: Hive hbase handler (0.92.0)

2012-02-15 Thread jcfolsom
What version of Hadoop are you running? There are many erroneous instructions for how to get this up and running all over the internet. You do not need to rebuild hive in order to get it to work. You only need to do the following: 1. It will only work if HBase is running in distributed or

Re: 0.92 in mvn repository somewhere?

2012-02-15 Thread Stack
On Tue, Feb 14, 2012 at 11:18 PM, Ulrich Staudinger ustaudin...@activequant.com wrote: Hi St.Ack, I don't wanna be a pain in the back, but any progress on this? You are not being a pain. I'm fumbling the mvn publishing, repeatedly. It's a little embarrassing, which is why I'm not talking to

Re: strange PerformanceEvaluation behaviour

2012-02-15 Thread Stack
On Wed, Feb 15, 2012 at 1:53 AM, Oliver Meyn (GBIF) om...@gbif.org wrote: So hacking around reveals that key collision is indeed the problem.  I thought the modulo part of the getRandomRow method was suspect but while removing it improved the behaviour (I got ~8M rows instead of ~6.6M) it
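The ~6.6M figure is close to what uniform sampling with replacement would predict. A rough sketch follows, assuming (the thread does not confirm this) that each of the 10 * 1024 * 1024 writes picks its row key uniformly at random from a keyspace of the same size. `expected_distinct` is a hypothetical helper for illustration, not part of PerformanceEvaluation.

```python
import math


def expected_distinct(draws: int, keyspace: int) -> float:
    """Expected number of distinct keys after `draws` uniform picks
    (with replacement) from `keyspace` possible keys.

    Assumes keyspace > 1.
    """
    # A given key is missed by one draw with probability (1 - 1/keyspace),
    # so it is missed by every draw with probability (1 - 1/keyspace) ** draws.
    # log1p keeps the computation accurate for a large keyspace.
    p_missed = math.exp(draws * math.log1p(-1.0 / keyspace))
    return keyspace * (1.0 - p_missed)


total_rows = 10 * 1024 * 1024  # 10485760 writes, as in the thread
estimate = expected_distinct(total_rows, total_rows)
print(f"expected distinct rows: {estimate:,.0f}")
```

Under these assumptions the estimate lands at roughly 63% of the writes (a fraction of about 1 - 1/e), i.e. around 6.6 million rows, which is consistent with key collisions alone explaining the shortfall.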

Re: 0.92 in mvn repository somewhere?

2012-02-15 Thread N Keywal
Can't you use the option -DskipTests? On Wed, Feb 15, 2012 at 5:27 PM, Stack st...@duboce.net wrote: On Tue, Feb 14, 2012 at 11:18 PM, Ulrich Staudinger ustaudin...@activequant.com wrote: Hi St.Ack, I don't wanna be a pain in the back, but any progress on this? You are not being

Re: 0.92 in mvn repository somewhere?

2012-02-15 Thread Stack
On Wed, Feb 15, 2012 at 8:43 AM, N Keywal nkey...@gmail.com wrote: Can't you use the option -DskipTests? Not on the release plugin, apparently (it's ignored -- I should fix it). St.Ack

Re: 0.92 in mvn repository somewhere?

2012-02-15 Thread Daniel Iancu
I deployed it pretty easily on our internal repo by checking out the tag 0.92.0 (I assume this is the release) and running *mvn deploy -DskipTests=true*. Or you can move tests to a separate module, e.g. hbase-test, and add a dependency on hbase. If all tests in hbase-test pass, then you can release the

Re: LeaseException while extracting data via pig/hbase integration

2012-02-15 Thread Jean-Daniel Cryans
You would have to grep the lease's id, in your first email it was -7220618182832784549. About the time it takes to process each row, I meant client (pig) side not in the RS. J-D On Tue, Feb 14, 2012 at 1:33 PM, Mikael Sitruk mikael.sit...@gmail.com wrote: Please see answer inline Thanks

Problem with mutateRow() using python

2012-02-15 Thread Ezequiel Golub
Hey guys, I'm an HBase and Python newbie, and I'm stuck with the mutateRow() command. I'm using CentOS 5.5, Python 2.6 and HBase 0.90.4-cdh3u3. This is running in VirtualBox; the original image file for the VM is the one provided by Cloudera. I've downloaded the hbase-0.90.4-cdh3u3.tar.gz file from

Re: LeaseException while extracting data via pig/hbase integration

2012-02-15 Thread Mikael Sitruk
OK, I don't have this log anymore, but since the problem was reproduced in another log (which I kept), here is the grep: 2012-02-08 14:13:02,970 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: org.apache.hadoop.hbase.regionserver.LeaseException: lease '-6992210222685255354' does not exist

Pluggable storage?

2012-02-15 Thread Otis Gospodnetic
Hi, Was just reading about SSTable and LevelDB (http://www.igvita.com/2012/02/06/sstable-and-log-structured-storage-leveldb/), which has some HBase references.  Somebody pointed out in comments Riak supports LevelDB as a storage engine option, which made me wonder whether pluggable backend

Re: investigating replacing RDBMS with HBase based solution - spliting daily data inflow?

2012-02-15 Thread Igor Lautar
Hi, I looked into this more and have a better idea of how it could be implemented. As values are looked up by date (and sometimes additionally by source ID), it would make sense to store each value in a separate row. The rowkey would be some kind of time series, like: timestamp_sourceID. However, docs
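A composite key of this shape could be built as below. This is only a sketch: the fixed-width zero padding, the millisecond resolution, and the `make_rowkey` name are assumptions for illustration, not something stated in the thread. Padding the timestamp keeps the lexicographic byte order that HBase uses to sort rows consistent with chronological order.

```python
def make_rowkey(timestamp_ms: int, source_id: str) -> bytes:
    """Build a hypothetical `timestamp_sourceID` row key."""
    if timestamp_ms < 0:
        raise ValueError("timestamp must be non-negative")
    # Zero-pad to 13 digits so that byte-wise ordering matches numeric
    # ordering (13 digits cover millisecond epochs up to the year 2286).
    return f"{timestamp_ms:013d}_{source_id}".encode("utf-8")


keys = [
    make_rowkey(1329264000000, "sensorB"),
    make_rowkey(999, "sensorA"),
    make_rowkey(1329264000000, "sensorA"),
]
# Sorting the raw bytes orders by time first, then by source ID.
for key in sorted(keys):
    print(key)
```

With a layout like this, a date-range lookup maps onto a scan between two timestamp prefixes, while filtering by source ID within the range still requires reading every row in that range.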

Re: LeaseException while extracting data via pig/hbase integration

2012-02-15 Thread Andrew Purtell
Hmm... Does something like the below help? diff --git a/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java index f9627ed..0cee8e3 100644 --- a/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java +++

Scans and Bloom Filter

2012-02-15 Thread Bryan Beaudreault
Hello, We are looking at Bloom Filters and wondering if they are helpful when doing a sequential read (multi-row scan) or only when doing a Get for a single row. It logically makes sense that they would only affect (or have a greater effect on) getting a single row, since a Bloom filter is a way of determining if
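To make the intuition concrete, here is a toy Bloom filter. It is a minimal sketch, not HBase's implementation: the `ToyBloomFilter` class, its sizes, and the SHA-256-based hashing are all assumptions for illustration. The filter can only answer "might this exact key be present?", which is why a point lookup such as a Get for one row is the natural beneficiary.

```python
import hashlib


class ToyBloomFilter:
    """Illustrative Bloom filter; may report false positives, never false negatives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, key: bytes):
        # Derive num_hashes positions by salting the key with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: bytes) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: bytes) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))


bf = ToyBloomFilter()
for row in (b"row-001", b"row-002", b"row-003"):
    bf.add(row)

print(bf.might_contain(b"row-002"))  # True: added keys are never reported absent
```

A scan over a key range has no single key to test against such a filter, so in this simplified picture the filter cannot rule a file out for the range as a whole.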

Re: Hbase schema design help

2012-02-15 Thread Raj N
Thanks Mikael. I will try the first solution. To answer your question, I am evaluating both RDBMS and NoSQL and trying to find the best solution. On Tue, Feb 14, 2012 at 8:03 PM, Mikael Sitruk mikael.sit...@gmail.com wrote: Why don't you prefix the columns with an execution date (reverse order so

region server died after inserting big data

2012-02-15 Thread Tianwei
Hi all, I have two region servers set up, and each machine has around 32G of memory. I started each region server with a 12G JVM limit. Recently I have had one map-reduce job which writes a big chunk of data into an HBase table. The job runs for around 10 hours and the final HBase table will

Re: Hbase schema design help

2012-02-15 Thread T Vinod Gupta
I am really intrigued to know why you are thinking of NoSQL for this use case. Thanks. On Wed, Feb 15, 2012 at 10:39 PM, Raj N objectli...@gmail.com wrote: Thanks Mikael. I will try the first solution. To answer your question, I am evaluating both RDBMS and NoSQL and trying to find the best

Re: LeaseException while extracting data via pig/hbase integration

2012-02-15 Thread Mikael Sitruk
Hi Andy, Not sure what you mean by "Does something like the below help?" The code currently running is pasted below; line numbers are slightly different from yours. It seems very close to the first file (revision a) in your extract. Mikael.S public Result[] next(final long scannerId, int nbRows)