Re: namenode failures

2011-10-03 Thread Suraj Varma
It's a little bit of a moving target. See http://www.cloudera.com/blog/2011/02/hadoop-availability/ for a summary of efforts that are in motion. See http://www.cloudera.com/blog/2009/07/hadoop-ha-configuration/ for one approach that uses DRBD. --Suraj On Mon, Oct 3, 2011 at 6:25 PM, Saurabh Seh

Adjusting column value size.

2011-10-03 Thread edward choi
Hi, I have a question regarding performance and column value size. I need to store several million integers per row. ("Several million" is important here.) I was wondering which method would be more beneficial performance-wise. 1) Store each integer in a single column so that when a row is cal

namenode failures

2011-10-03 Thread Saurabh Sehgal
Hi, I am looking into HBase and would like to know if there are any best practices for recovering from namenode failures. I found this while doing some research online: http://wiki.apache.org/hadoop/NameNodeFailover , but I would also like to hear opinions from the HBase community. Is configur

Re: Spaces disappear in HBase?

2011-10-03 Thread Andrew Purtell
Keys and values need to be base64 encoded in all non-binary representations, currently XML and JSON. Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White) > > From: Ben West > To: "user@hbase.apache.o
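A minimal sketch of the base64 round trip described above, assuming a client that builds the XML or JSON payload itself. The helper names encode_cell and decode_cell are hypothetical and not part of any HBase API; the point is only that whitespace in a key survives when the raw bytes are encoded before being placed in a text representation.

```python
import base64


def encode_cell(raw: bytes) -> str:
    """Base64-encode raw key or value bytes for a text (XML/JSON) payload."""
    return base64.b64encode(raw).decode("ascii")


def decode_cell(encoded: str) -> bytes:
    """Recover the original bytes from a base64 string read out of a payload."""
    return base64.b64decode(encoded)


# Hypothetical row key with embedded and trailing spaces.
row_key = b"row 1 "
encoded_key = encode_cell(row_key)

# The round trip preserves every byte, including the spaces.
assert decode_cell(encoded_key) == row_key
```

Comparing the decoded bytes against the original, as the final assertion does, is a quick way to confirm that no whitespace was lost along the way.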

Re: Hbase-Hive integration performance issues

2011-10-03 Thread Matthew Tovbin
Thanks Sandy, I'll try it too! Best regards, Matthew Tovbin =) On Mon, Oct 3, 2011 at 22:36, Sandy Pratt wrote: > I've been working on this issue lately. I am beginning to deploy a > modified version of the stock HBase serde to my own clusters. For one > thing, it contains the code to pu

RE: Hbase-Hive integration performance issues

2011-10-03 Thread Sandy Pratt
I've been working on this issue lately. I am beginning to deploy a modified version of the stock HBase serde to my own clusters. For one thing, it contains the code to push down scan ranges to HBase (see jira), and I've also adapted it to read my single-cell protobuf records via reflection. O

Re: question about writing to columns with lots of versions in map task

2011-10-03 Thread Jean-Daniel Cryans
I would advise against setting the timestamps yourself and instead reduce in order to prune the versions you don't need to insert in HBase. J-D On Sat, Oct 1, 2011 at 11:05 AM, Christopher Dorner wrote: > Hi again, > > i think i solved my issue. > > I simply use the byte offset of the row curren

Re: Backing-up HBASE

2011-10-03 Thread Vinod Gupta Tankala
Yeah, I also asked the same question a few days ago and started exploring this blog. As you rightly said, I got similar input from other people as well. There is no right/good way. Most ways have limitations, so you have to live with what you have. What I also learned in the process is using hdf

Re: Backing-up HBASE

2011-10-03 Thread Jean-Daniel Cryans
I saw one broken link (the Mozilla backup tool), but the rest works and the explanations are there. Currently there's not a single perfect way that's fast yet secure to use, so it would be very difficult to know which one to recommend without first knowing which tradeoffs you're willing to make.

Re: Best way to write to multiple tables in one map-only job

2011-10-03 Thread Jean-Daniel Cryans
Options a) and b) are the same since MultiTableOutputFormat internally uses multiple HTables. See for yourself: https://github.com/apache/hbase/blob/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/MultiTableOutputFormat.java Also, you can set the write buffer by setting hbase.client.write.bu

Re: HBase put.heapSize()

2011-10-03 Thread Jean-Daniel Cryans
Have you looked at the code? You should also take a look at TestHeapSize where we compare the estimated size versus the heapSize and AFAIK it passes. J-D On Mon, Oct 3, 2011 at 1:18 AM, lakshmi ponnapalli wrote: > Hi, > > I noticed that put.heapSize() is bloating data by 10x of the original. > >

Re: Strange behavior on scan while writing

2011-10-03 Thread Lars
Might be related to HBASE-4335. Placido Revilla wrote: >Sorry, resent because I messed up the previous mail. > >Hi, > >we are experiencing a strange behavior in some tests we are currently >performing. What we are seeing is that scans on a table that is being >written to at the same time someti

HBase put.heapSize()

2011-10-03 Thread lakshmi ponnapalli
Hi, I noticed that put.heapSize() reports about 10x the size of the original data. If the bytes in the rowkey, family, column name and column value of my data in a Put object add up to 1 MB, put.heapSize() for the same object is 10 MB. Wondering which component exactly is causing this overhead and whet

Re: Strange behavior on scan while writing

2011-10-03 Thread Placido Revilla
Sorry, resent because I messed up the previous mail. Hi, we are seeing strange behavior in some tests we are currently performing. Scans on a table that is being written to at the same time sometimes end prematurely, with no error. This seems to be heavily depen

Protecting NN & JT UI with password

2011-10-03 Thread Shahnawaz Saifi
Hi, I would like to know how to password-protect the Hadoop web UIs running on ports 50030 and 50070, as well as the HMaster UI on port 60010. -- Thanks, Shah

RE: storefileIndexsize

2011-10-03 Thread Steinmaurer Thomas
Hi! Thanks for your comments and the link. We will have a mix of bulk processing via Map/Reduce and random reads through the RowKey via a Thrift/Java API client. Thanks, Thomas -Original Message- From: jdcry...@gmail.com [mailto:jdcry...@gmail.com] On Behalf Of Jean-Daniel Cryans Sent:

Re: Percentile calculation using HBase Coprocessors

2011-10-03 Thread Mayuresh
Has anyone done this kind of thing before? On Wed, Sep 28, 2011 at 5:44 PM, Mayuresh wrote: > Hi, > > I am trying to find an algorithm that fits the coprocessor > architecture for computing pth-percentile values from the values > distributed among the HBase regions. Has anyone worked on some