Re: Avro Gateway - "Production Quality"

2010-12-29 Thread Jeff Hammerbacher
Hey Michael, Wow, very cool. I am a bit embarrassed that I didn't get pyhbase in a better state before you got to hacking on it, but super happy to have you join the fun. I've opened a ticket at https://issues.apache.org/jira/browse/AVRO-721 to discuss the inclusion of your Tornado client in Avro

Re: Long GC pause question

2010-12-29 Thread Stack
OK. There is nothing enlightening there. There didn't seem to be a master log in the attachment? I should have asked you to include that. I see that one server thought the filesystem had gone away. Did you pull HDFS out from under it at around this time, perchance? St.Ack On Tue, Dec 28, 2010 at 1

Re: HBase Bulk Load script

2010-12-29 Thread Todd Lipcon
Also, docs patches welcome :) On Tue, Dec 28, 2010 at 1:29 AM, Lars George wrote: > Hi Marc, > > Actually, HFileOutputFormat is what you need to target, the below is > for other file formats and their compression. HFOF has support for > compressing the data as it is written, so either add this t
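[Editor's note: a minimal sketch of Lars's suggestion, assuming the 0.90-era "hfile.compression" property read by HFileOutputFormat; the property value and job wiring here are an assumption, not code from the quoted mail.]

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
  import org.apache.hadoop.mapreduce.Job;

  public class CompressedBulkLoadJob {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      // HFileOutputFormat compresses the HFiles as they are written;
      // "gz" ships with Hadoop, "lzo" needs the native codec on every node.
      conf.set("hfile.compression", "gz");
      Job job = new Job(conf, "bulk-load-with-compressed-hfiles");
      job.setOutputFormatClass(HFileOutputFormat.class);
      // ... mapper, reducer, and input/output paths as usual ...
    }
  }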

Re: Bulk load questions

2010-12-29 Thread Nanheng Wu
I am trying a different approach right now: the MR job I am running uses the identity mapper and a custom comparator to randomize the keys (input keys are sorted). The inserts happen in the reducer, which does very little work. My job is still running very slowly. All my nodes seem to be under util
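[Editor's note: one way to implement the "randomize sorted keys" idea Nanheng describes is a sort comparator that orders map output keys by their hash. This is a sketch under my own assumptions, not the poster's actual code.]

  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.io.WritableComparable;
  import org.apache.hadoop.io.WritableComparator;

  // Orders shuffle keys by hash instead of lexicographically, so sorted
  // input keys reach the reducers in a scattered order.
  public class HashedKeyComparator extends WritableComparator {
    public HashedKeyComparator() { super(Text.class, true); }

    @Override
    @SuppressWarnings("unchecked")
    public int compare(WritableComparable a, WritableComparable b) {
      int ha = a.toString().hashCode(), hb = b.toString().hashCode();
      if (ha != hb) return (ha < hb) ? -1 : 1;
      return a.compareTo(b);  // deterministic tie-break on the raw key
    }
  }
  // wired in with: job.setSortComparatorClass(HashedKeyComparator.class)

Note that if the target table has only a few regions, inserts still funnel to a few region servers no matter how the keys are ordered, which would also explain underutilized nodes.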

RE: What is the fastest way to get a large amount of data into the Hadoop HDFS file system (or Hbase)?

2010-12-29 Thread Hiller, Dean (Contractor)
I wonder if having Linux mount HDFS would help here, so that as people put files into your Linux /hdfs directory, they would actually be writing to HDFS and not local disk ;) (yeah, you still have that one-machine bottleneck as the files come in, unless that can be clustered too somehow). Just google mounting hdfs

Re: What is the fastest way to get a large amount of data into the Hadoop HDFS file system (or Hbase)?

2010-12-29 Thread Ted Dunning
The problem there is that HDFS isn't a first-class file system. That means that the nice and easy ways of mounting will lead to problems (notably NFS, which maintains no state, will require random write capabilities). On Wed, Dec 29, 2010 at 1:16 PM, Hiller, Dean (Contractor) < dean.hil...@broadrid

RE: What is the fastest way to get a large amount of data into the Hadoop HDFS file system (or Hbase)?

2010-12-29 Thread Hiller, Dean (Contractor)
Thanks for the info, missed that at the bottom of that page. Dean -Original Message- From: Fox, Kevin M [mailto:kevin@pnl.gov] Sent: Wednesday, December 29, 2010 2:21 PM To: Hiller, Dean (Contractor); gene...@hadoop.apache.org; Patrick Angeles Cc: user@hbase.apache.org; Brown, David M

upload 1.8 gig file turns into 13 gig (no replication)

2010-12-29 Thread Hiller, Dean (Contractor)
I have dfs.replication set to 1, and have a 1.8 gig file on the HDFS. After my map reduce, which pretty much just puts each row in the file into a row in the database, I end up with 14.8 gigs of usage - 1.8 = 13 gigs used by HBase??? I think this is starting to seem normal maybe now after think

Re: upload 1.8 gig file turns into 13 gig (no replication)

2010-12-29 Thread Stack
You could take a look at the files in HDFS. Use the HFile tool to look at one of the HBase StoreFile/HFiles. See http://people.apache.org/~stack/hbase-0.90.0-candidate-2/docs/ch08s02.html#hfile_tool for how to use it. See how each cell entry includes the row+column+timestamp. Are you using LZO?
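[Editor's note: to make Stack's per-cell overhead point concrete, here is a back-of-the-envelope estimate using the 0.90 KeyValue on-disk layout; the field widths below are hypothetical, chosen only for illustration.]

  public class CellSizeEstimate {
    public static void main(String[] args) {
      // 0.90 KeyValue framing per cell: keylen(4) + vallen(4) + rowlen(2)
      // + famlen(1) + timestamp(8) + keytype(1) = 20 bytes
      int framing = 4 + 4 + 2 + 1 + 8 + 1;
      // hypothetical widths: 16-byte row, 2-byte family, 10-byte qualifier
      int row = 16, family = 2, qualifier = 10, value = 20;
      int cell = framing + row + family + qualifier + value;
      System.out.println("bytes per cell: " + cell);  // 68 for a 20-byte value
    }
  }

Since the row, family, and qualifier are repeated in every cell, small values can inflate several-fold on disk before compression, which is consistent with the kind of blow-up reported above.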

How about to give more flexibility to RowKey (customized comparator and serializer)

2010-12-29 Thread Schubert Zhang
In our application, we want to build an index htable for a core htable, where the key of the index includes multiple columns from the core htable. For example: The core table: RowKey -> column1, column2, column3, column4 Note: The lengths of column1 and column2 are irregular. The index table: RowKey ->
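[Editor's note: until row keys get pluggable comparators/serializers, one common workaround is to pad the variable-length leading columns to a fixed width so the default lexicographic byte comparator sorts the composite key correctly. A minimal sketch; the field widths and names are hypothetical.]

  import org.apache.hadoop.hbase.util.Bytes;

  public class IndexKeys {
    static final int COL1_WIDTH = 16;   // hypothetical fixed widths
    static final int COL2_WIDTH = 16;

    // Builds [col1 padded][col2 padded][8-byte big-endian seq]; zero padding
    // keeps shorter values sorting before longer ones with the same prefix.
    public static byte[] buildIndexKey(String col1, String col2, long seq) {
      byte[] key = new byte[COL1_WIDTH + COL2_WIDTH + Bytes.SIZEOF_LONG];
      byte[] c1 = Bytes.toBytes(col1);
      byte[] c2 = Bytes.toBytes(col2);
      System.arraycopy(c1, 0, key, 0, Math.min(c1.length, COL1_WIDTH));
      System.arraycopy(c2, 0, key, COL1_WIDTH, Math.min(c2.length, COL2_WIDTH));
      Bytes.putLong(key, COL1_WIDTH + COL2_WIDTH, seq); // big-endian long
      return key;
    }
  }

Note that big-endian longs only sort correctly this way for non-negative values.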

HBase Client error

2010-12-29 Thread King JKing
Hi all, I have an exception with the HBase client. Here is my code: HTable client = new HTable(conf, this.table); Put put = new Put(rowid); put.add(cf, columnkey, columnval); client.put(put); client.close(); The data is put fine, but sometimes the client raises this error: 10/12/30 11:48:33 INFO zookeeper.ClientC
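[Editor's note: for reference, a self-contained sketch of the same pattern against the 0.90-era client API; the table and column names are placeholders, not the poster's. As an aside, INFO-level ZooKeeper client log lines are often benign session chatter rather than the failure itself.]

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.util.Bytes;

  public class PutExample {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      HTable table = new HTable(conf, "mytable");   // placeholder table name
      try {
        Put put = new Put(Bytes.toBytes("row-1"));
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes("val"));
        table.put(put);
      } finally {
        table.close();  // flushes the write buffer and releases resources
      }
    }
  }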