BytesWritable get() returns more bytes than what's stored

2009-04-08 Thread bzheng
I tried to store a protocol buffer as a BytesWritable in a sequence file (Text, BytesWritable). It's stored using SequenceFile.Writer(new Text(key), new BytesWritable(protobuf.convertToBytes())). When reading the values from the key/value pairs using value.get(), it returns more than what's stored.
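The usual cause of this symptom is that BytesWritable keeps its data in a reusable backing buffer, whose capacity can exceed the logical length; reading the whole buffer yields the payload plus stale trailing bytes. The fix is to copy only the first `getLength()` bytes before parsing. A minimal pure-Java sketch of the trim pattern (the buffer/length pair below stands in for BytesWritable's getBytes()/getLength(); no Hadoop classes are used):

```java
import java.util.Arrays;

public class TrimDemo {
    public static void main(String[] args) {
        // A backing buffer grown past the logical payload, the way a
        // reused Writable's internal buffer can be.
        byte[] buffer = new byte[16];          // capacity 16
        byte[] payload = {1, 2, 3, 4, 5};      // logical length 5
        System.arraycopy(payload, 0, buffer, 0, payload.length);
        int length = payload.length;

        // Wrong: parsing the whole buffer sees trailing garbage bytes.
        System.out.println(buffer.length);     // 16

        // Right: copy only the first `length` bytes before parsing.
        byte[] trimmed = Arrays.copyOf(buffer, length);
        System.out.println(trimmed.length);    // 5
        System.out.println(Arrays.equals(trimmed, payload)); // true
    }
}
```

With the real class, the same idea is `Arrays.copyOf(value.getBytes(), value.getLength())`: hand the trimmed array, not the raw buffer, to the protobuf parser.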

Re: BytesWritable get() returns more bytes than what's stored

2009-04-08 Thread bzheng
this in the protobuffer API - perhaps you can use a ByteArrayInputStream here to your advantage. Hope that helps -Todd On Wed, Apr 8, 2009 at 4:59 PM, bzheng bing.zh...@gmail.com wrote: I tried to store protocolbuffer as BytesWritable in a sequence file Text, BytesWritable. It's stored using

Re: OutOfMemory error processing large amounts of gz files

2009-03-02 Thread bzheng
once this error happens. We are currently using 0.18.3 and are holding off changing to a different version because we don't want to lose the existing files on HDFS. bzheng wrote: I have about 24k gz files (about 550GB total) on HDFS and have a really simple Java program to convert them

Re: OutOfMemory error processing large amounts of gz files

2009-02-26 Thread bzheng
Arun C Murthy-2 wrote: On Feb 24, 2009, at 4:03 PM, bzheng wrote: 2009-02-23 14:27:50,902 INFO org.apache.hadoop.mapred.TaskTracker: java.lang.OutOfMemoryError: Java heap space That tells you that your TaskTracker is running out of memory, not your reduce tasks. I think you

Re: OutOfMemory error processing large amounts of gz files

2009-02-25 Thread bzheng
. (This is one of those fun JVM situations where having more heap space may make OOMEs more likely: less heap memory pressure leaves more un-GCd or un-finalized heap objects around, each of which is holding a bit of native memory.) - Gordon @ IA bzheng wrote: I have about 24k gz files (about
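The point above is that each GZIP stream holds native zlib state that is only released by close() or, much later, by finalization; with 24k files, relying on the GC to run finalizers lets native memory pile up even while the Java heap looks fine. A sketch of the close-eagerly pattern with java.util.zip (try-with-resources is newer than 0.18-era code and is used here for brevity; on old JVMs the equivalent is an explicit close() in a finally block):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipCloseDemo {
    public static void main(String[] args) throws Exception {
        byte[] original = "some log line\n".getBytes("UTF-8");

        // Compress; close() releases the native zlib deflater state
        // immediately instead of waiting for the finalizer.
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(compressed)) {
            gz.write(original);
        }

        // Decompress, again closing the stream (and its native
        // inflater) as soon as this file is done.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(
                new ByteArrayInputStream(compressed.toByteArray()))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = gz.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        System.out.println(new String(out.toByteArray(), "UTF-8"));
    }
}
```

When looping over many files, doing this per file keeps the native footprint bounded regardless of heap size.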

Re: distcp port for 0.17.2

2008-10-23 Thread bzheng
On Wed, Oct 22, 2008 at 3:47 PM, bzheng [EMAIL PROTECTED] wrote: Thanks. The fs.default.name is file:/// and dfs.http.address is 0.0.0.0:50070. I tried: hadoop dfs -ls /path/file to make sure file exists on cluster1 hadoop distcp file:///cluster1_master_node_ip:50070/path/file file

distcp port for 0.17.2

2008-10-22 Thread bzheng
What's the port number for distcp in 0.17.2? I can't find any documentation on distcp for version 0.17.2. For version 0.18, the documentation says it's 8020. I'm using a standard install and the only open ports associated with hadoop are 50030, 50070, and 50090. None of them work with
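For context on the answers below: distcp talks to the NameNode RPC port, which is whatever `fs.default.name` specifies on each cluster, not the 50030/50070/50090 web UI ports. A hedged sketch of the two common invocation shapes (hostnames and the 8020 port are placeholders; check `fs.default.name` and `dfs.http.address` in each cluster's hadoop-site.xml):

```shell
# Same-version clusters: both sides over HDFS RPC. The port comes from
# each cluster's fs.default.name, not the 500xx web ports.
hadoop distcp hdfs://cluster1-nn:8020/path/file hdfs://cluster2-nn:8020/path/file

# Across incompatible Hadoop versions: read the source via HFTP on its
# HTTP port (dfs.http.address, typically 50070) and run the job on the
# destination cluster.
hadoop distcp hftp://cluster1-nn:50070/path/file hdfs://cluster2-nn:8020/path/file
```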

Re: distcp port for 0.17.2

2008-10-22 Thread bzheng
, respectively. Nicholas Sze - Original Message From: bzheng [EMAIL PROTECTED] To: core-user@hadoop.apache.org Sent: Wednesday, October 22, 2008 11:57:43 AM Subject: distcp port for 0.17.2 What's the port number for distcp in 0.17.2? I can't find any documentation