Re: zip files as input

2009-07-06 Thread Kris Jirapinyo
How big are the zip files? I am not sure if this is what you want, but in my scenario I had a lot of small zip files (not gzip) that needed to be processed. I put these into a SequenceFile outside of Hadoop and then uploaded it to HDFS. Once in HDFS, I have the mapper read the SequenceFile with eac
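The reply above is cut off before it says exactly how the SequenceFile records are laid out. Assuming each record's value holds the raw bytes of one whole zip file, the mapper-side step would be to expand that value in memory. The following is a minimal JDK-only sketch of that step. `ZipRecordUnpacker`, `unzip`, and `zipOf` are hypothetical names, and the SequenceFile reading itself is left out so the example runs without the Hadoop jars.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper: what a mapper might do with one SequenceFile value,
// ASSUMING that value holds the raw bytes of a single small zip file.
final class ZipRecordUnpacker {

    // Expands an in-memory zip archive into (entry name -> entry contents).
    static Map<String, byte[]> unzip(byte[] zipBytes) {
        Map<String, byte[]> entries = new LinkedHashMap<>();
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            byte[] buf = new byte[8192];
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue; // directory entries carry no data
                }
                ByteArrayOutputStream contents = new ByteArrayOutputStream();
                int n;
                // read() returns -1 at the end of the current entry, not of the whole archive
                while ((n = zin.read(buf)) != -1) {
                    contents.write(buf, 0, n);
                }
                entries.put(entry.getName(), contents.toByteArray());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return entries;
    }

    // Builds a zip in memory; stands in for one zip file packed into the SequenceFile.
    static byte[] zipOf(Map<String, byte[]> files) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(bytes)) {
            for (Map.Entry<String, byte[]> file : files.entrySet()) {
                zout.putNextEntry(new ZipEntry(file.getKey()));
                zout.write(file.getValue());
                zout.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // zout is closed at this point, so the zip's central directory has been written
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        Map<String, byte[]> files = new LinkedHashMap<>();
        files.put("a.txt", "hello".getBytes(StandardCharsets.UTF_8));
        files.put("logs/b.txt", "world".getBytes(StandardCharsets.UTF_8));

        for (Map.Entry<String, byte[]> e : unzip(zipOf(files)).entrySet()) {
            System.out.println(e.getKey() + " -> " + new String(e.getValue(), StandardCharsets.UTF_8));
        }
    }
}
```

Running `main` prints each entry name with its decoded text. Note that this sketch holds each whole zip in memory, which is fine for the "lots of small files" case described here but would not suit very large archives.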

Re: zip files as input

2009-07-07 Thread Kris Jirapinyo
> How did you put the zips into SequenceFiles? For me, binary writes to
> SequenceFiles are very slow. It does not have to be zip files: I create
> them myself out of my data, and I can use anything: tar, gzip...
>
> Thank you,
> Mark
>
> On Tue, Jul 7, 2009 at 12:28 AM, K

Re: zip files as input

2009-07-07 Thread Kris Jirapinyo
have to be zip files: I create them
> > > myself out of my data, and I do anything - tar, gzip...
> > >
> > > Thank you,
> > > Mark
> > >
> > > On Tue, Jul 7, 2009 at 12:28 AM, Kris Jirapinyo
> > > wrote:

Scala and Hadoop

2009-07-07 Thread Kris Jirapinyo
Hi all, I did a Google search on "scala" and "hadoop", and several articles from 2008 came up. Just curious: is anyone currently using Scala+Hadoop extensively in production? -- Kris.

Re: Hardware Manufacturer

2009-07-14 Thread Kris Jirapinyo
Why don't you try Amazon EC2? :)
On Tue, Jul 14, 2009 at 2:12 PM, Ryan Smith wrote:
> I'm having problems dealing with my server mfgr atm. Is there a good mfgr
> to go with?
>
> Any advice is helpful, thanks.
>
> -Ryan

Extra 4 bytes at beginning of serialized file

2009-08-11 Thread Kris Jirapinyo
Hi all, I was wondering if anyone has encountered 4 extra bytes at the beginning of the serialized object file when using MultipleOutputFormat. Basically, I am using BytesWritable to write the serialized byte arrays in the reducer phase. My writer is a generic one: public class GenericOutputFormat e

Re: Extra 4 bytes at beginning of serialized file

2009-08-11 Thread Kris Jirapinyo
d Lipcon wrote:
> BytesWritable serializes itself by first outputting the array length, and
> then outputting the array itself. The 4 bytes at the top of the file are
> the length of the value itself.
>
> Hope that helps
> -Todd
>
> On Tue, Aug 11, 2009 at 6:33 PM, Kris Jirapinyo
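Todd's explanation can be checked without Hadoop. The sketch below imitates the layout he describes rather than calling `BytesWritable` itself: `DataOutputStream.writeInt` emits the length as a 4-byte big-endian int, and the payload follows. `LengthPrefixedBytes`, `frame`, and `unframe` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Stdlib-only imitation of the framing described in the thread:
// a 4-byte length header, then the array itself. No Hadoop classes are used.
final class LengthPrefixedBytes {

    static byte[] frame(byte[] payload) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(payload.length); // the "extra" 4 bytes (big-endian int)
            out.write(payload);           // followed by the array itself
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    static byte[] unframe(byte[] framed) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(framed))) {
            int length = in.readInt(); // read the header first...
            byte[] payload = new byte[length];
            in.readFully(payload);     // ...then exactly that many bytes
            return payload;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] payload = {10, 20, 30};
        byte[] framed = frame(payload);
        System.out.println(framed.length);                              // 7
        System.out.println(Arrays.toString(Arrays.copyOf(framed, 4)));  // [0, 0, 0, 3]
        System.out.println(Arrays.equals(unframe(framed), payload));    // true
    }
}
```

The header is what lets a reader know where each value ends, which is why a length-prefixed type can be read back from a stream of many values.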

Re: Extra 4 bytes at beginning of serialized file

2009-08-12 Thread Kris Jirapinyo
oose to do this, just model it after BytesWritable but drop the 4-byte
>> length header.
>>
>> -Todd
>>
>> On Tue, Aug 11, 2009 at 7:23 PM, Kris Jirapinyo
>> wrote:
>>
>> Ah that explains it, thanks Todd. Is there a way to serialize an object
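Todd's suggestion, a type modeled on BytesWritable that skips the length header, might look roughly like the following. This is only an illustration under assumptions: `RawBytesValue` is a hypothetical name, and it deliberately does not implement Hadoop's `Writable` interface so that it compiles with the JDK alone. A real version would implement that interface and also need a `readFields(DataInput)` method.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical headerless value type: same idea as a BytesWritable-style write(DataOutput),
// but WITHOUT the 4-byte length. Hadoop's Writable interface is omitted on purpose.
final class RawBytesValue {
    private final byte[] bytes;

    RawBytesValue(byte[] bytes) {
        this.bytes = bytes.clone(); // defensive copy
    }

    // Writes the payload only; a reader has to learn the length some other way.
    void write(DataOutput out) throws IOException {
        out.write(bytes);
    }

    // Convenience helper for the demo: serializes one value into a byte array.
    static byte[] serialize(RawBytesValue value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            value.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        byte[] payload = {10, 20, 30};
        byte[] written = serialize(new RawBytesValue(payload));
        System.out.println(Arrays.equals(written, payload)); // true: no leading length bytes
    }
}
```

The cost of dropping the header is that a reader can no longer tell where one value ends, so this fits cases such as one serialized object per output file, where the file length gives the size.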

Re: Two output files?

2009-08-14 Thread Kris Jirapinyo
Hi John,
If you have the O'Reilly Hadoop book, look at page 206 for an example. Basically, you just create a subclass of MultipleTextOutputFormat and, inside it, override generateFileNameForKeyValue (for example) so that the reducer emits the desired filenames. For each key in the red
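The routing idea can be simulated without Hadoop. In the old `org.apache.hadoop.mapred.lib.MultipleTextOutputFormat` API, the override has the form `generateFileNameForKeyValue(K key, V value, String name)`, where `name` is the default leaf file name. The sketch below only models what such an override might return and how records would then be grouped into files. `KeyBasedFileNames`, `fileNameFor`, `route`, and the key-as-subdirectory naming scheme are hypothetical choices, not the book's example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical simulation of key-based output naming. In a real job, the
// fileNameFor logic would sit inside an override of generateFileNameForKeyValue.
final class KeyBasedFileNames {

    // Sketch choice: one subdirectory per key, keeping the default leaf name (e.g. "part-00000").
    static String fileNameFor(String key, String value, String leafName) {
        return key + "/" + leafName;
    }

    // Groups reducer output lines by the file each one would be written to.
    static Map<String, List<String>> route(List<String[]> pairs, String leafName) {
        Map<String, List<String>> files = new TreeMap<>();
        for (String[] kv : pairs) {
            String file = fileNameFor(kv[0], kv[1], leafName);
            // Text-output-style line: key, tab, value
            files.computeIfAbsent(file, f -> new ArrayList<>()).add(kv[0] + "\t" + kv[1]);
        }
        return files;
    }

    public static void main(String[] args) {
        List<String[]> pairs = new ArrayList<>();
        pairs.add(new String[] {"errors", "disk full"});
        pairs.add(new String[] {"info", "job started"});
        pairs.add(new String[] {"errors", "timeout"});
        route(pairs, "part-00000").forEach((file, lines) -> System.out.println(file + " " + lines));
    }
}
```

With two distinct keys, the records land in two separate output files, which is the "two output files" behavior asked about in this thread.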

Re: Customized InputFormat

2009-08-18 Thread Kris Jirapinyo
Do you ever close your DataOutputBuffer?
-- Kris J.
On Tue, Aug 18, 2009 at 7:35 AM, Wasim Bari wrote:
>
> Hi,
> I tried another way to implement the InputFileFormat which returns
> as record to mapper.
>
> I used this logic: Used a LineRecordReader to read file line by line and
> keep stori

Intra-datanode balancing?

2009-08-25 Thread Kris Jirapinyo
Hi all, I know this has already been filed as a JIRA improvement (http://issues.apache.org/jira/browse/HDFS-343), but is there a good workaround at the moment? What's happening is that I have added a few new EBS volumes to half of the cluster, but Hadoop doesn't want to write to them. When I try to

Re: Intra-datanode balancing?

2009-08-25 Thread Kris Jirapinyo
The order matters?
On Tue, Aug 25, 2009 at 1:16 PM, Ted Dunning wrote:
> Change the ordering of the volumes in the config files.
>
> On Tue, Aug 25, 2009 at 12:51 PM, Kris Jirapinyo
> wrote:
> >
> > Hi all,
> > I know this has been filed as a JIRA i

Re: Intra-datanode balancing?

2009-08-25 Thread Kris Jirapinyo
don't think HDFS-343 is directly related to this or is likely to be
> fixed. There is another JIRA that makes placement policy at NameNode
> pluggable (does not affect Datanode).
>
> Raghu.
>
>
> Kris Jirapinyo wrote:
>
>> Hi all,
>> I know this has been f

Re: Intra-datanode balancing?

2009-08-26 Thread Kris Jirapinyo
that just a placeholder name and whatever directory is under that parent directory will be scanned and picked up by the datanode?
Kris.
On Tue, Aug 25, 2009 at 6:24 PM, Raghu Angadi wrote:
> Kris Jirapinyo wrote:
>
>> How does copying the subdir work? What if that partition alread

Re: Intra-datanode balancing?

2009-08-26 Thread Kris Jirapinyo
Hmm, in that case it is possible for me to manually balance the load on those datanodes by moving most of the files onto the new, larger partition. I will try it. Thanks!
-- Kris J.
On Wed, Aug 26, 2009 at 10:13 AM, Raghu Angadi wrote:
> Kris Jirapinyo wrote:
>
>> But I mean, then h

Copying directories out of HDFS

2009-09-04 Thread Kris Jirapinyo
Hi all, What is the best way to copy directories from HDFS to local disk in 0.19.1? Thanks, Kris.

Re: Copying directories out of HDFS

2009-09-04 Thread Kris Jirapinyo
ly or command line?
>
> Command line:
>
> bin/hadoop fs -get /path/to/dfs/dir /path/to/local/dir
>
> Arvind
>
> From: Kris Jirapinyo
> To: common-user

Re: Copying directories out of HDFS

2009-09-05 Thread Kris Jirapinyo
ell...
>
> Arvind
>
> ____
> From: Kris Jirapinyo
> To: common-user@hadoop.apache.org
> Sent: Friday, September 4, 2009 11:41:22 PM
> Subject: Re: Copying directories out of HDFS
>
> I thought -get and -copyToLocal don

distcp questions

2010-08-15 Thread Kris Jirapinyo
than using the -i flag to "ignore" this, is there another workaround? I tried downloading that file to local disk, and it works fine, so it's not that the data does not exist. Is this in any way related to https://issues.apache.org/jira/browse/MAPREDUCE-968? Thanks! Kris Jirapiny