How big are the zip files? I am not sure if this is what you want, but in
my scenario, I had a lot of smaller zip files (not gzip) that needed to be
processed. I put these into a SequenceFile outside of Hadoop and then
uploaded it to HDFS. Once in HDFS, I have the mapper read the SequenceFile with
eac
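The packing step described above can be sketched without Hadoop on the classpath. The extraction below is plain JDK; in the real pipeline each (entry name, bytes) pair would then be appended with a SequenceFile.Writer (Text key, BytesWritable value), which is the only Hadoop-specific part and is omitted here. ZipToPairs and readEntries are illustrative names, not anything from the Hadoop API.

```java
import java.io.*;
import java.util.*;
import java.util.zip.*;

// Read each entry out of a small zip and collect (entryName, bytes) pairs.
// These pairs are what would subsequently be written to a SequenceFile.
public class ZipToPairs {
    public static Map<String, byte[]> readEntries(InputStream zipStream) throws IOException {
        Map<String, byte[]> entries = new LinkedHashMap<>();
        try (ZipInputStream zin = new ZipInputStream(zipStream)) {
            ZipEntry entry;
            byte[] buf = new byte[4096];
            while ((entry = zin.getNextEntry()) != null) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                int n;
                while ((n = zin.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
                entries.put(entry.getName(), out.toByteArray());
            }
        }
        return entries;
    }
}
```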
> How did you put the zips into SequenceFiles? For me, binary writes to
> SequenceFiles are very slow. They do not have to be zip files: I create
> them
> myself out of my data, and I can use anything - tar, gzip...
>
> Thank you,
> Mark
>
> On Tue, Jul 7, 2009 at 12:28 AM, Kris Jirapinyo wrote:
Hi all,
I did a Google search on "scala" and "hadoop", and several articles from
2008 came up. Just curious: is anyone currently using Scala+Hadoop
extensively in production?
-- Kris.
Why don't you try Amazon EC2? :)
On Tue, Jul 14, 2009 at 2:12 PM, Ryan Smith wrote:
> I'm having problems dealing with my server manufacturer at the moment. Is
> there a good manufacturer to go with?
>
> Any advice is helpful, thanks.
>
> -Ryan
>
Hi all,
I was wondering if anyone's encountered 4 extra bytes at the beginning of
the serialized object file using MultipleOutputFormat. Basically, I am
using BytesWritable to write the serialized byte arrays in the reducer
phase. My writer is a generic one:
public class GenericOutputFormat e
Todd Lipcon wrote:
> BytesWritable serializes itself by first outputting the array length, and
> then outputting the array itself. The 4 bytes at the top of the file are
> the length of the value itself.
>
> Hope that helps
> -Todd
>
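Todd's explanation can be reproduced with the plain JDK: BytesWritable.write() emits a 4-byte big-endian length before the payload via DataOutput.writeInt. The sketch below imitates that framing without the Hadoop jars; LengthPrefixed and frame are illustrative names.

```java
import java.io.*;

// Reproduce the BytesWritable on-disk layout: a 4-byte big-endian length
// header followed by the raw bytes. The header is the "4 extra bytes"
// observed at the top of the output file.
public class LengthPrefixed {
    public static byte[] frame(byte[] payload) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(baos);
        out.writeInt(payload.length); // 4-byte big-endian length header
        out.write(payload);
        out.flush();
        return baos.toByteArray();
    }
}
```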
> On Tue, Aug 11, 2009 at 6:33 PM, Kris Jirapinyo &
oose to do this, just model it after BytesWritable but drop the 4
>> byte length header.
>>
>> -Todd
>>
>> On Tue, Aug 11, 2009 at 7:23 PM, Kris Jirapinyo
>> wrote:
>>
>> Ah that explains it, thanks Todd. Is there a way to serialize an object
Hi John,
If you have the Hadoop O'Reilly book, look at pg 206 for an example.
But basically, you just create a subclass of MultipleTextOutputFormat and
then inside it you override generateFileNameForKeyValue (for example) to
have the reducer emit the desired filenames. For each key in the red
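The real class must extend MultipleTextOutputFormat (Hadoop jars required), so only the string manipulation inside generateFileNameForKeyValue is sketched here as a plain method; deriving the output directory from the key is an assumption, and KeyedFileNames is an illustrative name.

```java
// Sketch of the naming logic that would live inside a
// MultipleTextOutputFormat subclass's generateFileNameForKeyValue
// override: route each record into a path derived from its key.
public class KeyedFileNames {
    public static String generateFileNameForKeyValue(String key, String value, String leafName) {
        // e.g. every record for key "foo" lands in foo/part-00000
        return key + "/" + leafName;
    }
}
```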
Do you ever close your DataOutputBuffer?
-- Kris J.
On Tue, Aug 18, 2009 at 7:35 AM, Wasim Bari wrote:
>
> Hi,
> I tried another way to implement the InputFileFormat which returns
> as record to mapper.
>
> I used this logic: Used a LineRecordReader to read file line by line and
> keep stori
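The line-by-line accumulation being described can be sketched in plain Java; WholeFileReader and readAll are illustrative names, not part of any Hadoop API. Note the reader is closed when finished, which is the same discipline the DataOutputBuffer question is about.

```java
import java.io.*;

// Read the input line by line and keep storing it into one buffer, so the
// whole file comes back as a single record.
public class WholeFileReader {
    public static String readAll(Reader src) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(src)) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        } // reader is closed here; a RecordReader should do likewise
        return sb.toString();
    }
}
```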
Hi all,
I know this has been filed as a JIRA improvement already
http://issues.apache.org/jira/browse/HDFS-343, but is there any good
workaround at the moment? What's happening is I have added a few new EBS
volumes to half of the cluster, but Hadoop doesn't want to write to them.
When I try to
The order matters?
On Tue, Aug 25, 2009 at 1:16 PM, Ted Dunning wrote:
> Change the ordering of the volumes in the config files.
>
> On Tue, Aug 25, 2009 at 12:51 PM, Kris Jirapinyo wrote:
>
> > Hi all,
> >I know this has been filed as a JIRA i
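For reference, the volume list under discussion is the dfs.data.dir property. The paths below are placeholders, and the ordering shown follows Ted's suggestion of listing the new EBS volumes first; whether the order actually influences placement is exactly the open question in this thread, so treat this as a sketch.

```xml
<!-- hadoop-site.xml (0.19-era); paths are placeholders -->
<property>
  <name>dfs.data.dir</name>
  <!-- comma-separated list of datanode storage directories -->
  <value>/mnt/ebs1/dfs/data,/mnt/ebs2/dfs/data,/mnt/local1/dfs/data</value>
</property>
```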
I don't think HDFS-343 is directly related to this or is likely to be
> fixed. There is another JIRA that makes placement policy at the NameNode
> pluggable (it does not affect the Datanode).
>
> Raghu.
>
>
> Kris Jirapinyo wrote:
>
>> Hi all,
>>I know this has been f
Is that
just a placeholder name, and will whatever directory is under that parent
directory be scanned and picked up by the datanode?
Kris.
On Tue, Aug 25, 2009 at 6:24 PM, Raghu Angadi wrote:
> Kris Jirapinyo wrote:
>
>> How does copying the subdir work? What if that partition alread
Hmm, then in that case it is possible for me to manually load-balance those
datanodes by moving most of the files onto the new, larger partition. I
will try it. Thanks!
-- Kris J.
On Wed, Aug 26, 2009 at 10:13 AM, Raghu Angadi wrote:
> Kris Jirapinyo wrote:
>
>> But I mean, then h
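A sketch of the manual move, assuming a 0.19-era block layout (blk_* files plus their .meta checksums under current/); move_blocks is a hypothetical helper, and the datanode must be stopped before moving anything and restarted afterwards.

```shell
# Move finalized block files (and their .meta checksums, which the blk_*
# glob also matches) from a full partition to a new one. Stop the datanode
# first, e.g. bin/hadoop-daemon.sh stop datanode, and restart it when done.
move_blocks() {
  local src=$1 dst=$2
  for blk in "$src"/blk_*; do
    [ -e "$blk" ] || continue   # glob matched nothing
    mv "$blk" "$dst"/
  done
}
# usage (after stopping the datanode):
# move_blocks /mnt/local1/dfs/data/current /mnt/ebs1/dfs/data/current
```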
Hi all,
What is the best way to copy directories from HDFS to local disk in
0.19.1?
Thanks,
Kris.
Programmatically or command line?
> >
> > Command line :
> >
> > bin/hadoop fs -get /path/to/dfs/dir /path/to/local/dir
> >
> > Arvind
> >
> >
> >
> >
> >
> > From: Kris Jirapinyo
> > To: common-user
ell...
>
> Arvind
>
>
>
>
> ____
> From: Kris Jirapinyo
> To: common-user@hadoop.apache.org
> Sent: Friday, September 4, 2009 11:41:22 PM
> Subject: Re: Copying directories out of HDFS
>
> I thought -get and -copyToLocal don
Other than using the flag -i to "ignore" this, is there another workaround? I
tried to download that file to local, and it works fine, so it's not that the
data does not exist. Is this in any way related to
https://issues.apache.org/jira/browse/MAPREDUCE-968?
Thanks!
Kris Jirapinyo