Re: how to change default name of a sequence file

2011-06-19 Thread Mapred Learn
Another question about getDefaultWorkFile(): how can I find out which mapper number is writing a given output file? For example, if there are 30 mappers, how can I add a suffix such as _30 to the output file of the 30th mapper?
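In the org.apache.hadoop.mapreduce API, the task's number is available inside getDefaultWorkFile() as context.getTaskAttemptID().getTaskID().getId(). Task numbers start at 0, so with 30 mappers the 30th one has id 29. The sketch below is plain Java so it runs without Hadoop, and shows one way to turn that id into the _30 style suffix asked about here. withTaskSuffix is a hypothetical helper, and adding 1 to get a 1-based suffix is a design choice for this example, not something Hadoop does.

```java
public class TaskSuffix {

    /**
     * Hypothetical helper: builds "<base>_<n><extension>", where n is the 1-based
     * task number. taskId is assumed to be the 0-based value returned by
     * context.getTaskAttemptID().getTaskID().getId() in a Hadoop task.
     */
    static String withTaskSuffix(String base, int taskId, String extension) {
        if (taskId < 0) {
            throw new IllegalArgumentException("task id must be >= 0: " + taskId);
        }
        // Add 1 so that the 30th mapper (id 29) gets the suffix "_30".
        return base + "_" + (taskId + 1) + extension;
    }

    public static void main(String[] args) {
        // With 30 mappers the ids are 0..29; print the names of the first and last one.
        System.out.println(withTaskSuffix("output", 0, ".seq"));
        System.out.println(withTaskSuffix("output", 29, ".seq"));
    }
}
```

Because every map task has a distinct id, names built this way stay unique across the mappers of one job.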

Re: How to split a big file in HDFS by size

2011-06-19 Thread Mapred Learn
Hi Christoph, if I get all 60 GB onto HDFS, can I then split it into 60 files of 1 GB each and run a map-red job on those 60 fixed-length text files? If so, do you have any idea how to do this?
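With fixed-length records, one way to choose split points is to round the target chunk size down to a whole number of records, so that no record straddles two files. The sketch below computes the half-open byte ranges [start, end) for such chunks. It assumes every record, including any line terminator, is exactly recordLength bytes; the class and method names are hypothetical. Note also that MapReduce already divides a large splittable input into input splits (by default roughly one per HDFS block), so a separate physical split may not be needed just to get parallelism.

```java
import java.util.ArrayList;
import java.util.List;

public class FixedLengthChunks {

    /** A half-open byte range [start, end) of the input file. */
    static final class Range {
        final long start;
        final long end;

        Range(long start, long end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + start + ", " + end + ")";
        }
    }

    /**
     * Splits a file made of fixed-length records into ranges of at most
     * targetChunkSize bytes, each holding a whole number of records.
     */
    static List<Range> alignedRanges(long fileSize, long targetChunkSize, long recordLength) {
        if (recordLength <= 0 || targetChunkSize < recordLength) {
            throw new IllegalArgumentException("need 0 < recordLength <= targetChunkSize");
        }
        if (fileSize % recordLength != 0) {
            throw new IllegalArgumentException("file size is not a multiple of the record length");
        }
        long chunk = (targetChunkSize / recordLength) * recordLength; // round down to whole records
        List<Range> ranges = new ArrayList<>();
        for (long start = 0; start < fileSize; start += chunk) {
            ranges.add(new Range(start, Math.min(start + chunk, fileSize)));
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Small demo: 1000-byte file, 100-byte records, ~256-byte target chunks.
        System.out.println(alignedRanges(1000, 256, 100));
    }
}
```

The demo prints [[0, 200), [200, 400), [400, 600), [600, 800), [800, 1000)]. For the case in this thread the call would be alignedRanges(60L << 30, 1L << 30, recordLength) with the actual record length.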

Re: How to split a big file in HDFS by size

2011-06-19 Thread Christoph Schmitz
JJ, uploading 60 GB single-threaded (e.g. with hadoop fs -copyFromLocal) will be slow. If possible, try to get the files in smaller chunks where they are created, and upload them in parallel with a simple MapReduce job that only passes the data through (i.e. uses the standard Mapper and Reducer
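Christoph suggests a pass-through MapReduce job for the parallel upload. The sketch below swaps in a simpler technique to show the same principle: a fixed-size thread pool on the client runs several chunk uploads at once instead of one sequential copy. The Uploader interface and uploadAll method are hypothetical; a real uploader might call Hadoop's FileSystem.copyFromLocalFile(src, dst) or run hadoop fs -copyFromLocal for each chunk, while the demo in main just copies into a local temporary directory so the code runs on its own.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelUpload {

    /** Hypothetical upload step for a single chunk (e.g. a copy into HDFS). */
    interface Uploader {
        void upload(Path chunk) throws IOException;
    }

    /** Runs up to `threads` uploads concurrently and returns how many succeeded. */
    static int uploadAll(List<Path> chunks, Uploader uploader, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Object>> futures = new ArrayList<>();
            for (Path chunk : chunks) {
                // A lambda that returns a value is a Callable, so it may throw IOException.
                futures.add(pool.submit(() -> {
                    uploader.upload(chunk);
                    return null;
                }));
            }
            int succeeded = 0;
            for (Future<Object> f : futures) {
                try {
                    f.get(); // waits for this upload; a failure arrives wrapped in ExecutionException
                    succeeded++;
                } catch (ExecutionException e) {
                    System.err.println("upload failed: " + e.getCause());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // keep the interrupt flag and stop waiting
                    break;
                }
            }
            return succeeded;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempDirectory("chunks");
        Path dst = Files.createTempDirectory("uploaded");
        List<Path> chunks = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            Path chunk = src.resolve("chunk-" + i + ".txt");
            Files.write(chunk, ("data " + i + "\n").getBytes(StandardCharsets.UTF_8));
            chunks.add(chunk);
        }
        // Demo uploader: a local copy stands in for the real HDFS upload.
        Uploader localCopy = p -> Files.copy(p, dst.resolve(p.getFileName()),
                StandardCopyOption.REPLACE_EXISTING);
        System.out.println(uploadAll(chunks, localCopy, 2) + " of " + chunks.size() + " chunks uploaded");
    }
}
```

In this sketch all uploads still pass through one client machine, so its network link stays a limit; spreading the copying over cluster nodes is one likely advantage of the MapReduce-based approach.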

Re: how to change default name of a sequence file

2011-06-19 Thread Mapred Learn
Thanks! I will try this!

Re: how to change default name of a sequence file

2011-06-19 Thread Christoph Schmitz
Hi JJ, you can do that by subclassing TextOutputFormat (or whichever output format you're using) and overriding the getDefaultWorkFile method: public class MyOutputFormat extends TextOutputFormat { // ... public Path getDefaultWorkFile(TaskAttemptContext context, String exte
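The snippet above is cut off in the archive. For orientation: in the org.apache.hadoop.mapreduce API of that era, FileOutputFormat's default getDefaultWorkFile() essentially returns new Path(committer.getWorkPath(), getUniqueFile(context, "part", extension)), where committer is the task's FileOutputCommitter, and getUniqueFile() builds names such as part-m-00029. An override that passes a different name in place of "part" therefore renames the files while keeping them unique per task. The plain-Java sketch below only reproduces that naming pattern so the resulting names can be checked without a cluster; uniqueName is a hypothetical helper, not Hadoop's code.

```java
import java.util.Locale;

public class UniqueWorkFileName {

    /**
     * Builds "<base>-<m|r>-<5-digit task number><extension>", e.g. "mydata-m-00029.seq".
     * In a real task the map/reduce flag and the 0-based task number would come
     * from the task ID in the TaskAttemptContext.
     */
    static String uniqueName(String base, boolean isMap, int taskId, String extension) {
        return String.format(Locale.ROOT, "%s-%c-%05d%s", base, isMap ? 'm' : 'r', taskId, extension);
    }

    public static void main(String[] args) {
        System.out.println(uniqueName("mydata", true, 29, ".seq")); // mydata-m-00029.seq
        System.out.println(uniqueName("mydata", false, 0, ""));     // mydata-r-00000
    }
}
```

Keeping the task number in the name matters: if every task returned the same fixed name, their outputs would collide when committed to the job's output directory.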

how to change default name of a sequence file

2011-06-19 Thread Mapred Learn
Hi, I want to give the output files of my map-red job (sequence files) a specific name instead of the default part-* format. Has anyone tried to override the default file name and set the output file name per map-red job? Thanks, -JJ

How to split a big file in HDFS by size

2011-06-19 Thread Mapred Learn
Hi, I am trying to upload text files of 60 GB or more. I want to split these files into smaller files of, say, 1 GB each so that I can run further map-red jobs on them. Does anybody have an idea how I can do this? Thanks a lot in advance! Any ideas are greatly appreciated! -JJ
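If the big file can be cut before upload, as suggested in the replies, one simple approach for text data is to write chunks of roughly a target size while only breaking at newlines, so that no line is split across two files. The sketch below does this in plain Java. It assumes newline-terminated lines; the method name and the chunk-00000.txt naming are hypothetical. A line longer than the target simply makes its chunk larger.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class LineAlignedSplitter {

    /**
     * Copies `input` into chunk files in `outDir`, closing the current chunk at the
     * first newline after it reaches `targetBytes`. Returns the chunk paths in order.
     */
    static List<Path> split(Path input, Path outDir, long targetBytes) throws IOException {
        if (targetBytes <= 0) {
            throw new IllegalArgumentException("targetBytes must be > 0");
        }
        List<Path> chunks = new ArrayList<>();
        byte[] buf = new byte[64 * 1024];
        OutputStream out = null;
        long written = 0;
        try (InputStream in = Files.newInputStream(input)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                for (int i = 0; i < n; i++) {
                    if (out == null) { // open the next chunk lazily, so no empty trailing file
                        Path chunk = outDir.resolve(String.format(Locale.ROOT, "chunk-%05d.txt", chunks.size()));
                        chunks.add(chunk);
                        out = new BufferedOutputStream(Files.newOutputStream(chunk));
                        written = 0;
                    }
                    out.write(buf[i]);
                    written++;
                    // Only end a chunk right after a newline, so lines stay whole.
                    if (buf[i] == '\n' && written >= targetBytes) {
                        out.close();
                        out = null;
                    }
                }
            }
        } finally {
            if (out != null) {
                out.close();
            }
        }
        return chunks;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("split-demo");
        Path input = dir.resolve("big.txt");
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 10; i++) {
            sb.append("line ").append(i).append('\n'); // 7 bytes per line
        }
        Files.write(input, sb.toString().getBytes(StandardCharsets.UTF_8));
        // Tiny 20-byte target for the demo; pass 1L << 30 for roughly 1 GB chunks.
        for (Path chunk : split(input, dir, 20)) {
            System.out.println(chunk.getFileName() + ": " + Files.size(chunk) + " bytes");
        }
    }
}
```

The demo writes four chunks of 21, 21, 21 and 7 bytes. The resulting files could then be uploaded in parallel rather than as one 60 GB copy.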