Re: specific block size for a file

2009-05-05 Thread Christian Ulrik Søttrup
Cheers, that worked. jason hadoop wrote: Please try -D dfs.block.size=4096000. The specification must be in bytes. On Tue, May 5, 2009 at 4:47 AM, Christian Ulrik Søttrup wrote: Hi all, I have a job that creates very big local files, so I need to split it across as many mappers as possible...
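For reference, a minimal driver sketch of the suggested per-job override using the old JobConf API; the property name and value are from the thread, while the class name and paths are placeholders:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class SmallBlockJob {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(SmallBlockJob.class);
    // Per-job override, equivalent to -D dfs.block.size=4096000 on the
    // command line. The value is in bytes and should be a multiple of
    // io.bytes.per.checksum (512 by default). Files this job writes get
    // the smaller block size, so a follow-up job sees more input splits.
    conf.setLong("dfs.block.size", 4096000L);
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    JobClient.runJob(conf);
  }
}

The -D form from the thread does the same thing at submission time, provided the driver uses GenericOptionsParser (e.g. via ToolRunner).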

specific block size for a file

2009-05-05 Thread Christian Ulrik Søttrup
Hi all, I have a job that creates very big local files, so I need to split it across as many mappers as possible. With the DFS block size I'm using, this job is only split into 3 mappers. I don't want to change the HDFS-wide block size because it works for my other jobs. Is there a way to g...

Re: joining two large files in hadoop

2009-04-05 Thread Christian Ulrik Søttrup
jason hadoop wrote: This is discussed in chapter 8 of my book. What book? Is it out? In short, if both data sets are in the same key order, partitioned with the same partitioner, and the input format of each data set is the same (necessary for this simple example only), a map-si...
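One concrete way to express those three conditions in the old API is the join framework's CompositeInputFormat; a hedged sketch, assuming both inputs are identically partitioned, key-sorted SequenceFiles (the paths are hypothetical):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SequenceFileInputFormat;
import org.apache.hadoop.mapred.join.CompositeInputFormat;

public class MapSideJoinDriver {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(MapSideJoinDriver.class);
    conf.setInputFormat(CompositeInputFormat.class);
    // "inner" keeps only keys present in both inputs; each map() call
    // then receives the join key and a TupleWritable holding one value
    // from each source.
    conf.set("mapred.join.expr", CompositeInputFormat.compose(
        "inner", SequenceFileInputFormat.class,
        new Path("/data/means"), new Path("/data/objects")));
    // ... set mapper, output path, etc., then JobClient.runJob(conf) ...
  }
}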

Re: joining two large files in hadoop

2009-04-05 Thread Christian Ulrik Søttrup
...matrices and the combined matrix. Here I can use your trick. Cheers, Christian. Todd Lipcon wrote: On Sat, Apr 4, 2009 at 2:11 PM, Christian Ulrik Søttrup wrote: Hello all, I need to do some calculations that have to merge two sets of very large data (basically calculate variance). One set co...

joining two large files in hadoop

2009-04-04 Thread Christian Ulrik Søttrup
Hello all, I need to do some calculations that have to merge two sets of very large data (basically calculate variance). One set contains a set of "means" and the second a set of objects tied to a mean. Normally I would send the set of means using the distributed cache, but the set has become...
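The usual alternative once the small side outgrows the distributed cache is a reduce-side join: map both inputs to the shared mean id, tag each value with its source, and combine them in the reducer. A hedged sketch; the tag format and the variance bookkeeping are illustrative, and it assumes a secondary sort makes the "M:" mean record arrive first in each group:

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class TaggedJoinReducer extends MapReduceBase
    implements Reducer<Text, Text, Text, Text> {
  public void reduce(Text meanId, Iterator<Text> values,
                     OutputCollector<Text, Text> out, Reporter reporter)
      throws IOException {
    double mean = 0, sumSq = 0;
    long n = 0;
    while (values.hasNext()) {
      String v = values.next().toString();
      if (v.startsWith("M:")) {
        // The mean record for this group, tagged "M:" in its mapper.
        mean = Double.parseDouble(v.substring(2));
      } else {
        // An object record, tagged e.g. "O:" in the other mapper.
        double x = Double.parseDouble(v.substring(2));
        sumSq += (x - mean) * (x - mean);
        n++;
      }
    }
    if (n > 0) {
      out.collect(meanId, new Text(Double.toString(sumSq / n)));
    }
  }
}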

hdfs output for both mapper and reducer

2008-09-25 Thread Christian Ulrik Søttrup
Hi all, I am interested in saving the output of both the mapper and the reducer in HDFS. Is there an efficient way of doing this? Of course I could just run the mapper followed by the identity reducer, and then an identity mapper with my reducer. However, it seems like a waste to run the fr...
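One way to avoid the second pass, sketched under the assumption that side-effect files are acceptable: the mapper writes each record to a task-specific file in HDFS while still sending it on to the reducer. Class and file names here are illustrative, not from the thread:

import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class TeeMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {
  private SequenceFile.Writer sideWriter;
  private final Text outKey = new Text();

  public void configure(JobConf conf) {
    try {
      // Side-effect file in the task's work directory; it is promoted to
      // the job output directory only if this task attempt succeeds.
      Path dir = FileOutputFormat.getWorkOutputPath(conf);
      Path file = new Path(dir, "mapcopy-" + conf.get("mapred.task.id"));
      sideWriter = SequenceFile.createWriter(
          FileSystem.get(conf), conf, file, Text.class, Text.class);
    } catch (IOException e) {
      throw new RuntimeException(e);
    }
  }

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, Text> out, Reporter reporter)
      throws IOException {
    outKey.set(key.toString());       // trivial map logic, stand-in for the real one
    out.collect(outKey, value);       // normal path to the reducer
    sideWriter.append(outKey, value); // extra copy kept in HDFS
  }

  public void close() throws IOException {
    sideWriter.close();
  }
}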

Re: streaming question

2008-09-16 Thread Christian Ulrik Søttrup
...ennis Christian Ulrik Søttrup wrote: Ok, so I added the JAR to the cacheArchive option and my command looks like this: hadoop jar streaming/hadoop-0.17.0-streaming.jar -input /store/ -output /cout/ -mapper MyProg -combiner testlink/combiner.class -reducer testlink/reduce.class -file /home/hadoop/MyProg...

Re: streaming question

2008-09-15 Thread Christian Ulrik Søttrup
Ok, so I added the JAR to the cacheArchive option and my command looks like this: hadoop jar streaming/hadoop-0.17.0-streaming.jar -input /store/ -output /cout/ -mapper MyProg -combiner testlink/combiner.class -reducer testlink/reduce.class -file /home/hadoop/MyProg -cacheFile /shared/part-0...

Re: iterative map-reduce

2008-07-29 Thread Christian Ulrik Søttrup
Hi Shirley, I am basically doing as Qin suggested: running a job iteratively until some condition is met. My main looks something like this (in pseudocode): main: while (!converged): make a new JobConf; set up the JobConf; run the job; check the reporter for statistics; decide whether it converged. I use a c...
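Fleshed out, that loop might look like the following; ConvergenceCounter.SWAPS is a hypothetical counter the reducer increments whenever something changed, so zero swaps means converged:

import org.apache.hadoop.mapred.Counters;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;

public class IterativeDriver {
  public enum ConvergenceCounter { SWAPS }

  public static void main(String[] args) throws Exception {
    boolean converged = false;
    int iteration = 0;
    while (!converged) {
      JobConf conf = new JobConf(IterativeDriver.class); // fresh conf each pass
      conf.setJobName("iterative-pass-" + iteration);
      // ... set mapper/reducer; input is the previous pass's output ...
      RunningJob rj = JobClient.runJob(conf);            // blocks until the job finishes
      Counters cs = rj.getCounters();
      long swaps = cs.getCounter(ConvergenceCounter.SWAPS);
      converged = (swaps == 0);
      iteration++;
    }
  }
}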

Re: Scandinavian user group?

2008-07-22 Thread Christian Ulrik Søttrup
Sure, I'm interested. Copenhagen is fine for me. Cheers, Christian. Mads Toftum wrote: On Mon, Jul 21, 2008 at 03:52:01PM +0200, tim robertson wrote: Is there a user base in Scandinavia that would be interested in meeting to exchange feedback / ideas? (in English...) Yeah, I'd be inter...

Re: question about Counters

2008-07-21 Thread Christian Ulrik Søttrup
Hi, I use a counter in my reducer to check whether another iteration (of the map-reduce cycle) is necessary. I have a declaration similar to yours. Then in my main program I have: *** client.setConf(conf); RunningJob rj = JobClient.runJob(conf); Counters cs = rj.getCounters(); long swaps = cs.getCou...
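For completeness, the reducer side of this pattern; the enum name and the condition for incrementing are illustrative, not from the thread:

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class SwapReducer extends MapReduceBase
    implements Reducer<Text, Text, Text, Text> {
  public enum MyCounter { SWAPS }

  public void reduce(Text key, Iterator<Text> values,
                     OutputCollector<Text, Text> out, Reporter reporter)
      throws IOException {
    while (values.hasNext()) {
      out.collect(key, values.next());
      // Bump the counter whenever this pass changed something; the driver
      // reads the total after the job and decides whether to iterate again.
      reporter.incrCounter(MyCounter.SWAPS, 1);
    }
  }
}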

Re: Can a MapReduce task only consist of a Map step?

2008-07-21 Thread Christian Ulrik Søttrup
Hi, you can simply use the built-in reducer that just copies the map output: conf.setReducerClass(org.apache.hadoop.mapred.lib.IdentityReducer.class); Cheers, Christian. Zhou, Yunqing wrote: I only use it to do something in parallel, but the reduce step will cost me several additional days, is...
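A minimal driver sketch around that line, with IdentityMapper standing in for the real map class:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;

public class MapOnlyJob {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(MapOnlyJob.class);
    conf.setMapperClass(IdentityMapper.class);   // stand-in for the real mapper
    conf.setReducerClass(IdentityReducer.class); // passes map output through unchanged
    // Alternative: conf.setNumReduceTasks(0) makes the job truly map-only,
    // skipping the shuffle/sort and writing map output straight to HDFS.
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    JobClient.runJob(conf);
  }
}

If the reduce step really costs days, the setNumReduceTasks(0) variant is the cheaper choice, since the IdentityReducer still pays for the full shuffle and sort.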