Cheers, that worked.
jason hadoop wrote:
Please try -D dfs.block.size=4096000
The specification must be in bytes.
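For a streaming job, that override goes on the command line; a sketch with hypothetical paths and program names (on older releases such as 0.17 the equivalent option was `-jobconf` rather than `-D`):

```shell
# Hypothetical paths; -D is a generic option and must come
# before the streaming-specific options.
hadoop jar hadoop-streaming.jar \
  -D dfs.block.size=4096000 \
  -input /store/ \
  -output /cout/ \
  -mapper MyProg \
  -file /home/hadoop/MyProg
```

This only affects files the job writes; existing files keep the block size they were written with.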
On Tue, May 5, 2009 at 4:47 AM, Christian Ulrik Søttrup wrote:
Hi all,
I have a job that creates very big local files, so I need to split it across
as many mappers as possible. The DFS block size I'm
using means that this job is only split into 3 mappers. I don't want to
change the HDFS-wide block size because it works for my other jobs.
Is there a way to g
jason hadoop wrote:
This is discussed in chapter 8 of my book.
What book? Is it out?
In short,
If both data sets are:
- in same key order
- partitioned with the same partitioner,
- the input format of each data set is the same, (necessary for this
simple example only)
A map si
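Under those assumptions, the old mapred API can do the join map-side with CompositeInputFormat. A minimal driver sketch — the paths are hypothetical, and SequenceFile inputs are assumed:

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SequenceFileInputFormat;
import org.apache.hadoop.mapred.join.CompositeInputFormat;

public class JoinDriver {
    public static void main(String[] args) {
        JobConf conf = new JobConf(JoinDriver.class);
        conf.setInputFormat(CompositeInputFormat.class);
        // "inner" keeps only keys present in both data sets; each map call
        // then receives a TupleWritable with one value from each set.
        conf.set("mapred.join.expr", CompositeInputFormat.compose(
            "inner", SequenceFileInputFormat.class,
            new Path("/data/means"), new Path("/data/objects")));
        // ... set mapper, output path, etc., then JobClient.runJob(conf)
    }
}
```

The join happens inside the record reader, so no shuffle is needed — which is why the sort order and partitioning preconditions above are mandatory.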
atrices and the combined matrix. Here I can use your trick.
cheers,
Christian
Todd Lipcon wrote:
On Sat, Apr 4, 2009 at 2:11 PM, Christian Ulrik Søttrup wrote:
Hello all,
I need to do some calculations that have to merge two sets of very large
data (basically calculating variance).
One set contains a set of "means" and the second a set of objects tied
to a mean.
Normally I would send the set of means using the distributed cache, but
the set has bec
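For reference, variance can be merged from per-partition partial sums (count, sum, sum of squares), which is the usual way to combine such data sets without shipping one of them whole. A plain-Java sketch — the class and method names are mine, not from the thread:

```java
// Partial sufficient statistics for (population) variance; merging two
// partials is just adding the three sums component-wise.
public class Stats {
    long n;
    double sum;
    double sumSq;

    void add(double x) { n++; sum += x; sumSq += x * x; }

    static Stats merge(Stats a, Stats b) {
        Stats m = new Stats();
        m.n = a.n + b.n;
        m.sum = a.sum + b.sum;
        m.sumSq = a.sumSq + b.sumSq;
        return m;
    }

    double mean() { return sum / n; }
    double variance() { return sumSq / n - mean() * mean(); }
}
```

Each mapper or reducer can emit one `Stats` triple per key; the merge is associative, so it also works as a combiner.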
Hi all,
I am interested in saving the output of both the mapper and the
reducer in HDFS; is there an efficient way of doing this?
Of course I could just run the mapper followed by the identity reducer,
and then an identity mapper with my reducer. However,
it seems like a waste to run the fr
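The two-pass workaround described above comes out roughly like this in the old mapred API — a driver sketch in which the `My*` classes and all paths are hypothetical:

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;

public class TwoPassDriver {
    public static void main(String[] args) throws Exception {
        // Pass 1: real mapper, identity reducer -> map output lands in HDFS.
        JobConf job1 = new JobConf(TwoPassDriver.class);
        job1.setMapperClass(MyMapper.class);          // hypothetical
        job1.setReducerClass(IdentityReducer.class);
        FileInputFormat.setInputPaths(job1, new Path("/input"));
        FileOutputFormat.setOutputPath(job1, new Path("/map-output"));
        JobClient.runJob(job1);

        // Pass 2: identity mapper, real reducer -> final output.
        JobConf job2 = new JobConf(TwoPassDriver.class);
        job2.setMapperClass(IdentityMapper.class);
        job2.setReducerClass(MyReducer.class);        // hypothetical
        FileInputFormat.setInputPaths(job2, new Path("/map-output"));
        FileOutputFormat.setOutputPath(job2, new Path("/final-output"));
        JobClient.runJob(job2);
    }
}
```

The cost of this approach is a second full pass over the map output, including an extra shuffle, which is the waste the question is about.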
Christian Ulrik Søttrup wrote:
Ok, so I added the JAR to the cacheArchive option and my command looks
like this:
hadoop jar streaming/hadoop-0.17.0-streaming.jar -input /store/ -output
/cout/ -mapper MyProg -combiner testlink/combiner.class -reducer
testlink/reduce.class -file /home/hadoop/MyProg -cacheFile
/shared/part-0
Hi Shirley,
I am basically doing as Qin suggested.
I am running a job iteratively until some condition is met.
My main looks something like this (in pseudocode):
main:
  while (!converged):
    make new jobconf
    setup jobconf
    run jobconf
    check reporter for statistics
    decide if converged
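In the old mapred API that loop comes out roughly as below; `MyJob` and the `SWAPS` counter enum are hypothetical names, and it needs a running cluster to execute:

```java
// Driver-side convergence loop mirroring the pseudocode above.
boolean converged = false;
while (!converged) {
    JobConf conf = new JobConf(MyJob.class);        // make new jobconf
    // ... setup: input/output paths, mapper, reducer, formats ...
    RunningJob rj = JobClient.runJob(conf);         // run (blocks until done)
    Counters cs = rj.getCounters();                 // check reported statistics
    long swaps = cs.getCounter(MyJob.Stats.SWAPS);  // hypothetical counter
    converged = (swaps == 0);                       // decide if converged
}
```

Each iteration is a separate job, so intermediate output paths typically need to be varied per iteration (e.g. by suffixing the iteration number).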
I use a c
Sure, I'm interested. Copenhagen is fine for me.
Cheers,
Christian
Mads Toftum wrote:
On Mon, Jul 21, 2008 at 03:52:01PM +0200, tim robertson wrote:
Is there a user base in Scandinavia that would be interested in meeting to
exchange feedback / ideas ?
(in English...)
Yeah, I'd be inter
Hi,
I use a counter in my reducer to check whether another iteration (of the
map/reduce cycle) is necessary. I have a similar declaration to yours.
Then in my main program I have:
***
client.setConf(conf);
RunningJob rj = JobClient.runJob(conf);
Counters cs = rj.getCounters();
long swaps=cs.getCou
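The reducer side of that pattern looks roughly like this (enum and counter names are hypothetical): declare an enum, bump it through the Reporter, and the driver reads the total back via rj.getCounters() as shown.

```java
import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class MyReducer extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {
    public enum Stats { SWAPS }  // hypothetical counter

    public void reduce(Text key, Iterator<IntWritable> values,
                       OutputCollector<Text, IntWritable> out,
                       Reporter reporter) throws IOException {
        // ... real reduce work elided ...
        reporter.incrCounter(Stats.SWAPS, 1);  // driver sees the summed total
    }
}
```

Counters are aggregated across all tasks, so the driver-side value is the cluster-wide sum.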
Hi,
you can simply use the built in reducer that just copies the map output:
conf.setReducerClass(org.apache.hadoop.mapred.lib.IdentityReducer.class);
Cheers,
Christian
Zhou, Yunqing wrote:
I only use it to do something in parallel, but the reduce step will cost me
several additional days, is