Hi
I've been trying to embed MiniDFSCluster into my unit tests for a long time,
always giving up because it kept failing. Yesterday I gave it another try,
accidentally ran the test with an Oracle JVM (my default is IBM's), and it
passed!
I'm running on Windows 7 64-bit with hadoop-0.20.2.jar.
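For anyone who wants to try the same thing, here is a minimal sketch of such a test against the 0.20.2 API (JUnit 3 style, as Hadoop's own tests use; the class and path names are made up):

import junit.framework.TestCase;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.MiniDFSCluster;

public class MiniDfsSmokeTest extends TestCase {
  public void testCreateFile() throws Exception {
    Configuration conf = new Configuration();
    // One datanode, freshly formatted, default rack topology.
    MiniDFSCluster cluster = new MiniDFSCluster(conf, 1, true, null);
    try {
      FileSystem fs = cluster.getFileSystem();
      Path p = new Path("/smoke/hello.txt");
      fs.create(p).close();
      assertTrue(fs.exists(p));
    } finally {
      cluster.shutdown();
    }
  }
}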
Hi
I've checked out Hadoop-0.20.2 from
http://svn.apache.org/repos/asf/hadoop/common/tags/release-0.20.2, and from
Cygwin I run 'ant test-core -Dtestcase=TestLocalDFS'. The test fails.
Nothing is printed to the console, but
build/test/TEST-org.apache.hadoop.hdfs.TestLocalDFS.txt shows errors like
> from the OutputFormats bundled with Hadoop. You might
> start there.
>
> Again, it's not clear what your goal is or what you mean by "index".
> Are the input records changed before being written by the reduce? Or
> is the purpose of this job only to concatenate index files?
bq. If you can change your job to handle metadata backed by a store in HDFS
I have two Mappers, one that works with HDFS and one with GPFS. The GPFS one
does exactly that -- it stores the indexes in GPFS (which all Mappers and
Reducers see, as a shared location) and outputs just the pointer to that
shared location. I don't mind
writing some classes if that's what it takes ...
Shai
On Thu, Apr 14, 2011 at 9:50 PM, Harsh J wrote:
> Hello Shai,
>
> On Fri, Apr 15, 2011 at 12:01 AM, Shai Erera wrote:
> > Hi
> > I'm running on Hadoop 0.20.2 and I have a job with the following nature:
Hi
I'm running on Hadoop 0.20.2 and I have a job with the following nature:
* Mapper outputs very large records (50 to 200 MB)
* Reducer (single) merges all those records together
* Map output key is a constant (could be a NullWritable, but currently it's
a LongWritable(1))
* Reducer doesn't care about the key
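A sketch of how such a job could be wired up on the 0.20.2 mapred API (the BytesWritable value type and the class names are assumptions, not necessarily what the actual job uses):

import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class MergeJob {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(MergeJob.class);
    conf.setJobName("merge-large-records");
    // A single reducer sees every record; NullWritable as the map output
    // key avoids shuffling a meaningless LongWritable(1) around.
    conf.setNumReduceTasks(1);
    conf.setMapOutputKeyClass(NullWritable.class);
    conf.setMapOutputValueClass(BytesWritable.class);
    // conf.setMapperClass(...), conf.setReducerClass(...), and the
    // input/output paths are omitted here.
    JobClient.runJob(conf);
  }
}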
> has your input format.
> You may find
> http://hadoop.apache.org/common/docs/r0.20.0/api/org/apache/hadoop/examples/MultiFileWordCount.html
> useful.
>
>
> On Thu, Nov 25, 2010 at 12:01 PM, Shai Erera wrote:
>
>> I wasn't talking about how to configure
It's not critical if
it can't be done, but it can improve the performance of my job if it can be
done.
Thanks
Shai
On Thu, Nov 25, 2010 at 9:55 PM, Niels Basjes wrote:
> Hi,
>
> 2010/11/25 Shai Erera:
> > Is there a way to make MapReduce create exactly N Mappers? More
> > specifically, if say my data can be split to 200 Mappers, and I have only
> > 100 cores, how can I ensure only 100 Mappers will be created?
Hi
Is there a way to make MapReduce create exactly N Mappers? More
specifically, if say my data can be split to 200 Mappers, and I have only
100 cores, how can I ensure only 100 Mappers will be created? The number of
cores is not something I know in advance, so writing a special InputFormat
might not work.
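For what it's worth, on 0.20.x the mapper count is driven by the splits the InputFormat produces; the JobConf setting is only a hint. A sketch of the two usual knobs (the input-size figure is a placeholder):

import org.apache.hadoop.mapred.JobConf;

public class MapCountHint {
  public static void main(String[] args) {
    JobConf conf = new JobConf(MapCountHint.class);
    // Hint only: FileInputFormat may still create more (or fewer) splits.
    conf.setNumMapTasks(100);
    // The reliable lever for FileInputFormat-based jobs: a larger minimum
    // split size means fewer, larger splits, and therefore fewer mappers.
    long totalInputBytes = 100L * 1024 * 1024 * 1024; // placeholder figure
    conf.setLong("mapred.min.split.size", totalInputBytes / 100);
  }
}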
Hi
I need to implement a Writable, which contains a lot of data, and
unfortunately I cannot break it down to smaller pieces. The output of a
Mapper is potentially a large record, which can be of any size ranging from
a few tens of MBs to a few hundreds of MBs.
Is there a way for me to de-serialize the Writable without reading it all
into memory at once?
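For context, the Writable contract forces readFields() to consume the whole record in one call; a minimal sketch of a (hypothetical) length-prefixed large-record Writable shows where the memory cost comes from:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

public class LargeRecordWritable implements Writable {
  private byte[] data = new byte[0];

  public void set(byte[] bytes) { this.data = bytes; }
  public byte[] get() { return data; }

  public void write(DataOutput out) throws IOException {
    out.writeInt(data.length);
    out.write(data);
  }

  public void readFields(DataInput in) throws IOException {
    int len = in.readInt();
    data = new byte[len];        // the entire record is buffered in memory
    in.readFully(data, 0, len);
  }
}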
On Thursday, July 29, 2010, Ferdy Galema wrote:
>
> Very well. Could you keep us informed on how your instant merging plans
> work out? We're actually running a similar indexing process.
>
> It's very interesting to be able to start merging Lucene index
> (pun intended) the amount of data in the pipeline. For example running 400
> maps and 10 reduces followed by another job with a single reducer will not
> benefit if the single reducer has to process the same amount of data that the
> previous reducers have been outputting. Therefore it completely depends on
> what your reducer actually does.
> merged, you could use "hadoop fs -getmerge ..." to pull a merged copy off
> the DFS.
>
> Btw I share your opinion on keeping your Map/Reduce functions
> single-threaded (thus simple) when possible. The Hadoop framework will be
> able to run your application concurrently by using multiple tasks.
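The programmatic counterpart of 'hadoop fs -getmerge' on 0.20.x is FileUtil.copyMerge; a minimal sketch (both paths are hypothetical):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class MergeOutputs {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // Concatenate all files under the job output dir into a single file.
    FileUtil.copyMerge(fs, new Path("/user/shai/job-output"),
                       fs, new Path("/user/shai/merged"),
                       false /* keep the source files */, conf, null);
  }
}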
> Regarding your question, I
> believe that the copy stage may start before all mappers are finished.
> However, the sorting and application of your reduce function cannot proceed
> until each mapper is finished.
>
> Could you describe your problem in more detail?
>
> Regards,
> Greg Lawre
Thanks for the prompt response, Amogh!
I'm kind of a rookie w/ Hadoop, so please forgive my perhaps "too rookie"
questions :).
> Check the property mapred.reduce.slowstart.completed.maps
>
From what I read here (
http://hadoop.apache.org/common/docs/current/mapred-default.html), this
parameter controls what fraction of the maps should complete before the
reducers are scheduled.
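Setting it in the job config is a one-liner; a sketch (the 0.80 value is arbitrary; the 0.20.x default of 0.05 starts the copy phase almost immediately):

import org.apache.hadoop.mapred.JobConf;

public class SlowStart {
  public static void main(String[] args) {
    JobConf conf = new JobConf();
    // Schedule reducers only after 80% of the maps have completed.
    conf.setFloat("mapred.reduce.slowstart.completed.maps", 0.80f);
  }
}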
Hi
I have a scenario for which I'd like to write a MR job in which Mappers do
some work and eventually the output of all Mappers needs to be combined by a
single Reducer. Each Mapper outputs a key that is distinct from all
other Mappers', meaning the Reducer.reduce() method always receives a single
element in its values iterator.
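A minimal sketch of what that single Reducer could look like on the 0.20.2 mapred API (the Text/BytesWritable types are assumptions):

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class CombineReducer extends MapReduceBase
    implements Reducer<Text, BytesWritable, Text, BytesWritable> {
  public void reduce(Text key, Iterator<BytesWritable> values,
                     OutputCollector<Text, BytesWritable> output,
                     Reporter reporter) throws IOException {
    // With distinct keys per mapper, this loop runs exactly once per key.
    while (values.hasNext()) {
      output.collect(key, values.next());
    }
  }
}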