On 05/25/2011 04:27 PM, Giridhar Addepalli wrote:
Hi,
We have a MapReduce program which writes data to a MySQL database using
DBOutputFormat.
Our program has one reducer.
I understand that all the inserts happen during the close() operation of
the reducer.
Is it guaranteed that this operation is atomic? i.e., what happens if
the writes fail in
Sorry, it is working. I was not giving the right value with
-Dmapred.max.split.size.
Thanks for your help!
On Wed, May 25, 2011 at 11:34 AM, Mapred Learn wrote:
> Hi Harsh,
> I just implemented a CombineFileInputFormat and its record reader for my
> case.
>
> Now my input has 10 files each of 233
Hi,
> There are lots of SequenceFiles in HDFS; how can I merge them into one
> SequenceFile?
The simplest way to do that is to create a job with:
- input format = sequence file
- map = identity mapper
- reduce = identity reducer
- output = sequence file
and
job.setNumReduceTasks(1)
However: I think
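The recipe above can be sketched as a small driver (a sketch against the Hadoop 0.20 "new" mapreduce API, not tested on a cluster here; the class name MergeSequenceFiles, the Text key/value types, and the paths are placeholders for your data):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

public class MergeSequenceFiles {
  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "merge-seqfiles");
    job.setJarByClass(MergeSequenceFiles.class);

    // Read and write SequenceFiles; with no Mapper/Reducer classes set,
    // the defaults are identity, so records pass through unchanged.
    job.setInputFormatClass(SequenceFileInputFormat.class);
    job.setOutputFormatClass(SequenceFileOutputFormat.class);

    // Adjust to the key/value classes your files actually contain.
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);

    // A single reducer produces a single output SequenceFile.
    job.setNumReduceTasks(1);

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Keep in mind the trade-off of the single reducer: all the data funnels through one task (and gets sorted by key on the way), which can be slow for large inputs.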
Thanks a lot! Your help was invaluable! People like you, who answer
anyone, are heroes! Thanks mate! Hope to talk again! :D
Hi Harsh,
I just implemented a CombineFileInputFormat and its record reader for my
case.
Now my input has 10 files, each of 233 MB, and by using this, my job runs
just 1 mapper that processes them all.
How can I control it by split size, i.e. if I say make every split 1 GB, i.e.
run 3 mappers for these
Yes, that's a good idea!!! You've got a point!
Configuration is basically serialized to an XML file and shipped to
the worker machines on submission of a job. What are you looking to do
exactly, and why can't you instantiate the class again in the tasks?
On Wed, May 25, 2011 at 11:30 PM, Michael Giannakopoulos
wrote:
Does anyone know how to save and retrieve an instance of a class
using the Configuration class?
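One common workaround (a sketch, and not a Hadoop-specific API: the class ConfSerde and the methods encode/decode are made up here) is to Java-serialize the instance, Base64-encode the bytes, and store the resulting string under a key; the task then reads the string back and deserializes it. With Hadoop's Configuration this would look like conf.set("my.key", ConfSerde.encode(obj)) at submission time and ConfSerde.decode(context.getConfiguration().get("my.key")) inside the task:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Base64;

public class ConfSerde {
  // Serialize an object and encode the bytes as a plain String,
  // safe to store in an XML-backed key/value store.
  static String encode(Serializable obj) {
    try {
      ByteArrayOutputStream bos = new ByteArrayOutputStream();
      try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
        oos.writeObject(obj);
      }
      return Base64.getEncoder().encodeToString(bos.toByteArray());
    } catch (IOException e) {
      throw new RuntimeException(e);
    }
  }

  // Reverse the encoding: Base64 string back to the original object.
  static Object decode(String s) {
    try {
      byte[] bytes = Base64.getDecoder().decode(s);
      try (ObjectInputStream ois =
               new ObjectInputStream(new ByteArrayInputStream(bytes))) {
        return ois.readObject();
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new RuntimeException(e);
    }
  }

  public static void main(String[] args) {
    String encoded = encode("hello");
    System.out.println(decode(encoded)); // prints: hello
  }
}
```

The class must implement Serializable, and note Harsh's point below: often it is simpler to store just the parameters and re-instantiate the class inside the task.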
I gave mapred.min.split.size = 1 GB, and each input file is 233 MB
and block size = 64 MB.
With all these values, I thought my split size setting would work and 4 input files
would be combined to get a 1 GB input split, but somehow this does not happen
and I get 10 mappers, each corresponding to 233
Thanks a lot! I'll try it...
I haven't gone through the whole thing, but getting the Configuration
object via a static member "conf" set only during submission (main())
will not work - and is probably why there's an NPE.
Use the Context object in the map() call to get a configuration
instance. That is the only right way I kno
Alright, I'll send you the code (it's an amateur application). Any help is
appreciated! (Don't bother with the Flickrj API)... And something else: how
do you debug a map/reduce app so as to be sure what happens? I use Eclipse
and Hadoop's plugin for Eclipse (Galileo). Thanks a lot!
MetaFlickrPro
Based on your stacktrace, the 'Task' did begin alright. (This is
post-configuration/setup)
You're getting an NPE on
metaFlickrPro.PhotosDownload$MapClass.map(PhotosDownload.java:124)
It's not possible for us to tell why, since the point where it was thrown is
in your custom code - and we do not hav
Hello guys,
I have written an application that downloads metadata from 3 groups of
Flickr, and I implemented a map/reduce task so that the metadata is processed by 3
different mappers (each corresponds to one group...). My app runs in single
mode, but when I try to run it in pseudo-distributed mode had
Thanks Juwei!
I will go through this..
Sent from my iPhone
On May 25, 2011, at 7:51 AM, Juwei Shi wrote:
> The following are suitable for hadoop 0.20.2.
>
> 2011/5/25 Juwei Shi
> The input split size is determined by mapred.min.split.size, dfs.block.size and
> mapred.map.tasks.
>
> goalSize
The following are suitable for hadoop 0.20.2.
2011/5/25 Juwei Shi
> The input split size is determined by mapred.min.split.size, dfs.block.size and
> mapred.map.tasks.
>
> goalSize = totalSize / mapred.map.tasks
> minSize = max(mapred.min.split.size, minSplitSize)
> splitSize = max(minSize, min(goa
The input split size is determined by mapred.min.split.size, dfs.block.size and
mapred.map.tasks.
goalSize = totalSize / mapred.map.tasks
minSize = max(mapred.min.split.size, minSplitSize)
splitSize = max(minSize, min(goalSize, dfs.block.size))
minSplitSize is determined by each InputFormat such as
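The formula above can be checked numerically. The sketch below plugs in the numbers from this thread (10 files of 233 MB, 64 MB blocks), assuming the defaults of mapred.map.tasks = 2 and mapred.min.split.size = 1; the class name SplitSizeCalc is made up for illustration:

```java
public class SplitSizeCalc {
  // splitSize = max(minSize, min(goalSize, blockSize))
  static long splitSize(long goalSize, long minSize, long blockSize) {
    return Math.max(minSize, Math.min(goalSize, blockSize));
  }

  public static void main(String[] args) {
    final long MB = 1024L * 1024L;
    long totalSize = 10L * 233 * MB;  // 10 input files of 233 MB each
    long goalSize = totalSize / 2;    // totalSize / mapred.map.tasks (default 2)
    long blockSize = 64 * MB;         // dfs.block.size

    // With defaults, the block size wins: 64 MB splits.
    System.out.println(splitSize(goalSize, 1, blockSize) / MB);         // 64

    // Raising mapred.min.split.size to 1 GB forces 1 GB splits.
    System.out.println(splitSize(goalSize, 1024 * MB, blockSize) / MB); // 1024
  }
}
```

Note, though, that as far as I know plain FileInputFormat computes splits per file, so even a 1 GB minimum split yields one split per 233 MB file; combining several small files into one split is what CombineFileInputFormat (discussed elsewhere in this thread) is for.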
Resending >
> Hi,
> I have input splits that are only a few MB in size.
> I want to submit 1 GB of input to every mapper. Does anyone know how I can do
> it?
> Currently each mapper gets one input split, which results in many small
> map-output files.
>
> I tried setting -Dmapred.map.min.spli