Hi,
I'm trying to model an embarrassingly parallel problem as a map-reduce job.
The amount of data is small -- about 100MB per job, and about 0.25MB per work
item -- but the reduce phase is very CPU-intensive, requiring about 30 seconds
to reduce each mapper's output to a single value.

The 100MB is very small, so the overhead of putting the data in HDFS is also
very small. Does it even make sense to optimize this? (Reading/writing will
only take a second or so.) If you don't want to stream data to HDFS and you
have very little data, then you should look into alternative high
Hello everyone,
I have a question about the io.sort.mb property. The documentation says that
io.sort.mb is the total amount of buffer memory to use while sorting files.
My question is: does it include both the keys and values of the records,
or just the keys (and perhaps some pointers to the values)?
You should consider writing a custom InputFormat which reads directly from
the database -- while FileInputFormat is the most common base class for
InputFormats, nothing in the InputFormat contract, or in its critical method
getSplits(), requires HDFS. A custom version can return database entries as
input records.
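To make that concrete, here is a minimal sketch of such an InputFormat.
Everything in it (the class names, the id-range splitting, the faked row
lookup) is invented for illustration -- a real implementation would run a
query per split, or you could start from the stock
org.apache.hadoop.mapreduce.lib.db.DBInputFormat:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.InputFormat;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;

// Hypothetical InputFormat that never touches HDFS: splits are plain
// id ranges, and records are produced directly by the reader.
public class RangeInputFormat extends InputFormat<LongWritable, Text> {

  // A split is just a [start, end) range of database ids.
  public static class RangeSplit extends InputSplit implements Writable {
    long start, end;
    public RangeSplit() {}                      // needed for deserialization
    RangeSplit(long s, long e) { start = s; end = e; }
    public long getLength() { return end - start; }
    public String[] getLocations() { return new String[0]; } // no locality hints
    public void write(DataOutput out) throws IOException {
      out.writeLong(start); out.writeLong(end);
    }
    public void readFields(DataInput in) throws IOException {
      start = in.readLong(); end = in.readLong();
    }
  }

  @Override
  public List<InputSplit> getSplits(JobContext context) {
    // Nothing here reads HDFS; we just carve the id space into chunks.
    List<InputSplit> splits = new ArrayList<>();
    long total = 100_000, chunk = 1_000;        // made-up sizes
    for (long s = 0; s < total; s += chunk)
      splits.add(new RangeSplit(s, Math.min(s + chunk, total)));
    return splits;
  }

  @Override
  public RecordReader<LongWritable, Text> createRecordReader(
      InputSplit split, TaskAttemptContext context) {
    return new RecordReader<LongWritable, Text>() {
      long start, end, cur;
      public void initialize(InputSplit s, TaskAttemptContext c) {
        RangeSplit r = (RangeSplit) s;
        start = r.start; end = r.end; cur = start - 1;
      }
      public boolean nextKeyValue() { return ++cur < end; }
      public LongWritable getCurrentKey() { return new LongWritable(cur); }
      public Text getCurrentValue() {
        // A real reader would fetch row `cur` from the database here.
        return new Text("row-" + cur);
      }
      public float getProgress() {
        return end == start ? 1f : (float) (cur - start) / (end - start);
      }
      public void close() {}
    };
  }
}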
It accounts for both keys and values.
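If you want to tune it per job, here is a quick sketch (the class and job
names are made up; note that in Hadoop 2.x the property was renamed
mapreduce.task.io.sort.mb, and the value is in megabytes):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class SortBufferExample {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    // Raise the map-side sort buffer to 256MB; it is allocated inside
    // the task JVM, so it has to fit in the task's heap.
    conf.setInt("mapreduce.task.io.sort.mb", 256);
    Job job = Job.getInstance(conf, "sort-buffer-example");
  }
}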
+Vinod
Hortonworks Inc.
http://hortonworks.com/
On Sun, Nov 9, 2014 at 11:54 AM, Muhuan Huang mhhu...@cs.ucla.edu wrote:
Hello everyone,
I have a question about the io.sort.mb property. The documentation says that
io.sort.mb is the total amount of buffer
You’re right, 100MB is small, but if there are 100,000 jobs, the overhead of
copying data to HDFS adds up: 100MB x 100,000 jobs is on the order of 10TB
written, before replication. I guess my main concern was whether allowing the
mappers to fetch the input data themselves would violate some technical rule
or map-reduce principle.
I have considered alternative solutions like
Ah, yes, I remember reading about custom InputFormats but did not realize they
could bypass HDFS entirely. That sounds like a good solution; I will look into
it.
Thanks,
Trevor
On Nov 9, 2014, at 12:48 PM, Steve Lewis lordjoe2...@gmail.com wrote:
You should consider writing a custom InputFormat
Hi Pumudu,
Thanks for your reply.
The command used to mount on Windows was
mount -o nolock mtype=hard 10.66.27.171:/! W:
I was able to browse, but unable to write files. When I try to create/write
a file, it gives me the following error,
ERROR nfs3.RpcProgramNfs3: Setting file size
Hi Gautam,
Did you set access privileges in hdfs-site.xml? If access privileges are
not set, it defaults to read-only, as far as I can remember. Please add the
following property to hdfs-site.xml and restart HDFS and the NFS gateway.
--
property
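For reference, the access-privilege setting in the NFS gateway documentation
looks like the block below; assuming dfs.nfs.exports.allowed.hosts is the
property meant here, the value "* rw" grants read-write access to every host:

<property>
  <name>dfs.nfs.exports.allowed.hosts</name>
  <!-- assumed value: allow read-write access from all client hosts -->
  <value>* rw</value>
</property>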