Is it wrong to bypass HDFS?

2014-11-09 Thread Trevor Harmon
Hi, I’m trying to model an embarrassingly parallel problem as a map-reduce job. The amount of data is small -- about 100MB per job, and about 0.25MB per work item -- but the reduce phase is very CPU-intensive, requiring about 30 seconds to reduce each mapper's output to a single value. The…
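For a sense of scale, the figures in the post already pin down how much parallelism is on the table. A quick back-of-the-envelope check (the sizes and per-item time are taken straight from the message; the arithmetic is illustrative):

```java
public class JobSizing {
    public static void main(String[] args) {
        // Figures from the post: ~100 MB of input per job,
        // ~0.25 MB per work item, ~30 s of CPU to reduce one item.
        final double jobMb = 100.0;
        final double itemMb = 0.25;
        final int secPerItem = 30;

        long items = Math.round(jobMb / itemMb);       // work items per job
        long serialMinutes = items * secPerItem / 60;  // reduce time if serial
        System.out.println(items + " work items, ~" + serialMinutes
                + " minutes of reduce work if run serially");
    }
}
```

With the thread's figures this works out to 400 work items and about 200 minutes of serial reduce time per job, which is why the CPU-bound reduce phase, not the 100MB of I/O, dominates.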

Re: Is it wrong to bypass HDFS?

2014-11-09 Thread Dieter De Witte
100MB is very small, so the overhead of putting the data in HDFS is also very small. Does it even make sense to optimize this? (Reading/writing will only take a second or so.) If you don't want to stream data to HDFS and you have very little data, then you should look into alternative high…

Does io.sort.mb count in the records or just the keys?

2014-11-09 Thread Muhuan Huang
Hello everyone, I have a question about the io.sort.mb property. The documentation says that io.sort.mb is the total amount of buffer memory to use while sorting files. My question: does it include both the keys and values of the records, or just the keys (and perhaps some pointers to the values)?

Re: Is it wrong to bypass HDFS?

2014-11-09 Thread Steve Lewis
You should consider writing a custom InputFormat which reads directly from the database. While FileInputFormat is the most common InputFormat implementation, nothing in the InputFormat contract, or in what its critical getSplits method returns, requires HDFS. A custom version can return database entries as…
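To make the suggestion concrete, the heart of such a custom InputFormat is its getSplits() partitioning. Below is a minimal, self-contained sketch of that logic over database row IDs; the names (DbSplitter, RowRange) are hypothetical, and a real implementation would extend org.apache.hadoop.mapreduce.InputFormat, wrap each range in an InputSplit, and supply a RecordReader that queries the database for its range:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the partitioning a custom InputFormat's getSplits() would do
// when reading straight from a database instead of HDFS.
public class DbSplitter {
    /** One contiguous range of database rows, handed to one mapper. */
    public static final class RowRange {
        public final long startRow, numRows;
        RowRange(long startRow, long numRows) {
            this.startRow = startRow;
            this.numRows = numRows;
        }
    }

    /** Divide totalRows into at most numSplits near-equal ranges. */
    public static List<RowRange> getSplits(long totalRows, int numSplits) {
        List<RowRange> splits = new ArrayList<>();
        long chunk = (totalRows + numSplits - 1) / numSplits; // ceiling division
        for (long start = 0; start < totalRows; start += chunk) {
            splits.add(new RowRange(start, Math.min(chunk, totalRows - start)));
        }
        return splits;
    }

    public static void main(String[] args) {
        // 400 work items split across 8 mappers -> ranges of 50 rows each.
        for (RowRange r : getSplits(400, 8)) {
            System.out.println("rows " + r.startRow + ".."
                    + (r.startRow + r.numRows - 1));
        }
    }
}
```

Because no file paths are involved, nothing here touches HDFS; each mapper's RecordReader would open its own database connection for its assigned range.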

Re: Does io.sort.mb count in the records or just the keys?

2014-11-09 Thread Vinod Kumar Vavilapalli
It accounts for both keys and values. +Vinod Hortonworks Inc. http://hortonworks.com/ On Sun, Nov 9, 2014 at 11:54 AM, Muhuan Huang mhhu...@cs.ucla.edu wrote: Hello everyone, I have a question about the io.sort.mb property. The document says that io.sort.mb is the total amount of buffer…
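For anyone tuning this: since the buffer must hold serialized keys and values plus per-record accounting metadata, value-heavy jobs may need a larger setting than key sizes alone would suggest. A sketch of the mapred-site.xml entry (the 200 is illustrative only; note that Hadoop 2.x renamed the property to mapreduce.task.io.sort.mb):

```xml
<!-- mapred-site.xml: size of the in-memory map-side sort buffer, in MB.
     It holds serialized keys AND values, plus per-record metadata. -->
<property>
  <name>io.sort.mb</name>
  <value>200</value>
</property>
```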

Re: Is it wrong to bypass HDFS?

2014-11-09 Thread Trevor Harmon
You’re right, 100MB is small, but if there are 100,000 jobs, the overhead of copying data to HDFS adds up. I guess my main concern was whether allowing mappers to fetch the input data would violate some technical rule or map-reduce principle. I have considered alternative solutions like…
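Putting the aggregate overhead in numbers shows why it adds up: at the stated sizes, 100,000 jobs stage roughly 10 TB through HDFS. A rough illustration (the 100 MB/s sustained ingest rate is an assumption for the sake of the estimate, not a measurement):

```java
public class CopyOverhead {
    public static void main(String[] args) {
        // From the thread: ~100,000 jobs at ~100 MB each.
        // Assumed for illustration: ~100 MB/s sustained write rate into HDFS.
        final long jobs = 100_000;
        final long mbPerJob = 100;
        final long mbPerSecond = 100;

        long totalMb = jobs * mbPerJob;              // 10,000,000 MB = 10 TB
        long hours = totalMb / mbPerSecond / 3600;   // pure transfer time
        System.out.println(totalMb / 1_000_000 + " TB copied, ~" + hours
                + " hours of pure transfer");
    }
}
```

Even with generous throughput assumptions, that is more than a day of copying that a direct-from-database InputFormat would avoid entirely.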

Re: Is it wrong to bypass HDFS?

2014-11-09 Thread Trevor Harmon
Ah, yes, I remember reading about custom InputFormats but did not realize they could bypass HDFS entirely. Sounds like a good solution, I will look into it. Thanks, Trevor On Nov 9, 2014, at 12:48 PM, Steve Lewis lordjoe2...@gmail.com wrote: You should consider writing a custom InputFormat…

Re: Error while writing to HDFS NFS from windows

2014-11-09 Thread Gautam Hegde
Hi Pumudu, Thanks for your reply. The command used to mount on Windows was: mount -o nolock mtype=hard 10.66.27.171:/! W:. I was able to browse, but unable to write files. When I try to create/write a file, it gives me the following error: ERROR nfs3.RpcProgramNfs3: Setting file size…

Re: Error while writing to HDFS NFS from windows

2014-11-09 Thread Pumudu ruhunage
Hi Gautam, Did you set access privileges in hdfs-site.xml? If access privileges are not set, it's read-only by default, as far as I can remember. Please add the following property in hdfs-site.xml and restart HDFS and the NFS gateway. -- property…
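The reply is most likely pointing at the NFS gateway's export privilege setting. A sketch of the hdfs-site.xml entry, assuming a Hadoop 2.x release where the property is named dfs.nfs.exports.allowed.hosts (later releases renamed it to nfs.exports.allowed.hosts; verify against your version's docs):

```xml
<!-- hdfs-site.xml: which hosts may mount the NFS export, and with what
     access. "* rw" grants read-write to all clients; scope this down
     (e.g. "10.66.27.0/24 rw") for anything beyond a test setup. -->
<property>
  <name>dfs.nfs.exports.allowed.hosts</name>
  <value>* rw</value>
</property>
```

As the reply notes, both HDFS and the NFS gateway need a restart for the change to take effect.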