job.setOutputFormatClass(NullOutputFormat.class);

2014-07-01 Thread Chris MacKenzie
Hi, what is the anticipated usage of the above with the new API? Is there another way to remove the empty part-r files? When using it with MultipleOutputs to remove empty part-r files, I have no output ;O) Regards, Chris MacKenzie http://www.chrismackenziephotography.co.uk/

The future of MapReduce

2014-07-01 Thread Adaryl Bob Wakefield, MBA
“The Mahout community decided to move its codebase onto modern data processing systems that offer a richer programming model and more efficient execution than Hadoop MapReduce.” Does this mean that learning MapReduce is a waste of time? Is Storm the future or are both technologies necessary?

Re: The future of MapReduce

2014-07-01 Thread Marco Shaw
It depends... It seems most are evolving from needing lots of data crunched, to lots of data crunched right now. Most are looking for *real-time* fraud detection or recommendations, for example, which MapReduce is not ideal for. Marco On Tue, Jul 1, 2014 at 12:00 PM, Adaryl Bob Wakefield, MBA

Jr. Hadoop positions or internships

2014-07-01 Thread Adam Pritchard
Hi everyone, does anyone know of any Jr. Hadoop roles or big data related internships in the Bay Area? I am very motivated to learn Hadoop and big data related technologies; I have quit my full-time job as a web developer and have been teaching myself for the last 5 months. I have been studying

Re: The future of MapReduce

2014-07-01 Thread Adaryl Bob Wakefield, MBA
From your answer, it sounds like you need to be able to do both. From: Marco Shaw Sent: Tuesday, July 01, 2014 10:24 AM To: user Subject: Re: The future of MapReduce It depends... It seems most are evolving from needing lots of data crunched, to lots of data crunched right now. Most are

Re: Jr. Hadoop positions or internships

2014-07-01 Thread Publius
you poor soul, there are internships out there. A Google search for "hadoop internship, san francisco" turns up about 162,000 results. 10 Myths About Hadoop -

Re: The future of MapReduce

2014-07-01 Thread kartik saxena
Spark https://spark.apache.org/ is also getting a lot of attention with its in-memory computation and caching features. Performance-wise it is being touted as better than Mahout, because machine learning involves iterative computations and Spark can cache these computations in memory for faster

Compressing map output

2014-07-01 Thread Mohit Anchlia
I am trying to compress map output but when I add the following code I get errors. Is there anything wrong that you can point me to? conf.setBoolean("mapreduce.map.output.compress", true); conf.setClass("mapreduce.map.output.compress.codec", GzipCodec.class, CompressionCodec.class);
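For reference, a minimal driver sketch that enables map-output compression with the new (mapreduce) API; the class name and job name are illustrative assumptions, not from the original message:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapreduce.Job;

public class CompressedMapOutputDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Compress the intermediate map output (shuffle data),
        // not the final job output.
        conf.setBoolean("mapreduce.map.output.compress", true);
        conf.setClass("mapreduce.map.output.compress.codec",
                GzipCodec.class, CompressionCodec.class);
        // Set properties on conf *before* creating the Job: Job copies
        // the Configuration at construction time.
        Job job = Job.getInstance(conf, "compressed map output");
        // ... set mapper/reducer, input/output paths,
        // then job.waitForCompletion(true)
    }
}
```

Note the property names must be quoted strings; the bare identifiers and asterisks in the quoted snippet look like formatting lost in the mail client.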

Re: The future of MapReduce

2014-07-01 Thread Marco Shaw
Sorry, not sure if that's a question. Hadoop v1 = HDFS + MapReduce. Hadoop v2 = HDFS + YARN (MapReduce is still part of the core, but now considered optional for getting work done). v2 adds a better resourcing framework. Now you can run Storm, Spark, MapReduce, etc. on Hadoop and mix-and-match jobs/tasks with

Re: The future of MapReduce

2014-07-01 Thread Adaryl Bob Wakefield, MBA
It was a declarative statement designed to elicit further explanation. If someone is brand new and trying to figure out how to eat the elephant as it were, you kind of want to burn things down to their essentials. If MapReduce isn’t going to be part of the ecosystem in the future, one does not

Re: The future of MapReduce

2014-07-01 Thread Marco Shaw
Interesting timing: http://java.dzone.com/articles/there-future-mapreduce Google declared last week that MapReduce was dead more or less, but there are very few that process data at Google's level. Makes me wonder what Yahoo has for a tech mix these days... On Tue, Jul 1, 2014 at 6:01 PM,

Re: The future of MapReduce

2014-07-01 Thread snehil wakchaure
Heard about Google Dataflow last week. On Jul 1, 2014 4:42 PM, Marco Shaw marco.s...@gmail.com wrote: Interesting timing: http://java.dzone.com/articles/there-future-mapreduce Google declared last week that MapReduce was dead more or less, but there are very few that process data at

Re: job.setOutputFormatClass(NullOutputFormat.class);

2014-07-01 Thread M. Dale
NullOutputFormat does not generate any output. Good for jobs where counters or some other I/O are your output (for example, http://stackoverflow.com/questions/12707726/run-a-hadoop-job-without-output-file). From Tom White's book it sounds like
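For context, the usual pattern looks like the following sketch (the wrapper class and method are illustrative assumptions; the `job` is an already-configured Job):

```java
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class NullOutputExample {
    public static void configure(Job job) {
        // Discard the job's main output entirely; the job still runs its
        // mappers/reducers, so counters and any side-effect I/O survive.
        job.setOutputFormatClass(NullOutputFormat.class);
    }
}
```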

Re: Compressing map output

2014-07-01 Thread M. Dale
That looks right. Do you consistently get the error below and the total job fails? Does it go away when you comment out the map compression? On 07/01/2014 03:23 PM, Mohit Anchlia wrote: I am trying to compress mapoutput but when I add the following code I get errors. Is there anything wrong

Re: job.setOutputFormatClass(NullOutputFormat.class);

2014-07-01 Thread Shahab Yunus
To get rid of empty part-r files while using MultipleOutputs in the new API, the LazyOutputFormat class's static method should be used to set the output format. Details are here at the official Java docs for MultipleOutputs:
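In code, that amounts to something like this sketch (the named output "text" and the key/value classes are illustrative assumptions):

```java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.LazyOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class LazyOutputExample {
    public static void configure(Job job) {
        // Wrap the real output format: part-r-* files are only created
        // when the first record is actually written, so empty parts
        // never appear on disk.
        LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);
        // Named outputs are still declared on MultipleOutputs as usual.
        MultipleOutputs.addNamedOutput(job, "text",
                TextOutputFormat.class, Text.class, IntWritable.class);
    }
}
```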

Re: Compressing map output

2014-07-01 Thread Mohit Anchlia
Yes it goes away when I comment the map output compression. On Tue, Jul 1, 2014 at 6:38 PM, M. Dale medal...@yahoo.com wrote: That looks right. Do you consistently get the error below and the total job fails? Does it go away when you comment out the map compression? On 07/01/2014 03:23 PM,

Re: Downloading a jar to hadoop's lib folder (classpath)

2014-07-01 Thread Tsuyoshi OZAWA
I added it in the pom.xml file (inside hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-resourcemanager/pom.xml) mvn package -Pdist,native -Dtar How about editing hadoop-project/pom.xml and adding your dependency to it? I think it will work. Thanks, - Tsuyoshi On Wed, Jul

Re: How to make hdfs data rack aware

2014-07-01 Thread hadoop hive
Try running fsck; it will also validate block placement as well as replication. On Jun 27, 2014 6:49 AM, Kilaru, Sambaiah sambaiah_kil...@intuit.com wrote: My topology script is working fine for data I am writing to hdfs. My question is how to make the existing data topology compliant?

Re: job.setOutputFormatClass(NullOutputFormat.class);

2014-07-01 Thread Chris MacKenzie
Hi Markus and Shahab, thanks for getting back to me, I really appreciate it. LazyOutputFormat did the trick. I tried NullOutputFormat (job.setOutputFormatClass(NullOutputFormat.class);) before writing to the group but was getting an empty folder. I looked at LazyOutputFormat, in fact, my mos is