Hi,
What is the anticipated usage of the above with the new API? Is there
another way to remove the empty part-r files?
When using it with MultipleOutputs to remove empty part-r files I get no
output ;O)
Regards,
Chris MacKenzie
http://www.chrismackenziephotography.co.uk/
“The Mahout community decided to move its codebase onto modern data processing
systems that offer a richer programming model and more efficient execution than
Hadoop MapReduce.”
Does this mean that learning MapReduce is a waste of time? Is Storm the future
or are both technologies necessary?
It depends... It seems most are evolving from needing lots of data
crunched, to lots of data crunched right now. Most are looking for
*real-time* fraud detection or recommendations, for example, which
MapReduce is not ideal for.
Marco
On Tue, Jul 1, 2014 at 12:00 PM, Adaryl Bob Wakefield, MBA
Hi everyone,
Does anyone know of any Jr. Hadoop roles or big data related internships in
the Bay Area?
I am very motivated to learn Hadoop and big data related technologies. I
quit my full-time job as a web developer and have been teaching myself for
the last 5 months.
I have been studying
From your answer, it sounds like you need to be able to do both.
From: Marco Shaw
Sent: Tuesday, July 01, 2014 10:24 AM
To: user
Subject: Re: The future of MapReduce
It depends... It seems most are evolving from needing lots of data crunched,
to lots of data crunched right now. Most are
You poor soul! There are internships out there:
hadoop internship, san francisco - Google Search
Spark https://spark.apache.org/ is also getting a lot of attention with its
in-memory computation and caching features. Performance-wise it is being
touted as better than Mahout, because machine learning involves iterative
computations and Spark can cache these computations in memory for faster
I am trying to compress map output, but when I add the following code I get
errors. Is there anything wrong that you can point me to?
conf.setBoolean("mapreduce.map.output.compress", true);
conf.setClass("mapreduce.map.output.compress.codec",
    GzipCodec.class, CompressionCodec.class);
Sorry, not sure if that's a question.
Hadoop v1=HDFS+MapReduce
Hadoop v2=HDFS+YARN (+ MapReduce part of the core, but now considered
optional to get work done)
v2 adds a better resourcing framework. Now you can run Storm, Spark,
MapReduce, etc. on Hadoop and mix-and-match jobs/tasks with
It was a declarative statement designed to elicit further explanation.
If someone is brand new and trying to figure out how to eat the elephant, as
it were, you kind of want to boil things down to their essentials. If MapReduce
isn’t going to be part of the ecosystem in the future, one does not
Interesting timing:
http://java.dzone.com/articles/there-future-mapreduce
Google declared last week that MapReduce was dead more or less, but there
are very few that process data at Google's level.
Makes me wonder what Yahoo has for a tech mix these days...
On Tue, Jul 1, 2014 at 6:01 PM,
I heard about Google Dataflow last week.
On Jul 1, 2014 4:42 PM, Marco Shaw marco.s...@gmail.com wrote:
Interesting timing:
http://java.dzone.com/articles/there-future-mapreduce
Google declared last week that MapReduce was dead more or less, but
there are very few that process data at
NullOutputFormat does not generate any output. Good for jobs where
counters or some other I/O are your output (for example,
http://stackoverflow.com/questions/12707726/run-a-hadoop-job-without-output-file).
From Tom White's book it sounds like
That looks right. Do you consistently get the error below and the total
job fails? Does it go away when you comment out the map compression?
On 07/01/2014 03:23 PM, Mohit Anchlia wrote:
I am trying to compress map output but when I add the following code I
get errors. Is there anything wrong
To get rid of empty part-* files while using MultipleOutputs in the new API,
the LazyOutputFormat class's static method should be used to set the output
format.
Details are in the official Javadocs for MultipleOutputs:
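A minimal driver sketch of that approach, assuming the new
org.apache.hadoop.mapreduce API. The class name, the named output "side",
and the commented-out mapper/reducer are placeholders, not anything from the
thread (this needs the Hadoop jars on the classpath to compile):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.LazyOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class LazyMultipleOutputsDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "lazy-multiple-outputs");
        job.setJarByClass(LazyMultipleOutputsDriver.class);
        // job.setMapperClass(MyMapper.class);   // placeholder mapper
        // job.setReducerClass(MyReducer.class); // reducer writing via mos
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Instead of job.setOutputFormatClass(TextOutputFormat.class),
        // wrap it lazily: the base part-r-* files are then created only
        // if a record is actually written to the default output.
        LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);

        // Named output written through MultipleOutputs in the reducer.
        MultipleOutputs.addNamedOutput(job, "side", TextOutputFormat.class,
                Text.class, Text.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

In the reducer, writes go through `mos.write("side", key, value)` and `mos`
must be closed in `cleanup()`, otherwise output can be lost.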
Yes, it goes away when I comment out the map output compression.
On Tue, Jul 1, 2014 at 6:38 PM, M. Dale medal...@yahoo.com wrote:
That looks right. Do you consistently get the error below and the total
job fails? Does it go away when you comment out the map compression?
On 07/01/2014 03:23 PM,
I added it in the pom.xml file (inside
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-resourcemanager/pom.xml)
mvn package -Pdist,native -Dtar
How about editing hadoop-project/pom.xml and adding your dependency to
it? I think that will work.
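As a sketch, a managed dependency in hadoop-project/pom.xml would look
something like this (the com.example coordinates are made up for
illustration; hadoop-project is the parent POM, so modules inherit the
version from here):

```xml
<dependencyManagement>
  <dependencies>
    <!-- Hypothetical artifact; replace with the real coordinates. -->
    <dependency>
      <groupId>com.example</groupId>
      <artifactId>example-lib</artifactId>
      <version>1.0</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```

The consuming module (here, hadoop-yarn-server-resourcemanager) then declares
the same groupId/artifactId in its own pom.xml without a version.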
Thanks,
- Tsuyoshi
On Wed, Jul
Try running fsck, it will also validate the block placement as well as
replication.
On Jun 27, 2014 6:49 AM, Kilaru, Sambaiah sambaiah_kil...@intuit.com
wrote:
My topology script is working fine for data I am writing to HDFS. My
question is: how do I make the
existing data topology-compliant?
Hi Markus And Shahab,
Thanks for getting back to me, I really appreciate it. LazyOutputFormat did
the trick. I tried NullOutputFormat
(job.setOutputFormatClass(NullOutputFormat.class);) before writing to the
group but was getting an empty folder.
I looked at LazyOutputFormat; in fact, my mos is