Hi guys,
I've had a question in my head for a long time: what's the biggest cluster running
YARN? I've heard rumors about the biggest clusters running MapReduce
1.0 having 10,000+ nodes, but I've rarely heard such rumors about YARN.
Any information about this is welcome, like inside information or
Well... It all depends on where your bottleneck is. Do a benchmark for your
use case if it is critical. Multi-threading might be useful, but not always. And
you would rather avoid having locally shared mutable state,
because it can become a pain to manage. But that doesn't mean you can't do
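For anyone who wants to try threads inside a map task, a minimal sketch using the new-API MultithreadedMapper follows. It is illustrative only and not from this thread: all class and path names are made up, and the wrapped mapper keeps no shared mutable state, in line with the advice above.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer;

public class MultithreadedWordCount {

  // The real map logic. It holds no shared mutable state, so several
  // threads can safely run it concurrently inside one task.
  public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    protected void map(LongWritable key, Text value, Context ctx)
        throws IOException, InterruptedException {
      for (String word : value.toString().split("\\s+")) {
        if (!word.isEmpty()) ctx.write(new Text(word), ONE);
      }
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "multithreaded-wordcount");
    job.setJarByClass(MultithreadedWordCount.class);

    // MultithreadedMapper wraps TokenMapper and feeds it records from N threads.
    job.setMapperClass(MultithreadedMapper.class);
    MultithreadedMapper.setMapperClass(job, TokenMapper.class);
    MultithreadedMapper.setNumberOfThreads(job, 4);

    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Whether this actually helps still depends on where the bottleneck is, e.g. it can pay off when each map() call does blocking I/O, so do benchmark it for your own workload.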
Never mind, it depends on the platform; in my case it would work fine. Thanks guys!
Mark
On Mon, Jan 14, 2013 at 12:23 PM, Mark Olimpiati markq2...@gmail.com wrote:
Thanks Bertrand, I shall try it and hope to gain some speed. One last
question though, do you think the threads used are user-level or
Hi,
I wanted to write my input file, which is in Text format, to TupleWritable,
but when I collect the output using output.collect(intWritable,
tupleWritable), it gives me an empty tuple.
private final static IntWritable one = new IntWritable(1);
public void map(LongWritable key, Text value,
As the doc at
http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapred/join/TupleWritable.html states,
I don't think you should be relying on the join package's
TupleWritable directly and should instead either use your own tested
implementation or an alternative, better serializer such as
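A "your own tested implementation" can be very small. Below is a minimal, illustrative pair Writable (the class and field names are made up); its write() always serializes both fields, so the values survive output.collect and the shuffle, unlike the join package's TupleWritable used outside the join framework.

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;

// A tiny two-field tuple whose fields are always serialized.
public class TextPairWritable implements Writable {
  private final Text first = new Text();
  private final Text second = new Text();

  public TextPairWritable() {}

  public void set(String f, String s) {
    first.set(f);
    second.set(s);
  }

  public Text getFirst() { return first; }
  public Text getSecond() { return second; }

  @Override
  public void write(DataOutput out) throws IOException {
    first.write(out);
    second.write(out);
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    first.readFields(in);
    second.readFields(in);
  }
}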
I was referring to https://issues.apache.org/jira/browse/MAPREDUCE-2374
Dave Shine
Sr. Software Engineer
321.939.5093 direct | 407.314.0122 mobile
CI Boost™ Clients Outperform Online™ www.ciboost.com
-Original Message-
From: Jean-Marc Spaggiari [mailto:jean-m...@spaggiari.org]
Sent:
Thanks Harsh,
Actually I wanted tuplewritable for the MatrixMultiplication Job in Mahout
which takes (intWritable, TupleWritable) as input but now I have created
input file with (intWritable, VectorWritable) which worked perfectly fine
for me.
Thanks
Stuti
From: Harsh J
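For anyone hitting the same thing, a minimal sketch of preparing such an (IntWritable, VectorWritable) SequenceFile is below. It assumes Mahout's math classes (DenseVector, VectorWritable) are on the classpath; the output path and the matrix values are made up.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.mahout.math.DenseVector;
import org.apache.mahout.math.VectorWritable;

public class WriteMatrixRows {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path out = new Path("matrixA/part-00000");   // made-up output path

    SequenceFile.Writer writer = SequenceFile.createWriter(
        fs, conf, out, IntWritable.class, VectorWritable.class);
    try {
      // One record per matrix row: key = row index, value = the row as a vector.
      writer.append(new IntWritable(0),
          new VectorWritable(new DenseVector(new double[] {1.0, 2.0, 3.0})));
      writer.append(new IntWritable(1),
          new VectorWritable(new DenseVector(new double[] {4.0, 5.0, 6.0})));
    } finally {
      writer.close();
    }
  }
}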
Hi all,
I need some help to get the Sao Paulo Hadoop users group in the
HadoopUserGroups wiki page.
Either giving me the rights or making the update will do.
My id: PauloMagalhaes
What I want to add:
=== South America ===
* [http://www.meetup.com/SaoPauloHUG/| Sao Paulo HUG]: Hadoop Users
Hi,
No. I didn't find any reference to a working sample. I also didn't find any
JIRA that asks for a migration of this package to the new API. Not sure
why. I have asked on the dev list.
Thanks
hemanth
On Mon, Jan 14, 2013 at 6:25 PM, Michael Forage
michael.for...@livenation.co.uk wrote:
Hi all,
I need help to get the Sao Paulo Hadoop users group in the HadoopUserGroups
wiki page. Either giving me the rights or making the update will do.
My id: PauloMagalhaes
What I want to add:
=== South America ===
* [http://www.meetup.com/SaoPauloHUG/| Sao Paulo HUG]: Hadoop Users Group in
Thanks. I will see if I can migrate from 1.0.3 to something more recent...
JM
2013/1/14, Dave Shine dave.sh...@channelintelligence.com:
I was referring to https://issues.apache.org/jira/browse/MAPREDUCE-2374
Dave Shine
Sr. Software Engineer
321.939.5093 direct | 407.314.0122 mobile
CI
I've added you as a wiki contributor - feel free to edit the HUGs page
yourself. Thanks for taking the time to contribute!
On Mon, Jan 14, 2013 at 7:01 PM, Paulo E. V. Magalhaes
paulo.magalh...@gmail.com wrote:
Hi all,
I need help to get the Sao Paulo Hadoop users group in the
Done. You can go ahead and edit it in now.
On Mon, Jan 14, 2013 at 7:21 PM, Paulo E. V. Magalhaes
paulo.magalh...@gmail.com wrote:
Hi all,
I need some help to get the Sao Paulo Hadoop users group in the
HadoopUserGroups wiki page.
Either giving me the rights or making the update will do.
Hi ESGLinux,
In production, you need to run QJM on at least 3 nodes. You also need
to run ZKFC on at least 3 nodes. You can run them on the same nodes
if you like, though.
Of course, none of this is needed to set up an example cluster. If
you just want to try something out, you can run
On Mon, Jan 14, 2013 at 11:49 AM, Colin McCabe cmcc...@alumni.cmu.edu wrote:
Hi ESGLinux,
In production, you need to run QJM on at least 3 nodes. You also need
to run ZKFC on at least 3 nodes. You can run them on the same nodes
if you like, though.
Er, this should read You also need to run
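For context, a minimal, illustrative hdfs-site.xml fragment for a QJM-based HA pair with automatic failover might look like the following. The nameservice ID and host names are hypothetical; the three JournalNode hosts are the "QJM on at least 3 nodes", each NameNode host also runs a ZKFC daemon, and automatic failover additionally needs a ZooKeeper ensemble (its ha.zookeeper.quorum setting goes in core-site.xml).

<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://jn1:8485;jn2:8485;jn3:8485/mycluster</value>
</property>
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>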
Hi Pavel,
There's some information about the need for 127.0.1.1 here:
http://www.leonardoborda.com/blog/127-0-1-1-ubuntu-debian/
If you have more questions about this, I recommend asking on a
Kerberos-related mailing list (perhaps the specific one for your
Kerberos implementation).
best,
Colin
I'm trying to run a Map-only job using .gz input format. For testing, I
have one compressed log file in the input directory. If the file is
un-zipped, the code works fine.
Watching the jobs with .gz input via the job tracker shows that the
mapper apparently has read the correct number of records
It's hard to speculate based on the minimal information you've provided so
far. Seems like it's time to break out the performance analysis
toolkit and see what's going on differently between the fast and the
slow nodes. I'd look at raw performance with dd, then watch behavior
during mapper runs with
Hi,
Probably a very lame question.
I have two documents and I want to find the overlap of both documents in
map-reduce fashion and then compare the overlap (let's say I have some
measure to do that).
So this is what I am thinking:
1) Run the normal wordcount job on one document (
Hello All,
I am trying to partition data and sort it in Hadoop streaming.
Most of the time the data is sorted and partitioned correctly, but if I run
it multiple times, sometimes data goes to a different partition.
The data looks like
asdas 0 ada
asdas 1 asd
12123 1 ccc
12123 0 xxx
hadoop jar
Hello,
Is there a standard way to prevent failure from a NameNode crash in a
Hadoop cluster?
Or what is the standard or best practice for overcoming the single point
of failure problem in Hadoop?
I am not ready to take chances on a production server with the Hadoop 2.0 alpha
release, which claims to
Hi Panshul,
Usually, for reliability, there will be multiple dfs.name.dir locations configured, of
which one would be a remote location such as an NFS mount.
That way, even if the NN machine crashes as a whole, you still have the fs image
and edit log on the NFS mount. This can be used for reconstructing the
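Purely as an illustration, the hdfs-site.xml entry being described could look like the following, with one local directory and one NFS-mounted directory (the paths are made up). The NameNode writes the fsimage and edit log to every listed directory, so the copy on the NFS mount survives a loss of the NameNode machine.

<property>
  <name>dfs.name.dir</name>
  <value>/data/1/dfs/nn,/mnt/nfs/dfs/nn</value>
</property>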
Hi Jamal
I believe a reduce-side join is what you are looking for.
You can use MultipleInputs to achieve a reduce-side join for this.
http://kickstarthadoop.blogspot.com/2011/09/joins-with-plain-map-reduce.html
Regards
Bejoy KS
Sent from remote device, Please excuse typos
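To make the suggestion concrete, here is a minimal sketch of such a reduce-side join for the two-document overlap case, using the new-API MultipleInputs (org.apache.hadoop.mapreduce.lib.input; older releases have an equivalent in org.apache.hadoop.mapred.lib). All class names and the word-splitting are made up, not taken from the blog post: each document gets its own mapper that tags every word, and the reducer keeps only words tagged by both documents.

import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class DocumentOverlap {

  // Tags every word of document A with "A".
  public static class DocAMapper extends Mapper<LongWritable, Text, Text, Text> {
    private static final Text TAG = new Text("A");
    protected void map(LongWritable key, Text value, Context ctx)
        throws IOException, InterruptedException {
      for (String word : value.toString().split("\\s+")) {
        if (!word.isEmpty()) ctx.write(new Text(word), TAG);
      }
    }
  }

  // Tags every word of document B with "B".
  public static class DocBMapper extends Mapper<LongWritable, Text, Text, Text> {
    private static final Text TAG = new Text("B");
    protected void map(LongWritable key, Text value, Context ctx)
        throws IOException, InterruptedException {
      for (String word : value.toString().split("\\s+")) {
        if (!word.isEmpty()) ctx.write(new Text(word), TAG);
      }
    }
  }

  // Emits a word only if it was seen in both documents.
  public static class OverlapReducer extends Reducer<Text, Text, Text, NullWritable> {
    protected void reduce(Text word, Iterable<Text> tags, Context ctx)
        throws IOException, InterruptedException {
      Set<String> seen = new HashSet<String>();
      for (Text t : tags) seen.add(t.toString());
      if (seen.contains("A") && seen.contains("B")) {
        ctx.write(word, NullWritable.get());
      }
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "document-overlap");
    job.setJarByClass(DocumentOverlap.class);

    MultipleInputs.addInputPath(job, new Path(args[0]), TextInputFormat.class, DocAMapper.class);
    MultipleInputs.addInputPath(job, new Path(args[1]), TextInputFormat.class, DocBMapper.class);
    FileOutputFormat.setOutputPath(job, new Path(args[2]));

    job.setReducerClass(OverlapReducer.class);
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(Text.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(NullWritable.class);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}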
Hi Terry
When the file is unzipped versus zipped, what is the number of map tasks running
in each case?
If the file is large, I assume the below should be the case:
gz is not a splittable compression codec, so the whole file would be processed by a
single mapper. And this might be causing the job to
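A quick, illustrative way to see this effect without running a full job is to ask TextInputFormat for its splits, as sketched below: a large plain-text file yields several splits, while the same file gzipped yields exactly one, since gzip is not splittable. The path argument is whatever file you want to check.

import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class SplitCount {
  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration());
    FileInputFormat.addInputPath(job, new Path(args[0]));
    // For a large plain-text file this prints several splits; for the same
    // file compressed with gzip it prints exactly one, i.e. one mapper.
    List<InputSplit> splits = new TextInputFormat().getSplits(job);
    System.out.println(args[0] + " -> " + splits.size() + " split(s)");
  }
}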
Thank you for the reply.
Is there a way I can configure my cluster to switch to the
Secondary NameNode automatically in case of a primary NameNode failure?
When I run my current Hadoop, I see both the primary and secondary NameNodes
running. I was wondering what that Secondary
Hi Panshul
The SecondaryNameNode is better described as a checkpoint node. At periodic intervals it
merges the edit log from the NN with the fsimage to prevent the edit log from growing
too large. This is its main functionality.
At any point the SNN would have the latest fsimage but not the up-to-date edit
log.
Hello Bejoy,
Thank you for the information.
About the Hadoop HA 2.x releases: they are in the alpha phase and I cannot use
them for production. For my requirements, the cluster is supposed to be
extremely available. Availability is of the highest concern. I have looked into
different distributions as
Hello,
I have another idea for solving the single point of failure in
Hadoop...
What if I have multiple NameNodes set up and running behind a load balancer
in the cluster? This way I can have multiple NameNodes at the same IP
address, that of the load balancer, which resolves the problem of
I am not sure if this is possible in the 0.2X or 1.0 releases of Hadoop.
On Tuesday, January 15, 2013, Panshul Whisper wrote:
Hello,
I have another idea regarding solving the single point failure of
Hadoop...
What If I have multiple Name Nodes setup and running behind a load
balancer in
Inline
On Mon, Jan 14, 2013 at 7:48 PM, Panshul Whisper ouchwhis...@gmail.com wrote:
Hello,
I have another idea regarding solving the single point failure of
Hadoop...
What If I have multiple Name Nodes setup and running behind a load
balancer in the cluster. So this way I can have
It's very rare to observe an NN crash due to a software bug in production.
Most of the time it's a hardware fault you should worry about.
On 1.x, or any non-HA-carrying release, the best you can do to safeguard
against a total loss is to have redundant disk volumes configured, one
preferably over
The following log tells you the exact error:
JVMDUMP013I Processed dump event systhrow, detail
java/lang/OutOfMemoryError.

Exception in thread Thread-7 java.lang.OutOfMemoryError
    at ApplicationMaster.readMessage(ApplicationMaster.java:241)
    at
See if this helps you:
https://github.com/nitinpawar/hadoop/tree/master/installation
It's not a polished guide, but you should be able to get a working Hadoop setup from it.
On Tue, Jan 15, 2013 at 11:55 AM, Shagun Bhardwaj shagun...@gmail.com wrote:
Hi,
I am not able to install Apache Hadoop in a Pseudo-distributed
Check this out, it may be helpful. It illustrates the setup on Ubuntu, but I hope the steps are
almost the same:
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/
Shagun Bhardwaj shagun...@gmail.com wrote:
Hi,
I am not able to install Apache Hadoop in a