I need reducers because I am writing multiple outputs.
You can write multiple outputs in the mapper too - see MultipleOutputFormat.
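For reference, a minimal sketch of a map-side write using the new-API MultipleOutputs helper (a relative of MultipleOutputFormat), assuming a Hadoop release that ships org.apache.hadoop.mapreduce.lib.output.MultipleOutputs; the mapper class and the "extra" named output are made up for illustration:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

// Illustrative map-only job: records go to a named output instead of a reducer.
public class SideOutputMapper extends Mapper<LongWritable, Text, Text, Text> {
  private MultipleOutputs<Text, Text> mos;

  @Override
  protected void setup(Context context) {
    mos = new MultipleOutputs<Text, Text>(context);
  }

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    // "extra" must be registered in the driver with
    // MultipleOutputs.addNamedOutput(job, "extra", TextOutputFormat.class, Text.class, Text.class).
    mos.write("extra", new Text(Long.toString(key.get())), value);
  }

  @Override
  protected void cleanup(Context context) throws IOException, InterruptedException {
    mos.close();
  }
}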
On Mon, Oct 8, 2012 at 4:29 PM, Arpit Gupta ar...@hortonworks.com wrote:
I would recommend using the Oracle JDK.
The Oracle version didn't help.
Also from your email below you mention that mapred.child.java.opts and
mapred.child.ulimit were added to try to solve this problem. Are you setting
Hi Suneel
You can get the latest stable versions of Hadoop from the following URL:
http://hadoop.apache.org/releases.html#Download
To download, choose a mirror and select the stable version (the ones Harsh
suggested) you would like to go for. (The 1.0.x releases are the current stable
versions.)
Hello
When we create a jar file for Hadoop programs from the command prompt it runs
faster. When we create a jar file from NetBeans it runs slower. We could not
understand the problem.
This is important as we are trying to work with Hadoop and CUDA (jCuda). We
could create a jar file only using
Hi,
If it is a runnable jar you are creating from NetBeans, check that only the
necessary dependencies are added.
Cheers!
Manoj.
On Tue, Oct 16, 2012 at 11:38 AM, sudha sadhasivam
sudhasadhasi...@yahoo.com wrote:
Hello
When we create a jar file for Hadoop programs from the command prompt it runs
The code executes, but the time taken for execution is high.
It does not show any advantage from the two levels of parallelism.
G Sudha
--- On Tue, 10/16/12, Manoj Babu manoj...@gmail.com wrote:
From: Manoj Babu manoj...@gmail.com
Subject: Re: Hadoop and CUDA
To: user@hadoop.apache.org
Date: Tuesday, October
Hi,
I've not tried this on S3. However, the directory mentioned in the
exception is based on the value of this particular configuration
key: mapreduce.jobtracker.staging.root.dir. This defaults
to ${hadoop.tmp.dir}/mapred/staging. Can you please set this to an S3
location and try?
Thanks
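For reference, a minimal sketch of setting that key on the job's Configuration; the S3 bucket name below is hypothetical:

import org.apache.hadoop.conf.Configuration;

Configuration conf = new Configuration();
// Point the staging area at S3 instead of ${hadoop.tmp.dir}/mapred/staging.
conf.set("mapreduce.jobtracker.staging.root.dir", "s3n://my-bucket/mapred/staging");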
Thank you so much for confirming that.
On Mon, Oct 15, 2012 at 9:25 PM, Harsh J ha...@cloudera.com wrote:
Patai,
My bad - that was on my mind but I missed noting it down on my earlier
reply. Yes you'd have to control that as well. 2 should be fine for
smaller clusters.
On Tue, Oct 16,
I have bought it but now need to cancel my trips.
I can give it away with a 30% discount.
Please contact this email or call 408-821-5915 ASAP.
Great! Glad the problem is solved.
You're right - the object returned by iterator.next() is re-used too.
So yes, you would need to clone in this case and you'd have no choice
but to create new objects.
Please be sure though that you really do need to store values in a
list to do what you're
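A minimal sketch of that cloning pattern (the reducer class and types are illustrative):

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// The Text handed out by the iterator is re-used by the framework,
// so copy it before buffering it in a list.
public class BufferingReducer extends Reducer<Text, Text, Text, Text> {
  @Override
  protected void reduce(Text key, Iterable<Text> values, Context context)
      throws IOException, InterruptedException {
    List<Text> buffered = new ArrayList<Text>();
    for (Text value : values) {
      buffered.add(new Text(value));   // clone; adding 'value' itself would store one re-used object
    }
    // ... join / process 'buffered' here ...
  }
}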
On 16.10.2012 1:13, Vinod Kumar Vavilapalli wrote:
Which version are you running?
It was branch-0.23, but after I updated to trunk or the latest branch-0.23 it
seems to work fine now.
Hi all,
I installed Hadoop HDFS on 3 nodes, a namenode and 2 datanodes,
when I want to start the DFS processes, only the SecondaryNameNode is launched but the
NameNode and DataNode processes don't work.
There is a print screen attached to illustrate what I said.
Thanks in advance
Regards
Khaled
Hi Khaled,
I can't find any attachment. Also, could you please provide us the
logs? It seems there is some config-related issue.
Regards,
Mohammad Tariq
On Tue, Oct 16, 2012 at 2:48 PM, Khaled Ben Bahri khaled-...@hotmail.comwrote:
Hi all,
I installed Hadoop HDFS on 3 nodes, a
Could you please post the logs?
Regards,
Mohammad Tariq
On Tue, Oct 16, 2012 at 2:52 PM, Khaled Ben Bahri khaled-...@hotmail.comwrote:
Hi all,
I installed Hadoop HDFS on 3 nodes, a namenode and 2 datanodes,
when I want to start the DFS processes, only the SecondaryNameNode is launched but
Yes, I know that keeping an in-memory collection isn't a good idea.
The problem is that I need to perform a join, so there is no other
possibility! :(
Cheers,
Alberto
On 16 October 2012 11:08, Dave Beech dbe...@apache.org wrote:
Great! Glad the problem is solved.
You're right - the object
The core-site and hdfs-site files are also in the attachment.
Thanks in advance
Regards
Khaled
From: khaled-...@hotmail.com
To: user@hadoop.apache.org
Subject: RE: datanodes doesn't work in HDFS
Date: Tue, 16 Oct 2012 11:41:24 +0200
Here is the print screen, I forgot it :)
and also
Change the permissions of the directories where you are planning to put
your dfs.data.dir and dfs.name.dir to 755 and start again.
Regards,
Mohammad Tariq
On Tue, Oct 16, 2012 at 3:11 PM, Khaled Ben Bahri khaled-...@hotmail.comwrote:
Here is the print screen, I forgot it :)
and also
You are welcome. Also add the hadoop.tmp.dir property in your core-site.xml
file.
Regards,
Mohammad Tariq
On Tue, Oct 16, 2012 at 3:31 PM, Khaled Ben Bahri khaled-...@hotmail.comwrote:
Thanks a lot
it works perfectly
Regards,
Khaled
--
From:
I think these blog posts will answer your question:
http://www.technology-mania.com/2012/05/s3-instead-of-hdfs-with-hadoop_05.html
http://www.technology-mania.com/2011/05/s3-as-input-or-output-for-hadoop-mr.html
On Tue, Oct 16, 2012 at 1:30 PM, sudha sadhasivam sudhasadhasi...@yahoo.com
wrote:
Hi All,
Has anyone tried installing Hadoop on a Mac PC? If yes, can you please share the
installation steps?
Thanks in advance..
Thanks,
Suneel
Sent from my iPhone
Suneel,
What version are you trying to run? Following regular tarball
instructions on a Mac mostly works just fine.
On Tue, Oct 16, 2012 at 4:20 PM, suneel hadoop suneel.bigd...@gmail.com wrote:
Hi All,
Has anyone tried installing Hadoop on a Mac PC? If yes, can you please share the
installation
Because you did not set defaultFS in the conf, you need to explicitly indicate
the absolute path (including the scheme) of the file in S3 when you run an MR job.
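For illustration, a minimal driver sketch with fully qualified S3 paths (the bucket name is made up):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

Configuration conf = new Configuration();
Job job = new Job(conf, "s3-example");
// With no default filesystem configured, every path must carry its scheme.
FileInputFormat.addInputPath(job, new Path("s3n://my-bucket/input"));
FileOutputFormat.setOutputPath(job, new Path("s3n://my-bucket/output"));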
2012/10/16 Rahul Patodi patodirahul.had...@gmail.com
I think these blog posts will answer your question:
Hi Harsh,
Thanks for your quick turnaround.
The Mac version is 10.7.4
and the Hadoop version I'm trying is hadoop-0.21.0.
Please share if you have any step-by-step instructions.
Thanks,
Suneel
On Tue, Oct 16, 2012 at 4:26 PM, Harsh J ha...@cloudera.com wrote:
Suneel,
What version are you
+1
Installing from the tarball by the usual method is fine for Mac OS. One issue
to be aware of is https://issues.apache.org/jira/browse/HADOOP-7489
(but even that doesn't stop it working)
On 16 October 2012 11:56, Harsh J ha...@cloudera.com wrote:
Suneel,
What version are you trying to run? Following
Thanks a lot Dave.. Bravo.. pat on your back..
On Tue, Oct 16, 2012 at 4:35 PM, Dave Beech dbe...@apache.org wrote:
Instructions for single node operation:
http://hadoop.apache.org/docs/r0.21.0/single_node_setup.html
Instructions for cluster:
Can you please share the link where I can download it, so that I will be on the
correct page, else I will be looking at the wrong page.
Thanks a lot dude..
On Tue, Oct 16, 2012 at 4:40 PM, Harsh J ha...@cloudera.com wrote:
Suneel,
Note though that the 0.21.0 version is unsupported and was abandoned.
Hi, there
Is there any chance that "set mapred.reducel.tasks=20" doesn't work in
Hadoop 0.20.2?
Thanks
Yue
Weka is indeed a more complete package of data mining solutions, but its aim
is not to support Hadoop, whereas that is the aim of Mahout.
The implemented methods are standard data mining methods. If you are
looking for Hadoop support you should ask the Mahout mailing list but if
you have question on
As far as I know, Weka cannot be run on Hadoop directly.
What can be done is: if your algorithm first generates a model based on
training data, then you can run your training offline on your laptop
and serialize, i.e. write, the trained model to a file. Now, put this model file
on HDFS
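A minimal sketch of the mapper side of that approach; the HDFS path and class names are hypothetical, and the model is assumed to have been written with standard Java serialization:

import java.io.IOException;
import java.io.ObjectInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class ScoringMapper extends Mapper<LongWritable, Text, Text, Text> {
  private Object model;   // e.g. a classifier trained offline on your laptop

  @Override
  protected void setup(Context context) throws IOException, InterruptedException {
    FileSystem fs = FileSystem.get(context.getConfiguration());
    Path modelPath = new Path("/models/trained-model.ser");   // hypothetical HDFS location
    ObjectInputStream in = new ObjectInputStream(fs.open(modelPath));
    try {
      model = in.readObject();   // deserialize the pre-trained model once per task
    } catch (ClassNotFoundException e) {
      throw new IOException(e);
    } finally {
      in.close();
    }
  }

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    // ... use 'model' to classify each input record and emit the result ...
  }
}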
Hi Abhishek,
I have also tried using WEKA SMO; however, it takes too long (I waited
for more than 6 days) to train on a set of more than a million instances.
However, logistic regression could come up with a model in 20 mins.
This is pretty fast!
My problem is I can use the model as is in
Hello Hemanth,
I set the Hadoop staging directory to the S3 location. However, it
complains. Below is the error:
12/10/16 10:22:47 INFO jvm.JvmMetrics: Initializing JVM Metrics with
processName=JobTracker, sessionId=
Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS:
One question,
Can I use both file systems at the same time (hdfs and s3)?
According to this
link http://www.mail-archive.com/core-user@hadoop.apache.org/msg03481.html,
I cannot.
On Tue, Oct 16, 2012 at 10:32 AM, Parth Savani pa...@sensenetworks.comwrote:
Hello Hemanth,
I set
Please elaborate a bit more.
Are you asking about Hive? It should work but there are certain
queries you can't change the Num(1) reducer counts for - which is by
design in a few cases.
P.S. There's a typo there. It's mapred.reduce.tasks, not
mapred.reducel.tasks (no 'l').
On Tue, Oct 16, 2012 at
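For reference, a minimal sketch of setting the correctly spelled property on a plain MapReduce job Configuration (from the Hive CLI the analogous statement would presumably be "set mapred.reduce.tasks=20;"):

import org.apache.hadoop.conf.Configuration;

Configuration conf = new Configuration();
// Correct property name: mapred.reduce.tasks (no extra 'l').
conf.setInt("mapred.reduce.tasks", 20);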
My fault. I should ask on the Hive mailing list.
On Tue, Oct 16, 2012 at 10:49 AM, Harsh J ha...@cloudera.com wrote:
Please elaborate a bit more.
Are you asking about Hive? It should work but there are certain
queries you can't change the Num(1) reducer counts for - which is by
design in a few
Parth,
I notice in the stack trace below that the LocalJobRunner, instead of the
JobTracker, is being used. Are you sure this is a distributed cluster?
Could you please check the value of mapred.job.tracker?
Thanks
Hemanth
On Tue, Oct 16, 2012 at 8:02 PM, Parth Savani
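A minimal sketch of one way to check which value the client-side configuration resolves to ("local" is the stock default, which selects the LocalJobRunner):

import org.apache.hadoop.conf.Configuration;

Configuration conf = new Configuration();
// If this prints "local", jobs run in-process instead of on the JobTracker.
System.out.println("mapred.job.tracker = " + conf.get("mapred.job.tracker", "local"));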
Hi Everyone;
I have a Windows Hadoop cluster consisting of 8 slaves and 1 master node. My
Hadoop program is a collection of recursive jobs. I create 14 map and 14
reduce tasks in each job. My files are up to 10 MB.
My problem is that all jobs are waiting at the end of the job. Map 100% Reduce
100% is seen on
Take a look at this for mac installation:
http://denizdemir.com/2012/01/18/setting-up-hadoop-on-macosx-lion-single-node/
On Oct 16, 2012, at 4:13 AM, suneel hadoop suneel.bigd...@gmail.com wrote:
Can you please share the link where I can download it, so that I will be on the
correct page, else I
On Oct 15, 2012, at 12:27 PM, Dave Beech wrote:
This only happens in the new mapreduce API - in the older mapred
API you get the first key, and it appears to stay the same during the
loop.
It's sometimes useful behaviour, but it's confusing how the two APIs
don't act the same.
Yes, it is
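A minimal sketch of the new-API behaviour described above (class and types are illustrative): copy the key if you need a stable snapshot while iterating.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// New-API reducer: the key object may be updated in place during the value loop,
// so take a copy up front if the original key is needed later.
public class KeySnapshotReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
  @Override
  protected void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    Text keySnapshot = new Text(key);   // stable copy taken before iterating
    for (IntWritable value : values) {
      // 'key' may now reflect a later record; 'keySnapshot' does not change.
    }
  }
}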
Hi
I have read scattered documentation across the net which mostly says HDFS
doesn't go well with a SAN being used to store data, while some say it is an
emerging trend. I would love to know if there have been any tests performed
which hint at the aspects in which direct storage excels/falls
Hi,
I need to cancel my trip to the conference, and I have a pass for the two
days, Wednesday and Thursday, Oct. 24-25. My conference reservation was
$876.
If anybody is interested, please contact me directly.
Thank you. Sincerely,
Mark
I have a single Linux node on which I installed 0.23.3 in pseudo mode. Can I
test federation configuration and functionality using this, or will I have
to install Hadoop on a cluster with at least 3 Linux nodes? In case of
my single Linux box, I have 3 IPs with me.
Hi, all
When I tried to compile Hadoop 1.0.3, it tells me that
src/core/org/apache/hadoop/fs/kfs/KFSImpl.java:30: package
org.kosmix.kosmosfs.access does not exist
Can anyone tell me why this issue happens?
Best,
--
Nan Zhu
School of Computer Science,
McGill University
It will be difficult to make a SAN work well for Hadoop, but not impossible. I
have done direct comparisons (but not published them yet). Direct local
storage is likely to have much more capacity and more total bandwidth. But you
can do pretty well with a SAN if you stuff it with the
I've seen this happen when the native kfs libs aren't in your java library
path. Add them to both LD_LIBRARY_PATH and -Djava.library.path
Sent from my iPhone
On Oct 16, 2012, at 1:55 PM, Nan Zhu zhunans...@gmail.com wrote:
Hi, all
When I tried to compile Hadoop 1.0.3, it tells me that
Yes, I fixed this issue manually by downloading the KFS library and copying it to lib/,
but immediately fell into another "package not found" issue.
Why didn't Maven download those dependent jars to the lib/ directory for me? Several
days ago, Maven did this very well... but since yesterday, it refuses to work,
Hi
Not sure if my previous message made it, as I just subscribed.
I have read scattered documentation across the net which mostly says HDFS
doesn't go well with a SAN being used to store data, while some say it is an
emerging trend. I would love to know if there have been any tests performed
which
Adding to this. Locality is very important for MapReduce applications. One
might not see much of a difference for small MapReduce jobs running on
direct-attached storage vs a SAN, but when your cluster grows or you find jobs
which are heavy on IO, you would see quite a bit of difference. One thing
This is similar to issues I ran into with permissions/ownership of
mapred.system.dir when using the fair scheduler. We are instructed to set
the ownership of mapred.system.dir to mapred:hadoop and then when the job
tracker starts up (running as user mapred) it explicitly sets the
permissions on
Yes, for MR, my impression is typically the n/w utilization is next to none
during map and reduce tasks but jumps during shuffle. With a SAN, I would
assume there is no such separation. There will be network activity all over the
job’s time window with shuffle probably doing more than what it
RAID5 is suboptimal for HDFS due to the spindle imbalance issue (among
other problems). Read this paper for details:
Disks are like Snowflakes: No Two Are Alike
www.usenix.org/event/hotos11/tech/final_files/Krevat.pdf
For best performance configure your storage as JBOD instead of RAID,
format
You can try out federation by creating 3 different conf directories and
starting 3 different NameNodes out of those configurations. These
configurations should make sure they have different directories and port
numbers. If you want to just give it a try, it is easier to spawn 3 VMs and
use them as
Thanks everyone. Seems like I hit a dead end.
It's kind of funny when I read that JIRA; run it 4 times and everything
will work.. where is that magic number from.. lol
respects
On Tue, Oct 16, 2012 at 4:12 PM, Arpit Gupta ar...@hortonworks.com wrote:
Currently I run a map-reduce job that reads from a single path with a glob:
/data/*
I am considering replacing this one glob path with an explicit list of all
the paths (so that I can check for _SUCCESS files in the subdirs and
exclude the subdirs that don't have this file, to avoid reading from
There is no limit on the number of input paths you can have for your job. The
more input paths you have, the more time is spent calculating the job splits, and
hence the higher the startup cost of the job.
You could write your own InputFormat which can do the filtering based on your
use case. Take a look at
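Separately, a minimal driver-side sketch of the _SUCCESS filtering idea from the question (the helper class name is made up; /data is the base path from the original post):

import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

// Instead of the /data/* glob, add only the subdirectories that contain a _SUCCESS marker.
public class SuccessfulInputs {
  public static void addSuccessfulInputs(Job job, Path base) throws IOException {
    FileSystem fs = base.getFileSystem(job.getConfiguration());
    for (FileStatus status : fs.listStatus(base)) {
      if (status.isDir() && fs.exists(new Path(status.getPath(), "_SUCCESS"))) {
        FileInputFormat.addInputPath(job, status.getPath());
      }
    }
  }
}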
Hi Manoj,
If the data is the same for both tests and the number of mappers is
fewer, then each mapper has more (uncompressed) data to process. Thus
each mapper should take longer and overall execution time should
increase.
As a simple example: if your data is 128MB uncompressed it may use 2