@Jeff, I think JobConf is already deprecated.
org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob and
org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl can be used instead.
Regards
Akash Deep Shakya "OpenAK"
FOSS Nepal Community
akashakya at gmail dot com
~ Failure to prepare is preparing to fail ~
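For anyone finding this in the archives, here is roughly how those two classes fit together. A minimal sketch against the trunk (0.21) mapreduce API, untested, so check the javadoc:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob;
import org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl;

public class NewApiJobChain {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // Wrap each mapreduce.Job in a ControlledJob and declare the
    // dependencies on the wrappers; JobControl runs them in order.
    ControlledJob first = new ControlledJob(conf);
    first.setJob(new Job(conf, "first"));    // configure paths/classes as usual

    ControlledJob second = new ControlledJob(conf);
    second.setJob(new Job(conf, "second"));
    second.addDependingJob(first);           // "second" waits for "first"

    JobControl control = new JobControl("chain");
    control.addJob(first);
    control.addJob(second);

    // JobControl is a Runnable; drive it from a thread and poll.
    Thread runner = new Thread(control);
    runner.start();
    while (!control.allFinished()) {
      Thread.sleep(1000);
    }
    control.stop();
  }
}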
Hi peeps,
I'm trying to run elephant-bird code in eclipse, specifically (
http://github.com/kevinweil/elephant-bird/blob/master/examples/src/pig/json_word_count.pig),
but I'm not sure how to set the core-site.xml properties via eclipse. I
tried adding them to VM args but am still getting the foll
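Not an elephant-bird expert, but one thing that bites people in eclipse: Configuration reads core-site.xml from the classpath, not from -D VM args. A sketch of the two usual workarounds (the path and property value below are hypothetical examples):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

public class ConfFromEclipse {
  public static void main(String[] args) {
    Configuration conf = new Configuration();

    // Option 1: load the file explicitly (or add the conf/ directory to
    // the run configuration's classpath instead).
    conf.addResource(new Path("/path/to/hadoop/conf/core-site.xml"));

    // Option 2: set individual properties programmatically.
    conf.set("fs.default.name", "hdfs://localhost:9000");  // example value

    System.out.println(conf.get("fs.default.name"));
  }
}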
Hi,
I am trying to set up a development cluster for hadoop 0.20.1 in eclipse.
I used this url
http://svn.apache.org/repos/asf/hadoop/common/tags/release-0.20.1/ to
check out the build. I compiled "compile, compile-core-test, and
eclipse-files" using ant. Then when I build the project, I am gett
We're excited to announce Surge, the Scalability and Performance
Conference, to be held in Baltimore on Sept 30 and Oct 1, 2010. The
event focuses on case studies that demonstrate successes (and failures)
in Web applications and Internet architectures.
Our Keynote speakers include John Allspaw an
There's a class org.apache.hadoop.mapred.jobcontrol.Job which is a
wrapper around a JobConf. You add dependent jobs to it, then put it into a
JobControl.
On Mon, Jun 14, 2010 at 9:55 AM, Gang Luo wrote:
> Hi,
> According to the doc, JobControl can maintain the dependency among different
> jobs and o
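A minimal sketch of that, against the old 0.20 mapred API (untested; see the jobcontrol javadoc for the details):

import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.jobcontrol.Job;
import org.apache.hadoop.mapred.jobcontrol.JobControl;

public class OldApiJobChain {
  public static void main(String[] args) throws Exception {
    // Each jobcontrol.Job wraps a fully configured JobConf.
    Job first = new Job(new JobConf());   // configure input/output/classes as usual
    Job second = new Job(new JobConf());
    second.addDependingJob(first);        // "second" runs only after "first" succeeds

    JobControl control = new JobControl("chain");
    control.addJob(first);
    control.addJob(second);

    // JobControl is a Runnable: drive it from a thread and poll.
    Thread runner = new Thread(control);
    runner.start();
    while (!control.allFinished()) {
      Thread.sleep(500);
    }
    control.stop();
  }
}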
Hi,
I am trying to use hadoop streaming and there seem to be a few bad records
in my data.
I'd like to use SkipBadRecords but I can't find how to use it in hadoop
streaming.
Is it at all possible?
Thanks in advance.
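I haven't tried it with streaming myself, but SkipBadRecords is driven by plain job properties, so in principle you can pass them with -D (property names from the 0.20 SkipBadRecords javadoc; the input/output paths and script names are placeholders):

hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-*-streaming.jar \
  -D mapred.skip.attempts.to.start.skipping=2 \
  -D mapred.skip.map.max.skip.records=1 \
  -input /data/in -output /data/out \
  -mapper my_mapper.py -reducer my_reducer.py

Streaming jobs may also need mapred.skip.map.auto.incr.proc.count=false, since the framework can't always tell how far an external script has got; check the skipped-records section of the mapreduce tutorial.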
Is there something else I could read about setting up short-lived
Hadoop clusters on virtual machines? I have no experience with VMs at
all. I see there is quite a bit of material about using them to get
Hadoop up and running with a pseudo-cluster on a single machine, but I
don't follow how this st
Unless I am missing something, the Fair Share and Capacity schedulers
sound like a solution to a different problem: aren't they for a
dedicated Hadoop cluster that needs to be shared by lots of people? I
have a general purpose cluster that needs to be shared by lots of
people. Only one of them (me)
Thanks everyone for your replies.
Even though HOD looks like a dead-end I would prefer to use it. I am
just one user of the cluster among many, and currently the only one
using Hadoop. The jobs I need to run are pretty much one-off: they are
big jobs that I can't do without Hadoop, but I might nee
Nice, thanks Brian!
On Jun 14, 2010, at 7:39 AM, Brian Bockelman wrote:
Hey Owen, all,
I find this one handy if you have root access:
http://linux-mm.org/Drop_Caches
echo 3 > /proc/sys/vm/drop_caches
Drops the pagecache, dentries, and inodes. Without this, you can
still get caching effec
On Jun 14, 2010, at 10:57 AM, Russell Brown wrote:
> I'm a new user of Hadoop. I have a Linux cluster with both gigabit ethernet
> and InfiniBand communications interfaces. Could someone please tell me how
> to switch IP communication from ethernet (the default) to InfiniBand? Thanks.
Hado
I'm a new user of Hadoop. I have a Linux cluster with both gigabit
ethernet and InfiniBand communications interfaces. Could someone please
tell me how to switch IP communication from ethernet (the default) to
InfiniBand? Thanks.
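A hedged answer (I have not run Hadoop over InfiniBand myself): Hadoop only speaks TCP/IP, so you need IPoIB configured first, and then you tell the daemons which interface to resolve their addresses on. Something like this in hdfs-site.xml and mapred-site.xml, assuming the IPoIB interface is called ib0:

<property>
  <name>dfs.datanode.dns.interface</name>
  <value>ib0</value>
</property>
<property>
  <name>mapred.tasktracker.dns.interface</name>
  <value>ib0</value>
</property>

You would also point fs.default.name and mapred.job.tracker (plus the masters/slaves files) at the InfiniBand hostnames of the namenode and jobtracker.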
On Mon, Jun 14, 2010 at 1:15 PM, Johannes Zillmann wrote:
> Hi,
>
> I have a 4-node cluster running hadoop-0.20.2. Now I have suddenly run into
> a situation where every task scheduled on 2 of the 4 nodes fails.
> It seems like the child JVM crashes. There are no child logs under
> logs/userlogs. T
Hi,
I have a 4-node cluster running hadoop-0.20.2. Now I have suddenly run into a
situation where every task scheduled on 2 of the 4 nodes fails.
It seems like the child JVM crashes. There are no child logs under logs/userlogs.
Tasktracker gives this:
2010-06-14 09:34:12,714 INFO org.apache.hado
Use the ControlledJob class from Hadoop trunk and run it through JobControl.
Regards
Akash Deep Shakya "OpenAK"
FOSS Nepal Community
akashakya at gmail dot com
~ Failure to prepare is preparing to fail ~
On Mon, Jun 14, 2010 at 10:40 PM, Gang Luo wrote:
> Hi,
> According to the doc, JobControl
Hi,
According to the doc, JobControl can maintain the dependencies among different
jobs, and only jobs whose dependencies are satisfied can execute. How does
JobControl maintain the dependencies, and how can we indicate them?
Thanks,
-Gang
Hi.
> Should be out soon - Tom White is working hard on the release. Note that the
> first release, 0.21.0, will be somewhat of a "development quality" release
> not recommended for production use. Of course, the way it will become
> production-worthy is by less risk-averse people trying it and find
Edward Capriolo wrote:
I have not used it much, but I think HOD is pretty cool. I guess most people
who are looking to (spin up, run job, transfer off, spin down) are using
EC2. HOD does something like make private hadoop clouds on your hardware and
many probably do not have that use case. As s
On Mon, Jun 14, 2010 at 4:28 AM, Stas Oskin wrote:
> By the way, what about the ability for a node to read a file which is being
> written by another node?
>
This is allowed, though there are some remaining bugs to be ironed out here.
See https://issues.apache.org/jira/browse/HDFS-1057 for example.
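A minimal reader sketch, for the archives (the path is hypothetical; on 0.20 the visible length may lag behind what the writer has actually written, which is part of what HDFS-1057 is about):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TailReader {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path p = new Path("/logs/in-progress");   // hypothetical path
    FSDataInputStream in = fs.open(p);        // opening succeeds even mid-write
    byte[] buf = new byte[4096];
    int n;
    while ((n = in.read(buf)) != -1) {
      System.out.write(buf, 0, n);            // reads up to the visible length
    }
    in.close();
  }
}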
On Mon, Jun 14, 2010 at 4:00 AM, Stas Oskin wrote:
> Hi.
>
> Thanks for clarification.
>
> Append will be supported fully in 0.21.
> >
> >
> Any ETA for this version?
>
Should be out soon - Tom White is working hard on the release. Note that the
first release, 0.21.0, will be somewhat of a "development quality" release
not recommended for production use.
On Mon, Jun 14, 2010 at 8:37 AM, Amr Awadallah wrote:
> Dave,
>
> Yes, many others have the same situation, the recommended solution is
> either to use the Fair Share Scheduler or the Capacity Scheduler. These
> schedulers are much better than HOD since they take data locality into
> considerati
Hey Owen, all,
I find this one handy if you have root access:
http://linux-mm.org/Drop_Caches
echo 3 > /proc/sys/vm/drop_caches
Drops the pagecache, dentries, and inodes. Without this, you can still get
caching effects doing the normal "read and write large files" if the linux
pagecache outs
Indeed. On the terasort benchmark, I had to run intermediate jobs that
were larger than RAM on the cluster to ensure that the data was not
coming from the file cache.
-- Owen
Hi Ted,
I mean the new API:
org.apache.hadoop.mapreduce.Job.setInputFormatClass(org.apache.hadoop.mapreduce.InputFormat)
"Job.setInputFormatClass()" only accepts
"org.apache.hadoop.mapreduce.InputFormat"(of which there are several
subclasses, while KeyValueTextInputFormat is not one of them) as it
Have you checked
src/mapred/org/apache/hadoop/mapred/KeyValueTextInputFormat.java?
On Mon, Jun 14, 2010 at 6:51 AM, Kevin Tse wrote:
> Hi,
> I am upgrading my code from hadoop-0.19.2 to hadoop-0.20.2, during the
> process I found that there was no KeyValueTextInputFormat class which
> exists
>
Hi,
I am upgrading my code from hadoop-0.19.2 to hadoop-0.20.2, and during the
process I found that the KeyValueTextInputFormat class which exists in
hadoop-0.19.2 is gone. It's strange that this version of hadoop does not come
with this commonly used InputFormat. I have taken a look at the
"Second
Dave,
Yes, many others have the same situation, the recommended solution is
either to use the Fair Share Scheduler or the Capacity Scheduler. These
schedulers are much better than HOD since they take data locality into
consideration (they don't just spin up 20 TT nodes on machines that have
noth
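For reference, enabling the Fair Scheduler on 0.20 is roughly this in mapred-site.xml (a sketch: the contrib fairscheduler jar has to be on the JobTracker's classpath, e.g. copied into lib/, and the allocation file path below is hypothetical):

<property>
  <name>mapred.jobtracker.taskScheduler</name>
  <value>org.apache.hadoop.mapred.FairScheduler</value>
</property>
<property>
  <name>mapred.fairscheduler.allocation.file</name>
  <value>/path/to/conf/fair-scheduler.xml</value>
</property>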
By the way, what about the ability for a node to read a file which is being
written by another node?
Or must the file be written and closed completely before it becomes
available to other nodes?
(AFAIK in 0.18.3 the file appeared as size 0 until it was closed.)
Regards.
Hi.
Thanks for clarification.
Append will be supported fully in 0.21.
>
>
Any ETA for this version?
Will it work with both FUSE and the HDFS API?
> Also, append does *not* add random write. It simply adds the ability to
> re-open a file and add more data to the end.
>
>
Just to clarify, even with a
Hi,
I changed the default log level of hadoop from INFO to ERROR by
setting the property hadoop.root.logger to ERROR in
/conf/log4j.properties
But when I start namenode, the INFO logs are seen in the log
file. I did a workaround and found that HADOOP_ROOT_LOGGER is
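If memory serves, the daemon start scripts export HADOOP_ROOT_LOGGER themselves (INFO,DRFA), and that overrides the default from log4j.properties. A sketch of the usual fix, in conf/hadoop-env.sh (verify against your bin/hadoop-daemon.sh, which may set the variable unconditionally):

# conf/hadoop-env.sh
export HADOOP_ROOT_LOGGER=ERROR,DRFA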