Re: job execution

2010-06-14 Thread Akash Deep Shakya
@Jeff, I think JobConf is already deprecated. org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob and org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl can be used instead. Regards Akash Deep Shakya "OpenAK" FOSS Nepal Community akashakya at gmail dot com ~ Failure to prepare is preparing t
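For what it's worth, a minimal sketch of how those two classes fit together (new-API names as in the Hadoop trunk of that era; the job names and configuration details are made-up placeholders, and each Job still needs its mapper, reducer, and paths set as usual):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob;
import org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl;

public class NewApiChainSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job first = new Job(conf, "first");     // configure mapper/reducer/input/output as usual
    Job second = new Job(conf, "second");

    ControlledJob cFirst = new ControlledJob(first, null);
    ControlledJob cSecond = new ControlledJob(second, null);
    cSecond.addDependingJob(cFirst);        // second may start only after first succeeds

    JobControl control = new JobControl("chain");
    control.addJob(cFirst);
    control.addJob(cSecond);

    new Thread(control).start();            // JobControl implements Runnable
    while (!control.allFinished()) {
      Thread.sleep(1000);
    }
    control.stop();
  }
}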

running elephant-bird in eclipse & codec property

2010-06-14 Thread Kim Vogt
Hi peeps, I'm trying to run elephant-bird code in Eclipse, specifically (http://github.com/kevinweil/elephant-bird/blob/master/examples/src/pig/json_word_count.pig), but I'm not sure how to set the core-site.xml properties via Eclipse. I tried adding them to the VM args but am still getting the foll

setting up hadoop 0.20.1 development environment

2010-06-14 Thread Vidur Goyal
Hi, I am trying to set up a development cluster for Hadoop 0.20.1 in Eclipse. I used this URL http://svn.apache.org/repos/asf/hadoop/common/tags/release-0.20.1/ to check out the build. I ran the "compile, compile-core-test, and eclipse-files" targets using ant. Then when I build the project, I am gett

CFP for Surge Scalability Conference 2010

2010-06-14 Thread Jason Dixon
We're excited to announce Surge, the Scalability and Performance Conference, to be held in Baltimore on Sept 30 and Oct 1, 2010. The event focuses on case studies that demonstrate successes (and failures) in Web applications and Internet architectures. Our Keynote speakers include John Allspaw an

Re: job execution

2010-06-14 Thread Jeff Zhang
There's a class org.apache.hadoop.mapred.jobcontrol.Job which is a wrapper of JobConf. You add dependent jobs to it, then put it into JobControl. On Mon, Jun 14, 2010 at 9:55 AM, Gang Luo wrote: > Hi, > According to the doc, JobControl can maintain the dependency among different > jobs and o
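The old-API equivalent of the ControlledJob sketch above, assuming the 0.20 classes Jeff names (the JobConf setup is omitted and the group name is arbitrary):

import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.jobcontrol.Job;
import org.apache.hadoop.mapred.jobcontrol.JobControl;

public class OldApiChainSketch {
  public static void main(String[] args) throws Exception {
    JobConf firstConf = new JobConf();    // configure the first job as usual
    JobConf secondConf = new JobConf();   // configure the dependent job as usual

    Job first = new Job(firstConf);       // the wrapper around JobConf
    Job second = new Job(secondConf);
    second.addDependingJob(first);        // second waits until first has succeeded

    JobControl control = new JobControl("dependent-jobs");
    control.addJob(first);
    control.addJob(second);

    new Thread(control).start();          // runs jobs as their dependencies are met
    while (!control.allFinished()) {
      Thread.sleep(1000);
    }
    System.out.println("failed jobs: " + control.getFailedJobs());
    control.stop();
  }
}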

How do I configure 'SkipBadRecords' in Hadoop Streaming?

2010-06-14 Thread edward choi
Hi, I am trying to use Hadoop Streaming and there seem to be a few bad records in my data. I'd like to use SkipBadRecords but I can't find how to use it in Hadoop Streaming. Is it at all possible? Thanks in advance.
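I don't have a streaming-specific recipe, but as a sketch of which knobs exist, these are the SkipBadRecords setters in the 0.20 Java API; the property names in the comments are what I believe the setters map to, so double-check them against your build, and the skip-output path is made up:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SkipBadRecords;

public class SkipBadRecordsSketch {
  public static void main(String[] args) {
    JobConf conf = new JobConf();
    // Enter skipping mode after two failed attempts of the same task.
    SkipBadRecords.setAttemptsToStartSkipping(conf, 2);    // mapred.skip.attempts.to.start.skipping
    // Tolerate at most one bad record around each map-side failure.
    SkipBadRecords.setMapperMaxSkipRecords(conf, 1L);      // mapred.skip.map.max.skip.records
    // Keep a copy of the skipped records for later inspection.
    SkipBadRecords.setSkipOutputPath(conf, new Path("/tmp/skipped-records"));  // mapred.skip.out.dir
  }
}

A streaming job can't call these setters, but it should be able to pass the same properties as generic options, e.g. -D mapred.skip.map.max.skip.records=1 -D mapred.skip.attempts.to.start.skipping=2, placed before the streaming-specific arguments.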

Re: Problems with HOD and HDFS

2010-06-14 Thread David Milne
Is there something else I could read about setting up short-lived Hadoop clusters on virtual machines? I have no experience with VMs at all. I see there is quite a bit of material about using them to get Hadoop up and running with a pseudo-cluster on a single machine, but I don't follow how this st

Re: Problems with HOD and HDFS

2010-06-14 Thread David Milne
Unless I am missing something, the Fair Share and Capacity schedulers sound like a solution to a different problem: aren't they for a dedicated Hadoop cluster that needs to be shared by lots of people? I have a general purpose cluster that needs to be shared by lots of people. Only one of them (me)

Re: Problems with HOD and HDFS

2010-06-14 Thread David Milne
Thanks everyone for your replies. Even though HOD looks like a dead-end I would prefer to use it. I am just one user of the cluster among many, and currently the only one using Hadoop. The jobs I need to run are pretty much one-off: they are big jobs that I can't do without Hadoop, but I might nee

Re: Caching in HDFS C API Client

2010-06-14 Thread Arun C Murthy
Nice, thanks Brian! On Jun 14, 2010, at 7:39 AM, Brian Bockelman wrote: Hey Owen, all, I find this one handy if you have root access: http://linux-mm.org/Drop_Caches echo 3 > /proc/sys/vm/drop_caches Drops the pagecache, dentries, and inodes. Without this, you can still get caching effec

Re: Hadoop and IP on InfiniBand (IPoIB)

2010-06-14 Thread Allen Wittenauer
On Jun 14, 2010, at 10:57 AM, Russell Brown wrote: > I'm a new user of Hadoop. I have a Linux cluster with both gigabit ethernet > and InfiniBand communications interfaces. Could someone please tell me how > to switch IP communication from ethernet (the default) to InfiniBand? Thanks. Hado

Hadoop and IP on InfiniBand (IPoIB)

2010-06-14 Thread Russell Brown
I'm a new user of Hadoop. I have a Linux cluster with both gigabit ethernet and InfiniBand communications interfaces. Could someone please tell me how to switch IP communication from ethernet (the default) to InfiniBand? Thanks.

Re: Task process exit with nonzero status of 1 - deleting userlogs helps

2010-06-14 Thread Edward Capriolo
On Mon, Jun 14, 2010 at 1:15 PM, Johannes Zillmann wrote: > Hi, > > I am running a 4-node cluster with hadoop-0.20.2. Now I have suddenly run into > a situation where every task scheduled on 2 of the 4 nodes fails. > Seems like the child JVM crashes. There are no child logs under > logs/userlogs. T

Task process exit with nonzero status of 1 - deleting userlogs helps

2010-06-14 Thread Johannes Zillmann
Hi, I am running a 4-node cluster with hadoop-0.20.2. Now I have suddenly run into a situation where every task scheduled on 2 of the 4 nodes fails. Seems like the child JVM crashes. There are no child logs under logs/userlogs. The Tasktracker gives this: 2010-06-14 09:34:12,714 INFO org.apache.hado

Re: job execution

2010-06-14 Thread Akash Deep Shakya
Use the ControlledJob class from Hadoop trunk and run it through JobControl. Regards Akash Deep Shakya "OpenAK" FOSS Nepal Community akashakya at gmail dot com ~ Failure to prepare is preparing to fail ~ On Mon, Jun 14, 2010 at 10:40 PM, Gang Luo wrote: > Hi, > According to the doc, JobControl

job execution

2010-06-14 Thread Gang Luo
Hi, According to the doc, JobControl can maintain the dependencies among different jobs, and only jobs without pending dependencies can execute. How does JobControl maintain the dependencies, and how can we declare them? Thanks, -Gang

Re: Appending and seeking files while writing

2010-06-14 Thread Stas Oskin
Hi. Should be out soon - Tom White is working hard on the release. Note that the > first release, 0.21.0, will be somewhat of a "development quality" release > not recommended for production use. Of course, the way it will become > production-worthy is by less risk-averse people trying it and find

Re: Problems with HOD and HDFS

2010-06-14 Thread Steve Loughran
Edward Capriolo wrote: I have not used it much, but I think HOD is pretty cool. I guess most people who are looking to (spin up, run job, transfer off, spin down) are using EC2. HOD does something like make private Hadoop clouds on your hardware, and many probably do not have that use case. As s

Re: Appending and seeking files while writing

2010-06-14 Thread Todd Lipcon
On Mon, Jun 14, 2010 at 4:28 AM, Stas Oskin wrote: > By the way, what about the ability for a node to read a file which is being > written by another node? > This is allowed, though there are some remaining bugs to be ironed out here. See https://issues.apache.org/jira/browse/HDFS-1057 for example.

Re: Appending and seeking files while writing

2010-06-14 Thread Todd Lipcon
On Mon, Jun 14, 2010 at 4:00 AM, Stas Oskin wrote: > Hi. > > Thanks for clarification. > > Append will be supported fully in 0.21. > > > > > Any ETA for this version? > Should be out soon - Tom White is working hard on the release. Note that the first release, 0.21.0, will be somewhat of a "deve

Re: Problems with HOD and HDFS

2010-06-14 Thread Edward Capriolo
On Mon, Jun 14, 2010 at 8:37 AM, Amr Awadallah wrote: > Dave, > > Yes, many others have the same situation, the recommended solution is > either to use the Fair Share Scheduler or the Capacity Scheduler. These > schedulers are much better than HOD since they take data locality into > considerati

Re: Caching in HDFS C API Client

2010-06-14 Thread Brian Bockelman
Hey Owen, all, I find this one handy if you have root access: http://linux-mm.org/Drop_Caches echo 3 > /proc/sys/vm/drop_caches Drops the pagecache, dentries, and inodes. Without this, you can still get caching effects doing the normal "read and write large files" if the linux pagecache outs

Re: Caching in HDFS C API Client

2010-06-14 Thread Owen O'Malley
Indeed. On the terasort benchmark, I had to run intermediate jobs that were larger than ram on the cluster to ensure that the data was not coming from the file cache. -- Owen

Re: No KeyValueTextInputFormat in hadoop-0.20.2?

2010-06-14 Thread Kevin Tse
Hi Ted, I mean the new API: org.apache.hadoop.mapreduce.Job.setInputFormatClass(org.apache.hadoop.mapreduce.InputFormat). "Job.setInputFormatClass()" only accepts "org.apache.hadoop.mapreduce.InputFormat" (of which there are several subclasses, while KeyValueTextInputFormat is not one of them) as it
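As a stopgap until the class is ported, one workaround (just a sketch, not an official recipe) is to stay with the new-API TextInputFormat and re-create KeyValueTextInputFormat's default behaviour, splitting each line on the first tab, inside your own mapper:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Splits each input line on the first tab; if there is no tab, the whole line
// becomes the key and the value is empty, mirroring KeyValueTextInputFormat.
public class KeyValueLineMapper extends Mapper<LongWritable, Text, Text, Text> {
  @Override
  protected void map(LongWritable offset, Text line, Context context)
      throws IOException, InterruptedException {
    String s = line.toString();
    int tab = s.indexOf('\t');
    if (tab < 0) {
      context.write(new Text(s), new Text(""));
    } else {
      context.write(new Text(s.substring(0, tab)), new Text(s.substring(tab + 1)));
    }
  }
}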

Re: No KeyValueTextInputFormat in hadoop-0.20.2?

2010-06-14 Thread Ted Yu
Have you checked src/mapred/org/apache/hadoop/mapred/KeyValueTextInputFormat.java ? On Mon, Jun 14, 2010 at 6:51 AM, Kevin Tse wrote: > Hi, > I am upgrading my code from hadoop-0.19.2 to hadoop-0.20.2, during the > process I found that there was no KeyValueTextInputFormat class which > exists >

No KeyValueTextInputFormat in hadoop-0.20.2?

2010-06-14 Thread Kevin Tse
Hi, I am upgrading my code from hadoop-0.19.2 to hadoop-0.20.2, and during the process I found that the KeyValueTextInputFormat class that exists in hadoop-0.19.2 is missing. It's so strange that this version of Hadoop does not come with this commonly used InputFormat. I have taken a look at the "Second

Re: Problems with HOD and HDFS

2010-06-14 Thread Amr Awadallah
Dave, Yes, many others have the same situation; the recommended solution is to use either the Fair Share Scheduler or the Capacity Scheduler. These schedulers are much better than HOD since they take data locality into consideration (they don't just spin up 20 TT nodes on machines that have noth

Re: Appending and seeking files while writing

2010-06-14 Thread Stas Oskin
By the way, what about the ability for a node to read a file which is being written by another node? Or must the file be written and closed completely before it becomes available to other nodes? (AFAIK in 0.18.3 the file appeared as 0 size until it was closed.) Regards.

Re: Appending and seeking files while writing

2010-06-14 Thread Stas Oskin
Hi. Thanks for the clarification. Append will be supported fully in 0.21. > > Any ETA for this version? Will it work both with Fuse and the HDFS API? > Also, append does *not* add random write. It simply adds the ability to > re-open a file and add more data to the end. > > Just to clarify, even with a
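For reference, a sketch of what that re-open-and-append looks like from the client side, assuming the 0.21 API (the path is a placeholder, and append has to be enabled on the cluster):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AppendSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path p = new Path("/tmp/existing-file");   // hypothetical, must already exist

    // Re-open the file and add bytes at its end; there is no random write.
    FSDataOutputStream out = fs.append(p);
    out.write("more data\n".getBytes("UTF-8"));
    out.hflush();   // 0.21 call that makes the new bytes visible to readers of the still-open file
    out.close();
  }
}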

changing my hadoop log level is not getting reflected in logs

2010-06-14 Thread Gokulakannan M
Hi, I changed the default log level of Hadoop from INFO to ERROR by setting the property hadoop.root.logger to ERROR in /conf/log4j.properties. But when I start the namenode, the INFO logs are still seen in the log file. I did a workaround and found that HADOOP_ROOT_LOGGER is