Re: TaskTracker not starting on all nodes

2011-03-04 Thread MANISH SINGLA
Hi all, I am trying to set up a two-node cluster. I have configured all the files as specified in the tutorial I am referring to, and I copied the public key to the slave machine, but when I ssh to the slave from the master, it asks for a password every time. Kindly help.

Hadoop Developer Question

2011-03-04 Thread Brady Banks
Hi All, I have a question I need some help with. How can I find Hadoop developers in the Bay Area who are looking for jobs? I have multiple companies begging for good candidates and I'm having a very hard time tracking people down, which usually isn't an issue. If you know of any forums or

k-means

2011-03-04 Thread MANISH SINGLA
Hey all, I need some serious help. I am not able to run k-means code in Hadoop. Does anyone have working code that they have tried? Regards, MANISH

RE: Hadoop Developer Question

2011-03-04 Thread Habermaas, William
How come all the Hadoop jobs are in the Bay Area? Doesn't anybody use Hadoop in NY?

Re: Hadoop Developer Question

2011-03-04 Thread Brian Bockelman
Try living in Nebraska... By the time the fun stuff gets here, it's COBOL. :)

Re: k-means

2011-03-04 Thread James Seigel
Mahout project? Sent from my mobile. Please excuse the typos.

Re: Hadoop Developer Question

2011-03-04 Thread Alex Dorman
I have multiple Hadoop jobs open (developers, admins). We are using Hadoop in production and are willing to train the right candidates. Contact me directly. Thanks, Alex ador...@proclivitysystems.com

Re: k-means

2011-03-04 Thread MANISH SINGLA
Are you suggesting I use that? If yes, can you please tell me the steps to use it, because I haven't used it yet. A quick reply will really be appreciated. Thanks, Manish

map/reduce job conf settings ignored? (bug?)

2011-03-04 Thread Brendan W.
I've passed in the following line through the code for my m/r job: conf.set("mapred.tasktracker.reduce.tasks.maximum", "4"), wanting that to override the value of 8 set in the mapred-site.xml files on my ten-node cluster. Sure enough, when I look at the job configuration in the web UI for the

Re: k-means

2011-03-04 Thread James Seigel
I am not near a computer so I won't be able to give you specifics. So instead, I'd suggest Manning's Mahout in Action book, which is in their early-access form, for some basic direction. Disclosure: I have no relation to the publisher or authors. Cheers, James Sent from my mobile. Please excuse the typos.

Re: k-means

2011-03-04 Thread Mike Nute
James, do you know how to get a copy of this book in early-access form? Amazon doesn't release it until May. Thanks! Mike Nute

RE: Unable to use hadoop cluster on the cloud

2011-03-04 Thread praveen.peddi
Thanks, Adarsh, for the reply. Just to clarify the issue a bit: I am able to do all operations (-copyFromLocal, -get, -rmr, etc.) from the master node, so I am confident that the communication between all Hadoop machines is fine. But when I do the same operation from another machine that also has

Re: k-means

2011-03-04 Thread James Seigel
The Manning site. You can download it and get a paper copy when it comes out if you'd like. James Sent from my mobile. Please excuse the typos.

Re: k-means

2011-03-04 Thread Mike Nute
Awesome. Thanks so much.

Re: TaskTracker not starting on all nodes

2011-03-04 Thread James Seigel
Sounds like just a bit more work on understanding SSH will get you there. What you are looking for is getting that public key into the slave's authorized_keys file. James Sent from my mobile. Please excuse the typos.
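
A minimal sketch of what that usually looks like, assuming OpenSSH and the same user account on both machines (the hadoop@slave address is a placeholder):

    # On the master, generate a passwordless keypair if one doesn't exist yet
    ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa

    # Append the master's public key to the slave's authorized_keys
    cat ~/.ssh/id_rsa.pub | ssh hadoop@slave 'mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys'

    # Permissions matter: sshd ignores keys in group/world-writable files
    ssh hadoop@slave 'chmod 700 ~/.ssh && chmod 600 ~/.ssh/authorized_keys'

    # This should now log in without a password prompt
    ssh hadoop@slave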

Re: k-means

2011-03-04 Thread Ted Dunning
Since you asked so nicely: http://www.manning.com/owen/
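
For anyone who wants to experiment before the book arrives, a rough sketch of driving k-means from the Mahout command line (flag names as of the 0.4-era releases, if memory serves; the paths and the k value are placeholders, so check bin/mahout kmeans --help against your version):

    # -i: input vectors (a SequenceFile of VectorWritable)
    # -c: initial centroid directory (filled by random sampling when -k is given)
    # -k: number of clusters; -x: max iterations; -cl: assign points to final clusters
    bin/mahout kmeans -i points/ -c initial-clusters/ -o kmeans-output/ \
      -k 5 -x 10 -cl \
      -dm org.apache.mahout.common.distance.EuclideanDistanceMeasure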

Re: hadoop balancer

2011-03-04 Thread He Chen
Thank you very much, Icebergs. I rewrote the balancer. Now, given a directory like /user/foo/, I can balance the blocks under that directory evenly across every node in the cluster. Best wishes! Chen

Re: map/reduce job conf settings ignored? (bug?)

2011-03-04 Thread Harsh J
As the name suggests (mapred.tasktracker.*), that is a TaskTracker initialization property, not a job-specific one. It can only be changed in mapred-site.xml (after which the TaskTracker may need to be restarted for the change to take effect). -- Harsh J www.harshj.com

Re: map/reduce job conf settings ignored? (bug?)

2011-03-04 Thread Keith Wiley
Also note that the property may have been marked final, meaning you can't override it (without reconfiguring the cluster). I had a problem like this.
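
Putting Harsh's and Keith's answers together, the stanza in question lives in each TaskTracker's mapred-site.xml and looks roughly like this (the value of 4 is just illustrative):

    <property>
      <name>mapred.tasktracker.reduce.tasks.maximum</name>
      <value>4</value>
      <!-- when present and true, this is what blocks per-job conf.set() overrides -->
      <final>true</final>
    </property>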

HDFS file content restrictions

2011-03-04 Thread Kelly Burkhart
Hello, are there restrictions on the size or width of text files placed in HDFS? I have a file structure like this: <text key><tab><text data><newline>. It would be helpful if in some circumstances I could make the text data really large (large meaning many KB to one or a few MB). I may have some rows that have a very

Re: HDFS file content restrictions

2011-03-04 Thread Harsh J
HDFS does not operate with records in mind. There shouldn't be too much of a problem with having a few MBs per record in text files (provided 'a few MBs' means a (very) small fraction of the file's blocksize value).

RE: hadoop installation problem(single-node)

2011-03-04 Thread Tanping Wang
Try $HADOOP_HOME/bin/hadoop namenode -format, or maybe consider export PATH=$HADOOP_HOME/bin:$PATH. Regards, Tanping
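
Spelled out, assuming HADOOP_HOME already points at the install directory:

    export PATH=$HADOOP_HOME/bin:$PATH
    hadoop namenode -format    # asks for confirmation before wiping an existing namespace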

Re: HDFS file content restrictions

2011-03-04 Thread Kelly Burkhart
Harsh J wrote: "HDFS does not operate with records in mind." So does that mean that HDFS will break a file at exactly blocksize bytes? Map/Reduce *does* operate with records in mind, so what happens to the split record? Does HDFS put the

Re: HDFS file content restrictions

2011-03-04 Thread Harsh J
The class responsible for reading records as lines off a file seeks into the next block in sequence until the newline. This behavior, and how it affects the map tasks, is better documented here (see the TextInputFormat example doc): http://wiki.apache.org/hadoop/HadoopMapReduce

Re: HDFS file content restrictions

2011-03-04 Thread Brian Bockelman
If, for example, you have a record that contains 20MB in one block and 1MB in another, Map/Reduce will feed you the entire 21MB record. If you are lucky and the map is executing on a node with the 20MB block, MapReduce will transfer 1MB out of HDFS for you. This is glossing over some details,

Problem running a Hadoop program with external libraries

2011-03-04 Thread Ratner, Alan S (IS)
We are having difficulties running a Hadoop program that makes calls to external libraries, but this occurs only when we run the program on our cluster and not from within Eclipse, where we are apparently running in Hadoop's standalone mode. This program invokes the Open Computer Vision libraries

Re: Problem running a Hadoop program with external libraries

2011-03-04 Thread Harsh J
I'm only guessing here and might be grossly wrong about my hunch. Are you reusing your JVMs across tasks? Could you see if this goes away without reuse? It would be good if you could monitor your launched tasks (JConsole/VisualVM/etc.) to confirm that there's either a code-based memory leak or some

Re: Problem running a Hadoop program with external libraries

2011-03-04 Thread Aaron Kimball
I don't know if putting native-code .so files inside a jar works. A native-code .so is not classloaded in the same way that .class files are, so the correct .so files probably need to exist in some physical directory on the worker machines. You may want to double-check that the correct directory on the

Re: Problem running a Hadoop program with external libraries

2011-03-04 Thread Aaron Kimball
Actually, I just misread your email and missed the difference between your 2nd and 3rd attempts. Are you enforcing min/max JVM heap sizes on your tasks? Are you enforcing a ulimit (either through your shell configuration, or through Hadoop itself)? I don't know where these "cannot allocate memory"

RE: EXT :Re: Problem running a Hadoop program with external libraries

2011-03-04 Thread Ratner, Alan S (IS)
Aaron, thanks for the rapid responses.
* ulimit -u unlimited is in .bashrc.
* HADOOP_HEAPSIZE is set to 4000 MB in hadoop-env.sh.
* mapred.child.ulimit is set to 2048000 in mapred-site.xml.
* mapred.child.java.opts is set to -Xmx1536m in mapred-site.xml.

Re: EXT :Re: Problem running a Hadoop program with external libraries

2011-03-04 Thread Brian Bockelman
Hi, check your kernel's overcommit settings. These can prevent the JVM from allocating memory even when there's free RAM. Brian
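
For the curious, the knobs Brian is referring to are presumably the standard Linux VM overcommit sysctls; the values shown are common defaults, not recommendations:

    # 0 = heuristic overcommit (default), 1 = always allow, 2 = strict accounting
    cat /proc/sys/vm/overcommit_memory
    cat /proc/sys/vm/overcommit_ratio    # only consulted in mode 2

    # In strict mode (2) commits are capped at swap + ratio% of RAM, which can
    # make a large-heap JVM fail to allocate despite free memory. As root:
    sysctl -w vm.overcommit_memory=0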

Question on Error : could only be replicated to 0 nodes, instead of 1

2011-03-04 Thread Ted Pedersen
Greetings all, I get the following error at seemingly irregular intervals when I'm trying to do the following... hadoop fs -put /scratch1/tdp/data/* input (The data is a few hundred files of wikistats data, about 75GB in total). 11/03/04 15:55:05 WARN hdfs.DFSClient: DataStreamer Exception:
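
That error generally means the NameNode couldn't find a live DataNode with room for the block at that moment. Two first checks worth running, using the stock 0.20-era commands:

    hadoop dfsadmin -report    # live vs. dead DataNodes and per-node remaining space
    hadoop fsck /              # overall filesystem health and under-replicated blocks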

Re: EXT :Re: Problem running a Hadoop program with external libraries

2011-03-04 Thread Lance Norskog
I have never heard of putting a native-code shared library in a Java jar. I doubt that it works. But it's a cool idea! A Unix binary loads shared libraries from the paths given in the environment variable LD_LIBRARY_PATH. This has to be set to the directory with the OpenCV .so file
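
A sketch of the usual fix; the OpenCV path is a placeholder for wherever the .so files actually live on each worker:

    # In $HADOOP_HOME/conf/hadoop-env.sh on every node, then restart the TaskTrackers
    export LD_LIBRARY_PATH=/usr/local/opencv/lib:$LD_LIBRARY_PATH

If memory serves, later 0.20-line releases also let you set this per job through the mapred.child.env property in mapred-site.xml (as a LD_LIBRARY_PATH=/usr/local/opencv/lib pair), and -Djava.library.path in mapred.child.java.opts covers System.loadLibrary() lookups.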

Using SequenceFile instead of TextFiles

2011-03-04 Thread maha
Hi, I have two questions: 1) Is a SequenceFile more efficient than text files for input? I think text files will be processed by TextInputFormat into SequenceFiles inside Hadoop, so will SequenceFiles (i.e., binary input files) be more efficient? 2) If I decided to use SequenceFiles as

Re: Using SequenceFile instead of TextFiles

2011-03-04 Thread Harsh J
Hi, On Sat, Mar 5, 2011 at 9:03 AM, maha m...@umail.ucsb.edu wrote: Hi, I have two questions: 1) Is a SequenceFile more efficient than text files for input? I think text files will be processed by TextInputFormat into SequenceFiles inside Hadoop, so will SequenceFiles (i.e., binary input
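
One shell-level aside that may help while experimenting: fs -text (unlike fs -cat) knows how to decode SequenceFiles, so you can eyeball their contents directly (the path here is a placeholder):

    hadoop fs -text input/part-00000 | head    # decodes SequenceFile key/value pairs to text
    hadoop fs -cat input/part-00000            # raw bytes; unreadable for a SequenceFile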

Digital Signal Processing Library + Hadoop

2011-03-04 Thread Roger Smith
All - I wonder if any of you have integrated a DSP library with Hadoop. We are considering using Hadoop to process time-series data, but don't want to write the standard DSP functions ourselves. Roger.

Re: Using SequenceFile instead of TextFiles

2011-03-04 Thread maha
Thanks again, Harsh. I actually got the book two days ago but haven't had time to read it yet. Maha

Re: Digital Signal Processing Library + Hadoop

2011-03-04 Thread Ted Dunning
Come on over to the Apache Mahout mailing list for a warm welcome, at least. We don't have a lot of time-series stuff, but we would be very interested in hearing more about what you need and would like to see if there are some common issues that we might work on together.