understanding hadoop job submission

2012-04-25 Thread Arindam Choudhury
Hi, I am new to Hadoop and I am trying to understand Hadoop job submission. We submit the job using: hadoop jar some.jar name input output. This in turn invokes RunJar, but in RunJar I cannot find any JobSubmit() or any call to JobClient. Then how does the job get submitted to the

RE: understanding hadoop job submission

2012-04-25 Thread Devaraj k
Hi Arindam, hadoop jar jarFileName MainClassName: the above command will not submit the job. This command only executes the jar file using the main class (the Main-Class in the manifest if available, otherwise the class name passed as an argument, i.e. MainClassName in the above command). If
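A minimal sketch of the point above, assuming the driver class is already on the classpath: hadoop jar essentially locates a main class and calls its main(String[]) method reflectively, so a job is submitted only if that main method submits one. MiniRunJar and DemoDriver are hypothetical names; the real RunJar also reads the Main-Class entry from the jar manifest and sets up a class loader for the jar's contents, which is omitted here.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.Arrays;

// Hypothetical, simplified stand-in for what "hadoop jar" does via RunJar:
// find the main class named in args[0] and call its main(String[]) reflectively.
// Manifest lookup and jar class-loader setup are deliberately left out.
final class MiniRunJar {
    private MiniRunJar() {}

    static void run(String[] args) throws Exception {
        if (args.length < 1) {
            throw new IllegalArgumentException("Usage: MiniRunJar mainClass [args...]");
        }
        Class<?> mainClass = Class.forName(args[0]);
        Method main = mainClass.getMethod("main", String[].class);
        String[] rest = Arrays.copyOfRange(args, 1, args.length);
        try {
            // Cast to Object so the array is passed as the single String[] argument
            // instead of being spread as varargs.
            main.invoke(null, (Object) rest);
        } catch (InvocationTargetException e) {
            // Unwrap so the caller sees the driver's own exception.
            Throwable cause = e.getCause();
            if (cause instanceof Exception) {
                throw (Exception) cause;
            }
            throw e;
        }
    }
}

// Hypothetical driver standing in for a job's main class. A real driver is
// where the job would be built and submitted.
final class DemoDriver {
    static String[] lastArgs;

    public static void main(String[] args) {
        lastArgs = args;
    }
}
```

In a real job, DemoDriver.main is where the submission call lives: for example building a JobConf and calling JobClient.runJob(conf), or building a Job and calling submit() or waitForCompletion(true).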

Re: understanding hadoop job submission

2012-04-25 Thread Jay Vyas
Yes, the job is submitted by the API calls in the MapReduce code.

Re: understanding hadoop job submission

2012-04-25 Thread Arindam Choudhury
Hi, the code is:

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: wordcount in out");

RE: understanding hadoop job submission

2012-04-25 Thread Devaraj k
You can submit the job in either of the following ways:
1. Using JobClient: create a JobConf and submit the job with the JobClient.runJob(JobConf conf) API.
2. Alternatively, create a Job instance by passing a Configuration object, and submit (using

Text Analysis

2012-04-25 Thread karanveer.singh
Hi, I wanted to know whether there are any existing APIs within Hadoop for text analysis such as sentiment analysis, or whether we have to rely on tools like R for this. Regards, Karanveer

Re: Distributing MapReduce on a computer cluster

2012-04-25 Thread Merto Mertek
To understand how load is distributed, you can start by reading about the different types of Hadoop schedulers. I have not yet studied implementations other than Hadoop's; however, a very simplified version of the distribution concept is the following: a) The tasktracker asks for work (the heartbeat consists of a status
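To make the heartbeat idea concrete, here is a toy sketch assuming a single FIFO queue of tasks and a free-slot count reported in each heartbeat. ToyJobTracker and its methods are hypothetical, not Hadoop classes: each heartbeat reports how many slots the tracker has free, and the response hands back at most that many pending tasks.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy model of pull-based task assignment (hypothetical names, not Hadoop's API).
// Trackers ask for work; the scheduler never pushes tasks on its own.
final class ToyJobTracker {
    private final Deque<String> pending = new ArrayDeque<>();

    void addTasks(List<String> tasks) {
        pending.addAll(tasks);
    }

    // Called once per heartbeat. trackerName is unused by this FIFO policy;
    // a locality-aware scheduler would use it to prefer tasks whose input
    // data lives on that tracker's node.
    List<String> heartbeat(String trackerName, int freeSlots) {
        List<String> assigned = new ArrayList<>();
        while (assigned.size() < freeSlots && !pending.isEmpty()) {
            assigned.add(pending.poll());
        }
        return assigned;
    }

    int pendingCount() {
        return pending.size();
    }
}
```

Real Hadoop schedulers (the default FIFO one, the Fair Scheduler, the Capacity Scheduler) also weigh data locality, job priority, and pool or queue shares when filling those slots.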

The meaning of FileSystem in context of OutputFormat storage

2012-04-25 Thread Jay Vyas
I just saw this line in the javadocs for OutputFormat: "Output files are stored in a FileSystem" (http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/fs/FileSystem.html). It seems like an odd sentence. What is the implication here: is it implying anything other than the obvious?

Re: Text Analysis

2012-04-25 Thread Robert Evans
Hadoop itself is the core Map/Reduce and HDFS functionality. Higher-level algorithms such as sentiment analysis are often implemented by others. Cloudera has a video from HadoopWorld 2010 about it http://www.cloudera.com/resource/hw10_video_sentiment_analysis_powered_by_hadoop/ And there are

Re: Text Analysis

2012-04-25 Thread Jagat
There are APIs you can use; of course, they are third-party.

Re: The meaning of FileSystem in context of OutputFormat storage

2012-04-25 Thread John George
I think it means that the output files can be stored in any implementation of the FileSystem abstract class, depending on the user's requirements. So they could be stored in DistributedFileSystem, LocalFileSystem, etc. Regards, John George
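A minimal sketch of that idea, assuming the URI scheme of a path selects the implementation and unqualified paths fall back to a default scheme (the role fs.default.name plays in Hadoop configurations of this era). SchemeResolver and its lookup table are hypothetical; in Hadoop itself, FileSystem.get(URI, Configuration) performs the real lookup.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration: the scheme of an output path decides which
// FileSystem implementation would handle it. The table is hard-coded here;
// Hadoop drives this from configuration instead.
final class SchemeResolver {
    private final Map<String, String> implByScheme = new HashMap<>();

    SchemeResolver() {
        implByScheme.put("hdfs", "DistributedFileSystem");
        implByScheme.put("file", "LocalFileSystem");
    }

    String resolve(String path, String defaultScheme) {
        String scheme = URI.create(path).getScheme();
        if (scheme == null) {
            // Unqualified paths such as "/user/out" use the default file system.
            scheme = defaultScheme;
        }
        String impl = implByScheme.get(scheme);
        if (impl == null) {
            throw new IllegalArgumentException("No FileSystem for scheme: " + scheme);
        }
        return impl;
    }
}
```

The OutputFormat code only sees the abstract type, which is why the same job can write to HDFS on a cluster and to the local disk in a test run.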

No Space left on device

2012-04-25 Thread Nuthalapati, Ramesh
Strangely, I see the tmp folder has enough space. What else could be the problem? How much should my tmp space be?
Error: java.io.IOException: No space left on device
    at java.io.FileOutputStream.writeBytes(Native Method)
    at

Re: Text Analysis

2012-04-25 Thread Harsh J
I do not know whether an implementation of something as specific as sentiment analysis exists, but if you're looking at MapReduce for text processing in general, I highly recommend visiting http://cloud9lib.org

Re: No Space left on device

2012-04-25 Thread Alexander Lorenz
Looks like the Hadoop partition is full.

Re: No Space left on device

2012-04-25 Thread Harsh J
This error comes from your mapred.local.dir (which by default may reuse hadoop.tmp.dir). Do you see free space available when you run the following? df -h /opt/hadoop

Re: Text Analysis

2012-04-25 Thread Charles Earl
If you've got existing R code, you might want to look at this Quora posting, also by Cloudera (http://www.quora.com/How-can-R-and-Hadoop-be-used-together), or the RHIPE R/Hadoop package (https://github.com/saptarshiguha/RHIPE/wiki). Mahout and Lucene/Solr offer some level of text analysis, although

RE: No Space left on device

2012-04-25 Thread Nuthalapati, Ramesh
I have a lot of space available:

Filesystem             Size  Used  Avail  Use%  Mounted on
/dev/mapper/sysvg-opt   14G  1.2G    12G    9%  /opt

My input files are around 10G. Is there a requirement that the Hadoop tmp dir be a certain percentage of the input size?

Re: No Space left on device

2012-04-25 Thread Harsh J
Ramesh, that explains it then. Going from map to reduce requires disk storage worth at least the amount of data you're going to send between them. If you're running your 'cluster' on a single machine, the answer to your question is yes.
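A hypothetical pre-flight check along the lines of the advice above: compare the usable space under the local directory (whatever mapred.local.dir resolves to) with an estimate of the intermediate map output. LocalDirCheck is an illustrative name, and the size estimate is an input you must supply, since how large map output gets depends entirely on the job.

```java
import java.io.File;

// Hypothetical helper (not part of Hadoop): a Java counterpart to running
// "df -h" on the local dir, plus the comparison suggested in this thread.
final class LocalDirCheck {
    private LocalDirCheck() {}

    // Usable bytes on the partition holding dir; File.getUsableSpace()
    // returns 0 if the path does not name a partition.
    static long usableBytes(String dir) {
        return new File(dir).getUsableSpace();
    }

    // "At least the amount of data sent between map and reduce" is a lower
    // bound, so passing this check is necessary but not sufficient.
    static boolean hasRoomFor(long usableBytes, long estimatedIntermediateBytes) {
        return usableBytes >= estimatedIntermediateBytes;
    }
}
```

With the figures from this thread (about 12G available, about 10G of input), the check passes only if the map output is no larger than the input, and even then it leaves little headroom for spill merges or for the reduce side's copy of the same data on a single machine.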

Re: Text Analysis

2012-04-25 Thread Devi Kumarappan
The RHadoop package allows you to do statistical analysis. We were able to build a word cloud from the text files using the rmr and rhdfs packages. Installation details for these packages are available at the following link: https://github.com/RevolutionAnalytics/RHadoop/wiki/rmr Devi
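The core of a word cloud is a term-frequency count. As a plain-Java sketch (not rmr or rhdfs code), tokenizing each line plays the role of a map step that emits (word, 1), and merging into the table plays the role of a reduce step that sums per word. TermFrequency is a hypothetical name, and the tokenization rule (lowercase, split on anything that is not a letter, digit, or apostrophe) is an assumption.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory word count in map/reduce shape.
final class TermFrequency {
    private TermFrequency() {}

    static Map<String, Integer> count(Iterable<String> lines) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted for stable output
        for (String line : lines) {
            // "Map": lowercase the line and split it into tokens.
            for (String token : line.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}']+")) {
                if (!token.isEmpty()) { // a leading delimiter yields one empty token
                    // "Reduce": sum the 1s emitted for each word.
                    counts.merge(token, 1, Integer::sum);
                }
            }
        }
        return counts;
    }
}
```

On a cluster the same two steps are split across mappers and reducers, with the shuffle grouping identical words in between.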

hadoop on rackspace

2012-04-25 Thread Xiaomeng Wan
Hi, I am wondering what kind of Hadoop cluster I can get on Rackspace for a 2k/month budget (number of servers, size of each server)? Also, any pointers to instructions on how to set up a Hadoop cluster or the Cloudera version on Rackspace would be really appreciated. Regards, Shawn

Re: hadoop on fedora 15

2012-04-25 Thread john cohen
I had the same issue. My problem was being connected to work over a VPN while running M/R jobs on my Mac. It occurred to me that maybe Hadoop was binding to the wrong IP (the IP assigned after connecting through the VPN). Bottom line: I disconnected from the VPN, and the M/R
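One thing worth checking when a VPN is suspected is what the machine's own hostname resolves to, since that is one way a process can end up with the VPN address instead of the expected one. This is a hedged diagnostic sketch, not a confirmed explanation of the failure above; HostCheck is a hypothetical name and it uses only InetAddress from the JDK.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical diagnostic: report what this machine's hostname resolves to.
// Run it once with the VPN connected and once without, and compare.
final class HostCheck {
    private HostCheck() {}

    static String describeLocalHost() {
        try {
            InetAddress local = InetAddress.getLocalHost();
            return local.getHostName() + " -> " + local.getHostAddress();
        } catch (UnknownHostException e) {
            // An unresolvable hostname is itself a common cause of daemons failing to start.
            return "hostname does not resolve: " + e.getMessage();
        }
    }
}
```

If the address changes when the VPN comes up, pinning the addresses in the Hadoop configuration (or disconnecting, as above) avoids the mismatch.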