Re: 1st Hadoop India User Group meet

2009-11-10 Thread Amandeep Khurana
Sanjay, Congratulations on holding the first meetup. All the best with it. It's exciting to see work being done in India involving Hadoop. I've been a part of some projects in the Hadoop ecosystem and have done some research work during my graduate studies as well as for a project at Cisco

Re: Where is the eclipse plug-in for hadoop 0.20.1

2009-11-10 Thread Jeff Zhang
Hi Stephen, Thank you. It works. Jeff Zhang On Mon, Nov 9, 2009 at 10:31 PM, Stephen Watt sw...@us.ibm.com wrote: Hi Jeff, That is correct. The plugin for 0.20.1 exists only in src/contrib as it has some build and runtime issues. It is presently being tracked here -

[Ask for help]: IOException: Expecting a line not the end of stream, hadoop-0.20.1 in Damn Small Linux

2009-11-10 Thread Neo Tan
Dear all, I am new to learning Hadoop, and encountered a problem while following the Hadoop Quick Start (http://hadoop.apache.org/common/docs/current/quickstart.html) tutorial. Everything is okay in Cygwin, but not in Damn Small Linux (DSL). In Damn Small Linux, after executing the command:

Lucene + Hadoop

2009-11-10 Thread Hrishikesh Agashe
Hi, I am trying to use Hadoop for Lucene index creation. I have to create multiple indexes based on the contents of the files (i.e. if the author is hrishikesh, it should be added to an index for hrishikesh; there has to be a separate index for every author). For this, I am keeping multiple IndexWriter

Automate EC2 cluster termination

2009-11-10 Thread John Clarke
Hi, I use EC2 to run my Hadoop jobs using Cloudera's 0.18.3 AMI. It works great, but I want to automate it a bit more. I want to be able to: start the cluster, copy data from S3 to the DFS, run the job, copy result data from DFS to S3, verify it all copied OK, and shut down the cluster. I guess
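The sequence John describes can be driven from a wrapper script. Below is a minimal sketch of the control-flow pattern only: run each step in order, and terminate the cluster on exit whether or not the job succeeded. The actual `hadoop-ec2` and `distcp` invocations shown in the comments are placeholders (assumptions about a Cloudera-era setup), with `true` standing in so the flow is visible end to end.

```shell
# Sketch: automate "start cluster -> copy in -> run job -> copy out -> shutdown".
# The real commands are placeholders in the comments; 'true' stands in for each.

run_step() {               # print a banner, then run the step's command
    echo "== $1"
    shift
    "$@"
}

main() {
    # Shut the cluster down on exit, success or failure, so a failed copy
    # never leaves billable EC2 instances running.
    trap 'echo "shutting down cluster"' EXIT   # e.g. hadoop-ec2 terminate-cluster

    run_step "start cluster"  true   # e.g. hadoop-ec2 launch-cluster my-cluster 10
    run_step "copy S3 -> DFS" true   # e.g. hadoop distcp s3://bucket/in hdfs:///in
    run_step "run job"        true   # e.g. hadoop jar myjob.jar ...
    run_step "copy DFS -> S3" true   # e.g. hadoop distcp hdfs:///out s3://bucket/out
}

main
```

The point of the `trap` is that the shutdown step is not just the last line of the script; it runs even if an earlier step fails and the script exits early.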

Re: Automate EC2 cluster termination

2009-11-10 Thread Edmund Kohlwey
You should be able to detect the status of the job in your Java main() method. Either call job.waitForCompletion() and then, when the job finishes running, job.isSuccessful(); or, if you want, you can write a custom watcher thread to poll job status manually. This will allow you to, for

Hadoop NameNode not starting up

2009-11-10 Thread Kaushal Amin
I am running Hadoop on a single server. The issue I am running into is that the start-all.sh script is not starting up the NameNode. The only way I can start the NameNode is by formatting it, and I end up losing data in HDFS. Does anyone have a solution to this issue? Kaushal

Next Boston Hadoop Meetup, Tuesday, November 24th

2009-11-10 Thread Dan Milstein
After a packed, energetic first Boston Hadoop Meetup, we're having another. Next one will be in two weeks, on Tuesday, November 24th, 7 pm, at the HubSpot offices: http://www.meetup.com/bostonhadoop/calendar/11834241/ (HubSpot is at 1 Broadway, Cambridge on the fifth floor. There Will

Re: Cross Join

2009-11-10 Thread Edmund Kohlwey
Thanks to all who commented on this. I think there was some confusion over what I was trying to do: indeed there was no common key between the two tables to join on, which made all the methods I investigated either inappropriate or inefficient. In the end I decided to write my own join class.

Re: Hadoop NameNode not starting up

2009-11-10 Thread Edmund Kohlwey
Is there any error output from start-all.sh? On 11/9/09 11:10 PM, Kaushal Amin wrote: I am running Hadoop on a single server. The issue I am running into is that the start-all.sh script is not starting up the NameNode. The only way I can start the NameNode is by formatting it, and I end up losing data in HDFS.

Error with replication and namespaceID

2009-11-10 Thread Raymond Jennings III
On the actual datanodes I see the following exception: I am not sure what the namespaceID is or how to sync them. Thanks for any advice! / STARTUP_MSG: Starting DataNode STARTUP_MSG: host = pingo-3.poly.edu/128.238.55.33

java.io.IOException: Could not obtain block:

2009-11-10 Thread John Martyniak
Hello everyone, I am getting the error java.io.IOException: Could not obtain block: when running on my new cluster. When I ran the same job on the single node it worked perfectly; I then added in the second node, and receive this error. I was running the grep sample job. I am running

Hadoop User Group (Bay Area) - next Wednesday (Nov 18th) at Yahoo!

2009-11-10 Thread Dekel Tankel
Hi all, We are one week away from the next Bay Area Hadoop User Group - Yahoo! Sunnyvale Campus, next Wednesday (Nov 18th) at 6PM. We have an exciting evening planned: *Katta, Solr, Lucene and Hadoop - Searching at scale, Jason Rutherglen and Jason Venner *Walking through the New File system

Re: Re: how to read file in hadoop

2009-11-10 Thread Gang Luo
It was because the content I read from the file was encoded in UTF-8. I used Text.decode to decode it back to a plain-text string, and the problem is gone now. -Gang - Original Message - From: Gang Luo lgpub...@yahoo.com.cn To: common-user@hadoop.apache.org Sent: 2009/11/10 (Tue) 12:14:44 AM Subject: Re: Re: how

stdout logs ?

2009-11-10 Thread Siddu
Hi all, In src/contrib/data_join/src/java/org/apache/hadoop/contrib/utils/join/DataJoinJob.java I found a couple of println statements (shown below) which get executed when a job is submitted. I am not sure which stdout they are printing to. I searched in logs/* but didn't find it.

Re: Error with replication and namespaceID

2009-11-10 Thread Edmund Kohlwey
Hi Ray, You'll probably find that even though the name node starts, it doesn't have any data nodes and is completely empty. Whenever Hadoop creates a new filesystem, it assigns a large random number to it to prevent you from mixing datanodes from different filesystems by accident. When you
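The fix being discussed in this thread amounts to making the datanode's stored namespaceID match the namenode's. On a datanode it lives in the VERSION file under the dfs.data.dir (at current/VERSION). The snippet below only demonstrates the mechanics on a throwaway copy of such a file; on a real node you would stop the datanode first, edit the real file, and use the namespaceID your namenode actually reports (987654321 here is a placeholder).

```shell
# Demonstration on a throwaway copy of a datanode VERSION file.
# Real file: <dfs.data.dir>/current/VERSION; stop the datanode before editing.
dir=$(mktemp -d)
mkdir -p "$dir/current"
cat > "$dir/current/VERSION" <<'EOF'
namespaceID=123456789
storageID=DS-1
cTime=0
storageType=DATA_NODE
layoutVersion=-18
EOF

# Rewrite the namespaceID line (987654321 stands in for the namenode's ID):
sed -i 's/^namespaceID=.*/namespaceID=987654321/' "$dir/current/VERSION"
grep '^namespaceID=' "$dir/current/VERSION"   # -> namespaceID=987654321
```

The surrounding fields (storageID, layoutVersion, etc.) are illustrative values, not something to copy; only the namespaceID line is changed.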

Re: stdout logs ?

2009-11-10 Thread Gang Luo
Hi Siddu, I asked this question a couple of days ago. You should use your browser to access the jobtracker. Click a job id, then map, then pick a map task, then click the link in the task log column; you will see the output at stdout and stderr. -Gang --Original Message- In

Re: Error with replication and namespaceID

2009-11-10 Thread Raymond Jennings III
Thanks!!! That worked! I guess I can edit the number on the datanodes as well, but if there is an even more official way to resolve this I would be interested in hearing about it. --- On Tue, 11/10/09, Edmund Kohlwey ekohl...@gmail.com wrote: From: Edmund Kohlwey ekohl...@gmail.com Subject:

Should I upgrade from 0.18.3 to the latest 0.20.1?

2009-11-10 Thread Mark Kerzner
Hi, I've been working on my project for about a year, and I decided to upgrade from 0.18.3 (which was stable and already old even back then). I have started, but I see that many classes have changed, many are deprecated, and I need to re-write some code. Is it worth it? What are the advantages of

error setting up hdfs?

2009-11-10 Thread zenkalia
had...@hadoop1:/usr/local/hadoop$ bin/hadoop dfs -ls ls: Cannot access .: No such file or directory. Anyone else get this one? I started changing settings on my box to get all of my cores working, but immediately hit this error. Since then I started from scratch and have hit this error again.

Re: error setting up hdfs?

2009-11-10 Thread Stephen Watt
You need to specify a path. Try bin/hadoop dfs -ls / Steve Watt From: zenkalia zenka...@gmail.com To: core-u...@hadoop.apache.org Date: 11/10/2009 03:04 PM Subject: error setting up hdfs? had...@hadoop1:/usr/local/hadoop$ bin/hadoop dfs -ls ls: Cannot access .: No such file or directory.

Re: Automate EC2 cluster termination

2009-11-10 Thread Hitchcock, Andrew
Hi John, Have you considered Amazon Elastic MapReduce? (Disclaimer: I work on Elastic MapReduce) http://aws.amazon.com/elasticmapreduce/ It waits for your job to finish and then automatically shuts down the cluster. With a simple command like: elastic-mapreduce --create --num-instances 10

Re: Hadoop NameNode not starting up

2009-11-10 Thread Stephen Watt
You need to go to your logs directory and have a look at what is going on in the namenode log. What version are you using? I'm going to take a guess at your issue here and say that you used /tmp as a path for some of your Hadoop conf settings and you have rebooted lately. The /tmp dir is
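Stephen's guess is a common failure mode on this era of Hadoop: dfs.name.dir and dfs.data.dir default to locations under hadoop.tmp.dir, which itself defaults under /tmp, and many systems clear /tmp on reboot, wiping the namenode image. A hedged example of pinning them somewhere durable in hadoop-site.xml (the paths here are illustrative, not prescribed):

```xml
<!-- hadoop-site.xml: keep NameNode/DataNode state out of /tmp (example paths) -->
<property>
  <name>hadoop.tmp.dir</name>
  <value>/var/hadoop/tmp</value>
</property>
<property>
  <name>dfs.name.dir</name>
  <value>/var/hadoop/dfs/name</value>
</property>
<property>
  <name>dfs.data.dir</name>
  <value>/var/hadoop/dfs/data</value>
</property>
```

After changing dfs.name.dir on an existing install, the namenode would still need its image in the new location; this fragment only prevents the reboot-wipes-metadata problem for a fresh format.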

Re: Hadoop NameNode not starting up

2009-11-10 Thread Sagar
Did you format it for the first time? Another quick way to figure it out is ${HADOOP_HOME}/bin/hadoop start namenode and see what error it gives. -Sagar Stephen Watt wrote: You need to go to your logs directory and have a look at what is going on in the namenode log. What version are you using? I'm

Re: error setting up hdfs?

2009-11-10 Thread zenkalia
OK, things are working. I must have forgotten what I did when first setting up Hadoop. Should these responses be considered inconsistent, or an error? Hmm: hadoop dfs -ls gives an error; hadoop dfs -ls / gives irrelevant stuff about the path you're in; hadoop dfs -mkdir lol works fine; hadoop dfs -ls then gives Found 1

Anyone using Hadoop in Austin, Texas ?

2009-11-10 Thread Stephen Watt
Just curious to see if there are any hadoop compatriots around and if there are, maybe we could organize a meetup. Regards Steve Watt

Re: Anyone using Hadoop in Austin, Texas ?

2009-11-10 Thread Mark Kerzner
Me in Houston :) Mark On Tue, Nov 10, 2009 at 3:32 PM, Stephen Watt sw...@us.ibm.com wrote: Just curious to see if there are any hadoop compatriots around and if there are, maybe we could organize a meetup. Regards Steve Watt

Re: error setting up hdfs?

2009-11-10 Thread Aaron Kimball
You don't need to specify a path. If you don't specify a path argument for ls, then it uses your home directory in HDFS (/user/yourusernamehere). When you first started HDFS, /user/hadoop didn't exist, so 'hadoop fs -ls' resolved to 'hadoop fs -ls /user/hadoop', which reported directory not found. When you mkdir'd

Re: Lucene + Hadoop

2009-11-10 Thread Otis Gospodnetic
I think that sounds right. I believe that's what I did when I implemented this type of functionality for http://simpy.com/. I'm not sure why this is a Hadoop thing, though. Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA,

Re: Hadoop User Group Maryland/DC Area

2009-11-10 Thread Jeff Hammerbacher
Hey Abhi, Check out http://www.meetup.com/Hadoop-DC/. Regards, Jeff On Tue, Nov 10, 2009 at 9:26 AM, Abhishek Pratap abhishek@gmail.comwrote: Hi Guys Just wondering if there is any Hadoop group functioning in the Maryland/DC area. I would love to be a part and learn few things along

Re: Should I upgrade from 0.18.3 to the latest 0.20.1?

2009-11-10 Thread Edmund Kohlwey
The new API in 0.20.x is likely not what you'll see in the final Hadoop 1.0 release, which I've heard some people forecast within the next 18 months or so (we'll see). There will likely be a 0.21.x series, and then the final release. That having been said, it's much more similar to what you'll

Re: NameNode/DataNode JobTracker/TaskTracker

2009-11-10 Thread Todd Lipcon
On Mon, Nov 9, 2009 at 1:04 PM, John Martyniak j...@beforedawnsolutions.com wrote: Thanks Todd. I wasn't sure if that was possible. But you pointed out the important point that it is just the NN and JT that would run remotely. So in order to do this, would I just install the complete

Re: java.io.IOException: Could not obtain block:

2009-11-10 Thread Edmund Kohlwey
I've not encountered an error like this, but here are some suggestions: 1. Try to make sure that your two-node cluster is set up correctly. Querying the web interface, using any of the included dfs utils (e.g. hadoop dfs -ls), or looking in your log directory may yield more useful stack traces or

Re: java.io.IOException: Could not obtain block:

2009-11-10 Thread John Martyniak
Edmund, Thanks for the advice. It turns out that it was the firewall running on the second cluster node. So I stopped that and all is working correctly. Now that I have the second node working the way it is supposed to, I will probably bring another couple of nodes online.