Re: java.io.IOException: Could not obtain block:

2009-11-10 Thread John Martyniak
Edmund, thanks for the advice. It turns out that it was the firewall running on the second cluster node. So I stopped that and all is working correctly. Now that I have the second node working the way that it is supposed to, I will probably bring another couple of nodes online. Wish

Re: java.io.IOException: Could not obtain block:

2009-11-10 Thread Edmund Kohlwey
I've not encountered an error like this, but here are some suggestions: 1. Try to make sure that your two node cluster is set up correctly. Querying the web interface, using any of the included dfs utils (e.g. hadoop dfs -ls), or looking in your log directory may yield more useful stack traces or

Re: Should I upgrade from 0.18.3 to the latest 0.20.1?

2009-11-10 Thread Mark Kerzner
Thank you to all who answered on this thread. From your answers, it feels like I will be OK if I run 0.20.1 on my workstation without changing the code or removing the deprecated API calls. That way I get the performance improvements of 0.20.1 but avoid additional work. My API calls ar

Re: Should I upgrade from 0.18.3 to the latest 0.20.1?

2009-11-10 Thread Matt Massie
Hi Mark- Currently Amazon's EMR only runs Hadoop 0.18.3. Cloudera Distribution for Hadoop has patched/tested packages for both Hadoop 0.18.3 and Hadoop 0.20.1 (as well as Pig, Hive, HBase and Zookeeper). CDH2 was released August of this year as a "testing" release. We expect to promote it to "s

Re: NameNode/DataNode & JobTracker/TaskTracker

2009-11-10 Thread Todd Lipcon
On Mon, Nov 9, 2009 at 1:04 PM, John Martyniak wrote: > Thanks Todd. > > I wasn't sure if that is possible. But you pointed out an important point > and that is it is just NN and JT that would run remotely. > > So in order to do this would I just install the complete hadoop instance on > each on

Re: Should I upgrade from 0.18.3 to the latest 0.20.1?

2009-11-10 Thread Scott Carey
The old API may be deprecated, but it still works and you don't have to change your code yet. A later release will remove the old API altogether. 0.20.1+ is a good place to run your old code and learn some of the newer stuff. There may be many other features you can get good use of (Schedulers

Re: Should I upgrade from 0.18.3 to the latest 0.20.1?

2009-11-10 Thread Edmund Kohlwey
The new API in 0.20.x is likely not what you'll see in the final Hadoop 1.0 release, which I've heard some people forecast within the next 18 months or so (we'll see). There will likely be a 0.21.x series, and then the final release. That having been said, it's much more similar to what you'll

Re: Lucene + Hadoop

2009-11-10 Thread Eason.Lee
I think you'd better use map to group all the files belonging to the same author together, and use reduce to index the files. 2009/11/11 Otis Gospodnetic > I think that sounds right. > I believe that's what I did when I implemented this type of functionality > for http://simpy.com/ > > I'm not su
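The map-side grouping Eason suggests can be modeled in plain Java (a sketch only: the author/file pairs and the grouping helper here are made-up stand-ins; in a real MapReduce job the shuffle performs this grouping between map and reduce):

```java
import java.util.*;

// Sketch: emit (author, file) pairs from map so that the shuffle delivers
// all of one author's files to a single reduce call, where one index per
// author can then be built. Modeled here as an in-memory grouping.
public class AuthorGrouping {
    static Map<String, List<String>> groupByAuthor(List<String[]> docs) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String[] doc : docs) { // doc[0] = author, doc[1] = file
            grouped.computeIfAbsent(doc[0], k -> new ArrayList<>()).add(doc[1]);
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<String[]> docs = Arrays.asList(
            new String[]{"hrishikesh", "a.txt"},
            new String[]{"otis", "b.txt"},
            new String[]{"hrishikesh", "c.txt"});
        System.out.println(groupByAuthor(docs)); // {hrishikesh=[a.txt, c.txt], otis=[b.txt]}
    }
}
```

Each reduce call then sees one author with all of that author's files, which is exactly the unit of work needed to build one per-author index.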

Apache Hadoop Get Together Berlin - December 2009

2009-11-10 Thread Isabel Drost
As announced at ApacheCon US, the next Apache Hadoop Get Together Berlin is scheduled for December 2009. When: Wednesday December 16, 2009 at 5:00pm Where: newthinking store, Tucholskystr. 48, Berlin As always there will be slots of 20min each for talks on your Hadoop topic. After each talk

Re: Hadoop User Group Maryland/DC Area

2009-11-10 Thread Jeff Hammerbacher
Hey Abhi, Check out http://www.meetup.com/Hadoop-DC/. Regards, Jeff On Tue, Nov 10, 2009 at 9:26 AM, Abhishek Pratap wrote: > Hi Guys > > Just wondering if there is any Hadoop group functioning in the Maryland/DC > area. I would love to be a part and learn few things along the way. > > Cheers,

Re: Lucene + Hadoop

2009-11-10 Thread Otis Gospodnetic
I think that sounds right. I believe that's what I did when I implemented this type of functionality for http://simpy.com/ I'm not sure why this is a Hadoop thing, though. Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP,

Re: error setting up hdfs?

2009-11-10 Thread Aaron Kimball
You don't "need" to specify a path. If you don't specify a path argument for ls, then it uses your home directory in HDFS ("/user/"). When you first started HDFS, /user/hadoop didn't exist, so 'hadoop fs -ls' --> 'hadoop fs -ls /user/hadoop' --> directory not found. When you mkdir'd 'lol', you were
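The path-resolution rule Aaron describes can be sketched in plain Java (the user name and helper function are hypothetical; HDFS itself performs this resolution, this just models the behavior):

```java
// Model of how 'hadoop fs -ls' resolves its path argument: an absent or
// relative path resolves against the user's HDFS home, /user/<username>.
public class HdfsPathResolution {
    static String resolve(String user, String path) {
        String home = "/user/" + user;
        if (path == null || path.isEmpty()) return home; // no argument -> home dir
        return path.startsWith("/") ? path : home + "/" + path; // relative -> under home
    }

    public static void main(String[] args) {
        System.out.println(resolve("hadoop", null));  // /user/hadoop
        System.out.println(resolve("hadoop", "lol")); // /user/hadoop/lol
        System.out.println(resolve("hadoop", "/"));   // /
    }
}
```

This is why a bare `hadoop fs -ls` fails until `/user/<username>` exists, while `hadoop fs -ls /` always works.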

Re: Anyone using Hadoop in Austin, Texas ?

2009-11-10 Thread Mark Kerzner
Me in Houston :) Mark On Tue, Nov 10, 2009 at 3:32 PM, Stephen Watt wrote: > Just curious to see if there are any hadoop compatriots around and if > there are, maybe we could organize a meetup. > > Regards > Steve Watt >

Anyone using Hadoop in Austin, Texas ?

2009-11-10 Thread Stephen Watt
Just curious to see if there are any hadoop compatriots around and if there are, maybe we could organize a meetup. Regards Steve Watt

Re: error setting up hdfs?

2009-11-10 Thread zenkalia
OK, things are working... I must have forgotten what I did when first setting up Hadoop. Should these responses be considered inconsistent/an error? Hmm. 'hadoop dfs -ls' -> error; 'hadoop dfs -ls /' -> irrelevant stuff about the path you're in; 'hadoop dfs -mkdir lol' -> works fine; 'hadoop dfs -ls' -> Found 1 item

Re: Hadoop NameNode not starting up

2009-11-10 Thread Sagar
Did you format it the first time? Another quick way to figure it out is to run ${HADOOP_HOME}/bin/hadoop start namenode and see what error it gives. -Sagar Stephen Watt wrote: You need to go to your logs directory and have a look at what is going on in the namenode log. What version are you using ? I'm goi

Re: Hadoop NameNode not starting up

2009-11-10 Thread Stephen Watt
You need to go to your logs directory and have a look at what is going on in the namenode log. What version are you using ? I'm going to take a guess at your issue here and say that you used the /tmp as a path for some of your hadoop conf settings and you have rebooted lately. The /tmp dir is

Re: Automate EC2 cluster termination

2009-11-10 Thread Hitchcock, Andrew
Hi John, Have you considered Amazon Elastic MapReduce? (Disclaimer: I work on Elastic MapReduce) http://aws.amazon.com/elasticmapreduce/ It waits for your job to finish and then automatically shuts down the cluster. With a simple command like: elastic-mapreduce --create --num-instances 10 --

Re: error setting up hdfs?

2009-11-10 Thread Stephen Watt
You need to specify a path. Try "bin/hadoop dfs -ls / " Steve Watt From: zenkalia To: core-u...@hadoop.apache.org Date: 11/10/2009 03:04 PM Subject: error setting up hdfs? had...@hadoop1:/usr/local/hadoop$ bin/hadoop dfs -ls ls: Cannot access .: No such file or directory. anyone else get

error setting up hdfs?

2009-11-10 Thread zenkalia
had...@hadoop1:/usr/local/hadoop$ bin/hadoop dfs -ls ls: Cannot access .: No such file or directory. Anyone else get this one? I started changing settings on my box to get all of my cores working, but immediately hit this error. Since then I started from scratch and have hit this error again. w

Should I upgrade from 0.18.3 to the latest 0.20.1?

2009-11-10 Thread Mark Kerzner
Hi, I've been working on my project for about a year, and I decided to upgrade from 0.18.3 (which was stable and already old even back then). I have started, but I see that many classes have changed, many are deprecated, and I need to re-write some code. Is it worth it? What are the advantages of

Re: Error with replication and namespaceID

2009-11-10 Thread Raymond Jennings III
Thanks!!! That worked! I guess I can edit the number on the datanodes as well but if there is an even more "official" way to resolve this I would be interested in hearing about it. --- On Tue, 11/10/09, Edmund Kohlwey wrote: > From: Edmund Kohlwey > Subject: Re: Error with replication and n

Re: How to set identityreducer in hadoop 0.2

2009-11-10 Thread vipul sharma
thanks Tim! On Tue, Nov 10, 2009 at 11:35 AM, Tim Robertson wrote: > > http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapreduce/Reducer.html > > "The default implementation is an identity function." > > Cheers, > Tim > > On Tue, Nov 10, 2009 at 8:32 PM, vipul sharma > wrote:

Re: How to set identityreducer in hadoop 0.2

2009-11-10 Thread Tim Robertson
http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapreduce/Reducer.html "The default implementation is an identity function." Cheers, Tim On Tue, Nov 10, 2009 at 8:32 PM, vipul sharma wrote: > IdentityReducer class is deprecated in hadoop 0.2. What can I use for > similar func
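In other words, in the new API the base org.apache.hadoop.mapreduce.Reducer already re-emits each input pair unchanged, so leaving the reducer class unset (or setting it to Reducer.class itself) gives identity behavior. A plain-Java model of that default (a sketch; the string-based types and helper are assumptions, not Hadoop code):

```java
import java.util.*;

// Model of the new-API default reduce(): every (key, value) pair is
// written back out unchanged, which is what made IdentityReducer redundant.
public class IdentityReduceDemo {
    static List<String> reduce(String key, List<String> values) {
        List<String> out = new ArrayList<>();
        for (String v : values) {
            out.add(key + "\t" + v); // emit the pair unchanged
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(reduce("k", Arrays.asList("v1", "v2")));
    }
}
```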

How to set identityreducer in hadoop 0.2

2009-11-10 Thread vipul sharma
IdentityReducer class is deprecated in hadoop 0.2. What can I use for similar functionality? -- Vipul Sharma sharmavipul AT gmail DOT com

Re: stdout logs ?

2009-11-10 Thread Gang Luo
Hi Siddu, I asked this question a couple of days ago. You should use your browser to access the jobtracker. Click a job id -> map -> pick a map task -> click the link in the column "task log", and you will see the output at stdout and stderr. -Gang --Original Message- In src/contrib/data_join/s

hadoop set up and clean up tasks

2009-11-10 Thread Samprita Hegde
Hi, I was looking through Hadoop's logs and found that it runs two extra tasks: a set up task and a clean up task. I was just curious to know what these tasks are for. Can someone please explain to me why these tasks are needed? Thanks! Samprita

Re: Error with replication and namespaceID

2009-11-10 Thread Edmund Kohlwey
Hi Ray, You'll probably find that even though the name node starts, it doesn't have any data nodes and is completely empty. Whenever Hadoop creates a new filesystem, it assigns a large random number to it to prevent you from mixing datanodes from different filesystems by accident. When you re
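The usual manual fix is to edit the namespaceID= line in each datanode's current/VERSION file to match the namenode's value. A sketch of that edit in plain Java (the file layout is modeled, the paths and IDs below are made up, and you should back up VERSION before touching it):

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.*;

// Sketch: rewrite the namespaceID= line of a datanode VERSION file so it
// matches the namenode's namespaceID, leaving all other lines untouched.
public class SyncNamespaceId {
    static void syncNamespaceId(Path versionFile, String newId) throws IOException {
        List<String> out = new ArrayList<>();
        for (String line : Files.readAllLines(versionFile)) {
            out.add(line.startsWith("namespaceID=") ? "namespaceID=" + newId : line);
        }
        Files.write(versionFile, out);
    }

    public static void main(String[] args) throws IOException {
        // Demonstrate on a temporary mock VERSION file (made-up values)
        Path tmp = Files.createTempFile("VERSION", null);
        Files.write(tmp, Arrays.asList("namespaceID=111", "storageID=DS-1", "layoutVersion=-18"));
        syncNamespaceId(tmp, "999");
        System.out.println(Files.readAllLines(tmp));
    }
}
```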

stdout logs ?

2009-11-10 Thread Siddu
Hi all, In src/contrib/data_join/src/java/org/apache/hadoop/contrib/utils/join/DataJoinJob.java I found a couple of println statements (shown below) which get executed when a job is submitted. I am not sure which stdout they are printing to. I searched in logs/* but didn't find it.

Re: Re: how to read file in hadoop

2009-11-10 Thread Gang Luo
It is because the content I read from the file is encoded in UTF-8. I use Text.decode to decode it back to a plain text string; the problem is gone now. -Gang ----- Original Message ----- From: Gang Luo To: common-user@hadoop.apache.org Sent: 2009/11/10 (Tue) 12:14:44 AM Subject: Re: Re: how to read file in hadoo
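Hadoop's Text.decode is, roughly, a UTF-8 decode; the JDK equivalent below shows why bytes read from an HDFS file must be decoded as UTF-8 rather than treated as raw characters (the sample string is arbitrary):

```java
import java.nio.charset.StandardCharsets;

// UTF-8 round trip: non-ASCII characters occupy more than one byte, so the
// byte count differs from the character count, and decoding with the wrong
// charset mangles them. Decoding with UTF-8 recovers the original string.
public class Utf8RoundTrip {
    public static void main(String[] args) {
        String original = "héllo wörld";                        // 11 chars
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8); // 13 bytes: é and ö take 2 each
        String decoded = new String(utf8, StandardCharsets.UTF_8);
        System.out.println(decoded.equals(original)); // true
    }
}
```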

Hadoop User Group Maryland/DC Area

2009-11-10 Thread Abhishek Pratap
Hi Guys Just wondering if there is any Hadoop group functioning in the Maryland/DC area. I would love to be a part and learn few things along the way. Cheers, -Abhi

Hadoop User Group (Bay Area) - next Wednesday (Nov 18th) at Yahoo!

2009-11-10 Thread Dekel Tankel
Hi all, We are one week away from the next Bay Area Hadoop User Group - Yahoo! Sunnyvale Campus, next Wednesday (Nov 18th) at 6PM. We have an exciting evening planned: *Katta, Solr, Lucene and Hadoop - Searching at scale, Jason Rutherglen and Jason Venner *Walking through the New File system A

java.io.IOException: Could not obtain block:

2009-11-10 Thread John Martyniak
Hello everyone, I am getting the error "java.io.IOException: Could not obtain block:" when running on my new cluster. When I ran the same job on a single node it worked perfectly; I then added in the second node and received this error. I was running the grep sample job. I am running

Error with replication and namespaceID

2009-11-10 Thread Raymond Jennings III
On the actual datanodes I see the following exception: I am not sure what the namespaceID is or how to sync them. Thanks for any advice! / STARTUP_MSG: Starting DataNode STARTUP_MSG: host = pingo-3.poly.edu/128.238.55.33 STARTUP_MS

Re: Hadoop NameNode not starting up

2009-11-10 Thread Edmund Kohlwey
Is there error output from start-all.sh? On 11/9/09 11:10 PM, Kaushal Amin wrote: I am running Hadoop on single server. The issue I am running into is that start-all.sh script is not starting up NameNode. Only way I can start NameNode is by formatting it and I end up losing data in HDFS. Doe

Re: Cross Join

2009-11-10 Thread Edmund Kohlwey
Thanks to all who commented on this. I think there was some confusion over what I was trying to do: indeed there was no common key between the two tables to join on, which made all the methods I investigated either inappropriate or inefficient. In the end I decided to write my own join class. I

Next Boston Hadoop Meetup, Tuesday, November 24th

2009-11-10 Thread Dan Milstein
After a packed, energetic first Boston Hadoop Meetup, we're having another. Next one will be in two weeks, on Tuesday, November 24th, 7 pm, at the HubSpot offices: http://www.meetup.com/bostonhadoop/calendar/11834241/ (HubSpot is at 1 Broadway, Cambridge on the fifth floor. There Will Be

Hadoop NameNode not starting up

2009-11-10 Thread Kaushal Amin
I am running Hadoop on single server. The issue I am running into is that start-all.sh script is not starting up NameNode. Only way I can start NameNode is by formatting it and I end up losing data in HDFS. Does anyone have solution to this issue? Kaushal

Re: Automate EC2 cluster termination

2009-11-10 Thread Edmund Kohlwey
You should be able to detect the status of the job in your Java main() method. Either call job.waitForCompletion() and, when the job finishes running, use job.isSuccessful(); or, if you want, you can write a custom "watcher" thread to poll job status manually; this will allow you to, for
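The "watcher" idea can be sketched in plain Java (JobWatcher and the two suppliers are hypothetical stand-ins for Hadoop's Job#isComplete()/Job#isSuccessful(); a real watcher would poll the running job and then trigger the cluster shutdown):

```java
import java.util.function.BooleanSupplier;

// Sketch: poll a job-status source until the job completes, then report
// success or failure so the caller can decide whether to shut the cluster down.
public class JobWatcher {
    static boolean watch(BooleanSupplier isComplete, BooleanSupplier isSuccessful,
                         long pollMillis) throws InterruptedException {
        while (!isComplete.getAsBoolean()) {
            Thread.sleep(pollMillis); // back off between status polls
        }
        return isSuccessful.getAsBoolean();
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated job that is already complete and succeeded
        boolean ok = watch(() -> true, () -> true, 10);
        System.out.println(ok ? "job succeeded: safe to shut down cluster" : "job failed");
    }
}
```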

Automate EC2 cluster termination

2009-11-10 Thread John Clarke
Hi, I use EC2 to run my Hadoop jobs using Cloudera's 0.18.3 AMI. It works great but I want to automate it a bit more. I want to be able to: - start cluster - copy data from S3 to the DFS - run the job - copy result data from DFS to S3 - verify it all copied ok - shutdown the cluster. I guess th

Moscow Hadoop User Group

2009-11-10 Thread Vladimir Klimontovich
Hello! Is anybody interested in the creation of a Hadoop User Group in Moscow (Russia)? If you are, please go to the https://spreadsheets.google.com/viewform?formkey=dEFEYzhtUzQtRElzQ3VHTHNBY3gtaHc6MA form and fill out a short survey. I'll contact all the people who fill out the survey in the next few

Lucene + Hadoop

2009-11-10 Thread Hrishikesh Agashe
Hi, I am trying to use Hadoop for Lucene index creation. I have to create multiple indexes based on the contents of the files (i.e. if the author is "hrishikesh", it should be added to an index for "hrishikesh"; there has to be a separate index for every author). For this, I am keeping multiple IndexWri

[Ask for help]: IOException: Expecting a line not the end of stream, hadoop-0.20.1 in Daemn Small Linux

2009-11-10 Thread Neo Tan
Dear all, I am new to learning Hadoop and encountered a problem while following the "Hadoop/Quick Start" tutorial (http://hadoop.apache.org/common/docs/current/quickstart.html). Everything in Cygwin is okay, but not in Damn Small Linux (DSL). In Damn Small Linux, after executing the command

Re: Where is the eclipse plug-in for hadoop 0.20.1

2009-11-10 Thread Jeff Zhang
Hi Stephen, Thank you. It works! Jeff Zhang On Mon, Nov 9, 2009 at 10:31 PM, Stephen Watt wrote: > Hi Jeff > > That is correct. The plugin for 0.20.1 exists only in the src/contrib as > it has some build and runtime issues. It is presently being tracked here - > http://issues.apache.org/jira

Re: 1st Hadoop India User Group meet

2009-11-10 Thread Amandeep Khurana
Oops.. I was sending this email to Sanjay personally.. Sorry about sending it to the entire group. -Amandeep On Tue, Nov 10, 2009 at 12:00 AM, Amandeep Khurana wrote: > Sanjay, > > Congratulations for holding the first meetup. All the best with it. > > Its exciting to see work being done in I

Re: 1st Hadoop India User Group meet

2009-11-10 Thread Amandeep Khurana
Sanjay, Congratulations for holding the first meetup. All the best with it. Its exciting to see work being done in India involving Hadoop. I've been a part of some projects in the Hadoop ecosystem and have done some research work during my graduate studies as well as for a project at Cisco System