Re: Renaming all nodes in Hadoop cluster

2009-06-03 Thread Raghu Angadi
Renaming datanodes should not affect HDFS. HDFS does not depend on hostname or IP for consistency of data. You can try renaming a few of the nodes. Of course, if you rename the NameNode, you need to update the config file to reflect that. Stuart White wrote: Is it possible to rename all
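For the NameNode case Raghu mentions, the config entry to update in the hadoop-site.xml of that era is fs.default.name. A sketch of the fragment, with a placeholder hostname and port:

```xml
<!-- hadoop-site.xml: point clients and datanodes at the renamed NameNode.
     "new-namenode-host" and port 9000 are placeholders, not values from
     the thread. -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://new-namenode-host:9000/</value>
</property>
```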

Re: problem getting map input filename

2009-06-03 Thread Sharad Agarwal
conf.get("map.input.file") should work. If not, then it is a bug in the new mapreduce API in 0.20 - Sharad

Image indexing/searching with Hadoop and MPI

2009-06-03 Thread tog
Hi there, This is a kind of newbie question (at least as far as Hadoop is concerned). I was wondering if there were any Hadoop-based projects around dealing with image indexing and searching? We are working in this area and it might be interesting to have a look at such a project. Second question is

Re: Image indexing/searching with Hadoop and MPI

2009-06-03 Thread Edward J. Yoon
This is a kind of newbie question (at least as far as Hadoop is concerned). I was wondering if there were any Hadoop-based projects around dealing with image indexing and searching? We are working in this area and it might be interesting to have a look at such a project. There is a text-search

Hadoop ReInitialization.

2009-06-03 Thread b
Hello all. I need to process many Gigs of new data every 10 minutes. Every 10 minutes cron launches a bash script, do.sh, that puts data into HDFS and launches processing. But... Hadoop isn't military software, so there is a probability of errors with HDFS. So I need to watch LOG files to catch

Re: Hadoop ReInitialization.

2009-06-03 Thread Steve Loughran
b wrote: But after formatting and starting DFS I need to wait some time (sleep 60) before putting data into HDFS. Else I will receive a NotReplicatedYetException. That means the namenode is up but there aren't enough workers yet.

Opera Software AS - Job Opening: Hadoop Engineer

2009-06-03 Thread Usman Waheed
Greetings All, Opera Software AS (www.opera.com) in Oslo/Norway is looking for an experienced Hadoop Engineer to join the Statistics Team in order to provide business intelligence metrics both internally and to our customers. If you have the experience and are willing to relocate to beautiful

Re: Subdirectory question revisited

2009-06-03 Thread David Rosenstrauch
OK, thanks for the pointer. If I wind up rolling our own code to handle this I'll make sure to contribute it. DR Aaron Kimball wrote: There is no technical limit that prevents Hadoop from operating in this fashion; it's simply the case that the included InputFormat implementations do not do

Re: problem getting map input filename

2009-06-03 Thread Rares Vernica
On 6/2/09, jason hadoop jason.had...@gmail.com wrote: you can always dump the entire property space and work it out that way. I dumped the property space and I could only find mapred.input.dir. There was no mapred.input.file. -- Rares

Re: Image indexing/searching with Hadoop and MPI

2009-06-03 Thread tog
On Wed, Jun 3, 2009 at 5:17 PM, Edward J. Yoon edwardy...@apache.org wrote: This is a kind of newbie question (at least as far as Hadoop is concerned). I was wondering if there were any Hadoop-based projects around dealing with image indexing and searching? We are working in this area and

Re: problem getting map input filename

2009-06-03 Thread He Yongqiang
take a look at HADOOP-5368, :) On 09-6-4 12:27 AM, Rares Vernica rvern...@gmail.com wrote: On 6/2/09, jason hadoop jason.had...@gmail.com wrote: you can always dump the entire property space and work it out that way. I dumped the property space and I could only find mapred.input.dir.

Command-line jobConf options in 0.18.3

2009-06-03 Thread Ian Soboroff
I'm backporting some code I wrote for 0.19.1 to 0.18.3 (long story), and I'm finding that when I run a job and try to pass options with -D on the command line, that the option values aren't showing up in my JobConf. I logged all the key/value pairs in the JobConf, and the option I passed

Sharing object between mappers on same node (reuse.jvm ?)

2009-06-03 Thread Tarandeep Singh
Hi, I want to share an object (a Lucene IndexWriter instance) between mappers running on the same node for one job (not across multiple jobs). Please correct me if I am wrong - if I set -1 for the property mapred.job.reuse.jvm.num.tasks, then all mappers of one job will be executed in the same JVM

Fastlz coming?

2009-06-03 Thread Kris Jirapinyo
Hi all, In the remove-lzo JIRA ticket https://issues.apache.org/jira/browse/HADOOP-4874 Tatu mentioned he was going to port fastlz from C to Java and provide a patch. Have there been any updates on that? Or is anyone working on any additional custom compression codecs? Thanks, Kris J.

Re: problem getting map input filename

2009-06-03 Thread Rares Vernica
On 6/3/09, He Yongqiang heyongqi...@software.ict.ac.cn wrote: take a look at HADOOP-5368, :) There you set map.input.file; I think it should already be set by Hadoop.

State of Eclipse Plugin

2009-06-03 Thread ANithian
Hi all, I am not sure if this is the right mailing list but I was wondering about the state of the Eclipse plugin for Hadoop. I have found it very valuable in my M/R development but have posted, seen and fixed a few bugs, and I haven't seen any response in JIRA. Is anyone still using or maintaining the

Re: Seattle / PNW Hadoop + Lucene User Group?

2009-06-03 Thread Bradford Stephens
Hey everyone! I just wanted to give a BIG THANKS for everyone who came. We had over a dozen people, and a few got lost at UW :) [I would have sent this update earlier, but I flew to Florida the day after the meeting]. If you didn't come, you missed quite a bit of learning and topics. Such as:

*.gz input files

2009-06-03 Thread Adam Silberstein
Hi, I have some hadoop code that works properly when the input files are not compressed, but it is not working for the gzipped versions of those files. My files are named with *.gz, but the format is not being recognized. I'm under the impression I don't need to set any JobConf parameters to

Re: *.gz input files

2009-06-03 Thread Alex Loddengaard
Hi Adam, Gzipped files don't play that nicely with Hadoop, because they aren't splittable. Can you use bzip2 instead? bzip2 files play more nicely with Hadoop, because they're splittable. If you're stuck with gzip, then take a look here: http://issues.apache.org/jira/browse/HADOOP-437. I
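Alex's point about splittability comes from how the gzip format works: a .gz file is one compressed stream, so a reader must start at byte 0 and cannot jump into the middle the way Hadoop splits plain-text files across map tasks. A plain java.util.zip sketch (not Hadoop's codec API) illustrating the single-stream round trip:

```java
import java.io.*;
import java.util.zip.*;

// Illustrative sketch: gzip data is a single compressed stream that must
// be decompressed from the start -- which is why Hadoop cannot split one
// .gz file across several map tasks.
public class GzipRoundTrip {
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);                        // compress everything at once
        }
        return bos.toByteArray();
    }

    static byte[] gunzip(byte[] compressed) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPInputStream gz =
                 new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = gz.read(buf)) != -1) bos.write(buf, 0, n);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] original = "line one\nline two\n".getBytes("UTF-8");
        byte[] back = gunzip(gzip(original));
        System.out.println(new String(back, "UTF-8").equals("line one\nline two\n"));
    }
}
```

Because each map task would need the whole stream anyway, a gzipped input file ends up processed by a single mapper.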

Re: Seattle / PNW Hadoop + Lucene User Group?

2009-06-03 Thread Bhupesh Bansal
Great Bradford, Can you post some videos if you have some? Best Bhupesh On 6/3/09 11:58 AM, Bradford Stephens bradfordsteph...@gmail.com wrote: Hey everyone! I just wanted to give a BIG THANKS for everyone who came. We had over a dozen people, and a few got lost at UW :) [I would have

streaming a binary processing file

2009-06-03 Thread openresearch
Hi all, I have an urgent question regarding processing binary (image) data using Hadoop streaming. I am looking for the simplest solution, preferably without making changes to hadoop and/or the streaming package. I got some hints from this mailing list, including using a customized InputFormat, or

Re: Seattle / PNW Hadoop + Lucene User Group?

2009-06-03 Thread Bradford Stephens
Sorry, no videos this time. The conversation wasn't very structured... next month I'll record it :) On Wed, Jun 3, 2009 at 1:59 PM, Bhupesh Bansal bban...@linkedin.com wrote: Great Bradford, Can you post some videos if you have some ? Best Bhupesh On 6/3/09 11:58 AM, Bradford Stephens

Re: streaming a binary processing file

2009-06-03 Thread Zak Stone
One simple solution is to use Dumbo, a Python interface to Hadoop that supports binary streaming: http://wiki.github.com/klbostee/dumbo Zak On Wed, Jun 3, 2009 at 5:18 PM, openresearch qiming...@openresearchinc.com wrote: Hi all, I have a urgent question regarding processing binary (image)

Re: How do I convert DataInput and ResultSet to array of String?

2009-06-03 Thread Aaron Kimball
The text serializer will pull out an entire string by using a null terminator at the end. If you need to know the number of string objects, though, you'll have to serialize that before the strings, then use a for loop to decode the rest of them. - Aaron On Tue, Jun 2, 2009 at 6:01 PM, dealmaker
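The count-then-loop pattern Aaron describes can be sketched with plain java.io streams, which stand in here for Hadoop's DataOutput/DataInput interfaces:

```java
import java.io.*;

// Sketch of count-prefixed string serialization: write the number of
// strings first, then each string; the reader loops exactly that many
// times to decode the rest.
public class StringArraySerde {
    static void writeStrings(DataOutput out, String[] values) throws IOException {
        out.writeInt(values.length);          // count goes first
        for (String s : values) out.writeUTF(s);
    }

    static String[] readStrings(DataInput in) throws IOException {
        int n = in.readInt();                 // read the count back
        String[] values = new String[n];
        for (int i = 0; i < n; i++) values[i] = in.readUTF();
        return values;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        writeStrings(new DataOutputStream(bos), new String[] {"a", "b", "c"});
        String[] back = readStrings(
            new DataInputStream(new ByteArrayInputStream(bos.toByteArray())));
        System.out.println(back.length + " " + back[0] + back[1] + back[2]);
    }
}
```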

Re: Do I need to implement Readfields and Write Functions If I have Only One Field?

2009-06-03 Thread Aaron Kimball
If you can use an existing serializeable type to hold that field (e.g., if it's an integer, then use IntWritable) then you can just get away with that. If you are specifying your own class for a key or value class, then yes, the class must implement readFields() and write(). There's no concept of
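A minimal sketch of the readFields()/write() contract Aaron refers to, using only java.io so it runs without Hadoop on the classpath; the class and field names are made up for illustration. Note the no-argument constructor, which the framework needs in order to instantiate the type before calling readFields():

```java
import java.io.*;

// Sketch of the Writable contract: a custom key/value class provides
// write(DataOutput) and readFields(DataInput), plus a no-arg constructor.
public class PointWritable {
    private int x;
    private int y;

    public PointWritable() {}                          // required no-arg constructor
    public PointWritable(int x, int y) { this.x = x; this.y = y; }

    public void write(DataOutput out) throws IOException {
        out.writeInt(x);                               // serialize each field in order
        out.writeInt(y);
    }

    public void readFields(DataInput in) throws IOException {
        x = in.readInt();                              // deserialize in the same order
        y = in.readInt();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        new PointWritable(3, 4).write(new DataOutputStream(bos));
        PointWritable p = new PointWritable();         // framework-style: empty, then fill
        p.readFields(new DataInputStream(new ByteArrayInputStream(bos.toByteArray())));
        System.out.println(p.x + "," + p.y);
    }
}
```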

Re: Hadoop ReInitialization.

2009-06-03 Thread Aaron Kimball
You can block for safemode exit by running 'hadoop dfsadmin -safemode wait' rather than sleeping for an arbitrary amount of time. More generally, I'm a bit confused what you mean by all this. Hadoop daemons may individually crash, but you should never need to reformat HDFS and start from scratch.

Re: Command-line jobConf options in 0.18.3

2009-06-03 Thread Aaron Kimball
Are you running your program via ToolRunner.run()? How do you instantiate the JobConf object? - Aaron On Wed, Jun 3, 2009 at 10:19 AM, Ian Soboroff ian.sobor...@nist.gov wrote: I'm backporting some code I wrote for 0.19.1 to 0.18.3 (long story), and I'm finding that when I run a job and try to

Re: Do I need to implement Readfields and Write Functions If I have Only One Field?

2009-06-03 Thread dealmaker
I have the following as the type of my value object. Do I need to implement readFields and write functions? private static class StringArrayWritable extends ArrayWritable { private StringArrayWritable(String[] aSString) { super(aSString); } } Aaron Kimball-3 wrote:

Re: Command-line jobConf options in 0.18.3

2009-06-03 Thread Ian Soboroff
Yes, and I get the JobConf via 'JobConf job = new JobConf(conf, the.class)'. The conf is the Configuration object that comes from getConf. Pretty much copied from the WordCount example (which this program used to be a long while back...) thanks, Ian On Jun 3, 2009, at 7:09 PM, Aaron

Re: Command-line jobConf options in 0.18.3

2009-06-03 Thread Ian Soboroff
If after I call getConf to get the conf object, I manually add the key/value pair, it's there when I need it. So it feels like ToolRunner isn't parsing my args for some reason. Ian On Jun 3, 2009, at 8:45 PM, Ian Soboroff wrote: Yes, and I get the JobConf via 'JobConf job = new
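The behavior Ian expects comes from GenericOptionsParser, which ToolRunner runs over the arguments before handing the rest to the tool. A dependency-free sketch of just the -Dkey=value handling, with a hypothetical command line (GenericOptionsParser itself accepts a few more forms, e.g. a space between -D and the pair):

```java
import java.util.*;

// Sketch of the -D handling ToolRunner/GenericOptionsParser performs:
// each "-Dkey=value" argument becomes a configuration entry, and all
// remaining arguments are passed through to the tool's run() method.
public class GenericOptsSketch {
    public static void main(String[] args) {
        // hypothetical command line; in a real job these come from argv
        String[] argv = {"-Dmapred.reduce.tasks=4", "input", "output"};
        Map<String, String> conf = new LinkedHashMap<>();
        List<String> remaining = new ArrayList<>();
        for (String arg : argv) {
            if (arg.startsWith("-D") && arg.contains("=")) {
                String kv = arg.substring(2);          // strip the "-D"
                int eq = kv.indexOf('=');
                conf.put(kv.substring(0, eq), kv.substring(eq + 1));
            } else {
                remaining.add(arg);                    // left for the tool itself
            }
        }
        System.out.println(conf + " " + remaining);
    }
}
```

If the option never reaches the conf, checking where the arguments diverge from this shape (e.g. a stray space, or args consumed before ToolRunner sees them) is a reasonable first step.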

Re: question about when shuffle/sort start working

2009-06-03 Thread Jianmin Woo
Thanks for your information, Sharad. Do you have a sample of re-using static variables? Thanks, Jianmin From: sharad agarwal shara...@yahoo-inc.com To: core-user@hadoop.apache.org Sent: Wednesday, June 3, 2009 12:55:55 AM Subject: Re: question about

Re: question about when shuffle/sort start working

2009-06-03 Thread Jianmin Woo
Thanks a lot for your suggestions on the interplay between the job and the driver, Chuck. Yes, the job may hold some, say, training data, which is needed in each round of the job. I will check the link you provided. Actually, I am thinking of some really light-weight map/reduce jobs. For example,

Task files in _temporary not getting promoted out

2009-06-03 Thread Ian Soboroff
Ok, help. I am trying to create local task outputs in my reduce job, and they get created, then go poof when the job's done. My first take was to use FileOutputFormat.getWorkOutputPath, and create directories in there for my outputs (which are Lucene indexes). Exasperated, I then wrote a

Re: streaming a binary processing file

2009-06-03 Thread Sharad Agarwal
Binary support has been added for 0.21. One option is to wait for 0.21 to get released, or you might try applying the patch from HADOOP-1722. - Sharad

Re: question about when shuffle/sort start working

2009-06-03 Thread Sharad Agarwal
Jianmin Woo wrote: Do you have a sample of re-using static variables? You can define static variables in your Mapper/Reducer class. Static variables survive as long as the JVM is alive, so multiple tasks of the same job running in a single JVM would be able to share them. - Sharad
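Sharad's pattern can be sketched in plain Java; the class and resource here are made-up stand-ins (e.g. for Tarandeep's IndexWriter), and the point is that with mapred.job.reuse.jvm.num.tasks=-1 successive task instances in one JVM see the same static object:

```java
// Sketch of sharing a resource across tasks via a static field: the
// field is initialized once per JVM, so later tasks reuse it instead
// of re-creating it.
public class SharedResourceDemo {
    static class MyMapper {
        // shared by every task this JVM runs; lazily initialized
        private static StringBuilder sharedResource;

        static synchronized StringBuilder getResource() {
            if (sharedResource == null) {
                sharedResource = new StringBuilder("init");   // first task only
            }
            return sharedResource;
        }

        void runTask(String id) {
            getResource().append(":").append(id);  // reuse, don't re-create
        }
    }

    public static void main(String[] args) {
        new MyMapper().runTask("task1");   // two separate task instances...
        new MyMapper().runTask("task2");   // ...one shared static object
        System.out.println(MyMapper.getResource());
    }
}
```

In a real job the resource would also need a cleanup path (e.g. a JVM shutdown hook), since no single task knows it is the last one.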