Re: If I wanna read a config file before map task, which class I should choose?

2008-04-03 Thread Jeremy Chow
Thanks. The config file format looks like the following: @tag_name0 name0 {value00, value01, value02} @tag_name1 name1 {value10, value11, value12} and I am reading it from HDFS. How can I parse it?
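
One possible way to parse entries of this shape is a small regular-expression parser over a plain java.io.Reader. This is only a sketch: the class and method names (TagConfigParser, Entry, parseText) are made up for illustration, and it assumes each entry is exactly "@tag name {comma-separated values}" with no nesting or escaping. The Reader could wrap a local file or a stream opened through Hadoop's FileSystem API; that part is left out here.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hadoop.
final class TagConfigParser {

    /** One "@tag name {v1, v2, ...}" entry. */
    static final class Entry {
        final String tag;
        final String name;
        final List<String> values;

        Entry(String tag, String name, List<String> values) {
            this.tag = tag;
            this.name = name;
            this.values = values;
        }
    }

    // Assumed grammar: '@' tag, whitespace, name, optional whitespace, '{' values '}'.
    private static final Pattern ENTRY =
            Pattern.compile("@([^\\s{]+)\\s+([^\\s{]+)\\s*\\{([^}]*)\\}");

    /** Parses every entry found in the given text, in order of appearance. */
    static List<Entry> parseText(CharSequence text) {
        List<Entry> entries = new ArrayList<>();
        Matcher m = ENTRY.matcher(text);
        while (m.find()) {
            List<String> values = new ArrayList<>();
            for (String v : m.group(3).split(",")) {
                String trimmed = v.trim();
                if (!trimmed.isEmpty()) {
                    values.add(trimmed);
                }
            }
            entries.add(new Entry(m.group(1), m.group(2), values));
        }
        return entries;
    }

    /** Reads the whole Reader into memory, then parses it. */
    static List<Entry> parse(Reader in) throws IOException {
        StringBuilder text = new StringBuilder();
        BufferedReader reader = new BufferedReader(in);
        String line;
        while ((line = reader.readLine()) != null) {
            text.append(line).append('\n');
        }
        return parseText(text);
    }

    public static void main(String[] args) throws IOException {
        String sample = "@tag_name0 name0 {value00, value01, value02}\n"
                + "@tag_name1 name1 {value10, value11, value12}\n";
        for (Entry e : parse(new StringReader(sample))) {
            System.out.println(e.tag + " " + e.name + " " + e.values);
        }
    }
}
```

Reading the whole file into memory first keeps the sketch short and lets an entry span several lines; for a small config file read once in configure() that is probably acceptable.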

Re: Question on how to view the counters of jobs in the job tracker history

2008-04-03 Thread Arun C Murthy
On Apr 3, 2008, at 5:36 PM, Jason Venner wrote: For the first day or so, when the jobs are viewable via the main page of the job tracker web interface, the job-specific counters are also visible. Once the job is only visible in the history page, the counters are not visible. Is it possi

Re: Hadoop: Multiple map reduce or some better way

2008-04-03 Thread Aayush Garg
Hi Amar, Theodore, Arun, Thanks for your reply. Actually I am new to Hadoop, so I can't figure out much. I have written the following code for an inverted index. This code maps each word from the document to its document id, e.g.: apple file1 file123 Main functions of the code are:- public class HadoopProgr

Question on how to view the counters of jobs in the job tracker history

2008-04-03 Thread Jason Venner
For the first day or so, when the jobs are viewable via the main page of the job tracker web interface, the job-specific counters are also visible. Once the job is only visible in the history page, the counters are not visible. Is it possible to view the counters of the older jobs? -- Jason

Re: one key per output part file

2008-04-03 Thread Ashish Venugopal
Thanks Yuri! I followed your pattern here and the version where you make the system call directly to -put onto DFS works for me. I did not set $ENV{HADOOP_HEAPSIZE}=300; and it seems to work fine (I didn't try setting this variable to see if it failed). I also used Perl's built-in File::Temp mechanis

Re: Is it possible in Hadoop to overwrite or update a file?

2008-04-03 Thread Ted Dunning
Interesting you should say this. I have been using this exact example (slightly modified) as an interview question lately. I have to admit I stole it from Doug's Hadoop slides. If you have a 1TB database with 100 B records and you want to update 1% of them, how long will it take? Assume for ar
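
The assumptions in the message above are cut off in this archive, so the following back-of-the-envelope sketch supplies its own, clearly labeled ones: roughly one 10 ms disk seek per updated record, and a 100 MB/s sequential transfer rate for reading and rewriting the whole database. Those two figures and the class name UpdateCostEstimate are illustrative assumptions, not numbers from the thread.

```java
// Hypothetical estimate; the hardware figures below are assumptions.
final class UpdateCostEstimate {
    static final long DB_BYTES = 1_000_000_000_000L;    // 1 TB (from the question)
    static final long RECORD_BYTES = 100L;               // 100-byte records (from the question)
    static final long UPDATE_PERCENT = 1L;               // update 1% of records (from the question)
    static final double SEEK_SECONDS = 0.010;            // ASSUMED: 10 ms per random access
    static final double TRANSFER_BYTES_PER_SEC = 100e6;  // ASSUMED: 100 MB/s sequential

    static long recordCount() {
        return DB_BYTES / RECORD_BYTES;
    }

    static long updatedRecords() {
        return recordCount() * UPDATE_PERCENT / 100;
    }

    /** Seek-bound cost: one random access per updated record, transfer time ignored. */
    static double randomUpdateSeconds() {
        return updatedRecords() * SEEK_SECONDS;
    }

    /** Streaming cost: read the whole database once and write it back once. */
    static double fullRewriteSeconds() {
        return 2.0 * DB_BYTES / TRANSFER_BYTES_PER_SEC;
    }

    public static void main(String[] args) {
        System.out.printf("records to update: %d%n", updatedRecords());
        System.out.printf("random updates:    %.1f days%n", randomUpdateSeconds() / 86_400.0);
        System.out.printf("full rewrite:      %.1f hours%n", fullRewriteSeconds() / 3_600.0);
    }
}
```

Under these assumed figures, the seek-bound approach comes to about 10^6 seconds (roughly 11.6 days), while streaming through the entire terabyte and rewriting it takes about 2 x 10^4 seconds (roughly 5.6 hours). Different hardware numbers shift the totals but not the general shape of the comparison.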

Re: Quick jar deployment question...

2008-04-03 Thread Jason Venner
This only happens if you add a class from the jar to the JobConf creation line. JobConf conf = new JobConf(MyClass.class); JobConf public JobConf(Class exampleClass) Construct a map/reduce job configuration. Parameters: exampleClass - a class whose containing jar is used a

Re: Hadoop streaming performance problem

2008-04-03 Thread lin
You're right. Java isn't really that slow. I re-examined the Java code for the standalone program and found I was using an unbuffered output method. After I changed it to a buffered method, the Java code running time was comparable to the C++ one. This also means the 1000% speed-up I got was quite
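
The buffered versus unbuffered difference described above can be illustrated with a small, self-contained sketch. It is not the poster's program: BufferingDemo and CountingSink are hypothetical names, and the sink simply counts how often the underlying stream is written to, standing in for the system calls a real file or pipe would incur.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical demo, not taken from the thread.
final class BufferingDemo {

    /** Discards data but counts write calls and bytes reaching the underlying stream. */
    static final class CountingSink extends OutputStream {
        long calls = 0;
        long bytes = 0;

        @Override
        public void write(int b) {
            calls++;
            bytes++;
        }

        @Override
        public void write(byte[] buf, int off, int len) {
            calls++;
            bytes += len;
        }
    }

    /** Writes n single bytes one at a time, then flushes. */
    static void writeBytes(OutputStream out, int n) throws IOException {
        for (int i = 0; i < n; i++) {
            out.write('x');
        }
        out.flush();
    }

    /** Returns how many write calls reached the sink for n single-byte writes. */
    static long underlyingCalls(int n, boolean buffered) {
        CountingSink sink = new CountingSink();
        OutputStream out = buffered ? new BufferedOutputStream(sink, 8192) : sink;
        try {
            writeBytes(out, n);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (sink.bytes != n) {
            throw new IllegalStateException("lost bytes: " + sink.bytes);
        }
        return sink.calls;
    }

    public static void main(String[] args) {
        int n = 100_000;
        System.out.println("unbuffered calls: " + underlyingCalls(n, false));
        System.out.println("buffered calls:   " + underlyingCalls(n, true));
    }
}
```

With the buffer in place, the same bytes reach the sink in far fewer, larger writes, which is where a speed-up of the kind described would typically come from when the underlying stream is a file or a pipe.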

Re: Is it possible in Hadoop to overwrite or update a file?

2008-04-03 Thread Andrzej Bialecki
Ted Dunning wrote: I sympathize fully with Owen's thoughts here, but Andrzej's point that (essentially) users ought to be able to do it if they really, really want to is a good one. One particular scenario where having the ability to update blocks would be beneficial is when flipping flags in r

Re: Quick jar deployment question...

2008-04-03 Thread C G
Yeah, everything is packaged into one jar...I've been copying those jars everywhere which didn't seem right, hence the question. Thanks, C G Ted Dunning <[EMAIL PROTECTED]> wrote: The easiest way is to package all of your code (classes and jars) into a single jar file which you then

Re: distcp fails :Input source not found

2008-04-03 Thread s29752-hadoopuser
distcp supports multiple sources (like Unix cp) and if the specified source is a directory, it copies the entire directory. So, you could either do distcp src1 src2 ... src100 dst or first copy all srcs to srcdir, and then distcp srcdir dstdir I have no experience with S3 and EC2. Not sure

Re: Quick jar deployment question...

2008-04-03 Thread Ted Dunning
The easiest way is to package all of your code (classes and jars) into a single jar file which you then execute. When you instantiate a JobClient and run a job, your jar gets copied to all necessary nodes. The machine you use to launch the job need not even be in the cluster, just able to see th

Quick jar deployment question...

2008-04-03 Thread C G
Hi All: When deploying a jar file containing code for a Hadoop job, is it necessary to copy the jar to the same path on all nodes in the grid, or just on the node which will launch the job? Thanks, C G

Re: Help: libhdfs SIGSEGV

2008-04-03 Thread Christian Kunz
Yingyuan, I cannot give you a detailed answer, just a guess. Maybe Arun can chime in and provide more details. My guess is that it has to do with the fact that the DFSClient returns the same (cached) FileSystem handle for every connection request to the same namenode, but libhdfs would return yo

Re: distcp fails :Input source not found

2008-04-03 Thread Prasan Ary
I found it was a slight oversight on my part. I was copying the files into S3 using Firefox EC2 UI, and then trying to access those files on S3 using hadoop. The S3 filesystem provided by hadoop doesn't work with standard files. When I used hadoop to upload the files into S3 instead of Firefox

Re: one key per output part file

2008-04-03 Thread Yuri Pradkin
Here is how we (attempt to) do it: Reducer (in streaming) writes one file for each different key it receives as input. Here's some example code in perl: my $envdir = $ENV{'mapred_output_dir'}; my $fs = ($envdir =~ s/^file://); if ($fs) { #output goes onto NFS open(FI

[ANNOUNCE] Hadoop release 0.16.2 available

2008-04-03 Thread Nigel Daley
Release 0.16.2 fixes critical bugs in 0.16.1. Note that HBase releases are now maintained at http://hadoop.apache.org/hbase/ and HBase has been removed from this release. For Hadoop release details and downloads, visit: http://hadoop.apache.org/core/releases.html Thanks to all who contr

Re: Is it possible in Hadoop to overwrite or update a file?

2008-04-03 Thread Ted Dunning
I sympathize fully with Owen's thoughts here, but Andrzej's point that (essentially) users ought to be able to do it if they really, really want to is a good one. It IS true that the original point of Hadoop is high-performance sequential writing applications. It does that, more or less, pretty w

Re: If I wanna read a config file before map task, which class I should choose?

2008-04-03 Thread Ted Dunning
That depends on where the file is. If you are reading a file on a normal file system, you use normal Java functions. If you are reading a file from HDFS, you use hadoop functions. On 4/3/08 1:22 AM, "Jeremy Chow" <[EMAIL PROTECTED]> wrote: > Hi list, > > If I define a method named configure

Re: Is it possible in Hadoop to overwrite or update a file?

2008-04-03 Thread Owen O'Malley
On Apr 3, 2008, at 3:53 AM, Andrzej Bialecki wrote: Hmm ... Exactly why random writes are not possible? For performance reasons? Or the problem of synchronization of replicas? The HDFS protocols to support random write would be much more complicated. Furthermore, part of the performance of

Re: Error msg: Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration

2008-04-03 Thread Peeyush Bishnoi
Hello, As this problem is related to the CLASSPATH of Hadoop, just set HADOOP_CLASSPATH or CLASSPATH to include the Hadoop core jar --- Peeyush On Wed, 2008-04-02 at 13:51 -0300, Anisio Mendes Lacerda wrote: > Hi, > > my colleagues and I are implementing a small search engine in my University > L

Re: Is it possible in Hadoop to overwrite or update a file?

2008-04-03 Thread Andrzej Bialecki
Owen O'Malley wrote: On Apr 2, 2008, at 11:39 PM, Garri Santos wrote: Hi! I'm starting to take a look at Hadoop and the whole HDFS idea. I'm wondering if it's just fine to update or overwrite a file copied to Hadoop? No. Although we are making progress on HADOOP-1700, which would allow ap

Re: If I wanna read a config file before map task, which class I should choose?

2008-04-03 Thread Jeremy Chow
the config file is a normal text file. -- My research interests are distributed systems, parallel computing and bytecode based virtual machine. http://coderplay.javaeye.com

If I wanna read a config file before map task, which class I should choose?

2008-04-03 Thread Jeremy Chow
Hi list, If I define a method named configure in a mapper class which tries to read a config file before all map tasks start, which class should I choose? A normal FileReader from the JDK or another Reader provided by Hadoop? Can anyone give me an example? Thx, Jeremy -- My research interests are di