Hadoop scheduling question

2009-06-04 Thread Kristi Morton
Hi, I'm a Hadoop 17 user who is doing research with Prof. Magda Balazinska at the University of Washington on an improved progress indicator for Pig Latin. We have a question regarding how Hadoop schedules Pig Latin queries with JOIN operators. Does Hadoop schedule all MapReduce jobs in a s

Re: Hadoop scheduling question

2009-06-04 Thread Pankil Doshi
Hello Kristi, I am a Research Assistant at the University of Texas at Dallas. We are working on RDF data and we come across many joins in our queries, but we are not able to carry out all joins in a single job. We also tried our Hadoop code using Pig scripts and found that for each join in PIG script ne

Re: Fastlz coming?

2009-06-04 Thread Hong Tang
Using com.hadoop.compression.lzo.LzoCodec is not much different from using other codecs: add the hadoop-gpl-compression-0.1.0-dev.jar to your classpath, and add the path to the native library libgplcompression.so to the system property java.library.path. Hope this helps, Hong On Jun 4, 200

HBase v0.19.3 with Hadoop v0.19.1?

2009-06-04 Thread Amandeep Khurana
I have a couple of questions: 1. Is the HBase 0.19.3 release stable for a production cluster? 2. Can it be deployed over Hadoop v0.19.1? ..amandeep Amandeep Khurana Computer Science Graduate Student University of California, Santa Cruz

Re: Question about Hadoop filesystem

2009-06-04 Thread Brian Bockelman
It's in the FAQ: http://wiki.apache.org/hadoop/FAQ#17 Brian On Jun 4, 2009, at 6:26 PM, Harold Lim wrote: How do I remove a datanode? Do I simply "destroy" my datanode and the namenode will automatically detect it? Is there a more elegant way to do it? Also, when I remove a datanode, d

Question about Hadoop filesystem

2009-06-04 Thread Harold Lim
How do I remove a datanode? Do I simply "destroy" my datanode and the namenode will automatically detect it? Is there a more elegant way to do it? Also, when I remove a datanode, does Hadoop automatically re-replicate the data right away? Thanks, Harold

Re: Cluster Setup Issues : Datanode not being initialized.

2009-06-04 Thread asif md
@ Ravi. Not able to do that. On Thu, Jun 4, 2009 at 5:38 PM, Raghu Angadi wrote: > > Did you try 'telnet 198.55.35.229 54310' from this datanode? The log shows > that it is not able to connect to "master:54310". ssh from datanode does not > matter. > > Raghu. > > asif md wrote: > >> I can SSH

Re: Cluster Setup Issues : Datanode not being initialized.

2009-06-04 Thread Raghu Angadi
Did you try 'telnet 198.55.35.229 54310' from this datanode? The log shows that it is not able to connect to "master:54310". ssh from datanode does not matter. Raghu. asif md wrote: I can SSH both ways, i.e. from master to slave and slave to master. The datanode is getting initialized at mas
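
The telnet test suggested above only checks whether a TCP connection to the namenode port can be opened from the datanode. Where telnet is not installed, the same check can be sketched in plain Java. This is a minimal illustration, not part of Hadoop: the class name PortCheck and the method canConnect are made up for this example, and the host and port in the usage note are the ones quoted in the message.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper: reports whether a TCP connection to host:port can be opened. */
final class PortCheck {
    private PortCheck() {}

    /**
     * Tries to open a TCP connection, the same thing 'telnet host port' tests.
     * Returns false on any I/O failure (unknown host, no route, refused, timeout).
     */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Host and port are taken from the command line, e.g. "master 54310".
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 54310;
        System.out.println(host + ":" + port + " reachable: " + canConnect(host, port, 3000));
    }
}
```

Run from the datanode with the arguments "master 54310", a result of false would point at name resolution, routing, or a firewall rather than at ssh.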

Re: Cluster Setup Issues : Datanode not being initialized.

2009-06-04 Thread asif md
I can SSH both ways, i.e. from master to slave and slave to master. The datanode is getting initialized at master but the log at slave looks like this / 2009-06-04 15:20:06,066 INFO org.apache.hadoop.dfs.DataNode: STARTUP_MSG: /**

Re: Cluster Setup Issues : Datanode not being initialized.

2009-06-04 Thread asif md
@Ravi thanks Ravi .. I'm now using a defined tmp dir so the second issue is resolved. But I have ssh keys that have passwords. I am able to ssh to the slave and master from the master; should I be able to do that from the slave as well? @ALL Any suggestions? Thanks, Asif. On Thu, Jun 4,

Re: Task files in _temporary not getting promoted out

2009-06-04 Thread Ian Soboroff
No, they were completing successfully. In the end, I got it to work by manually making a local path (via JobConf), and then moving the output to HDFS in close(). Ian jason hadoop writes: > Are your tasks failing or completing successfully? Failed tasks have the > output directory wiped, only

Re: Command-line jobConf options in 0.18.3

2009-06-04 Thread Raghu Angadi
Tom White wrote: Actually, the space is needed, to be interpreted as a Hadoop option by ToolRunner. Without the space it sets a Java system property, which Hadoop will not automatically pick up. I don't think space is required. Something like -Dfs.default.name=host:port works. I don't see Tool

Re: Letting the Mapper handle multiple lines.

2009-06-04 Thread Per Stolpe
I did indeed think that addInputPath() set the InputFormat class, so this is probably what has been my problem. I'll try this when I gain access to my cluster again on Monday, but I'm fairly confident that this will fix my program. Thank you very much for a good answer. Take care, I will post

Re: Cluster Setup Issues : Datanode not being initialized.

2009-06-04 Thread Ravi Phulari
From the logs, it looks like your Hadoop cluster is facing two different issues. At Slave 1. exception: java.net.NoRouteToHostException: No route to host in your logs Diagnosis - One of your nodes cannot be reached correctly. Make sure you can ssh to your master and slave and passwordless ssh keys

Re: Fastlz coming?

2009-06-04 Thread Owen O'Malley
On Jun 4, 2009, at 11:19 AM, Kris Jirapinyo wrote: Hopefully we can have a new page on the hadoop wiki on how to use custom compression so that people won't have to go search through the threads to find the answer in the future. Yes, it would be extremely useful if you could start a wiki p

Re: Command-line jobConf options in 0.18.3

2009-06-04 Thread Tom White
Actually, the space is needed, to be interpreted as a Hadoop option by ToolRunner. Without the space it sets a Java system property, which Hadoop will not automatically pick up. Ian, try putting the options after the classname and see if that helps. Otherwise, it would be useful to see a snippet o

Re: Subscription

2009-06-04 Thread Aaron Kimball
You need to send a message to core-user-subscr...@hadoop.apache.org from the address you want registered. See http://hadoop.apache.org/core/mailing_lists.html - Aaron On Thu, Jun 4, 2009 at 12:10 PM, Akhil langer wrote: > Please, add me to the hadoop-core user mailing list. > > email address: *

Cluster Setup Issues : Datanode not being initialized.

2009-06-04 Thread asif md
Hello all, I'm trying to set up a two node cluster < remote > using the following tutorials { NOTE : I'm ignoring the tmp directory property in hadoop-site.xml suggested by Michael } Running Hadoop On Ubuntu Linux (Single-Node Cluster) - Michael G. Noll

Re: Command-line jobConf options in 0.18.3

2009-06-04 Thread Vasyl Keretsman
Perhaps there should not be a space between -D and your option? -Dprise.collopts= Vasyl 2009/6/4 Ian Soboroff : > > bin/hadoop jar -files collopts -D prise.collopts=collopts p3l-3.5.jar > gov.nist.nlpir.prise.mapred.MapReduceIndexer input output > > The 'prise.collopts' option doesn

Subscription

2009-06-04 Thread Akhil langer
Please, add me to the hadoop-core user mailing list. email address: *akhilan...@gmail.com* Thank You! Akhil

Processing files lying in a directory structure

2009-06-04 Thread akhil1988
Hi! I am working on applying the WordCount example to the entire Wikipedia dump. The entire English Wikipedia is around 200GB, which I have stored in HDFS in a cluster to which I have access. The problem: Wikipedia dump contains many directories (it has a very big directory structure) containing HTM

Re: Fastlz coming?

2009-06-04 Thread Kris Jirapinyo
Thanks Matt. Hopefully we can have a new page on the hadoop wiki on how to use custom compression so that people won't have to go search through the threads to find the answer in the future. On Thu, Jun 4, 2009 at 10:33 AM, Matt Massie wrote: > Kris- > > You might take a look at some of the pre

Re: Letting the Mapper handle multiple lines.

2009-06-04 Thread HRoger
I have read your code, and I think you should add job.setInputFormatClass(MultiLineInputFormat.class); If you do not set it, the job uses TextInputFormat and the value type defaults to Text. You may have thought that "MultiLineInputFormat.addInputPath()" would set the InputFormatClass automatically, but it doesn't do

Re: Sharing object between mappers on same node (reuse.jvm ?)

2009-06-04 Thread Tarandeep Singh
Thanks Kevin for the clarification. I ran a couple of tests as well and the system behaved exactly as you had said. So now the question is, how can I achieve what I want to do - share an object (Lucene IndexWriter instance) between mappers running on same node. I thought of running the IndexWriter

Re: Customizing machines to use for different jobs

2009-06-04 Thread Alex Loddengaard
Hi Raakhi, Unfortunately there is no built-in way of doing this. You'd have to instantiate two entirely separate Hadoop clusters to accomplish what you're trying to do, which isn't an uncommon thing to do. I'm not sure why you're hoping to have this behavior, but the fair share scheduler might b

Re: Fastlz coming?

2009-06-04 Thread Matt Massie
Kris- You might take a look at some of the previous lzo threads on this list for help. See: http://www.mail-archive.com/search?q=lzo&l=core-user%40hadoop.apache.org -Matt On Jun 4, 2009, at 10:29 AM, Kris Jirapinyo wrote: Is there any documentation on that site on how we can use lzo? I

Re: Fastlz coming?

2009-06-04 Thread Kris Jirapinyo
Is there any documentation on that site on how we can use lzo? I don't see any entries on the wiki page of the project. I see an entry on the Hadoop wiki (http://wiki.apache.org/hadoop/UsingLzoCompression) but seems like that's more oriented towards HBase. I am on hadoop 0.19.1. Thanks, Kris J.

Letting the Mapper handle multiple lines.

2009-06-04 Thread Per Stolpe
Hi. I'm quite new to Hadoop programming, so to get a good start I started writing my own program that summarizes a column in a large tab separated file (~100 000 000 lines). My first naive implementation was quite simple, a small rework of the WordCounter example that comes with Hadoop. This progra

Re: Task files in _temporary not getting promoted out

2009-06-04 Thread jason hadoop
Are your tasks failing or completing successfully? Failed tasks have the output directory wiped, only successfully completed tasks have the files moved up. I don't recall if the FileOutputCommitter class appeared in 0.18 On Wed, Jun 3, 2009 at 6:43 PM, Ian Soboroff wrote: > Ok, help. I am try

Re: *.gz input files

2009-06-04 Thread jason hadoop
Generally speaking, the .gz extension will be recognized by the input formats that inherit from TextInputFormat, and the correct thing will happen. Is there by chance an error in your log files about codec loading failure? What version of Hadoop are you using, and can you provide a few more details

Re: *.gz input files

2009-06-04 Thread Ian Soboroff
If your case is like mine, where you have lots of .gz files and you don't want splits in the middle of those files, you can use the code I just sent in the thread about traversing subdirectories. In brief, your RecordReader could do something like: public static class MyRecordReader

Re: problem getting map input filename

2009-06-04 Thread Rares Vernica
On 6/2/09, Rares Vernica wrote: > > I have a problem getting the map input file name. Here is what I tried: > > public class Map extends Mapper { > > public void map(Object key, Text value, Context context) > throws IOException, InterruptedException { > Configuration conf =

Re: Subdirectory question revisited

2009-06-04 Thread Ian Soboroff
Here's how I solved the problem using a custom InputFormat... the key part is in listStatus(), where we traverse the directory tree. Since HDFS doesn't have links this code is probably safe, but if you have a filesystem with cycles you will get trapped. Ian import java.io.IOException; import ja
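
The traversal idea described above (walk the directory tree and collect every plain file as input) can be sketched outside Hadoop. This is only an illustration under stated assumptions: it uses java.io.File on a local filesystem instead of the Hadoop FileSystem API and listStatus() from the original message, and the names TreeLister and listFilesRecursively are made up for this example. Like the code in the message, it has no cycle detection, so a tree containing symlink loops would trap it.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: collects every regular file below a root directory. */
final class TreeLister {
    private TreeLister() {}

    /** Returns all regular files under root, descending into subdirectories. */
    static List<File> listFilesRecursively(File root) {
        List<File> result = new ArrayList<>();
        collect(root, result);
        return result;
    }

    // Depth-first walk. No cycle detection: assumes the tree has no link loops.
    private static void collect(File path, List<File> result) {
        if (path.isDirectory()) {
            File[] children = path.listFiles();
            if (children == null) {
                return; // directory could not be read; skip it
            }
            for (File child : children) {
                collect(child, result);
            }
        } else if (path.isFile()) {
            result.add(path);
        }
    }

    public static void main(String[] args) {
        File root = new File(args.length > 0 ? args[0] : ".");
        for (File f : listFilesRecursively(root)) {
            System.out.println(f.getPath());
        }
    }
}
```

In a custom InputFormat the same recursion would run over the job's input paths, with each collected file becoming a candidate for an input split.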

Re: Command-line jobConf options in 0.18.3

2009-06-04 Thread Ian Soboroff
bin/hadoop jar -files collopts -D prise.collopts=collopts p3l-3.5.jar gov.nist.nlpir.prise.mapred.MapReduceIndexer input output The 'prise.collopts' option doesn't appear in the JobConf. Ian Aaron Kimball writes: > Can you give an example of the exact arguments you're sending on the command

Re: Fastlz coming?

2009-06-04 Thread Johan Oskarsson
We're using Lzo still, works great for those big log files: http://code.google.com/p/hadoop-gpl-compression/ /Johan Kris Jirapinyo wrote: > Hi all, >In the remove lzo JIRA ticket > https://issues.apache.org/jira/browse/HADOOP-4874 Tatu mentioned he was > going to port fastlz from C to Java an

Re: Do I need to implement Readfields and Write Functions If I have Only One Field?

2009-06-04 Thread Aaron Kimball
If you don't add any member fields, then no, I don't think you need to change anything. - Aaron On Wed, Jun 3, 2009 at 4:11 PM, dealmaker wrote: > > I have the following as my type of my "value" object. Do I need to > implement > readfields and write functions? > > private static class StringA

Re: How do I convert DataInput and ResultSet to array of String?

2009-06-04 Thread Aaron Kimball
e.g. for readFields(), myItems = new ArrayList(); int numItems = dataInput.readInt(); for (int i = 0; i < numItems; i++) { myItems.add(Text.readString(dataInput)); } then on the serialization (write) side, send: dataOutput.writeInt(myItems.size()); for (int i = 0; i < myItems.size(); i++) {
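
The pattern described above (write the element count first, then each element, and read them back in the same order) can be tried without a cluster. The following is a minimal sketch, not the code from the message: StringListRecord and roundTrip are hypothetical names, the class does not implement Hadoop's Writable, and writeUTF/readUTF from java.io stand in for Text.writeString and Text.readString so that the example runs with the JDK alone.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical value type; in a Hadoop job it would implement Writable. */
final class StringListRecord {
    private List<String> myItems = new ArrayList<>();

    StringListRecord() {}

    StringListRecord(List<String> items) {
        myItems = new ArrayList<>(items);
    }

    List<String> items() {
        return myItems;
    }

    /** Serialization side: element count first, then each element. */
    void write(DataOutput out) throws IOException {
        out.writeInt(myItems.size());
        for (String item : myItems) {
            out.writeUTF(item); // stand-in for Text.writeString(out, item)
        }
    }

    /** Deserialization side: read the count, then exactly that many elements. */
    void readFields(DataInput in) throws IOException {
        myItems = new ArrayList<>();
        int numItems = in.readInt();
        for (int i = 0; i < numItems; i++) {
            myItems.add(in.readUTF()); // stand-in for Text.readString(in)
        }
    }

    /** Serializes a record to bytes and reads it back into a fresh instance. */
    static StringListRecord roundTrip(StringListRecord record) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            record.write(new DataOutputStream(bytes));
            StringListRecord copy = new StringListRecord();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because readFields() resets the list before reading, an instance that the framework reuses across records does not accumulate items from earlier records.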

Re: Command-line jobConf options in 0.18.3

2009-06-04 Thread Aaron Kimball
Can you give an example of the exact arguments you're sending on the command line? - Aaron On Wed, Jun 3, 2009 at 5:46 PM, Ian Soboroff wrote: > If after I call getConf to get the conf object, I manually add the > key/value pair, it's there when I need it. So it feels like ToolRunner > isn't pa

Re: Sharing object between mappers on same node (reuse.jvm ?)

2009-06-04 Thread Kevin Peterson
On Wed, Jun 3, 2009 at 10:59 AM, Tarandeep Singh wrote: > I want to share an object (Lucene Index Writer Instance) between mappers > running on same node of 1 job (not across multiple jobs). Please correct me > if I am wrong - > > If I set the -1 for the property: mapred.job.reuse.jvm.num.tasks the

Re: question about when shuffle/sort start working

2009-06-04 Thread Jianmin Woo
Oh, I see. Thanks. - Jianmin From: Sharad Agarwal To: core-user@hadoop.apache.org Sent: Thursday, June 4, 2009 12:59:12 PM Subject: Re: question about when shuffle/sort start working Jianmin Woo wrote: > > Do you have some sample on the re-usage of static va