Clarifications on excludedNodeList in DFSClient

2012-11-19 Thread Inder Pall
Folks, I was wondering if there is any mechanism/logic to move a node from the excludedNodeList back to the live nodes, so that it can be tried again for new block creation. I do not see this in the current DFSClient code. The use-case is if the write timeout is being reduced and certain nodes get aggressively added
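
The post describes no built-in mechanism for this. The following is a minimal sketch of what an expiring exclude list could look like. It assumes that entries are forgotten after a fixed interval, so a node that timed out once becomes eligible again later. `ExpiringExcludeList` and its methods are hypothetical names and are not part of DFSClient. Time is passed in explicitly to keep the sketch testable.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, NOT the DFSClient API: an exclude list whose entries
// expire after a fixed interval, returning the node to the pool of candidates.
class ExpiringExcludeList {
    private final long expiryMillis;
    // node name -> time (ms) at which it was excluded
    private final Map<String, Long> excludedAt = new HashMap<>();

    ExpiringExcludeList(long expiryMillis) {
        this.expiryMillis = expiryMillis;
    }

    // Record that a node failed (e.g. hit the write timeout) at nowMillis.
    void exclude(String node, long nowMillis) {
        excludedAt.put(node, nowMillis);
    }

    // A node counts as excluded only while its entry is younger than the
    // expiry; a stale entry is dropped so the node can be tried again.
    boolean isExcluded(String node, long nowMillis) {
        Long since = excludedAt.get(node);
        if (since == null) {
            return false;
        }
        if (nowMillis - since >= expiryMillis) {
            excludedAt.remove(node);
            return false;
        }
        return true;
    }
}
```

The explicit timestamps are a design choice for the sketch. Real client code would more likely read a clock internally.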

Re: How to Configure ECLIPSE for MAP_REDUCE

2012-11-19 Thread Harsh J
You could install the plugin attached at https://issues.apache.org/jira/browse/MAPREDUCE-1280 (https://issues.apache.org/jira/secure/attachment/12460491/hadoop-eclipse-plugin-0.20.3-SNAPSHOT.jar) and configure it in steps similar to the ones described at

XMLOutputFormat, anything in the works?

2012-11-19 Thread David Parks
Is there an XMLOutputFormat in existence somewhere? I need to output Solr XML change docs, I'm betting I'm not the first. David
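
One possible approach, sketched under assumptions, is a custom `OutputFormat` whose `RecordWriter` writes each record as a Solr XML update message. The helper below covers only the formatting step. `SolrXmlFormatter` and `toAddDoc` are hypothetical names, and the helper assumes Solr's `<add><doc><field name="...">` update syntax.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not a Hadoop or Solr class): renders one record as a
// Solr <add><doc> update message, escaping XML special characters.
class SolrXmlFormatter {
    // Escape the five predefined XML entities.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // One <field> element per map entry, in the map's iteration order.
    static String toAddDoc(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("<add><doc>");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append("<field name=\"").append(escape(e.getKey())).append("\">")
              .append(escape(e.getValue())).append("</field>");
        }
        return sb.append("</doc></add>").toString();
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>(); // keeps field order stable
        doc.put("id", "42");
        doc.put("title", "A & B");
        System.out.println(toAddDoc(doc));
    }
}
```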

Re: Clarifications on excludedNodeList in DFSClient

2012-11-19 Thread Inder Pall
On Mon, Nov 19, 2012 at 3:25 PM, Inder Pall inder.p...@gmail.com wrote: Folks, I was wondering if there is any mechanism/logic to move a node from the excludedNodeList back to the live nodes, so that it can be tried again for new block creation. I do not see this in the current DFSClient code. The use-case is if

Re: Pydoop 0.7.0-rc1 released

2012-11-19 Thread Luca Pireddu
On 11/16/2012 10:02 PM, Bart Verwilst wrote: Hi Simone, I was wondering, is it possible to write Avro files to Hadoop straight from your lib (mixed with Avro libs, of course)? I'm currently trying to come up with a way to read from MySQL (but more complicated than Sqoop can handle) and write

a question on NameNode

2012-11-19 Thread Kartashov, Andy
Guys, I am learning that NN doesn't persistently store block locations. It stores only file names and their permissions, as well as file blocks. It is said that locations come from DataNodes when NN starts. So, how does it work? Say we only have one file A.txt in our HDFS that is split into 4 blocks

Re: a question on NameNode

2012-11-19 Thread Kai Voigt
Hi, On 19.11.2012 at 15:27, Kartashov, Andy andy.kartas...@mpac.ca wrote: I am learning that NN doesn’t persistently store block locations. It stores only file names and their permissions, as well as file blocks. It is said that locations come from DataNodes when NN starts. So, how does it work?
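
Here is a simplified model of the split described in the question. The namespace (file name to block IDs) is the part that gets persisted. Block-to-DataNode locations are kept only in memory and are rebuilt from the block reports that DataNodes send. `ToyNameNode` is a hypothetical illustration, not NameNode code, and the test uses the four-block A.txt example from the question.

```java
import java.util.*;

// Hypothetical, simplified model (not NameNode source code).
class ToyNameNode {
    // Persisted part (fsimage/edits in real HDFS): file name -> ordered block IDs.
    final Map<String, List<Long>> namespace = new HashMap<>();
    // In-memory only: block ID -> DataNodes currently reporting a replica.
    final Map<Long, Set<String>> blockLocations = new HashMap<>();

    void addFile(String name, List<Long> blockIds) {
        namespace.put(name, new ArrayList<>(blockIds));
    }

    // Called when a DataNode sends its block report (e.g. after startup).
    void processBlockReport(String dataNode, Collection<Long> blockIds) {
        for (long id : blockIds) {
            blockLocations.computeIfAbsent(id, k -> new TreeSet<>()).add(dataNode);
        }
    }

    // Answer a client's "where are the blocks of this file?" query.
    List<Set<String>> locate(String file) {
        List<Set<String>> result = new ArrayList<>();
        for (long id : namespace.getOrDefault(file, Collections.emptyList())) {
            result.add(blockLocations.getOrDefault(id, Collections.emptySet()));
        }
        return result;
    }
}
```

Before any DataNode has reported, `locate("A.txt")` returns four empty location sets. Once the reports arrive, each block lists the DataNodes that hold it.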

RE: a question on MapReduce

2012-11-19 Thread Kartashov, Andy
Guys, sometimes when I run my MR job I see that Reduce tasks kick in as early as when the Map tasks have reached only about 20%. How can MR possibly be so sure as to start running Reduce at this point? What if a Mapper produces more keys that the Reduce function has already finished with? Andy Kartashov MPAC

RE: a question on MapReduce

2012-11-19 Thread Kartashov, Andy
Hehe,... good to know. Thanks. From: Mohammad Tariq [mailto:donta...@gmail.com] Sent: Monday, November 19, 2012 9:50 AM To: user@hadoop.apache.org Subject: Re: a question on MapReduce Hello Andy, Reduce phase starts only once the Map phase is 100% complete. The reduce progress you see
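
Early reduce progress is consistent with the shuffle. A reduce task can be launched once some maps have finished (in 1.x the threshold is set by `mapred.reduce.slowstart.completed.maps`), and it starts copying map output at that point. The reduce function itself runs only after every map is done. The model below simplifies by assuming that copy, sort and reduce each count for one third of the reported progress. `ReduceProgressModel` is a hypothetical name.

```java
// Simplified, hypothetical model of reduce-task progress reporting.
class ReduceProgressModel {
    // copied: share of map outputs fetched so far (0..1)
    // sorted, reduced: completion of the later two phases (0..1)
    // Assumes three equally weighted phases.
    static double progress(double copied, double sorted, double reduced) {
        return (copied + sorted + reduced) / 3.0;
    }

    // The user's reduce() can only start once all map outputs exist.
    static boolean reduceFunctionMayRun(int finishedMaps, int totalMaps) {
        return finishedMaps == totalMaps;
    }
}
```

For example, a reducer that has fetched 60% of the map outputs would report roughly 20% progress without ever having called reduce().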

RE: a question on NameNode

2012-11-19 Thread Kartashov, Andy
Thank you, Kai. One more question, please. Does MapReduce run tasks on redundant blocks? Say you have only 1 block of data replicated 3 times, one block over each of three DNodes, block 1 - DN1 / block 1 (replica #1) - DN2 / block 1 (replica #2) - DN3 Will MR attempt: a. to start 3 Map

Re: a question on NameNode

2012-11-19 Thread Kai Voigt
Hi, On 19.11.2012 at 16:14, Kartashov, Andy andy.kartas...@mpac.ca wrote: Does MapReduce run tasks on redundant blocks? Say you have only 1 block of data replicated 3 times, one block over each of three DNodes, block 1 – DN1 / block 1 (replica #1) – DN2 / block 1 (replica #2) – DN3

Re: a question on NameNode

2012-11-19 Thread Mohammad Tariq
Hello Andy, If you have not disabled speculative execution, then your second assumption is correct. Regards, Mohammad Tariq On Mon, Nov 19, 2012 at 8:44 PM, Kartashov, Andy andy.kartas...@mpac.ca wrote: Thank you, Kai. One more question, please. Does MapReduce run tasks on
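
The scheduling discussed in this thread can be sketched under a few stated assumptions. There is one map task per input split rather than one per replica, and it is placed on a host that holds a replica. An optional backup attempt is added when speculative execution is on and the first attempt is slow. `SplitScheduler` is hypothetical. In a real cluster a backup attempt is not guaranteed to land on a replica holder.

```java
import java.util.*;

// Hypothetical sketch, not JobTracker code: plans the attempts for ONE split.
class SplitScheduler {
    static List<String> planAttempts(List<String> replicaHosts,
                                     boolean speculative,
                                     boolean firstAttemptSlow) {
        List<String> attempts = new ArrayList<>();
        if (replicaHosts.isEmpty()) {
            return attempts;
        }
        // The single regular attempt, placed on a replica holder (data-local).
        attempts.add(replicaHosts.get(0));
        // Speculation may add a backup attempt for a slow task; here we
        // simplify by placing it on the next replica holder.
        if (speculative && firstAttemptSlow && replicaHosts.size() > 1) {
            attempts.add(replicaHosts.get(1));
        }
        return attempts;
    }
}
```

Take the single replicated block from the question, with replicas on DN1, DN2 and DN3. The plan is one attempt on DN1, or DN1 plus a backup on DN2 if speculation kicks in. It is never three attempts just because there are three replicas.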

Unsubscribe

2012-11-19 Thread Jibins Joseph
Unsubscribe

Re: a question on NameNode

2012-11-19 Thread Ted Dunning
It sounds like you could benefit from reading the basic papers on map-reduce in general. Hadoop is a reasonable facsimile of the original Google systems. Try looking at this: http://research.google.com/archive/mapreduce.html On Mon, Nov 19, 2012 at 7:14 AM, Kartashov, Andy

Re: java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to java.lang.String

2012-11-19 Thread Harsh J
Hi, 1. Map/Reduce in 1.x does not know how to efficiently and automatically serialize regular Java types such as String, Long, etc. There is experimental (and I wouldn't recommend using it either) Java serialization support for such types in 2.x releases if you enable the JavaSerialization
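
If the job uses the default `TextInputFormat`, the map input key is a `LongWritable` byte offset and the value is `Text`. Declaring the mapper's input key as `String` would then produce a cast error like this one. The sketch below illustrates the explicit serialization contract that Hadoop types follow. `WritableLike` is a local stand-in that mirrors the shape of Hadoop's `Writable` interface so that the example compiles without Hadoop on the classpath. `LongBox` is a hypothetical `LongWritable`-like class.

```java
import java.io.*;

// Local stand-in mirroring the shape of Hadoop's Writable contract.
interface WritableLike {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical LongWritable-like box: serialized by write(), rebuilt by
// readFields() on a (possibly reused) instance.
class LongBox implements WritableLike {
    private long value;

    LongBox() {}
    LongBox(long value) { this.value = value; }

    long get() { return value; }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeLong(value);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        value = in.readLong();
    }

    // Serialize to bytes and read back into a fresh instance.
    static LongBox roundTrip(LongBox original) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        original.write(new DataOutputStream(bytes));
        LongBox copy = new LongBox();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        return copy;
    }
}
```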

Fwd: debugging hadoop streaming programs (first code)

2012-11-19 Thread jamal sasha
Hi, This is my first attempt to learn the map reduce abstraction. My problem is as follows. I have a text file as follows:
id 1, id2, date,time,mrps,code,code2
3710100022400,1350219887, 2011-09-10, 12:39:38.000, 99.00, 1, 0
3710100022400, 5045462785, 2011-09-06, 13:23:00.000, 70.63, 1, 0
Now
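
Hadoop streaming passes input lines to the mapper on stdin and treats the text before the first tab of each output line as the key. A minimal Java mapper for the sample above might look like the following. The sample has inconsistent spaces after commas, so the fields are trimmed. Emitting `id 1` as the key and `mrps` as the value is purely an illustrative assumption, because the actual goal is cut off in the post. `RecordMapper` is a hypothetical name.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

// Hypothetical streaming mapper: reads records on stdin, writes key<TAB>value.
class RecordMapper {
    // Returns "id1<TAB>mrps" for a data line, or null for header/malformed lines.
    static String mapLine(String line) {
        String[] f = line.split(",");
        if (f.length != 7) {
            return null;
        }
        for (int i = 0; i < f.length; i++) {
            f[i] = f[i].trim(); // the sample has inconsistent spaces after commas
        }
        if (!f[0].matches("\\d+")) {
            return null; // skip the header row "id 1, id2, ..."
        }
        return f[0] + "\t" + f[4];
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        String line;
        while ((line = in.readLine()) != null) {
            String out = mapLine(line);
            if (out != null) {
                System.out.println(out);
            }
        }
    }
}
```

Keeping the parsing in a static `mapLine` method means it can be tested locally (for example `cat data.txt | java RecordMapper`) before the mapper is submitted through streaming.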

Re: Secondary JobTracker

2012-11-19 Thread Harsh J
The function of a Secondary NameNode is to take checkpoints, not failover. If you are looking for HA JobTracker (not yet available) or Recoverable JobTracker functionality (already present today IIRC), look at the parent JIRA https://issues.apache.org/jira/browse/MAPREDUCE-2288 for some ideas that
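
A toy model of checkpointing makes the distinction concrete. It rests on simplifying assumptions: the image is a set of paths, and the edit log is a list of `CREATE`/`DELETE` records replayed onto it to produce a new image. Nothing in this process makes the checkpointing node able to take over. `ToyCheckpoint` is hypothetical and ignores everything else that fsimage and edits actually store.

```java
import java.util.*;

// Toy model (not Hadoop code): checkpoint = old image + replayed edit log.
class ToyCheckpoint {
    // Each edit is "CREATE <path>" or "DELETE <path>" in this sketch.
    static SortedSet<String> checkpoint(Set<String> image, List<String> edits) {
        SortedSet<String> merged = new TreeSet<>(image);
        for (String edit : edits) {
            String[] parts = edit.split(" ", 2);
            if (parts.length != 2) {
                throw new IllegalArgumentException("malformed edit: " + edit);
            }
            if (parts[0].equals("CREATE")) {
                merged.add(parts[1]);
            } else if (parts[0].equals("DELETE")) {
                merged.remove(parts[1]);
            } else {
                throw new IllegalArgumentException("unknown edit: " + edit);
            }
        }
        // After this, the merged image replaces the old one and the edit log
        // can start fresh; no failover capability is involved.
        return merged;
    }
}
```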