Re: Hadoop Streaming: How to partition output into subfolders?

2016-01-20 Thread Rex X
Thank you, Rohit! Any multiple-outputs sample code in Python? Rex On Wed, Jan 20, 2016 at 10:04 PM, rohit sarewar wrote: > Hi Rex > Please explore multiple outputs

Re: Hadoop Streaming: How to partition output into subfolders?

2016-01-20 Thread rohit sarewar
Hi Rex Please explore multiple outputs. Regards Rohit Sarewar On Thu, Jan 21, 2016 at 5:13 AM, Rex X wrote: > Dear all, > To be specific, for example, given
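Hadoop Streaming has no built-in MultipleOutputs equivalent, so one common workaround is for the reducer to emit the target subfolder (e.g. the date) as the output key and split the job's output into per-date folders in a post-processing step. A minimal sketch of such a reducer (names and the tab-separated record layout are illustrative, not from the thread):

```python
# Sketch: emit the desired subfolder (yyyy/mm/dd) as the key so a
# post-processing step can move each key's records under that path.
import sys

def route(line):
    """Turn a 'yyyy-mm-dd<TAB>value' record into (subfolder, payload)."""
    date, _, payload = line.rstrip("\n").partition("\t")
    year, month, day = date.split("-")
    return "%s/%s/%s" % (year, month, day), payload

def run(stream=sys.stdin, out=sys.stdout):
    for line in stream:
        subfolder, payload = route(line)
        out.write("%s\t%s\n" % (subfolder, payload))
# Used as:  -reducer route_reducer.py  (shipping the script with -file)
```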

Re: container error for bad configuration

2016-01-20 Thread Gaurav Gupta
The container will pick up hdfs-site.xml from the classpath. Can you check whether your classpath is correct? On Tue, Jan 19, 2016 at 5:28 PM, yaoxiaohua wrote: > Hi Gupta, > Thanks for your reply, > I observe the app running, and find some
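One quick way to act on this suggestion is to print the effective classpath with `hadoop classpath` and check which of its directories actually contain hdfs-site.xml. A sketch of that check (the helper name is hypothetical):

```python
# Sketch: scan a Java-style classpath string (e.g. the output of
# `hadoop classpath`) for directories that contain hdfs-site.xml.
import os

def entries_with_config(classpath, config_name="hdfs-site.xml"):
    """Return the classpath directories containing the given config file."""
    hits = []
    for entry in classpath.split(os.pathsep):
        entry = entry.rstrip("*")  # 'conf/*' wildcards match jars, not xml files
        if os.path.isdir(entry) and os.path.isfile(os.path.join(entry, config_name)):
            hits.append(entry)
    return hits
```

An empty result means no directory on the classpath carries the file, which would explain the container seeing a stale or default configuration.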

Re: HDFS short-circuit tokens expiring

2016-01-20 Thread Chris Nauroth
Hi Nick, This is something that the client does recover from internally. I expect the exception would not propagate to the caller, and therefore the reading process would continue successfully. There is a potential for a latency hit on the extra NameNode RPC required for recovery.

Blocks processed by which task?

2016-01-20 Thread Sultan Alamro
Hi there, How do I know which block in my HDFS is processed by which task? I want to check whether my Hadoop applies the "locality" concept or not. Thanks,
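One way to answer this is to list each block's replica locations with `hdfs fsck <path> -files -blocks -locations`, then compare them with the node each task attempt ran on (visible in the JobHistory web UI). A sketch of a parser for that output; the line format assumed below is based on typical fsck output and may vary by Hadoop version:

```python
# Sketch: parse `hdfs fsck <path> -files -blocks -locations` output into
# a {block id: [replica host IPs]} map. The line format is an assumption;
# adjust the patterns for your Hadoop version.
import re

def block_locations(fsck_output):
    locs = {}
    for line in fsck_output.splitlines():
        block = re.search(r"(blk_\d+)", line)
        if block:
            locs[block.group(1)] = re.findall(r"(\d+\.\d+\.\d+\.\d+):\d+", line)
    return locs
```

If a task's node appears among its block's replica hosts, the scheduler achieved data locality for that task.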

Unsubscribe

2016-01-20 Thread Jason Rich
Unsubscribe - To unsubscribe, e-mail: user-unsubscr...@hadoop.apache.org For additional commands, e-mail: user-h...@hadoop.apache.org

Hadoop Streaming: How to partition output into subfolders?

2016-01-20 Thread Rex X
Dear all, To be specific, for example, given

hadoop jar hadoop-streaming.jar \
    -input myInputDirs \
    -output myOutputDir \
    -mapper /bin/cat \
    -reducer /usr/bin/wc

where myInputDirs has a *dated* subfolder structure of /input_dir/yyyy/mm/dd/part-*, I want
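One straightforward workaround for dated output subfolders with plain streaming is to run one job per dated input partition, so each job's output lands under its own yyyy/mm/dd path. A sketch under the layout from the post (the helper name and output root are hypothetical):

```python
# Sketch: build one streaming command per yyyy/mm/dd partition so the
# output tree mirrors the dated input tree. This only builds the command
# line; each command can be run with subprocess.check_call(cmd).
def streaming_cmd(date_path, in_root="/input_dir", out_root="/output_dir"):
    return [
        "hadoop", "jar", "hadoop-streaming.jar",
        "-input", "%s/%s/part-*" % (in_root, date_path),
        "-output", "%s/%s" % (out_root, date_path),
        "-mapper", "/bin/cat",
        "-reducer", "/usr/bin/wc",
    ]
```

This trades one big job for many small ones, but needs no custom output format.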

Re: JobHistory location/configuration

2016-01-20 Thread Alexander Alten-Lorenz
Robert, you should also take a look at mapreduce.jobhistory.max-age-ms, which configures how long the files are held (default one week). And you’re right, just wait the configured time to see the logs. > On Jan 20, 2016, at 11:21 AM, Robert Schmidtke wrote: > Nvm, it

Re: JobHistory location/configuration

2016-01-20 Thread Robert Schmidtke
Okay, so I assumed I could specify paths on my local filesystem, which is not the case. After doing a hadoop fs -ls -R on my HDFS, I found the done and done_intermediate folders. However, only the intermediate folder contains files (even after the job finished). Is there any amount of time I have to

Re: JobHistory location/configuration

2016-01-20 Thread Robert Schmidtke
Nvm, it would seem mapreduce.jobhistory.move.interval-ms specifies exactly that. On Wed, Jan 20, 2016 at 10:56 AM, Robert Schmidtke wrote: > Okay, so I assumed I could specify paths on my local filesystem, which is not the case. After doing a hadoop fs -ls -R on my HDFS
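Pulling the thread together, the two JobHistory knobs mentioned live in mapred-site.xml. A sketch of the relevant fragment; the values shown are the commonly documented defaults, so verify them against your Hadoop version's mapred-default.xml:

```xml
<!-- mapred-site.xml: JobHistory settings discussed in this thread. -->
<property>
  <name>mapreduce.jobhistory.move.interval-ms</name>
  <value>180000</value> <!-- scan moving intermediate files to done (3 min) -->
</property>
<property>
  <name>mapreduce.jobhistory.max-age-ms</name>
  <value>604800000</value> <!-- history files kept for one week -->
</property>
```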

Container exited with a non-zero exit code 1-SparkJOb on YARN

2016-01-20 Thread Siddharth Ubale
Hi, I am running a Spark job on the YARN cluster. The Spark job is a Spark Streaming application which reads JSON from a Kafka topic, inserts the JSON values into HBase tables via Phoenix, and then sends out certain messages to a websocket if the JSON satisfies a certain criterion.
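For a generic "exited with a non-zero exit code 1", the container's own stderr usually names the real failure. With YARN log aggregation enabled, the logs can be pulled after the application finishes via the standard `yarn logs` CLI; a sketch (the helper name is hypothetical, and the application id comes from the ResourceManager):

```python
# Sketch: build the command that dumps all aggregated container logs for
# a finished application; run it with subprocess.check_call(cmd).
def yarn_logs_cmd(application_id):
    return ["yarn", "logs", "-applicationId", application_id]
```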