Job with only map tasks and map output still on local disk?

2015-02-20 Thread xeonmailinglist

Hi,

I noticed that when we have a MapReduce job with no reduce tasks, YARN 
saves the map output to HDFS. I would like the job to keep the map 
output on the local disk instead.


In YARN, is it possible to have a MapReduce job that only executes map 
tasks (no reduce tasks at all), and still have the map tasks write 
their output to the local disk?
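
For context, a minimal sketch of the map-only setup being described, i.e. a
job with zero reduce tasks (class names and paths below are illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class MapOnlyExample {  // hypothetical driver class
      // The base Mapper passes (key, value) through unchanged.
      public static class PassThroughMapper
          extends Mapper<LongWritable, Text, LongWritable, Text> { }

      public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "map-only-example");
        job.setJarByClass(MapOnlyExample.class);
        job.setMapperClass(PassThroughMapper.class);
        job.setNumReduceTasks(0);  // map-only: no shuffle; map output goes
                                   // straight to the job's OutputFormat
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path("/in"));
        FileOutputFormat.setOutputPath(job, new Path("/out"));  // with 0 reducers,
                                                                // map output lands here
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }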


Thanks,


Is there a way to submit a job using the YARN REST API?

2015-02-20 Thread xeonmailinglist

Hi,

Is there a way to submit a job using the YARN REST API?

Thanks,


Submit mapreduce job in remote YARN

2015-02-20 Thread xeonmailinglist

Hi,

I would like to submit a MapReduce job to a remote YARN cluster. Can I 
do this in Java, or using a REST API?


Thanks,


Steps for container release

2015-02-20 Thread Fabio C.
Hi everyone,
I was trying to understand the process that makes the resources of a
container available again to the ResourceManager.
As far as I can guess from the logs, the AM:
- sends a stop request to the NodeManager for the specific container
- then immediately tells the RM about the release of the resources, which become
available (queues are re-sorted).
Actually, I was expecting the RM to wait for an acknowledgment from the NM
(through the NM-RM heartbeat) about the actual end of the container, but it
looks to me that the resources are made available as soon as this info arrives
from the AM (AM-RM heartbeat).
Maybe the container decommission time is so small as to be irrelevant?

The logs are at INFO level, and I can't change it to DEBUG since I'm not
the only one using the cluster, so maybe I am missing something.

Thanks

Fabio


Re: Is there a way to submit a job using the YARN REST API?

2015-02-20 Thread Ted Yu
Please take a look at https://issues.apache.org/jira/browse/MAPREDUCE-5874

Cheers 



 On Feb 20, 2015, at 3:11 AM, xeonmailinglist xeonmailingl...@gmail.com 
 wrote:
 
 Hi,
 
 Is there a way to submit a job using the YARN REST API?
 
 Thanks,
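
For reference, recent Hadoop 2.x ResourceManager builds expose a two-step REST
flow for application submission (first request a new application id, then POST
an application submission context to /ws/v1/cluster/apps). Whether the submit
endpoint is available depends on the Hadoop version, so treat the following as
a sketch with an assumed RM host and port:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class RmNewApplication {  // illustrative only; rm-host:8088 is an assumption
      public static void main(String[] args) throws Exception {
        // Step 1: ask the ResourceManager for a new application id.
        URL url = new URL("http://rm-host:8088/ws/v1/cluster/apps/new-application");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Accept", "application/json");
        try (BufferedReader r = new BufferedReader(
                 new InputStreamReader(conn.getInputStream()))) {
          String line;
          while ((line = r.readLine()) != null) System.out.println(line);
        }
        // Step 2 (not shown): POST the application submission context as JSON to
        // http://rm-host:8088/ws/v1/cluster/apps. The payload format is version
        // dependent; see the RM REST API documentation for your release.
      }
    }

For a plain MapReduce job, building that submission context by hand is fairly
involved, which is why submitting through the normal Java client is usually
simpler.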


BLOCK and Split size question

2015-02-20 Thread SP
Hello everyone,

I have a couple of doubts; can anyone please point me in the right direction?

1) What exactly happens when I copy a 1 TB file to the Hadoop cluster using
the copyFromLocal command?

   a) What will the split size be? Will it be the same as the block size?

   b) What is a block and what is a split?

2) If we have a 100 MB file and a block size of 64 MB, it will be divided
into 2 blocks of 64 MB and 36 MB. The second block still has 28 MB of space
left; what happens to that free space?
Will the cluster have unequal block sizes, or will the space be occupied by
another file?


3) Let's say a 64 MB block is on node A and replicated to 2 other
nodes (B, C), and the input split size for the map-reduce program is 64 MB.
Will this split just have the location of node A, or will it have locations
for all three nodes A, B, C?


4) How is it handled if the input split size is greater or smaller than the
block size?
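
Regarding question 4, here is a small sketch of how the split size is usually
controlled independently of the block size (this assumes the standard
FileInputFormat knobs of the new MapReduce API; the path and sizes are
illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

    public class SplitSizeExample {  // hypothetical driver fragment
      public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "split-size-example");
        FileInputFormat.addInputPath(job, new Path("/data/input"));
        // FileInputFormat computes: splitSize = max(minSize, min(maxSize, blockSize)),
        // so a split can be smaller or larger than an HDFS block.
        FileInputFormat.setMinInputSplitSize(job, 64L * 1024 * 1024);   // 64 MB lower bound
        FileInputFormat.setMaxInputSplitSize(job, 128L * 1024 * 1024);  // 128 MB upper bound
      }
    }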


Can anyone please help?

thanks

SP


How to Tune Hadoop Cluster from Administrator perspective

2015-02-20 Thread Krish Donald
Hi,

How do I tune a Hadoop cluster from an administrator's perspective?
Which parameters should we consider?
What should we look at for performance tuning?

Thanks
Krish


Scheduling in YARN according to available resources

2015-02-20 Thread tesm...@gmail.com
I have 7 nodes in my Hadoop cluster [8 GB RAM and 4 vCPUs on each node], 1
Namenode + 6 datanodes.

I followed the link from Hortonworks [
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.6.0/bk_installing_manually_book/content/rpm-chap1-11.html
], made the calculations according to the hardware configuration of my nodes,
and added the updated mapred-site.xml and yarn-site.xml files to my question.
Still my application is crashing with the same exception.

My mapreduce application has 34 input splits with a block size of 128MB.

**mapred-site.xml** has the  following properties:

mapreduce.framework.name  = yarn
mapred.child.java.opts= -Xmx2048m
mapreduce.map.memory.mb   = 4096
mapreduce.map.java.opts   = -Xmx2048m

**yarn-site.xml** has the  following properties:

yarn.resourcemanager.hostname= hadoop-master
yarn.nodemanager.aux-services= mapreduce_shuffle
yarn.nodemanager.resource.memory-mb  = 6144
yarn.scheduler.minimum-allocation-mb = 2048
yarn.scheduler.maximum-allocation-mb = 6144


 Exception from container-launch: ExitCodeException exitCode=134:
/bin/bash: line 1:  3876 Aborted  (core dumped)
/usr/lib/jvm/java-7-openjdk-amd64/bin/java
-Djava.net.preferIPv4Stack=true
-Dhadoop.metrics.log.level=WARN -Xmx8192m
-Djava.io.tmpdir=/tmp/hadoop-ubuntu/nm-local-dir/usercache/ubuntu/appcache/application_1424264025191_0002/container_1424264025191_0002_01_11/tmp
-Dlog4j.configuration=container-log4j.properties
-Dyarn.app.container.log.dir=/home/ubuntu/hadoop/logs/userlogs/application_1424264025191_0002/container_1424264025191_0002_01_11
-Dyarn.app.container.log.filesize=0
-Dhadoop.root.logger=INFO,CLA org.apache.hadoop.mapred.YarnChild
192.168.0.12 50842 attempt_1424264025191_0002_m_05_0 11 

/home/ubuntu/hadoop/logs/userlogs/application_1424264025191_0002/container_1424264025191_0002_01_11/stdout
2

/home/ubuntu/hadoop/logs/userlogs/application_1424264025191_0002/container_1424264025191_0002_01_11/stderr


How can I avoid this? Any help is appreciated.

It looks to me that YARN is trying to launch all the containers
simultaneously and not according to the available resources. Is there an
option to restrict the number of containers on the Hadoop nodes?

Regards,
Tariq


Get method in Writable

2015-02-20 Thread unmesha sreeveni
Am I able to get the values from a Writable of a previous job?
I.e., I have 2 MR jobs.

MR 1:
I need to pass 3 elements as values from the reducer, and the key is
NullWritable, so I created a custom Writable class to achieve this:

    public class TreeInfoWritable implements Writable {
        DoubleWritable entropy;
        IntWritable sum;
        IntWritable clsCount;
        ...
    }

MR 2:
I need to access the MR 1 result in the MR 2 mapper's setup function, and I
accessed it as a distributed cache (small file).
Is there a way to get those values using the get methods?

    while ((setupData = bf.readLine()) != null) {
        System.out.println("Setup Line " + setupData);
        TreeInfoWritable info = // something I can pass to TreeInfoWritable and get values
        DoubleWritable entropy = info.getEntropy();
        System.out.println("entropy: " + entropy);
    }

I tried converting the Writable to JSON (Gson) format.

MR 1:

    Gson gson = new Gson();
    String emitVal = gson.toJson(valEmit);
    context.write(out, new Text(emitVal));

But the parsing cannot be done in MR 2:

    TreeInfoWritable info = gson.toJson(setupData, TreeInfoWritable.class);

    Error: Type mismatch: cannot convert from String to TreeInfoWritable

Once it is changed to a String we cannot get the values.

Is there a workaround for this, or should I just use plain POJO classes
instead of Writables? I'm afraid that becomes slower, as we would then be
depending on plain Java objects instead of Hadoop's serializable classes.
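
One possible way to read the values back, sketched under the assumption that
the JSON line was produced by gson.toJson on the same TreeInfoWritable class
(which then needs a no-argument constructor and getters): parsing goes through
Gson's fromJson rather than toJson.

    // In the MR 2 mapper's setup(), with setupData holding one JSON line from MR 1:
    Gson gson = new Gson();
    TreeInfoWritable info = gson.fromJson(setupData, TreeInfoWritable.class); // parse, not serialize
    DoubleWritable entropy = info.getEntropy();  // assumes a getEntropy() accessor
    System.out.println("entropy: " + entropy);

An alternative that avoids JSON entirely is to have MR 1 write a SequenceFile
of TreeInfoWritable values and read it back with SequenceFile.Reader in
setup(), which keeps everything in Hadoop's own serialization.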


secure checksum in HDFS

2015-02-20 Thread xeonmailinglist

Hi,

Is it possible to use SHA-256 or MD5 as the checksum for a file in HDFS?

Thanks,


YARN container launch failed exception and mapred-site.xml configuration

2015-02-20 Thread tesm...@gmail.com
I have 7 nodes in my Hadoop cluster [8 GB RAM and 4 vCPUs on each node], 1
Namenode + 6 datanodes.

**EDIT-1@ARNON:** I followed the link, made the calculations according to the
hardware configuration of my nodes, and have added the updated mapred-site.xml
and yarn-site.xml files to my question. Still my application is crashing with
the same exception.

My mapreduce application has 34 input splits with a block size of 128MB.

**mapred-site.xml** has the  following properties:

mapreduce.framework.name  = yarn
mapred.child.java.opts= -Xmx2048m
mapreduce.map.memory.mb   = 4096
mapreduce.map.java.opts   = -Xmx2048m

**yarn-site.xml** has the  following properties:

yarn.resourcemanager.hostname= hadoop-master
yarn.nodemanager.aux-services= mapreduce_shuffle
yarn.nodemanager.resource.memory-mb  = 6144
yarn.scheduler.minimum-allocation-mb = 2048
yarn.scheduler.maximum-allocation-mb = 6144


 Exception from container-launch: ExitCodeException exitCode=134:
/bin/bash: line 1:  3876 Aborted  (core dumped)
/usr/lib/jvm/java-7-openjdk-amd64/bin/java -Djava.net.preferIPv4Stack=true
-Dhadoop.metrics.log.level=WARN -Xmx8192m
-Djava.io.tmpdir=/tmp/hadoop-ubuntu/nm-local-dir/usercache/ubuntu/appcache/application_1424264025191_0002/container_1424264025191_0002_01_11/tmp
-Dlog4j.configuration=container-log4j.properties
-Dyarn.app.container.log.dir=/home/ubuntu/hadoop/logs/userlogs/application_1424264025191_0002/container_1424264025191_0002_01_11
-Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA
org.apache.hadoop.mapred.YarnChild 192.168.0.12 50842
attempt_1424264025191_0002_m_05_0 11 

/home/ubuntu/hadoop/logs/userlogs/application_1424264025191_0002/container_1424264025191_0002_01_11/stdout
2

/home/ubuntu/hadoop/logs/userlogs/application_1424264025191_0002/container_1424264025191_0002_01_11/stderr


How can I avoid this? Any help is appreciated.

Is there an option to restrict the number of containers on the Hadoop nodes?


Re: Scheduling in YARN according to available resources

2015-02-20 Thread R Nair
I had a very similar issue; I switched to the Oracle JDK. At first look there
is nothing I see wrong with your configuration, thanks.
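
For a rough sense of what those settings allow (a back-of-the-envelope check,
assuming memory is the limiting resource):

    map containers per node = floor(yarn.nodemanager.resource.memory-mb / mapreduce.map.memory.mb)
                            = floor(6144 / 4096) = 1
    across 6 datanodes      = roughly 6 concurrent map containers for the 34 splits

so the scheduler should only be running a handful of maps at any one time, with
the remaining splits queued until containers free up.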

Regards,
Nair

On Sat, Feb 21, 2015 at 1:42 AM, tesm...@gmail.com tesm...@gmail.com
wrote:

 I have 7 nodes in my Hadoop cluster [8 GB RAM and 4 vCPUs on each node], 1
 Namenode + 6 datanodes.

 I followed the link from Hortonworks [
 http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.6.0/bk_installing_manually_book/content/rpm-chap1-11.html
 ], made the calculations according to the hardware configuration of my
 nodes, and added the updated mapred-site.xml and yarn-site.xml files to my question.
 Still my application is crashing with the same exception.

 My mapreduce application has 34 input splits with a block size of 128MB.

 **mapred-site.xml** has the  following properties:

 mapreduce.framework.name  = yarn
 mapred.child.java.opts= -Xmx2048m
 mapreduce.map.memory.mb   = 4096
 mapreduce.map.java.opts   = -Xmx2048m

 **yarn-site.xml** has the  following properties:

 yarn.resourcemanager.hostname= hadoop-master
 yarn.nodemanager.aux-services= mapreduce_shuffle
 yarn.nodemanager.resource.memory-mb  = 6144
 yarn.scheduler.minimum-allocation-mb = 2048
 yarn.scheduler.maximum-allocation-mb = 6144


  Exception from container-launch: ExitCodeException exitCode=134:
 /bin/bash: line 1:  3876 Aborted  (core dumped)
 /usr/lib/jvm/java-7-openjdk-amd64/bin/java
 -Djava.net.preferIPv4Stack=true
 -Dhadoop.metrics.log.level=WARN -Xmx8192m
 -Djava.io.tmpdir=/tmp/hadoop-ubuntu/nm-local-dir/usercache/ubuntu/appcache/application_1424264025191_0002/container_1424264025191_0002_01_11/tmp
 -Dlog4j.configuration=container-log4j.properties
 -Dyarn.app.container.log.dir=/home/ubuntu/hadoop/logs/userlogs/application_1424264025191_0002/container_1424264025191_0002_01_11
 -Dyarn.app.container.log.filesize=0
 -Dhadoop.root.logger=INFO,CLA org.apache.hadoop.mapred.YarnChild
 192.168.0.12 50842 attempt_1424264025191_0002_m_05_0 11 

 /home/ubuntu/hadoop/logs/userlogs/application_1424264025191_0002/container_1424264025191_0002_01_11/stdout
 2

 /home/ubuntu/hadoop/logs/userlogs/application_1424264025191_0002/container_1424264025191_0002_01_11/stderr


 How can I avoid this? Any help is appreciated.

 It looks to me that YARN is trying to launch all the containers
 simultaneously and not according to the available resources. Is there an
 option to restrict the number of containers on the Hadoop nodes?

 Regards,
 Tariq




-- 
Warmest Regards,

Ravi Shankar


Re: Encryption At Rest Question

2015-02-20 Thread Ranadip Chatterjee
In the case of an SSL-enabled cluster, the DEK will be encrypted on the wire by
the SSL layer.

In the case of a non-SSL cluster, it is not. But the interceptor only
gets the DEK and not the encrypted data, so the data is still safe. Only if
the interceptor also manages to gain access to the encrypted data block and
associate it with the corresponding DEK is the data compromised.
Given that each HDFS file has a different DEK, the interceptor has to gain
quite a bit of access before the data is compromised.

On 18 February 2015 at 00:04, Plamen Jeliazkov 
plamen.jeliaz...@wandisco.com wrote:

 Hey guys,

 I had a question about the new file encryption work done primarily in
 HDFS-6134.

 I was just curious, how is the DEK protected on the wire?
 Particularly after the KMS decrypts the EDEK and returns it to the client.

 Thanks,
 -Plamen







-- 
Regards,
Ranadip Chatterjee


Re: Scheduling in YARN according to available resources

2015-02-20 Thread tesm...@gmail.com
Thanks for your answer Nair,
Is installing the Oracle JDK on Ubuntu as complicated as described in this
link:
http://askubuntu.com/questions/56104/how-can-i-install-sun-oracles-proprietary-java-jdk-6-7-8-or-jre

Is there an alternative?

Regards


On Sat, Feb 21, 2015 at 6:50 AM, R Nair ravishankar.n...@gmail.com wrote:

 I had a very similar issue; I switched to the Oracle JDK. At first look there
 is nothing I see wrong with your configuration, thanks.

 Regards,
 Nair

 On Sat, Feb 21, 2015 at 1:42 AM, tesm...@gmail.com tesm...@gmail.com
 wrote:

 I have 7 nodes in my Hadoop cluster [8 GB RAM and 4 vCPUs on each node], 1
 Namenode + 6 datanodes.

 I followed the link from Hortonworks [
 http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.6.0/bk_installing_manually_book/content/rpm-chap1-11.html
 ], made the calculations according to the hardware configuration of my
 nodes, and added the updated mapred-site.xml and yarn-site.xml files to my question.
 Still my application is crashing with the same exception.

 My mapreduce application has 34 input splits with a block size of 128MB.

 **mapred-site.xml** has the  following properties:

 mapreduce.framework.name  = yarn
 mapred.child.java.opts= -Xmx2048m
 mapreduce.map.memory.mb   = 4096
 mapreduce.map.java.opts   = -Xmx2048m

 **yarn-site.xml** has the  following properties:

 yarn.resourcemanager.hostname= hadoop-master
 yarn.nodemanager.aux-services= mapreduce_shuffle
 yarn.nodemanager.resource.memory-mb  = 6144
 yarn.scheduler.minimum-allocation-mb = 2048
 yarn.scheduler.maximum-allocation-mb = 6144


  Exception from container-launch: ExitCodeException exitCode=134:
 /bin/bash: line 1:  3876 Aborted  (core dumped)
 /usr/lib/jvm/java-7-openjdk-amd64/bin/java
 -Djava.net.preferIPv4Stack=true
 -Dhadoop.metrics.log.level=WARN -Xmx8192m
 -Djava.io.tmpdir=/tmp/hadoop-ubuntu/nm-local-dir/usercache/ubuntu/appcache/application_1424264025191_0002/container_1424264025191_0002_01_11/tmp
 -Dlog4j.configuration=container-log4j.properties
 -Dyarn.app.container.log.dir=/home/ubuntu/hadoop/logs/userlogs/application_1424264025191_0002/container_1424264025191_0002_01_11
 -Dyarn.app.container.log.filesize=0
 -Dhadoop.root.logger=INFO,CLA org.apache.hadoop.mapred.YarnChild
 192.168.0.12 50842 attempt_1424264025191_0002_m_05_0 11 

 /home/ubuntu/hadoop/logs/userlogs/application_1424264025191_0002/container_1424264025191_0002_01_11/stdout
 2

 /home/ubuntu/hadoop/logs/userlogs/application_1424264025191_0002/container_1424264025191_0002_01_11/stderr


 How can I avoid this? Any help is appreciated.

 It looks to me that YARN is trying to launch all the containers
 simultaneously and not according to the available resources. Is there
 an option to restrict the number of containers on the Hadoop nodes?

 Regards,
 Tariq




 --
 Warmest Regards,

 Ravi Shankar



Fwd: YARN container launch failed exception and mapred-site.xml configuration

2015-02-20 Thread tesm...@gmail.com
I have 7 nodes in my Hadoop cluster [8 GB RAM and 4 vCPUs on each node], 1
Namenode + 6 datanodes.

I followed the link to Hortonworks [
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.6.0/bk_installing_manually_book/content/rpm-chap1-11.html]
and made the calculations according to the hardware configuration of my nodes,
and have added the updated mapred-site.xml and yarn-site.xml files to my
question. Still my application is crashing with the same exception.

My mapreduce application has 34 input splits with a block size of 128MB.

**mapred-site.xml** has the  following properties:

mapreduce.framework.name  = yarn
mapred.child.java.opts= -Xmx2048m
mapreduce.map.memory.mb   = 4096
mapreduce.map.java.opts   = -Xmx2048m

**yarn-site.xml** has the  following properties:

yarn.resourcemanager.hostname= hadoop-master
yarn.nodemanager.aux-services= mapreduce_shuffle
yarn.nodemanager.resource.memory-mb  = 6144
yarn.scheduler.minimum-allocation-mb = 2048
yarn.scheduler.maximum-allocation-mb = 6144


 Exception from container-launch: ExitCodeException exitCode=134:
/bin/bash: line 1:  3876 Aborted  (core dumped)
/usr/lib/jvm/java-7-openjdk-amd64/bin/java -Djava.net.preferIPv4Stack=true
-Dhadoop.metrics.log.level=WARN -Xmx8192m
-Djava.io.tmpdir=/tmp/hadoop-ubuntu/nm-local-dir/usercache/ubuntu/appcache/application_1424264025191_0002/container_1424264025191_0002_01_11/tmp
-Dlog4j.configuration=container-log4j.properties
-Dyarn.app.container.log.dir=/home/ubuntu/hadoop/logs/userlogs/application_1424264025191_0002/container_1424264025191_0002_01_11
-Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA
org.apache.hadoop.mapred.YarnChild 192.168.0.12 50842
attempt_1424264025191_0002_m_05_0 11 

/home/ubuntu/hadoop/logs/userlogs/application_1424264025191_0002/container_1424264025191_0002_01_11/stdout
2

/home/ubuntu/hadoop/logs/userlogs/application_1424264025191_0002/container_1424264025191_0002_01_11/stderr


How can I avoid this? Any help is appreciated.

Is there an option to restrict the number of containers on the Hadoop nodes?


Re: secure checksum in HDFS

2015-02-20 Thread Shahab Yunus
There seems to be some work done on this here:
https://issues.apache.org/jira/browse/HADOOP-9209

3rd party tool:
https://github.com/rdsr/hdfs-checksum

Regards,
Shahab

On Fri, Feb 20, 2015 at 12:39 PM, xeonmailinglist xeonmailingl...@gmail.com
 wrote:

 Hi,

 Is it possible to use SHA-256 or MD5 as the checksum for a file in HDFS?

 Thanks,
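
If the goal is simply to obtain a SHA-256 (or MD5) digest of a file's
contents, rather than changing the block-level checksums HDFS keeps
internally, a minimal client-side sketch (it streams the whole file through
the client, so it is not cheap for large files):

    import java.security.MessageDigest;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsDigest {  // illustrative; expects the HDFS path as args[0]
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());  // uses fs.defaultFS from the classpath
        MessageDigest md = MessageDigest.getInstance("SHA-256");  // or "MD5"
        byte[] buf = new byte[64 * 1024];
        try (FSDataInputStream in = fs.open(new Path(args[0]))) {
          int n;
          while ((n = in.read(buf)) != -1) {
            md.update(buf, 0, n);
          }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) hex.append(String.format("%02x", b));
        System.out.println(hex);
      }
    }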



Re: How to get Hadoop's Generic Options value

2015-02-20 Thread Rajesh Kartha
Here is an example:
https://adhoop.wordpress.com/2012/02/16/generate-a-list-of-anagrams-round-3/

-Rajesh

On Thu, Feb 19, 2015 at 9:32 PM, Haoming Zhang haoming.zh...@outlook.com
wrote:

 Thanks guys,

 I will try your solutions later and update the result!

 --
 From: unmeshab...@gmail.com
 Date: Fri, 20 Feb 2015 10:04:38 +0530
 Subject: Re: How to get Hadoop's Generic Options value
 To: user@hadoop.apache.org


 Try implementing your driver like this:

 public class YourDriver extends Configured implements Tool {

 main()
 run()
 }

 Then supply your file using the -D option (a fuller sketch follows below).

 Thanks
 Unmesha Biju
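
For reference, a fuller sketch of the Configured/Tool pattern described above
(the class name and property key are illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class YourDriver extends Configured implements Tool {
      @Override
      public int run(String[] args) throws Exception {
        // getConf() is already populated with any -D key=value generic options
        // that ToolRunner parsed off the command line.
        Configuration conf = getConf();
        System.out.println(conf.get("my.custom.option"));  // hypothetical property
        // ... set up and submit the job here ...
        return 0;
      }

      public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new YourDriver(), args));
      }
    }

Invoked, for example, as:

    hadoop jar yourdriver.jar YourDriver -D my.custom.option=value <other args>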





Re: Submit mapreduce job in remote YARN

2015-02-20 Thread 杨浩
Yes, you can do this in Java, if these conditions are satisfied:

   1. your client is on the same network as the Hadoop cluster
   2. the Hadoop configuration is added to your Java classpath, so the JVM
   will load the cluster configuration

But the suggested way is:

 hadoop jar
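
If the pure-Java route is taken, a minimal sketch (host names, ports, and
paths below are illustrative assumptions, and the job jar still has to be
reachable by the cluster):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class RemoteSubmitExample {  // hypothetical driver class
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode-host:8020");      // remote HDFS
        conf.set("mapreduce.framework.name", "yarn");
        conf.set("yarn.resourcemanager.address", "rm-host:8032");   // remote RM
        conf.set("yarn.resourcemanager.scheduler.address", "rm-host:8030");

        Job job = Job.getInstance(conf, "remote-submit-example");
        job.setJarByClass(RemoteSubmitExample.class);  // jar must be visible to the cluster
        FileInputFormat.addInputPath(job, new Path("/in"));
        FileOutputFormat.setOutputPath(job, new Path("/out"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }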


2015-02-20 19:18 GMT+08:00 xeonmailinglist xeonmailingl...@gmail.com:

 Hi,

 I would like to submit a MapReduce job to a remote YARN cluster. Can I do
 this in Java, or using a REST API?

 Thanks,



Re: suspend and resume a job in execution?

2015-02-20 Thread Ravi Prakash
I am not aware of an API that would let you do this. You may be able to move an 
application to a queue with 0 resources to achieve the desired behavior but I'm 
not entirely sure.

 On Wednesday, February 18, 2015 9:24 AM, xeonmailinglist 
xeonmailingl...@gmail.com wrote:

 By job, I mean a MapReduce job. I would like to suspend and resume the 
MapReduce job while it is executing.


On 18-02-2015 12:10, xeonmailinglist wrote:
 Hi,

 I want to suspend a running job once all map tasks finish, and then 
 resume the job later.

 Can I do this in YARN? Is there an API for that, or must I use the 
 command line?

 Thanks,
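
If someone wants to experiment with the queue-move idea mentioned at the top
of this thread, here is a hedged sketch, assuming a Hadoop version whose
YarnClient exposes moveApplicationAcrossQueues (check your release before
relying on it):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.yarn.client.api.YarnClient;
    import org.apache.hadoop.yarn.util.ConverterUtils;

    public class MoveAppToQueue {  // illustrative only
      public static void main(String[] args) throws Exception {
        // args[0] = application id, e.g. application_1424264025191_0002
        // args[1] = target queue (e.g. a queue configured with very small capacity)
        YarnClient yarn = YarnClient.createYarnClient();
        yarn.init(new Configuration());
        yarn.start();
        yarn.moveApplicationAcrossQueues(ConverterUtils.toApplicationId(args[0]), args[1]);
        yarn.stop();
      }
    }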