Re: Using Slider as a default mechanism for HBase on HDP 2.2

2015-02-22 Thread Ulul

Hi

I was wondering the same thing. The following post suggests that HBase+Slider 
integration in Ambari should be delivered with HDP 2.3 or even 2.2.1:

http://hortonworks.com/blog/discover-hdp-2-2-apache-hbase-yarn-slider-fast-nosql-access/
It also contains a link to a doc on how to install HBase over Slider on 
HDP 2.2 
(docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.2.0/YARN_RM_v22/running_applications_on_slider/index.html#Item1.1.7)


What I couldn't find is how to migrate an existing standalone HBase 
deployment to a Slider-based one. If you try that and have feedback, 
don't hesitate to share.


Ulul

On 22/02/2015 16:33, Krishna Kishore Bonagiri wrote:


Hi,

We just installed HDP 2.2 through Ambari. We were under the impression 
that in HDP 2.2, the default deployment mechanism for HBase/Accumulo 
is through Slider (i.e., that they are enabled by default for YARN). 
However, that does not seem to be the case. Can we choose to install 
HBase through Slider during HDP installation through Ambari? That is, 
is there a customization option that we are missing?


If Slider is not the default mechanism for HBase on HDP 2.2, why not?

Thanks,

Kishore





unscubscribe

2015-02-22 Thread Umesh Reddy
unsubscribe


Default Block Size in HDFS

2015-02-22 Thread Krish Donald
Hi,

I have read somewhere that the default block size in Hadoop 2.4 is 256 MB.
Is that correct?
In which version was the default block size 128 MB?

Thanks
Krish


unsubscribe

2015-02-22 Thread Mainak Bandyopadhyay
unsubscribe.


Re: Default Block Size in HDFS

2015-02-22 Thread Ulul

Sorry, I forgot that 'as of' means 'starting with' :-)

Actually, 128 MB became the default around 2.2.0;
2.0.5-alpha was still 64 MB.

In any case it's just a default, and it is often raised in production.
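
For what it's worth, a quick way to see the default your own client
configuration resolves to is the FileSystem API. A minimal sketch, assuming a
Hadoop 2.x client with the cluster's *-site.xml files on the classpath (not
from this thread, just an illustration):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Minimal sketch: print the effective default block size (dfs.blocksize)
// as seen by this client's configuration.
public class DefaultBlockSize {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();   // picks up core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);
        long blockSize = fs.getDefaultBlockSize(new Path("/"));
        System.out.println("Default block size: " + blockSize + " bytes");
    }
}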

On 22/02/2015 20:51, Ted Yu wrote:

As of Hadoop 2.6, default blocksize is 128 MB (look for dfs.blocksize)
https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml

Cheers

On Sun, Feb 22, 2015 at 11:11 AM, Krish Donald gotomyp...@gmail.com wrote:


Hi,

I have read somewhere that the default block size in Hadoop 2.4 is 256 MB.
Is that correct?
In which version was the default block size 128 MB?

Thanks
Krish






Re: Default Block Size in HDFS

2015-02-22 Thread Ted Yu
As of Hadoop 2.6, default blocksize is 128 MB (look for dfs.blocksize)
https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml

Cheers

On Sun, Feb 22, 2015 at 11:11 AM, Krish Donald gotomyp...@gmail.com wrote:

 Hi,

 I have read somewhere that the default block size in Hadoop 2.4 is 256 MB.
 Is that correct?
 In which version was the default block size 128 MB?

 Thanks
 Krish



java.net.UnknownHostException on one node only

2015-02-22 Thread tesm...@gmail.com
I am getting a java.net.UnknownHostException continuously on one node during
Hadoop MapReduce execution.

That node is accessible via SSH, and it shows up in the yarn node -list
and hdfs dfsadmin -report output.

Below is the log from the execution:

15/02/22 20:17:42 INFO mapreduce.Job: Task Id :
attempt_1424622614381_0008_m_43_0, Status : FAILED
Container launch failed for container_1424622614381_0008_01_16 :
java.lang.IllegalArgumentException: *java.net.UnknownHostException:
101-master10*
at
org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:373)
at
org.apache.hadoop.security.SecurityUtil.setTokenService(SecurityUtil.java:352)
at
org.apache.hadoop.yarn.util.ConverterUtils.convertFromYarn(ConverterUtils.java:237)
at
org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.newProxy(ContainerManagementProtocolProxy.java:218)
at
org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.init(ContainerManagementProtocolProxy.java:196)
at
org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.getProxy(ContainerManagementProtocolProxy.java:117)
at
org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.getCMProxy(ContainerLauncherImpl.java:403)
at
org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:138)
at
org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:369)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
*Caused by: java.net.UnknownHostException: 101-master10*
... 12 more



15/02/22 20:17:44 INFO

Regards,
Tariq


Re: java.net.UnknownHostException on one node only

2015-02-22 Thread Alexander Alten-Lorenz
Important is, that the canonical name must be the FQDN of the server, see the 
example.

1.1.1.1 one.one.org one namenode
1.1.1.2 two.one.og two datanode

If DNS is used the system’s hostname must match the FQDN in forward as well as 
reverse name resolution.
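
To double-check that resolution from the failing node itself, here is a
minimal sketch (mine, not from the thread) that performs the forward and
reverse lookups; the hostname argument is assumed to be whatever appears in
the exception, e.g. 101-master10:

import java.net.InetAddress;
import java.net.UnknownHostException;

// Minimal sketch: verify that a hostname resolves forward (name -> IP) and
// reverse (IP -> name) from this node, the kind of lookup that fails above.
public class HostResolutionCheck {
    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "101-master10"; // hostname from the exception
        try {
            InetAddress addr = InetAddress.getByName(host);                   // forward lookup
            System.out.println(host + " -> " + addr.getHostAddress());
            System.out.println("reverse -> " + addr.getCanonicalHostName());  // reverse lookup
        } catch (UnknownHostException e) {
            System.err.println("Cannot resolve " + host + ": check /etc/hosts or DNS on this node");
        }
    }
}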

BR,
 Alex


 On 22 Feb 2015, at 21:51, tesm...@gmail.com wrote:
 
 I am getting a java.net.UnknownHostException continuously on one node during 
 Hadoop MapReduce execution.
 
 That node is accessible via SSH, and it shows up in the yarn node -list and 
 hdfs dfsadmin -report output.
 
 Below is the log from the execution:
 
 15/02/22 20:17:42 INFO mapreduce.Job: Task Id : 
 attempt_1424622614381_0008_m_43_0, Status : FAILED
 Container launch failed for container_1424622614381_0008_01_16 : 
 java.lang.IllegalArgumentException: java.net.UnknownHostException: 
 101-master10
   at 
 org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:373)
   at 
 org.apache.hadoop.security.SecurityUtil.setTokenService(SecurityUtil.java:352)
   at 
 org.apache.hadoop.yarn.util.ConverterUtils.convertFromYarn(ConverterUtils.java:237)
   at 
 org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.newProxy(ContainerManagementProtocolProxy.java:218)
   at 
 org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.init(ContainerManagementProtocolProxy.java:196)
   at 
 org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.getProxy(ContainerManagementProtocolProxy.java:117)
   at 
 org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.getCMProxy(ContainerLauncherImpl.java:403)
   at 
 org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:138)
   at 
 org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:369)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
 Caused by: java.net.UnknownHostException: 101-master10
   ... 12 more
 
 
 
 15/02/22 20:17:44 INFO
 
 Regards,
 Tariq



Re: recombining split files after data is processed

2015-02-22 Thread Jonathan Aquilina
 

Thanks Alex. Where would that command be placed: in a mapper, in a reducer,
or run as a standalone command? Here at work we are looking to use Amazon EMR
to do our number crunching, and we have access to the master node but not
really to the rest of the cluster. Can this be added as a step to be run
after the initial processing? 

---
Regards,
Jonathan Aquilina
Founder Eagle Eye T

On 2015-02-23 08:05, Alexander Alten-Lorenz wrote: 

 Hi, 
 
 You can use a single reducer 
 (http://wiki.apache.org/hadoop/HowManyMapsAndReduces [1]) for smaller 
 datasets, or 'getmerge': hadoop dfs -getmerge /hdfs/path local_file_name 
 
 BR, 
 Alex 
 
 On 23 Feb 2015, at 08:00, Jonathan Aquilina jaquil...@eagleeyet.net wrote: 
 
 Hey all, 
 
 I understand that the purpose of splitting files is to distribute the data 
 to multiple core and task nodes in a cluster. My question is: after the 
 output is complete, is there a way to combine all the parts into a 
 single file? 
 
 -- 
 Regards,
 Jonathan Aquilina
 Founder Eagle Eye T
 

Links:
--
[1] http://wiki.apache.org/hadoop/HowManyMapsAndReduces
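
As an aside to the getmerge suggestion quoted above, the same merge can also
be done programmatically from the master node. A minimal sketch, assuming a
Hadoop 2.x client where FileUtil.copyMerge is available; the paths here are
hypothetical:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

// Minimal sketch: concatenate the part-* files of a job's output directory
// into a single file, similar to 'hadoop fs -getmerge' but with the result
// kept on the same filesystem.
public class MergeJobOutput {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path srcDir = new Path("/user/hadoop/job-output");              // hypothetical output dir
        Path dstFile = new Path("/user/hadoop/job-output-merged.txt");  // hypothetical target file
        // copyMerge(srcFS, srcDir, dstFS, dstFile, deleteSource, conf, addString)
        FileUtil.copyMerge(fs, srcDir, fs, dstFile, false, conf, null);
        System.out.println("Merged into " + dstFile);
    }
}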


Re: unsubscribe

2015-02-22 Thread Ram Kumar
Check http://hadoop.apache.org/mailing_lists.html#User

Regards,
Ramkumar Bashyam

On Sun, Feb 22, 2015 at 1:48 PM, Mainak Bandyopadhyay 
mainak.bandyopadh...@gmail.com wrote:

 unsubscribe.





Re: unscubscribe

2015-02-22 Thread Ram Kumar
Check http://hadoop.apache.org/mailing_lists.html#User

Regards,
Ramkumar Bashyam

On Mon, Feb 23, 2015 at 12:29 AM, Umesh Reddy ur2...@yahoo.com wrote:

 unsubscribe



RE: Hadoop - HTTPS communication between nodes - How to Confirm ?

2015-02-22 Thread hadoop.support
Thanks for the help Ulul/Daemeon,

 

The problem has been sorted out using the lsof -p <datanode PID> command. 

 

Thanks,

S.RagavendraGanesh

ViSolve Hadoop Support Team
ViSolve Inc. | San Jose, California
Website: www.visolve.com

Email: servi...@visolve.com | Phone: 408-850-2243

 

From: Ulul [mailto:had...@ulul.org] 
Sent: Sunday, February 22, 2015 5:23 AM
To: user@hadoop.apache.org
Subject: Re: Hadoop - HTTPS communication between nodes - How to Confirm ?

 

Hi

Be careful: HTTPS secures WebHDFS. If you want to protect all network
streams you need more than that:

https://s3.amazonaws.com/dev.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.2/bk_reference/content/reference_chap-wire-encryption.html

If you're just interested in HTTPS, an 'lsof -p <datanode PID> | grep TCP' will
show you the DN listening on 50075 for HTTP and 50475 for HTTPS. For the namenode
that would be 50070 and 50470.
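
Another way to confirm, if you can reach the datanode from a client machine,
is to attempt a TLS handshake against port 50475. A minimal sketch (mine, with
an assumed hostname); note that even a certificate-validation failure during
the handshake still tells you the port is speaking TLS, whereas a plain-HTTP
port will fail with a protocol error:

import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

// Minimal sketch: open a TLS connection to the datanode's HTTPS port and
// perform the handshake to confirm the port is actually serving HTTPS.
public class HttpsPortCheck {
    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "datanode.example.com"; // assumed hostname
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 50475;   // default DN HTTPS port
        SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();
        try (SSLSocket socket = (SSLSocket) factory.createSocket(host, port)) {
            socket.startHandshake();
            System.out.println("TLS OK: " + socket.getSession().getProtocol());
        }
    }
}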

Ulul
 

On 21/02/2015 19:53, hadoop.supp...@visolve.com wrote:

Hello Everyone,

 

We are trying to compare performance between the HTTP and HTTPS versions of
Hadoop DFS, MapReduce, and other related modules.

 

As of now, we have tested several metrics in Hadoop's HTTP mode. Similarly,
we are trying to test the same metrics on the HTTPS platform. Our test
cluster consists of one master node and two slave nodes.

We have configured the HTTPS connection, and now we need to verify whether the
nodes are actually communicating over HTTPS. We tried checking the logs, the
cluster's WebHDFS UI, the health-check information, and the dfsadmin report,
but to no avail. Since there is only limited documentation available on HTTPS,
we are unable to verify whether the nodes are communicating over HTTPS.

 

Hence, can any experts around here shed some light on how to confirm the HTTPS
communication status between nodes (perhaps via MapReduce/DFS)?

 

Note: If I have missed any information, feel free to check with me for the
same. 

 

Thanks,

S.RagavendraGanesh

ViSolve Hadoop Support Team
ViSolve Inc. | San Jose, California
Website: www.visolve.com

email: servi...@visolve.com | Phone: 408-850-2243

Re: writing mappers and reducers question

2015-02-22 Thread Drake민영근
I suggest Standalone mode for developing a mapper or reducer. But in the case of
a partitioner or combiner, you need to set up Pseudo-Distributed mode.

Drake 민영근 Ph.D
kt NexR

On Fri, Feb 20, 2015 at 3:18 PM, unmesha sreeveni unmeshab...@gmail.com
wrote:

 You can also write MapReduce jobs in Eclipse for testing purposes. Once it is
 done you can create a jar and run it on your single-node or multi-node cluster.
 But please note that when running in such IDEs with the Hadoop dependencies,
 there will be no input splits, different mappers, etc.
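
To make the standalone suggestion concrete, here is a minimal local-mode
driver sketch (mine, with the identity mapper/reducer and hypothetical
input/output paths); it runs the whole job in one JVM against the local
filesystem, so breakpoints in mapper and reducer code work from the IDE:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Minimal sketch of a standalone (local-mode) driver for developing
// mappers and reducers without a running cluster.
public class LocalModeDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("mapreduce.framework.name", "local"); // LocalJobRunner, no YARN
        conf.set("fs.defaultFS", "file:///");          // local filesystem, no HDFS
        Job job = Job.getInstance(conf, "local-dev-test");
        job.setMapperClass(Mapper.class);              // identity mapper (replace with yours)
        job.setReducerClass(Reducer.class);            // identity reducer (replace with yours)
        job.setOutputKeyClass(LongWritable.class);     // TextInputFormat key type
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path("input"));    // hypothetical local input dir
        FileOutputFormat.setOutputPath(job, new Path("output")); // must not exist yet
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}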





Re: Get method in Writable

2015-02-22 Thread unmesha sreeveni
Thanks Drake.
That was the point. It was my mistake.

On Mon, Feb 23, 2015 at 6:34 AM, Drake민영근 drake@nexr.com wrote:

 Hi, unmesha.

 I think this is a Gson problem. You mentioned this:

  But parsing canot be done in *MR2*.
 * TreeInfoWritable info = gson.toJson(setupData,
 TreeInfoWritable.class);*

 I think you should just use gson.fromJson, not toJson (setupData is already a
 JSON string, I think).

 Is this right?

 Drake 민영근 Ph.D
 kt NexR

 On Sat, Feb 21, 2015 at 4:55 PM, unmesha sreeveni unmeshab...@gmail.com
 wrote:

 Am I able to get the values from a Writable of a previous job?
 i.e., I have 2 MR jobs:
 *MR 1:*
  I need to pass 3 elements as values from the reducer, and the key is
 NullWritable. So I created a custom Writable class to achieve this.
 * public class TreeInfoWritable implements Writable{*

 * DoubleWritable entropy;*
 * IntWritable sum;*
 * IntWritable clsCount;*
 * ..*
 *}*
 *MR 2:*
  I need to access the MR 1 result in the MR 2 mapper's setup function, and I
 accessed it as a distributed cache (small file).
  Is there a way to get those values using the get methods?
  *while ((setupData = bf.readLine()) != null) {*
 * System.out.println(Setup Line +setupData);*
 * TreeInfoWritable info = //something i can pass to TreeInfoWritable and
 get values*
 * DoubleWritable entropy = info.getEntropy();*
 * System.out.println(entropy: +entropy);*
 *}*

  Tried to convert the Writable to JSON using Gson:
 *MR 1*
 *Gson gson = new Gson();*
 *String emitVal = gson.toJson(valEmit);*
 *context.write(out, new Text(emitVal));*

  But parsing cannot be done in *MR2*.
 *TreeInfoWritable info = gson.toJson(setupData, TreeInfoWritable.class);*

 *Error: Type mismatch: cannot convert from String to TreeInfoWritable*
  Once it is changed to a string we cannot get the values.

  Is there a workaround for this, or should I just use POJO classes
  instead of Writable? I'm afraid that would become slower, as we would be
  depending on Java serialization instead of Hadoop's serializable classes.








-- 
*Thanks & Regards *


*Unmesha Sreeveni U.B*
*Hadoop, Bigdata Developer*
*Centre for Cyber Security | Amrita Vishwa Vidyapeetham*
http://www.unmeshasreeveni.blogspot.in/
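
To illustrate Drake's fromJson suggestion above, a minimal sketch (mine),
using a plain POJO whose field names are assumed to mirror TreeInfoWritable;
the MR1 side would need to emit JSON of the same shape:

import com.google.gson.Gson;

// Minimal sketch of the MR2 setup() parsing step: each line read from the
// distributed-cache file is a JSON string, so it is parsed with fromJson.
public class SetupParseSketch {
    // Plain POJO standing in for TreeInfoWritable; field names are assumptions.
    static class TreeInfo {
        double entropy;
        int sum;
        int clsCount;
    }

    public static void main(String[] args) {
        Gson gson = new Gson();
        String setupData = "{\"entropy\":0.94,\"sum\":14,\"clsCount\":2}"; // example cached line
        TreeInfo info = gson.fromJson(setupData, TreeInfo.class);          // fromJson, not toJson
        System.out.println("entropy: " + info.entropy);
    }
}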


Re: java.net.UnknownHostException on one node only

2015-02-22 Thread Varun Kumar
Hi Tariq,

This looks like a DNS configuration issue.


On Sun, Feb 22, 2015 at 3:51 PM, tesm...@gmail.com wrote:

 I am getting a java.net.UnknownHostException continuously on one node
 during Hadoop MapReduce execution.

 That node is accessible via SSH, and it shows up in the yarn node -list
 and hdfs dfsadmin -report output.

 Below is the log from the execution:

 15/02/22 20:17:42 INFO mapreduce.Job: Task Id :
 attempt_1424622614381_0008_m_43_0, Status : FAILED
 Container launch failed for container_1424622614381_0008_01_16 :
 java.lang.IllegalArgumentException: *java.net.UnknownHostException:
 101-master10*
 at
 org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:373)
 at
 org.apache.hadoop.security.SecurityUtil.setTokenService(SecurityUtil.java:352)
 at
 org.apache.hadoop.yarn.util.ConverterUtils.convertFromYarn(ConverterUtils.java:237)
 at
 org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.newProxy(ContainerManagementProtocolProxy.java:218)
 at
 org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.init(ContainerManagementProtocolProxy.java:196)
 at
 org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.getProxy(ContainerManagementProtocolProxy.java:117)
 at
 org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.getCMProxy(ContainerLauncherImpl.java:403)
 at
 org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:138)
 at
 org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:369)
 at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
 *Caused by: java.net.UnknownHostException: 101-master10*
 ... 12 more



 15/02/22 20:17:44 INFO

 Regards,
 Tariq




-- 
Regards,
Varun Kumar.P