Re: Using Slider as a default mechanism for HBase on HDP 2.2
Hi, I was wondering the same. The following post suggests that HBase+Slider integration in Ambari should be delivered with HDP 2.3 or even 2.2.1: http://hortonworks.com/blog/discover-hdp-2-2-apache-hbase-yarn-slider-fast-nosql-access/ It also contains a link to a doc on how to install HBase over Slider on HDP 2.2 (docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.2.0/YARN_RM_v22/running_applications_on_slider/index.html#Item1.1.7). What I couldn't find is how to migrate an existing standalone HBase deployment to a Slider-based one. If you try that and have feedback, don't hesitate to share.

Ulul

On 22/02/2015 16:33, Krishna Kishore Bonagiri wrote:
Hi, We just installed HDP 2.2 through Ambari. We were under the impression that in HDP 2.2 the default deployment mechanism for HBase/Accumulo is through Slider (i.e., that they are enabled for YARN by default). However, that does not seem to be the case. Can we choose to install HBase through Slider during HDP installation through Ambari? I.e., is there a customization option that we are missing? If Slider is not the default mechanism for HBase on HDP 2.2, why not?
Thanks, Kishore
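For reference, the HDP doc linked above drives the deployment through the Slider CLI rather than Ambari. A rough sketch of what that flow looks like, assuming the HBase Slider app package is already installed and the appConfig/resources JSON files have been prepared as the doc describes (the instance name, file names and component name below are only placeholders, check the doc for your exact version):

# Create an HBase application instance from the template and resource specs
slider create hbase1 --template appConfig-default.json --resources resources-default.json

# Check that the application master and containers came up
slider status hbase1

# Flex the number of region servers up or down without reinstalling
slider flex hbase1 --component HBASE_REGIONSERVER 4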
unscubscribe
unsubscribe
Default Block Size in HDFS
Hi, I have read somewhere that the default block size in Hadoop 2.4 is 256 MB. Is that correct? In which version did the default block size become 128 MB? Thanks, Krish
unsubscribe
unsubscribe.
Re: Default Block Size in HDFS
Sorry, I forgot that "as of" means "starting with" :-) Actually the 128 MB default started around 2.2.0; 2.0.5-alpha was still 64 MB. In any case it's just a default, and it is often raised in production.

On 22/02/2015 20:51, Ted Yu wrote:
As of Hadoop 2.6, the default block size is 128 MB (look for dfs.blocksize): https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
Cheers

On Sun, Feb 22, 2015 at 11:11 AM, Krish Donald gotomyp...@gmail.com wrote:
Hi, I have read somewhere that the default block size in Hadoop 2.4 is 256 MB. Is that correct? In which version did the default block size become 128 MB? Thanks, Krish
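To check what a given cluster is actually using, and to override the default for a single upload, something like the following should work (the 256 MB value and paths are just examples):

# Show the effective block size from the client configuration
hdfs getconf -confKey dfs.blocksize

# Override the block size for one upload (value in bytes, 256 MB here)
hadoop fs -D dfs.blocksize=268435456 -put bigfile.dat /data/

# Confirm the block size actually used for that file
hdfs fsck /data/bigfile.dat -files -blocks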
Re: Default Block Size in HDFS
As of Hadoop 2.6, the default block size is 128 MB (look for dfs.blocksize): https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
Cheers

On Sun, Feb 22, 2015 at 11:11 AM, Krish Donald gotomyp...@gmail.com wrote:
Hi, I have read somewhere that the default block size in Hadoop 2.4 is 256 MB. Is that correct? In which version did the default block size become 128 MB? Thanks, Krish
java.net.UnknownHostException on one node only
I am getting a java.net.UnknownHostException continuously on one node during Hadoop MapReduce execution. That node is accessible via SSH, and it shows up in the output of yarn node -list and hdfs dfsadmin -report. Below is the log from the execution:

15/02/22 20:17:42 INFO mapreduce.Job: Task Id : attempt_1424622614381_0008_m_43_0, Status : FAILED
Container launch failed for container_1424622614381_0008_01_16 : java.lang.IllegalArgumentException: java.net.UnknownHostException: 101-master10
at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:373)
at org.apache.hadoop.security.SecurityUtil.setTokenService(SecurityUtil.java:352)
at org.apache.hadoop.yarn.util.ConverterUtils.convertFromYarn(ConverterUtils.java:237)
at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.newProxy(ContainerManagementProtocolProxy.java:218)
at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.init(ContainerManagementProtocolProxy.java:196)
at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.getProxy(ContainerManagementProtocolProxy.java:117)
at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.getCMProxy(ContainerLauncherImpl.java:403)
at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:138)
at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:369)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.UnknownHostException: 101-master10
... 12 more
15/02/22 20:17:44 INFO

Regards, Tariq
Re: java.net.UnknownHostException on one node only
The important thing is that the canonical name must be the FQDN of the server, see the example:

1.1.1.1 one.one.org one namenode
1.1.1.2 two.one.org two datanode

If DNS is used, the system's hostname must match the FQDN in forward as well as reverse name resolution.

BR, Alex

On 22 Feb 2015, at 21:51, tesm...@gmail.com wrote:
I am getting a java.net.UnknownHostException continuously on one node during Hadoop MapReduce execution. That node is accessible via SSH, and it shows up in the output of yarn node -list and hdfs dfsadmin -report. Below is the log from the execution:
15/02/22 20:17:42 INFO mapreduce.Job: Task Id : attempt_1424622614381_0008_m_43_0, Status : FAILED
Container launch failed for container_1424622614381_0008_01_16 : java.lang.IllegalArgumentException: java.net.UnknownHostException: 101-master10
at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:373)
at org.apache.hadoop.security.SecurityUtil.setTokenService(SecurityUtil.java:352)
at org.apache.hadoop.yarn.util.ConverterUtils.convertFromYarn(ConverterUtils.java:237)
at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.newProxy(ContainerManagementProtocolProxy.java:218)
at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.init(ContainerManagementProtocolProxy.java:196)
at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.getProxy(ContainerManagementProtocolProxy.java:117)
at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.getCMProxy(ContainerLauncherImpl.java:403)
at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:138)
at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:369)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.UnknownHostException: 101-master10
... 12 more
15/02/22 20:17:44 INFO
Regards, Tariq
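A quick way to check whether the failing hostname resolves consistently from every node (101-master10 is the name from the log above; the IP address and FQDN below are placeholders for your own cluster):

# Does the name resolve at all on this node (checks /etc/hosts and DNS)?
getent hosts 101-master10

# Forward lookup: the FQDN the node reports for itself
hostname -f

# Reverse lookup of the node's own address should give back the same FQDN
host 10.0.0.10

# If you rely on /etc/hosts, every node needs a consistent entry like:
# 10.0.0.10   101-master10.example.com   101-master10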
Re: recombining split files after data is processed
Thanks Alex. Where would that command be placed: in a mapper, in a reducer, or run as a command? Here at work we are looking to use Amazon EMR to do our number crunching, and we have access to the master node but not really the rest of the cluster. Can this be added as a step to be run after the initial processing?

---
Regards, Jonathan Aquilina
Founder Eagle Eye T

On 2015-02-23 08:05, Alexander Alten-Lorenz wrote:
Hi, You can use a single reducer (http://wiki.apache.org/hadoop/HowManyMapsAndReduces [1]) for smaller datasets, or 'getmerge':
hadoop dfs -getmerge /hdfs/path local_file_name
BR, Alex

On 23 Feb 2015, at 08:00, Jonathan Aquilina jaquil...@eagleeyet.net wrote:
Hey all, I understand that the purpose of splitting files is to distribute the data to multiple core and task nodes in a cluster. My question is: after the output is complete, is there a way one can combine all the parts into a single file?
--
Regards, Jonathan Aquilina
Founder Eagle Eye T

Links:
--
[1] http://wiki.apache.org/hadoop/HowManyMapsAndReduces
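To be clear, getmerge is a client-side command, so it does not go inside a mapper or reducer; you run it after the job finishes, from any machine with an HDFS client, which on EMR includes the master node. A rough sketch, with paths and bucket names as placeholders:

# Merge all part-* files of the job output into one local file on the master node
hadoop fs -getmerge /user/hadoop/job-output merged_output.txt

# Optionally push the merged file somewhere durable, e.g. S3 on EMR
aws s3 cp merged_output.txt s3://my-bucket/results/merged_output.txt

On EMR this could also be wrapped in a small shell script and submitted as a custom step after the processing step, though the exact step configuration depends on how the cluster is launched.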
Re: unsubscribe
Check http://hadoop.apache.org/mailing_lists.html#User Regards, Ramkumar Bashyam On Sun, Feb 22, 2015 at 1:48 PM, Mainak Bandyopadhyay mainak.bandyopadh...@gmail.com wrote: unsubscribe.
Re: unscubscribe
Check http://hadoop.apache.org/mailing_lists.html#User Regards, Ramkumar Bashyam On Mon, Feb 23, 2015 at 12:29 AM, Umesh Reddy ur2...@yahoo.com wrote: unsubscribe
RE: Hadoop - HTTPS communication between nodes - How to Confirm ?
Thanks for the help Ulul/Daemeon. The problem has been sorted out using the lsof -p <datanode pid> command.

Thanks,
S.RagavendraGanesh
ViSolve Hadoop Support Team
ViSolve Inc. | San Jose, California
Website: www.visolve.com
Email: servi...@visolve.com | Phone: 408-850-2243

From: Ulul [mailto:had...@ulul.org]
Sent: Sunday, February 22, 2015 5:23 AM
To: user@hadoop.apache.org
Subject: Re: Hadoop - HTTPS communication between nodes - How to Confirm ?

Hi,
Be careful, HTTPS only secures WebHDFS. If you want to protect all network streams you need more than that: https://s3.amazonaws.com/dev.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.2/bk_reference/content/reference_chap-wire-encryption.html
If you're just interested in HTTPS, an lsof -p <datanode pid> | grep TCP will show you that the DN is listening on 50075 for HTTP and 50475 for HTTPS. For the namenode that would be 50070 and 50470.
Ulul

On 21/02/2015 19:53, hadoop.supp...@visolve.com wrote:
Hello Everyone,
We are trying to measure performance between the HTTP and HTTPS versions of Hadoop DFS, MapReduce and other related modules. As of now, we have tested several metrics in Hadoop HTTP mode. Similarly, we are trying to test the same metrics on the HTTPS platform. Basically our test suite cluster consists of one master node and two slave nodes. We have configured the HTTPS connection and now we need to verify whether the nodes are communicating directly through HTTPS. We tried checking logs, the cluster's WebHDFS UI, health check information and the dfsadmin report, but to no avail. Since there is only limited documentation available for HTTPS, we are unable to verify whether the nodes are communicating through HTTPS. Hence, can any experts around here shed some light on how to confirm the HTTPS communication status between nodes (maybe with MapReduce/DFS)?
Note: If I have missed any information, feel free to check with me for the same.
Thanks,
S.RagavendraGanesh
ViSolve Hadoop Support Team
ViSolve Inc. | San Jose, California
Website: www.visolve.com
Email: servi...@visolve.com | Phone: 408-850-2243
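For anyone hitting the same question later, a couple of quick checks from a cluster node (the process lookup and port numbers assume the default HTTP/HTTPS addresses; adjust them if you have overridden dfs.datanode.https.address or dfs.namenode.https-address, and the hostname below is a placeholder):

# Is the datanode process listening on the HTTPS port?
lsof -p $(pgrep -f org.apache.hadoop.hdfs.server.datanode.DataNode) | grep LISTEN

# Or simply check the default HTTPS ports directly
netstat -tlnp | grep -E '50475|50470'

# Confirm TLS is actually being spoken on the namenode HTTPS port
openssl s_client -connect namenode.example.com:50470 < /dev/null | head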
Re: writing mappers and reducers question
I suggest Standalone mode for developing a mapper or reducer. But in the case of a partitioner or combiner, you need to set up Pseudo-Distributed mode.

Drake 민영근 Ph.D
kt NexR

On Fri, Feb 20, 2015 at 3:18 PM, unmesha sreeveni unmeshab...@gmail.com wrote:
You can also write MapReduce jobs in Eclipse for testing purposes. Once that is done you can create a jar and run it on your single-node or multi-node cluster. But please note that when running in such IDEs using the Hadoop dependencies, there will not be input splits, different mappers, etc.
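As a side note on the standalone case, one way to run a job entirely with the local runner, without changing the cluster configuration, might be to pass the framework and filesystem properties on the command line (the jar name, driver class and paths below are placeholders, and this assumes the driver goes through ToolRunner/GenericOptionsParser so that -D options are picked up):

# Run the job in-process with the local job runner and the local filesystem
hadoop jar myjob.jar com.example.MyDriver \
  -D mapreduce.framework.name=local \
  -D fs.defaultFS=file:/// \
  input_dir output_dir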
Re: Get method in Writable
Thanks Drake. That was the point. It was my mistake.

On Mon, Feb 23, 2015 at 6:34 AM, Drake 민영근 drake@nexr.com wrote:
Hi, unmesha. I think this is a Gson problem. You mentioned:

But parsing cannot be done in MR2.
TreeInfoWritable info = gson.toJson(setupData, TreeInfoWritable.class);

I think you should just use gson.fromJson, not toJson (setupData is already a JSON string, I think). Is this right?

Drake 민영근 Ph.D
kt NexR

On Sat, Feb 21, 2015 at 4:55 PM, unmesha sreeveni unmeshab...@gmail.com wrote:
Am I able to get the values from a Writable of a previous job? I.e., I have 2 MR jobs.

MR 1: I need to pass 3 elements as values from the reducer, and the key is NullWritable. So I created a custom Writable class to achieve this.

public class TreeInfoWritable implements Writable {
    DoubleWritable entropy;
    IntWritable sum;
    IntWritable clsCount;
    ...
}

MR 2: I need to access the MR 1 result in the MR 2 mapper setup function, and I accessed it as a distributed cache (small file). Is there a way to get those values using get methods?

while ((setupData = bf.readLine()) != null) {
    System.out.println("Setup Line " + setupData);
    TreeInfoWritable info = // something I can pass to TreeInfoWritable and get values
    DoubleWritable entropy = info.getEntropy();
    System.out.println("entropy: " + entropy);
}

I tried converting the Writable to Gson format.

MR 1:
Gson gson = new Gson();
String emitVal = gson.toJson(valEmit);
context.write(out, new Text(emitVal));

But parsing cannot be done in MR2:
TreeInfoWritable info = gson.toJson(setupData, TreeInfoWritable.class);
Error: Type mismatch: cannot convert from String to TreeInfoWritable

Once it is changed to a string we cannot get the values. Is there a workaround for this, or should I just use plain POJO classes instead of Writables? I'm afraid that becomes slower, as we would be depending on Java instead of Hadoop's serializable classes.

--
Thanks & Regards
Unmesha Sreeveni U.B
Hadoop, Bigdata Developer
Centre for Cyber Security | Amrita Vishwa Vidyapeetham
http://www.unmeshasreeveni.blogspot.in/
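In case it helps anyone searching the archive later, here is a minimal sketch of the deserialization side Drake described, assuming the MR 1 reducer wrote one gson.toJson(...) line per record and that TreeInfoWritable has the getter used below (the class and field names are taken from the thread; the cache file name and everything else are illustrative):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import org.apache.hadoop.io.DoubleWritable;
import com.google.gson.Gson;

// Inside the MR 2 mapper: read the cached file written by MR 1 and turn
// each JSON line back into a TreeInfoWritable with fromJson (not toJson).
@Override
protected void setup(Context context) throws IOException, InterruptedException {
    Gson gson = new Gson();
    try (BufferedReader bf = new BufferedReader(new FileReader("mr1_output.json"))) {
        String setupData;
        while ((setupData = bf.readLine()) != null) {
            TreeInfoWritable info = gson.fromJson(setupData, TreeInfoWritable.class);
            DoubleWritable entropy = info.getEntropy();
            System.out.println("entropy: " + entropy);
        }
    }
}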
Re: java.net.UnknownHostException on one node only
Hi Tariq,
This looks like a DNS configuration issue.

On Sun, Feb 22, 2015 at 3:51 PM, tesm...@gmail.com wrote:
I am getting a java.net.UnknownHostException continuously on one node during Hadoop MapReduce execution. That node is accessible via SSH, and it shows up in the output of yarn node -list and hdfs dfsadmin -report. Below is the log from the execution:
15/02/22 20:17:42 INFO mapreduce.Job: Task Id : attempt_1424622614381_0008_m_43_0, Status : FAILED
Container launch failed for container_1424622614381_0008_01_16 : java.lang.IllegalArgumentException: java.net.UnknownHostException: 101-master10
at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:373)
at org.apache.hadoop.security.SecurityUtil.setTokenService(SecurityUtil.java:352)
at org.apache.hadoop.yarn.util.ConverterUtils.convertFromYarn(ConverterUtils.java:237)
at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.newProxy(ContainerManagementProtocolProxy.java:218)
at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.init(ContainerManagementProtocolProxy.java:196)
at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.getProxy(ContainerManagementProtocolProxy.java:117)
at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.getCMProxy(ContainerLauncherImpl.java:403)
at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:138)
at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:369)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.UnknownHostException: 101-master10
... 12 more
15/02/22 20:17:44 INFO
Regards, Tariq

--
Regards,
Varun Kumar.P