Re: Questions about Setting Up Encryption

2020-01-06 Thread Hariharan
For 1) you can set up transparent encryption at the root directory level for HDFS. However, this works at the file level, not the volume level; for volume-level encryption you will have to use something like LUKS. For 2), in addition to the steps mentioned in "data confidentiality", you may also ne
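The file-level route can be sketched with the standard `hadoop key` / `hdfs crypto` commands. This assumes a KMS is already configured (`hadoop.security.key.provider.path`); the key name and path below are made up for illustration:

```shell
# Create an encryption key in the KMS, then mark an empty directory as an encryption zone.
hadoop key create mykey
hdfs dfs -mkdir /secure
hdfs crypto -createZone -keyName mykey -path /secure

# Verify: files written under /secure are now encrypted transparently.
hdfs crypto -listZones
```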

Re: Create a block - file map

2020-01-01 Thread Amith sha
Enable DEBUG mode on org.apache.hadoop.hdfs.server.blockmanagement on the namenode. Thanks & Regards Amithsha On Wed, Jan 1, 2020 at 4:55 AM Arpit Agarwal wrote: > That is the only way to do it using the client API. > > Just curious why you need the mapping. > > > On Tue, Dec 31, 2019, 00:41 David
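The suggestion above amounts to a one-line log4j change on the NameNode; the file location varies by distribution, but `log4j.properties` under the Hadoop conf dir is typical:

```
# log4j.properties on the NameNode
log4j.logger.org.apache.hadoop.hdfs.server.blockmanagement=DEBUG
```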

Re: Create a block - file map

2019-12-31 Thread Arpit Agarwal
That is the only way to do it using the client API. Just curious why you need the mapping. On Tue, Dec 31, 2019, 00:41 Davide Vergari wrote: > Hi all, > I need to create a block map for all files in a specific directory (and > subdir) in HDFS. > > I'm using fs.listFiles API then I loop in the
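Outside the client API, a hedged alternative for a one-off dump is fsck, which prints block IDs and their datanode locations per file (the path is illustrative, and this can be heavy on a large namespace):

```shell
# File -> block -> datanode map for everything under /data
hdfs fsck /data -files -blocks -locations
```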

Re: Spark Dataset API for secondary sorting

2019-12-24 Thread Akira Ajisaka
Hi Daniel, This is the user mailing list for Apache Hadoop, not Apache Spark. Please use instead. https://spark.apache.org/community.html -Akira On Tue, Dec 3, 2019 at 1:00 AM Daniel Zhang wrote: > Hi, Spark Users: > > I have a question related to the way I use the spark Dataset API for my >

Re: hadoop java compatability

2019-12-24 Thread Akira Ajisaka
Hi Augustine, Java 11 is not supported even in the latest version of Apache Hadoop. I hope Apache Hadoop 3.3.0 will support Java 11 (runtime support only), but 3.3.0 is not released yet. (Our company (Yahoo! JAPAN) builds trunk with OpenJDK 8 and runs an HDFS dev cluster with OpenJDK 11 successfully.)

Re: How can we access multiple Kerberos-enabled Hadoop with different users in single JVM process

2019-12-24 Thread tobe
Regards From: tobe Sent: Tuesday, December 24, 2019 08:15 To: Vinod Kumar Vavilapalli Cc: user.hadoop Subject: Re: How can we access multiple Kerberos-enabled Hadoop with different users in single JVM process

RE: How can we access multiple Kerberos-enabled Hadoop with different users in single JVM process

2019-12-24 Thread Thibault VERBEQUE
). Regards From: tobe Sent: Tuesday, December 24, 2019 08:15 To: Vinod Kumar Vavilapalli Cc: user.hadoop Subject: Re: How can we access multiple Kerberos-enabled Hadoop with different users in single JVM process Thanks @Vinod and proxy-users was considered. But what we want to support is

Re: How can we access multiple Kerberos-enabled Hadoop with different users in single JVM process

2019-12-23 Thread tobe
Thanks @Vinod, proxy-users was considered, but what we want to support is accessing multiple secured Hadoop clusters. To initialize the Kerberos credentials we need to configure /etc/krb5.conf, and if we want to access two different Kerberos services (with separate KDCs), we can not run JVM pro

Re: How can we access multiple Kerberos-enabled Hadoop with different users in single JVM process

2019-12-23 Thread Vinod Kumar Vavilapalli
You are looking for the proxy-users pattern. See here: https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/Superusers.html Thanks +Vinod > On Dec 24, 2019, at 9:49 AM, tobe wrote: > > Currently Hadoop relies on Kerberos to do authentication and authorization. > For single

Re: Performance issue

2019-12-13 Thread HarshaKiran Reddy Boreddy
I used the same environment and the same configurations. Not using EC. On Friday, December 13, 2019, Julien Laurenceau < julien.laurenc...@pepitedata.com> wrote: > Hi > Could you please share more details ? > We can assume it is the same hardware and same OS / JVM. > What about the hadoop configuration ?

Re: Performance issue

2019-12-13 Thread Julien Laurenceau
Hi Could you please share more details ? We can assume it is the same hardware and same OS / JVM. What about the hadoop configuration ? Are you using Erasure Coding ? Regards Le jeu. 12 déc. 2019 à 14:12, HarshaKiran Reddy Boreddy a écrit : > Yes, Even in HDFS(3.1.1), I have seen the performan

Re: Performance issue

2019-12-12 Thread HarshaKiran Reddy Boreddy
Yes, even in HDFS (3.1.1) I have seen performance degradation compared with 2.7.2. I have done the DFSIO test; the details follow. Test: DATA SIZE: 1TB No of Files : 1000 File size : 1GB *Hadoop-3.1.1::* *DFSIO Write : 1420.99 sec* *Hadoop-2.7.2::* *DFSIO Write : 1297.971 sec* so,t
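For a sense of scale, the two write times reported above imply roughly a 9.5% slowdown; a quick sketch of the arithmetic (numbers taken straight from the message):

```shell
t31=1420.99        # Hadoop 3.1.1 DFSIO write, seconds
t27=1297.971       # Hadoop 2.7.2 DFSIO write, seconds
# Percentage increase in write time from 2.7.2 to 3.1.1
slowdown=$(awk -v a="$t31" -v b="$t27" 'BEGIN { printf "%.1f", (a - b) / b * 100 }')
echo "${slowdown}%"   # about 9.5% slower
```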

Re: How to set YARN_CONF_DIR environment variable on remote client?

2019-12-05 Thread Zhankun Tang
Hi Piper, Just setting HADOOP_CONF_DIR should work. Did you try that? BR, Zhankun On Fri, 6 Dec 2019 at 00:43, Piper Piper wrote: > Hello, > > I want to run Spark or Flink jobs from a client (remote desktop) onto a > YARN cluster. Another example will be if I am running a YARN cluster on > VMs, the
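A minimal sketch of the client-side setup, assuming the cluster's `*-site.xml` files have been copied to the remote machine (the paths and jar name are illustrative):

```shell
# Point the client at a local copy of the cluster's configuration files
export HADOOP_CONF_DIR=/etc/hadoop/conf   # core-site.xml, hdfs-site.xml, yarn-site.xml

# Spark (and Flink) read HADOOP_CONF_DIR when submitting to YARN
spark-submit --master yarn --deploy-mode cluster my-app.jar
```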

Re: user notification upon application error

2019-12-04 Thread Prabhu Josephraj
A MapReduce application can be configured to report its status on completion through mapreduce.job.end-notification.url. You need to write a web service to collect the status and send email to users. The link below has examples: https://hadoopi.wordpress.com/2013/09/18/hadoop-get-a-callback-on-mapreduce-job-comple
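A sketch of the job-level setting; the callback URL is hypothetical, and Hadoop substitutes the `$jobId` and `$jobStatus` placeholders into it before issuing the HTTP GET:

```xml
<property>
  <name>mapreduce.job.end-notification.url</name>
  <value>http://notifier.example.com/jobdone?jobId=$jobId&amp;status=$jobStatus</value>
</property>
```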

Re: Hadoop Native Build Steps

2019-11-28 Thread István Fajth
Hi, for me on Mac the natives compile fine (except FUSE, but that is normal, as it binds to Linux libraries to the best of my knowledge). I would check 2 things: Are the libsasl libraries there in /usr/lib/libsasl*? I have these files/links: lrwxr-xr-x 1 root wheel 16 May 4 2019 /usr

Re: how to list yarn applications by creation time and filter by username?

2019-11-27 Thread Prabhu Josephraj
...@cloudera.com] Sent: Wednesday, November 27, 2019 8:09 PM To: Manuel Sopena Ballesteros Cc: user@hadoop.apache.org Subject: Re: how to list yarn applications by creation time and filter by username? Yarn CLI does not do that, I think you need to w

RE: how to list yarn applications by creation time and filter by username?

2019-11-27 Thread Manuel Sopena Ballesteros
Thanks Prabhu, do you know which yarn command I can use to get the application creation time? Thank you Manuel From: Prabhu Josephraj [mailto:pjos...@cloudera.com] Sent: Wednesday, November 27, 2019 8:09 PM To: Manuel Sopena Ballesteros Cc: user@hadoop.apache.org Subject: Re: how to list

Re: how to list yarn applications by creation time and filter by username?

2019-11-27 Thread Prabhu Josephraj
The YARN CLI does not do that; I think you need to write a script on top of the output provided by the YARN CLI. On Wed, Nov 27, 2019 at 9:19 AM Manuel Sopena Ballesteros < manuel...@garvan.org.au> wrote: > Dear Hadoop community, > > > > I am learning yarn and would like to find an applicat
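The script idea can be sketched as post-processing of `yarn application -list` output. The sample rows below are made up (the real output is tab-separated with more columns); per-application creation time is not in the list output, but it should be available via `yarn application -status <appId>` or the RM REST API:

```shell
# Made-up sample of list output: appId, name, type, user
sample='application_1574817000000_0001 wordcount MAPREDUCE alice
application_1574817000000_0002 terasort MAPREDUCE bob
application_1574817000000_0003 etl-load SPARK alice'

# Keep only the application IDs owned by a given user (column positions are assumed)
by_user=$(printf '%s\n' "$sample" | awk -v u=alice '$4 == u { print $1 }')
printf '%s\n' "$by_user"
```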

RE: Can't Change Retention Period for YARN Log Aggregation

2019-11-24 Thread David M
if that doesn’t solve the issue. From: Prabhu Josephraj Sent: Friday, November 22, 2019 12:13 AM To: David M Cc: user@hadoop.apache.org Subject: Re: Can't Change Retention Period for YARN Log Aggregation The deletion service runs as part of MapReduce JobHistoryServer. Can you try resta

Re: Can't Change Retention Period for YARN Log Aggregation

2019-11-21 Thread Prabhu Josephraj
The deletion service runs as part of MapReduce JobHistoryServer. Can you try restarting it? On Fri, Nov 22, 2019 at 3:42 AM David M wrote: > All, > > > > I have an HDP 2.6.1 cluster where we’ve had > yarn.log-aggregation.retain-seconds set to 30 days for a while, and > everything was working pro
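For reference, the yarn-site.xml setting discussed in this thread, with 30 days expressed in seconds; as noted above, the deletion itself is carried out by the MapReduce JobHistoryServer:

```xml
<property>
  <name>yarn.log-aggregation.retain-seconds</name>
  <value>2592000</value> <!-- 30 days * 86400 seconds -->
</property>
```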

Re: how to use Hadoop echosystem

2019-11-13 Thread Jeff Hubbs
I should have included Scala in that language list. On 11/13/19 10:10 AM, Jeff Hubbs wrote: I'm assuming you mean "Hadoop ecosystem." :) Here's what I know. Hadoop is a collection of daemons (a.k.a. "services") that are typically run across multiple machines to form a Hadoop cluster. Centra

Re: how to use Hadoop echosystem

2019-11-13 Thread Jeff Hubbs
I'm assuming you mean "Hadoop ecosystem." :) Here's what I know. Hadoop is a collection of daemons (a.k.a. "services") that are typically run across multiple machines to form a Hadoop cluster. Central to the whole idea of Hadoop is the presence of an HDFS (Hadoop Distributed File System): a

Re: HDFS du Utility Inconsistencies?

2019-11-08 Thread David M
From: Arpit Agarwal Sent: Friday, November 8, 2019 11:41:31 AM To: David M Cc: user@hadoop.apache.org Subject: Re: HDFS du Utility Inconsistencies? Got any snapshots? On Fri, Nov 8, 2019, 09:38 David M wrote: All, I’m working on a cluster that i

Re: HDFS du Utility Inconsistencies?

2019-11-08 Thread Arpit Agarwal
Got any snapshots? On Fri, Nov 8, 2019, 09:38 David M wrote: > All, > > > > I’m working on a cluster that is running Hadoop 2.7.3. I have one folder > in particular where the command hdfs dfs -du is giving me strange results. > If I query the folder and ask for a summary, it tells me 10 GB. If I

Re: Understanding the relationship between block size and RPC / IPC length?

2019-11-08 Thread Wei-Chiu Chuang
There are more details in this jira: https://issues.apache.org/jira/browse/HADOOP-16452 Denser DataNodes are common. It is not uncommon to find a DataNode with 7 million blocks these days. With such a high number of blocks, the block report message can exceed the 64mb limit (defined by ipc
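The limit being discussed is the default of `ipc.maximum.data.length` (64 MB). A hedged example of raising it in the NameNode's core-site.xml; the new size is illustrative, and HADOOP-16452 discusses the trade-offs:

```xml
<property>
  <name>ipc.maximum.data.length</name>
  <value>134217728</value> <!-- 128 MB; the default is 67108864 (64 MB) -->
</property>
```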

Re: how to get yarn application attempt logs?

2019-11-04 Thread Malcolm McFarland
Hi Manuel; There's a lot of information available via the REST API; check out these docs: http://hadoop.apache.org/docs/r2.7.6/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Applications_API I haven't delved too far into those, but you might be able to pull out the app attempts fro
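A quick sketch of the REST call; the hostname is illustrative (8088 is the usual RM webapp port), and `user` and `states` are documented query parameters of the Cluster Applications API linked above:

```shell
# List the running applications owned by a given user (user name is made up)
curl -s "http://rm-host:8088/ws/v1/cluster/apps?user=alice&states=RUNNING"
```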

Re: GUI to upload files to HDFs

2019-11-04 Thread Alexander Batyrshin
You can try to upload via namenode web interface http://namenode:port/explorer.html > On 5 Nov 2019, at 02:21, naresh Goud wrote: > > Hi all , > > Do we have any GUI tools to upload data to HDFS from window/Unix machine? > Other than cloudera hue? > > T

RE: Flink 1.8.1 HDFS 2.6.5 issue

2019-10-29 Thread V N, Suchithra (Nokia - IN/Bangalore)
You could also disable the security feature of the Hadoop cluster or upgrade the hadoop version. I'm not sure if this is acceptable for you as it requires more changes. Setting the configuration is the minimum change I could think of to solve

Re: Exception in thread "main" org.codehaus.jackson.map.exc.UnrecognizedPropertyException: Unrecognized field "Token" (Class org.apache.hadoop.yarn.api.records.timeline.TimelineDelegationTokenResponse

2019-10-17 Thread Alex Wang
Hi Prabhu: Thank you for your reply. I checked the environment of our cluster. I looked at the "Hadoop version support matrix" and found that 1.3.x supports Hadoop-2.7.1+ Reference: http://hbase.apache.org/book.html#hadoop Then I looked at the pom file of the hbase source. The had

Re: Exception in thread "main" org.codehaus.jackson.map.exc.UnrecognizedPropertyException: Unrecognized field "Token" (Class org.apache.hadoop.yarn.api.records.timeline.TimelineDelegationTokenResponse

2019-10-17 Thread Prabhu Josephraj
Suspect the TimelineClient and ApplicationHistoryServer are using different hadoop libraries. Can you make sure the client uses the same hadoop jars and dependency jars as the ApplicationHistoryServer process? A simple workaround is to disable the timeline service for this job. hbase org.apache.hadoop.h

Re: errors submitting a spark job in yarn cluster mode

2019-10-16 Thread Prabhu Josephraj
As per the error message, Spark Assembly Jar is not in the Spark Executor Classpath. *Error: Could not find or load main class org.apache.spark.executor.CoarseGrainedExecutorBackend* 1. Pass Spark Assembly jar using --jars option while using spark-submit 2. Add Spark assembly jar in spark.jars co

Re: CapacityScheduler questions - (AM) preemption

2019-10-15 Thread Lars Francke
Eric, sorry for the slow reply. I'm afraid I'm not 100% sure anymore (this was at a client and I don't have access to the system at the moment). I believe we submitted to foo_ultimo first and then to foo_daily but I've reached out to them to get feedback. Cheers, Lars On Fri, Oct 11, 2019 at 8:

Re: CapacityScheduler questions - (AM) preemption

2019-10-11 Thread epa...@apache.org
> Submit a job to queue 1 which uses 100% of the cluster. > Submit a job to queue 2 which doesn't get allocated because there are not > enough resources for the AM. Lars, can you please correlate the 'queue 1' and 'queue 2' with the config that you provided? That is, based on your configs what i

Re: CapacityScheduler questions - (AM) preemption

2019-10-11 Thread Lars Francke
Sunil, this is the full capacity-scheduler.xml file: yarn.scheduler.capacity.maximum-am-resource-percent = 0.2, yarn.scheduler.capacity.maximum-applications = 1, yarn.scheduler.capacity.node-locality-del

RE: can't start spark thrift after Configuring YARN container executor

2019-10-10 Thread Manuel Sopena Ballesteros
Thank you very much Prabhu, Deleting the /d0 folder fixed the issue Manuel From: Prabhu Josephraj [mailto:pjos...@cloudera.com] Sent: Thursday, October 10, 2019 6:17 PM To: Manuel Sopena Ballesteros Cc: user@hadoop.apache.org Subject: Re: can't start spark thrift after Configuring

Re: can't start spark thrift after Configuring YARN container executor

2019-10-10 Thread Prabhu Josephraj
As per the error, spark user does not have permission to create directory under NodeManager Local Directory or the existing spark user directory is with stale uid or gid. *Permission denied Can't create directory /d1/hadoop/yarn/local/usercache/spark/appcache/application_1570681803028_0018* 1. Ch
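Option 1 (clearing the stale usercache) can be sketched as follows; the path comes from the error message above, and the later reply in this thread confirms that deleting the equivalent directory fixed the issue:

```shell
# Run on the affected NodeManager. YARN recreates the per-user cache with the
# current uid/gid on the next container launch for that user.
sudo rm -rf /d1/hadoop/yarn/local/usercache/spark
```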

Re: CapacityScheduler questions - (AM) preemption

2019-10-09 Thread Lars Francke
Sunil, thank you for the answer. This is HDP 3.1 based on Hadoop 3.1.1. No preemption defaults were changed I believe. The only thing changed is to enable preemption (monitor.enabled) in Ambari but I can get the full XML for you if that's helpful. I'll get back to you on that. Cheers, Lars On

Re: Hadoop and OpenSSL 1.1.1

2019-10-09 Thread Wei-Chiu Chuang
Filed HADOOP-16647 I am not planning to work on this any time soon so if any one is interested feel free to pick it up/supply additional information. On Wed, Oct 9, 2019 at 9:19 AM Wei-Chiu Chuang wrote: > Ok I stand corrected. > > That was fo

Re: CapacityScheduler questions - (AM) preemption

2019-10-09 Thread Sunil Govindan
Hi, the expectation in the scenario explained is more or less correct. A few pieces of information are needed to get a clearer picture: 1. hadoop version 2. capacity scheduler xml (to be precise, all preemption related configs which were added) - Sunil On Wed, Oct 9, 2019 at 6:23 PM Lars Francke wrote: > Hi,

Re: Hadoop and OpenSSL 1.1.1

2019-10-09 Thread Wei-Chiu Chuang
Ok I stand corrected. That was for OpenSSL 1.1.0 and it might not even work for 1.1.1. The OpenSSL release version doesn't imply backward compatibility. Would you please try it out and let me know if HADOOP-14597 works or not? If not we need to file a jira to track this because OpenSSL 1.1.0 has a

Re: Hadoop and OpenSSL 1.1.1

2019-10-09 Thread Wei-Chiu Chuang
See https://issues.apache.org/jira/browse/HADOOP-14597 OpenSSL 1.1.0 is supported with Hadoop 3. We should backport this to Hadoop 2. I don't recall if we ever documented the supported openssl version. Would be nice to add that too. On Wed, Oct 9, 2019 at 12:09 AM Gonzalo Gomez wrote: > Hi, any com

Re: Hadoop and OpenSSL 1.1.1

2019-10-09 Thread Gonzalo Gomez
Hi, any comment regarding Hadoop and OpenSSL 1.1.1? On Fri, Oct 4, 2019 at 10:50 AM Gonzalo Gomez wrote: > Hi all, > > > As OpenSSL 1.0.2 EOL is getting closer (end of this year [1]) I tried to > run Hadoop with OpenSSL 1.1.1d, but running the checknative command gives > false for the openssl li

Re: Kubernetes integration for HDFS

2019-10-08 Thread Allan Espinosa
Allan Espinosa , Subject: Re: Kubernetes integration for HDFS ENV JAVA_HOME /usr/lib/jvm/jre-11-openjdk Is Hadoop OK with this version? As far i can see - this version is not compatible https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+Java+Versions -- Alexander Batyrshin ak

Re: Separate Projects

2019-10-07 Thread Elek, Marton
To be more precise: * Submarine is moving out (AFAIK the proposal is written and the board will discuss it) * Ozone is voted to be moved out to a separate __Hadoop__ repository (it's a code split, not a project split) Some community members suggested moving out with Ozone, but nobody starte

Re: Kubernetes integration for HDFS

2019-10-05 Thread Alexander Batyrshin
ENV JAVA_HOME /usr/lib/jvm/jre-11-openjdk Is Hadoop OK with this version? As far as I can see, this version is not compatible: https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+Java+Versions -- Alexander Batyrshin aka bash Biomechanical Artificial Sabotage Humanoid On Fri, 20 Sep 2019 at 08:

Re: [ANNOUNCE] Apache Hadoop 3.2.1 release

2019-09-25 Thread Vinod Kumar Vavilapalli
Done: https://twitter.com/hadoop/status/1176787511865008128. If you have TweetDeck, any of the PMC members can do this. BTW, it looks like we haven't published any releases since Nov 2018. Let's get back to doing this going forward! Thanks +Vinod > On Sep 25, 2019, at 2:44 PM, Rohith Sharma K S >

Re: [ANNOUNCE] Apache Hadoop 3.2.1 release

2019-09-25 Thread Rohith Sharma K S
Updated twitter message: `` Apache Hadoop 3.2.1 is released: https://s.apache.org/96r4h Announcement: https://s.apache.org/jhnpe Overview: https://s.apache.org/tht6a Changes: https://s.apache.org/pd6of Release notes: https://s.apache.org/ta50b Thanks to our community of developers, o

Re: [ANNOUNCE] Apache Hadoop 3.2.1 release

2019-09-25 Thread Rohith Sharma K S
Updated announcement Hi all, It gives us great pleasure to announce that the Apache Hadoop community has voted to release Apache Hadoop 3.2.1. Apache Hadoop 3.2.1 is the stable release of Apache Hadoop 3.2 line, which includes 493 fixes since Hadoop 3.2.0 release

Re: [ANNOUNCE] Apache Hadoop 3.2.1 release

2019-09-25 Thread Sunil Govindan
Here the link of Overview URL is old. We should ideally use https://hadoop.apache.org/release/3.2.1.html Thanks Sunil On Wed, Sep 25, 2019 at 2:10 PM Rohith Sharma K S wrote: > Can someone help to post this in twitter account? > > Apache Hadoop 3.2.1 is released: https://s.apache.org/mzdb6 > Ov

Re: [ANNOUNCE] Apache Hadoop 3.2.1 release

2019-09-25 Thread Rohith Sharma K S
Can someone help to post this in twitter account? Apache Hadoop 3.2.1 is released: https://s.apache.org/mzdb6 Overview: https://s.apache.org/tht6a Changes: https://s.apache.org/pd6of Release notes: https://s.apache.org/ta50b Thanks to our community of developers, operators, and users. -Rohith Sh

Re: hdfs audit permission denied when running from NN

2019-09-22 Thread Francisco de Freitas
No ideas here? On Wed, 11 Sep 2019 at 17:31, Francisco de Freitas wrote: > HDFS version is 2.8.5 > > I recently updated my log4j.properties file to > > # Log at INFO level to DRFAAUDIT > > log4j.logger.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit=INFO,DRFAAUDIT > # Do not forward au

Re: Release plans for Hadoop 2.10.0

2019-09-20 Thread Sean Busbey
Please bring questions about releases that have not yet happened to the common-dev@hadoop mailing list. On Fri, Sep 20, 2019 at 2:15 PM Viral Bajaria wrote: > > All, > > Are we going to see a new release in the 2.x.x ? > > I noticed a bunch of tickets that have been resolved in the last year have

Re: Kubernetes integration for HDFS

2019-09-19 Thread Anu Engineer
That is very interesting. A sub-project of Hadoop called Ozone tests regularly with Kubernetes. If you would like to contribute your work to Hadoop, you can send us a pull request and we can maintain this code as a sample under Ozone K8s samples directory. An advantage with that approach is that th

Re: hdfs namenode fails over frequently due to timeout with zkfc

2019-09-19 Thread Wenqi Ma
Thanks for the helpful advice. I got: "RpcQueueTimeNumOps" : 33612239, "RpcQueueTimeAvgTime" : 7.782384127056349, "RpcProcessingTimeNumOps" : 33612239, "RpcProcessingTimeAvgTime" : 32.94238776763952. It is very close to the timeout (45s). I will monitor these values for

Re: hdfs namenode fails over frequently due to timeout with zkfc

2019-09-19 Thread HK
Do you push Namenode JMX metrics to somewhere? Please check RPC avg processing time and RPC queue time avg time. If it is higher than the time out, health monitor request is waiting more time in the RPC queue to get it served. Enabling service RPC will definitely resolve this issue. You can also e
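Enabling the service RPC endpoint, as suggested above, gives ZKFC and DataNode traffic a port separate from client RPC. A hedged hdfs-site.xml sketch; the host and port are illustrative, and the NameNodes need a restart (ZKFC state typically also needs reformatting with `hdfs zkfc -formatZK` after the change):

```xml
<property>
  <name>dfs.namenode.servicerpc-address</name>
  <value>nn-host:8040</value>
</property>
```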

Re: hdfs namenode fails over frequently due to timeout with zkfc

2019-09-19 Thread Wenqi Ma
More information: 1. The balancer is running, and if we stop it, failover only happens about 2-3 times a day. But we have to run it, since datanode usage looks like: 14.65% / 78.37% / 83.18% / 23.27% 2. JVM pause logs are infrequent, and all pauses are less than 2 seconds Wenqi Ma wrote on September 19, 2019

Re: hdfs namenode fails over frequently due to timeout with zkfc

2019-09-18 Thread Wenqi Ma
Sure I checked that, and it is namenode health monitoring timing out, like: 2019-09-19 09:15:03,823 INFO org.apache.hadoop.ha.ZKFailoverController: Successfully transitioned NameNode at dphadoop20/192.168.1.20:8020 to active state 2019-09-19 10:48:55,898 WARN org.apache.hadoop.ha.HealthMonitor: Tr

Re: hdfs namenode fails over frequently due to timeout with zkfc

2019-09-18 Thread HK
Are you checking the ZKFC process logs and jstack? At what stage is ZKFC timing out: the zk session, or namenode health monitoring? On Thu, Sep 19, 2019 at 9:17 AM Wenqi Ma wrote: > HDFS version is 2.7.7 > > We have 500+ nodes, 230 million files and directories, 270 million blocks,

Re: accessing hdfs cluster through ssh tunnel

2019-09-16 Thread saurabh pratap singh
Hi all, I was not satisfied with the above-mentioned approach, so I tried the Hadoop SOCKS server config at the client end and used ssh with the -D option as mentioned by Hariharan Iyer (thank you for that). It worked as expected, without the need to open separate ssh tunnels for the datanodes. Thanks. On
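The working setup described above can be sketched as follows; the gateway host and local port are illustrative:

```shell
# 1. Open a dynamic (SOCKS) tunnel through a gateway that can reach the whole cluster
ssh -fN -D 1080 user@gateway.example.com

# 2. Point the Hadoop client at the SOCKS proxy for all RPC connections
hadoop fs \
  -D hadoop.rpc.socket.factory.class.default=org.apache.hadoop.net.SocksSocketFactory \
  -D hadoop.socks.server=localhost:1080 \
  -ls /
```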

Re: accessing hdfs cluster through ssh tunnel

2019-09-13 Thread saurabh pratap singh
Thank you all for your help. The solution that worked for me is as follows: I opened an ssh tunnel for the namenode, which ensures that hadoop fs -ls works. In order for hadoop fs -put to work (it was timing out because the namenode was returning private ip addresses of the datanodes, which can't be resolved by the edge ma

Re: accessing hdfs cluster through ssh tunnel

2019-09-13 Thread Hariharan Iyer
You will have to use a socks proxy (the -D option in ssh tunnel). In addition, when invoking the hadoop fs command, you will have to add -Dsocks.proxyHost and -Dsocks.proxyPort. Thanks, Hariharan On Thu, 12 Sep 2019, 23:26 saurabh pratap singh, wrote: > Thank you so much for your reply . > I have furt

Re: accessing hdfs cluster through ssh tunnel

2019-09-13 Thread Julien Laurenceau
Hi, Hadoop is designed to avoid proxies, as a proxy would act as a bottleneck. Namenodes are used to obtain a direct socket between client and datanodes that is specific to each job. On Fri, Sep 13, 2019 at 14:21, Tony S. Wu wrote: > You need connectivity from edge node to the entire cluster, not just > namenode

Re: accessing hdfs cluster through ssh tunnel

2019-09-12 Thread saurabh pratap singh
Thank you so much for your reply . I have further question there are some blogs which talks about some similar setup like this one https://github.com/vkovalchuk/hadoop-2.6.0-windows/wiki/How-to-access-HDFS-behind-firewall-using-SOCKS-proxy I am just curious how does that works. On Thu, Sep 12,

Re: accessing hdfs cluster through ssh tunnel

2019-09-12 Thread Tony S. Wu
You need connectivity from edge node to the entire cluster, not just namenode. Your topology, unfortunately, probably won’t work too well. A proper VPN / IPSec tunnel might be a better idea. On Thu, Sep 12, 2019 at 12:04 AM saurabh pratap singh < saurabh.cs...@gmail.com> wrote: > Hadoop version :

Re: NegativeArraySizeException during map segment merge

2019-09-04 Thread Prabhu Josephraj
1. Looking at IFile$Reader#nextRawValue, it is not clear why we create a valBytes array of size 2 * currentValueLength even though it only reads data of currentValueLength size. If there is no reason for this, it can be changed, which will fix the problem. public void nextRawValue(DataInputBuffer value) throws IO

Re: Is shortcircuit-read (SCR) really fast?

2019-09-04 Thread Wei-Chiu Chuang
Hi Daegyu, let's move this discussion to the user group, so that any one else can comment on this. I obviously don't have the best answers to the questions. But these are great questions. Re: benchmarks for SCR: I believe yes. In fact, I found a benchmark running Accumulo and HBase on

Re: Why TEMPORARY block join setup pipeline process?

2019-09-04 Thread Kang Minwoo
I found the same issue: https://issues.apache.org/jira/browse/HDFS-10714 However, this issue seems to be on hold. Sent from my iPhone On Sep 4, 2019, 18:06, Kang Minwoo wrote: Hello, Users. When the Hadoop Cluster had a heavy write workload, sometimes the DFS Client receives a Clo

Re: Blocks Report Dump

2019-09-03 Thread HK
Alexander, I think I did not state it clearly. Actually we need the total block report, which says which block is located on which datanode. On Tue, Sep 3, 2019 at 7:31 PM Alexander Batyrshin <0x62...@gmail.com> wrote: > JMX metrics: > > http://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-c

Re: Blocks Report Dump

2019-09-03 Thread Alexander Batyrshin
JMX metrics: http://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/Metrics.html#FSNamesystem Try to get them via http://namenode:$dfs.namenode.http-address/jmx > On 3 Sep 2019, at
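The JMX endpoint mentioned above can be queried directly; the host is illustrative (50070 is the default 2.x NameNode HTTP port), and the `qry` parameter filters the output to a single bean:

```shell
curl -s "http://namenode:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem"
```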

Re: Blocks Report Dump

2019-09-03 Thread Amith sha
@alex I think no JMX metric provides that, and regarding fsck, that would be a costly operation in a cluster with 250 million fs objects. Thanks & Regards Amithsha On Tue, Sep 3, 2019 at 6:46 PM Alexander Batyrshin <0x62...@gmail.com> wrote: > 1) JMX metrics > 2) hdfs fsck / > > > On 3 Sep 2019

Re: Not able to disable TLSv1, TLSv1.1 on Apache Yarn

2019-09-03 Thread Anton Puzanov
hadoop.ssl.enabled.protocols=TLSv1.2 is already set in core-site.xml This is the resource manager in my case On Tue, Sep 3, 2019 at 4:01 PM bappa kon wrote: > Thats strange, I'm assuming your resource manager running on 8190 port as > by default it is timeline server port in HDP. > > Sorry but I

Re: Blocks Report Dump

2019-09-03 Thread Alexander Batyrshin
1) JMX metrics 2) hdfs fsck / > On 3 Sep 2019, at 13:23, HK wrote: > > Hi All, > Is there a way to dump the block report from namenode ? > > -- Hema Kumar - To unsubscribe, e-mail: user-unsubscr...@hadoop.apache.org For addi

Re: Not able to disable TLSv1, TLSv1.1 on Apache Yarn

2019-09-03 Thread bappa kon
That's strange; I'm assuming your resource manager is running on port 8190, as by default that is the timeline server port in HDP. Sorry, but I have one last thing to test. Can you add the below to the core-site xml file and restart all hadoop processes? hadoop.ssl.enabled.protocols=TLSv1.2 Thanks On Tue, 3 Sep 2019,
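For reference, the core-site.xml form of the setting suggested above:

```xml
<property>
  <name>hadoop.ssl.enabled.protocols</name>
  <value>TLSv1.2</value>
</property>
```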

Re: Not able to disable TLSv1, TLSv1.1 on Apache Yarn

2019-09-03 Thread Anton Puzanov
I have tried it right now and TLSv1 is still available. Running the openssl command shows the server certificate. I check for the protocols using nmap (-sV) which shows support for TLSv1, TLSv1.1, TLSv1.2 On Tue, Sep 3, 2019 at 1:41 PM bappa kon wrote: > Can you share the output of below command

Re: Not able to disable TLSv1, TLSv1.1 on Apache Yarn

2019-09-03 Thread bappa kon
Can you share the output of the below command? openssl s_client -connect hostname:8190 -tls1 Also, have you already tried the below in a custom yarn-site xml? ssl.exclude.protocol=TLSv1,TLSv1.1 Thanks On Mon, 2 Sep 2019, 20:22 Anton Puzanov, wrote: > Hi, > > I have been requested to disable TLSv1 and

Re: Docker container executor is failing

2019-09-02 Thread Yen-Onn Hiu
The problem was in the installed Docker version; downgrading docker-ce to docker-1.13 helped. On Fri, Aug 30, 2019 at 5:23 PM Yen-Onn Hiu wrote: > Yes, i have tried. That have solved the untrusted image from execution. > However still hitting the exit code 127 error. M

Re: Is shortcircuit-read (SCR) really fast?

2019-08-30 Thread Wei-Chiu Chuang
Interesting benchmark. Thank you, Daegyu. Can you try a larger file too? Like 128mb or 1gb? HDFS is not optimized for smaller files. What did you use for the benchmark? On Thu, Aug 29, 2019, at 11:40 PM, Daegyu Han wrote: > Hi all, > > Is ShortCircuit read faster than legacy read which goes through data nodes? > >

Re: Docker container executor is failing

2019-08-30 Thread Yen-Onn Hiu
Yes, I have tried that. It solved the untrusted-image execution error, but I am still hitting the exit code 127 error. Could you take a second look at this? On Fri, 30 Aug 2019 at 5:19 PM, Prabhu Josephraj wrote: > Can you test with adding local into docker.trusted.registries in > container-exec

Re: Docker container executor is failing

2019-08-30 Thread Prabhu Josephraj
Can you test with adding local into docker.trusted.registries in container-executor.cfg. Fyi https://community.cloudera.com/t5/Support-Questions/Not-able-to-run-docker-container-on-yarn-even-after/m-p/224259 On Fri, Aug 30, 2019 at 2:07 PM Yen-Onn Hiu wrote: > hi all, > > I have a bash script t
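A sketch of the container-executor.cfg change suggested above; the registry list is illustrative, and the section layout follows the Hadoop 3 container-executor.cfg format:

```
[docker]
  docker.trusted.registries=local,library
```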

Re: please care and vote for Chinese people under cruel autocracy of CCP, great thanks!

2019-08-28 Thread Yi Du
All, I am sickened to see a collaborative platform for research purposes being used for political propaganda, and I would deeply regret it if people mixed study/research with politics. The so-called 'malicious and evil deeds' this email listed are pure words spreading hate toward one party: 1. Don't s

Re: compiling hadoop native libs with icc instead of gcc

2019-08-28 Thread Matt Foley
It sounds like you need to change the C compiler, not for maven, but for cmake. This FAQ entry addresses how to do so: https://gitlab.kitware.com/cmake/community/wikis/FAQ#how-do-i-use-a-different-compiler It gives three methods. The third is marked “avoid”. Why is explained in the f

Re: NameNode recovering

2019-08-27 Thread HK
If it is OK to put your active namenode into safe mode, there is a way: - Put your active namenode into safe mode - Checkpoint the namespace - Bootstrap the standby namenode, and start it. I think hadoop 3 allows you to use a checkpoint node, which will help you avoid this kind of issue. On Wed, Aug
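The steps above can be sketched with standard dfsadmin commands; run them as the HDFS superuser, and treat this as an outline, since the exact procedure varies by version:

```shell
# On the active NameNode: enter safe mode and checkpoint the namespace
hdfs dfsadmin -safemode enter
hdfs dfsadmin -saveNamespace
hdfs dfsadmin -safemode leave

# On the standby host: rebuild its metadata from the active, then start it
hdfs namenode -bootstrapStandby
```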

Re: Hadoop Community Sync Up Schedule

2019-08-23 Thread Susheel Kumar Gadalay
Please remove user@hadoop from this mail list. It is specific to dev team only. Thanks On Friday, August 23, 2019, Wangda Tan wrote: > Sounds good, let me make the changes to do simply bi-weekly then. > I will update it tonight if possible. > > Best, > Wangda > On Fri, Aug 23, 2019 at 1:50 AM Ma

Re: Hadoop Community Sync Up Schedule

2019-08-23 Thread Wangda Tan
Sounds good, let me make the changes to do simply bi-weekly then. I will update it tonight if possible. Best, Wangda On Fri, Aug 23, 2019 at 1:50 AM Matt Foley wrote: > Wangda and Eric, > We can express the intent, I think, by scheduling two recurring meetings: > - monthly, on the 2nd Wednesda

Re: Hadoop storage community online sync

2019-08-22 Thread Matt Foley
+1 for publishing notes. Thanks! On Aug 21, 2019, at 4:16 PM, Aaron Fabbri wrote: Thank you Wei-Chiu for organizing this and sending out notes! On Wed, Aug 21, 2019 at 1:10 PM Wei-Chiu Chuang wrote: > We had a great turnout today, thanks to Konstantin for leading t

Re: Hadoop Community Sync Up Schedule

2019-08-22 Thread Matt Foley
Wangda and Eric, We can express the intent, I think, by scheduling two recurring meetings: - monthly, on the 2nd Wednesday, and - monthly, on the 4th Wednesday. This is pretty easy to understand, and not too onerous to maintain. But I’m okay with simple bi-weekly too. I’m neutral on 10 vs 11am, P

Re: Could not find or load main class org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer

2019-08-21 Thread alex noure
ir permissions too > > > > > > *From:* alex noure [mailto:hello123...@gmail.com] > *Sent:* 20 August 2019 16:50 > *To:* Prabhu Josephraj > *Cc:* user@hadoop.apache.org; mapreduce-iss...@hadoop.apache.org > *Subject:* Re: Could not find or load main class > org.apache.h

Re: Hadoop storage community online sync

2019-08-21 Thread Aaron Fabbri
Thank you Wei-Chiu for organizing this and sending out notes! On Wed, Aug 21, 2019 at 1:10 PM Wei-Chiu Chuang wrote: > We had a great turnout today, thanks to Konstantin for leading the > discussion of the NameNode Fine-Grained Locking proposal. > > There were at least 16 participants joined the

Re: Hadoop storage community online sync

2019-08-21 Thread Wei-Chiu Chuang
We had a great turnout today, thanks to Konstantin for leading the discussion of the NameNode Fine-Grained Locking proposal. At least 16 participants joined the call. Today's summary can be found here: https://docs.google.com/document/d/1jXM5Ujvf-zhcyw_5kiQVx6g-HeKe-YGnFS_1-qFXomI/edit

Re: Hadoop Community Sync Up Schedule

2019-08-21 Thread Manikandan R
>> For folks in other US time zones: how about 11am PDT, is it better or 10am PDT will be better? +1 for 10 am PDT On Wed, Aug 21, 2019 at 3:31 PM Wangda Tan wrote: > For folks in other US time zones: how about 11am PDT, is it better or 10am > PDT will be better? I will be fine with both. > > H

Re: Hadoop Community Sync Up Schedule

2019-08-21 Thread epa...@apache.org
Let's go with bi-weekly (every 2 weeks). Sometimes this gives us 3 sync-ups in one month, which I think is fine. -Eric Payne On Wednesday, August 21, 2019, 5:01:52 AM CDT, Wangda Tan wrote: > > For folks in other US time zones: how about 11am PDT, is it better or 10am > PDT will be better? I

Re: Hadoop Community Sync Up Schedule

2019-08-21 Thread Wangda Tan
For folks in other US time zones: how about 11am PDT, is it better or 10am PDT will be better? I will be fine with both. Hi Matt, Thanks for mentioning this issue, this is exactly the issue I saw 🤣. Basically there are two options: - a. weekly, bi-weekly (for odd/even week) and every four months

Re: Hadoop Community Sync Up Schedule

2019-08-20 Thread Matt Foley
Hi Wangda, thanks for this. A question about the schedule correction: > > 1) In the proposal, repeats are not set properly. (I used bi-weekly instead of > 2nd/4th week as repeat frequency). I'd like to fix the frequency on Thu and > it will take effect starting next week. I understand that “bi-weekly

Re: What is the best way to analyze io latency in hdfs?

2019-08-20 Thread Julien Laurenceau
Hi, On Linux you can monitor the system calls of any process using: strace -p PIDofHDFSdatanode It can be very verbose, but the information will be there. Did you try the metrics available in Ambari or Cloudera Manager? Regards On Tue, Aug 20, 2019 at 02:47, Daegyu Han wrote: > Hi all, > > I'm cur
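A slightly more targeted variant of the strace suggestion — a sketch, assuming a Linux DataNode host where you have ptrace permission; the `jps`-based pid lookup is an assumption (any way of finding the DataNode pid works):

```shell
# Find the DataNode pid (jps ships with the JDK).
DN_PID=$(jps | awk '/DataNode/ {print $1}')

# Follow all threads (-f), restrict to I/O-related syscalls, and record
# the time spent inside each call (-T), writing the trace to a file.
strace -f -T -e trace=read,write,pread64,pwrite64,fsync \
       -p "$DN_PID" -o /tmp/datanode_io.trace
```

Detach with Ctrl-C when done; the per-call timings in `/tmp/datanode_io.trace` show where disk I/O latency is going. Be aware that strace itself adds noticeable overhead, so avoid leaving it attached on a busy production DataNode.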

Re: [DISCUSS] move storage community online sync schedule

2019-08-20 Thread Wei-Chiu Chuang
I don't see an objection so let's move to 10AM US Pacific Daylight Saving Time (UTC-7) starting this Wednesday. I'll update the relevant information. Further reminder: Konstantin will talk about NameNode Fine-Grained Locking. Please let me know if you have any feedback and ideas to share. Is the

Re: Hadoop Community Sync Up Schedule

2019-08-20 Thread Wei-Chiu Chuang
+1 On Mon, Aug 19, 2019 at 8:32 PM Wangda Tan wrote: > Hi folks, > > We have run community sync up for 1.5 months. I spoke to folks offline and > got some feedback. Here's a summary of what I've observed from sync ups and > talked to organizers. > > Following sync ups have very good participants

Re: Hadoop Community Sync Up Schedule

2019-08-20 Thread epa...@apache.org
Hi Wangda, Thank you for continuing to keep us moving forward and refining these vital sync-ups. > 3) Update the US [YARN/MapReduce] sync up time from 9AM to 10AM PDT. That puts it at noon central time, which is during our lunch hour. However, I am +1 for this if we are able to allow greater pa

RE: Could not find or load main class org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer

2019-08-20 Thread Bibinchundatt
...@hadoop.apache.org Subject: Re: Could not find or load main class org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer Hi Prabhu Thank you for your reply. I did the following: chmod 777 hadoop-yarn-server-nodemanager-.jar Restarted NodeManager, resourcemanager Still

Re: Could not find or load main class org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer

2019-08-20 Thread alex noure
Hi Prabhu Thank you for your reply. I did the following: chmod 777 hadoop-yarn-server-nodemanager-.jar Restarted the NodeManager and ResourceManager. Still reporting an error: Could not find or load main class org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer P
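Rather than loosening jar permissions with chmod 777, it is worth verifying that the class is actually reachable on the NodeManager's classpath — a sketch, assuming the `hadoop` CLI and `unzip` are on PATH on the NodeManager host:

```shell
# Expand the Hadoop classpath wildcards, keep only NodeManager jars,
# and check each one for the ContainerLocalizer class.
for jar in $(hadoop classpath --glob | tr ':' '\n' | grep 'nodemanager.*\.jar'); do
  unzip -l "$jar" | grep -q 'ContainerLocalizer.class' && echo "found in $jar"
done
```

If nothing prints, the hadoop-yarn-server-nodemanager jar is missing or unreadable on the classpath, which would explain the "Could not find or load main class" error better than a permissions bit on the jar itself.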
