Re: Unable to append to a file in HDFS

2017-10-30 Thread Ravi Prakash
Hi Tarik! The lease is owned by a client. If you launch 2 client programs, they will be viewed as separate (even though the user is the same). Are you sure you closed the file when you first wrote it? Did the client program which wrote the file exit cleanly? In any case, after the namenode lease hard
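A minimal sketch of releasing a stuck lease from the shell once the original writer is gone (the path is a placeholder; this asks the NameNode to start lease recovery for the file):

    hdfs debug recoverLease -path /user/tarik/data.log -retries 5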

Re: Vulnerabilities to UserGroupInformation / credentials in a Spark Cluster

2017-10-30 Thread Blaze Spinnaker
I looked at this a bit more and I see a container_tokens file in the spark directory. Does this contain the credentials that are added by addCredentials? Is this file accessible to the spark executors? It looks like just a clear-text protobuf file. https://github.com/apache/hadoop/blob/82cb2a649

Re:

2017-10-30 Thread Ravi Prakash
And one of the good things about open-source projects like Hadoop, you can read all about why :-) : https://issues.apache.org/jira/browse/HADOOP-4952 Enjoy! Ravi On Mon, Oct 30, 2017 at 11:54 AM, Ravi Prakash wrote: > Hi Doris! > > FileContext was created to overcome some of the limitations tha

Re:

2017-10-30 Thread Ravi Prakash
Hi Doris! FileContext was created to overcome some of the limitations that we learned FileSystem had after a lot of experience. Unfortunately, a lot of code (I'm guessing maybe even the majority) still uses FileSystem. I suspect FileContext is probably the interface you want to use. HTH, Ravi O

Re. ERROR: oz is not COMMAND nor fully qualified CLASSNAME

2017-10-17 Thread Nandakumar Vadivelu
Hi Margus, The commit (code version) you are using for building Ozone is very old (Tue Nov 22 17:41:13 2016); can you do a “git pull” on the HDFS-7240 branch and take a new build? The documentation you are referring to is also very old; currently the documentation work is happening as part of HDF

Re: How to print values in console while running MapReduce application

2017-10-08 Thread Naganarasimha Garla
If it is for debugging purposes I would advise trying custom MR counters! Though you will not get them in the console, you can get them from the web UI for a running job too. On Sun, Oct 8, 2017 at 9:24 PM, Harsh J wrote: > Consider running your job in the local mode (set config ' > mapreduce.framework.name'
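A sketch of reading such a counter back from the CLI during or after the run; the job id, group and counter names here are hypothetical, and the counter itself would be incremented in the mapper/reducer through the Counter API:

    mapred job -counter job_1507000000000_0001 DEBUG_GROUP RECORDS_SEEN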

Re: How to print values in console while running MapReduce application

2017-10-08 Thread Harsh J
Consider running your job in the local mode (set config ' mapreduce.framework.name' to 'local'). Otherwise, rely on the log viewer from the (Job) History Server to check the console prints in each task (under the stdout or stderr sections). On Thu, 5 Oct 2017 at 05:15 Tanvir Rahman wrote: > Hell
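A sketch of the local-mode run Harsh describes, assuming a hypothetical wordcount.jar whose driver uses ToolRunner (so the -D generic option is honored); prints then appear directly in the terminal:

    hadoop jar wordcount.jar WordCount -D mapreduce.framework.name=local input output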

Re: How to print values in console while running MapReduce application

2017-10-04 Thread Tanvir Rahman
Hello Demian, Thanks for the answer. 1. I am using Java for writing the MapReduce application. Can you tell me how to do it in Java? 2. In the mapper or reducer function, which command did you use to write the output? Is it going to write it in the log folder? I have multiple nodes and

Re: How to print values in console while running MapReduce application

2017-10-04 Thread Demian Kurejwowski
I did the same tutorial; I think the only way is doing it outside Hadoop, in the command line: cat folder/* | python mapper.py | sort | python reducer On Wednesday, October 4, 2017 16:20:31, Tanvir Rahman wrote: Hello, I have a small cluster and I am running MapReduce WordCount a

Re: How to print values in console while running MapReduce application

2017-10-04 Thread Sultan Alamro
Hi, The easiest way is to open a new window and display the log file as follows: tail -f /path/to/log/file.log Best, Sultan > On Oct 4, 2017, at 5:20 PM, Tanvir Rahman wrote: > > Hello, > I have a small cluster and I am running MapReduce WordCount application in > it. > I want to print some va

Re: Will Backup Node download image and edits log from NameNode?

2017-09-30 Thread Gurmukh Singh
Your observation is correct: the backup node will also download. If you look at the journey/evolution of Hadoop, we had primary, backup-only, checkpointing node and then a generic secondary node. The checkpointing node will do the merge of fsimage and edits On 25/9/17 5:57 pm, Chang.Wu wrote: From the

Re: how to set a rpc timeout on yarn application.

2017-09-30 Thread Gurmukh Singh
Hi, Can you explain the job to me a bit? There are a few RPC timeouts, like at the datanode level, mapper timeouts, etc. On 28/9/17 1:47 pm, Demon King wrote: Hi, We have finished a yarn application and deployed it to a hadoop 2.6.0 cluster. But if one machine in the cluster is down, our application will h

Re: hadoop questions for a begginer

2017-09-30 Thread Gurmukh Singh
Well, in an actual job the input will be a file. So, instead of: echo "bla ble bli bla" | python mapper.py | sort -k1,1 | python reducer.py you will have: cat file.txt | python mapper.py | sort -k1,1 | python reducer.py The file has to be on HDFS (keeping it simple, it can be other filesystems), t
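For reference, the same mapper/reducer pair run inside Hadoop would be a streaming job along these lines (paths and the jar version glob are placeholders):

    hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
      -files mapper.py,reducer.py \
      -mapper 'python mapper.py' -reducer 'python reducer.py' \
      -input /user/me/file.txt -output /user/me/out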

Re: Inter-cluster Communication

2017-09-27 Thread Madhav A
Rakesh, What sort of communication are you looking for between the clusters? I mean, Is it * At the data node level? * VPC inter-communication between 2 clusters ? * Data replication via custom tools? More details might better help in understanding what you're trying to accomplish. -Madhav On

Re: Hadoop "managed" setup basic question (Ambari, CDH?)

2017-09-26 Thread Marton, Elek
If you would like to do it in a more dynamic way you can also use service registries/key-value stores. For example the configuration could be stored in Consul and the servers (namenode, datanode) could be started with consul-template (https://github.com/hashicorp/consul-template) In case of

Re: Hadoop "managed" setup basic question (Ambari, CDH?)

2017-09-22 Thread Sanel Zukan
Hi, For this amount of nodes, I'd go with automation tools like Ansible[1]/Puppet[2]/Rex[3]. They can install necessary packages, set up /etc/hosts and make per-node settings. Ansible has a nice playbook (https://github.com/analytically/hadoop-ansible) you can start with, and Puppet isn't short ei

Re: Is Hadoop validating the checksum when reading only a part of a file?

2017-09-20 Thread Ralph Soika
Thanks a lot for your answer. This makes it clear to me now, and I expected that Hadoop works this way. === Ralph On 20.09.2017 07:57, Harsh J wrote: Yes, checksum match is checked for every form of read (unless explicitly disabled). By default, a checksum is generated and stored for every

Re: Is Hadoop validating the checksum when reading only a part of a file?

2017-09-19 Thread Harsh J
Yes, checksum match is checked for every form of read (unless explicitly disabled). By default, a checksum is generated and stored for every 512 bytes of data (io.bytes.per.checksum), so only the relevant parts are checked vs. the whole file when doing a partial read. On Mon, 18 Sep 2017 at 19:23
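For reference, the property Harsh names is io.bytes.per.checksum (dfs.bytes-per-checksum in current releases), and a file's stored composite checksum can be inspected from the shell:

    hdfs dfs -checksum /user/me/file.txt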

Re: Hadoop 2.8.0: Job console output suggesting non-existent rmserver 8088:proxy URI

2017-09-13 Thread Ravi Prakash
Hi Kevin! The ApplicationMaster doesn't really need any more configuration I think. Here's something to try out. Launch a very long mapreduce job: # A sleep job with 1 mapper and 1 reducer. (All the mapper and reducer do is sleep for the duration specified in -mt and -rt) yarn jar $HADOOP_HOME/s
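The full sleep-job invocation looks roughly like this (the jar version glob is a placeholder; -mt/-rt are the per-task sleep times in milliseconds):

    yarn jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-*-tests.jar \
      sleep -m 1 -r 1 -mt 600000 -rt 600000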

Re: Hadoop 2.8.0: Job console output suggesting non-existent rmserver 8088:proxy URI

2017-09-12 Thread Kevin Buckley
On 9 September 2017 at 05:17, Ravi Prakash wrote: > I'm not sure my reply will be entirely helpful, but here goes. It sheds more light on things than I previously understood, Ravi, so cheers > The ResourceManager either proxies your request to the ApplicationMaster (if > the application is runn

Re: Question about Resource Manager Rest APIs

2017-09-12 Thread Sunil G
Hi Jason, All data fetched from the ResourceManager, such as lists of apps or reports, is taken at the current time (not cached). Do you expect some other data? - Sunil On Tue, Sep 12, 2017 at 8:09 PM Xu,Jason wrote: > Hi all, > > > > I am trying to get information about the cluster via Resource M
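A sketch of the kind of call being discussed, assuming the default ResourceManager web port 8088:

    curl 'http://<rm-host>:8088/ws/v1/cluster/apps?states=RUNNING'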

Re: Apache ambari

2017-09-08 Thread Ravi Prakash
Hi Sidharth! The question seems relevant to the Ambari list : https://ambari.apache.org/mail-lists.html Cheers Ravi On Fri, Sep 8, 2017 at 1:15 AM, sidharth kumar wrote: > Hi, > > Apache ambari is open source. So,can we setup Apache ambari to manage > existing Apache Hadoop cluster ? > > Warm

Re: Hadoop 2.8.0: Job console output suggesting non-existent rmserver 8088:proxy URI

2017-09-08 Thread Ravi Prakash
Hi Kevin! I'm not sure my reply will be entirely helpful, but here goes. The ResourceManager either proxies your request to the ApplicationMaster (if the application is running), or (once the application is finished) serves it itself if the job is in the "cache" (usually the last 1 applicatio

Re: When is an hdfs-* service restart required?

2017-09-07 Thread Ravi Prakash
Hi Kellen! The first part of the configuration is a good indication of which service you need to restart. Unfortunately the only way to be completely sure is to read the codez. e.g. most hdfs configuration is mapped to variables in DFSConfigKeys $ find . -name *.java | grep -v test | xargs grep "
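A sketch of the kind of search Ravi's truncated command performs, run from a Hadoop source checkout (the property name is just an example):

    find . -name '*.java' | grep -v test | xargs grep 'dfs.namenode.heartbeat'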

Re: When is an hdfs-* service restart required?

2017-09-07 Thread Mingliang Liu
Restarting datanode(s) only is OK in this case. Thanks, > On Sep 7, 2017, at 10:46 AM, Kellen Arb wrote: > > Hello, > > I have a seemingly simple question, to which I can't find a clear answer. > > Which services/node-types must be restarted for each of the configuration > properties? For exa

Re: Sqoop and kerberos ldap hadoop authentication

2017-09-07 Thread Wei-Chiu Chuang
Hi, The message "User xxx not found" feels more like group mapping error. Do you have the relevant logs? Integrating AD with Hadoop can be non-trivial, and Cloudera's general recommendation is to use third party authentication integrator like SSSD or Centrify, instead of using LdapGroupsMapping.

Re: Sqoop and kerberos ldap hadoop authentication

2017-09-07 Thread Rams Venkatesh
Yes it works. However this doesn't work with Microsoft SQL server Sent from my iPhone > On 7 Sep 2017, at 10:09, dna_29a wrote: > > Hi, > I want to run sqoop jobs under kerberos authentication. If I have a ticket > for local Kerberos user (local KDC and user exists as linux user on each > hos

Re: HDFS: Confused about "immutability" wrt overwrites

2017-09-07 Thread Philippe Kernévez
Hi, Immutability is about rewriting a file (random access). That is massively used by databases, for example. On HDFS you can only append new data to a file. HDFS has permissions like a POSIX file system, so you can remove the 'w' permission on the file if you want to prevent deletion/overwrite. You
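A minimal sketch of stripping the write bit as Philippe describes (the path is a placeholder):

    hdfs dfs -chmod a-w /archive/report.pdf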

RE: Is Hadoop basically not suitable for a photo archive?

2017-09-06 Thread Zheng, Kai
Looks like HBase MOB should be mentioned, since the feature was definitely introduced with photo files/objects in mind. Regards, Kai From: Grant Overby [mailto:grant.ove...@gmail.com] Sent: Thursday, September 07, 2017 3:05 AM To: Ralph Soika Cc: user@hadoop.apache.org Subject: Re: Is Hadoop
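For reference, a sketch of enabling MOB on a column family in the HBase shell; the table and family names are placeholders, and MOB_THRESHOLD is the cell size in bytes above which a value is stored as a medium object:

    create 'photos', {NAME => 'img', IS_MOB => true, MOB_THRESHOLD => 102400}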

Re: Is Hadoop basically not suitable for a photo archive?

2017-09-06 Thread Grant Overby
I'm late to the party, and this isn't a hadoop solution, but apparently Cassandra is pretty good at this. https://medium.com/walmartlabs/building-object-store-storing-images-in-cassandra-walmart-scale-a6b9c02af593 On Wed, Sep 6, 2017 at 2:48 PM, Ralph Soika wrote: > Hi > > I want to thank you

Re: Is Hadoop basically not suitable for a photo archive?

2017-09-06 Thread Ralph Soika
Hi, I want to thank you all for your answers and your good ideas on how to solve the Hadoop "small-file problem". Now I would like to briefly summarize your answers and suggested solutions. First of all I describe once again my general use case: * An external enterprise application needs to sto

RE: Is Hadoop basically not suitable for a photo archive?

2017-09-06 Thread Muhamad Dimas Adiputro
I think MapR-FS is your solution. From: Anu Engineer [mailto:aengin...@hortonworks.com] Sent: Tuesday, September 05, 2017 10:33 PM To: Hayati Gonultas; Alexey Eremihin; Uwe Geercken Cc: Ralph Soika; user@hadoop.apache.org Subject: Re: Is Hadoop basically not suitable for a photo archive? Please

Re: Is Hadoop basically not suitable for a photo archive?

2017-09-05 Thread Anu Engineer
mixs.com>>, "user@hadoop.apache.org<mailto:user@hadoop.apache.org>" mailto:user@hadoop.apache.org>> Subject: Re: Re: Is Hadoop basically not suitable for a photo archive? I would recommend an object store such as openstack swift as another option. On Mon, Sep 4, 2017 at 1:09 PM Uw

Re: Re: Is Hadoop basically not suitable for a photo archive?

2017-09-04 Thread daemeon reiydelle
sday, September 05, 2017 6:06 AM > To: Alexey Eremihin; Uwe Geercken > Cc: Ralph Soika; user@hadoop.apache.org > Subject: Re: Re: Is Hadoop basically not suitable for a photo archive? > > I would recommend an object store such as opens

RE: Re: Is Hadoop basically not suitable for a photo archive?

2017-09-04 Thread Zheng, Kai
@hadoop.apache.org Subject: Re: Re: Is Hadoop basically not suitable for a photo archive? I would recommend an object store such as openstack swift as another option. On Mon, Sep 4, 2017 at 1:09 PM Uwe Geercken wrote: just my two cents: Maybe you can use hadoop for s

Re: Re: Is Hadoop basically not suitable for a photo archive?

2017-09-04 Thread Hayati Gonultas
y to go for. > Cheers, > Uwe > Sent: Monday, 04 September 2017 at 21:32 > From: "Alexey Eremihin" > To: "Ralph Soika" > Cc: "user@hadoop.apache.org" > Subject: Re: Is Hadoop basically not suitable for a photo archi

Aw: Re: Is Hadoop basically not suitable for a photo archive?

2017-09-04 Thread Uwe Geercken
time span. Yes it would be a duplication, but maybe - without knowing all the details - that would be acceptable and an easy way to go for. Cheers, Uwe Sent: Monday, 04 September 2017 at 21:32 From: "Alexey Eremihin" To: "Ralph Soika" Cc: "user@hadoo

Re: Is Hadoop basically not suitable for a photo archive?

2017-09-04 Thread Alexey Eremihin
Hi Ralph, In general Hadoop is able to store such data, and even HAR archives can be used in conjunction with WebHDFS (by passing offset and limit attributes). What are your reading requirements? FS metadata is not distributed, and reading the data is limited by the HDFS NameNode server performa
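The offset/limit read Alexey mentions maps onto the WebHDFS OPEN operation, roughly as follows (host, port and path are placeholders):

    curl -L 'http://<namenode>:50070/webhdfs/v1/archive/photos.har/img1.jpg?op=OPEN&offset=1048576&length=65536'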

Re: Region assignment on restart

2017-09-02 Thread Rob Verkuylen
Sorry this was meant for hbase. Copy/paste error. Will post there. On Sat, Sep 2, 2017 at 10:10 AM, Rob Verkuylen wrote: > On CDH5.12 with HBase 1.2, I'm experiencing an issue I thought was long > solved. The regions are all assigned to a single regionserver on a restart > of hbase though cloude

Re: Mapreduce example from library isuue

2017-09-01 Thread Atul Rajan
Hello Akira, Yes, thanks for the solution. I checked, and some classpath entries were missing from the yarn and mapred site files; adding those resolved the issue and MapReduce ran smoothly. Thanks for the article. Thanks and Regards Atul Rajan -Sent from my iPhone On 02-Sep-2017, at 1:17 AM, Akira Aj

Re: Representing hadoop metrics on ganglia web interface

2017-09-01 Thread Akira Ajisaka
Hi Nishant, Multicast is used to communicate between Ganglia daemons by default and it is banned in AWS EC2. Would you try unicast setting? Regards, Akira On 2017/08/04 12:37, Nishant Verma wrote: Hello We are supposed to collect hadoop metrics and see the cluster health and performance. I

Re: Prime cause of NotEnoughReplicasException

2017-09-01 Thread Akira Ajisaka
Hi Nishant, The debug message shows there are not enough racks configured to satisfy the rack awareness. http://hadoop.apache.org/docs/r3.0.0-alpha4/hadoop-project-dist/hadoop-common/RackAwareness.html If you don't need to place replicas in different racks, you can simply ignore the debug mess
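If racks do need to be configured, the usual mechanism is a topology script wired in through net.topology.script.file.name in core-site.xml. A minimal sketch (the subnet-to-rack mapping is hypothetical); the script receives IPs or hostnames as arguments and must print one rack path per argument:

    #!/bin/bash
    for host in "$@"; do
      case "$host" in
        10.1.*) echo -n '/rack1 ' ;;
        10.2.*) echo -n '/rack2 ' ;;
        *)      echo -n '/default-rack ' ;;
      esac
    done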

Re: spark on yarn error -- Please help

2017-09-01 Thread Akira Ajisaka
Hi sidharth, Would you ask Spark related question to the user mailing list of Apache Spark? https://spark.apache.org/community.html Regards, Akira On 2017/08/28 11:49, sidharth kumar wrote: Hi, I have configured apace spark over yarn. I am able to run map reduce job successfully but spark-sh

Re: Mapreduce example from library isuue

2017-09-01 Thread Akira Ajisaka
Hi Atul, Have you added HADOOP_MAPRED_HOME to yarn.nodemanager.env-whitelist in yarn-site.xml? The document may help: http://hadoop.apache.org/docs/r3.0.0-alpha4/hadoop-project-dist/hadoop-common/SingleCluster.html#YARN_on_a_Single_Node Regards, Akira On 2017/08/29 17:45, Atul Rajan wrote: H
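For reference, the change the linked document describes is appending HADOOP_MAPRED_HOME to the whitelist in yarn-site.xml, roughly (the other values shown are the assumed defaults):

    <property>
      <name>yarn.nodemanager.env-whitelist</name>
      <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
    </property>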

Re: YARN - How is a node for a container determined?

2017-08-29 Thread Grant Overby
Most of the applications are Twill apps and are somewhat long-running, but not perpetual; a few hours to a day. Many of the apps (say about half) have a lot of idle time. These apps come from across the enterprise; Idk why they're idle. There are also a few MR, Tez, and Spark apps in the mix. If

Re: unsubscribe

2017-08-29 Thread Ravi Prakash
Hi Corne! Please send an email to user-unsubscr...@hadoop.apache.org as mentioned on https://hadoop.apache.org/mailing_lists.html Thanks On Sun, Aug 27, 2017 at 10:25 PM, Corne Van Rensburg wrote: > [image: Softsure] > > unsubscribe > > > > *Corne Van RensburgManaging Director Softsure* > [ima

Re:

2017-08-29 Thread Ravi Prakash
Hi Dominique, Please send an email to user-unsubscr...@hadoop.apache.org as mentioned on https://hadoop.apache.org/mailing_lists.html Thanks Ravi 2017-08-26 10:49 GMT-07:00 Dominique Rozenberg : > unsubscribe > > > > > > [image: cid:image001.jpg@01D10A65.E830C520] > > *דומיניק רוזנברג*, מנהלת פ

Re: Recommendation for Resourcemanager GC configuration

2017-08-29 Thread Ravuri, Venkata Puneet
uneet" , "common-u...@hadoop.apache.org" Subject: Re: Recommendation for Resourcemanager GC configuration Hi Puneet, Along with the heap dump details, I would also like to know the version of the Hadoop-Yarn being used, size of the cluster, all Memory configurations, and JRE version.

Re: Recommendation for Resourcemanager GC configuration

2017-08-29 Thread Ravuri, Venkata Puneet
Hi Vinod, The heap size is 40GB and NewRatio is set to 3. We have max completed applications set to 10. Regards, Puneet From: Vinod Kumar Vavilapalli Date: Wednesday, August 23, 2017 at 5:47 PM To: "Ravuri, Venkata Puneet" Cc: "common-u...@hadoop.apache.org" Subject
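For reference, settings of this kind usually live in yarn-env.sh, along these lines (the exact flags are an illustration of the numbers above, not a recommendation):

    export YARN_RESOURCEMANAGER_OPTS="-Xms40g -Xmx40g -XX:NewRatio=3 -verbose:gc -Xloggc:/var/log/hadoop/rm-gc.log"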

Re: File copy from local to hdfs error

2017-08-29 Thread Atul Rajan
Hello István, Thanks for the help; it worked finally. There was a firewall issue, and solving that part made HDFS work and take entries from the local file system. Thanks and Regards Atul Rajan -Sent from my iPhone On 28-Aug-2017, at 11:20 PM, István Fajth wrote: Hi Atul, as suggested before, set th

Re: YARN - How is a node for a container determined?

2017-08-29 Thread Philippe Kernévez
" densely pack containers on fewer nodes" : quite surprising, +1 with Daemon You have Yarn labels that can be used for that. Classical example are the need of specific hardware fir some processing. https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/NodeLabel.html Regards, Philippe

Re: YARN - How is a node for a container determined?

2017-08-28 Thread daemeon reiydelle
Perhaps you can go into a bit more detail? Especially for e.g. a map job (or reduce in mapR), this seems like a major antipattern. Daemeon C.M. Reiydelle, San Francisco 1.415.501.0198, London 44 020 8144 9872 On Mon, Aug 28, 2017 at 3:37 PM, Grant Overby wrote: > When YARN receives a request

Re: File copy from local to hdfs error

2017-08-28 Thread István Fajth
Hi Atul, as suggested before, set the blockmanager log level to debug and check the logs for reasons. You can either set the whole NameNode log to DEBUG level and look for the messages logged by the BlockManager. Around the INFO level message in the NameNode log similar to the message you see now in

Re: File copy from local to hdfs error

2017-08-28 Thread Atul Rajan
The DataNodes were having issues earlier; I added the required ports in iptables, and after that the data logs are running, but HDFS is not able to distribute the file and make blocks, and any file copied on the cluster is throwing this error. On 28 August 2017 at 21:46, István Fajth wrote: > Hi Atul, > > you

Re: File copy from local to hdfs error

2017-08-28 Thread István Fajth
Hi Atul, you can check the NameNode logs to see if the DataNodes were in service or there were issues with them. You can also check the BlockManager's debug-level logs for more exact reasons if you can reproduce the issue at will. Istvan On Aug 28, 2017 17:56, "Atul Rajan" wrote: Hello Sir, when i am

Re: UNSUBSCRIBE

2017-08-28 Thread Shan Huasong
Please UNSUBSCRIBE too! On Mon, Aug 28, 2017 at 1:25 AM, Corne Van Rensburg wrote: > [image: Softsure] > > UNSUBSCRIBE > > > > *Corne Van RensburgManaging Director Softsure* > [image: Tel] 044 805 3746 > [image: Fax] > [image: Email] co...@softsure.co.za > *Softsure (Pty) Ltd | Registration No.

RE: Namenode not able to come out of SAFEMODE

2017-08-27 Thread omprakash
reducing the total block count as a workaround to the problem. Regards Om Prakash From: Gurmukh Singh [mailto:gurmukh.dhil...@yahoo.com] Sent: 25 August 2017 17:22 To: omprakash ; brahmareddy.batt...@huawei.com Cc: 'surendra lilhore' ; user@hadoop.apache.org Subject: Re: Namenod

Re: Namenode not able to come out of SAFEMODE

2017-08-25 Thread Gurmukh Singh
Subject: RE: Namenode not able to come out of SAFEMODE Hi Omprakash, "The reported blocks 0 needs additional 6132675 blocks to reach the threshold 0.9990 of total blocks 6138814. The number of live datanodes 0 has reached the minimum number 0." ---> Seeing this message, it looks l
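For reference, safe mode can be inspected and, once the underlying block/datanode problem is fixed, exited by hand (forcing an exit while blocks are still missing is risky):

    hdfs dfsadmin -safemode get
    hdfs dfsadmin -safemode leave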

Re: Data streamer java exception

2017-08-24 Thread surendra lilhore
Hi, I suggest you use shell commands for accessing cluster info instead of curl. For hdfs shell commands you can refer to https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html For yarn shell commands you can refer to https://hadoop.apache.org/docs/current/had

Re: Data streamer java exception

2017-08-24 Thread Atul Rajan
Hello Team, I came to a resolution of this issue by allowing the iptables entries for the specific ports used for the namenode as well as the datanode. Now HDFS is running and the cluster is running. Thanks a lot for the suggestion. Now I have another issue with the interface, as I am running the console view of RHEL

RE: Data streamer java exception

2017-08-24 Thread surendra lilhore
Hi Atul, Can you please share the datanode exception logs? Check whether the namenode and datanode hostname mapping is proper in /etc/hosts. The put operation is failing because the datanodes are not connected to the namenode. -Surendra From: Atul Rajan [mailto:atul.raja...@gmail.com] Sent: 24 Augus

Re: Recommendation for Resourcemanager GC configuration

2017-08-23 Thread Vinod Kumar Vavilapalli
What is the ResourceManager JVM’s heap size? What is the value for the configuration yarn.resourcemanager.max-completed-applications? +Vinod > On Aug 23, 2017, at 9:23 AM, Ravuri, Venkata Puneet wrote: > > Hello, > > I wanted to know if there is any recommendation for ResourceManager GC > s

Re: Recommendation for Resourcemanager GC configuration

2017-08-23 Thread Naganarasimha Garla
Hi Puneet, Along with the heap dump details, I would also like to know the version of Hadoop-YARN being used, the size of the cluster, all memory configurations, and the JRE version. Also, if possible, can you share the rationale behind the choice of the Parallel GC collector over others (CMS or G1)? Rega

Re: Recommendation for Resourcemanager GC configuration

2017-08-23 Thread Ravi Prakash
Hi Puneet Can you take a heap dump and see where most of the churn is? Is it lots of small applications / few really large applications with small containers etc. ? Cheers Ravi On Wed, Aug 23, 2017 at 9:23 AM, Ravuri, Venkata Puneet wrote: > Hello, > > > > I wanted to know if there is any reco

Re: Some Configs in hdfs-default.xml

2017-08-23 Thread Ravi Prakash
check-interval: This is more a function of how busy your datanodes are (sometimes they are too busy to heartbeat) and how robust your network is (dropping heartbeat packets). It doesn't really take too long to *check* the last heartbeat time of datanodes, but it's a lot of work to order re-replicat

Re: JVM OPTS about HDFS

2017-08-18 Thread Akash Mishra
I am currently supporting a single nameservice in HA, based on QJM, with 0.9 PB of data and 55-58 million objects [files + blocks], on a 36G JVM heap with G1GC. I would recommend starting with 16G and scaling depending on your blocks, with G1GC garbage collection. Thanks, On Fri, Aug 18, 2017 at 4
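A sketch of where such a heap would be set, in hadoop-env.sh (the sizes follow the 16G starting point above, not a general recommendation):

    export HADOOP_NAMENODE_OPTS="-Xms16g -Xmx16g -XX:+UseG1GC $HADOOP_NAMENODE_OPTS"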

Re: JVM OPTS about HDFS

2017-08-18 Thread Gurmukh Singh
400GB as heap space for the Namenode is a bit high; the GC pause time will be very high. For a cluster with about 6PB, approx 20GB is decent memory. As you mentioned it is HA, so it is safe to assume that the fsimage is checkpointed at regular intervals and we do not need to worry during a manual

Re: Restoring Data to HDFS with distcp from standard input /dev/stdin

2017-08-16 Thread Ravi Prakash
Hi Heitor! Welcome to the Hadoop community. Think of the "hadoop distcp" command as a script which launches other JAVA programs on the Hadoop worker nodes. The script collects the list of sources, divides it among the several worker nodes and waits for the worker nodes to actually do the copying
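A minimal sketch of the command being described (cluster addresses and paths are placeholders):

    hadoop distcp hdfs://src-nn:8020/backup/data hdfs://dst-nn:8020/restore/data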

Re: Error connecting to ZooKeeper server

2017-08-16 Thread Michael Chen
Also, the cluster is on AWS. Security group set to allow all inbound and outbound traffic... Any ideas?... On 08/16/2017 12:37 PM, Michael Chen wrote: Hi, I've run into a ZooKeeper connection error during the execution of a Nutch hadoop job. The tasks stall on connection error to ZooKeeper

Re: Forcing a file to update its length

2017-08-09 Thread Harsh J
> From: Harsh J [mailto:ha...@cloudera.com] > Sent: Wednesday, August 9, 2017 3:01 PM > To: David Robison ; user@hadoop.apache.org > Subject: Re: Forcing a file to update its length > > I don't think it'

RE: Forcing a file to update its length

2017-08-09 Thread David Robison
@hadoop.apache.org Subject: Re: Forcing a file to update its length I don't think it'd be safe for a reader to force an update of length at the replica locations directly. Only the writer would be perfectly aware of the DNs in use for the replicas and their states, and the precise coun

Re: Forcing a file to update its length

2017-08-09 Thread Harsh J
I don't think it'd be safe for a reader to force an update of length at the replica locations directly. Only the writer would be perfectly aware of the DNs in use for the replicas and their states, and the precise count of bytes entirely flushed out of the local buffer. Thereby only the writer is i

Re: Forcing a file to update its length

2017-08-09 Thread Ravi Prakash
Hi David! A FileSystem class is an abstraction for the file system. It doesn't make sense to do an hsync on a file system (should the file system sync all files currently open / just the user's, etc.?). With appropriate flags maybe you can make it make sense, but we don't have that functionality.

Re: modify the MapTask.java but no change

2017-08-07 Thread Ravi Prakash
09 PM, duanyu teng wrote: > Hi, > > I modified the MapTask.java file in order to output more log information. I > re-compiled the file and deployed the jar to the whole cluster, but I found > that the output log has not changed; I don't know why. >

Re: modify the MapTask.java but no change

2017-08-07 Thread Edwina Lu
modify the MapTask.java but no change Hi, I modified the MapTask.java file in order to output more log information. I re-compiled the file and deployed the jar to the whole cluster, but I found that the output log has not changed; I don't know why.

Re: Hadoop 2.8.0: Use of container-executor.cfg to restrict access to MapReduce jobs

2017-08-07 Thread Varun Vasudev
Hi Kevin, The check that’s carried out is the following (pseudo-code): If (user_id < min_user_id && user_not_in_allowed_system_users) { return “user banned”; } If (user_in_banned_users_list) { return “user banned”; } In your case, you can either bump up the min user id to a higher number and
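A sketch of a container-executor.cfg exercising those two checks (all values are examples only):

    yarn.nodemanager.linux-container-executor.group=hadoop
    # reject any UID below this value unless explicitly allowed
    min.user.id=1000
    allowed.system.users=nobody
    banned.users=hdfs,yarn,mapred,bin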

Re: Kerberised JobHistory Server not starting: User jhs trying to create the /mr-history/done directory

2017-08-06 Thread Kevin Buckley
On 25 July 2017 at 03:21, Erik Krogen wrote: > Hey Kevin, > > Sorry, I missed your point about using auth_to_local. You're right that you > should be able to use that for what you're trying to achieve. I think it's > just that your rule is wrong; I believe it should be: > > RULE:[2:$1@$0](jh
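For reference, a rule of the shape being debugged, mapping a two-component jhs/host@REALM principal to a local user (the realm and target user are assumptions):

    RULE:[2:$1@$0](jhs@EXAMPLE.COM)s/.*/mapred/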

Re: Replication Factor Details

2017-08-02 Thread Ravi Prakash
/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java#L805 to find out how re-replications are ordered. (If you start the Namenode with environment variable "export HADOOP_NAMENODE_OPTS='-Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=1049'

Re: Shuffle buffer size in presence of small partitions

2017-07-31 Thread Robert Schmidtke
Hi all, fyi this is the ticket I opened up: https://issues.apache.org/jira/browse/MAPREDUCE-6923 Thanks in advance! Robert On Mon, Jul 31, 2017 at 10:21 PM, Ravi Prakash wrote: > Hi Robert! > > I'm sorry I do not have a Windows box and probably don't understand the > shuffle process well enoug

RE: No FileSystem for scheme: hdfs when using hadoop-2.8.0 jars

2017-07-31 Thread omprakash
Hi Surendra, Thanks a lot for the help. After adding this jar the error is gone. Regards Om Prakash From: surendra lilhore [mailto:surendra.lilh...@huawei.com] Sent: 31 July 2017 18:25 To: omprakash ; Brahma Reddy Battula ; 'user' Subject: RE: No FileSystem for scheme:

Re: How to write a Job for importing Files from an external Rest API into Hadoop

2017-07-31 Thread Ralph Soika
Hi Ravi, thanks a lot for your response and the code example! I think this will help me a lot to get started. I am glad to see that my idea is not too exotic. I will report whether I can adapt the solution to my problem. Best regards Ralph On 31.07.2017 22:05, Ravi Prakash wrote: Hi Ralph! Alth

Re: Shuffle buffer size in presence of small partitions

2017-07-31 Thread Ravi Prakash
Hi Robert! I'm sorry I do not have a Windows box and probably don't understand the shuffle process well enough. Could you please create a JIRA in the mapreduce proect if you would like this fixed upstream? https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=116&projectKey=MAPREDUCE Th

Re: How to write a Job for importing Files from an external Rest API into Hadoop

2017-07-31 Thread Ravi Prakash
Hi Ralph! Although not totally similar to your use case, DistCp may be the closest thing to what you want. https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCp.java . The client builds a file list, and then submits an MR job to copy ov

RE: No FileSystem for scheme: hdfs when using hadoop-2.8.0 jars

2017-07-31 Thread surendra lilhore
: 31 July 2017 18:10 To: Brahma Reddy Battula; 'user' Subject: RE: No FileSystem for scheme: hdfs when using hadoop-2.8.0 jars Hi, I am executing the client from Eclipse on my dev machine. The Hadoop cluster is a remote machine. I have added the required jars (including hadoop-hdfs-2.8

RE: No FileSystem for scheme: hdfs when using hadoop-2.8.0 jars

2017-07-31 Thread omprakash
From: Brahma Reddy Battula [mailto:brahmareddy.batt...@huawei.com] Sent: 31 July 2017 16:15 To: omprakash ; 'user' Subject: RE: No FileSystem for scheme: hdfs when using hadoop-2.8.0 jars Looks jar(hadoop-hdfs-2.8.0.jar) is missing in the classpath.Please check the client

RE: No FileSystem for scheme: hdfs when using hadoop-2.8.0 jars

2017-07-31 Thread Brahma Reddy Battula
It looks like the jar (hadoop-hdfs-2.8.0.jar) is missing from the classpath; please check the client classpath. Maybe there are no permissions, or this jar was missed while copying? Reference: org.apache.hadoop.fs.FileSystem#getFileSystemClass if (clazz == null) { throw new UnsupportedFileSystemExcepti
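One quick way to confirm whether the hdfs jar is actually on the client classpath:

    hadoop classpath | tr ':' '\n' | grep hadoop-hdfs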

Re: DR for Data Lake

2017-07-29 Thread daemeon reiydelle
Determine what is meant by "disaster recovery": what are the scenarios, what data? Architect to the business need, not the buzz words. “Anyone who isn’t embarrassed by who they were last year probably isn’t learning enough.” - Alain de Botton Daemeon C.M. Reiydelle, USA (+1) 415.501.0198, Londo

Re: YARN - level/depth of monitoring info - newbie question

2017-07-28 Thread Naganarasimha Garla
Hi Rajila, Sorry for the delayed reply, You can refer to http://hadoop.apache.org/docs/r3.0.0-alpha4/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html#Counters or more detailed info is available in the book* "Hadoop- The Definitive Guide, 4th Edition" -> Chapter 9 MapRedu

Re: MapReduce and Spark jobs not starting

2017-07-28 Thread Ravi Prakash
Hi Nishant! You should be able to look at the datanode and nodemanager log files to find out why they died after you ran the 76 mappers. It is extremely unusual (I haven't heard of a verified case in over 4-5 years) for a job to kill nodemanagers unless your cluster is configured poorly. Which con

Re: Hadoop upgrade from 2.5.1 to 2.7.3 (Running Hbase 1.2.5)

2017-07-28 Thread Sean Busbey
Take a look at the 2.7.3 docs on rolling upgrade for HDFS: http://hadoop.apache.org/docs/r2.7.3/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html I don't think there's similar existing docs for YARN, but your cluster description sounds like you're only using HDFS anyways. On Fri, Jul 28, 2
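The HDFS side of such an upgrade is driven by the rollingUpgrade subcommands, roughly (see the linked document for the full NameNode/DataNode sequencing):

    hdfs dfsadmin -rollingUpgrade prepare
    hdfs dfsadmin -rollingUpgrade query
    hdfs dfsadmin -rollingUpgrade finalize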

Re: how to get info about which data in hdfs or file system that a MapReduce job visits?

2017-07-27 Thread Ravi Prakash
Hi Jaxon! MapReduce is just an application (one of many including Tez, Spark, Slider etc.) that runs on Yarn. Each YARN application decides to log whatever it wants. For MapReduce, https://github.com/apache/hadoop/blob/27a1a5fde94d4d7ea0ed172635c146d594413781/hadoop-mapreduce-project/hadoop-mapred

Re: Lots of Exception for "cannot assign requested address" in datanode logs

2017-07-27 Thread Ravi Prakash
ng blocks > on DN2. > > > > Can this be related to properties I added for increasing replication rate? > > > > Regards > > Om Prakash > > > > *From:* Ravi Prakash [mailto:ravihad...@gmail.com] > *Sent:* 27 July 2017 01:26 > *To:* omprakash > *Cc:* user

Re: Install a hadoop cluster manager for open source hadoop 2.7.3

2017-07-27 Thread Billy Watson
Nishant, Sorry about the late reply. You may want to check out https://ambari.apache.org/mail-lists.html to see if the Ambari user list can answer your question better. William Watson Lead Software Engineer J.D. Power O2O http://www.jdpower.com/data-and-analytics/media-and-marketing-solutions-o2o

Re: How to use webhdfs CONCAT?

2017-07-27 Thread Wellington Chevreuil
Yes, all the files passed must pre-exist. In this case, you would need to run something as follows: curl -i -X POST "http://HOST/webhdfs/v1/PATH_TO_YOUR_HDFS_FOLDER/part-01-00-000?user.name=hadoop&op=CONCAT&sources=PATH_TO_YOUR_HDFS_FOLDER/part-02-00-000,PATH_TO_YOUR_HDFS_FOLDER/part-04-

Re: How to use webhdfs CONCAT?

2017-07-27 Thread Cinyoung Hur
Hi, Wellington All the source parts are: -rw-r--r-- hadoop supergroup 2.43 KB 2 32 MB part-01-00-000 -rw-r--r-- hadoop supergroup 21.14 MB 2 32 MB part-02-00-000 -rw-r--r-- hadoop supergroup 22.1 MB 2 32 MB part-04-00-000 -rw-r--r-- hadoop supergroup 22.29 MB 2 32 MB part-05-00-00

RE: Lots of Exception for "cannot assign requested address" in datanode logs

2017-07-26 Thread omprakash
[mailto:ravihad...@gmail.com] Sent: 27 July 2017 01:26 To: omprakash Cc: user Subject: Re: Lots of Exception for "cannot assign requested address" in datanode logs Hi Omprakash! DatanodeRegistration happens when the Datanode first hearbeats to the Namenode. In your case, it

Re: Lots of Exception for "cannot assign requested address" in datanode logs

2017-07-26 Thread Ravi Prakash
Hi Omprakash! DatanodeRegistration happens when the Datanode first hearbeats to the Namenode. In your case, it seems some other application has acquired the port 50010 . You can check this with the command "netstat -anp | grep 50010" . Are you trying to run 2 datanode processes on the same machine

Re: YARN - level/depth of monitoring info - newbie question

2017-07-25 Thread rajila2008 .
Thank you Naga & Sunil. Naga, I would like to know more about the counters; are they a cluster-wide resource managed at a central location, so they can be tracked/verified later? Please advise. Thanks, Rajila On Tue, Jul 25, 2017 at 7:01 PM, Naganarasimha Garla < naganarasimha...@apache.org>

Re: YARN - level/depth of monitoring info - newbie question

2017-07-25 Thread Naganarasimha Garla
Hi Rajila, One option you can think of is using custom "counters" and having logic to increment them whenever you insert or have any custom logic. These counters can be got from the MR interfaces and even in the web UI after the job has finished. Regards, + Naga On Tue, Jul 25

Re: How to use webhdfs CONCAT?

2017-07-25 Thread Wellington Chevreuil
Hi Cinyoung, Concat has some restrictions, like the need for src file having last block size to be the same as the configured dfs.block.size. If all the conditions are met, below command example should work (where we are concatenating /user/root/file-2 into /user/root/file-1): curl -i -X POST
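A complete sketch of the call the example above is cut off on (host and paths are placeholders; the source files are appended to the target and then removed):

    curl -i -X POST 'http://<namenode>:50070/webhdfs/v1/user/root/file-1?op=CONCAT&sources=/user/root/file-2'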
