Re: HDFS HA namenode issue

2023-10-03 Thread Susheel Kumar Gadalay
name.dir) configured. Beware of data loss due to lack of > redundant storage directories! > (org.apache.hadoop.hdfs.server.namenode.FSNamesystem) > > I am using a journal node, so I am not clear if I am supposed to have > multiple dfs.namenode.name.dir directories > I thought each nam

Re: HDFS HA namenode issue

2023-10-03 Thread Susheel Kumar Gadalay
Settings in core-site.xml will be overridden by hdfs-site.xml, mapred-site.xml and yarn-site.xml. It used to work that way, but I don't know if it has changed now. Look at your shared.edits.dir configuration; you have not set it correctly across the name nodes. Regards On Tue, 3 Oct 2023, 1:59 pm
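For reference, a minimal hdfs-site.xml sketch of that setting, assuming a QJM quorum on hosts journal1/journal2/journal3 and a nameservice called mycluster (all hypothetical names); the same value must appear on both name nodes:

    <property>
      <name>dfs.namenode.shared.edits.dir</name>
      <value>qjournal://journal1:8485;journal2:8485;journal3:8485/mycluster</value>
    </property>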

Re: Compare hadoop and ytsaurus

2023-09-29 Thread Susheel Kumar Gadalay
Why still invest in these old technologies? Any reasons other than being unable to migrate to the cloud because of non-availability and data residency requirements? How good is Hadoop data compatibility (Parquet and HBase data), code compatibility of UDFs, metastore migration etc.? Thanks Susheel

Re: Hadoop Community Sync Up Schedule

2019-08-23 Thread Susheel Kumar Gadalay
Please remove user@hadoop from this mail thread. It is specific to the dev team only. Thanks On Friday, August 23, 2019, Wangda Tan wrote: > Sounds good, let me make the changes to do simply bi-weekly then. > I will update it tonight if possible. > > Best, > Wangda > On Fri, Aug 23, 2019 at 1:50 AM

Re: Hadoop Community Sync Up Schedule

2019-08-23 Thread Susheel Kumar Gadalay
Please remove user@hadoop from this mail thread. It is specific to the dev team only. Thanks SK On Friday, August 23, 2019, Wangda Tan wrote: > Sounds good, let me make the changes to do simply bi-weekly then. > I will update it tonight if possible. > > Best, > Wangda > On Fri, Aug 23, 2019 at 1:50

Re: Mapreduce to and from public clouds

2019-06-14 Thread Susheel Kumar Gadalay
In GCP the equivalent of HDFS is Google Cloud Storage. You have to change the URL from hdfs:// to gs://. The MapReduce APIs will work as-is with this change. You run MapReduce jobs on a Google Dataproc instance; your storage is in a Google Cloud Storage bucket. Refer to the GCP documents. On Friday,
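A sketch of the URL change only (the bucket name and job name are hypothetical); the rest of the driver stays the same:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class GcsPaths {
      public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "gcs-example");
        // Same MapReduce code as before; only the filesystem scheme changes
        // from hdfs:// to gs:// (served by the GCS connector on Dataproc).
        FileInputFormat.addInputPath(job, new Path("gs://my-bucket/data/input"));
        FileOutputFormat.setOutputPath(job, new Path("gs://my-bucket/data/output"));
      }
    }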

Re: Avoiding using hostname for YARN nodemanagers

2017-12-05 Thread Susheel Kumar Gadalay
Check the properties yarn.nodemanager.hostname and yarn.resourcemanager.hostname in yarn-site.xml. On 12/5/17, Alvaro Brandon wrote: > Thanks for your answer Vinay: > > The thing is that I'm using Marathon and not the Docker engine per se. I > don't want to set a -h
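A minimal yarn-site.xml sketch, assuming you want the daemons to bind to a specific address (10.0.0.5 is a placeholder):

    <property>
      <name>yarn.resourcemanager.hostname</name>
      <value>10.0.0.5</value>
    </property>
    <property>
      <name>yarn.nodemanager.hostname</name>
      <value>10.0.0.5</value>
    </property>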

Re: Hadoop DR setup for HDP 2.6

2017-11-16 Thread Susheel Kumar Gadalay
1.0.0/bk_dlm-administration/content/dlm_terminology.html > ) > > On Wed, Nov 15, 2017 at 10:43 PM, Susheel Kumar Gadalay > <skgada...@gmail.com >> wrote: > >> Hi, >> >> We have to setup DR for production Hadoop environment based on HDP 2.6. >> >> Ca

Hadoop DR setup for HDP 2.6

2017-11-15 Thread Susheel Kumar Gadalay
Hi, We have to set up DR for a production Hadoop environment based on HDP 2.6. Can someone share detailed setup instructions and best practices? Thanks SKG

Re: Just switched to yarn/Hadoop 2.6.0 - help with logs please!

2015-11-25 Thread Susheel Kumar Gadalay
Use port 8088, i.e. <resourcemanager-host>:8088. On 11/25/15, Tony Burton wrote: > Hi, > > After a long time using Hadoop 1.x, I've recently switched to Hadoop 2.6.0. > I've got a MapReduce program running, but I want to see the logs and debug > info that I used to be able to view via the

Re: App Master takes ~30min to re-schedule task attempts.

2015-08-20 Thread Susheel Kumar Gadalay
Change mapreduce.reduce.shuffle.connect.timeout and mapreduce.reduce.shuffle.read.timeout. By default they are 180000 (milliseconds). On 8/20/15, manoj manojm@gmail.com wrote: Hello all, I'm running Apache 2.6.0. I'm trying to remove a node from a Hadoop cluster and then add it back. The taskattempts on the
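A mapred-site.xml sketch lowering both timeouts (values are in milliseconds; 60000 is an arbitrary example, not a recommendation):

    <property>
      <name>mapreduce.reduce.shuffle.connect.timeout</name>
      <value>60000</value>
    </property>
    <property>
      <name>mapreduce.reduce.shuffle.read.timeout</name>
      <value>60000</value>
    </property>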

Re: Connection Refused error on Hadoop2.6 running Ubuntu 15.04 Desktop on Pseudo-distributed mode

2015-04-27 Thread Susheel Kumar Gadalay
The jps listing is not showing the namenode daemon. Check the logs to find out why the namenode is not up. On 4/27/15, Anand Murali anand_vi...@yahoo.com wrote: Dear All: Please find below. and_vihar@Latitude-E5540:~/hadoop-2.6.0/sbin$ start-dfs.sh Starting namenodes on [localhost] localhost: starting

Re: How I list files in HDFS?

2015-02-05 Thread Susheel Kumar Gadalay
You can do like this. Configuration conf = getConf(); FileSystem fs = FileSystem.get(conf); FileStatus[] fstatus = fs.listStatus(new Path(...)); String generatedFile; for (int i=0; i < fstatus.length; i++) { generatedFile = fstatus[i].getPath().getName();
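A self-contained sketch of the same approach (the /user/demo/output path is a placeholder):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ListHdfsFiles {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // List every entry directly under the given directory.
        FileStatus[] fstatus = fs.listStatus(new Path("/user/demo/output"));
        for (FileStatus status : fstatus) {
          System.out.println(status.getPath().getName());
        }
      }
    }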

Re: Question about shuffle/merge/sort phrase

2014-12-21 Thread Susheel Kumar Gadalay
It is the mapper which will push the o/p to the respective reducer as soon as it completes. The number of reducers is known at the beginning itself. The mapper, as it processes the input split, generates the o/p for each reducer (if the mapper o/p key is eligible for that reducer). The reducer will

Re: Question about shuffle/merge/sort phrase

2014-12-21 Thread Susheel Kumar Gadalay
Sorry, typo. It is the reducer which will pull the mapper o/p as soon as it completes. On 12/22/14, Susheel Kumar Gadalay skgada...@gmail.com wrote: It is the mapper which will push the o/p to the respective reducer as soon as it completes. The number of reducers is known at the beginning itself

Re: Re: Question about shuffle/merge/sort phrase

2014-12-21 Thread Susheel Kumar Gadalay
the mapper nodes before reducer see the key,value1,value2..? bit1...@163.com From: Susheel Kumar Gadalay Date: 2014-12-22 13:20 To: user Subject: Re: Question about shuffle/merge/sort phrase Sorry, typo It is the reducer which will pull the mapper o/p as soon as it completes. On 12/22/14

Re: How to let namenode http web UI port listen on another interface?

2014-12-19 Thread Susheel Kumar Gadalay
Try giving this property in hdfs-site.xml dfs.namenode.http-address=0.0.0.0:50070 Replace 0.0.0.0 with your other network interface. On 12/18/14, Tao Xiao xiaotao.cs@gmail.com wrote: I installed HDFS (CDH 5.2.1) on a host with two interfaces - eth0 (10.252.25.68) and eth1 (120.27.43.80).
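As a sketch, with 120.27.43.80 standing in for the second interface from the question:

    <property>
      <name>dfs.namenode.http-address</name>
      <value>120.27.43.80:50070</value>
    </property>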

Re: Error while creating hadoop package 2.6.0 with Maven 3.2.3

2014-12-17 Thread Susheel Kumar Gadalay
Install git and add git\bin to the PATH environment variable. Most of the Unix/Linux commands are available in git\bin, and from the Windows command prompt you can use common Unix shell commands: ls -l, rm, grep... On 12/18/14, Venkat Ramakrishnan venkat.archit...@gmail.com wrote: Hi Arpit, Thanks for

Re: Hadoop 2.6.0: FileSystem file:/// is not a distributed file system

2014-12-15 Thread Susheel Kumar Gadalay
Give the complete hostname with the domain name, not just master-node: <property> <name>fs.defaultFS</name> <value>hdfs://master-node.domain.name:9000</value> </property> Or else give the IP address. On 12/16/14, Dan Dong dongda...@gmail.com wrote: Hi, Johny, Yes, they have been turned off from the beginning.

Re: Where the output of mappers are saved ?

2014-12-15 Thread Susheel Kumar Gadalay
Map outputs will be in HDFS under your user name and output directory. They will have names like part-m-00000, part-m-00001. On 12/16/14, Abdul Navaz navaz@gmail.com wrote: Hello, Second try! I have created a directory to store this mapper output as below. property

Re: Re: Where the output of mappers are saved ?

2014-12-15 Thread Susheel Kumar Gadalay
say, 2G, will the file be split into more files under the output directory, that is, could one reducer produce more than one file? bit1...@163.com From: Susheel Kumar Gadalay Date: 2014-12-16 14:17 To: user Subject: Re: Re: Where the output of mappers are saved ? Yes, the map outputs

Re: Split files into 80% and 20% for building model and prediction

2014-12-12 Thread Susheel Kumar Gadalay
Simple solution: copy the HDFS file to local and use OS commands to count the number of lines (cat file1 | wc -l) and cut it based on line number. On 12/12/14, unmesha sreeveni unmeshab...@gmail.com wrote: I am trying to divide my HDFS file into 2 parts/files of 80% and 20% for classification
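A shell sketch of that recipe (paths and file names hypothetical): pull the file to local, count the lines, then split at the 80% boundary with head and tail:

    hdfs dfs -get /user/demo/file1 file1
    total=$(wc -l < file1)
    train=$((total * 80 / 100))
    # First 80% of lines for training, the remainder for testing.
    head -n "$train" file1 > train.txt
    tail -n +"$((train + 1))" file1 > test.txt
    hdfs dfs -put train.txt test.txt /user/demo/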

Re: MR job fails with too many mappers

2014-11-19 Thread Susheel Kumar Gadalay
In which case does the split metadata go beyond 10MB? Can you give some details of your input file and splits? On 11/19/14, francexo83 francex...@gmail.com wrote: Thank you very much for your suggestion, it was very helpful. This is what I have after turning off log aggregation: 2014-11-18
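The 10MB cap comes from mapreduce.job.split.metainfo.maxsize (default 10000000 bytes); a sketch raising the limit, or set -1 to disable the check:

    <property>
      <name>mapreduce.job.split.metainfo.maxsize</name>
      <value>-1</value>
    </property>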

Re: hadoop 2.4 using Protobuf - How does downgrade back to 2.3 works ?

2014-10-21 Thread Susheel Kumar Gadalay
New files added in 2.4.0 will not be there in the metadata of 2.3.0. You need to add them once again. On 10/21/14, Manoj Samel manojsamelt...@gmail.com wrote: Is the pre-upgrade metadata also kept updated with any changes done in 2.4.0? Or is it just the 2.3.0 snapshot preserved? Thanks, On Sat,

Re: Hadoop UI - Unable to connect to the application master from the Hadoop UI.

2014-09-30 Thread Susheel Kumar Gadalay
, 2014 at 12:44 AM, Susheel Kumar Gadalay skgada...@gmail.com wrote: I also faced an issue like this. It shows the URL as host name:port. Copy-paste the link into the browser and expand the host name. I set up the host names in the Windows etc/hosts file but still it could not resolve. On 9/29

Hadoop shutdown scripts failing

2014-09-29 Thread Susheel Kumar Gadalay
How do I redirect the storing of the following files from /tmp to some other location? hadoop-<os user>-namenode.pid hadoop-<os user>-datanode.pid yarn-<os user>-resourcemanager.pid yarn-<os user>-nodemanager.pid In /tmp, these files were cleared by the OS some time back and I am unable to shut down by standard

Re: Hadoop shutdown scripts failing

2014-09-29 Thread Susheel Kumar Gadalay
to some other directory (most common is /var/run/hadoop-hdfs and hadoop-yarn). Hope it helps, Aitor On 29 September 2014 07:50, Susheel Kumar Gadalay skgada...@gmail.com wrote: How do I redirect the storing of the following files from /tmp to some other location? hadoop-<os user>-namenode.pid hadoop-<os

Re: No space when running a hadoop job

2014-09-29 Thread Susheel Kumar Gadalay
particular case, the balancer won't fix your issue. Hope it helps, Aitor On 29 September 2014 05:53, Susheel Kumar Gadalay skgada...@gmail.com wrote: You mean if multiple directory locations are given, Hadoop will balance the distribution of files across these different directories

Re: Hadoop shutdown scripts failing

2014-09-29 Thread Susheel Kumar Gadalay
Thanks On 9/29/14, Aitor Cedres aced...@pivotal.io wrote: Check the file $HADOOP_HOME/bin/yarn-daemon.sh; there is a reference to YARN_PID_DIR. If it's not set, it will default to /tmp. On 29 September 2014 13:11, Susheel Kumar Gadalay skgada...@gmail.com wrote: Thanks Aitor
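A sketch of the env-var route, using the /var/run locations from Aitor's reply as example values; set these before the daemons start (e.g. in hadoop-env.sh and yarn-env.sh):

    # PID files for the HDFS daemons (read by hadoop-daemon.sh)
    export HADOOP_PID_DIR=/var/run/hadoop-hdfs
    # PID files for the YARN daemons (read by yarn-daemon.sh)
    export YARN_PID_DIR=/var/run/hadoop-yarn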

Re: No space when running a hadoop job

2014-09-28 Thread Susheel Kumar Gadalay
of files? One way is to list the files and move them. Will the balancer script work? On 9/27/14, Alexander Pivovarov apivova...@gmail.com wrote: It can read/write in parallel to all drives. More HDDs, more IO speed. On Sep 27, 2014 7:28 AM, Susheel Kumar Gadalay skgada...@gmail.com wrote: Correct

Re: No space when running a hadoop job

2014-09-27 Thread Susheel Kumar Gadalay
Correct me if I am wrong. Adding multiple directories will not balance the file distribution across these locations. Hadoop will exhaust the first directory and then start using the next, and the next... How can I tell Hadoop to evenly balance across these directories? On 9/26/14, Matt Narrell

Re: FileUtil Copy example

2014-09-25 Thread Susheel Kumar Gadalay
().length() >= 10) On 25 Sep 2014, at 07:14, Susheel Kumar Gadalay skgada...@gmail.com wrote: I solved it like this. This will move a file from one location to another location. FileStatus[] fstatus = FileSystem.listStatus(new Path("Old HDFS directory")); for (int i=0; i < fstatus.length; i

Re: FileUtil Copy example

2014-09-24 Thread Susheel Kumar Gadalay
(some name)) FileSystem.rename(new Path("Old HDFS Directory" + "/" + fstatus[i].getPath().getName()), new Path("New HDFS Directory" + "/" + fstatus[i].getPath().getName())); } } I don't think an API to copy an individual file is available. On 9/23/14, Susheel Kumar Gadalay skgada...@gmail.com wrote: Can

FileUtil Copy example

2014-09-23 Thread Susheel Kumar Gadalay
Can somebody give a good example of using the Hadoop FileUtil API to copy a set of files from one directory to another directory? I want to copy only a set of files, not all files in the directory, and I also want to use wildcards like part_r_*. Thanks Susheel Kumar
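For what it's worth, FileUtil.copy does take a single source file; combined with globStatus it covers the wildcard case. A sketch, with both directory paths hypothetical:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FileUtil;
    import org.apache.hadoop.fs.Path;

    public class CopyMatchingFiles {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path dst = new Path("/user/demo/target");
        // Expand the wildcard, then copy each match; false = keep the source.
        for (FileStatus status : fs.globStatus(new Path("/user/demo/source/part_r_*"))) {
          FileUtil.copy(fs, status.getPath(), fs, dst, false, conf);
        }
      }
    }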

Re: Failed to rollback from hadoop-2.4.1 to hadoop 2.2.0

2014-09-18 Thread Susheel Kumar Gadalay
You have to upgrade both the name node and the data node. Better to issue start-dfs.sh -upgrade. Check whether the current and previous directories are present in both the dfs.namenode.name.dir and dfs.datanode.data.dir directories. On 9/18/14, sam liu samliuhad...@gmail.com wrote: Hi Expert, Below are my steps

Re: Failed to rollback from hadoop-2.4.1 to hadoop 2.2.0

2014-09-18 Thread Susheel Kumar Gadalay
comment! I can upgrade from 2.2.0 to 2.4.1 using the command 'start-dfs.sh -upgrade', however it failed to rollback from 2.4.1 to 2.2.0 using the command 'start-dfs.sh -rollback': the namenode always stays in safe mode (awaiting reported blocks (0/315)). Why? 2014-09-18 1:51 GMT-07:00 Susheel Kumar

Re: YARN ResourceManager Hostname in HA configuration

2014-09-17 Thread Susheel Kumar Gadalay
I observed that in a YARN HA cluster you set these properties: yarn.resourcemanager.hostname.rm-id1 and yarn.resourcemanager.hostname.rm-id2, not yarn.resourcemanager.hostname. On 9/17/14, Matt Narrell matt.narr...@gmail.com wrote: How do I configure the “yarn.resourcemanager.hostname” property when in an
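A yarn-site.xml sketch of the HA form, assuming two resource managers with ids rm1 and rm2 (hostnames hypothetical):

    <property>
      <name>yarn.resourcemanager.ha.rm-ids</name>
      <value>rm1,rm2</value>
    </property>
    <property>
      <name>yarn.resourcemanager.hostname.rm1</name>
      <value>master1.example.com</value>
    </property>
    <property>
      <name>yarn.resourcemanager.hostname.rm2</name>
      <value>master2.example.com</value>
    </property>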

Re: Cannot start DataNode after adding new volume

2014-09-16 Thread Susheel Kumar Gadalay
Is it something to do with the current/VERSION file in the data node directory? Just copy it from the existing directory and start. On 9/16/14, Charles Robertson charles.robert...@gmail.com wrote: Hi all, I am running out of space on a data node, so I added a new volume to the host, mounted it and made sure

Re: Cannot start DataNode after adding new volume

2014-09-16 Thread Susheel Kumar Gadalay
to it without changes? I see it has various guids, and so I'm worried about it clashing with the VERSION file in the other data directory. Thanks, Charles On 16 September 2014 10:57, Susheel Kumar Gadalay skgada...@gmail.com wrote: Is it something to do current/VERSION file in data node directory

Re: virtual memory consumption

2014-09-11 Thread Susheel Kumar Gadalay
Your physical memory is 1GB on this node. What are the other containers (map tasks) running on it? You have given map memory as 768M, reduce memory as 1024M and AM as 1024M. With the AM and a single map task it is ~1.7G and it cannot start another container for the reducer. Reduce these values and
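The properties in question, sketched in mapred-site.xml with reduced example values (the exact numbers depend on what the node can hold):

    <property>
      <name>yarn.app.mapreduce.am.resource.mb</name>
      <value>512</value>
    </property>
    <property>
      <name>mapreduce.map.memory.mb</name>
      <value>256</value>
    </property>
    <property>
      <name>mapreduce.reduce.memory.mb</name>
      <value>256</value>
    </property>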

Re: Writing output from streaming task without dealing with key/value

2014-09-10 Thread Susheel Kumar Gadalay
If you don't want the key in the final output, you can set it like this in Java: job.setOutputKeyClass(NullWritable.class); It will just print the value in the output file. I don't know how to do it in Python. On 9/10/14, Dmitry Sivachenko trtrmi...@gmail.com wrote: Hello! Imagine the following common
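A driver-side sketch of that setting (class and job names hypothetical); the reducer should then emit NullWritable.get() as its key:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;

    public class ValueOnlyJob {
      public static Job configure() throws Exception {
        Job job = Job.getInstance(new Configuration(), "value-only-output");
        job.setOutputKeyClass(NullWritable.class); // no key column is written
        job.setOutputValueClass(Text.class);       // each output line is just the value
        return job;
      }
    }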

Re: conf.get(dfs.data.dir) return null when hdfs-site.xml doesn't set it explicitly

2014-09-08 Thread Susheel Kumar Gadalay
One doubt on building the Configuration object. I have a Hadoop remote client and a Hadoop cluster. When a client submits an MR job, the Configuration object is built from the Hadoop cluster node xml files, basically the resource manager node's core-site.xml, mapred-site.xml and yarn-site.xml. Am I

Re: Running job issues

2014-08-27 Thread Susheel Kumar Gadalay
You have to use this command to format: hdfs namenode -format, not hdfs dfs -format. On 8/27/14, Blanca Hernandez blanca.hernan...@willhaben.at wrote: Hi, thanks for your answers. Sorry, I forgot to add it, I couldn't run the command either:

Re: ignoring map task failure

2014-08-18 Thread Susheel Kumar Gadalay
Check the parameter yarn.app.mapreduce.client.max-retries. On 8/18/14, parnab kumar parnab.2...@gmail.com wrote: Hi All, I am running a job with between 1300 and 1400 map tasks. Some map tasks fail due to some error. When 4 such maps fail, the job naturally gets killed. How to

Re: MR AppMaster unable to load native libs

2014-08-12 Thread Susheel Kumar Gadalay
I have also got this message when running 2.4.1. I found the native libraries in $HADOOP_HOME/lib/native are 32-bit, not 64-bit. Recompile once again and build 64-bit shared objects, but it is a lengthy exercise. On 8/13/14, Subroto Sanyal ssan...@datameer.com wrote: Hi, I am running a
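The rebuild being described is the native profile build from the source tree, roughly as below (assumes a 64-bit JDK plus Maven, protobuf and cmake on the build host):

    # run from the top of the Hadoop source tree
    mvn package -Pdist,native -DskipTests -Dtar
    # the rebuilt 64-bit .so files land under hadoop-dist/target/hadoop-<version>/lib/native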

How to restrict ephemeral ports used by Yarn App Master

2014-08-10 Thread Susheel Kumar Gadalay
Hi, I have a question: how do I selectively open a port range for the Hadoop YARN App Master on a cluster? I have seen the jira issue in http://mail-archives.apache.org/mod_mbox/hadoop-mapreduce-issues/201204.mbox/%3c74835698.75.1335357881103.javamail.tom...@hel.zones.apache.org%3E fixed in version
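The fix referenced in that JIRA added a client port-range setting for the MR App Master; a sketch (the range value is only an example):

    <property>
      <name>yarn.app.mapreduce.am.job.client.port-range</name>
      <value>50100-50200</value>
    </property>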

How to run Job tracker/Task tracker in 2.2.0 and later

2014-08-10 Thread Susheel Kumar Gadalay
Hi, I am using Hadoop version 2.2.0 and cannot find start-mapred.sh in the sbin directory. How do I start the Job tracker and Task tracker? I tried the version 1 way of directly executing it but am getting these errors. [hadoop@ip-10-147-128-12 ~]$ sbin/hadoop-daemon.sh start jobtracker starting

[no subject]

2014-08-08 Thread Susheel Kumar Gadalay
Hi, I have a question: how do I selectively open a port range for the Hadoop YARN App Master on a cluster? I have seen the jira issue in http://mail-archives.apache.org/mod_mbox/hadoop-mapreduce-issues/201204.mbox/%3c74835698.75.1335357881103.javamail.tom...@hel.zones.apache.org%3E fixed in version