Re: Securing secrets for S3 FileSystems in DistCp

2016-05-03 Thread Shekhar Sharma
Have you used IAM (Identity and Access Management) roles? On 3 May 2016 18:11, "Elliot West" wrote: > Hello, > > We're currently using DistCp and S3 FileSystems to move data from a > vanilla Apache Hadoop cluster to S3. We've been concerned about exposing > our AWS secrets on our

Re: Multinode setup..

2015-01-03 Thread Shekhar Sharma
Have you done the following: (1) Edit the masters and slaves files (2) Edit mapred-site.xml and core-site.xml (3) Delete the hadoop temp folder (the folder specified by dfs.data.dir and dfs.name.dir) (4) Format the namenode (hadoop namenode -format) (5) Start the cluster Regards, Som Shekhar Sharma

Re: Map-Reduce: How to make MR output one file an hour?

2014-03-01 Thread Shekhar Sharma
Don't you think using Flume would be easier? Use the HDFS sink and use a property to roll the log file every hour. This way you use a single Flume agent to receive logs as they are generated, and you will be dumping them directly to HDFS. If you want to remove unwanted logs you can write

Re: Mappers vs. Map tasks

2014-02-26 Thread Shekhar Sharma
Writing a custom input format would be much easier and you would have better control. You might be tempted to use the Jackson lib to process the JSON, but that requires that you know your JSON data, and this assumption would break if the format of the data changes. I would suggest writing a custom

Re: Reg:Hive query with mapreduce

2014-02-20 Thread Shekhar Sharma
of map tasks in this case, and if you want to have a single output file then (1) Set mapred.min.split.size equal to the file size or some bigger value like Long.MAX_VALUE. It will spawn only one mapper task and you will get one output file. Regards, Som Shekhar Sharma +91-8197243810 On Thu, Feb 20

Re: XML to TEXT

2014-02-12 Thread Shekhar Sharma
Which input format are you using? Use an XML input format. On 3 Jan 2014 10:47, Ranjini Rathinam ranjinibe...@gmail.com wrote: Hi, Need to convert XML into text using mapreduce. I have used DOM and SAX parsers. After using SAX Builder in the mapper class, the child node acts as the root element.

Re: A hadoop command to determine the replication factor of a hdfs file ?

2014-02-08 Thread Shekhar Sharma
Run the fsck command. On 8 Feb 2014 23:03, Raj Hadoop hadoop...@yahoo.com wrote: Hi, Is there a hadoop command to determine the replication factor of an HDFS file? Please advise. I know that fs -setrep only changes the replication factor. Regards, Raj

commissioning and decommissioning a task tracker

2014-02-02 Thread Shekhar Sharma
to run the task tracker on a machine which is not in the include file. I have tried the new property mapred.jobtracker.hosts.file as well, but it didn't work either. Please advise me. Regards, Som Shekhar Sharma +91-8197243810

Commissioning Task tracker

2014-01-27 Thread Shekhar Sharma
this property is not recognized by the job tracker. Please advise me on how to commission a task tracker. Regards, Som Shekhar Sharma +91-8197243810

Re: Commissioning Task tracker

2014-01-27 Thread Shekhar Sharma
You mean the property name? Regards, Som Shekhar Sharma +91-8197243810 On Mon, Jan 27, 2014 at 9:14 PM, Nitin Pawar nitinpawar...@gmail.com wrote: I think the file name is mapred.include On Mon, Jan 27, 2014 at 9:11 PM, Shekhar Sharma shekhar2...@gmail.com wrote: Hello, I am using apache hadoop

Re: Commissioning Task tracker

2014-01-27 Thread Shekhar Sharma
/configuration My include file has the entry for the machine... Regards, Som Shekhar Sharma +91-8197243810 On Mon, Jan 27, 2014 at 9:28 PM, Nitin Pawar nitinpawar...@gmail.com wrote: Sorry for the incomplete reply. In Hadoop 1.2/1.0, the following is the property: property name mapred.hosts

Re: HDFS data transfer is faster than SCP based transfer?

2014-01-25 Thread Shekhar Sharma
When you put or write data into HDFS, 64 KB of data is written on the client side and then pushed through the pipeline, and this process continues until 64 MB of data is written, which is the block size defined by the client. On the other hand, scp will try to buffer the entire data. Passing

Re: hadoop report has corrupt block but i can not find any in block metadata

2014-01-25 Thread Shekhar Sharma
Run the fsck command: hadoop fsck <path> -files -blocks -locations On 25 Jan 2014 08:04, ch huang justlo...@gmail.com wrote: hi, maillist: this morning nagios alerted that hadoop has a corrupt block; i checked it using hdfs dfsadmin -report, and from its output it did have corrupt blocks

Re: HDFS data transfer is faster than SCP based transfer?

2014-01-25 Thread Shekhar Sharma
We have the concept of short-circuit reads, which read directly from the datanode and improve read performance. Do we have a similar concept, like short-circuit writes? On 25 Jan 2014 16:10, Harsh J ha...@cloudera.com wrote: There's a lot of difference here, although both do use TCP underneath, but

Re: Datanode Shutting down automatically

2014-01-24 Thread Shekhar Sharma
Incompatible namespaceID error: it's because you might have formatted the namenode while the datanode's folder still has the old ID. What are the values of the following properties: dfs.data.dir, dfs.name.dir, hadoop.tmp.dir? The value of these properties is a directory on the local file system. Solution
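For reference, a minimal sketch of where those properties live, assuming Hadoop 1.x hdfs-site.xml names; the paths below are placeholder examples, not values from this thread:

```xml
<!-- hdfs-site.xml (paths are placeholder examples) -->
<property>
  <name>dfs.name.dir</name>
  <value>/var/hadoop/dfs/name</value>
</property>
<property>
  <name>dfs.data.dir</name>
  <value>/var/hadoop/dfs/data</value>
</property>
```

The usual fix for a namespaceID mismatch is to clear the dfs.data.dir contents on the affected datanode (losing that node's block replicas) and restart it, so it re-registers with the freshly formatted namespace.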

Re: Set number of mappers

2014-01-21 Thread Shekhar Sharma
The number of map tasks is determined by the number of input splits; you can change the number of map tasks by changing the input split size. But you can set the number of reducer tasks explicitly. On 21 Jan 2014 20:25, xeon xeonmailingl...@gmail.com wrote: Hi, I want to set the number of map tasks in the Wordcount

Re: DataNode not starting in slave machine

2013-12-25 Thread Shekhar Sharma
It is running on the local file system (file:///). Regards, Som Shekhar Sharma +91-8197243810 On Wed, Dec 25, 2013 at 7:01 PM, Vishnu Viswanath vishnu.viswanat...@gmail.com wrote: Hi, I am getting this error while starting the datanode in my slave system. I read the JIRA HDFS-2515, it says

Re: Hadoop-MapReduce

2013-12-17 Thread Shekhar Sharma
of org.apache.hadoop.mapreduce.lib Check out XmlInputFormat.java and which package of FileInputFormat they have used... Regards, Som Shekhar Sharma +91-8197243810 On Tue, Dec 17, 2013 at 12:55 PM, Ranjini Rathinam ranjinibe...@gmail.com wrote: Hi, I am using hadoop 0.20 version. In that, while

Re: XmlInputFormat Hadoop -Mapreduce

2013-12-17 Thread Shekhar Sharma
problem. In case you face any difficulty, please feel free to contact me. Regards, Som Shekhar Sharma +91-8197243810 On Tue, Dec 17, 2013 at 5:42 PM, Ranjini Rathinam ranjinibe...@gmail.com wrote: Hi, I have attached the code. Please verify

Re: Estimating the time of my hadoop jobs

2013-12-17 Thread Shekhar Sharma
cluster. You will surely see a performance improvement. Regards, Som Shekhar Sharma +91-8197243810 On Tue, Dec 17, 2013 at 6:42 PM, Devin Suiter RDX dsui...@rdx.com wrote: Nikhil, One of the problems you run into with Hadoop in virtual machine environments is performance issues when

Re: How to set hadoop.tmp.dir if I have multiple disks per node?

2013-12-16 Thread Shekhar Sharma
subfolders dfs and mapred. The /home/training/hadoop/mapred folder will also be on HDFS. Hope this clears it up. Regards, Som Shekhar Sharma +91-8197243810 On Mon, Dec 16, 2013 at 1:42 PM, Dieter De Witte drdwi...@gmail.com wrote: Hi, Make sure to also set mapred.local.dir to the same set of output

Re: issue when using HDFS

2013-12-16 Thread Shekhar Sharma
Seems like the DataNode is not running or has died. Regards, Som Shekhar Sharma +91-8197243810 On Mon, Dec 16, 2013 at 1:40 PM, Geelong Yao geelong...@gmail.com wrote: Hi Everyone, After I upgraded hadoop to CDH 4.2.0 (Hadoop 2.0.0), I tried running some tests. When I try to upload a file to HDFS

Re: External db

2013-12-15 Thread Shekhar Sharma
HDFS is batch-oriented, suitable for OLAP, not for OLTP. OLTP requires that data can be updated and deleted, basically CRUD operations. But HDFS doesn't support random writes, so it is not suitable for OLTP. And that's where the NoSQL databases come in. Regards, Som Shekhar Sharma +91-8197243810 On Sun

Re: Hadoop-MapReduce

2013-12-09 Thread Shekhar Sharma
First option: Put the jar in the $HADOOP_HOME/lib folder and then run the hadoop classpath command on your terminal to check whether the jar has been added. Second option: Put the jar path in the HADOOP_CLASSPATH variable (hadoop-env.sh file) and restart your cluster. Regards, Som Shekhar Sharma +91

Re: Hadoop-MapReduce

2013-12-09 Thread Shekhar Sharma
, now you can use an XML parser to parse the contents into a Java object and form your own key-value pairs (custom key and custom value). Hope you have enough pointers to write the code. Regards, Som Shekhar Sharma +91-8197243810 On Mon, Dec 9, 2013 at 6:30 PM, Ranjini Rathinam ranjinibe

Re: Uploading a file to HDFS

2013-09-26 Thread Shekhar Sharma
then it will again contact the NN via a heartbeat signal, and this process continues... Check how writing happens in HDFS. Regards, Som Shekhar Sharma +91-8197243810 On Thu, Sep 26, 2013 at 3:41 PM, Karim Awara karim.aw...@kaust.edu.sa wrote: Hi, I have a couple of questions about

Re: 2 Map tasks running for a small input file

2013-09-26 Thread Shekhar Sharma
mapred.min.split.size? Have you changed it to something else, or is it the default, which is 1? Regards, Som Shekhar Sharma +91-8197243810 On Thu, Sep 26, 2013 at 4:39 PM, Viji R v...@cloudera.com wrote: Hi, The default number of map tasks is 2. You can set mapred.map.tasks to 1 to avoid this. Regards, Viji

Re: 2 Map tasks running for a small input file

2013-09-26 Thread Shekhar Sharma
(mapred.min.split.size, mapred.max.split.size and dfs.block.size): max(minSplitSize, min(maxSplitSize, blockSize)) Regards, Som Shekhar Sharma +91-8197243810 On Thu, Sep 26, 2013 at 6:07 PM, shashwat shriparv dwivedishash...@gmail.com wrote: just try giving -Dmapred.tasktracker.map.tasks.maximum=1
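The formula quoted above can be checked with a small sketch in plain Java (no Hadoop dependency; the byte values are illustrative assumptions, with a 64 MB block as the default):

```java
// Sketch of the split-size formula: max(minSplitSize, min(maxSplitSize, blockSize))
public class SplitSizeSketch {

    // Effective input split size from the three settings quoted in the reply.
    public static long splitSize(long minSplitSize, long maxSplitSize, long blockSize) {
        return Math.max(minSplitSize, Math.min(maxSplitSize, blockSize));
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024; // 64 MB block (assumed default)

        // With the defaults (min = 1, max = Long.MAX_VALUE) the block size wins.
        System.out.println(splitSize(1L, Long.MAX_VALUE, blockSize));                 // 67108864

        // Raising the minimum split size above the block size forces larger splits,
        // hence fewer map tasks -- the trick suggested elsewhere in this archive.
        System.out.println(splitSize(128L * 1024 * 1024, Long.MAX_VALUE, blockSize)); // 134217728
    }
}
```

This is why changing mapred.min.split.size (or dfs.block.size) changes the number of map tasks: each split feeds exactly one map task.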

Re: InvalidProtocolBufferException while submitting crunch job to cluster

2013-08-31 Thread Shekhar Sharma
: java.net.UnknownHostException: bdatadev. Edit your /etc/hosts file. Regards, Som Shekhar Sharma +91-8197243810 On Sat, Aug 31, 2013 at 2:05 AM, Narlin M hpn...@gmail.com wrote: Looks like I was pointing to incorrect ports. After correcting the port numbers, conf.set(fs.defaultFS, hdfs

Re: InvalidProtocolBufferException while submitting crunch job to cluster

2013-08-31 Thread Shekhar Sharma
Can you please check whether you are able to access HDFS using the Java API, and also able to run an MR job? Regards, Som Shekhar Sharma +91-8197243810 On Sat, Aug 31, 2013 at 7:08 PM, Narlin M hpn...@gmail.com wrote: The server_address that was mentioned in my original post is not pointing

Re: secondary sort - number of reducers

2013-08-30 Thread Shekhar Sharma
Is the hash code of that key negative? Do something like this: return (groupKey.hashCode() & Integer.MAX_VALUE) % numParts; Regards, Som Shekhar Sharma +91-8197243810 On Fri, Aug 30, 2013 at 6:25 AM, Adeel Qureshi adeelmahm...@gmail.com wrote: okay so when i specify the number of reducers
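The idiom in that reply can be sketched in plain Java (groupKey and numParts stand in for the real partitioner arguments; the sample hash value is an arbitrary assumption):

```java
// Sketch of the non-negative partitioning idiom. "hashCode & Integer.MAX_VALUE"
// clears the sign bit, so the modulo always yields an index in [0, numPartitions).
public class PartitionSketch {

    public static int getPartition(int hashCode, int numPartitions) {
        return (hashCode & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        // A plain "hashCode % numParts" goes negative for a negative hash code,
        // which is an invalid partition index and makes the job fail.
        System.out.println(-123456789 % 10);              // -9: invalid partition
        System.out.println(getPartition(-123456789, 10)); // masked: valid index in [0, 10)
    }
}
```

Masking the sign bit is safer than Math.abs here, because Math.abs(Integer.MIN_VALUE) is still negative, while MIN_VALUE & Integer.MAX_VALUE is 0.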

Re: how to find process under node

2013-08-29 Thread Shekhar Sharma
Are you trying to find the Java processes on a node? Then the simple thing would be to ssh in and run the jps command to get the list of Java processes. Regards, Som Shekhar Sharma +91-8197243810 On Thu, Aug 29, 2013 at 12:27 PM, suneel hadoop suneel.bigd...@gmail.com wrote: Hi All, what im trying

Re: reading input stream

2013-08-29 Thread Shekhar Sharma
); } Regards, Som Shekhar Sharma +91-8197243810 On Thu, Aug 29, 2013 at 12:15 PM, jamal sasha jamalsha...@gmail.com wrote: Hi, Probably a very stupid question. I have this data in binary format... and the following piece of code works for me in normal java. public class parser

Re: Hadoop client user

2013-08-29 Thread Shekhar Sharma
Put that user in the hadoop group... And if the user wants to use the Hadoop client, then the user should be aware of two properties: fs.default.name, which is the address of the NameNode, and mapred.job.tracker, which is the address of the JobTracker. Regards, Som Shekhar Sharma +91-8197243810 On Thu, Aug 29, 2013
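A minimal sketch of those two client-side settings, assuming Hadoop 1.x config files; the host names are placeholders, and ports 54310/54311 are the choices mentioned later in this archive:

```xml
<!-- core-site.xml on the client machine (placeholder host) -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://namenode-host:54310</value>
</property>

<!-- mapred-site.xml on the client machine (placeholder host) -->
<property>
  <name>mapred.job.tracker</name>
  <value>jobtracker-host:54311</value>
</property>
```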

Re: Sqoop issue related to Hadoop

2013-08-29 Thread Shekhar Sharma
Go inside the $HADOOP_HOME/log/user/history... Regards, Som Shekhar Sharma +91-8197243810 On Thu, Aug 29, 2013 at 10:13 AM, Hadoop Raj hadoop...@yahoo.com wrote: Hi Kate, Where can I find the task attempt log? Can you specify the location please? Thanks, Raj On Aug 28, 2013, at 7:13 PM

Re: hadoop debugging tools

2013-08-27 Thread Shekhar Sharma
You can get the stats for a job using rumen. http://ksssblogs.blogspot.in/2013/06/getting-job-statistics-using-rumen.html Regards, Som Shekhar Sharma +91-8197243810 On Tue, Aug 27, 2013 at 10:54 AM, Gopi Krishna M mgo...@gmail.com wrote: Harsh: thanks for the quick response. we often see

Re: Datanode doesn't connect to Namenode

2013-08-08 Thread Shekhar Sharma
did in the master machine. Once you start the cluster by running the command start-all.sh, check that ports 54310 and 54311 are open by running the command netstat -tuplen; it will show whether the ports are open or not. Regards, Som Shekhar Sharma +91-8197243810 On Thu, Aug 8, 2013 at 4:57 PM

Re: Datanode doesn't connect to Namenode

2013-08-08 Thread Shekhar Sharma
-set-using-vms.html Regards, Som Shekhar Sharma +91-8197243810 On Thu, Aug 8, 2013 at 5:21 PM, Shekhar Sharma shekhar2...@gmail.comwrote: if you have removed this property from the slave machines then your DN information will be created under /tmp folder and once you reboot your data node

Re: Is there any way to use a hdfs file as a Circular buffer?

2013-08-07 Thread Shekhar Sharma
Use a CEP tool like Esper or Storm; you will be able to achieve that. I can give you more inputs if you can provide more details of what you are trying to achieve. Regards, Som Shekhar Sharma +91-8197243810 On Wed, Aug 7, 2013 at 9:58 PM, Wukang Lin vboylin1...@gmail.com wrote: Hi Niels

Re: Datanode doesn't connect to Namenode

2013-08-07 Thread Shekhar Sharma
Disable the firewall on the datanode and namenode machines. Regards, Som Shekhar Sharma +91-8197243810 On Wed, Aug 7, 2013 at 11:33 PM, Jitendra Yadav jeetuyadav200...@gmail.com wrote: Your HDFS name entry should be the same on the master and datanodes: <name>fs.default.name</name> <value>hdfs://cloud6

Re: specify Mapred tasks and slots

2013-08-07 Thread Shekhar Sharma
Use mapred.tasktracker.reduce.tasks.maximum in mapred-site.xml; the default value is 2, which means that this tasktracker will not run more than 2 reducer tasks at any given point in time. Regards, Som Shekhar Sharma +91-8197243810 On Thu, Aug 8, 2013 at 7:19 AM, Azuryy Yu azury...@gmail.com
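A sketch of the per-tasktracker slot cap, assuming Hadoop 1.x property names; the map-side twin mapred.tasktracker.map.tasks.maximum works the same way:

```xml
<!-- mapred-site.xml on each tasktracker: at most 2 concurrent reduce tasks -->
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>2</value>
</property>
```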

Re: specify Mapred tasks and slots

2013-08-07 Thread Shekhar Sharma
Slots are decided based on the configuration of the machines, RAM, etc. Regards, Som Shekhar Sharma +91-8197243810 On Thu, Aug 8, 2013 at 7:19 AM, Azuryy Yu azury...@gmail.com wrote: Hi Dears, Can I specify how many slots to use for reduce? I know we can specify reduce tasks, but is there one

Re: New hadoop 1.2 single node installation giving problems

2013-07-23 Thread Shekhar Sharma
It's a warning, not an error... Create a directory and then do ls (in your case /user/hduser is not created until you create a directory or put some file for the first time): hadoop fs -mkdir sample hadoop fs -ls I would suggest, if you are getting a permission problem, please check the

Re: New hadoop 1.2 single node installation giving problems

2013-07-23 Thread Shekhar Sharma
After starting, I would suggest always checking whether your NameNode and JobTracker UIs are working, and checking the number of live nodes in both UIs. Regards, Som Shekhar Sharma +91-8197243810 On Tue, Jul 23, 2013 at 10:41 PM, Ashish Umrani ashish.umr...@gmail.com wrote: Thanks

Re: New hadoop 1.2 single node installation giving problems

2013-07-23 Thread Shekhar Sharma
hadoop jar wc.jar <fully qualified driver name> <input data> <output destination> Regards, Som Shekhar Sharma +91-8197243810 On Tue, Jul 23, 2013 at 10:58 PM, Ashish Umrani ashish.umr...@gmail.com wrote: Jitendra, Som, Thanks. The issue was in not having any file there. It's working fine now. I am

Re: Hadoop property precedence

2013-07-14 Thread Shekhar Sharma
the replica. This information is sent to the client, and then the client starts writing the data onto the datanodes by forming a pipeline. The client then writes that much data onto a block, whose size he has set. Regards, Som Shekhar Sharma +91-8197243810 On Sun, Jul 14, 2013 at 4:24 PM, Harsh J ha

Map slots and Reduce slots

2013-07-14 Thread Shekhar Sharma
machine, how can I determine what would be the optimal number of map and reducer slots for this machine? Regards, Som Shekhar Sharma +91-8197243810