Have you used IAM (Identity and Access Management) roles?
On 3 May 2016 18:11, "Elliot West" wrote:
> Hello,
>
> We're currently using DistCp and S3 FileSystems to move data from a
> vanilla Apache Hadoop cluster to S3. We've been concerned about exposing
> our AWS secrets on our
Have you done the following:
(1) Edit the masters and slaves files
(2) Edit mapred-site.xml and core-site.xml
(3) delete the hadoop-temp folder (the folders specified by
dfs.data.dir and dfs.name.dir)
(4) format the namenode (hadoop namenode -format)
(5) start the cluster
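(In Hadoop 1.x terms, steps (4) and (5) usually boil down to running hadoop namenode -format on the master and then start-all.sh, or start-dfs.sh followed by start-mapred.sh.)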
Regards,
Som Shekhar Sharma
Don't you think using Flume would be easier? Use the HDFS sink and a
property to roll the log file over every hour.
This way you use a single Flume agent to receive logs as and when
they are generated, and you will be dumping them directly to HDFS.
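For reference, a minimal sketch of such an agent config (the agent/source/channel/sink names, the tailed log path and the HDFS path are placeholders, and the exec source is just one possible source):

  a1.sources = r1
  a1.channels = c1
  a1.sinks = k1

  # tail the application log as events arrive (placeholder command)
  a1.sources.r1.type = exec
  a1.sources.r1.command = tail -F /var/log/app/app.log
  a1.sources.r1.channels = c1

  a1.channels.c1.type = memory

  # HDFS sink: roll the file every hour, disable size/count based rolling
  a1.sinks.k1.type = hdfs
  a1.sinks.k1.channel = c1
  a1.sinks.k1.hdfs.path = hdfs://namenode:8020/flume/logs
  a1.sinks.k1.hdfs.rollInterval = 3600
  a1.sinks.k1.hdfs.rollSize = 0
  a1.sinks.k1.hdfs.rollCount = 0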
If you want to remove unwanted logs you can write
Writing a custom input format would be much easier and you would have
better control.
You might be tempted to use the Jackson library to process the JSON, but that
requires that you know your JSON data, and this assumption would
break if your data format changes.
I would suggest writing a custom
of map tasks in this case, and
if you want to have a single output file then
(1) Set mapred.min.split.size equal to the file size or some bigger value
like Long.MAX_VALUE. It will spawn only one map task and you will get
one output file (see the sketch below).
}
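A rough sketch of such a driver (old mapred API; the class name is a placeholder and the default identity mapper/reducer are used just for illustration):

  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.mapred.FileInputFormat;
  import org.apache.hadoop.mapred.FileOutputFormat;
  import org.apache.hadoop.mapred.JobClient;
  import org.apache.hadoop.mapred.JobConf;

  public class SingleMapperDriver {
    public static void main(String[] args) throws Exception {
      JobConf conf = new JobConf(SingleMapperDriver.class);
      conf.setJobName("single-mapper");
      // split size = max(minSplitSize, min(maxSplitSize, blockSize)),
      // so a huge minimum forces one split (and one map task) per input file
      conf.setLong("mapred.min.split.size", Long.MAX_VALUE);
      FileInputFormat.setInputPaths(conf, new Path(args[0]));
      FileOutputFormat.setOutputPath(conf, new Path(args[1]));
      JobClient.runJob(conf);
    }
  }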
Regards,
Som Shekhar Sharma
+91-8197243810
On Thu, Feb 20
Which input format are you using? Use an XML input format.
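For example, with something like Mahout's XmlInputFormat (the <record> start/end tags below are placeholders for whatever element wraps one record in your file):

  // sketch of the job setup; XmlInputFormat here is the Mahout one (new mapreduce API)
  Configuration conf = new Configuration();
  conf.set("xmlinput.start", "<record>");
  conf.set("xmlinput.end", "</record>");
  Job job = new Job(conf, "xml-to-text");
  job.setInputFormatClass(XmlInputFormat.class);
  // each map() call then receives one complete <record>...</record> chunk as its value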
On 3 Jan 2014 10:47, Ranjini Rathinam ranjinibe...@gmail.com wrote:
Hi,
Need to convert XML into text using mapreduce.
I have used DOM and SAX parser.
After using SAX Builder in the mapper class, the child node acts as the root
element.
Run fsck command
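For example (the path is a placeholder): hadoop fsck /user/raj/myfile -files -blocks prints each block with its replication (repl=N), and a plain hadoop fs -ls /user/raj/myfile shows the file's replication factor in the second column of the listing.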
On 8 Feb 2014 23:03, Raj Hadoop hadoop...@yahoo.com wrote:
Hi,
Is there a Hadoop command to determine the replication factor of an HDFS
file? Please advise.
I know that fs setrep only changes the replication factor.
Regards,
Raj
to run the task tracker on
a machine which is not there in the include file.
I have also tried the new property mapred.jobtracker.hosts.file,
but it didn't work either.
Please advise me.
Regards,
Som Shekhar Sharma
+91-8197243810
this property is not recognized by the job
tracker.
Please advise me how to commission a task tracker
Regards,
Som Shekhar Sharma
+91-8197243810
You mean the property name?
Regards,
Som Shekhar Sharma
+91-8197243810
On Mon, Jan 27, 2014 at 9:14 PM, Nitin Pawar nitinpawar...@gmail.com wrote:
I think the file name is mapred.include
On Mon, Jan 27, 2014 at 9:11 PM, Shekhar Sharma shekhar2...@gmail.com wrote:
Hello,
I am using apache hadoop
</configuration>
My include file has the entry of the machine...
Regards,
Som Shekhar Sharma
+91-8197243810
On Mon, Jan 27, 2014 at 9:28 PM, Nitin Pawar nitinpawar...@gmail.com wrote:
Sorry for incomplete reply
In Hadoop 1.2/1.0, the following is the property:
<property>
<name>mapred.hosts</name>
When you put or write data into HDFS, 64 KB of data is written on the client
side and then pushed through the pipeline, and this process continues until
64 MB of data is written, which is the block size defined by the client.
scp, on the other hand, will try to buffer the entire data. Passing
Run fsck command
hadoop fsck <path> -files -blocks -locations
On 25 Jan 2014 08:04, ch huang justlo...@gmail.com wrote:
hi, mailing list:
this morning Nagios alerted that Hadoop has a corrupt block; I checked
it using
hdfs dfsadmin -report, and from its output it did have corrupt blocks
We have the concept of short-circuit reads, which read directly from the data
node and improve read performance. Do we have a similar concept of
short-circuit writes?
On 25 Jan 2014 16:10, Harsh J ha...@cloudera.com wrote:
There's a lot of difference here, although both do use TCP underneath,
but
Incompatible namespace ID error: it's because you might have formatted the
namenode but the datanodes' folders still have the old ID.
What is the value of the following properties?
dfs.data.dir
dfs.name.dir
hadoop.tmp.dir
The value of these properties is a directory on the local file system.
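For example (the /data/... paths are placeholders; dfs.name.dir and dfs.data.dir go in hdfs-site.xml, hadoop.tmp.dir in core-site.xml):

  <property>
    <name>dfs.name.dir</name>
    <value>/data/hadoop/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/data/hadoop/data</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/data/hadoop/tmp</value>
  </property>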
Solution
The number of map tasks is determined by the number of input splits. You can change
the number of map tasks by changing the input split size.
But you can set the number of reducer tasks explicitly.
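For example (the count 4 is arbitrary):
  job.setNumReduceTasks(4);          // new mapreduce API
  // or conf.setNumReduceTasks(4);   // old mapred JobConf API
  // or on the command line: -Dmapred.reduce.tasks=4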
On 21 Jan 2014 20:25, xeon xeonmailingl...@gmail.com wrote:
Hi,
I want to set the number of map tasks in the Wordcount
It is running on local file system file:///
Regards,
Som Shekhar Sharma
+91-8197243810
On Wed, Dec 25, 2013 at 7:01 PM, Vishnu Viswanath
vishnu.viswanat...@gmail.com wrote:
Hi,
I am getting this error while starting the datanode in my slave system.
I read the JIRA HDFS-2515, it says
of org.apache.hadoop.mapreduce.lib
Check out XmlInputFormat.java and see which package's FileInputFormat
they have used...
Regards,
Som Shekhar Sharma
+91-8197243810
On Tue, Dec 17, 2013 at 12:55 PM, Ranjini Rathinam
ranjinibe...@gmail.com wrote:
Hi,
I am using Hadoop version 0.20.
In that while
problem. In
case you face any difficulty, please feel free to contact
Regards,
K Som Shekhar Sharma
+91-8197243810
Regards,
Som Shekhar Sharma
+91-8197243810
On Tue, Dec 17, 2013 at 5:42 PM, Ranjini Rathinam
ranjinibe...@gmail.com wrote:
Hi,
I have attached the code. Please verify
cluster. You will for sure see the performance
improvement
Regards,
Som Shekhar Sharma
+91-8197243810
On Tue, Dec 17, 2013 at 6:42 PM, Devin Suiter RDX dsui...@rdx.com wrote:
Nikhil,
One of the problems you run into with Hadoop in Virtual Machine
environments is performance issues when
subfolders dfs and mapred.
The /home/training/hadoop/mapred folder will be on HDFS as well.
Hope this clears things up.
Regards,
Som Shekhar Sharma
+91-8197243810
On Mon, Dec 16, 2013 at 1:42 PM, Dieter De Witte drdwi...@gmail.com wrote:
Hi,
Make sure to also set mapred.local.dir to the same set of output
Seems like the DataNode is not running or has died.
Regards,
Som Shekhar Sharma
+91-8197243810
On Mon, Dec 16, 2013 at 1:40 PM, Geelong Yao geelong...@gmail.com wrote:
Hi Everyone
After I upgraded Hadoop to CDH 4.2.0 (Hadoop 2.0.0), I tried to run some
tests.
When I try to upload a file to HDFS
HDFS is batch-oriented, suitable for OLAP, not for OLTP.
OLTP requires that data be updated and deleted, basically CRUD
operations. But HDFS doesn't support random writes, so it is not suitable for
OLTP. And that's where the NoSQL databases come in.
Regards,
Som Shekhar Sharma
+91-8197243810
On Sun
First option: put the jar in the $HADOOP_HOME/lib folder
and then run the hadoop classpath command on your terminal to check
whether the jar has been added.
Second option: put the jar path in the HADOOP_CLASSPATH variable (in the
hadoop-env.sh file) and restart your cluster.
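For example, in hadoop-env.sh (the jar path is a placeholder):
  export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/opt/hadoop/extra-libs/my-custom.jar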
Regards,
Som Shekhar Sharma
+91
, now you
can use an XML parser to parse the contents into a Java object and
form your own key-value pairs (custom key and custom value).
Hope you have enough pointers to write the code.
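As a rough sketch, assuming each map input value already holds one complete XML record, and using placeholder element names id and name:

  import java.io.StringReader;
  import javax.xml.parsers.DocumentBuilder;
  import javax.xml.parsers.DocumentBuilderFactory;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Mapper;
  import org.w3c.dom.Document;
  import org.xml.sax.InputSource;

  public class XmlRecordMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws java.io.IOException, InterruptedException {
      try {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc =
            builder.parse(new InputSource(new StringReader(value.toString())));
        // pull out the fields you care about and emit a custom (key, value) pair
        String id = doc.getElementsByTagName("id").item(0).getTextContent();
        String name = doc.getElementsByTagName("name").item(0).getTextContent();
        context.write(new Text(id), new Text(name));
      } catch (Exception e) {
        // skip records that fail to parse
      }
    }
  }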
Regards,
Som Shekhar Sharma
+91-8197243810
On Mon, Dec 9, 2013 at 6:30 PM, Ranjini Rathinam ranjinibe
then it will again contact the NN via heartbeat
signal, and this process continues...
Check how writing happens in HDFS.
Regards,
Som Shekhar Sharma
+91-8197243810
On Thu, Sep 26, 2013 at 3:41 PM, Karim Awara karim.aw...@kaust.edu.sa wrote:
Hi,
I have a couple of questions about
mapred.min.split.size? Have you changed it to
something else, or is it the default, which is 1?
Regards,
Som Shekhar Sharma
+91-8197243810
On Thu, Sep 26, 2013 at 4:39 PM, Viji R v...@cloudera.com wrote:
Hi,
Default number of map tasks is 2. You can set mapred.map.tasks to 1 to
avoid this.
Regards,
Viji
(
mapred.min.split.size, mapred.max.split.size and dfs.block.size):
splitSize = max(minSplitSize, min(maxSplitSize, blockSize))
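For example, with the defaults minSplitSize = 1 and maxSplitSize = Long.MAX_VALUE, and a 64 MB block size: max(1, min(Long.MAX_VALUE, 64 MB)) = 64 MB, i.e. one map task per 64 MB block of input.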
Regards,
Som Shekhar Sharma
+91-8197243810
On Thu, Sep 26, 2013 at 6:07 PM, shashwat shriparv
dwivedishash...@gmail.com wrote:
just try giving -Dmapred.tasktracker.map.tasks.maximum=1
: java.net.UnknownHostException: bdatadev
edit your /etc/hosts file
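For example, add a line like the following (the IP address is a placeholder for that node's real address):
  192.168.1.10   bdatadev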
Regards,
Som Shekhar Sharma
+91-8197243810
On Sat, Aug 31, 2013 at 2:05 AM, Narlin M hpn...@gmail.com wrote:
Looks like I was pointing to incorrect ports. After correcting the port
numbers,
conf.set("fs.defaultFS", "hdfs
Can you please check whether you are able to access HDFS using the Java
API, and also able to run an MR job.
Regards,
Som Shekhar Sharma
+91-8197243810
On Sat, Aug 31, 2013 at 7:08 PM, Narlin M hpn...@gmail.com wrote:
The server_address that was mentioned in my original post is not
pointing
Is the hash code of that key negative?
Do something like this:
return (groupKey.hashCode() & Integer.MAX_VALUE) % numParts;
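A minimal sketch of a partitioner using that idiom (new mapreduce API; the Text key/value types are just an example):

  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Partitioner;

  public class GroupKeyPartitioner extends Partitioner<Text, Text> {
    @Override
    public int getPartition(Text groupKey, Text value, int numParts) {
      // masking off the sign bit keeps the result in [0, numParts)
      // even when hashCode() is negative
      return (groupKey.hashCode() & Integer.MAX_VALUE) % numParts;
    }
  }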
Regards,
Som Shekhar Sharma
+91-8197243810
On Fri, Aug 30, 2013 at 6:25 AM, Adeel Qureshi adeelmahm...@gmail.com wrote:
Okay, so when I specify the number of reducers
Are you trying to find the Java processes on a node? Then the simple
thing would be to ssh in and run the jps command to get the list of Java
processes.
Regards,
Som Shekhar Sharma
+91-8197243810
On Thu, Aug 29, 2013 at 12:27 PM, suneel hadoop
suneel.bigd...@gmail.com wrote:
Hi All,
what I'm trying
);
}
Regards,
Som Shekhar Sharma
+91-8197243810
On Thu, Aug 29, 2013 at 12:15 PM, jamal sasha jamalsha...@gmail.com wrote:
Hi,
Probably a very stupid question.
I have this data in binary format... and the following piece of code works
for me in normal Java.
public class parser
Put that user in the hadoop group...
And if the user wants to use the Hadoop client, then the user should be aware
of two properties: fs.default.name, which is the address of the NameNode, and
mapred.job.tracker, which is the address of the JobTracker.
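On the client these typically live in core-site.xml and mapred-site.xml, roughly like this (the host names are placeholders; the ports match the 54310/54311 defaults mentioned elsewhere in this thread):

  <!-- core-site.xml -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode-host:54310</value>
  </property>

  <!-- mapred-site.xml -->
  <property>
    <name>mapred.job.tracker</name>
    <value>jobtracker-host:54311</value>
  </property>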
Regards,
Som Shekhar Sharma
+91-8197243810
On Thu, Aug 29, 2013
Go inside the $HADOOP_HOME/log/user/history...
Regards,
Som Shekhar Sharma
+91-8197243810
On Thu, Aug 29, 2013 at 10:13 AM, Hadoop Raj hadoop...@yahoo.com wrote:
Hi Kate,
Where can I find the task attempt log? Can you specify the location please?
Thanks,
Raj
On Aug 28, 2013, at 7:13 PM
You can get the stats for a job using rumen.
http://ksssblogs.blogspot.in/2013/06/getting-job-statistics-using-rumen.html
Regards,
Som Shekhar Sharma
+91-8197243810
On Tue, Aug 27, 2013 at 10:54 AM, Gopi Krishna M mgo...@gmail.com wrote:
Harsh: thanks for the quick response.
we often see
did in Master machine
Once you start the cluster by running the command start-all.sh, check that
ports 54310 and 54311 are open by running the command netstat
-tuplen; it will show whether the ports are open or not.
Regards,
Som Shekhar Sharma
+91-8197243810
On Thu, Aug 8, 2013 at 4:57 PM
-set-using-vms.html
Regards,
Som Shekhar Sharma
+91-8197243810
On Thu, Aug 8, 2013 at 5:21 PM, Shekhar Sharma shekhar2...@gmail.com wrote:
If you have removed this property from the slave machines then your DataNode
information will be created under the /tmp folder, and once you reboot your data
node
Use a CEP tool like Esper or Storm; you will be able to achieve that.
...I can give you more input if you can provide me more details of what
you are trying to achieve.
Regards,
Som Shekhar Sharma
+91-8197243810
On Wed, Aug 7, 2013 at 9:58 PM, Wukang Lin vboylin1...@gmail.com wrote:
Hi Niels
Disable the firewall on the datanode and namenode machines.
Regards,
Som Shekhar Sharma
+91-8197243810
On Wed, Aug 7, 2013 at 11:33 PM, Jitendra Yadav
jeetuyadav200...@gmail.com wrote:
Your HDFS name entry should be the same on the master and datanodes:
<name>fs.default.name</name>
<value>hdfs://cloud6
Use mapred.tasktracker.reduce.tasks.maximum in mapred-site.xml.
The default value is 2, which means that this task tracker will not
run more than 2 reduce tasks at any given point of time.
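For example, in mapred-site.xml on that tasktracker (the value 4 is just an example):

  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>4</value>
  </property>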
Regards,
Som Shekhar Sharma
+91-8197243810
On Thu, Aug 8, 2013 at 7:19 AM, Azuryy Yu azury...@gmail.com
Slots are decided based on the configuration of the machines, RAM, etc...
Regards,
Som Shekhar Sharma
+91-8197243810
On Thu, Aug 8, 2013 at 7:19 AM, Azuryy Yu azury...@gmail.com wrote:
Hi Dears,
Can I specify how many slots to use for reduce?
I know we can specify reduce tasks, but is there one
It's a warning, not an error...
Create a directory and then do ls (in your case /user/hduser is not
created until the first time you create a directory or put
some file):
hadoop fs -mkdir sample
hadoop fs -ls
I would suggest, if you are getting a permission problem,
please check the
After starting, I would suggest always checking whether your NameNode and job
tracker UIs are working or not, and checking the number of live nodes in both
UIs.
Regards,
Som Shekhar Sharma
+91-8197243810
On Tue, Jul 23, 2013 at 10:41 PM, Ashish Umrani ashish.umr...@gmail.com wrote:
Thanks
hadoop jar wc.jar <fully qualified driver name> <input data> <output destination>
Regards,
Som Shekhar Sharma
+91-8197243810
On Tue, Jul 23, 2013 at 10:58 PM, Ashish Umrani ashish.umr...@gmail.com wrote:
Jitendra, Som,
Thanks. The issue was in not having any file there. It's working fine now.
I am
the replica. This information is sent
to the client, and then the client starts writing the data onto the data nodes by
forming a pipeline. And then the client writes that much data onto a
block, which it has set.
Regards,
Som Shekhar Sharma
+91-8197243810
On Sun, Jul 14, 2013 at 4:24 PM, Harsh J ha
machine, how can I
determine what would be the optimal number of map and reducer slots
for this machine?
Regards,
Som Shekhar Sharma
+91-8197243810