I have been running Hadoop on a cluster set to not check permissions. I
would run a java client on my local machine and would run as the local user
on the cluster.
I use:
String connectString = "hdfs://" + host + ":" + port + "/";
Configuration config = new Configuration();
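For reference, here is a minimal self-contained sketch of such a client; the host, port and test path are placeholders, and FileSystem.get is the usual entry point:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsClientSketch {
        public static void main(String[] args) throws Exception {
            String host = "namenode-host";   // placeholder
            int port = 9000;                 // placeholder
            String connectString = "hdfs://" + host + ":" + port + "/";

            Configuration config = new Configuration();
            // Connects as whatever user the local JVM runs as; with
            // dfs.permissions disabled the NameNode will not reject it.
            FileSystem fs = FileSystem.get(URI.create(connectString), config);
            fs.mkdirs(new Path("/user/localuser/test")); // hypothetical path
            fs.close();
        }
    }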
@Tariq can you point me to some resource which shows how distcp is used to
upload files from local to HDFS?
Isn't distcp an MR job? Wouldn't it need the data to already be present in
Hadoop's FS?
Rahul
On Sat, May 11, 2013 at 10:52 PM, Mohammad Tariq donta...@gmail.com wrote:
you can do that using file:///
example:
hadoop distcp hdfs://localhost:8020/somefile file:///Users/myhome/Desktop/
On Sun, May 12, 2013 at 5:23 PM, Rahul Bhattacharjee
rahul.rec@gmail.com wrote:
@Tariq can you point me to some resource which shows how distcp is used to
upload files
@Rahul : I'm sorry, I am not aware of any such document. But you could
use distcp for local to HDFS copy:
bin/hadoop distcp file:///home/tariq/in.txt hdfs://localhost:9000/
And yes, when you use distcp from local to HDFS, you can't take the
pleasure of parallelism as the data is stored on a single machine.
@Rahul : I'm sorry I answered this on a wrong thread by mistake. You could
do that as Nitin has shown.
Warm Regards,
Tariq
cloudfront.blogspot.com
On Sun, May 12, 2013 at 5:36 PM, Nitin Pawar nitinpawar...@gmail.com wrote:
you can do that using file:///
example:
hadoop distcp
Sorry for the blunder guys.
Warm Regards,
Tariq
cloudfront.blogspot.com
On Sun, May 12, 2013 at 5:39 PM, Mohammad Tariq donta...@gmail.com wrote:
@Rahul : I'm sorry, I am not aware of any such document. But you could
use distcp for local to HDFS copy:
bin/hadoop distcp
Thanks to both of you!
Rahul
On Sun, May 12, 2013 at 5:36 PM, Nitin Pawar nitinpawar...@gmail.com wrote:
you can do that using file:///
example:
hadoop distcp hdfs://localhost:8020/somefile file:///Users/myhome/Desktop/
On Sun, May 12, 2013 at 5:23 PM, Rahul Bhattacharjee
Hi
The concept of task slots is used in MRv1.
In the new version of Hadoop, MRv2 uses YARN instead of slots.
You can read about it in Hadoop: The Definitive Guide, 3rd edition.
Sent from my iPhone
On 2013-5-12, at 20:11, Mohammad Tariq donta...@gmail.com wrote:
Sorry for the blunder guys.
Warm Regards,
Oh! I thought distcp works on complete files rather than mappers per
datablock.
So I guess parallelism would still be there if there are multiple files..
Please correct me if there is anything wrong.
Thanks,
Rahul
On Sun, May 12, 2013 at 5:39 PM, Mohammad Tariq donta...@gmail.com wrote:
@Rahul :
Sorry for my blunder as well. My previous post was meant for Tariq but went
to the wrong thread.
Thanks.
Rahul
On Sun, May 12, 2013 at 6:03 PM, Rahul Bhattacharjee
rahul.rec@gmail.com wrote:
Oh! I thought distcp works on complete files rather than mappers per
datablock.
So I guess parallelism would still
Hahaha..I think we could continue this over there..
Warm Regards,
Tariq
cloudfront.blogspot.com
On Sun, May 12, 2013 at 6:04 PM, Rahul Bhattacharjee
rahul.rec@gmail.com wrote:
Sorry for my blunder as well. My previous post was meant for Tariq but went
to the wrong thread.
Thanks.
Rahul
On Sun, May
No. distcp is actually a MapReduce job under the hood.
Warm Regards,
Tariq
cloudfront.blogspot.com
On Sun, May 12, 2013 at 6:00 PM, Rahul Bhattacharjee
rahul.rec@gmail.com wrote:
Thanks to both of you!
Rahul
On Sun, May 12, 2013 at 5:36 PM, Nitin Pawar nitinpawar...@gmail.com wrote:
I had said that if you use distcp to copy data *from localFS to HDFS* then
you won't be able to exploit parallelism as the entire file is present on a
single machine. So no multiple TTs.
Please comment if you think I am wrong somewhere.
Warm Regards,
Tariq
cloudfront.blogspot.com
On Sun, May 12,
Yeah, you are right. I misread your earlier post.
Thanks,
Rahul
On Sun, May 12, 2013 at 6:25 PM, Mohammad Tariq donta...@gmail.com wrote:
I had said that if you use distcp to copy data *from localFS to HDFS* then you
won't be able to exploit parallelism as the entire file is present on
a single
This is what I would say:
The number of maps is decided as follows. Since it’s a good idea to get
each map to copy a reasonable amount of data to minimize overheads in task
setup, each map copies at least 256 MB (unless the total size of the input
is less, in which case one map handles it all).
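As a back-of-the-envelope illustration of that rule (the input size is made up, and this ignores any additional per-node cap distcp applies):

    public class MapCountSketch {
        public static void main(String[] args) {
            // Illustration only: the "at least 256 MB per map" rule quoted above.
            long bytesPerMap = 256L * 1024 * 1024;             // 256 MB minimum per map
            long totalInput  = 3L * 1024 * 1024 * 1024;        // say, 3 GB of input
            long maps = Math.max(1, totalInput / bytesPerMap); // 3 GB / 256 MB
            System.out.println(maps);                          // prints: 12
        }
    }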
On Sun, May 12, 2013 at 12:19 AM, Nitin Pawar nitinpawar...@gmail.com wrote:
normally if you want to copy the jar then hadoop admins setu
Submit your job to the JobTracker; it will distribute it throughout the
TaskTrackers.
*Thanks & Regards*
∞
Shashwat Shriparv
The user through which you are trying to run the task should have
permission on HDFS. Just verify that.
*Thanks & Regards*
∞
Shashwat Shriparv
On Sat, May 11, 2013 at 1:02 AM, Amal G Jose amalg...@gmail.com wrote:
After starting the hdfs, ie NN, SN and DN, create an hdfs directory
The block size is for allocation, not storage on the disk (a file smaller than a block only occupies its actual size).
*Thanks & Regards*
∞
Shashwat Shriparv
On Fri, May 10, 2013 at 8:54 PM, Harsh J ha...@cloudera.com wrote:
Thanks. I failed to add: It should be okay to do if those cases are
true and the cluster seems under-utilized right now.
Your connection settings to MySQL may not be correct; check that.
*Thanks & Regards*
∞
Shashwat Shriparv
On Fri, May 10, 2013 at 6:12 PM, Shahab Yunus shahab.yu...@gmail.com wrote:
Have you checked your connection settings to the MySQL DB? Where and how
are you passing the connection
For the user through which you are running your Hadoop, set permission on
the tmp dir for that user.
*Thanks & Regards*
∞
Shashwat Shriparv
On Fri, May 10, 2013 at 5:24 PM, Nitin Pawar nitinpawar...@gmail.com wrote:
What are the permission of your /tmp/ folder?
On May 10, 2013 5:03 PM, Khaleel
It's a /tmp/ folder, so I guess all the users will need access to it. Better
to make it a routine Linux-like /tmp folder.
On Sun, May 12, 2013 at 11:12 PM, shashwat shriparv
dwivedishash...@gmail.com wrote:
the user through which you are running your hadoop, set permission to tmp
dir for that
@shashwat shriparv
Can a Hadoop job be submitted to any datanode in the cluster and not to the
JobTracker?
Correct me if I am wrong; I was told that a Hadoop job can be submitted to a
datanode as well, apart from the JobTracker. Is that correct?
Thanks in advance
On Sun, May 12, 2013 at 11:02 PM,
Nope.
In MRv1 only the JobTracker can accept jobs. You cannot trigger a job on any
other process in Hadoop other than the JobTracker.
On Sun, May 12, 2013 at 11:25 PM, Shashidhar Rao raoshashidhar...@gmail.com
wrote:
@shashwat shriparv
Can a Hadoop job be submitted to any datanode in the cluster
As Nitin said, it is the JobTracker's responsibility to distribute the job's
tasks to the TaskTrackers, so you need to submit the job to the JobTracker.
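For illustration, a minimal MRv1 client sketch (host names and paths are placeholders; the mapper and reducer are left at their identity defaults, so the job just copies its input):

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class SubmitExample {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(SubmitExample.class);
            conf.setJobName("identity-copy");
            // The client only needs to reach the NameNode and the JobTracker;
            // it does not have to run on the JobTracker machine itself.
            conf.set("fs.default.name", "hdfs://namenode-host:9000");   // placeholder
            conf.set("mapred.job.tracker", "jobtracker-host:9001");     // placeholder
            FileInputFormat.setInputPaths(conf, new Path("/user/rahul/input"));
            FileOutputFormat.setOutputPath(conf, new Path("/user/rahul/output"));
            // runJob ships the job to the JobTracker, which then schedules
            // the map and reduce tasks on the TaskTrackers.
            JobClient.runJob(conf);
        }
    }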
*Thanks & Regards*
∞
Shashwat Shriparv
On Sun, May 12, 2013 at 11:26 PM, Nitin Pawar nitinpawar...@gmail.com wrote:
nope
in MRv1 only
Hi,
Is there a method to build the ECLIPSE plugin using HADOOP 2.0.3?
I am looking at the details in http://wiki.apache.org/hadoop/EclipsePlugIn,
but I am not able to find any eclipse-plugin folder in the src.
Thanks and Regards,
Gourav
Which doesn't imply that you should log in to the physical machine
where the JobTracker is hosted. It only implies that the Hadoop client must
be able to reach the JobTracker. It could be from any of the physical machines
hosting the slaves (DataNode, TaskTracker), but it is rarely the case.
Often,
You can perhaps consider using the experimental JavaSerialization [1]
enhancement to skip transforming to
Writables/other serialization formats. It may be slower, but it looks like
you are looking for a way to avoid transforming objects.
Enable it by adding the class org.apache.hadoop.io.serializer.JavaSerialization
to the io.serializations configuration property.
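For example, a minimal sketch (the property name and serializer classes are the standard Hadoop ones; the surrounding job setup is assumed):

    import org.apache.hadoop.conf.Configuration;

    public class EnableJavaSerialization {
        public static Configuration configure() {
            Configuration conf = new Configuration();
            // Keep WritableSerialization for the built-in types and append
            // JavaSerialization so plain java.io.Serializable keys/values also work.
            conf.set("io.serializations",
                "org.apache.hadoop.io.serializer.WritableSerialization,"
                + "org.apache.hadoop.io.serializer.JavaSerialization");
            return conf;
        }
    }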
In order to make the code more readable, you could start by using the
methods toByteArray() and valueOf(bytes)
http://docs.oracle.com/javase/7/docs/api/java/util/BitSet.html#toByteArray%28%29
http://docs.oracle.com/javase/7/docs/api/java/util/BitSet.html#valueOf%28byte[]%29
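A minimal round-trip sketch of those two methods (Java 7+; the bit positions are arbitrary):

    import java.util.BitSet;

    public class BitSetRoundTrip {
        public static void main(String[] args) {
            BitSet bits = new BitSet();
            bits.set(3);
            bits.set(64);
            byte[] raw = bits.toByteArray();        // serialize to a little-endian byte[]
            BitSet restored = BitSet.valueOf(raw);  // rebuild an equal BitSet
            System.out.println(bits.equals(restored)); // prints: true
        }
    }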
Regards
Bertrand
The block size controls lots of things in Hadoop.
It affects read parallelism, scalability, block allocation and other
aspects of operations either directly or indirectly.
On Sun, May 12, 2013 at 10:38 AM, shashwat shriparv
dwivedishash...@gmail.com wrote:
The block size is for allocation
Another interesting alternative is the EWAH implementation of Java bitsets,
which allows efficient compressed bitsets with very fast OR operations.
https://github.com/lemire/javaewah
See also https://code.google.com/p/sparsebitmap/ by the same authors.
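A rough sketch of what that looks like, assuming the JavaEWAH library (com.googlecode.javaewah) is on the classpath; the bit values are arbitrary:

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.DataInputStream;
    import java.io.DataOutputStream;
    import com.googlecode.javaewah.EWAHCompressedBitmap;

    public class EwahSketch {
        public static void main(String[] args) throws Exception {
            // bitmapOf expects the set bits in sorted order.
            EWAHCompressedBitmap a = EWAHCompressedBitmap.bitmapOf(1, 5, 1000000);
            EWAHCompressedBitmap b = EWAHCompressedBitmap.bitmapOf(5, 7);
            EWAHCompressedBitmap union = a.or(b);   // fast OR on the compressed form

            // Serialize the compressed bitmap to bytes and read it back.
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            union.serialize(new DataOutputStream(bos));
            EWAHCompressedBitmap restored = new EWAHCompressedBitmap();
            restored.deserialize(
                new DataInputStream(new ByteArrayInputStream(bos.toByteArray())));
            System.out.println(union.equals(restored)); // prints: true
        }
    }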
On Sun, May 12, 2013 at 1:11 PM, Bertrand
You can disregard my links as they are only valid for Java 1.7+.
The JavaSerialization might clean up your code but shouldn't bring a
significant boost in performance.
The EWAH implementation has, at least, the methods you are looking for:
serialize / deserialize.
Regards
Bertrand
Note to myself
Hi,
I set up a cluster with 3 nodes, and after that I did not submit any job on
it. But, after a few days, I found the cluster is unhealthy:
- No result is returned after issuing the command 'hadoop dfs -ls /' or 'hadoop
dfsadmin -report' for a while
- The page at 'http://namenode:50070' could not be
Do you get any error when trying to connect to the cluster, something like
'tried n times' or 'replicated 0 times'?
On Sun, May 12, 2013 at 7:28 PM, sam liu samliuhad...@gmail.com wrote:
Hi,
I setup a cluster with 3 nodes, and after that I did not submit any job on
it. But, after few days, I
Got some exceptions on node3:
1. datanode log:
2013-04-17 11:13:44,719 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock
blk_2478755809192724446_1477 received exception
java.net.SocketTimeoutException: 63000 millis timeout while waiting for
channel to be ready for read. ch :
For node3, the memory is:
                     total    used    free  shared  buffers  cached
Mem:                  3834    3666     167       0      187    1136
-/+ buffers/cache:            2342    1491
Swap:                 8196       0    8196
For a 3-node cluster like mine, what's