Running Hadoop client as a different user

2013-05-12 Thread Steve Lewis
I have been running Hadoop on a clister set to not check permissions. I would run a java client on my local machine and would run as the local user on the cluster. I say * String connectString = hdfs:// + host + : + port + /;* *Configuration config = new Configuration();* * * *

Re: Hadoop noob question

2013-05-12 Thread Rahul Bhattacharjee
@Tariq can you point me to some resource which shows how distcp is used to upload files from local to hdfs. isn't distcp a MR job ? wouldn't it need the data to be already present in the hadoop's fs? Rahul On Sat, May 11, 2013 at 10:52 PM, Mohammad Tariq donta...@gmail.com wrote: You'r

Re: Hadoop noob question

2013-05-12 Thread Nitin Pawar
you can do that using file:/// example: hadoop distcp hdfs://localhost:8020/somefile file:///Users/myhome/Desktop/ On Sun, May 12, 2013 at 5:23 PM, Rahul Bhattacharjee rahul.rec@gmail.com wrote: @Tariq can you point me to some resource which shows how distcp is used to upload files

Re: Need help about task slots

2013-05-12 Thread Mohammad Tariq
@Rahul : I'm sorry as I am not aware of any such document. But you could use distcp for local to HDFS copy : *bin/hadoop distcp file:///home/tariq/in.txt hdfs://localhost:9000/* * * And yes. When you use distcp from local to HDFS, you can't take the pleasure of parallelism as the data is stored

Re: Hadoop noob question

2013-05-12 Thread Mohammad Tariq
@Rahul : I'm sorry I answered this on a wrong thread by mistake. You could do that as Nitin has shown. Warm Regards, Tariq cloudfront.blogspot.com On Sun, May 12, 2013 at 5:36 PM, Nitin Pawar nitinpawar...@gmail.comwrote: you can do that using file:/// example: hadoop distcp

Re: Need help about task slots

2013-05-12 Thread Mohammad Tariq
Sorry for the blunder guys. Warm Regards, Tariq cloudfront.blogspot.com On Sun, May 12, 2013 at 5:39 PM, Mohammad Tariq donta...@gmail.com wrote: @Rahul : I'm sorry as I am not aware of any such document. But you could use distcp for local to HDFS copy : *bin/hadoop distcp

Re: Hadoop noob question

2013-05-12 Thread Rahul Bhattacharjee
Thanks to both of you! Rahul On Sun, May 12, 2013 at 5:36 PM, Nitin Pawar nitinpawar...@gmail.comwrote: you can do that using file:/// example: hadoop distcp hdfs://localhost:8020/somefile file:///Users/myhome/Desktop/ On Sun, May 12, 2013 at 5:23 PM, Rahul Bhattacharjee

Re: Need help about task slots

2013-05-12 Thread yypvsxf19870706
Hi The concept of task slots is used in MRv1. In the new version of Hadoop ,MRv2 uses yarn instead of slots. You can read it from Hadoop definitive 3rd. 发自我的 iPhone 在 2013-5-12,20:11,Mohammad Tariq donta...@gmail.com 写道: Sorry for the blunder guys. Warm Regards,

Re: Need help about task slots

2013-05-12 Thread Rahul Bhattacharjee
Oh! I though distcp works on complete files rather then mappers per datablock. So I guess parallelism would still be there if there are multipel files.. please correct if ther is anything wrong. Thank, Rahul On Sun, May 12, 2013 at 5:39 PM, Mohammad Tariq donta...@gmail.com wrote: @Rahul :

Re: Need help about task slots

2013-05-12 Thread Rahul Bhattacharjee
sorry for my blunder as well. my previous post for for Tariq in a wrong post. Thanks. Rahul On Sun, May 12, 2013 at 6:03 PM, Rahul Bhattacharjee rahul.rec@gmail.com wrote: Oh! I though distcp works on complete files rather then mappers per datablock. So I guess parallelism would still

Re: Need help about task slots

2013-05-12 Thread Mohammad Tariq
Hahaha..I think we could continue this over there.. Warm Regards, Tariq cloudfront.blogspot.com On Sun, May 12, 2013 at 6:04 PM, Rahul Bhattacharjee rahul.rec@gmail.com wrote: sorry for my blunder as well. my previous post for for Tariq in a wrong post. Thanks. Rahul On Sun, May

Re: Hadoop noob question

2013-05-12 Thread Mohammad Tariq
No. distcp is actually a mapreduce job under the hood. Warm Regards, Tariq cloudfront.blogspot.com On Sun, May 12, 2013 at 6:00 PM, Rahul Bhattacharjee rahul.rec@gmail.com wrote: Thanks to both of you! Rahul On Sun, May 12, 2013 at 5:36 PM, Nitin Pawar nitinpawar...@gmail.comwrote:

Re: Hadoop noob question

2013-05-12 Thread Mohammad Tariq
I had said that if you use distcp to copy data *from localFS to HDFS* then you won't be able to exploit parallelism as entire file is present on a single machine. So no multiple TTs. Please comment if you think I am wring somewhere. Warm Regards, Tariq cloudfront.blogspot.com On Sun, May 12,

Re: Hadoop noob question

2013-05-12 Thread Rahul Bhattacharjee
yeah you are right I mis read your earlier post. Thanks, Rahul On Sun, May 12, 2013 at 6:25 PM, Mohammad Tariq donta...@gmail.com wrote: I had said that if you use distcp to copy data *from localFS to HDFS*then you won't be able to exploit parallelism as entire file is present on a single

Re: Hadoop noob question

2013-05-12 Thread Mohammad Tariq
This is what I would say : The number of maps is decided as follows. Since it’s a good idea to get each map to copy a reasonable amount of data to minimize overheads in task setup, each map copies at least 256 MB (unless the total size of the input is less, in which case one map handles it all).

Re: Submitting a hadoop job in large clusters.

2013-05-12 Thread shashwat shriparv
On Sun, May 12, 2013 at 12:19 AM, Nitin Pawar nitinpawar...@gmail.comwrote: normally if you want to copy the jar then hadoop admins setu Submit you job to Job tracker it will distribute throughout the tasktrackers. *Thanks Regards* ∞ Shashwat Shriparv

Re: Permissions

2013-05-12 Thread shashwat shriparv
The user through which you are trying to run the task should jave permission on hdfs. just verify that *Thanks Regards* ∞ Shashwat Shriparv On Sat, May 11, 2013 at 1:02 AM, Amal G Jose amalg...@gmail.com wrote: After starting the hdfs, ie NN, SN and DN, create an hdfs directory

Re: issues with decrease the default.block.size

2013-05-12 Thread shashwat shriparv
The block size is for allocation not storage on the disk. *Thanks Regards* ∞ Shashwat Shriparv On Fri, May 10, 2013 at 8:54 PM, Harsh J ha...@cloudera.com wrote: Thanks. I failed to add: It should be okay to do if those cases are true and the cluster seems under-utilized right now.

Re: hadoop map-reduce errors

2013-05-12 Thread shashwat shriparv
Your connection setting to Mysql may not be correct check that. *Thanks Regards* ∞ Shashwat Shriparv On Fri, May 10, 2013 at 6:12 PM, Shahab Yunus shahab.yu...@gmail.comwrote: Have your checked your connection settings to the MySQL DB? Where and how are you passing the connection

Re: Problem while running simple WordCount program(hadoop-1.0.4) on eclipse.

2013-05-12 Thread shashwat shriparv
the user through which you are running your hadoop, set permission to tmp dir for that user. *Thanks Regards* ∞ Shashwat Shriparv On Fri, May 10, 2013 at 5:24 PM, Nitin Pawar nitinpawar...@gmail.comwrote: What are the permission of your /tmp/ folder? On May 10, 2013 5:03 PM, Khaleel

Re: Problem while running simple WordCount program(hadoop-1.0.4) on eclipse.

2013-05-12 Thread Nitin Pawar
its a /tmp/ folder so I guess all the users will need access to it. better it make it a routine linux like /tmp folder On Sun, May 12, 2013 at 11:12 PM, shashwat shriparv dwivedishash...@gmail.com wrote: the user through which you are running your hadoop, set permission to tmp dir for that

Re: Submitting a hadoop job in large clusters.

2013-05-12 Thread Shashidhar Rao
@shashwat shriparv Can the a hadoop job be submitted to any datanode in the cluster and not to jobTracker. Correct me if it I am wrong , I was told that a hadoop job can be submitted to datanode also apart from JobTracker. Is it correct? Advanced thanks On Sun, May 12, 2013 at 11:02 PM,

Re: Submitting a hadoop job in large clusters.

2013-05-12 Thread Nitin Pawar
nope in MRv1 only jobtracker can accept jobs. You can not trigger job on any other process in hadoop other than jobtracker. On Sun, May 12, 2013 at 11:25 PM, Shashidhar Rao raoshashidhar...@gmail.com wrote: @shashwat shriparv Can the a hadoop job be submitted to any datanode in the cluster

Re: Submitting a hadoop job in large clusters.

2013-05-12 Thread shashwat shriparv
As nitin said , its responsibility of Jobtracker to distribute the job to task to the tasktrackers so you need to submitt the job to the job tracker *Thanks Regards* ∞ Shashwat Shriparv On Sun, May 12, 2013 at 11:26 PM, Nitin Pawar nitinpawar...@gmail.comwrote: nope in MRv1 only

Eclipse Plugin: HADOOP 2.0.3

2013-05-12 Thread Gourav Sengupta
Hi, Is there a method to build the ECLIPSE plugin using HADOOP 2.0.3? I am looking at the details in http://wiki.apache.org/hadoop/EclipsePlugIn, but I am not able to find any eclipse-plugin folder in the src. Thanks and Regards, Gourav

Re: Submitting a hadoop job in large clusters.

2013-05-12 Thread Bertrand Dechoux
Which doesn't imply that you should log yourself to the physical machine where the JobTracker is hosted. It only implies that the hadoop client must be able to reach the JobTracker. It could be from any physical machines hosting the slaves (DataNode, Tasktracker) but it is rarely the case. Often,

Re: Wrapping around BitSet with the Writable interface

2013-05-12 Thread Harsh J
You can perhaps consider using the experimental JavaSerialization [1] enhancement to skip transforming to Writables/other-serialization-formats. It may be slower but looks like you are looking for a way to avoid transforming objects. Enable by adding the class

Re: Wrapping around BitSet with the Writable interface

2013-05-12 Thread Bertrand Dechoux
In order to make the code more readable, you could start by using the methods toByteArray() and valueOf(bytes) http://docs.oracle.com/javase/7/docs/api/java/util/BitSet.html#toByteArray%28%29 http://docs.oracle.com/javase/7/docs/api/java/util/BitSet.html#valueOf%28byte[]%29 Regards Bertrand
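A minimal Writable wrapper along these lines, assuming Java 7+ for BitSet.toByteArray()/valueOf(); the class name is only illustrative:

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import java.util.BitSet;
    import org.apache.hadoop.io.Writable;

    public class BitSetWritable implements Writable {
        private BitSet bits = new BitSet();

        public BitSet get() { return bits; }
        public void set(BitSet bits) { this.bits = bits; }

        @Override
        public void write(DataOutput out) throws IOException {
            byte[] bytes = bits.toByteArray();
            out.writeInt(bytes.length);  // length prefix so readFields knows how much to read
            out.write(bytes);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            byte[] bytes = new byte[in.readInt()];
            in.readFully(bytes);
            bits = BitSet.valueOf(bytes);
        }
    }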

Re: issues with decrease the default.block.size

2013-05-12 Thread Ted Dunning
The block size controls lots of things in Hadoop. It affects read parallelism, scalability, block allocation and other aspects of operations either directly or indirectly. On Sun, May 12, 2013 at 10:38 AM, shashwat shriparv dwivedishash...@gmail.com wrote: The block size is for allocation

Re: Wrapping around BitSet with the Writable interface

2013-05-12 Thread Ted Dunning
Another interesting alternative is the EWAH implementation of java bitsets that allow efficient compressed bitsets with very fast OR operations. https://github.com/lemire/javaewah See also https://code.google.com/p/sparsebitmap/ by the same authors. On Sun, May 12, 2013 at 1:11 PM, Bertrand

Re: Wrapping around BitSet with the Writable interface

2013-05-12 Thread Bertrand Dechoux
You can disregard my links as their are only valid for java 1.7+. The JavaSerialization might clean your code but shouldn't bring a significant boost in performance. The EWAH implementation has, at least, the methods you are looking for : serialize / deserialize. Regards Bertrand Note to myself

The minimum memory requirements to datanode and namenode?

2013-05-12 Thread sam liu
Hi, I setup a cluster with 3 nodes, and after that I did not submit any job on it. But, after few days, I found the cluster is unhealthy: - No result returned after issuing command 'hadoop dfs -ls /' or 'hadoop dfsadmin -report' for a while - The page of 'http://namenode:50070' could not be

Re: The minimum memory requirements to datanode and namenode?

2013-05-12 Thread Rishi Yadav
do you get any error when trying to connect to cluster, something like 'tried n times' or replicated 0 times. On Sun, May 12, 2013 at 7:28 PM, sam liu samliuhad...@gmail.com wrote: Hi, I setup a cluster with 3 nodes, and after that I did not submit any job on it. But, after few days, I

Re: The minimum memory requirements to datanode and namenode?

2013-05-12 Thread sam liu
Got some exceptions on node3: 1. datanode log: 2013-04-17 11:13:44,719 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_2478755809192724446_1477 received exception java.net.SocketTimeoutException: 63000 millis timeout while waiting for channel to be ready for read. ch :

Re: The minimum memory requirements to datanode and namenode?

2013-05-12 Thread sam liu
For node3, the memory is: total used free sharedbuffers cached Mem: 3834 3666167 0187 1136 -/+ buffers/cache: 2342 1491 Swap: 8196 0 8196 To a 3 nodes cluster as mine, what's