who runs the map and reduce tasks in the unit tests

2013-02-21 Thread Pedro Sá da Costa
Hi, In Hadoop MR unit tests, the classes use ./core/org/apache/hadoop/util/Tool.java and ./core/org/apache/hadoop/util/ToolRunner.java to submit the job. But to run the unit tests, it seems MR does not need to be running. If so, who runs the map and reduce tasks? -- Best regards, P

Re: who runs the map and reduce tasks in the unit tests

2013-02-21 Thread Vinod Kumar Vavilapalli
Not sure which specific tests you are talking about. There are two types of them: - Real unit tests, which unit-test code and shouldn't run any MR jobs - The remaining 'unit' tests are really integration tests. They start MiniMRCluster and MiniDFSCluster (which are basically in-JVM MR and DFS)
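A minimal sketch of the second kind, assuming the Hadoop 1.x-era test jars (hadoop-core plus hadoop-test) on the classpath; constructor signatures follow that era's API:

```java
// Sketch only: needs the hadoop-core and hadoop-test jars; not runnable standalone.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.MiniDFSCluster;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MiniMRCluster;

public class MiniClusterSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // In-JVM DFS with one DataNode (conf, numDataNodes, format, racks).
        MiniDFSCluster dfs = new MiniDFSCluster(conf, 1, true, null);
        FileSystem fs = dfs.getFileSystem();
        // In-JVM MR with one TaskTracker, pointed at the mini DFS.
        MiniMRCluster mr = new MiniMRCluster(1, fs.getUri().toString(), 1);
        JobConf jobConf = mr.createJobConf();
        // ... submit a job against jobConf; map and reduce tasks run
        // inside this same JVM's mini cluster, no external daemons needed ...
        mr.shutdown();
        dfs.shutdown();
    }
}
```

So for these tests the "cluster" is the test JVM itself, which is why nothing needs to be running beforehand.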

Locks in HDFS

2013-02-21 Thread abhishek
Hello, How can I impose a read lock on a file in HDFS, so that only one user (or one application) can access the file in HDFS at any point in time? Regards Abhi

Re: Locks in HDFS

2013-02-21 Thread Harsh J
HDFS does not have such a client-side feature, but your applications can use Apache ZooKeeper to coordinate and implement this on their own - it can be used to achieve distributed locking. While at ZooKeeper, also check out https://github.com/Netflix/curator which makes using it for common needs
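A minimal sketch of the Curator-based lock Harsh suggests, assuming the Netflix Curator framework and recipes jars of that era on the classpath and a running ZooKeeper; the connect string and lock path below are made-up placeholders:

```java
// Sketch only: needs the Curator jars and a ZooKeeper ensemble; not runnable standalone.
import com.netflix.curator.framework.CuratorFramework;
import com.netflix.curator.framework.CuratorFrameworkFactory;
import com.netflix.curator.framework.recipes.locks.InterProcessMutex;
import com.netflix.curator.retry.ExponentialBackoffRetry;

public class HdfsFileLockSketch {
    public static void main(String[] args) throws Exception {
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "zkhost:2181", new ExponentialBackoffRetry(1000, 3));
        client.start();
        // One ZK lock node per HDFS path you want to guard; every
        // cooperating reader/writer must agree on the same path.
        InterProcessMutex lock = new InterProcessMutex(client, "/locks/myfile");
        lock.acquire();
        try {
            // ... access the HDFS file here; no other cooperating
            // client holds the lock for this path ...
        } finally {
            lock.release();
        }
        client.close();
    }
}
```

Note this only restrains clients that go through the lock; HDFS itself will not stop an application that bypasses it.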

Re: Locks in HDFS

2013-02-21 Thread abhishek
Thanks for the reply, Harsh. I will look into ZooKeeper. Regards Abhi On Feb 22, 2013, at 1:03 AM, Harsh J ha...@cloudera.com wrote: HDFS does not have such a client-side feature, but your applications can use Apache Zookeeper to coordinate and implement this on their own - it can be used to

Re: ISSUE :Hadoop with HANA using sqoop

2013-02-21 Thread samir das mohapatra
Posting the whole log from the task now -- --- Task Logs: 'attempt_201302202127_0021_m_00_0' *stdout logs* -- *stderr logs* log4j:WARN No appenders could be found for logger (org.apache.hadoop.hdfs.DFSClient).

RE: Which class or method is called first when I run a command in Hadoop

2013-02-21 Thread Agarwal, Nikhil
Thanks Manoj for your answer. :) That helped. From: Agarwal, Nikhil Sent: Tuesday, February 19, 2013 4:53 PM To: 'user@hadoop.apache.org' Subject: Which class or method is called first when I run a command in Hadoop Hi All, Thanks for your answers till now. I was trying to debug Hadoop

How to add another file system in Hadoop

2013-02-21 Thread Agarwal, Nikhil
Hi, I am planning to add a file system called CDMI under org.apache.hadoop.fs in Hadoop, something similar to KFS or S3, which are already there under org.apache.hadoop.fs. Say I write my file system for CDMI and add the package under fs; how do I then tell the

Re: ISSUE :Hadoop with HANA using sqoop

2013-02-21 Thread bejoy . hadoop
Hi Samir Looks like there is some syntax issue with the SQL query generated internally. Can you try doing a Sqoop import by specifying the query with the --query option? Regards Bejoy KS Sent from remote device, Please excuse typos -Original Message- From: samir das mohapatra

Re: Newbie Debuggin Question

2013-02-21 Thread Sai Sai
This may be a basic beginner debugging question; would appreciate it if anyone can shed some light. Here is the method I have in Eclipse: *** @Override protected void setup(Context context) throws java.io.IOException, InterruptedException { Path[]

Re: How to add another file system in Hadoop

2013-02-21 Thread Harsh J
What Hemanth points to (fs.TYPE.impl, i.e. fs.cdmi.impl being set to the classname of the custom FS impl.) is correct for the 1.x releases. In 2.x and ahead, the class for a URI is auto-discovered from the classpath (a 'service'). So as long as your jar is present on the user's runtime, the FS
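On 1.x, the mapping Harsh describes is a single property in core-site.xml. The key below follows the naming pattern of the existing fs.kfs.impl / fs.s3.impl entries; the implementation class name is hypothetical:

```xml
<!-- core-site.xml: map the cdmi:// URI scheme to your FileSystem class -->
<property>
  <name>fs.cdmi.impl</name>
  <value>org.apache.hadoop.fs.cdmi.CDMIFileSystem</value>
</property>
```

On 2.x, as Harsh notes, the jar instead advertises itself: it ships a META-INF/services/org.apache.hadoop.fs.FileSystem file containing the implementation class name, and the scheme is discovered from the classpath.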

Re: How to add another file system in Hadoop

2013-02-21 Thread Ling Kun
Hi Agarwal, This repository and the corresponding README file may give you some hint for the configuration. https://github.com/gluster/hadoop-glusterfs yours, Kun Ling On Thu, Feb 21, 2013 at 9:14 PM, Ling Kun lkun.e...@gmail.com wrote: Hi Agarwal, This repository and the

Re: How to test Hadoop MapReduce under another File System NOT HDFS

2013-02-21 Thread Julien Muller
Some hints: 1) For features, you could start with the unit tests available with hadoop fs. For performance, compare various benchmark results. 3) I can see at least 2 reasons for that. It could be that your filesystem does not support locality, so tasks are not executed on the same node as the data.

Re: compile hadoop-2.0.x

2013-02-21 Thread Azuryy Yu
It indicates 'cannot find com.google.protobuf'. On Feb 21, 2013 7:38 PM, Ted yuzhih...@gmail.com wrote: What compilation errors did you get? Thanks On Feb 21, 2013, at 1:37 AM, Azuryy Yu azury...@gmail.com wrote: Hi, I just want to share some experience on hadoop-2.x compiling. It

Re: Newbie Debuggin Question

2013-02-21 Thread bejoy . hadoop
Hi Sai The location you are seeing should be the mapred.local.dir. From my understanding, the files in the DistributedCache would be available in that location while you are running the job and would be cleaned up at the end of it. Regards Bejoy KS Sent from remote device, Please excuse typos

Hadoop admin training recommendation?

2013-02-21 Thread Guy Matz
Hello! Anyone in the NYC area recommend any of the hadoop training classes for administrators? Thanks a lot! Guy

Re: Using hadoop streaming with binary data

2013-02-21 Thread Jay Hacker
I was able to write a little code to make this happen, and submitted a patch to Hadoop: https://issues.apache.org/jira/browse/MAPREDUCE-5018 There is a jar file and shell script there for anybody who wants to try this without recompiling all of Hadoop. It lets you run something like mapstream

How does Kerberos work with Hadoop ?

2013-02-21 Thread rohit sarewar
I am looking for an explanation of how Kerberos works with a Hadoop cluster. I need to know how the KDC is used by HDFS and MapReduce. (Something like this: an example of Kerberos with a mail server, https://www.youtube.com/watch?v=KD2Q-2ToloE) How are the NameNode and DataNode prone to attacks? What

Re: How does Kerberos work with Hadoop ?

2013-02-21 Thread Vinod Kumar Vavilapalli
You should read the hadoop security design doc which you can find at https://issues.apache.org/jira/browse/HADOOP-4487 HTH, +Vinod Kumar Vavilapalli Hortonworks Inc. http://hortonworks.com/ On Feb 21, 2013, at 11:02 AM, rohit sarewar wrote: I am looking for an explanation of Kerberos
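As context for what that design doc covers: on a 1.x cluster, security is switched on via two core-site.xml properties. This is only a pointer, not a working setup; a real secured cluster also needs keytab and principal settings for every daemon:

```xml
<!-- core-site.xml: turn on Kerberos authentication (default is "simple")
     and service-level authorization -->
<property>
  <name>hadoop.security.authentication</name>
  <value>kerberos</value>
</property>
<property>
  <name>hadoop.security.authorization</name>
  <value>true</value>
</property>
```

With these set, HDFS and MapReduce daemons and clients authenticate to each other through the KDC instead of trusting the client-supplied username.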

map reduce and sync

2013-02-21 Thread Lucas Bernardi
Hello there, I'm trying to use hadoop map reduce to process an open file. The writing process, writes a line to the file and syncs the file to readers. (org.apache.hadoop.fs.FSDataOutputStream.sync()). If I try to read the file from another process, it works fine, at least using

Re: About Hadoop Deb file

2013-02-21 Thread Jean-Marc Spaggiari
Hi Mayur, Where have you downloaded the DEB files? Are they Debian related? Or Ubuntu related? Ubuntu is no worse than CentOS. They are just different choices. Both should work. JM 2013/2/21 Harsh J ha...@cloudera.com Try the debs from the Apache Bigtop project 0.3 release, it's a bit

Re: About Hadoop Deb file

2013-02-21 Thread Jean-Marc Spaggiari
Mayur, Have you looked at that? http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/ I just created a VM, installed Debian 64-bit, downloaded the .deb file and installed it without any issue. Are you using Ubuntu 64-bit? Or 32-bit? JM 2013/2/21 Mayur

Re: Text analytics

2013-02-21 Thread Solomon McCarthy
Hi Mallika, Have you tried Neo4j? -Solomon On Fri, Feb 22, 2013 at 5:22 AM, SUJIT PAL sujit@comcast.net wrote: Hi Mallika, Couldn't this be done from the relational database itself? To get the group counts: select count(*) from your_table where your_condition group by

Re: About Hadoop Deb file

2013-02-21 Thread Mayur Patil
I am using 32-bit. I will look at your link, JM sir. On Fri, Feb 22, 2013 at 8:17 AM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Mayur, Have you looked at that? http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/ I just created a VM,

Re: MapReduce processing with extra (possibly non-serializable) configuration

2013-02-21 Thread Public Network Services
I have considered the DistributedCache and will probably be using it, but in order to have a file to cache I need to serialize the configuration object first. :-) On Thu, Feb 21, 2013 at 5:55 PM, feng lu amuseme...@gmail.com wrote: Hi May be you can see the useage of DistributedCache [0] ,

Re: MapReduce processing with extra (possibly non-serializable) configuration

2013-02-21 Thread feng lu
Yes, you are right. First upload the serialized configuration file to HDFS, then retrieve that file in the Mapper#configure method for each Mapper and deserialize it back into the configuration object. It seems that serializing the configuration file is required. You can find many data serialization
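The round trip described above can be sketched with plain java.io serialization; here java.util.Properties stands in for the custom configuration object, and in the real job the byte stream would be written to an HDFS file that DistributedCache ships to each task:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.Properties;

public class ConfigRoundTrip {
    // Serialize the configuration object to bytes (these would be written
    // to a file on HDFS and registered with DistributedCache).
    public static byte[] serialize(Properties conf) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(bos);
        oos.writeObject(conf);
        oos.close();
        return bos.toByteArray();
    }

    // In Mapper#configure: read the cached file back and deserialize it.
    public static Properties deserialize(byte[] bytes)
            throws IOException, ClassNotFoundException {
        ObjectInputStream ois =
                new ObjectInputStream(new ByteArrayInputStream(bytes));
        try {
            return (Properties) ois.readObject();
        } finally {
            ois.close();
        }
    }

    public static void main(String[] args) throws Exception {
        Properties conf = new Properties();
        conf.setProperty("model.threshold", "0.75");
        Properties back = deserialize(serialize(conf));
        System.out.println(back.getProperty("model.threshold")); // prints 0.75
    }
}
```

Any serialization scheme works as long as both the driver and the mappers agree on it; Java serialization is just the lowest-effort choice here.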

Re: MapReduce processing with extra (possibly non-serializable) configuration

2013-02-21 Thread Public Network Services
Hazelcast is an interesting idea, but I was hoping that there is a way of doing this in MapReduce. :-) It didn't seem like that from the start, but I posted here just to make sure I was not missing something. So, I will serialize my data objects and use them accordingly. Thanks! On Thu, Feb

Re: Please define blacklisting, graylisting, and excluded nodes in Hadoop 1.0.3

2013-02-21 Thread Dan F
I also saw some reference to being able to run hadoop job -blacklist-host or some such, but http://hadoop.apache.org/docs/r1.0.4/commands_manual.html#job doesn't show it.

Re: OutOfMemoryError during reduce shuffle

2013-02-21 Thread Shivaram Lingamneni
Thanks. You are correct that this strategy does not achieve a total sort, only a partial/local sort, since that's all the application requires. I think the technique is sometimes referred to as secondary sort, and KeyFieldBasedPartitioner is sometimes used as a convenience to implement it, but our
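For reference, the KeyFieldBasedPartitioner convenience mentioned above looks roughly like this in a 1.x streaming job (jar path and mapper/reducer script names are placeholders): the map output key has two fields, but partitioning considers only field 1, so all records for a key reach one reducer already sorted on the full two-field key.

```shell
hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-*.jar \
  -D stream.num.map.output.key.fields=2 \
  -D mapred.text.key.partitioner.options=-k1,1 \
  -partitioner org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner \
  -input /in -output /out \
  -mapper ./map.py -reducer ./reduce.py
```

This gives the per-key (secondary) sort described, not a total order across reducers.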

Re: How to test Hadoop MapReduce under another File System NOT HDFS

2013-02-21 Thread Ling Kun
Dear Harsh J, First, thanks for your quick and detailed reply. Your suggestion is very helpful to me! 1. For the Hadoop MapReduce regression test: 1.1 In theory, as long as I have correctly implemented all of the org.apache.hadoop.fs.FileSystem interface, Hadoop MR should work correctly.

Re: How to test Hadoop MapReduce under another File System NOT HDFS

2013-02-21 Thread Ling Kun
Dear Julien Muller and Harsh, Thanks very much for all your hints. Are there any recommended applications besides wordcount and TeraSort? Thanks Ling Kun On Thu, Feb 21, 2013 at 9:26 PM, Julien Muller julien.mul...@ezako.com wrote: Some hints: 1) For features, you could start with