CDH4 version questions

2013-02-19 Thread jing wang
Hi User, We're facing the challenge of which Hadoop version to choose. We prefer CDH4, but have a few questions: 1. Do MRv1 and MRv2 share the same HDFS? If so, can MRv1 be upgraded to MRv2 smoothly? 2. If using MRv1, does our MapReduce code based on CDH3 need to change? 3. Is MRv1 stable enough to

Re: Contribute to Hadoop Community

2013-02-19 Thread Alexander Alten-Lorenz
Hi, http://wiki.apache.org/hadoop/HowToContribute I forgot to post this before. Cheers, Alex On Feb 19, 2013, at 8:23 AM, Varsha Raveendran wrote: > Thank you very much for a quick response. > > > On Tue, Feb 19, 2013 at 12:12 PM, Alexander Alten-Lorenz > wrote: > Hey, > > Thank you

Re: product recommendations engine

2013-02-19 Thread Sofia Georgiakaki
Good morning, Myrrix provides a Recommender that implements a specific recommendation algorithm based on matrix factorization, which is generally efficient in most cases. However, depending on your data and access pattern, it may be better to use Mahout as well, as it provides many different Re

Re: Database insertion by Hadoop

2013-02-19 Thread Mohammad Tariq
Hello Masoud, So you want to pull your data from SQL server to your Hadoop cluster first and then do the processing. Please correct me if I am wrong. You can do that using Sqoop as mentioned by Hemanth sir. BTW, what exactly is the kind of processing that you are planning to do on your data?

Re: Piping output of hadoop command

2013-02-19 Thread Julian Wissmann
Hi, Thanks a lot! hadoop fs -cat did the trick. Julian 2013/2/18 Harsh J : > Hi, > > The command you're looking for is not -copyToLocal (it doesn't really > emit the file, which you seem to need here), but rather a simple -cat: > > Something like the below would make your command work: > > $ had

Re: Database insertion by Hadoop

2013-02-19 Thread Masoud
Dear Tariq No, exactly the opposite way: actually we compute the similarity between documents and insert them into the database, almost 2,000,000 records in every table. Best Regards On 02/19/2013 06:41 PM, Mohammad Tariq wrote: Hello Masoud, So you want to pull your data from SQL server to

Which class or method is called first when I run a command in Hadoop

2013-02-19 Thread Agarwal, Nikhil
Hi All, Thanks for your answers till now. I was trying to debug Hadoop commands. I just wanted to know, when I run any command, say dfs or jar, which method is called first. If I set a breakpoint then it executes till there, but I do not come to know which methods have already been ex

Re: Which class or method is called first when I run a command in Hadoop

2013-02-19 Thread Manoj Babu
Hi Nikhil, Have a look inside the script file named hadoop in the Hadoop bin folder, for example: C:\cygwin\home\hadoop-0.20.2\bin Sample code: elif [ "$COMMAND" = "jar" ] ; then CLASS=org.apache.hadoop.util.RunJar Cheers! Manoj. On Tue, Feb 19, 2013 at 4:53 PM, Agarwal, Nikhil wrote: >
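The dispatch Manoj quotes can be sketched end to end. This is a paraphrase of the idea only, not the real bin/hadoop script, which also builds the classpath and JVM options before exec'ing java:

```shell
# Paraphrased sketch of the command-to-class dispatch in bin/hadoop.
# The class names for fs/jar/namenode are the real ones; the surrounding
# script logic here is simplified for illustration.
COMMAND="jar"
if [ "$COMMAND" = "fs" ]; then
  CLASS=org.apache.hadoop.fs.FsShell
elif [ "$COMMAND" = "jar" ]; then
  CLASS=org.apache.hadoop.util.RunJar
elif [ "$COMMAND" = "namenode" ]; then
  CLASS=org.apache.hadoop.hdfs.server.namenode.NameNode
fi
# $CLASS is the class whose main() runs first for the given command.
echo "$CLASS"
```

So for `hadoop jar ...`, the first user-visible entry point is RunJar.main(), which then loads and invokes the main class of the submitted jar.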

Re: CDH4 version questions

2013-02-19 Thread Arun C Murthy
Pls ask the CDH lists. On Feb 19, 2013, at 12:03 AM, jing wang wrote: > Hi User, > > We're facing the challenge of which Hadoop version to choose. > We prefer CDH4, but have a few questions: > 1. Do MRv1 and MRv2 share the same HDFS? If so, can MRv1 be upgraded to MRv2 > smoothly? > 2. If u

ClassNotFoundException in Main

2013-02-19 Thread Fatih Haltas
Hi everyone, I know it is a common mistake to not specify the class address while trying to run a jar; however, although I specified it, I am still getting the ClassNotFound exception. What may be the reason for it? I have been struggling with this problem for more than 2 days. I just wrote differen

Re: CDH4 version questions

2013-02-19 Thread David Boyd
Jing: We are using CDH4.1.1, CDH3, and the base Apache Hadoop on several different efforts. My best cut at answering your questions is below; however, the CDH lists will likely have better information on some of them. On 2/19/2013 3:03 AM, jing wang wrote: Hi User, We're facing the chall

Re: Namenode formatting problem

2013-02-19 Thread Keith Wiley
Hmmm, okay. Thanks. Umm, is this a Yarn thing? Because I also tried it with Hadoop 2.0 MR1 (which I think should behave almost exactly like older versions of Hadoop) and it had the exact same problem. Does H2.0MR1 use journal nodes? I'll try to read up more on this later today. Thanks for the

Re: Database insertion by Hadoop

2013-02-19 Thread Hemanth Yamijala
Sqoop can be used to export as well. Thanks Hemanth On Tuesday, February 19, 2013, Masoud wrote: > Dear Tariq > > No, exactly the opposite way: actually we compute the similarity between > documents and insert them into the database, almost 2,000,000 records in > every table. > > Best Regards > > On 0

Trouble in running MapReduce application

2013-02-19 Thread Fatih Haltas
Hi everyone, I know it is a common mistake to not specify the class address while trying to run a jar; however, although I specified it, I am still getting the ClassNotFound exception. What may be the reason for it? I have been struggling with this problem for more than 2 days. I just wrote differen

Re: ClassNotFoundException in Main

2013-02-19 Thread Hemanth Yamijala
Have you used the API setJarByClass in your main program? http://hadoop.apache.org/docs/r1.0.4/api/org/apache/hadoop/mapreduce/Job.html#setJarByClass(java.lang.Class) On Tuesday, February 19, 2013, Fatih Haltas wrote: > Hi everyone, > > I know this is the common mistake to not specify the class

Re: ClassNotFoundException in Main

2013-02-19 Thread Fatih Haltas
Thanks Hemanth for your reply. Unfortunately, I did use it in my code. Also, to check my setup I tried to run the WordCount example from Apache Hadoop itself, changing just the package info; this one also did not work. On Tue, Feb 19, 2013 at 8:02 PM, Hemanth Yamijala wrote: > Have you used the API setJ

Re: ClassNotFoundException in Main

2013-02-19 Thread Hemanth Yamijala
Sorry, I did not read the mail correctly. I think the error is in how the jar has been created: the classes start with wordcount_classes as the root, instead of org. Thanks Hemanth On Tuesday, February 19, 2013, Hemanth Yamijala wrote: > Have you used the Api setJarByClass in your main program? > >

RE: Namenode formatting problem

2013-02-19 Thread Vijay Thakorlal
Hi Keith, When you run the format command on the namenode machine it actually starts the namenode, formats it, then shuts it down (see: http://hadoop.apache.org/docs/stable/commands_manual.html). Before you run the format command, do you see any processes already listening on port 9212 via netstat -
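Vijay's check can be wrapped in a small sketch. Port 9212 comes from the thread; the grep pattern is a loose assumption meant to cover both Linux (":9212") and BSD (".9212") netstat output:

```shell
# Before re-running the format, check whether anything already listens on the
# namenode port (9212 in this setup); a leftover process holding the port would
# explain the format/start failure.
PORT=9212
LISTENERS=$(netstat -an 2>/dev/null | grep -c "[.:]$PORT .*LISTEN")
echo "listeners on port $PORT: $LISTENERS"
```

A non-zero count means a stale namenode (or another process) should be stopped before formatting again.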

Re: ClassNotFoundException in Main

2013-02-19 Thread Fatih Haltas
Thank you very much. When I tried wordcount_classes.org.myorg.WordCount, I got the following error: [hadoop@ADUAE042-LAP-V project]$ hadoop jar wordcount_19_02.jar wordcount_classes.org.myorg.WordCount /home/hadoop/project/hadoop-data/NetFlow 19_02_wordcount.out Warning: $HADOOP_HOME i

Re: Trouble in running MapReduce application

2013-02-19 Thread Harsh J
Your point (4) explains the problem. The jar packed structure should look like the below, and not how it is presently (one extra top level dir is present): META-INF/ META-INF/MANIFEST.MF org/ org/myorg/ org/myorg/WordCount.class org/myorg/WordCount$TokenizerMapper.class org/myorg/WordCount$IntSumR
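Harsh's diagnosis can be reproduced with placeholder files. The names (wordcount_classes, org/myorg/WordCount) come from the thread; no compiler or jar tool is actually run here:

```shell
# Reconstruction of the two layouts with empty placeholder files.
# Broken: classes rooted under an extra top-level wordcount_classes/ dir.
mkdir -p broken/wordcount_classes/org/myorg
touch broken/wordcount_classes/org/myorg/WordCount.class
# Correct: the package path org/myorg/ starts at the jar root.
mkdir -p fixed/org/myorg
touch fixed/org/myorg/WordCount.class
# The packaging fix is to make the classes dir itself the jar root, e.g.:
#   jar cf wordcount.jar -C wordcount_classes/ .
find fixed -type f
```

With the `-C` form, the jar's internal entries start at org/, so the class name org.myorg.WordCount resolves as the classloader expects.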

Re: Namenode formatting problem

2013-02-19 Thread Harsh J
Hey Keith, I'm guessing whatever "ip-13-0-177-110" resolves to (ping to check) is not your local IP on that machine (or rather, it isn't the machine you intended to start it on)? Not sure if EC2 grants static IPs, but otherwise a change in the assigned IP (checkable via ifconfig) wou

Re: Namenode formatting problem

2013-02-19 Thread Harsh J
To simplify my previous post, your IPs for the master/slave/etc. in /etc/hosts file should match the ones reported by "ifconfig" always. In proper deployments, IP is static. If IP is dynamic, we'll need to think of some different ways. On Tue, Feb 19, 2013 at 9:53 PM, Harsh J wrote: > Hey Keith,
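A minimal sketch of that check, using a stand-in hosts file; the hostname "master" and both addresses are made up for illustration:

```shell
# Simulated /etc/hosts vs. interface-IP comparison (all values are stand-ins).
printf '10.0.0.5 master\n' > hosts.sample
HOSTS_IP=$(awk '$2 == "master" {print $1}' hosts.sample)
IFACE_IP=10.0.0.5   # in a real check, take this from ifconfig / ip addr
if [ "$HOSTS_IP" = "$IFACE_IP" ]; then echo MATCH; else echo MISMATCH; fi
```

In a real deployment the same comparison runs against /etc/hosts and the live interface address; a MISMATCH is exactly the situation Harsh describes after a dynamic IP changes.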

Re: Trouble in running MapReduce application

2013-02-19 Thread Fatih Haltas
Thank you very much, Harsh. As I promised earlier, I am much obliged to you. I solved that problem by changing the directories and creating the jar again with org at the root, but now I am getting this error: 1.) What I got -

Re: ClassNotFoundException in Main

2013-02-19 Thread Hemanth Yamijala
I am not sure if that will actually work, because the class is defined to be in the org.myorg package. I suggest you repackage to reflect the right package structure. Also, the error you are getting seems to indicate that you have compiled using JDK 7. Note that some versions of Hadoop are suppo

Re: ClassNotFoundException in Main

2013-02-19 Thread Fatih Haltas
Yes, I reorganized the packages but I am still getting the same error. My Hadoop version is 1.0.4. On Tuesday, 19 February 2013, Hemanth Yamijala wrote: > I am not sure if that will actually work, because the class is defined to > be in the org.myorg package. I suggest you repackage

Newbie: HBase good for Tree like structure?

2013-02-19 Thread José Feiteirinha
Dear all, I hope this is the right place for this question. I'm currently in the starting stages of developing software that may 'explode' in terms of users and data. I'm considering a very basic tree-like data structure and would like to know your thoughts regarding HBase/Hadoop. My reason is

Re: ClassNotFoundException in Main

2013-02-19 Thread Hemanth Yamijala
That's because the error is not related to the packaging. As mentioned in my last mail, downgrade the Java version used for compiling your code from JDK 7 to JDK 6. If you are using an IDE, it will have an option to set the target compilation version. Google will help. On Tuesday, February 19, 201

JUnit test failing in HDFS when building Hadoop from source.

2013-02-19 Thread Leena Rajendran
Hi, I am posting for the first time. Please let me know if this needs to go to any other mailing list. I am trying to build Hadoop from source code, and I am able to build successfully up to the Hadoop-Common-Project. However, in HDFS the test called "TestHftpURLTimeouts" is failing inter

Re: InputFormat for some REST api

2013-02-19 Thread Robert Evans
I don't know of any input format that will do this out of the box, but it should not be that hard to write one. There are two big issues here. 1. The data you are reading from the API really needs to be static, or you could get some very odd inconsistencies. For example, a node dies after a

Re: InputFormat for some REST api

2013-02-19 Thread Mohammad Tariq
Good points, sir. Especially the second one. How will the splits get generated? Warm Regards, Tariq https://mtariq.jux.com/ cloudfront.blogspot.com On Tue, Feb 19, 2013 at 11:04 PM, Robert Evans wrote: > I don't know of any input format that will do this out of the box. But it > should not be t

Re: Trouble in running MapReduce application

2013-02-19 Thread Harsh J
Hi, The new error usually happens if you compile using Java 7 and try to run via Java 6 (for example). That is, an incompatibility in the runtimes for the binary artifact produced. On Tue, Feb 19, 2013 at 10:09 PM, Fatih Haltas wrote: > Thank you very much Harsh, > > Now, as I promised earlier I
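Some background on why this mismatch fails loudly: a class file records the compiler's target in its header, and a Java 6 JRE rejects major version 51 (JDK 7) with "Unsupported major.minor version 51.0". Below is a self-contained sketch that fabricates just a header in order to read the version byte back; the javac flags at the end are the commonly suggested fix, shown as an assumption rather than a command from the thread:

```shell
# A class file starts with the magic CA FE BA BE, then minor and major version
# (big-endian u2 each). JDK 6 emits major 50, JDK 7 emits major 51.
# Fabricate an 8-byte header with major version 50 (0x32), then read back the
# low byte of the major version at offset 7:
printf '\312\376\272\276\000\000\000\062' > Sample.class
MAJOR=$(dd if=Sample.class bs=1 skip=7 count=1 2>/dev/null | od -An -tu1 | tr -d ' ')
echo "major version: $MAJOR"
# Hedged fix, matching the advice in this thread: recompile targeting Java 6, e.g.
#   javac -source 1.6 -target 1.6 WordCount.java
```

Running the same dd/od probe against a real .class file tells you which JDK produced it, without needing javap.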

Re: Trouble in running MapReduce application

2013-02-19 Thread Harsh J
Oops. I just noticed Hemanth has been answering on a dupe thread as well. Let's drop this thread and carry on there :) On Tue, Feb 19, 2013 at 11:14 PM, Harsh J wrote: > Hi, > > The new error usually happens if you compile using Java 7 and try to > run via Java 6 (for example). That is, an incompa

Re: Trouble in running MapReduce application

2013-02-19 Thread Fatih Haltas
Thank you all very much. On Tuesday, 19 February 2013, Harsh J wrote: > Oops. I just noticed Hemanth has been answering on a dupe thread as > well. Let's drop this thread and carry on there :) > > On Tue, Feb 19, 2013 at 11:14 PM, Harsh J wrote: > > Hi, > > > > The new error us

Re: Correct way to unzip locally an archive in Yarn

2013-02-19 Thread Robert Evans
Yes, if you can trace this down I would be very interested. We are running 0.23.6 without any issues, but that does not mean that there is not some bug in the code that is causing this to happen in your situation. --Bobby From: Sebastiano Vigna <vi...@di.unimi.it> Reply-To: "user@hadoop.

Re: InputFormat for some REST api

2013-02-19 Thread Yaron Gonen
Thanks, and excellent points. I just wanted to know if someone is working this way and if it is a common use-case. On Tue, Feb 19, 2013 at 7:39 PM, Mohammad Tariq wrote: > Good points sir. Specially the second one. How the splits will get > generated? > > Warm Regards, > Tariq > https://mtariq.

webapps/ CLASSPATH err

2013-02-19 Thread Keith Wiley
This is Hadoop 2.0, but using the separate MR1 package (hadoop-2.0.0-mr1-cdh4.1.3), not yarn. I formatted the namenode ("./bin/hadoop namenode -format") and saw no errors in the shell or in the logs/[namenode].log file (in fact, simply formatting the namenode doesn't even create the log file y

Re: webapps/ CLASSPATH err

2013-02-19 Thread Harsh J
Hi Keith, The webapps/hdfs bundle is present at $HADOOP_PREFIX/share/hadoop/hdfs/ directory of the Hadoop 2.x release tarball. This should get on the classpath automatically as well. What "bin/hadoop-daemon.sh" script are you using, the one from the MR1 "aside" tarball or the chief hadoop-2 one?

Re: InputFormat for some REST api

2013-02-19 Thread Alex Thieme
Are there examples detailing how to write input formats, record readers and related classes? I was hoping to write one against a Redis database, and it seems that it shares similar issues with accessing data from a REST API. Alex Thieme athi...@athieme.com 508-361-2788 On Feb 19, 2013, at 1:34 PM, R

Re: webapps/ CLASSPATH err

2013-02-19 Thread Keith Wiley
On Feb 19, 2013, at 11:43 , Harsh J wrote: > Hi Keith, > > The webapps/hdfs bundle is present at > $HADOOP_PREFIX/share/hadoop/hdfs/ directory of the Hadoop 2.x release > tarball. This should get on the classpath automatically as well. Hadoop 2.0 Yarn does indeed have a share/ dir but Hadoop 2.

Re: Newbie: HBase good for Tree like structure?

2013-02-19 Thread Wellington Chevreuil
Hi José, I think your structure is fine for defining HBase row keys. The main issue you'll have then is how you'll build these keys, so that you can properly access your tree nodes. Regarding your scalability concerns, you should not worry about starting with a small Hadoop/HBase cluster (even st

copy chunk of hadoop output

2013-02-19 Thread jamal sasha
Hi, I was wondering, in the following command: bin/hadoop dfs -copyToLocal hdfspath localpath can we specify to copy not the full file but, say, x MB of it to the local drive? Is something like this possible? Thanks Jamal

Re: copy chunk of hadoop output

2013-02-19 Thread Harsh J
You can instead use 'fs -cat' and the 'head' coreutil, as one example: hadoop fs -cat 100-byte-dfs-file | head -c 5 > 5-byte-local-file On Wed, Feb 20, 2013 at 3:38 AM, jamal sasha wrote: > Hi, > I was wondering in the following command: > > bin/hadoop dfs -copyToLocal hdfspath localpath > can
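The same trick can be tried locally without a cluster. The files here are throwaway stand-ins; the point is that -cat streams the file's bytes to stdout, so a byte-limited consumer like head -c bounds what reaches the local disk:

```shell
# Local simulation of Harsh's pipeline, with a random local file standing in
# for the 100-byte DFS file (no HDFS needed).
head -c 100 /dev/urandom > 100-byte-file
cat 100-byte-file | head -c 5 > 5-byte-local-file
wc -c < 5-byte-local-file
```

On a real cluster, substituting `hadoop fs -cat 100-byte-dfs-file` for the `cat` stage gives exactly the command in Harsh's reply.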

Re: copy chunk of hadoop output

2013-02-19 Thread jamal sasha
Awesome thanks :) On Tue, Feb 19, 2013 at 2:14 PM, Harsh J wrote: > You can instead use 'fs -cat' and the 'head' coreutil, as one example: > > hadoop fs -cat 100-byte-dfs-file | head -c 5 > 5-byte-local-file > > On Wed, Feb 20, 2013 at 3:38 AM, jamal sasha > wrote: > > Hi, > > I was wonderin

Re: 答复: 答复: 答复: some ideas for QJM and NFS

2013-02-19 Thread Todd Lipcon
Yes, I agree with what Liang said. You can look at the sync time metrics on the JournalNodes and in the NameNode to determine whether this is indeed responsible for the throughput loss. We ran tests at scale (100+ nodes) and saw no degradation in performance when running QJM vs NFS-based HA

RE: Namenode formatting problem

2013-02-19 Thread Marcin Mejran
The issue may be that the nodes are trying to use the EC2 public IP (which would be used for external access) to reach each other, which does not work (or doesn't work trivially). You need to use the private IPs, which are given by ifconfig. EC2 gives you static IPs as long as you don't restart

Re: Namenode formatting problem

2013-02-19 Thread Azuryy Yu
I want to update my answer: if you don't configure QJM HA in your hadoop-2.0.3, then just ignore my reply. Thanks. On Tue, Feb 19, 2013 at 11:09 PM, Keith Wiley wrote: > Hmmm, okay. Thanks. Umm, is this a Yarn thing? Because I also tried it > with Hadoop 2.0 MR1 (which I think should behave al

Re: JUnit test failing in HDFS when building Hadoop from source.

2013-02-19 Thread Hemanth Yamijala
Hi, In the past, some tests have been flaky. It would be good if you can search JIRA and see whether this is a known issue. Else, please file it, and if possible, provide a patch. :) Regarding whether this will be a reliable build, it depends a little bit on what you are going to use it for. For

OutOfMemoryError during reduce shuffle

2013-02-19 Thread Shivaram Lingamneni
I'm experiencing the following crash during reduce tasks: https://gist.github.com/slingamn/04ff3ff3412af23aa50d on Hadoop 1.0.3 (specifically I'm using Amazon's EMR, AMI version 2.2.1). The crash is triggered by especially unbalanced reducer inputs, i.e., when one reducer receives too many record