Re: more reduce tasks

2013-01-04 Thread Pavel Hančar
Hello, thank you for the answer. Exactly: I want the parallelism but a single final output. What do you mean by "another stage"? I thought I should set mapred.reduce.tasks large enough and Hadoop would run the reducers in as many rounds as is optimal. But that isn't the case. When I tried to r
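
One common answer to "another stage" is to keep many reducers for parallelism and then merge their part-* output files into one file as a separate step. For output on HDFS, hadoop fs -getmerge <src> <localdst> does this. The sketch below is a hypothetical illustration only: PartFileMerger and its methods are made-up names, and it is plain java.nio rather than Hadoop code. It concatenates the part files of an output directory that has already been copied to local disk, in part-number order.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Hadoop: concatenates the part-* files of a job
// output directory that has already been copied to the local filesystem.
class PartFileMerger {

    // Collects part-* files (skipping _SUCCESS, _logs, etc.) in part-number order.
    static List<Path> listPartFiles(Path outputDir) throws IOException {
        List<Path> parts = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(outputDir, "part-*")) {
            for (Path p : stream) {
                parts.add(p);
            }
        }
        // Names are zero-padded (part-r-00000, part-r-00001, ...), so lexical order is part order.
        Collections.sort(parts);
        return parts;
    }

    // Writes every part file, one after another, into a single target file.
    // The target should live outside outputDir so it is not picked up as a part.
    static void merge(Path outputDir, Path target) throws IOException {
        try (OutputStream out = Files.newOutputStream(target)) {
            for (Path part : listPartFiles(outputDir)) {
                Files.copy(part, out);
            }
        }
    }
}
```

With Hadoop's default hash partitioning each key is reduced by exactly one reducer, so the merged file typically holds one record per key, just not in one global sort order. If you need a single reduce over everything (for example, one grand total), a second job with one reducer is the other option.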

Re: Instructions on how to run Apache Hadoop 2.0.2-alpha?

2013-01-04 Thread Glen Mazza
Actually, those instructions are for Hadoop 0.24, not 2.0.2-alpha. Glen On 11/30/2012 03:40 PM, Cristian Cira wrote: Dear Glen, try http://blog.cloudera.com/blog/2011/11/building-and-deploying-mr2/ Cristian Cira Graduate Research Assistant Parallel Architecture and System Laboratory (PASL) She

Re: Instructions on how to run Apache Hadoop 2.0.2-alpha?

2013-01-04 Thread Chen He
0.24 is the development version and 2.0.2 is the release version; 0.23 and above are released as 2.0.x. On Fri, Jan 4, 2013 at 5:45 AM, Glen Mazza wrote: > Actually, those instructions are for Hadoop 0.24, not 2.0.2-alpha. > Glen > On 11/30/2012 03:40 PM, Cristian Cira wrote: >> Dear Glen

Re: Instructions on how to run Apache Hadoop 2.0.2-alpha?

2013-01-04 Thread Glen Mazza
Thanks for your response. That's pretty vital information; I'm not used to separate development/publication versions. Is the 0.23-to-2.0.2 renumbering stated anywhere on the Hadoop website or wiki? It's very confusing, as people otherwise think there are three separate branches: 0.2x.y, 1.x,

Possible to run an application jar as a hadoop daemon?

2013-01-04 Thread Krishna Rao
Hi all, I have a Java application jar that converts some files and writes directly into HDFS. If I want to run the jar, I need to run it using "hadoop jar <jar>" so that it can access HDFS (running "java -jar <jar>" results in an HDFS error). Is it possible to run a jar as a Hadoop daemon? Cheers,

Re: Writing a sequence file

2013-01-04 Thread bejoy . hadoop
Hi Peter, did you make sure you are using SequenceFileOutputFormat from the right package? Depending on which API you are using, mapred or mapreduce, you need to use the OutputFormat from the corresponding package. Regards Bejoy KS

Re: Writing a sequence file

2013-01-04 Thread Peter Cogan
Hi Bejoy, ah yes, that is exactly the mistake I was making. I had import org.apache.hadoop.mapred.SequenceFileOutputFormat; instead of import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat; On Fri, Jan 4, 2013 at 4:04 PM, wrote: > Hi Peter > Did you ensure that using

RE: Hadoop throughput question

2013-01-04 Thread Artem Ervits
John, below are the two programs: one is from Hadoop: The Definitive Guide, chapter 4, with slight modifications, and the other is in-house but similar to Hadoop in Action, chapter 3. package sequencefileprocessor; // cc SequenceFileReadDemo Reading a SequenceFile import java.io.IOException; import java.net.URI; import or

Re: Instructions on how to run Apache Hadoop 2.0.2-alpha?

2013-01-04 Thread Chen He
Hi Glen, I agree with you. There are many versions, and it is confusing. If you really want to know, you can check the development and release documents. The 0.23 documentation must have announced how many patches are included, as must the 2.0.x documentation. Then you will understand exactly which one is which.

Re: Instructions on how to run Apache Hadoop 2.0.2-alpha?

2013-01-04 Thread Glen Mazza
OK, looking at the Hadoop branches: http://svn.apache.org/viewvc/hadoop/common/branches/ and tags: http://svn.apache.org/viewvc/hadoop/common/tags/, it's thankfully not that bad. There is no 0.24 in Hadoop, and the Maven pom files for the 2.0.2-alpha branch indeed say 2.0.2-alpha. The Cloudera

Hello and request some advice.

2013-01-04 Thread Cristian Carranza
Hi all on this list! My name is Cristián Carranza; I am a statistician and quality consultant who, for the second time, intends to learn Hadoop and Big Data topics. I am requesting advice to plan my learning. I read the page “Products that include Apache Hadoop or derivative works a

Re: Hello and request some advice.

2013-01-04 Thread Nitin Pawar
- Is Ubuntu a good O.S. for running Hadoop? I’ve tried to learn in the past using Red Hat & Infosphere Biginsights, but I need a free O.S. If you want a free O.S., Ubuntu is good, but if you are familiar with Red Hat, you may want to have a look at Scientific Linux (it's free as well). - Is there a

Re: Hello and request some advice.

2013-01-04 Thread Jay Vyas
For the basics, all you need is a Java IDE. Hadoop Map/Reduce can run in local mode against the local filesystem, without any kind of HDFS backing. On Fri, Jan 4, 2013 at 12:45 PM, Nitin Pawar wrote: > - Is Ubuntu a good O.S. for running Hadoop? I’ve tried to learn in the past using Red Hat & Infosphere Bigin
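
To illustrate the point that the MapReduce model can be learned with nothing more than a Java IDE, here is a hypothetical warm-up in plain Java. It mimics the map, shuffle (group by key), and reduce steps of a word-count job without any Hadoop classes. LocalWordCount and its methods are made-up names; a real job would use Hadoop's Mapper and Reducer classes instead.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of the MapReduce data flow; NOT the Hadoop API.
class LocalWordCount {

    // Map phase: emit (word, 1) for every whitespace-separated token in a line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) { // leading whitespace yields an empty first token
                out.add(new AbstractMap.SimpleEntry<String, Integer>(token, 1));
            }
        }
        return out;
    }

    // Reduce phase: sum all the counts emitted for one key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    static Map<String, Integer> run(List<String> lines) {
        // Shuffle: group mapper output by key; TreeMap keeps keys sorted,
        // much as reducer input is sorted by key in Hadoop.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> kv : map(line)) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }
}
```

For example, run(Arrays.asList("the cat", "The dog the end")) returns {cat=1, dog=1, end=1, the=3}. The TreeMap stands in for Hadoop's sort of keys before they reach the reducer.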

Re: Hello and request some advice.

2013-01-04 Thread Gangadhar Ramini
Hi Nitin, I tried the latest stable Hadoop version on Windows with Cygwin, and I see the following error in the JobTracker logs. Do you have any advice? C:\cygwin\home\garamini\hadoop-1.0.4\logs\history to 0755 at org.apache.hadoop.fs.FileUtil.checkReturnValue(FileUtil.java:689) at org.ap

Re: Hello and request some advice.

2013-01-04 Thread Nitin Pawar
Does your user have read/write permissions on the DFS directories you created? Try changing the ownership of those directories to the user that is running Hadoop. On Fri, Jan 4, 2013 at 11:20 PM, Gangadhar Ramini wrote: > Hi Nitin, > I tried latest stable Hadoop version on windows with cygwin, I se

Re: Hello and request some advice.

2013-01-04 Thread Gangadhar Ramini
Yes, the user owns the directory and has the right permissions; I still don't understand what the issue could be. ls -ltr ~/hadoop-1.0.4/logs/history total 0 drwxr-xr-x+ 1 garamini mkgroup 0 Jan 2 22:15 Thanks -Gangadhar On Fri, Jan 4, 2013 at 9:55 AM, Nitin Pawar wrote: > Does your user have

RE: Hello and request some advice.

2013-01-04 Thread John Lilley
If you like Red Hat, also consider CentOS; it is a nearly complete clone of the RHEL distribution. John From: Nitin Pawar [mailto:nitinpawar...@gmail.com] Sent: Friday, January 04, 2013 10:46 AM To: user@hadoop.apache.org Subject: Re: Hello and request some advice. - Is Ubuntu a good O.S. for running

Re: Hello and request some advice.

2013-01-04 Thread Gangadhar Ramini
Following is the configuration I put in the config files: core-site.xml: hadoop.tmp.dir = /usr/local/hadoop/datastore/hadoop-${user.name}; hdfs-site.xml: dfs.name.dir = C:/cygwin/dfs/logs, dfs.data.dir = C:/cygwin/dfs/data. Thanks -Gangadhar On Fri, Jan 4, 2013 at 10:19 AM, N

RE: Hello and request some advice.

2013-01-04 Thread Rajeev Yadav
Hi John, which would be the better option for learning Hadoop, Linux or Windows? --- On Fri, 4/1/13, John Lilley wrote: From: John Lilley Subject: RE: Hello and request some advice. To: "user@hadoop.apache.org" Date: Friday, 4 January, 2013, 6:12 PM If you like

RE: Hello and request some advice.

2013-01-04 Thread John Lilley
I personally find Windows easier to use; however, it is not a supported Hadoop production environment, and I *think* you have to use Cygwin under Windows even for development. Given that, if you want to use a Windows machine and performance is not a consideration, you could spin up a VirtualBox V

Re: Hello and request some advice.

2013-01-04 Thread Michael Segel
Uhm... Well, you can talk to Microsoft and Hortonworks about Microsoft as a platform. Depending on the power of your laptop, you could create a VM and run Hadoop in pseudo-distributed mode there. You could also get an Amazon Web Services account and build a small cluster via EMR... In ter

sporadic failure

2013-01-04 Thread Stan Rosenberg
Hi, Any ideas why a staging directory would suddenly become unavailable after the completion of the map phase but before the start of the reduce phase? We noticed a sporadic failure yesterday wherein all the map tasks completed successfully and all the reduce tasks failed. Upon examining task tr

Re: Hello and request some advice.

2013-01-04 Thread Glen Mazza
I would say Linux, because in your job you're most likely going to use a *nix-type system instead of Windows for hosting Hadoop, so it's good to gain experience with whatever headaches come along. Further, you're also learning Linux simultaneously, killing two birds with one stone. Glen On 0

Gridmix version 1.0.4 Error

2013-01-04 Thread Sean Barry
Hi, I am trying to use GridMix, but I keep getting the error shown below. Does anyone have any suggestions? Thanks in advance. Sean Barry hostname:gridmix seanbarry$ pwd /usr/local/hadoop-1.0.4/contrib/gridmix hostname:gridmix seanbarry$ java -cp /usr/local/hadoop-1.0.4/contrib/grid

Re: Possible to run an application jar as a hadoop daemon?

2013-01-04 Thread Robert Molina
Hi Krishna, do you simply want to schedule the job to run at specific times? If so, I believe Oozie may be what you are looking for. Regards, Robert On Fri, Jan 4, 2013 at 6:40 AM, Krishna Rao wrote: > Hi al, > I have a java application jar that converts some files and writes directly into

Re: Instructions on how to run Apache Hadoop 2.0.2-alpha?

2013-01-04 Thread Robert Evans
It is very long and confusing. Here is my understanding of what happened, even though I was not around for all of it. 0.1 through 0.20 was mostly mainline development. At that point there was a split: 0.20 was forked to add security (0.20-security) and also to add append support for HBase 0.2

Re: Instructions on how to run Apache Hadoop 2.0.2-alpha?

2013-01-04 Thread Glen Mazza
Wow. Thanks for the explanation, very helpful. Glen On 01/04/2013 06:28 PM, Robert Evans wrote: It is very long and confusing. Here is my understanding of what happened even though I was not around for all of it. 0.1 - 0.20 was mostly main line development. At that point there was a split

Re: sporadic failure

2013-01-04 Thread Harsh J
Hi Stan, I'd check the NN audit logs for the file /user/apache/.staging/job_201211150255_237458/job.xml to see when, and by whom, it was deleted; perhaps that would give more insight. On Sat, Jan 5, 2013 at 2:32 AM, Stan Rosenberg wrote: > Hi, > Any ideas why a staging directory would suddenly become
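
The audit-log check Harsh suggests is essentially a search for the delete that removed the file. A hypothetical plain-Java filter for that is sketched below. It assumes the audit lines carry key=value fields such as cmd=delete and src=/path, separated by whitespace, which is the usual NameNode audit format, but verify that against your own logs. It also reports deletes of any parent directory, since removing the whole job or .staging directory would take job.xml with it. AuditLogDeletes and its methods are made-up names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: finds audit-log lines whose delete would have removed a given path.
// Assumes whitespace-separated key=value fields like "cmd=delete" and "src=/some/path".
class AuditLogDeletes {
    private static final Pattern CMD = Pattern.compile("\\bcmd=(\\S+)");
    private static final Pattern SRC = Pattern.compile("\\bsrc=(\\S+)");

    // True if deleting src removes target: either the same path or an ancestor directory.
    static boolean covers(String src, String target) {
        String prefix = src.endsWith("/") ? src : src + "/";
        return target.equals(src) || target.startsWith(prefix);
    }

    static List<String> findDeletes(List<String> auditLines, String target) {
        List<String> hits = new ArrayList<>();
        for (String line : auditLines) {
            Matcher cmd = CMD.matcher(line);
            Matcher src = SRC.matcher(line);
            if (cmd.find() && src.find()
                    && cmd.group(1).equals("delete")
                    && covers(src.group(1), target)) {
                hits.add(line);
            }
        }
        return hits;
    }
}
```

Called with the NameNode audit log lines and /user/apache/.staging/job_201211150255_237458/job.xml as the target, it would return only the matching delete entries, whose remaining fields (under the same format assumption, ugi= and ip=) indicate who issued them and from where.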

Re: Gridmix version 1.0.4 Error

2013-01-04 Thread Harsh J
Hi Sean, two questions: Why are you running this in local mode? Placing a cluster's config directory on your java -cp classpath will make it run distributed. And does the reported output directory really exist? If so, you may want to delete it before you run GridMix. On Sat, Jan 5, 2013 at 3:55 AM, Sean

Re: Possible to run an application jar as a hadoop daemon?

2013-01-04 Thread Harsh J
Hi, On Fri, Jan 4, 2013 at 8:10 PM, Krishna Rao wrote: > If I want to run the jar I need to run it using "hadoop jar <jar>", so that it can access HDFS (that is running "java -jar <jar> results in a HDFS error"). The latter is because running a Hadoop program requires Hadoop dependencies and con
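
Harsh's explanation (cut off above) is that a Hadoop program needs the Hadoop dependencies on its classpath. hadoop jar sets that up by adding Hadoop's jars and the configuration directory, whereas java -jar uses only the application jar and its manifest Class-Path and ignores -cp and CLASSPATH. A hypothetical check like the one below can show which of the two is missing when the jar is started with plain java. HadoopClasspathCheck is a made-up name, and it uses only the JDK.

```java
// Hypothetical diagnostic using only the JDK: reports whether Hadoop classes and the
// cluster's core-site.xml are visible to the class loader that loaded this class.
class HadoopClasspathCheck {

    // Looks the class up without initializing it; false if it is not on the classpath.
    static boolean classPresent(String className) {
        try {
            Class.forName(className, false, HadoopClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    // Hadoop's Configuration loads core-site.xml as a classpath resource,
    // so its absence here means the config directory is not on the classpath.
    static boolean resourcePresent(String resource) {
        return HadoopClasspathCheck.class.getClassLoader().getResource(resource) != null;
    }

    public static void main(String[] args) {
        System.out.println("Hadoop Configuration class: "
                + classPresent("org.apache.hadoop.conf.Configuration"));
        System.out.println("core-site.xml on classpath: " + resourcePresent("core-site.xml"));
    }
}
```

Run under hadoop jar, both checks should normally print true; under plain java -jar, without the Hadoop jars listed in the manifest, the first typically prints false.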

Re: more reduce tasks

2013-01-04 Thread Harsh J
What do you mean by a "final reduce"? Not all jobs require the final output to be a single result, since the reduce phase is designed to work on a per-partition basis (which is also why the output files are named part-*). One job consists of only one reduce phase, wherein the reducers all work independently and
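
To see why a single job yields several part-* files, the sketch below re-creates in plain Java the arithmetic of Hadoop's default partitioner (HashPartitioner), which sends a key to partition (hashCode & Integer.MAX_VALUE) % numReduceTasks. PartitionDemo and assign are hypothetical names, and no Hadoop classes are used. The part-r-NNNNN naming follows the new (mapreduce) API's output files.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java imitation of how the default HashPartitioner spreads keys over reducers.
class PartitionDemo {

    // Same formula as HashPartitioner.getPartition; masking the sign bit keeps
    // the result non-negative even when hashCode() is negative.
    static int getPartition(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Groups keys by the output file their reducer would write.
    static Map<String, List<String>> assign(List<String> keys, int numReduceTasks) {
        Map<String, List<String>> files = new TreeMap<>();
        for (String key : keys) {
            String file = String.format("part-r-%05d", getPartition(key, numReduceTasks));
            files.computeIfAbsent(file, f -> new ArrayList<>()).add(key);
        }
        return files;
    }
}
```

For instance, assign(Arrays.asList("a", "b", "c"), 3) gives {part-r-00000=[c], part-r-00001=[a], part-r-00002=[b]}. Each key lands in exactly one partition, so each reducer's file holds complete results for its own keys. With job.setNumReduceTasks(1) (or mapred.reduce.tasks=1) everything goes to part-r-00000, at the cost of reduce-side parallelism.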