MapReduce: How to output multiple Avro files?

2014-03-06 Thread Fengyun RAO
Our input is a line of text that may be parsed into, e.g., an A or a B object. We want all A objects written to "A.avro" files and all B objects written to "B.avro" files. I looked into the AvroMultipleOutputs class: http://avro.apache.org/docs/1.7.4/api/java/org/apache/avro/mapreduce/AvroMultipleOutputs.html Th

Re: Impact of Tez/Spark on MapReduce

2014-03-06 Thread Emil A. Siemes
I think it is necessary to look at the question from multiple angles: first, there is MapReduce as a computing paradigm; second, there is the MapReduce API; and third, there is an implementation. My belief is that the computing paradigm is not going away anytime soon. It's a fundamental approach for

Re: MapReduce: How to output multiple Avro files?

2014-03-06 Thread Fengyun RAO
Adding the Avro user mailing list.

Re:

2014-03-06 Thread Stanley Shi
Maybe your console and browser are using different settings. Could you please try "wget http://repo.maven.apache.org/maven2/org/apache/felix/maven-bundle-plugin/2.4.0/maven-bundle-plugin-2.4.0.pom"? Regards, Stanley Shi On Wed, Mar 5, 2014 at 6:59 PM, Avinash Kujur wrote: > yes ming.

[no subject]

2014-03-06 Thread Avinash Kujur
While importing jar files using mvn clean install -DskipTests -Pdist, I am getting this error: [ERROR] The goal you specified requires a project to execute but there is no POM in this directory (/home/cloudera). Please verify you invoked Maven from the correct directory. -> [Help 1] help me ou

Re:

2014-03-06 Thread Nitin Pawar
Please start writing subject lines for your emails. Also, look at the error message: [ERROR] The goal you specified requires a project to execute but there is no POM in this directory (/home/cloudera). Run ls -l pom.xml inside the /home/cloudera directory; change directory to where your codebase is and t

MR2 Job over LZO data

2014-03-06 Thread KingDavies
Running on Hadoop 2.2.0. The Java MR2 job works as expected on an uncompressed data source using TextInputFormat.class, but when using the LZO format the job fails: import com.hadoop.mapreduce.LzoTextInputFormat; job.setInputFormatClass(LzoTextInputFormat.class); Dependencies from the maven re

Warning in secondary namenode log

2014-03-06 Thread Vimal Jain
Hi, I am setting up a 2-node Hadoop cluster (1.2.1). After formatting the FS and starting the namenode, datanode and secondarynamenode, I am getting the warning below in the SecondaryNameNode logs. WARN org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Checkpoint Period :3600 secs (60 min) Please h

Re: Warning in secondary namenode log

2014-03-06 Thread Nitin Pawar
You can ignore this on a 2-node cluster. This value is the time the secondary namenode waits between two periodic checkpoints.

Assertion error while building hadoop 2.3.0

2014-03-06 Thread Mahmood Naderan
Hi, I have downloaded hadoop-2.3.0-src and followed the guide from http://hadoop.apache.org/docs/r2.3.0/hadoop-project-dist/hadoop-common/SingleCluster.html The first command, "mvn clean install -DskipTests", was successful. However, when I run: cd hadoop-mapreduce-project; mvn clean inst

Fetching configuration values from cluster

2014-03-06 Thread John Lilley
How would I go about fetching configuration values (e.g. yarn-site.xml) from the cluster via the API from an application not running on a cluster node? Thanks John

HDFS java client vs the Command Line

2014-03-06 Thread Geoffry Roberts
All, I'm running the 2.3.0 distribution as a single node on OSX 10.7. I want to create a directory. From the command line it works; from Java it doesn't. I have Googled and read bits and pieces suggesting that this is an issue with the OSX "feature" of case insensitivity in its file system. Can anyone

Re: Fw: Hadoop at ApacheCon Denver

2014-03-06 Thread Oleg Zhurakousky
Wow. . . blast from the past ;)!! How the hell are you? Cheers Oleg On Wed, Mar 5, 2014 at 10:18 AM, Melissa Warnkin wrote: > Hello Hadoop enthusiasts, > > As you are no doubt aware, ApacheCon North America will be held in > Denver, Colorado starting on April 7th. Hadoop has 25 talks and

Partitions in Hive

2014-03-06 Thread nagarjuna kanamarlapudi
Hi, I have a table with 3 columns in Hive. I want that table to be partitioned based on the first letter of column 1. How do we define such a partition condition in Hive? Regards, Nagarjuna K

Re: HDFS java client vs the Command Line

2014-03-06 Thread Harsh J
I've never faced an issue running Hadoop and related programs on OSX. What exactly is your error? If you are running the program via "java -cp ..." rather than "hadoop jar ...", have you ensured that your Java classpath also includes the configuration directory?

Re: Partitions in Hive

2014-03-06 Thread Nitin Pawar
Partitioning in Hive is done on the column value, not on a sub-portion of the column value. If you want to separate data based on the first character, create another column to store that value.
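Nitin's suggestion amounts to deriving the first character up front and storing it in its own column, which Hive can then partition on; Hive lays each partition out as a subdirectory named column=value. The plain-Java sketch below shows only that derivation and grouping, not any Hive API. The column name first_letter and the class FirstLetterPartitioner are hypothetical; in practice the value could be filled at load time (for example with Hive's substr function) or by the ETL code before loading.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class FirstLetterPartitioner {
    // Hypothetical name for the extra column that stores the derived value.
    static final String PARTITION_COLUMN = "first_letter";

    // Derives the partition value from column 1: its first character.
    static String partitionValue(String column1) {
        if (column1 == null || column1.isEmpty()) {
            throw new IllegalArgumentException("column 1 must be non-empty to derive a partition value");
        }
        return column1.substring(0, 1);
    }

    // Groups 3-column rows by the partition subdirectory they would land in,
    // following Hive's column=value directory naming.
    static Map<String, List<String[]>> groupByPartitionDir(List<String[]> rows) {
        Map<String, List<String[]>> dirs = new TreeMap<>();
        for (String[] row : rows) {
            String dir = PARTITION_COLUMN + "=" + partitionValue(row[0]);
            dirs.computeIfAbsent(dir, k -> new ArrayList<>()).add(row);
        }
        return dirs;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"apple", "1", "x"});
        rows.add(new String[] {"banana", "2", "y"});
        rows.add(new String[] {"avocado", "3", "z"});
        for (Map.Entry<String, List<String[]>> e : groupByPartitionDir(rows).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue().size() + " row(s)");
        }
        // prints:
        // first_letter=a -> 2 row(s)
        // first_letter=b -> 1 row(s)
    }
}
```

Storing the derived value as a real column keeps the partition key a plain column value, which is what Hive partitioning works on.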

Re: Assertion error while building hadoop 2.3.0

2014-03-06 Thread Mahmood Naderan
I'm stuck at this step. Any ideas would be appreciated. Regards, Mahmood

Running a Job in a Local Job Runner:Windows 7 64-bit

2014-03-06 Thread Radhe Radhe
Hi All, I'm trying to get some hands-on experience with MapReduce programming. I downloaded the code examples from Hadoop: The Definitive Guide, 3rd edition, and built them using Maven (mvn package -DskipTests -Dhadoop.distro=apache-2). Next, I imported the Maven projects into Eclipse. Using Eclipse, I can now

Re: HDFS java client vs the Command Line

2014-03-06 Thread Geoffry Roberts
Thanks for the response. I figured out what was wrong. I was doing this: Configuration conf = new Configuration(); conf.addResource(new Path(F.CFG_PATH + "/core-site.xml")); conf.addResource(new Path(F.CFG_PATH + "/hdfs-site.xml")); conf.addResource(new Path(F.CFG_PATH + "/mapred-site.xml"));

Re: HDFS java client vs the Command Line

2014-03-06 Thread Harsh J
You could avoid all that code by simply placing the configuration directory on the classpath; it will auto-load the necessary properties.

Re: MapReduce: How to output multiple Avro files?

2014-03-06 Thread Harsh J
If you have a reducer involved, you'll likely need a common map output data type that both A and B can fit into.

Re: MapReduce: How to output multiple Avro files?

2014-03-06 Thread Fengyun RAO
Thanks, Harsh. Any idea how to build a common map output data type? The only way I can think of is "toString()", which would be very inefficient, since A and B are big objects and may change over time; that is also the reason we want to use Avro serialization.

Re: Assertion error while building hadoop 2.3.0

2014-03-06 Thread Akira AJISAKA
Hi Mahmood, > I have downloaded hadoop-2.3.0-src and followed the guide from http://hadoop.apache.org/docs/r2.3.0/hadoop-project-dist/hadoop-common/SingleCluster.html The documentation is out of date, and you don't need to compile the source code to build a cluster. I built the latest documen

Re: MR2 Job over LZO data

2014-03-06 Thread Stanley Shi
Maybe you can try downloading the LZO class and rebuilding it against Hadoop 2.2.0. If the build succeeds, you should be good to go; if it fails, you may need to wait for the LZO guys to update their code. Regards, Stanley Shi

Re: Fetching configuration values from cluster

2014-03-06 Thread Stanley Shi
You can read from http://resource-manager.host.ip:8088/conf, which returns an XML file you can use directly. Regards, Stanley Shi
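Stanley's suggestion can be sketched with the JDK alone. The code below assumes the /conf page returns Hadoop's usual configuration XML layout (a configuration element holding property elements, each with name and value children); the class ClusterConfReader is hypothetical, and resource-manager.host.ip:8088 is the placeholder address from the reply, not a real host.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class ClusterConfReader {
    // Parses configuration XML of the assumed form
    // <configuration><property><name>k</name><value>v</value>...</property></configuration>.
    // Extra child elements (e.g. a source element) are ignored.
    static Map<String, String> parse(InputStream in) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
        NodeList props = doc.getElementsByTagName("property");
        Map<String, String> conf = new LinkedHashMap<>();
        for (int i = 0; i < props.getLength(); i++) {
            Element p = (Element) props.item(i);
            NodeList names = p.getElementsByTagName("name");
            if (names.getLength() == 0) {
                continue; // skip entries without a name
            }
            NodeList values = p.getElementsByTagName("value");
            String value = values.getLength() > 0 ? values.item(0).getTextContent() : "";
            conf.put(names.item(0).getTextContent().trim(), value.trim());
        }
        return conf;
    }

    // Fetches and parses http://<hostPort>/conf; hostPort is supplied by the caller.
    static Map<String, String> fetch(String hostPort) throws Exception {
        try (InputStream in = URI.create("http://" + hostPort + "/conf").toURL().openStream()) {
            return parse(in);
        }
    }

    public static void main(String[] args) throws Exception {
        // Offline demo on a made-up sample; no network access needed.
        String sample = "<configuration>"
                + "<property><name>yarn.resourcemanager.address</name><value>rm:8032</value></property>"
                + "</configuration>";
        Map<String, String> conf =
                parse(new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8)));
        System.out.println(conf.get("yarn.resourcemanager.address")); // prints rm:8032
    }
}
```

Against a live cluster the call would be something like ClusterConfReader.fetch("resource-manager.host.ip:8088"), with the placeholder replaced by the real ResourceManager web address.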

Re: Running a Job in a Local Job Runner:Windows 7 64-bit

2014-03-06 Thread Rakesh Davanum
Hi RR, You don't need the actual Hadoop daemons running on the Windows machine. Just install Cygwin and ensure that all the required Hadoop jars are on your program's classpath. You can test/debug directly from the IDE just by choosing "Run As" -> "Java Application" on the dri

RE: MapReduce: How to output multiple Avro files?

2014-03-06 Thread Alan Paulsen
Hi Fengyun, Here's what I've done in the past when facing a similar issue: 1) Set the map output schema to a UNION of both of your target schemas, A and B. 2) Serialize the data in the mappers, using the avro datum as the value. 3) Figure out what the avro schema is for eac
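Alan's approach makes the map output value a union of schemas A and B and then decides, per datum, which output it belongs to. The plain-Java sketch below illustrates only that routing idea and does not use the Avro API: ParsedRecord, A, B and UnionRouter are hypothetical stand-ins. In a real job the union would be an Avro schema and the writes would go through an Avro output mechanism such as AvroMultipleOutputs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical common type playing the role of the UNION of schemas A and B.
interface ParsedRecord { }

// Hypothetical stand-ins for the two record types parsed from a text line.
class A implements ParsedRecord {
    final String payload;
    A(String payload) { this.payload = payload; }
}

class B implements ParsedRecord {
    final String payload;
    B(String payload) { this.payload = payload; }
}

class UnionRouter {
    // Picks the output name for one union value, analogous to checking
    // which branch of the union a datum's schema matches.
    static String outputNameFor(ParsedRecord r) {
        if (r instanceof A) return "A";
        if (r instanceof B) return "B";
        throw new IllegalArgumentException("unknown record type: " + r.getClass());
    }

    // Groups a mixed stream by output name; a stand-in for writing A.avro and B.avro.
    static Map<String, List<ParsedRecord>> route(List<ParsedRecord> mixed) {
        Map<String, List<ParsedRecord>> outputs = new LinkedHashMap<>();
        for (ParsedRecord r : mixed) {
            outputs.computeIfAbsent(outputNameFor(r), k -> new ArrayList<>()).add(r);
        }
        return outputs;
    }

    public static void main(String[] args) {
        List<ParsedRecord> mixed = new ArrayList<>();
        mixed.add(new A("a1"));
        mixed.add(new B("b1"));
        mixed.add(new A("a2"));
        for (Map.Entry<String, List<ParsedRecord>> e : route(mixed).entrySet()) {
            System.out.println(e.getKey() + ".avro <- " + e.getValue().size() + " record(s)");
        }
        // prints:
        // A.avro <- 2 record(s)
        // B.avro <- 1 record(s)
    }
}
```

The point of the union is that a single map output value type can carry either record without falling back to toString(), while the downstream code still knows which output each value belongs to.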

Re: Running a Job in a Local Job Runner:Windows 7 64-bit

2014-03-06 Thread Harsh J
Running your Driver class from Eclipse should automatically run it in the local runner mode (as that's the default mode). You shouldn't need a local Hadoop install for this.

Re: Assertion error while building hadoop 2.3.0

2014-03-06 Thread Mahmood Naderan
Thanks for the update. Let me ask a question before continuing the installation. The documentation states: >> To get a Hadoop distribution, download a recent stable release from one of the Apache Download Mirrors. Do you mean the source package or the other one? hadoop-2.3.0-src.tar.gz (14MB) hadoop-2.

how to import the hadoop code into eclipse.

2014-03-06 Thread Avinash Kujur
Hi, I have downloaded the Hadoop code and executed the Maven command successfully. How do I import the Hadoop source code cleanly? It shows a red exclamation mark on some of the modules when I import it. Please help me out. Thanks in advance.

Re: how to import the hadoop code into eclipse.

2014-03-06 Thread Zhijie Shen
Run mvn eclipse:eclipse, and then import the existing projects into Eclipse. - Zhijie

Re: how to import the hadoop code into eclipse.

2014-03-06 Thread Avinash Kujur
I did that, but I have some doubts about importing the code, because it shows some warnings and errors on the imported modules. I was wondering if you could give me a link to a proper procedure.

why can FSDataInputStream.read() only read 2^17 bytes in hadoop2.0?

2014-03-06 Thread hequn cheng
Hi~ First, I use FileSystem to open a file in HDFS: FSDataInputStream m_dis = fs.open(...); Second, I read the data in m_dis into a byte array: byte[] inputdata = new byte[m_dis.available()]; /* m_dis.available() = 47185920 */ m_dis.read(inputdata, 0, 20 * 1024 * 768 * 3); the

Re: Assertion error while building hadoop 2.3.0

2014-03-06 Thread Mingjiang Shi
If you just want to install a cluster to play with, download hadoop-2.3.0.tar.gz (127MB). -- Cheers -MJ

Re: why can FSDataInputStream.read() only read 2^17 bytes in hadoop2.0?

2014-03-06 Thread Binglin Chang
The semantics of read() do not guarantee that it reads as much as possible. You need to call read() repeatedly or use readFully().
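Binglin's point can be checked with the JDK alone, without Hadoop. The sketch below wraps a ByteArrayInputStream in a hypothetical ChunkedInputStream that returns at most 2^17 bytes per read() call, mimicking the behaviour reported in this thread (it is a simulation, not HDFS code). FSDataInputStream extends java.io.DataInputStream, so the same readFully() method is available on it.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical test stream: returns at most CHUNK bytes per read() call.
class ChunkedInputStream extends FilterInputStream {
    static final int CHUNK = 1 << 17; // 131072 bytes, i.e. 2^17

    ChunkedInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        return super.read(b, off, Math.min(len, CHUNK));
    }
}

class ReadFullyDemo {
    // Manual alternative to readFully(): call read() until the buffer is full or EOF.
    static int readLoop(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int n = in.read(buf, total, buf.length - total);
            if (n < 0) {
                break; // EOF reached before the buffer was filled
            }
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[1 << 20]; // 1 MiB of zeros
        byte[] buf = new byte[data.length];

        // A single read() may legally return fewer bytes than requested.
        InputStream once = new ChunkedInputStream(new ByteArrayInputStream(data));
        System.out.println(once.read(buf, 0, buf.length)); // prints 131072

        // The manual loop keeps reading until the buffer is full.
        InputStream looped = new ChunkedInputStream(new ByteArrayInputStream(data));
        System.out.println(readLoop(looped, buf)); // prints 1048576

        // readFully() loops internally and throws EOFException if the stream ends early.
        DataInputStream dis = new DataInputStream(new ChunkedInputStream(new ByteArrayInputStream(data)));
        dis.readFully(buf);
        System.out.println("readFully filled " + buf.length + " bytes");
    }
}
```

The manual loop returns a short count at EOF, whereas readFully() treats a short stream as an error, which is usually what you want when the buffer was sized from the expected data length.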

Re: MR2 Job over LZO data

2014-03-06 Thread Gordon Wang
You can get the source code from https://github.com/twitter/hadoop-lzo and compile it against Hadoop 2.2.0. As far as I remember, as long as you rebuild it, LZO should work with Hadoop 2.2.0.

Re: why can FSDataInputStream.read() only read 2^17 bytes in hadoop2.0?

2014-03-06 Thread hequn cheng
Yep, that did the job :) I used readFully instead and it works well~~ thank you~

Re: how to import the hadoop code into eclipse.

2014-03-06 Thread Zhijie Shen
Ah, yes, I experienced some errors on the imported modules too, but I fixed them manually. I'm not sure whether other people have encountered the same problem. Here's a link: http://wiki.apache.org/hadoop/EclipseEnvironment

Re: App Master issue.

2014-03-06 Thread Sai Prasanna
Hi MJ, Extremely sorry for the late response... I had some infrastructure issues here. I am using Hadoop 2.3.0. While trying to solve this AppMaster issue, I made a strange observation: a "STICKY SLOT" of the app-master to only the data node on the Master node if I set the following p

GC overhead limit exceeded

2014-03-06 Thread haihong lu
Hi, I have a problem when running HiBench with hadoop-2.2.0; the error messages are listed below: 14/03/07 13:54:53 INFO mapreduce.Job: map 19% reduce 0% 14/03/07 13:54:54 INFO mapreduce.Job: map 21% reduce 0% 14/03/07 14:00:26 INFO mapreduce.Job: Task Id : attempt_1394160253524_0010_m_20_0, St

Re: MapReduce: How to output multiple Avro files?

2014-03-06 Thread Fengyun RAO
Thanks, Alan, it works!