Re: What else can be built on top of YARN.

2013-05-29 Thread Krishna Kishore Bonagiri
Hi Rahul, The reasons Vinod listed are at least part of why porting my application to YARN is easier than making it work within the MapReduce framework. My main purpose in using YARN is to exploit its resource management capabilities. Thanks, Kishore

Re: What else can be built on top of YARN.

2013-05-29 Thread Vinod Kumar Vavilapalli
Historically, many applications/frameworks wanted to take advantage of just the resource management capabilities and failure handling of Hadoop (via JobTracker/TaskTracker), but were forced to use MapReduce even though they didn't have to. Obvious examples are graph processing (Giraph), BSP(H

Re: Reading json format input

2013-05-29 Thread Rahul Bhattacharjee
Whatever you have mentioned, Jamal, should work. You can debug this. Thanks, Rahul On Thu, May 30, 2013 at 5:14 AM, jamal sasha wrote: > Hi, > For some reason, this has to be in Java :( > I am trying to use the org.json library, something like (in mapper) > JSONObject jsn = new JSONObject(value.to

Re: Reading json format input

2013-05-29 Thread Michael Segel
You have the entire string. If you tokenize on commas ... Starting with : >> {"author":"foo", "text": "hello"} >> {"author":"foo123", "text": "hello world"} >> {"author":"foo234", "text": "hello this world"} You end up with {"author":"foo",and "text":"hello"} So you can ignore the first t

Re: issue launching mapreduce job with kerberos secured hadoop

2013-05-29 Thread Robert Molina
Hi Neeraj, This error doesn't initially look to be Kerberos related. Can you verify whether 192.168.49.51 has the tasktracker process running? Regards, Robert On Tue, May 28, 2013 at 7:58 PM, Rahul Bhattacharjee <rahul.rec@gmail.com> wrote: > The error looks a little low level, network level.

Re: Reading json format input

2013-05-29 Thread Rishi Yadav
For that, you only have to write intermediate data if word = "text": String[] words = line.split("\\W+"); for (String word : words) { if (word.equals("text")) context.write(new Text(word), new IntWritable(1)); } I am assuming you have a huge volume of data for it, otherwise Map

Re: Reading json format input

2013-05-29 Thread jamal sasha
Hi Rishi, But I don't want the wordcount of all the words. In the JSON, there is a field "text", and those are the words I wish to count. On Wed, May 29, 2013 at 4:43 PM, Rishi Yadav wrote: > Hi Jamal, > > I took your input and put it in sample wordcount program and it's working > just fine and

Re: Reading json format input

2013-05-29 Thread jamal sasha
Hi, For some reason, this has to be in Java :( I am trying to use the org.json library, something like (in mapper) JSONObject jsn = new JSONObject(value.toString()); String text = (String) jsn.get("text"); StringTokenizer itr = new StringTokenizer(text); But it's not working :( It would be better t
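A minimal sketch of the step described above (pull out only the "text" field, then tokenize it), assuming each record is a single-line JSON object and the "text" value contains no escaped quotes. To stay JDK-only, a regular expression stands in here for org.json; a real mapper would more likely keep the JSON parser. The class and method names (TextFieldWordCount, extractText, countWords) are hypothetical and not part of any Hadoop or org.json API.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TextFieldWordCount {
    // Captures the value of "text": "<value>". Assumption: no escaped quotes inside the value.
    private static final Pattern TEXT_FIELD =
            Pattern.compile("\"text\"\\s*:\\s*\"([^\"]*)\"");

    // Returns the "text" value of one JSON line, or an empty string if the field is absent.
    static String extractText(String jsonLine) {
        Matcher m = TEXT_FIELD.matcher(jsonLine);
        return m.find() ? m.group(1) : "";
    }

    // Counts whitespace-separated words in the "text" field only; "author" is ignored.
    static Map<String, Integer> countWords(String[] jsonLines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : jsonLines) {
            for (String word : extractText(line).trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] lines = {
            "{\"author\":\"foo\", \"text\": \"hello\"}",
            "{\"author\":\"foo123\", \"text\": \"hello world\"}",
            "{\"author\":\"foo234\", \"text\": \"hello this world\"}"
        };
        System.out.println(countWords(lines)); // prints {hello=3, this=1, world=2}
    }
}
```

In a mapper, the body of extractText would be applied to value.toString() and each word emitted with a count of 1; the in-memory map here only replaces the reduce step for illustration.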

Re: Reading json format input

2013-05-29 Thread Rishi Yadav
Hi Jamal, I took your input and put it in sample wordcount program and it's working just fine and giving this output. author 3 foo234 1 text 3 foo 1 foo123 1 hello 3 this 1 world 2 When we split using String[] words = input.split("\\W+"); it takes care of all non-alphanumeric characters. Tha
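To make the effect of split("\\W+") on one of the sample lines concrete, here is a small JDK-only sketch (the class name SplitDemo and the helper tokens are hypothetical). It shows that the split keeps field names and author values alongside the words of "text", and that a line starting with a non-word character yields an empty first token.

```java
import java.util.Arrays;

class SplitDemo {
    // Splits on runs of non-word characters, as in the snippet above.
    static String[] tokens(String line) {
        return line.split("\\W+");
    }

    public static void main(String[] args) {
        String line = "{\"author\":\"foo123\", \"text\": \"hello world\"}";
        // The leading {" is a delimiter match at position 0, so the first token is empty.
        System.out.println(Arrays.toString(tokens(line)));
        // prints [, author, foo123, text, hello, world]
    }
}
```

Because "author", "text" and the author values all come through as ordinary tokens, counting every token gives the word counts for the whole line rather than for the "text" field alone.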

Re: Reading json format input

2013-05-29 Thread Michael Segel
Yeah, I have to agree with Russell. Pig is definitely the way to go on this. If you want to do it as a Java program you will have to do some work on the input string, but it too should be trivial. How formal do you want to go? Do you want to strip it down or just find the quote after the text par

Re: Reading json format input

2013-05-29 Thread Russell Jurney
Seriously consider Pig (free answer, 4 LOC): my_data = LOAD 'my_data.json' USING com.twitter.elephantbird.pig.load.JsonLoader() AS json:map[]; words = FOREACH my_data GENERATE $0#'author' as author, FLATTEN(TOKENIZE($0#'text')) as word; word_counts = FOREACH (GROUP words BY word) GENERATE group AS

Reading json format input

2013-05-29 Thread jamal sasha
Hi, I am stuck again. :( My input data is in HDFS. I am again trying to do wordcount, but there is a slight difference. The data is in JSON format. So each line of data is: {"author":"foo", "text": "hello"} {"author":"foo123", "text": "hello world"} {"author":"foo234", "text": "hello this world"}

Re: Writing data in db instead of hdfs

2013-05-29 Thread Mohammad Tariq
Hello Jamal, Yes, it is possible. You could use TableReducer to do that. Use it instead of the normal reducer in your wordcount example. Alternatively you could use HFileOutputFormat to write directly to HFiles. Warm Regards, Tariq cloudfront.blogspot.com On Thu, May 30, 2013 at 2:08 AM, ja

Writing data in db instead of hdfs

2013-05-29 Thread jamal sasha
Hi, Is it possible to save data in a database (HBase, Cassandra?) directly from Hadoop, so that there is no output in HDFS and the job writes data directly into the db? If I want to modify the wordcount example to achieve this, what/where should I make these modifications? Any help/suggestions? Tha

Re: What else can be built on top of YARN.

2013-05-29 Thread Viral Bajaria
There is a project at Yahoo which makes it possible to run Storm on Yarn. I think the team behind it is going to give a talk at Hadoop Summit and plan to open source it after that. -Viral On Wed, May 29, 2013 at 11:04 AM, John Conwell wrote: > Storm, a distributed realtime computation framework

Re: Help: error in hadoop build

2013-05-29 Thread Ted Yu
What's the output of: protoc --version You should be using 2.4.1 Cheers On Wed, May 29, 2013 at 11:33 AM, John Lilley wrote: > Sorry if this is a dumb question, but I’m not sure where to start. I am > following BUILDING.txt instructions for source checked out today using git: > > > git

Help: error in hadoop build

2013-05-29 Thread John Lilley
Sorry if this is a dumb question, but I'm not sure where to start. I am following BUILDING.txt instructions for source checked out today using git: git clone git://git.apache.org/hadoop-common.git Hadoop Following build steps and adding -X for more logging: mvn compile -X But I get this error i

Re: What else can be built on top of YARN.

2013-05-29 Thread John Conwell
Two scenarios I can think of are re-implementations of Twitter's Storm (http://storm-project.net/) and DryadLinq (http://research.microsoft.com/en-us/projects/dryadlinq/). Storm, a distributed realtime computation framework used for analyzing realtime streams of data, doesn't really need to be po

Re: What else can be built on top of YARN.

2013-05-29 Thread Rahul Bhattacharjee
Thanks for the response, Krishna. I was wondering whether it would be possible to use MR to solve your problem instead of building the whole stack on top of YARN. Most likely it's not possible, and that's why you are building it. I wanted to know why that is. I am just trying to find out the need or why

Re: OpenJDK?

2013-05-29 Thread Lenin Raj
Yup. That's right. Thanks, Lenin On Wed, May 29, 2013 at 10:23 PM, John Lilley wrote: > Great, that’s what I’ve done. At least I think so. This is JRE6 right? > > # java -version > > java version "1.6.0_43" > > Java(TM) SE Runtime Environment (build 1.6.0_43-b01)

RE: OpenJDK?

2013-05-29 Thread John Lilley
Great, that's what I've done. At least I think so. This is JRE6 right? # java -version java version "1.6.0_43" Java(TM) SE Runtime Environment (build 1.6.0_43-b01) Java HotSpot(TM) 64-Bit Server VM (build 20.14-b01, mixed mode) john From: Lenin Raj [mailto:emaille...@gmail.com] Sent: Wednesday

Unsubscribe

2013-05-29 Thread sahil soni
Unsubscribe

Re: OpenJDK?

2013-05-29 Thread Lenin Raj
Yes. Use Sun/Oracle JDK. I have had memory issues while using Oozie. When I replaced OpenJDK with Sun JDK 6, the memory issue was resolved. Thanks, Lenin On Wed, May 29, 2013 at 8:22 PM, John Lilley wrote: > I am having trouble finding a definitive answer about OpenJDK vs Sun JDK > in regards

Re: What else can be built on top of YARN.

2013-05-29 Thread Krishna Kishore Bonagiri
Hi Rahul, I am porting a distributed application that runs on a fixed set of given resources to YARN, with the aim of being able to run it on dynamically selected resources, whichever are available at the time of running the application. Thanks, Kishore On Wed, May 29, 2013 at 8:04 PM, Rahu

OpenJDK?

2013-05-29 Thread John Lilley
I am having trouble finding a definitive answer about OpenJDK vs Sun JDK in regards to building Hadoop. This page: http://wiki.apache.org/hadoop/HadoopJavaVersions indicates that OpenJDK is not recommended, but is that an authoritative answer? BUILDING.txt states no preference. Thanks John

Reduce side question on MR

2013-05-29 Thread Rahul Bhattacharjee
Hi, I have one question related to the reduce phase of MR jobs. The intermediate outputs of map tasks are pulled from the nodes that ran the map tasks to the node where the reducer is going to run, and that intermediate data is written to the reducer's local fs. My question is that if there is a job

What else can be built on top of YARN.

2013-05-29 Thread Rahul Bhattacharjee
Hi all, I was going through the motivation behind YARN. Splitting the responsibility of JT is the major concern. Ultimately the base (YARN) was built in a generic way for building other generic distributed applications too. I am not able to think of any other parallel processing use case that woul

Re: Please help me with heartbeat storm

2013-05-29 Thread Philippe Signoret
This might be relevant: https://issues.apache.org/jira/browse/MAPREDUCE-4478 "There are two configuration items to control the TaskTracker's heartbeat interval. One is *mapreduce.tasktracker.outofband.heartbeat*. The other is *mapreduce.tasktracker.outofband.heartbeat.damper*. If we set * mapreduc