Hi Rahul,
It is at least for the reasons that Vinod listed that porting my
application onto YARN, instead of making it work in the MapReduce
framework, is easier for me. My main purpose in using YARN is to
exploit its resource management capabilities.
Thanks,
Kishore
Historically, many applications/frameworks wanted to take advantage of just the
resource management capabilities and failure handling of Hadoop (via
JobTracker/TaskTracker), but were forced to use MapReduce even though they
didn't have to. Obvious examples are graph processing (Giraph), BSP(H
Whatever you have mentioned, Jamal, should work. You can debug this.
Thanks,
Rahul
On Thu, May 30, 2013 at 5:14 AM, jamal sasha wrote:
> Hi,
> For some reason, this has to be in Java :(
> I am trying to use org.json library, something like (in mapper)
> JSONObject jsn = new JSONObject(value.to
You have the entire string.
If you tokenize on commas ...
Starting with:
>> {"author":"foo", "text": "hello"}
>> {"author":"foo123", "text": "hello world"}
>> {"author":"foo234", "text": "hello this world"}
You end up with
{"author":"foo" and "text": "hello"} as the two tokens.
So you can ignore the first t
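The comma-tokenizing idea above can be sketched in plain Java (a minimal sketch without org.json, assuming one flat JSON object per line, simple string values, and no commas inside the values; the class and method names are made up for illustration):

```java
// Minimal sketch: pull the "text" value out of a one-line JSON record
// without a JSON library. Assumes a flat object with quoted string
// values and no commas inside the values.
public class TextFieldExtractor {
    static String extractText(String line) {
        // Tokenize on commas, then look for the token holding the "text" key.
        for (String token : line.split(",")) {
            String t = token.trim().replaceAll("[{}]", "");
            String[] kv = t.split(":", 2);
            if (kv.length == 2 && kv[0].trim().replace("\"", "").equals("text")) {
                return kv[1].trim().replace("\"", "");
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // prints: hello
        System.out.println(extractText("{\"author\":\"foo\", \"text\": \"hello\"}"));
    }
}
```

This breaks as soon as a value contains a comma or an escaped quote, which is exactly why a real JSON parser like org.json is the safer route.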
Hi Neeraj,
This error doesn't look to be Kerberos-related at first glance. Can you
verify if 192.168.49.51
has the tasktracker process running?
Regards,
Robert
On Tue, May 28, 2013 at 7:58 PM, Rahul Bhattacharjee <
rahul.rec@gmail.com> wrote:
> The error looks a little low-level, at the network level.
For that, you only have to write intermediate data when the word is "text":
String[] words = line.split("\\W+");
for (String word : words) {
    if (word.equals("text")) {
        context.write(new Text(word), new IntWritable(1));
    }
}
I am assuming you have a huge volume of data for it, otherwise Map
Hi Rishi,
But I don't want the word count of all the words.
In the JSON there is a field "text", and those are the words I wish to count.
On Wed, May 29, 2013 at 4:43 PM, Rishi Yadav wrote:
> Hi Jamal,
>
> I took your input and put it in sample wordcount program and it's working
> just fine and
Hi,
For some reason, this has to be in Java :(
I am trying to use org.json library, something like (in mapper)
JSONObject jsn = new JSONObject(value.toString());
String text = (String) jsn.get("text");
StringTokenizer itr = new StringTokenizer(text);
But it's not working :(
It would be better t
Hi Jamal,
I took your input and put it in sample wordcount program and it's working
just fine and giving this output.
author 3
foo234 1
text 3
foo 1
foo123 1
hello 3
this 1
world 2
When we split using
String[] words = input.split("\\W+");
it takes care of all non-alphanumeric characters.
Tha
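Rishi's point about split("\\W+") can be checked in isolation (a standalone sketch, not taken from the original program; the class name is made up):

```java
public class SplitDemo {
    public static void main(String[] args) {
        String input = "{\"author\":\"foo\", \"text\": \"hello\"}";
        // \W+ matches runs of non-word characters, so braces, quotes,
        // colons, commas and spaces all act as delimiters. Note that a
        // leading delimiter produces an empty first array element.
        String[] words = input.split("\\W+");
        for (String w : words) {
            if (!w.isEmpty()) {
                System.out.println(w);
            }
        }
    }
}
```

This prints author, foo, text and hello, which also explains why the key names show up as counted words in the sample output above.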
Yeah,
I have to agree with Russell. Pig is definitely the way to go on this.
If you want to do it as a Java program you will have to do some work on the
input string but it too should be trivial.
How formal do you want to go?
Do you want to strip it down or just find the quote after the text par
Seriously consider Pig (free answer, 4 LOC):
my_data = LOAD 'my_data.json' USING
com.twitter.elephantbird.pig.load.JsonLoader() AS json:map[];
words = FOREACH my_data GENERATE $0#'author' as author,
FLATTEN(TOKENIZE($0#'text')) as word;
word_counts = FOREACH (GROUP words BY word) GENERATE group AS
Hi,
I am stuck again. :(
My input data is in hdfs. I am again trying to do wordcount but there is
slight difference.
The data is in json format.
So each line of data is:
{"author":"foo", "text": "hello"}
{"author":"foo123", "text": "hello world"}
{"author":"foo234", "text": "hello this world"}
Hello Jamal,
Yes, it is possible. You could use TableReducer to do that. Use it
instead of the normal reducer in your wordcount example. Alternatively you
could use HFileOutputFormat to write directly to HFiles.
Warm Regards,
Tariq
cloudfront.blogspot.com
On Thu, May 30, 2013 at 2:08 AM, ja
Hi,
Is it possible to save data in a database (HBase, Cassandra?) directly
from Hadoop, so that there is no output in HDFS but it directly writes
data into this DB?
If I want to modify the wordcount example to achieve this, what/where
should I make these modifications?
Any help/suggestions.
Tha
There is a project at Yahoo which makes it possible to run Storm on Yarn. I
think the team behind it is going to give a talk at Hadoop Summit and plan
to open source it after that.
-Viral
On Wed, May 29, 2013 at 11:04 AM, John Conwell wrote:
> Storm, a distributed realtime computation framework
What's the output of:
protoc --version
You should be using 2.4.1
Cheers
On Wed, May 29, 2013 at 11:33 AM, John Lilley wrote:
> Sorry if this is a dumb question, but I’m not sure where to start. I am
> following BUILDING.txt instructions for source checked out today using git:
>
>
> git
Sorry if this is a dumb question, but I'm not sure where to start. I am
following BUILDING.txt instructions for source checked out today using git:
git clone git://git.apache.org/hadoop-common.git Hadoop
Following build steps and adding -X for more logging:
mvn compile -X
But I get this error i
Two scenarios I can think of are re-implementations of Twitter's Storm (
http://storm-project.net/) and DryadLinq (
http://research.microsoft.com/en-us/projects/dryadlinq/).
Storm, a distributed realtime computation framework used for analyzing
realtime streams of data, doesn't really need to be po
Thanks for the response Krishna.
I was wondering if it were possible to use MR to solve your problem
instead of building the whole stack on top of YARN.
Most likely it's not possible, and that's why you are building it. I
wanted to know why that is.
I am just trying to find out the need or why
Yup. That's right.
Thanks,
Lenin
On Wed, May 29, 2013 at 10:23 PM, John Lilley wrote:
> Great, that’s what I’ve done. At least I think so. This is JRE6 right?
>
> # java -version
>
> java version "1.6.0_43"
>
> Java(TM) SE Runtime Environment (build 1.6.0_43-b01)
Great, that's what I've done. At least I think so. This is JRE6 right?
# java -version
java version "1.6.0_43"
Java(TM) SE Runtime Environment (build 1.6.0_43-b01)
Java HotSpot(TM) 64-Bit Server VM (build 20.14-b01, mixed mode)
john
From: Lenin Raj [mailto:emaille...@gmail.com]
Sent: Wednesday
Yes. Use the Sun/Oracle JDK.
I have had memory issues while using Oozie. When I replaced OpenJDK with
Sun JDK 6, the memory issue was resolved.
Thanks,
Lenin
On Wed, May 29, 2013 at 8:22 PM, John Lilley wrote:
> I am having trouble finding a definitive answer about OpenJDK vs Sun JDK
> in regards
Hi Rahul,
I am porting a distributed application that runs on a fixed set of given
resources to YARN, with the aim of being able to run it on dynamically
selected resources, whichever are available at the time of running the
application.
Thanks,
Kishore
On Wed, May 29, 2013 at 8:04 PM, Rahu
I am having trouble finding a definitive answer about OpenJDK vs Sun JDK in
regards to building Hadoop. This:
http://wiki.apache.org/hadoop/HadoopJavaVersions
indicates that OpenJDK is not recommended, but is that an authoritative answer?
BUILDING.txt states no preference.
Thanks
John
Hi,
I have one question related to the reduce phase of MR jobs.
The intermediate outputs of map tasks are pulled from the nodes that
ran the map tasks to the node where the reducer is going to run, and that
intermediate data is written to the reducer's local FS. My question is:
if there is a job
Hi all,
I was going through the motivation behind YARN. Splitting the
responsibility of the JT is the major concern. Ultimately the base (YARN)
was built in a generic way for building other generic distributed
applications too.
I am not able to think of any other parallel processing use case that woul
This might be relevant: https://issues.apache.org/jira/browse/MAPREDUCE-4478
"There are two configuration items to control the TaskTracker's heartbeat
interval. One is mapreduce.tasktracker.outofband.heartbeat. The other is
mapreduce.tasktracker.outofband.heartbeat.damper. If we set
mapreduc
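For reference, the two property names quoted from the JIRA would be set in mapred-site.xml; the values below are illustrative assumptions, not recommendations, so check your Hadoop version's defaults:

```xml
<!-- Illustrative values only (assumptions); not tuning advice. -->
<property>
  <name>mapreduce.tasktracker.outofband.heartbeat</name>
  <value>true</value>
</property>
<property>
  <name>mapreduce.tasktracker.outofband.heartbeat.damper</name>
  <value>1000000</value>
</property>
```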