This is a graph problem: you want to find all connected subgraphs, so I don't
think it's easy to do with plain MapReduce.
But you can try YARN; it makes iteration much easier, at least compared with
M/R.
On Fri, Jun 14, 2013 at 12:41 PM, parnab kumar wrote:
> Consider a following input file of format :
> inpu
That looks like a graph algorithm problem for MapReduce.
Sorry I couldn't give you a specific answer!
On 14/06/2013 05:41, parnab kumar wrote:
Consider an input file of the following format:
Input file:
1 2
2 3
3 4
6 7
7 9
10 11
The output should be as follows:
1 2 3 4
6 7 9
10 11
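For small inputs, a quick-and-dirty iterative MapReduce job can also do this; below is a minimal sketch, not from the thread, with illustrative class names. Each round emits every edge in both directions, and the reducer re-attaches every node in a neighbourhood to the smallest id seen there; the job is re-run on its own output until the edge list stops changing, at which point each remaining line reads "component-id member".

import java.io.IOException;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ConnectedComponentsStep {

  // Emit every edge "a b" in both directions so each node sees its neighbourhood.
  public static class SymmetrizeMapper
      extends Mapper<Object, Text, LongWritable, LongWritable> {
    @Override
    protected void map(Object key, Text value, Context ctx)
        throws IOException, InterruptedException {
      String[] parts = value.toString().trim().split("\\s+");
      if (parts.length < 2) {
        return;
      }
      long a = Long.parseLong(parts[0]);
      long b = Long.parseLong(parts[1]);
      ctx.write(new LongWritable(a), new LongWritable(b));
      ctx.write(new LongWritable(b), new LongWritable(a));
    }
  }

  // Replace each neighbourhood by a star centred on its smallest id.
  public static class MinAttachReducer
      extends Reducer<LongWritable, LongWritable, LongWritable, LongWritable> {
    @Override
    protected void reduce(LongWritable node, Iterable<LongWritable> neighbours,
        Context ctx) throws IOException, InterruptedException {
      Set<Long> group = new HashSet<Long>();
      group.add(node.get());
      for (LongWritable n : neighbours) {
        group.add(n.get());
      }
      long min = Collections.min(group);
      for (long u : group) {
        if (u != min) {
          ctx.write(new LongWritable(min), new LongWritable(u));
        }
      }
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "cc-step");
    job.setJarByClass(ConnectedComponentsStep.class);
    job.setMapperClass(SymmetrizeMapper.class);
    job.setReducerClass(MinAttachReducer.class);
    job.setOutputKeyClass(LongWritable.class);
    job.setOutputValueClass(LongWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

A small driver loop (or hand-run iterations) feeds each output back in as the next input until an iteration produces no changes; duplicate pairs between rounds are harmless and can be deduplicated in a final pass. For anything large, a graph framework as suggested in the replies is a better fit.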
Hey Parnab,
Please check out Giraph (http://giraph.apache.org), which should help
you develop a program to solve this.
On Fri, Jun 14, 2013 at 10:11 AM, parnab kumar wrote:
> Consider a following input file of format :
> input File :
> 1 2
> 2 3
> 3 4
> 6 7
> 7 9
> 10 11
>
> The output Should be
There is some flexibility when it comes to changing the name of the output.
Check out MultipleOutputs.
I've never used it with a map-only job, though.
Thanks,
Rahul
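For what it's worth, MultipleOutputs does work from a map-only job. A minimal sketch follows; the base name "records" and the class name are illustrative, not from the thread.

import java.io.IOException;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

public class RenamedOutputMapper extends Mapper<Object, Text, NullWritable, Text> {

  private MultipleOutputs<NullWritable, Text> mos;

  @Override
  protected void setup(Context context) {
    mos = new MultipleOutputs<NullWritable, Text>(context);
  }

  @Override
  protected void map(Object key, Text value, Context context)
      throws IOException, InterruptedException {
    // The third argument is the base name of the target file, so records land
    // in files like records-m-00000 instead of the default part-m-00000.
    mos.write(NullWritable.get(), value, "records");
  }

  @Override
  protected void cleanup(Context context)
      throws IOException, InterruptedException {
    mos.close();  // flush and close the extra record writers
  }
}

In the driver you would call job.setNumReduceTasks(0); LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class) is a common way to avoid the empty default part-* files when everything goes through MultipleOutputs.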
On Thu, Jun 13, 2013 at 8:33 AM, Maysam Yabandeh wrote:
> Hi,
>
> I was wondering if it is possible in hadoop to assign the same partition
> nu
Hi,
has it ever happened that a migration of persistent data has been needed (or
automatically executed) when updating a Hadoop installation within a release?
If so, where could I find information regarding such a migration?
I would be interested because the runtime of such a migration would
Hi Björn,
> has it ever happened that a migration of persistent data has been needed (or
> automatically executed) when updating a Hadoop installation within a release?
> If so, where could I find information regarding such a migration?
Normally, when you change the minor release, you need
Excuse the typo; it should be:
Normally, when you change the >major< release, you need to upgrade HDFS
(http://hadoop.apache.org/docs/stable/hdfs_user_guide.html#Upgrade+and+Rollback).
This will happen when you switch major branches.
On Jun 14, 2013, at 12:10 PM, Alexander Alten-Lorenz
wrote:
Thanks Mayank. Any clue why only one disk was getting all the writes?
Rahul
On Thu, Jun 13, 2013 at 11:47 AM, Mayank wrote:
> So we did a manual rebalance (followed instructions at:
> http://wiki.apache.org/hadoop/FAQ#On_an_individual_data_node.2C_how_do_you_balance_the_blocks_on_the_disk.3
No, as of this moment we have no idea about the reason for that behavior.
On Fri, Jun 14, 2013 at 4:04 PM, Rahul Bhattacharjee <
rahul.rec@gmail.com> wrote:
> Thanks Mayank. Any clue why only one disk was getting all the writes?
>
> Rahul
>
>
> On Thu, Jun 13, 2013 at 11:47 AM, Mayank wr
Hello Alexander,
thanks for your reply. This is indeed very interesting to me.
- But what about minor updates, e.g. from 1.0.1 to 1.0.4? Has this ever
happened for such updates?
Also I have got a similar question regarding HBase. I understand that HBase has
its own datamodel on top of/ withi
Rahul,
In general this issue happens sometimes in Hadoop. There is no exact reason
for it. To mitigate it you need to run the balancer at regular intervals.
Thanks, Sandeep.
Date: Fri, 14 Jun 2013 16:39:02 +0530
Subject: Re: Application errors with one disk on datanode getting filled up to
100%
Fr
Hi Björn,
> - But what about minor updates, e.g. from 1.0.1 to 1.0.4? Has this ever
> happened for such updates?
You will probably see log messages like 'RPC version mismatch'; in this case
you have to upgrade the filesystem. If not, all is well :)
> - What about HBase minor releases in this co
Thanks Sandeep,
I was thinking that the overall HDFS cluster might get unbalanced over time
and the balancer might be useful in that case.
I was more interested to know why only one disk out of the four configured
disks of the DN is getting all the writes. As per whatever I have read, writes
should be in
I wasn't aware of the datanode-level balancing procedure; I was thinking about
the HDFS balancer.
http://wiki.apache.org/hadoop/FAQ#On_an_individual_data_node.2C_how_do_you_balance_the_blocks_on_the_disk.3F
Thanks,
Rahul
On Fri, Jun 14, 2013 at 5:50 PM, Rahul Bhattacharjee <
rahul.rec@gmail.co
Rahul,
In general, most of the time Hadoop tries to compute data locally. That is, if
you run a MapReduce task on a particular input, Hadoop will try to compute the
data locally and write the data locally (the majority of the time this will
happen), then replicate it to other nodes.
In your scenario the majority of your input data m
Thanks Sandeep.
Yes, that's correct. I was more interested to know about the uneven
distribution within the DN.
Thanks,
Rahul
On Fri, Jun 14, 2013 at 6:12 PM, Sandeep L wrote:
> Rahul,
>
> In general most of the times Hadoop tries to compute data locally that is,
> if run a MapReduce task on p
An input file where each line corresponds to a document. Each document is
identified by some fingerprints. For example, a line in the input file
is of the following form:
input:
-
DOCID1 HASH1 HASH2 HASH3 HASH4
DOCID2 HASH5 HASH3 HASH1 HASH4
The output of the MapReduce job
Hi
My quick and dirty, non-optimized solution would be as follows (a rough code
sketch follows this outline):
MAPPER
===
Output from the mapper: the document's fingerprint set as the key, the DOCID
as the value.
REDUCER
===
Iterate over keys.
For a key = (say) {HASH1, HASH2, HASH3, HASH4}:
Format the collection of values (the DOCIDs) into some StringBuilder kind of
class.
Output
KEY = {DOCID1
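Read literally, that outline groups documents whose fingerprint sets are identical. A minimal sketch of that reading, with illustrative class names; if documents sharing only some hashes must end up together, this turns back into the connected-components problem discussed earlier in the digest.

import java.io.IOException;
import java.util.Arrays;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class FingerprintGrouping {

  // Input line: "DOCID1 HASH1 HASH2 HASH3 HASH4"
  public static class FingerprintMapper extends Mapper<Object, Text, Text, Text> {
    @Override
    protected void map(Object key, Text value, Context ctx)
        throws IOException, InterruptedException {
      String[] tokens = value.toString().trim().split("\\s+");
      if (tokens.length < 2) {
        return;
      }
      String docId = tokens[0];
      String[] hashes = Arrays.copyOfRange(tokens, 1, tokens.length);
      Arrays.sort(hashes);  // normalize ordering so equal sets produce equal keys
      StringBuilder fp = new StringBuilder();
      for (String h : hashes) {
        if (fp.length() > 0) {
          fp.append(' ');
        }
        fp.append(h);
      }
      ctx.write(new Text(fp.toString()), new Text(docId));
    }
  }

  // All documents with the same fingerprint set meet in one reduce call.
  public static class GroupReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text fingerprints, Iterable<Text> docIds, Context ctx)
        throws IOException, InterruptedException {
      StringBuilder docs = new StringBuilder();
      for (Text id : docIds) {
        if (docs.length() > 0) {
          docs.append(' ');
        }
        docs.append(id.toString());
      }
      ctx.write(new Text(docs.toString()), fingerprints);
    }
  }
}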
Hi
My environment is like this
INPUT FILES
==
400 GZIP files , one from each server - average size gzipped 25MB
REDUCER
===
Uses MultipleOutputs
OUTPUT (Snappy)
===
/path/to/output/dir1
/path/to/output/dir2
/path/to/output/dir3
/path/to/output/dir4
Number of output directories
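For reference, a driver-side sketch of roughly that kind of setup, assuming the reducer writes through MultipleOutputs; the named output "data", the paths, and the class name are illustrative, not taken from the message.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.SnappyCodec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class MultiDirDriver {

  // Snappy-compressed output plus a named output that the reducer can route
  // to sub-directories of the job output path.
  public static void configureOutputs(Job job) {
    FileOutputFormat.setOutputPath(job, new Path("/path/to/output"));
    FileOutputFormat.setCompressOutput(job, true);
    FileOutputFormat.setOutputCompressorClass(job, SnappyCodec.class);
    MultipleOutputs.addNamedOutput(job, "data", TextOutputFormat.class,
        Text.class, Text.class);
    // In the reducer: mos.write("data", key, value, "dir1/part");
    // the last argument is resolved relative to the job output directory,
    // so records end up under /path/to/output/dir1, dir2, and so on.
  }
}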
Hi,
I'm having some trouble with webhdfs read after running a Pig job that
completed successfully.
Here are some details:
-I am using Hadoop CDH-4.1.3 and the compatible Pig that goes with this (0.10.0
I think)
-The Pig job writes out about 10 files. I'm programmatically attempting to
read e
You might want to investigate if your issue is always on the same node.
On Fri, Jun 14, 2013 at 11:43 AM, Adam Silberstein wrote:
> Hi,
> I'm having some trouble with webhdfs read after running a Pig job that
> completed successfully.
>
> Here are some details:
>
> -I am using Hadoop CDH-4.1.3 an
I have modified dfs.data.dir from the default value to another value which is
outside HADOOP_HOME, namely /SD1/hadoop_data:
<property>
  <name>dfs.data.dir</name>
  <value>/SD1/hadoop_data</value>
  <description>Where DataNodes store their blocks</description>
</property>
My datanode is not starting after the above change. Can anyone tell what the
issue is?
Change the permissions of /SD1/hadoop_data to 755 and restart the process.
Warm Regards,
Tariq
cloudfront.blogspot.com
On Fri, Jun 14, 2013 at 11:10 PM, Raj Hadoop wrote:
> I have modified dfs.data.dir from the default value to another value which
> is outside 'HADOOP_HOME' and it is >/SD1/had
But Tariq, /SD1/hadoop_data is in a separate group.
My Hadoop is in group 'grp1'.
Filesystem /SD1 is under group 'grp2'.
As I have space under /SD1, I want to use it for Hadoop.
So will setting permissions like '755' on /SD1/hadoop_data work?
From: Mo
Thanks, that is good to know. Is there any way to say "please fail if I don't
get the node I want"? Do I just release the container and try again?
I'd like to understand the implications of this policy. Suppose I have 1000
data splits and a cluster capacity of 100 containers. If I try to schedul
Bertrand,
Thanks for taking the time to explain this!
I understand your point about contiguous blocks; they just aren't likely to
exist. I am still curious about two things:
1) The map-per-block strategy. If we have a lot more blocks than
containers, wouldn't there be some advantage to ha
Hi John,
At this time, releasing containers is the preferred way to be strict about
your locality requirements. This is not included in a release yet, but
https://issues.apache.org/jira/browse/YARN-392 allows expressing hard
locality constraints on requests, so you can tell the scheduler to never
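For later readers: once YARN-392 and its follow-ups landed in later 2.x releases, the AMRMClient ContainerRequest gained a relaxLocality flag. A hedged sketch of a hard node-local request with that API follows; class and method names are as I recall them and were not available in releases at the time of this thread, so check them against your version.

import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.util.Records;

public class StrictLocalityRequests {

  // Build a request that may only be satisfied on the given host:
  // relaxLocality=false tells the scheduler not to fall back to rack or ANY.
  public static ContainerRequest onNodeOnly(String host, int memoryMb, int vcores) {
    Resource capability = Records.newRecord(Resource.class);
    capability.setMemory(memoryMb);
    capability.setVirtualCores(vcores);
    Priority priority = Records.newRecord(Priority.class);
    priority.setPriority(0);
    return new ContainerRequest(capability,
        new String[] { host },  // nodes
        null,                   // racks
        priority,
        false);                 // relaxLocality
  }
}

With relaxLocality left at its default of true, the request behaves as described above and may fall back to rack-local or off-rack placement.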
Hi,
I am using MapReduce on YARN. I want to make tasks of the same job run
in containers with different sizes. For example: Job1 = <Task1, Task2, ...,
Task8>; Task1 := 1280 MB, Task2 to Task8 := 1024 MB.
To achieve this, I manually call
reqEvent.getCapability().setMemory(MEMORY_SIZE) in
RMContainerAllocator.java wit
Hi, I found that SkippingRecordReader is no longer supported in the new API and
I am curious about the reason; can anyone tell me?
Besides, when I look into the old API and try to figure out what skip mode was
doing, I am a little confused by the logic there.
In my comprehension, if java api