Hi,
I want to broadcast some data to all nodes under Hadoop 0.20.2. I tested
the DistributedCache module. Unfortunately, it was time-consuming,
and runtime is important for my work.
I want to write an MR job so that a copy of the input data is generated in
the output of every reducer.
Is that possible? How?
So you are trying to run a single reducer on each machine, and have all
input data, regardless of its location, streamed to each reducer?
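If so, one way to express that in plain MR is to have each map emit every record once per reduce partition. A minimal, untested sketch (the class name and record types are placeholders of mine, not from your job):

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Sketch: duplicate every record into every reduce partition, so each
// reducer (and hence each node running one) gets a full copy of the input.
public class BroadcastMapper
    extends Mapper<LongWritable, Text, IntWritable, Text> {

  private int numReducers;
  private final IntWritable partition = new IntWritable();

  @Override
  protected void setup(Context context) {
    numReducers = context.getNumReduceTasks();
  }

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    for (int i = 0; i < numReducers; i++) {
      // With the default HashPartitioner, IntWritable(i) lands in partition i.
      partition.set(i);
      context.write(partition, value);
    }
  }
}

Bear in mind this multiplies the shuffle volume by the number of reducers, so for large inputs it may well be slower than the DistributedCache you already ruled out.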
On Thu, Aug 23, 2012 at 10:41 AM, Hamid Oliaei oli...@gmail.com wrote:
Sorry to ask so many questions, but it will help the user list offer you
the best advice, as this is not a typical MR use case.
- Do you foresee the reducer storing the data on a file system local to the
machine?
- Do you need to use specific input formats for the job, or is it really
just text?
Hi,
First of all, thank you, Tim, for giving your time.
The answer to the first question is yes.
My inputs are triples (sub, pre, obj) and they are stored on HDFS.
The problem is: after running some MR jobs, some data is generated on all
machines, and I want each machine to send its part to all the others.
Then I think you might be best exploring running a getmerge on each
client. How you trigger that is up to you, but something like Fabric [1]
might help. Others might propose different solutions, but it doesn't sound
like MR is a natural choice to me.
I would expect this to be the fastest way.
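If you end up scripting it from Java rather than the shell, FileUtil.copyMerge is the equivalent of getmerge; a rough sketch (both paths are made up for illustration):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

// Sketch: merge all part files under a job's output directory into a
// single file on the local disk of whichever client runs this.
public class GetMergeSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem hdfs = FileSystem.get(conf);
    FileSystem local = FileSystem.getLocal(conf);
    FileUtil.copyMerge(hdfs, new Path("/user/hamid/job-output"),
                       local, new Path("/tmp/merged-output"),
                       false /* don't delete the source */, conf, null);
  }
}

From the shell, "hadoop fs -getmerge <src> <localdst>" does the same thing.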
Hi,
I'll take a look at that; I hope it will be useful for my purpose.
Thank you so much.
Hamid
Hi All,
In Sqoop:
When exporting from HDFS to a DB, if an export map task fails due to these or
other reasons, it will cause the export job to fail. The results of a
failed export are undefined. Each export map task operates in a separate
transaction. Furthermore, individual map tasks commit their current
transaction periodically.
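For what it's worth, the usual way to get all-or-nothing behaviour despite those per-task commits is to export via a staging table, which is populated first and then moved into the destination table in a single final transaction. A hedged sketch using Sqoop's Java entry point (the connect string, table names, and export dir are all made up):

import com.cloudera.sqoop.Sqoop;

// Sketch: export through a staging table so the destination table only
// sees rows once every map task has succeeded.
public class StagedExport {
  public static void main(String[] args) {
    int ret = Sqoop.runTool(new String[] {
        "export",
        "--connect", "jdbc:mysql://dbhost/mydb",  // made-up connect string
        "--table", "results",                     // made-up destination table
        "--staging-table", "results_staging",     // filled first, then moved
        "--clear-staging-table",
        "--export-dir", "/user/me/results"        // made-up HDFS dir
    });
    System.exit(ret);
  }
}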
Hamid,
I would recommend taking a second look at your current algorithm and making
sure you are utilizing the MR framework to its strengths. You can evaluate
having multiple passes for your map-reduce program, or doing a map-side
join. You mention runtime is important for your system, so make sure you
measure where the time actually goes.
I don't think so. The client is responsible for deleting the resource
beforehand, if it might exist.
Correct me if I am wrong.
Higher-level solutions (such as Cascading) usually provide a way to define a
strategy for handling it: KEEP, REPLACE, UPDATE, ...
I think this specific behavior irritates a lot of new users. We may as
well provide a Generic Option to overwrite the output directory if
set. That way, we at least help avoid typing a whole delete command.
If you agree, please file an improvement request against MAPREDUCE
project on the ASF JIRA.
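Until such an option exists, the usual workaround is to let the driver do the delete itself. A sketch of a Tool's run() method (the job setup around it is illustrative only):

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Inside a Configured/Tool driver: remove a stale output directory before
// submitting, instead of typing the delete command by hand every run.
public int run(String[] args) throws Exception {
  Job job = new Job(getConf(), "my-job");
  Path out = new Path(args[1]);
  FileSystem fs = out.getFileSystem(getConf());
  if (fs.exists(out)) {
    fs.delete(out, true);  // recursive - be sure args[1] really is the output dir
  }
  FileOutputFormat.setOutputPath(job, out);
  return job.waitForCompletion(true) ? 0 : 1;
}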
Well, I'm using the MultipleOutputs capability to create a directory
structure with dates.
So I'm managing this myself.
What I've found, and I could be doing this wrong, is that I still have to
tell the Tool that I want to use a TextOutputFormat or a FileOutputFormat,
and then have to tell it an output directory, which is where the
pre-existence check bites.
Daniel,
Perhaps you want your OutputFormat set as NullOutputFormat. That does
not carry any checks for output directory pre-existence.
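Something along these lines in the driver (an untested sketch; whether the new-API MultipleOutputs is available depends on your version/distribution):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

// Sketch: make the job's "main" output a NullOutputFormat so no output
// directory pre-existence check runs; all real writing then goes through
// the named outputs of MultipleOutputs.
public class DatedOutputDriver {
  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "dated-output");
    job.setOutputFormatClass(NullOutputFormat.class);
    MultipleOutputs.addNamedOutput(job, "dated", TextOutputFormat.class,
        NullWritable.class, Text.class);
    // ... mapper/reducer setup elided ...
  }
}

I haven't verified how MultipleOutputs interacts with NullOutputFormat's no-op committer, so treat this as a starting point rather than a recipe.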
On Thu, Aug 23, 2012 at 9:47 PM, Daniel Hoffman
hoffmandani...@gmail.com wrote:
Thanks for the prompt reply!
Unfortunately, it's not that small.
I'm using the new API; are map-side joins accomplished using
http://hadoop.apache.org/common/docs/r1.0.3/api/org/apache/hadoop/contrib/utils/join/package-summary.html?
Are there any examples which use this package or map-side joins in general?
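If I remember right, that contrib package (org.apache.hadoop.contrib.utils.join) is actually the DataJoin framework for reduce-side joins; the classic map-side merge join lives in org.apache.hadoop.mapred.join, which as far as I know is old-API only in this release line. A hedged sketch, with made-up paths; note both inputs must be sorted by key and identically partitioned:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.KeyValueTextInputFormat;
import org.apache.hadoop.mapred.join.CompositeInputFormat;

// Sketch: map-side merge join of two pre-sorted, identically partitioned
// datasets; each map then receives (key, TupleWritable) pairs with one
// tuple element per joined source.
public class MapSideJoinDriver {
  public static void main(String[] args) {
    JobConf conf = new JobConf();
    conf.setInputFormat(CompositeInputFormat.class);
    conf.set("mapred.join.expr", CompositeInputFormat.compose(
        "inner", KeyValueTextInputFormat.class,
        new Path("/data/left"), new Path("/data/right")));  // made-up paths
    // ... mapper setup and job submission elided ...
  }
}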
Hi all,
we are seeing strange behaviour of the JobTracker in the following scenario:
- the job finishes the map phase and starts the reduce phase
- after the shuffle phase of all reducers we lose a tasktracker that
doesn't run any reducer, so all the reducers are still running in the
reduce phase
- the map tasks that completed on the lost tracker get scheduled to run
again, even though every reducer has already fetched their output
Hi,
There is a good example here:
http://hadoopchicago.com/tips-tricks/custom-xmlreader-boris-lublinsky-michael-segel/
Regards,
Dino Kečo
msn: xdi...@hotmail.com
mail: dino.k...@gmail.com
skype: dino.keco
phone: +387 61 507 851
Hey Jan,
What version/distribution of Hadoop are you noticing this on?
On Thu, Aug 23, 2012 at 2:55 PM, Jan Lukavský
jan.lukav...@firma.seznam.cz wrote:
Hi,
Sorry, I forgot to mention: we are using CDH3u3.
Jan
On 23.8.2012 12:08, Harsh J wrote:
Install Pig: http://pig.apache.org/docs/r0.10.0/start.html
Install Hive:
https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-InstallingHivefromaStableRelease
These blog posts should help you to get started after that.
Thank you very much.
On Tue, Aug 21, 2012 at 11:46 PM, nagarjuna kanamarlapudi
nagarjuna.kanamarlap...@gmail.com wrote:
Dear Mahsa,
Yes, what you have observed is defined to happen that way.
On a single-node cluster everything is local. There is still network
transfer and everything else, but it all happens on the same machine.
Hi All,
While trying to build the Hadoop source code in Eclipse using Maven,
following the instructions at
http://wiki.apache.org/hadoop/EclipseEnvironment
I noticed that the project layout has changed in the latest development
version, so the instructions didn't quite match. I was wondering if anyone
has updated steps for the new layout.
I just ran through the same thing. The addition for me was an extra Import
step, pointing at the hadoop-yarn-project and then grabbing all the
projects from that.
With this, I drop to the command line for building with Maven. I've tried
to work with m2eclipse and this, but so far with little luck.
Hey Adam,
I use m2e and it seems to work pretty well for me. Of course, I do not
look for a perfectly clean project state (some projects show build
issues), and rely on CLI maven commands when I need to compile
something properly. But as a reference/editor, using m2e seems to work
just fine.
Hi Pravin,
Studying Hadoop or MapReduce can look like a daunting task if you try to get
your hands dirty right at the start.
One of the prerequisites for learning Hadoop is good experience in Java.
Good analytical skills help a lot as well, and the final secret sauce for
being successful is that you need to keep practicing.
Hey Harsh,
I came across a video on building the Hadoop source code on the Cloudera
site, but it was using Ant (on an older project layout). If you're able to
use m2eclipse, would you like to make a similar video post or document it
somewhere?
The other issue I ran into: the unit tests weren't running cleanly from
within Eclipse.
Please see http://hadoop.apache.org/common/mailing_lists.html. You should send
an email to user-unsubscr...@hadoop.apache.org to unsubscribe.
HTH,
+Vinod
On Aug 23, 2012, at 5:43 AM, sathyavageeswaran wrote:
Once in hadoop, no free exit
Hi Pravin,
I have installation instructions on my blog:
hadoopway.blogspot.com
Regards,
Serge
Hey Pravin,
I highly recommend that you start with Big Data University
[www.bigdatauniversity.com]. It covers all the basics of Hadoop and Hadoop
architecture.
Thanks
Vignesh
On 8/23/12, Serge Blazhiyevskyy serge.blazhiyevs...@nice.com wrote:
Hi,
I am curious about interpreting the output of iostat on a datanode during
an M/R run. I want to understand how to diagnose a disk I/O issue in a
Hadoop cluster.
Is there any good documentation to help me understand the results of
iostat in a Hadoop context?
Here are the iostat numbers:
After sending this message I issued the iostat -dxm 5 command on the
DNs. The %util column shows a 70-80 average value, sometimes going up to
90-100 for a few seconds.
Does this mean the disk is becoming the bottleneck, or is this normal?
I have a map-side join example here:
http://askhadoop.blogspot.com/2011/12/map-side-join_27.html
It is a great way to load data into memory on multiple machines.
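In case it's useful for the archives, that in-memory pattern usually looks roughly like this (a hedged sketch; the tab-separated layout and the cached file are assumptions of mine). It only works when one side fits in RAM, which Michael said isn't the case here:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Replicated (in-memory) map-side join: the small table is shipped to every
// node via the DistributedCache and loaded into a hash map in setup().
public class ReplicatedJoinMapper
    extends Mapper<LongWritable, Text, Text, Text> {

  private final Map<String, String> small = new HashMap<String, String>();

  @Override
  protected void setup(Context ctx) throws IOException {
    Path[] cached = DistributedCache.getLocalCacheFiles(ctx.getConfiguration());
    BufferedReader r = new BufferedReader(new FileReader(cached[0].toString()));
    try {
      String line;
      while ((line = r.readLine()) != null) {
        String[] kv = line.split("\t", 2);
        if (kv.length == 2) {
          small.put(kv[0], kv[1]);
        }
      }
    } finally {
      r.close();
    }
  }

  @Override
  protected void map(LongWritable key, Text value, Context ctx)
      throws IOException, InterruptedException {
    String[] kv = value.toString().split("\t", 2);
    String match = (kv.length == 2) ? small.get(kv[0]) : null;
    if (match != null) {  // inner join semantics: drop unmatched rows
      ctx.write(new Text(kv[0]), new Text(kv[1] + "\t" + match));
    }
  }
}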
Regards,
Serge
On 8/23/12 3:57 PM, Michael Parker michael.g.par...@gmail.com wrote:
Actually, I was able to do some tricks and
That would work, but wouldn't a much simpler solution just be to force the
machines in the cluster to always pass around their external FQDNs, since those
will properly resolve to the internal or external IP depending on what machine
is asking? Is there no way to just do that?
Hi all,
There are many users on our Hadoop platform. Can they install their own
Hadoop versions on the same cluster?
I tried to do this but failed. There was an existing user account under
which that user had installed his Hadoop. I created another account and
installed another Hadoop under it. The logs display ERROR messages.
Hi,
Do your users want different versions of Hadoop? Or can they share the same
Hadoop cluster and schedule their jobs? If the latter, Hadoop can be
configured to run for multiple users, and each user can submit their data
and jobs to the same cluster. Hence you can maintain a single cluster and
serve all your users.
Hi Igor,
I don't think there's anything in Hadoop that's going to allow you to have an
internal IP assigned to a machine's network interface and have it
advertise the external IP. Even if that were in place, you'd then have to
differentiate between requests coming from the other nodes in the cluster
and those coming from outside.
You might also want to look at Hadoop On Demand.
http://hadoop.apache.org/common/docs/r0.17.0/hod.html
But I would not recommend making one cluster per user.
Regards
Bertrand
On Fri, Aug 24, 2012 at 5:50 AM, Sonal Goyal sonalgoy...@gmail.com wrote:
Hi,
I have files in MS Word doc and docx format. They have entries which are
separated by an empty line. Is it possible for me to read one such entry at
a time? Also, which InputFormat shall I use to read doc and docx? Please
help.
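Word files are binary, so the line-oriented input formats won't help directly; the usual route is a whole-file (non-splittable) InputFormat whose RecordReader extracts the text with Apache POI and splits on blank lines. A hedged sketch of just the extraction step (POI usage from memory; verify against your POI version):

import java.io.FileInputStream;
import java.io.InputStream;
import org.apache.poi.hwpf.extractor.WordExtractor;
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

// Sketch: pull plain text out of a .doc/.docx, then split entries on
// empty lines. In MR, feed this from a non-splittable whole-file reader.
public class WordEntries {
  public static String[] entries(String path) throws Exception {
    InputStream in = new FileInputStream(path);
    String text;
    try {
      text = path.endsWith(".docx")
          ? new XWPFWordExtractor(new XWPFDocument(in)).getText()
          : new WordExtractor(in).getText();
    } finally {
      in.close();
    }
    return text.split("\\n\\s*\\n");  // an empty line separates entries
  }
}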
Cheers !!!