Hamid,
I would recommend taking another look at your current algorithm and making
sure you are using the MR framework to its strengths. You can evaluate
having multiple passes for your MapReduce program, or doing a map-side
join. You mention runtime is important for your system, so make sure you
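
For readers unfamiliar with the map-side join mentioned above, here is a minimal sketch of the idea in plain Java. It is a simulation and uses no Hadoop classes. The class name, the method name, and the tab-separated "key, payload" record layout are assumptions made for illustration. In a real job the small dataset would be loaded into memory once per mapper, and each large-side record would arrive through map().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of a map-side (in-memory hash) join.
class MapSideJoinSketch {

    // small: key -> value of the small dataset, held in memory by every mapper.
    // large: records of the assumed form "key<TAB>payload", processed one by one.
    static List<String> join(Map<String, String> small, List<String> large) {
        List<String> joined = new ArrayList<>();
        for (String record : large) {
            String[] parts = record.split("\t", 2);
            if (parts.length < 2) {
                continue; // skip records that do not match the assumed layout
            }
            String match = small.get(parts[0]);
            if (match != null) {
                // Inner join: emit only keys present on both sides.
                joined.add(parts[0] + "\t" + parts[1] + "\t" + match);
            }
        }
        return joined;
    }
}
```

Because the lookup happens inside the mapper, no shuffle or reduce phase is needed for the join itself. That saving only applies when the small side fits in each mapper's memory.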
Hi All,
In Sqoop:
When exporting from HDFS to a DB, if an export map task fails due to these or
other reasons, it will cause the export job to fail. The results of a
failed export are undefined. Each export map task operates in a separate
transaction. Furthermore, individual map tasks commit their c
Hi,
I will take a look at that; I hope it will be useful for my purpose.
Thank you so much.
Hamid
Then I think you might be best exploring running a getmerge on each
client. How you trigger that is up to you, but something like Fabric [1]
might help. Others might propose different solutions, but it doesn't sound
like MR is a natural choice to me.
I would expect this is the very fastest way o
Hi,
First of all, thank you Tim for giving your time.
The answer to the first question is yes.
My inputs are in the format of triples (sub,pre,obj) and they are stored on
HDFS.
The problem is: after running some MR jobs, some data is generated on all
machines, and I want each machine to send part of that
Sorry to ask so many questions, but it will help the user list offer you the
best advice, as this is not a typical MR use case.
- Do you foresee the reducer storing the data on a file system local to the
machine?
- Do you need to use specific input formats for the job, or is it really
just text files?
exactly!!
So you are trying to run a single reducer on each machine, and all input
data regardless of its location gets streamed to each reducer?
On Thu, Aug 23, 2012 at 10:41 AM, Hamid Oliaei wrote:
> Hi,
>
> I want to broadcast some data to all nodes under Hadoop 0.20.2. I tested
> DistributedCache modu
Hi,
I want to broadcast some data to all nodes under Hadoop 0.20.2. I tested
DistributedCache module. Unfortunately, it was time-consuming
and runtime is important for my work.
I want to write an MR job so that a copy of the input data is generated in
the output of every reducer.
Is that possible? How?
I m
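
One way to picture the "copy of the input in every reducer" idea from the question is the sketch below. It is a plain-Java simulation, not Hadoop code. The class and method names are hypothetical, and the scheme is an assumption about how such a job could be built, not a method confirmed in this thread. The map step emits each record once per reducer index, and the partition step routes index r to reducer r.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical simulation of broadcasting records to every reducer.
class BroadcastSketch {

    // Returns one input list per reducer; each list receives every record.
    static List<List<String>> broadcast(List<String> records, int numReducers) {
        List<List<String>> reducerInputs = new ArrayList<>();
        for (int r = 0; r < numReducers; r++) {
            reducerInputs.add(new ArrayList<>());
        }
        for (String record : records) {
            // "Map" step: emit the record once for every reducer index.
            for (int key = 0; key < numReducers; key++) {
                // "Partition" step: key k is routed to reducer k.
                reducerInputs.get(partition(key, numReducers)).add(record);
            }
        }
        return reducerInputs;
    }

    // Stand-in for a custom partitioner keyed on the reducer index.
    static int partition(int key, int numReducers) {
        return key % numReducers;
    }

    public static void main(String[] args) {
        List<String> triples = Arrays.asList("(s1,p1,o1)", "(s2,p2,o2)");
        List<List<String>> out = broadcast(triples, 3);
        for (int r = 0; r < out.size(); r++) {
            System.out.println("reducer " + r + " -> " + out.get(r));
        }
    }
}
```

Two caveats apply to a real cluster. The shuffle would move as many copies of the data as there are reducers, and Hadoop does not guarantee that exactly one reducer runs on each machine. A scheme like this gives one copy per reducer, not necessarily one copy per node.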