hi all...
i'm using the hadoop-0.17.2 version.. i'm trying to cluster 3 nodes.
when i run applications, the map tasks complete but the reduce tasks just halt..
and i get this "Too many fetch-failures" error..
[EMAIL PROTECTED] hadoop-0.17.2.1]# bin/hadoop jar word/word.jar org.myorg.WordCount input output2
08/10/01
I am trying to plan out my map-reduce implementation and I have some questions
about where computation should be split in order to take advantage of the
distributed nodes.
Looking at the architecture diagram
(http://hadoop.apache.org/core/images/architecture.gif), are the map boxes the major
Hi Terrence,
It really depends on your job, I think. Often the reduce step can be the
bottleneck if you want a single output file (one reducer).
Hope this helps.
Alex
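[For illustration, a minimal sketch against the old org.apache.hadoop.mapred
API of this era; the WordCount class name is just a placeholder for whatever
job class you use:]

  import org.apache.hadoop.mapred.JobConf;

  JobConf conf = new JobConf(WordCount.class);
  // One reducer: a single output file, but the whole reduce phase runs
  // on one node and can become the bottleneck.
  conf.setNumReduceTasks(1);
  // More reducers parallelize the reduce phase, at the cost of one
  // part-NNNNN output file per reducer:
  // conf.setNumReduceTasks(8);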
On Wed, Oct 1, 2008 at 10:17 AM, Terrence A. Pietrondi
[EMAIL PROTECTED] wrote:
I am trying to plan out my map-reduce
I normally find the intermediate stage of copying data from the mappers to
the reducers to be a significant step - though in our case that traffic
isn't going over the best quality switches...
The mappers and reducers work on the same boxes, close to the data.
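[A combiner is the usual way to shrink that copy step: it pre-aggregates map
output locally before it crosses the network. A sketch with the old mapred
API, assuming the reduce function is associative so the reducer class can
double as the combiner; WordCount and Reduce are placeholder names:]

  import org.apache.hadoop.mapred.JobConf;

  JobConf conf = new JobConf(WordCount.class);
  // Run the reduce logic on each mapper's local output first, so much
  // less intermediate data is copied to the reducers in the shuffle.
  conf.setCombinerClass(Reduce.class);
  conf.setReducerClass(Reduce.class);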
On Wed, 2008-10-01 at 10:59 -0700, Alex Loddengaard wrote:
On Oct 1, 2008, at 10:17 AM, Terrence A. Pietrondi wrote:
I am trying to plan out my map-reduce implementation and I have some
questions about where computation should be split in order to take
advantage of the distributed nodes.
Looking at the architecture diagram
On Oct 1, 2008, at 11:07 AM, Per Jacobsson wrote:
I ran a job last night with Hadoop 0.18.0 on EC2, using the standard small
AMI. The job was producing gzipped output; otherwise I haven't changed the
configuration.
The final reduce steps failed with this error that I haven't seen before:
I've collected the syslogs from the failed reduce jobs. What's the best way
to get them to you? Let me know if you need anything else, I'll have to shut
down these instances some time later today.
Overall I've run this same job before with no problems. The only change is
the added gzip of the
Yes, this is exactly what I'm seeing. To be honest, I don't know which
LZO native library it should be looking for. The LZO install dropped
liblzo2.la and liblzo2.a in my /usr/local/lib directory, but not a
file with a .so extension. Hardcoding would be fine as a temporary
solution, but I
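[A guess, not confirmed in this thread: LZO's autoconf build produces only
static libraries unless shared ones are requested, which would explain the
missing .so. Rebuilding along these lines should install liblzo2.so:]

  ./configure --enable-shared
  make && make install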
So to be distributed in a sense, you would want to do your computation on the
disconnected parts of data in the map phase, I would guess?
Terrence A. Pietrondi
http://del.icio.us/tepietrondi
--- On Wed, 10/1/08, Arun C Murthy [EMAIL PROTECTED] wrote:
From: Arun C Murthy [EMAIL PROTECTED]
On Oct 1, 2008, at 12:54 PM, Nathan Marz wrote:
Yes, this is exactly what I'm seeing. To be honest, I don't know
which LZO native library it should be looking for. The LZO install
dropped liblzo2.la and liblzo2.a in my /usr/local/lib directory,
but not a file with a .so extension.
On Oct 1, 2008, at 12:04 PM, Per Jacobsson wrote:
I've collected the syslogs from the failed reduce jobs. What's the
best way
to get them to you? Let me know if you need anything else, I'll have
to shut
down these instances some time later today.
Could you please attach them to the
Attached to the ticket. Hope this helps.
/ Per
On Wed, Oct 1, 2008 at 1:33 PM, Arun C Murthy [EMAIL PROTECTED] wrote:
On Oct 1, 2008, at 12:04 PM, Per Jacobsson wrote:
I've collected the syslogs from the failed reduce jobs. What's the best
way
to get them to you? Let me know if you need
I'm not sure what you mean by disconnected parts of data, but Hadoop is
implemented to try to run map tasks on the machines that hold their input
data. This lowers the amount of network traffic, making the entire job
run faster. Hadoop does all this for you under the hood. From a user's
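[Concretely, every input split advertises which hosts hold its data, and the
JobTracker uses those hints when placing map tasks. A sketch of the relevant
old-API surface; the split variable here is hypothetical:]

  import java.io.IOException;
  import org.apache.hadoop.mapred.InputSplit;

  // Each InputSplit reports the nodes where its underlying data lives;
  // the scheduler tries to run the map task on (or near) one of them.
  String[] hosts = split.getLocations();  // may throw IOException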
I have 5 running jobs, each with 2 reducers. Because I set the maximum number
of reducers to 10, any incoming job will be held until some of the 5 jobs finish
and release reducer quota.
Now the problem is that an incoming job has a higher priority, so I want to
pause some of the 5 jobs and let the new
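[For what it's worth, jobs carry a priority that the scheduler consults when
handing out free task slots; it reorders waiting jobs but does not pause or
preempt tasks that are already running. A sketch, assuming a release where
JobConf.setJobPriority is available; MyJob is a placeholder:]

  import org.apache.hadoop.mapred.JobConf;
  import org.apache.hadoop.mapred.JobPriority;

  JobConf conf = new JobConf(MyJob.class);
  // Favor this job when slots free up; running tasks are not preempted.
  conf.setJobPriority(JobPriority.VERY_HIGH);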
Hi Edward,
You can have multiple instances of Hive by pointing the Hive CLI to different
configs (this is very similar to the Hadoop model). Take a look at
hive-default.xml in your Hive instance. You can create different copies of this
file and change the following properties:
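[The message is cut off before the property list; purely as an illustration,
these are the kinds of per-instance properties one might override in each
copy of hive-default.xml - the names are stock Hive, the values hypothetical:]

  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse_instance2</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby:;databaseName=metastore_db_instance2;create=true</value>
  </property>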
Hi Edward,
By default, the embedded version of the Apache Derby database is used as the
metadb. You can run multiple queries against the same metadb by providing a JDBC
connection (pointing to where the metadata is located) to MySQL, Derby, or any
other relational database in the options
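[A sketch of the JDBC options in question, pointing the metastore at MySQL;
the host, database name, and credentials are placeholders:]

  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://dbhost:3306/hive_metastore?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hiveuser</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hivepass</value>
  </property>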
hi all...
i have a doubt..
if we don't specify numSplits in getSplits(), then what is the default
number of splits taken...
--
Best Regards
S.Chandravadana
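[For reference, numSplits is only a hint. When a job is submitted, the old
API passes in the job's mapred.map.tasks value (2 in the stock
hadoop-default.xml), and FileInputFormat then sizes splits roughly as below -
a paraphrase of the library's logic, not a verbatim copy:]

  // Aim to divide the input into about numSplits pieces...
  long goalSize = totalSize / (numSplits == 0 ? 1 : numSplits);
  // ...but never below the configured minimum, and capped at the DFS
  // block size, so in practice you usually get about one split per block.
  long splitSize = Math.max(minSize, Math.min(goalSize, blockSize));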
Dear All,
I have a few questions on the dfs.datanode.du.reserved property in the
hadoop-site.xml configuration...
Assume that I have dfs.datanode.du.reserved = 10GB and the partition
assigned to HDFS has already been filled to its capacity
(in this case, the total disk size minus 10GB).
What
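[For context, the property itself takes a value in bytes per volume, so
reserving 10GB looks like this; the value shown is illustrative:]

  <property>
    <name>dfs.datanode.du.reserved</name>
    <!-- non-DFS space to reserve on each volume, in bytes (10 GB) -->
    <value>10737418240</value>
  </property>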