On 12/11/2010 10:48 PM, Varadharajan Mukundan wrote:
Hi,
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
/user/rock/input/fair-scheduler.xml could only be replicated to 0 nodes,
instead of 1
I think none of your datanodes are actually running. Why not use jps
and make sure whether they are running?
That's right.
You have to make sure the datanode is running.
If you are using a virtual machine, like VirtualBox, you sometimes have to
wait a moment until the datanode is active. It seems to be a performance
issue: the datanode in a VM can take several minutes to become active.
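For example (illustrative output; the process IDs are made up), a healthy
pseudo-distributed node should list a DataNode in jps, and the datanode log
under $HADOOP_HOME/logs usually says why it failed to start:

  $ jps
  4528 NameNode
  4619 DataNode
  4753 SecondaryNameNode
  4841 JobTracker
  4937 TaskTracker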
On Sun, Dec 12, 2010 at
Hi,
> org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
> /user/rock/input/fair-scheduler.xml could only be replicated to 0 nodes,
> instead of 1
I think none of your datanodes are actually running. Why not use jps
and make sure whether they are running? Also check the datanode log.
Folks,
I'm a Hadoop newbie, and I hope this is an appropriate place to post
this question.
I'm trying to work through the initial examples. When I try to copy
files into HDFS, hadoop throws exceptions. I imagine it's something in
my configuration, but I'm at a loss to figure out what.
I
Try adding the commons-logging jar to your build path. It is available
in the lib/ folder of your Hadoop distribution.
If you use the MapReduce eclipse plugin which comes with the Hadoop
distro, it would add all required jars to create a Hadoop project
automatically (i.e. everything in lib/*.jar +
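If you are compiling outside Eclipse, a rough sketch of the classpath (jar
names are from the 0.20.2 tarball and may differ in yours; MyJob.java is
just a placeholder):

  javac -classpath hadoop-0.20.2-core.jar:lib/commons-logging-1.0.4.jar MyJob.java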
This is a compilation error I get in Eclipse ... So, I don't see how putting the
hadoop-core.jar in the lib/ directory will change the error.
Do you suggest another way of running a hadoop java program ?
The way I do it is: create an Eclipse project, then Build Paths --> Add
External Archives: hadoop-core.jar
Thanks. Then I should definitely try that. Thanks for all the info :-)
Ed
From mp2893's iPhone
On 2010. 12. 12., at 3:00 AM, Ted Dunning wrote:
> Of course. It is just a set of Hadoop programs.
>
> 2010/12/11 edward choi
>
>> Can I operate Bixo on a cluster other than Amazon EC2?
>>
Can you try adding the jar file in your Hadoop lib directory?
On Sun, Dec 12, 2010 at 8:00 AM, Maha A. Alabduljalil
wrote:
>
> Hi all,
>
> I extended my project path with the hadoop-0.20.2-core.jar file, but I can
> see that some of the classes I need aren't there, so for example an error I
> ge
Hi all,
I extended my project path with the hadoop-0.20.2-core.jar file,
but I can see that some of the classes I need aren't there, so for
example an error I get:
" The type org.apache.commons.logging.Log cannot be resolved. It
is indirectly referenced from required .class files- i
The job history program tells you this. The syntax is hideous, but there is
a parser provided.
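For example (the output path here is made up), something along the lines of

  bin/hadoop job -history /user/rock/output

prints per-task start and finish times, from which each reducer's duration
falls out.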
On Sat, Dec 11, 2010 at 8:23 AM, Mithila Nagendra wrote:
> Just curious and off topic :) How do you find the time taken by each
> reducer? What command/method do you use? I need that for my research.
It sounds like your key distribution is being reflected in the size of your
reduce tasks, thus making some of them take much longer than the rest.
There are three solutions to this:
a) down-sample. Particularly for statistical computations, once you have
seen a thousand instances, you have seen
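To make (a) concrete, a minimal sketch of capping the values examined per
key inside an old-API reduce() (the cap, the Text value type, and the
variable names are made up):

  int seen = 0;
  while (values.hasNext()) {
      Text value = values.next();
      if (++seen > 1000) break;  // a fixed sample is enough for many statistics
      // ... update the running statistic with value ...
  }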
Of course. It is just a set of Hadoop programs.
2010/12/11 edward choi
> Can I operate Bixo on a cluster other than Amazon EC2?
>
Hi Rob,
Just curious and off topic :) How do you find the time taken by each
reducer? What command/method do you use? I need that for my research.
Thanks,
Mithila
On Sat, Dec 11, 2010 at 4:05 AM, Rob Stewart wrote:
> Hi,
>
> I have a problem with a MapReduce job I am trying to run on a 32 node
On Sat, Dec 11, 2010 at 7:41 PM, Rob Stewart
wrote:
> Sorry, my fault - it's someone running a network simulator on the cluster!
>
Culprit found? *wide grin*
--
Harsh J
www.harshj.com
Sorry, my fault - it's someone running a network simulator on the cluster!
Rob
On 11 December 2010 14:09, Rob Stewart wrote:
> OK, slight update:
>
> Immediately underneath public void reduce(), I have added a:
> System.out.println("Key: " + key.toString());
>
> And I am logged into a node that is
OK, slight update:
Immediately underneath public void reduce(), I have added a:
System.out.println("Key: " + key.toString());
And I am logged into a node that is still working on a reducer. However,
it stopped printing "Key:" long ago, so it is not processing new keys.
But looking more closely at
On Sat, Dec 11, 2010 at 5:25 PM, Rob Stewart
wrote:
> Oh,
>
> I should add, of the Java processes running on the remaining nodes for
> the final wave of reducers, the one taking all the CPU is the "Child"
> process (not TaskTracker). When I log into the Master, the Java
> process taking all t
On Sat, Dec 11, 2010 at 5:10 PM, Rob Stewart
wrote:
> Ah,
>
> that is very interesting indeed.
>
> I am running on a homogeneous cluster, where each node has 8 cores.
>
> Does that mean that Hadoop would need to be carefully configured, so
> that 8 core machines had a max.tasks value of 8, and dua
Oh,
I should add, of the Java processes running on the remaining nodes for
the final wave of reducers, the one taking all the CPU is the "Child"
process (not TaskTracker). When I log into the Master, the Java
process taking all the CPU is also "Child".
Is this normal?
thanks,
Rob
On 11 December
Ah,
that is very interesting indeed.
I am running on a homogeneous cluster, where each node has 8 cores.
Does that mean that Hadoop would need to be carefully configured, so
that 8 core machines had a max.tasks value of 8, and dual core
machines had the value 2?
Very useful to know,
Rob
On 1
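For reference, those per-node slot counts are set in mapred-site.xml on
each tasktracker (property names as of 0.20; the values mirror the 8-core
case):

  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>8</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>8</value>
  </property>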
Hi, many thanks for your response.
A few observations:
- I know for a fact that my key distribution is quite radically skewed
(some keys with *many* values, most keys with few).
- I have overlooked the fact that I need a partitioner. I suspect that
this will help dramatically.
I realize that the n
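For reference, a skew-aware partitioner with the 0.20 API could look
something like the sketch below (the hot key, class name, and types are
made up, and it assumes at least two reducers):

  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Partitioner;

  // Route one known-heavy key to its own reducer and hash the rest
  // over the remaining partitions.
  public class SkewAwarePartitioner extends Partitioner<Text, IntWritable> {
      @Override
      public int getPartition(Text key, IntWritable value, int numPartitions) {
          if ("the".equals(key.toString())) {
              return 0;  // dedicated reducer for the hot key
          }
          return 1 + (key.hashCode() & Integer.MAX_VALUE) % (numPartitions - 1);
      }
  }

It would be wired in with job.setPartitionerClass(SkewAwarePartitioner.class).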
Hi,
On Sat, Dec 11, 2010 at 4:39 PM, Rob Stewart
wrote:
> Hi,
>
> When trying to compare Hadoop against other parallel paradigms, it is
> important to consider heterogeneous systems. Some may have 100 nodes,
> each single core. Some may have 100 nodes, with 8 cores on each, and
> others may have
Hi,
Certain reducers may receive a higher share of data than others
(depending on your data/key distribution, the partition function,
etc.). Compare the longer reduce tasks' counters with the quicker
ones.
Are you sure that the reducers that take long are definitely the last
wave, as in with IDs
Hi,
When trying to compare Hadoop against other parallel paradigms, it is
important to consider heterogeneous systems. Some may have 100 nodes,
each single core. Some may have 100 nodes, with 8 cores on each, and
others may have 5 nodes, 32 cores per node.
As Hadoop runs on JVMs on each node, a
Hi,
I have a problem with a MapReduce job I am trying to run on a 32 node cluster.
The final few reducers take a *lot* longer than the rest, e.g. if I
specify 100 reducers, the first 90 will complete in 5 minutes, and
then the remaining 10 reducers might take 10 minutes.
Same is true for any num
Excuse me, but could I ask one more question?
Can I operate Bixo on a cluster other than Amazon EC2?
I am already running a Hadoop cluster of my own, so I'd like to run Bixo on
top of my cluster.
But I don't see how to do that on Bixo's "Getting Started" page.
All I see are "running locally", "runnin
I'd start with only a few RSS feeds at first, but I plan to expand it
eventually to the scale of thousands of RSS feeds every 30 minutes.
That's why I am so eager to implement my system in Hadoop.
I skimmed through Nutch and Bixo but I feel that eventually I'm gonna have
to build the system from