This looks somewhat similar to my Subtle Classloader Issue from
yesterday. I'll be watching this thread too.
Jeff
Saptarshi Guha wrote:
Hello,
I'm using some JNI interfaces, via R. My classpath contains all the
jar files in $HADOOP_HOME and $HADOOP_HOME/lib
My class is
public
I'm trying to run the Dirichlet clustering example from
(http://cwiki.apache.org/MAHOUT/syntheticcontroldata.html). The command
line:
$HADOOP_HOME/bin/hadoop jar
$MAHOUT_HOME/examples/target/mahout-examples-0.1.job
org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
... loads our
Hi Josh,
It seemed like you had a conceptual wire crossed and I'm glad to help
out. The neat thing about Hadoop mappers is that, since each one is given
a replicated HDFS block to munch on, the job scheduler has as many node
choices for running a mapper as the replication factor. This means
If you send a single point to the mapper, your mapper logic will be
clean and simple. Otherwise you will need to loop over your block of
points in the mapper. In Mahout clustering, I send the mapper individual
points because the input file is point-per-line. In either case, the
record reader
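To make the point-per-line case concrete, here is a rough plain-Java sketch of the parsing a mapper ends up doing when the record reader hands it one point per line. The class and method names (`PointParser`, `parsePoint`) are illustrative, not Mahout's actual API:

```java
import java.util.Arrays;

public class PointParser {
    // Parse one "x,y,z,..." line into a point vector, as a mapper might
    // when each input record is a single point.
    static double[] parsePoint(String line) {
        String[] parts = line.trim().split(",");
        double[] point = new double[parts.length];
        for (int i = 0; i < parts.length; i++) {
            point[i] = Double.parseDouble(parts[i]);
        }
        return point;
    }

    public static void main(String[] args) {
        double[] p = parsePoint("1.0, 2.5, -3.0");
        System.out.println(Arrays.toString(p)); // [1.0, 2.5, -3.0]
    }
}
```

With one point per record the mapper body stays this small; a block-of-points record would need the loop pushed into the mapper instead.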
-Original Message-
From: Jeff Eastman [mailto:j...@windwardsolutions.com]
Sent: Tuesday, March 17, 2009 5:11 PM
To: core-user@hadoop.apache.org
Subject: Re: RecordReader design heuristic
If you send a single point to the mapper, your mapper logic will be
clean and simple. Otherwise you will need
0.17.0 was released yesterday, from what I can tell.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
From: Jeff Eastman [EMAIL PROTECTED]
To: core-user@hadoop.apache.org
Sent: Wednesday, May 21, 2008 11:18:56 AM
Subject: Re: Hadoop 0.17 AMI?
Any
I uploaded the slides from my Mahout overview to our wiki
(http://cwiki.apache.org/confluence/display/MAHOUT/FAQ) along with
another recent talk by Isabel Drost. Both are similar in content but
their differences reflect the rapid evolution of the project in the
month that separates them in
find the code?
Thanks!
Tanton
On Thu, May 22, 2008 at 11:36 AM, Jeff Eastman
[EMAIL PROTECTED] wrote:
I uploaded the slides from my Mahout overview to our wiki
(http://cwiki.apache.org/confluence/display/MAHOUT/FAQ) along with another
recent talk by Isabel Drost. Both are similar in content
been released yet. I (or Mukund) am hoping to call
a vote this afternoon or tomorrow.
Nige
On May 14, 2008, at 12:36 PM, Jeff Eastman wrote:
I'm trying to bring up a cluster on EC2 using
(http://wiki.apache.org/hadoop/AmazonEC2) and it seems that 0.17 is the
version to use because of the DNS
I'm trying to bring up a cluster on EC2 using
(http://wiki.apache.org/hadoop/AmazonEC2) and it seems that 0.17 is the
version to use because of the DNS improvements, etc. Unfortunately, I
cannot find a public AMI with this build. Is there one that I'm not
finding or do I need to create one?
Jeff
My experience running with the Java API is that subdirectories in the input
path do cause an exception, so the streaming file input processing must be
different.
Jeff Eastman
-Original Message-
From: Norbert Burger [mailto:[EMAIL PROTECTED]
Sent: Tuesday, April 01, 2008 9:46 AM
I don't know if there was a live version, but the entire summit was recorded
on video so it will be available. BTW, it was an overwhelming success and
the speakers are all well worth waiting for. I personally got a lot of
positive feedback and interest in Mahout, so expect your inbox to explode in
21, 2008 2:36 PM
To: core-user@hadoop.apache.org
Subject: Re: Performance / cluster scaling question
3 - the default one...
Jeff Eastman wrote:
What's your replication factor?
Jeff
-Original Message-
From: André Martin [mailto:[EMAIL PROTECTED]
Sent: Friday, March 21
To: core-user@hadoop.apache.org
Subject: Re: Performance / cluster scaling question
Right, I totally forgot about the replication factor... However
sometimes I even noticed ratios of 5:1 for block numbers to files...
Is the delay for block deletion/reclaiming an intended behavior?
Jeff Eastman wrote
Consider that your mapper and driver execute in different JVMs and cannot
share static values.
Jeff
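Since statics don't survive the JVM boundary, the usual route is to write values into the job configuration in the driver and read them back in the mapper's configure method. The sketch below uses `java.util.Properties` purely as a stand-in for Hadoop's JobConf, to show the round trip through a serialized string store (the property key `my.threshold` is made up for illustration):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Properties;

public class ConfRoundTrip {
    // Serialize and reload the "configuration", mimicking how job settings
    // travel from the driver JVM to a task JVM as strings.
    static Properties roundTrip(Properties conf) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        conf.store(out, null);
        Properties loaded = new Properties();
        loaded.load(new ByteArrayInputStream(out.toByteArray()));
        return loaded;
    }

    public static void main(String[] args) throws IOException {
        // Driver side: set the parameter.
        Properties conf = new Properties();
        conf.setProperty("my.threshold", "0.75");

        // Mapper side (a different JVM in Hadoop): statics are gone, so the
        // value must come back out of the shipped configuration.
        Properties mapperConf = roundTrip(conf);
        double threshold = Double.parseDouble(mapperConf.getProperty("my.threshold"));
        System.out.println(threshold); // 0.75
    }
}
```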
-Original Message-
From: ma qiang [mailto:[EMAIL PROTECTED]
Sent: Saturday, March 15, 2008 10:35 PM
To: core-user@hadoop.apache.org
Subject: why the value of attribute in map function
The key provided by the default FileInputFormat is not Text, but an
integer offset into the split (which is not very useful, IMHO). Try
changing your mapper back to WritableComparable, Text. If you are
expecting the file name to be the key, you will (I think) need to write
your own InputFormat.
Jeff
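For anyone surprised by those keys, here is a stdlib-only sketch of what the default input format hands you: for each line, the byte offset at which the line starts, not the file name and not the line text. This is an approximation of the real record reader's behavior, assuming '\n' line endings:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class LineOffsets {
    // Return the starting byte offset of each line, which is what the
    // default input format uses as the key for that line.
    static List<Long> offsets(String contents) {
        List<Long> result = new ArrayList<>();
        long offset = 0;
        for (String line : contents.split("\n", -1)) {
            result.add(offset);
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1; // +1 for '\n'
        }
        return result;
    }

    public static void main(String[] args) {
        // Lines of 4, 2, and 5 bytes start at offsets 0, 5, and 8.
        System.out.println(offsets("abcd\nef\nghijk")); // [0, 5, 8]
    }
}
```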
-Original Message-
From: Arun C Murthy [mailto:[EMAIL PROTECTED]
Sent: Tuesday, February 26, 2008 3:47 PM
To: core-user@hadoop.apache.org
Subject: Re: Decompression Blues
Jeff,
On Feb 26, 2008, at 12:58 PM, Jeff Eastman wrote:
I'm processing a number of .gz compressed Apache and other logs using
Hadoop
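As a plain-Java illustration of what reading those .gz logs involves, the sketch below round-trips text through `java.util.zip` and reads it back line by line. This is the same per-file decompression Hadoop performs (a .gz file is not splittable, so a single mapper consumes the whole file); the sample log line is invented:

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipLines {
    // Compress a string to gzip bytes (stand-in for a .gz log file on disk).
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (OutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    // Decompress and read the first line, as a record reader would.
    static String firstLine(byte[] gzipped) throws IOException {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(new ByteArrayInputStream(gzipped)), StandardCharsets.UTF_8))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = gzip("GET /index.html 200\nGET /about 404\n");
        System.out.println(firstLine(data)); // GET /index.html 200
    }
}
```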
about this, but now I won't.
Thanks,
Jeff
-Original Message-
From: Owen O'Malley [mailto:[EMAIL PROTECTED]
Sent: Monday, February 11, 2008 10:40 AM
To: core-user@hadoop.apache.org
Subject: Re: Best Practice?
On Feb 9, 2008, at 4:21 PM, Jeff Eastman wrote:
I'm trying to wait until
What's the best way to get additional configuration arguments to my
mappers and reducers?
Jeff
Well, I tried saving the OutputCollectors in an instance variable and
writing to them during close and it seems to work.
Jeff
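The pattern described above can be sketched without Hadoop at all: hold the collector in an instance field during map() calls and emit the accumulated result once in close(). Here a plain List stands in for the OutputCollector, purely to illustrate the shape; the summing logic is a made-up example:

```java
import java.util.ArrayList;
import java.util.List;

public class SummingMapper {
    private List<String> collector;   // saved on each map() call for use in close()
    private long sum = 0;

    void map(long value, List<String> output) {
        collector = output;           // remember the collector
        sum += value;                 // accumulate instead of emitting per record
    }

    void close() {
        if (collector != null) {
            collector.add("sum\t" + sum);  // single emit at end of the task
        }
    }

    public static void main(String[] args) {
        SummingMapper mapper = new SummingMapper();
        List<String> output = new ArrayList<>();
        for (long v : new long[]{3, 4, 5}) mapper.map(v, output);
        mapper.close();
        System.out.println(output);   // the one accumulated record
    }
}
```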
-Original Message-
From: Jeff Eastman [mailto:[EMAIL PROTECTED]
Sent: Saturday, February 09, 2008 4:21 PM
To: core-user@hadoop.apache.org
Subject: RE: Best
I noticed that phenomenon right off the bat. Is that a designed feature
or just an unhappy consequence of how blocks are allocated? Ted
compensates for this by aggressively rebalancing his cluster often by
adjusting the replication up and down, but I wonder if an improvement in
the allocation
Oops, should be TaskTracker.
-Original Message-
From: Jeff Eastman [mailto:[EMAIL PROTECTED]
Sent: Thursday, February 07, 2008 12:24 PM
To: core-user@hadoop.apache.org
Subject: RE: Starting up a larger cluster
Hi Ben,
I've been down this same path recently and I think I understand your
Hi Ben,
I've been down this same path recently and I think I understand your
issues:
1) Yes, you need the hadoop folder to be in the same location on each
node. Only the master node actually uses the slaves file, to start up
DataNode and JobTracker daemons on those nodes.
2) If you did not