Dear All,
I have been trying to use HOD on Scyld as a regular user (not root), but I
am having trouble getting it to start. I am wondering whether anyone has
used HOD on a Scyld cluster successfully. Any help would be appreciated, thanks!
Boyu
I love working on a problem for an hour, sending an email for help, then
solving it. The problem was the space after the comma in the -files option:
$ hadoop pipes -D hadoop.pipes.java.recordreader=true \
    -D hadoop.pipes.java.recordwriter=true \
    -files ...[SEVERAL .so FILES TO DISTRIBUTED CACHE] \
    hdfs
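To spell the fix out (the library paths below are hypothetical): the argument to -files is a single comma-separated list, so a space after a comma ends the argument early and the shell passes the rest of the list as unrelated arguments.

# broken: the space after the comma splits the argument
$ hadoop pipes ... -files hdfs:///libs/liba.so, hdfs:///libs/libb.so ...
# working: no spaces anywhere in the list
$ hadoop pipes ... -files hdfs:///libs/liba.so,hdfs:///libs/libb.so ...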
I've been steadily adding more and more shared libraries to the -files option
of my pipes command, with moderate success, in that each time I add a new
library the app no longer fails on that library, but rather on the next one.
However, I've hit a snag. I'm getting the following error ev
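A hedged suggestion (not from the thread; the binary name below is hypothetical): rather than discovering missing libraries one failure at a time, you can enumerate a binary's shared-library dependencies up front with ldd and add them all to -files in one pass:

$ ldd ./my_pipes_binary | awk '/=>/ {print $3}'

This prints the resolved path of each dependency; note it must be run on a machine whose library layout matches the task nodes.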
I guess my question below can be rephrased as, "What are the absolute minimum
hardware requirements for me to still see 'better-than-a-single-machine' performance?"
Thanks!
On Apr 12, 2010, at 1:45 PM, Andrew Nguyen wrote:
> I don't think you can :-). Sorry, they are 100Mbps NIC's... I get
> 95Mbit
I don't think you can :-). Sorry, they are 100Mbps NICs... I get 95Mbit/sec
from one node to another with iperf.
Should I still be expecting such dismal performance with just 100Mbps?
On Apr 12, 2010, at 1:31 PM, Todd Lipcon wrote:
> On Mon, Apr 12, 2010 at 1:05 PM, Andrew Nguyen <
> andrew-
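For reference, a point-to-point bandwidth check like the one Andrew describes could look like this (a sketch assuming iperf 2.x is installed on both nodes; the hostname is hypothetical):

# on the receiving node
$ iperf -s
# on the sending node
$ iperf -c datanode2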
On Mon, Apr 12, 2010 at 1:05 PM, Andrew Nguyen <
andrew-lists-had...@ucsfcti.org> wrote:
> 5 identically spec'ed nodes, each has:
>
> 2 GB RAM
> Pentium 4 3.0G with HT
> 250GB HDD on PATA
> 10Mbps NIC
>
This is probably your issue - a 10Mbps NIC? I didn't know you could even get
those anymore!
Had
@Todd:
I do need the sorting behavior, eventually. However, I'll try it with zero
reduce tasks to see.
@Alex:
Yes, I was planning on incrementally building my mapper and reducer functions.
Currently, the mapper takes the value, multiplies it by the gain, adds the
offset, and outputs a new
Andrew,
I would also suggest running the DFSIO benchmark to isolate I/O-related issues:
hadoop jar hadoop-0.20.2-test.jar TestDFSIO -write -nrFiles 10 -fileSize 1000
hadoop jar hadoop-0.20.2-test.jar TestDFSIO -read -nrFiles 10 -fileSize 1000
There are additional tests specific to mapreduce - run "h
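A hedged aside (not stated in the thread, but true of the 0.20-era test jar as far as I know): running the jar without a program name makes it print the list of available benchmarks:

$ hadoop jar hadoop-0.20.2-test.jar
# prints the valid program names, including TestDFSIO, mrbench, nnbench, ...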
Hey Keith,
The way we (LHC) approach a similar problem (not using Hadoop, but basically
the same thing) is that we distribute the common software everywhere (either
through a shared file system or an RPM installed as part of the base
image), and allow users to fly in changed code with the
See this environment http://bit.ly/4ekN8G. Subsequently I used the 3-server
setup, each configured with 8 GB of heap in the JVM and 4 CPUs per JVM (I
think I used 10-second session timeouts for this) for some additional
testing that I've not written up yet. I was able to run ~500 clients (same test
I am having partial success chipping away at the shared library dependencies of
my Hadoop job by submitting them to the distributed cache with the -files
option. When I add another library to the -files list, it seems to work in
that the run no longer fails on that library, but rather fails on an
On Apr 12, 2010, at 10:22 , Edward Capriolo wrote:
> On Mon, Apr 12, 2010 at 1:17 PM, Keith Wiley wrote:
>
>> So I ^C a job from the command line and get my prompt back, but sometimes
>> the job remains on the cluster, I can see it on the admin web UI, and
>> sometimes it lingers there for hours
Update: on another mailing list it was shown how to use the hadoop binary with
the 'job -kill' command to kill a job.
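For concreteness, a sketch of the commands in question (the job ID below is made up):

$ hadoop job -list                         # list running jobs with their IDs
$ hadoop job -kill job_201004121200_0042   # kill a job by ID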
On Apr 12, 2010, at 10:17 , Keith Wiley wrote:
> So I ^C a job from the command line and get my prompt back, but sometimes the
> job remains on the cluster, I can see it on the
Hi Andrew,
Do you need the sorting behavior that having an identity reducer gives you?
If not, set the number of reduce tasks to 0 and you'll end up with a
map-only job, which should be significantly faster.
-Todd
On Mon, Apr 12, 2010 at 9:43 AM, Andrew Nguyen <
andrew-lists-had...@ucsfcti.org>
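A minimal sketch of Todd's suggestion (the jar name, driver class, and paths below are hypothetical, and -D is only honored if the driver goes through ToolRunner):

$ hadoop jar myjob.jar MyDriver -D mapred.reduce.tasks=0 /input /output

Equivalently, the driver can call setNumReduceTasks(0) on its JobConf.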
Mahadev Konar:
> Hi Thomas,
> There are a couple of projects inside Yahoo! that use ZooKeeper as an
> event manager for feed processing.
>
> I am little bit unclear on your example below. As I understand it-
>
> 1. There are 1 million feeds that will be stored in Hbase.
> 2. A map reduce job wi
On Mon, Apr 12, 2010 at 1:17 PM, Keith Wiley wrote:
> So I ^C a job from the command line and get my prompt back, but sometimes
> the job remains on the cluster, I can see it on the admin web UI, and
> sometimes it lingers there for hours before finally getting flushed.
>
> Is there a way to kill
So I ^C a job from the command line and get my prompt back, but sometimes the
job remains on the cluster; I can see it on the admin web UI, and sometimes it
lingers there for hours before finally getting flushed.
Is there a way to kill a Hadoop job once the command-line prompt has returned,
onc
On Mon, 2010-04-12 at 00:32 +, Allen Wittenauer wrote:
> On Apr 10, 2010, at 7:10 PM, Shevek wrote:
> >
> > * Full cross-platform support
> > - Job submission, HDFS and S3 browsing from Windows, MacOS or Linux.
>
>
> If you list three OSes, that isn't cross platform. :)
We support quite a
Hi Thomas,
There are a couple of projects inside Yahoo! that use ZooKeeper as an
event manager for feed processing.
I am a little bit unclear on your example below. As I understand it:
1. There are 1 million feeds that will be stored in Hbase.
2. A map reduce job will be run on these feeds to f
Hello,
I recently set up a 5-node cluster (1 master, 4 slaves) and am looking to use it
to process high volumes of patient physiologic data. As an initial exercise to
gain a better understanding, I have attempted to run the following problem
(which isn't the type of problem that Hadoop was real
So how does the example work where the second argument is simply
separated by a space and indicates some sort of "label" by which to
find the file in the distributed cache:
... -files URI_TO_FILE name ...
where 'name' is canonically the file name in the URI but without a
scheme or path, ju
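A hedged aside (not from the thread, and version-dependent): in the Hadoop releases I know, the name a cached file appears under is usually controlled with a URI fragment rather than a space-separated label, e.g.:

$ hadoop pipes ... -files hdfs:///libs/libfoo.so#libfoo ...

where the part after '#' becomes the symlink name in the task's working directory.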
Dear all,
I'm running HOD (0.20.2) on my cluster. Each machine in my cluster has
more than one processor. How can I make use of them?
I can see there is an argument which controls the number of nodes to
allocate, but I can't see any parameter which specifies the number of
processors per
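A hedged aside (not from the thread, and I have not verified how HOD passes these through): on a stock 0.20 cluster, per-node parallelism is controlled by the task-slot settings in each TaskTracker's mapred-site.xml, e.g.:

<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>4</value> <!-- assumption: one slot per processor on a 4-way node -->
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>2</value>
</property>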
Hi,
I'd like to implement a feed loader with Hadoop and most likely HBase. I've
got around 1 million feeds that should be loaded and checked for new entries.
However, the feeds have different priorities, based on their average update
frequency in the past and their relevance.
The feeds (url, las