Re: Testing with Distributed Cache

2009-02-10 Thread Amareshwari Sriramadasu
Nathan Marz wrote: I have some unit tests which run MapReduce jobs and test the inputs/outputs in standalone mode. I recently started using DistributedCache in one of these jobs, but now my tests fail with errors such as: Caused by: java.io.IOException: Incomplete HDFS URI, no host: hdfs:///

Testing with Distributed Cache

2009-02-10 Thread Nathan Marz
I have some unit tests which run MapReduce jobs and test the inputs/outputs in standalone mode. I recently started using DistributedCache in one of these jobs, but now my tests fail with errors such as: Caused by: java.io.IOException: Incomplete HDFS URI, no host: hdfs:///tmp/file.data at
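The error comes from the URI itself: `hdfs:///tmp/file.data` has an empty authority, and HDFS refuses a `hdfs:` URI with no host unless a default filesystem is configured. A quick sketch with plain `java.net.URI` shows the distinction (the `namenode:9000` address is a placeholder, not anything from the original thread):

```java
import java.net.URI;

public class UriHostCheck {
    // HDFS rejects an hdfs: URI whose authority carries no host;
    // this mirrors that check with the standard URI parser.
    static boolean hasHost(String s) {
        return URI.create(s).getHost() != null;
    }

    public static void main(String[] args) {
        System.out.println(hasHost("hdfs:///tmp/file.data"));              // false: empty authority
        System.out.println(hasHost("hdfs://namenode:9000/tmp/file.data")); // true: fully qualified
    }
}
```

In standalone tests the usual fix is to register the cache file with a fully qualified URI, or with a `file:` URI so no HDFS host is needed at all.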

Re: Distributed cache testing in local mode

2009-01-23 Thread Tom White
on the whole end-to-end process. > - Aaron > On Thu, Jan 22, 2009 at 11:29 AM, Bhupesh Bansal wrote: >> Hey folks, >> I am trying to use Distributed cache in hadoop jobs to pass around >> configuration files, external-jars (job specific) and some

Re: Distributed cache testing in local mode

2009-01-22 Thread Aaron Kimball
single-stepping debugger on the whole end-to-end process. - Aaron On Thu, Jan 22, 2009 at 11:29 AM, Bhupesh Bansal wrote: > Hey folks, > I am trying to use Distributed cache in hadoop jobs to pass around > configuration files, external-jars (job specific) and some archive data. >

Distributed cache testing in local mode

2009-01-22 Thread Bhupesh Bansal
Hey folks, I am trying to use Distributed cache in hadoop jobs to pass around configuration files, external-jars (job specific) and some archive data. I want to test the job end-to-end in local mode, but I think the distributed caches are localized in TaskTracker code which is not called in local
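Since the TaskTracker's localization never runs in local mode, one common workaround is to resolve side files by probing: take the localized path when the cache populated one, and fall back to the originally configured path when running locally. This is a sketch of that pattern, not a Hadoop API; the candidate list stands in for what `DistributedCache.getLocalCacheFiles()` would return plus the job's own configured path:

```java
import java.io.File;

public class SideFileResolver {
    // Return the first candidate path that actually exists on disk.
    // On a real cluster the first candidate would be the localized
    // cache copy; in local mode that entry is null or missing, so the
    // original path the job was configured with wins instead.
    static File resolve(String... candidates) {
        for (String c : candidates) {
            if (c == null) continue;
            File f = new File(c);
            if (f.exists()) return f;
        }
        throw new IllegalStateException("side file not found in any candidate location");
    }
}
```

With this, the same job code runs unchanged in both local-mode tests and on the cluster.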

Re: distributed cache

2008-11-11 Thread Amareshwari Sriramadasu
Jeremy Pinkham wrote: We are using the distributed cache in one of our jobs and have noticed that the local copies on all of the task nodes never seem to get cleaned up. Is there a mechanism in the API to tell the framework that those copies are no longer needed so they can be deleted? I've

distributed cache

2008-11-11 Thread Jeremy Pinkham
We are using the distributed cache in one of our jobs and have noticed that the local copies on all of the task nodes never seem to get cleaned up. Is there a mechanism in the API to tell the framework that those copies are no longer needed so they can be deleted? I've tried using release
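For context, Hadoop of this era does purge localized cache files on its own, but only once the per-TaskTracker cache grows past a size threshold, which is why the copies appear to linger forever. If memory serves, the threshold is the `local.cache.size` property (in bytes, defaulting to about 10 GB); a lower value in hadoop-site.xml makes cleanup kick in sooner. Treat the property name and default as hedged recollection rather than a verified reference:

```xml
<property>
  <name>local.cache.size</name>
  <!-- start purging old localized cache files past ~1 GB -->
  <value>1073741824</value>
</property>
```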

Re: Distributed cache Design

2008-10-20 Thread Ted Dunning
>>> billion nodes. See http://tinyurl.com/4fgok6 . To invert the links, you process the graph in pieces and resort based on the target. You'll get much better performance and scale to almost any size.

Re: Distributed cache Design

2008-10-16 Thread Bhupesh Bansal
>>> of the web that has roughly 1 trillion links and 100 billion nodes. See http://tinyurl.com/4fgok6 . To invert the links, you process the graph in pieces and resort based on the target. You'll get much better performance and scale to

Re: Distributed cache Design

2008-10-16 Thread Colin Evans
of doing that ?? Is there a way so that multiple mappers on same machine can access a RAM cache ?? I read about hadoop distributed cache, looks like it copies the file (hdfs / http) locally on the slaves but not necessarily in RAM ?? You could mmap the file from distributed cache using
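The mmap suggestion works because the OS page cache, not the JVM heap, holds the mapped pages, so every task process on the machine that maps the same localized file shares one physical copy. A minimal sketch with plain `java.nio` (the path argument is a placeholder for whatever path DistributedCache localized the file to):

```java
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MmapCacheFile {
    // Map a localized cache file read-only. The mapping stays valid
    // after the channel is closed, and the pages are shared by the OS
    // across every process that maps the same file.
    static MappedByteBuffer map(String path) throws Exception {
        RandomAccessFile raf = new RandomAccessFile(path, "r");
        try {
            FileChannel ch = raf.getChannel();
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        } finally {
            raf.close();
        }
    }
}
```

This avoids each of the eight mappers on an 8-core box deserializing its own multi-gigabyte heap copy, at the cost of designing the on-disk format so it can be read in place.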

Re: Distributed cache Design

2008-10-16 Thread Owen O'Malley
On Oct 16, 2008, at 3:09 PM, Bhupesh Bansal wrote: Let's say I want to implement a DFS in my graph. I am not able to picture implementing it while doing the graph in pieces without putting a depth bound of (3-4). Let's say we have 200M (4GB) edges to start with. Start by watching the lecture on

Re: Distributed cache Design

2008-10-16 Thread Bhupesh Bansal
100 billion nodes. See http://tinyurl.com/4fgok6 > . To invert the links, you process the graph in pieces and resort > based on the target. You'll get much better performance and scale to > almost any size. >> What is the best way of doing that ?? Is there a way so that mu

Re: Distributed cache Design

2008-10-16 Thread Owen O'Malley
based on the target. You'll get much better performance and scale to almost any size. What is the best way of doing that ?? Is there a way so that multiple mappers on same machine can access a RAM cache ?? I read about hadoop distributed cache, looks like it copies the file (h

Re: Distributed cache Design

2008-10-16 Thread Colin Evans
a copy of the whole Graph in RAM at all mappers. (Graph size is about 8G in RAM.) We have a cluster of 8-core machines with 8G on each. What is the best way of doing that ?? Is there a way so that multiple mappers on same machine can access a RAM cache ?? I read about hadoop distributed cache, looks like i

Re: Distributed cache Design

2008-10-16 Thread Doug Cutting
Bhupesh Bansal wrote: Minor correction, the graph size is about 6G and not 8G. Ah, that's better. With the JVM reuse feature in 0.19 you should be able to load it once per job into a static, since all tasks of that job can share a JVM. Things will get tight if you try to run two such jobs at
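The load-once-into-a-static idea relies on JVM reuse keeping one task JVM alive across tasks of the same job, so a lazily initialized static holder pays the deserialization cost a single time. A sketch of the pattern; `Graph` and `loadGraph` are stand-ins for the user's own graph type and its deserialization from the distributed cache:

```java
public class GraphHolder {
    // Placeholder for the user's in-memory graph representation.
    static class Graph { }

    private static Graph graph;

    // Every task running in a reused JVM gets the same instance;
    // synchronized guards the one-time load against concurrent callers.
    static synchronized Graph get() {
        if (graph == null) {
            graph = loadGraph();
        }
        return graph;
    }

    // Stand-in for reading the ~6G graph from the localized cache file.
    private static Graph loadGraph() {
        return new Graph();
    }
}
```

Each mapper then calls `GraphHolder.get()` in `configure()`/`map()` instead of loading its own copy.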

Re: Distributed cache Design

2008-10-16 Thread Bhupesh Bansal
keep a copy of the whole Graph in RAM > at all mappers. (Graph size is about 8G in RAM.) We have a cluster of 8-core > machines with 8G on each. > What is the best way of doing that ?? Is there a way so that multiple > mappers on same machine can access a RAM cache ?? I read about hadoop

Distributed cache Design

2008-10-16 Thread Bhupesh Bansal
?? Is there a way so that multiple mappers on same machine can access a RAM cache ?? I read about hadoop distributed cache, looks like it copies the file (hdfs / http) locally on the slaves but not necessarily in RAM ?? Best Bhupesh

Re: distributed cache

2008-01-28 Thread Amareshwari Sri Ramadasu
jerrro wrote: Hello, Is there a way to use Distributed Cache with a pipes (C++ code) job? I want to be able to access a file on the local disk on all of the data nodes, so hadoop would copy it to all data nodes before a map reduce job. Thanks. Hi, First of all, you need to copy the files
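For a pipes binary the usual trick is to upload the file to HDFS first and then register it with a `#name` fragment on the cache URI; with symlink creation enabled, the framework makes a symlink of that name in each task's working directory, so the C++ code can open the file by a fixed relative path. The fragment is ordinary URI syntax, as this small sketch shows (`namenode:9000` and `lookup.dat` are illustrative, not from the thread):

```java
import java.net.URI;

public class CacheFileLink {
    // The part after '#' in a cache-file URI names the symlink the
    // framework creates in the task's working directory.
    static String linkName(String cacheUri) {
        return URI.create(cacheUri).getFragment();
    }

    public static void main(String[] args) {
        System.out.println(linkName("hdfs://namenode:9000/data/lookup.dat#lookup.dat"));
    }
}
```

On the Java side the URI would be passed to `DistributedCache.addCacheFile`, with symlinks switched on so the pipes executable sees `./lookup.dat`.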

distributed cache

2008-01-28 Thread jerrro
Hello, Is there a way to use Distributed Cache with a pipes (C++ code) job? I want to be able to access a file on the local disk on all of the data nodes, so hadoop would copy it to all data nodes before a map reduce job. Thanks.