Nathan Marz wrote:
I have some unit tests which run MapReduce jobs and test the
inputs/outputs in standalone mode. I recently started using
DistributedCache in one of these jobs, but now my tests fail with
errors such as:

Caused by: java.io.IOException: Incomplete HDFS URI, no host: hdfs:///tmp/file.data
        at ...
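A minimal sketch of one way to sidestep this in tests (my assumption, not
something from Nathan's message): qualify the cache path against the job's
default FileSystem instead of hard-coding an hdfs:/// URI, so standalone
runs resolve to the local file system and cluster runs pick up the full
hdfs://host:port authority. Old (0.19-era) API; the helper name and path
are made up.

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

public class CacheSetup {
    // Qualifying the path yields file:/... in standalone mode and
    // hdfs://namenode:port/... on a cluster, so the "Incomplete HDFS URI,
    // no host" error does not come up.
    public static void addCacheFile(JobConf conf, String file) throws Exception {
        FileSystem fs = FileSystem.get(conf);
        Path qualified = fs.makeQualified(new Path(file));
        DistributedCache.addCacheFile(qualified.toUri(), conf);
    }
}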
Bhupesh Bansal wrote (Thu, Jan 22, 2009, 11:29 AM):
Hey folks,

I am trying to use the distributed cache in Hadoop jobs to pass around
configuration files, external jars (job-specific), and some archive data.
I want to test the job end-to-end in local mode, but I think the
distributed caches are localized in TaskTracker code, which is not called
in local mode.

In reply:
... single-stepping debugger on the whole end-to-end process.

- Aaron
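A minimal sketch of the kind of setup Bhupesh describes, using the static
DistributedCache calls of that era; the HDFS paths and file names are
placeholders I made up.

import java.net.URI;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

public class JobSetup {
    public static void attachSideData(JobConf conf) throws Exception {
        // Plain configuration file, copied to local disk on each task node.
        DistributedCache.addCacheFile(new URI("/user/bhupesh/conf/app.properties"), conf);
        // Job-specific jar that should also land on the task classpath.
        DistributedCache.addFileToClassPath(new Path("/user/bhupesh/lib/extra.jar"), conf);
        // Archive (zip/tgz) that the framework unpacks on the task nodes.
        DistributedCache.addCacheArchive(new URI("/user/bhupesh/data/lookup.zip"), conf);
    }
}

As the thread notes, running with mapred.job.tracker=local goes through the
LocalJobRunner rather than the TaskTracker, so the localization step these
calls rely on may not happen in local mode.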
Jeremy Pinkham wrote:
We are using the distributed cache in one of our jobs and have noticed
that the local copies on all of the task nodes never seem to get cleaned
up. Is there a mechanism in the API to tell the framework that those
copies are no longer needed so they can be deleted? I've tried using
release...
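The truncated "release" above is presumably DistributedCache.releaseCache();
a hedged sketch of how that and the purge threshold fit together. The
local.cache.size property name and its roughly 10 GB default are my reading
of the 0.18/0.19 code, so treat the details as assumptions.

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;

public class CacheRelease {
    public static void doneWith(Configuration conf, URI cachedFile) throws Exception {
        // Drops this job's reference to the localized copy; the TaskTracker
        // only physically deletes cached files once its cache directory
        // grows past the local.cache.size byte limit.
        DistributedCache.releaseCache(cachedFile, conf);
        // To make cleanup kick in sooner (assumption about the property):
        // conf.setLong("local.cache.size", 1024L * 1024 * 1024);
    }
}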
Bhupesh Bansal wrote:
... keep a copy of the whole graph in RAM at all mappers (the graph size is
about 8G in RAM); we have a cluster of 8-core machines with 8G on each.

What is the best way of doing that? Is there a way so that multiple
mappers on the same machine can access a RAM cache? I read about the Hadoop
distributed cache; it looks like it copies the file (HDFS / HTTP) locally
onto the slaves, but not necessarily into RAM.

Best
Bhupesh

In reply:
... of the web that has roughly 1 trillion links and 100 billion nodes. See
http://tinyurl.com/4fgok6 . To invert the links, you process the graph in
pieces and resort based on the target. You'll get much better performance
and scale to almost any size.

Another reply:
You could mmap the file from distributed cache using ...
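A minimal sketch of that mmap suggestion, assuming the graph has been
shipped as a single cache file; the 1 GB chunking is only because one
MappedByteBuffer cannot cover more than 2 GB.

import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

public class GraphCache {
    // Memory-map the graph file that the distributed cache has already
    // localized on this node. The mapping lives in the OS page cache, not
    // on the JVM heap, so several task JVMs on the same machine share one copy.
    public static MappedByteBuffer[] mapGraph(JobConf conf) throws IOException {
        Path[] local = DistributedCache.getLocalCacheFiles(conf);
        RandomAccessFile raf = new RandomAccessFile(local[0].toString(), "r");
        FileChannel ch = raf.getChannel();
        long chunk = 1L << 30;                              // 1 GB regions
        int n = (int) ((ch.size() + chunk - 1) / chunk);
        MappedByteBuffer[] regions = new MappedByteBuffer[n];
        for (int i = 0; i < n; i++) {
            long off = i * chunk;
            regions[i] = ch.map(FileChannel.MapMode.READ_ONLY, off,
                                Math.min(chunk, ch.size() - off));
        }
        return regions;
    }
}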
On Oct 16, 2008, at 3:09 PM, Bhupesh Bansal wrote:
Let's say I want to implement a DFS in my graph. I am not able to picture
implementing it by doing the graph in pieces without putting a depth bound
on it (3-4). Let's say we have 200M (4GB) edges to start with.

In reply:
Start by watching the lecture on ...
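As an aside, a small sketch (mine, not from the thread) of what the earlier
"process the graph in pieces and resort based on the target" advice can look
like in the old mapred API, assuming one (source, target) edge per input
record as Text pairs.

import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class InvertLinksMapper extends MapReduceBase
        implements Mapper<Text, Text, Text, Text> {
    // Input: key = source page, value = target page (one edge per record).
    // Emitting (target, source) lets the shuffle's sort group all in-links
    // of a page together, with no in-memory copy of the graph.
    public void map(Text source, Text target,
                    OutputCollector<Text, Text> out, Reporter reporter)
            throws IOException {
        out.collect(target, source);
    }
}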
Bhupesh Bansal wrote:
> Minor correction: the graph size is about 6G and not 8G.

Ah, that's better. With the JVM reuse feature in 0.19 you should be able to
load it once per job into a static, since all tasks of that job can share a
JVM. Things will get tight if you try to run two such jobs at ...
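A hedged sketch of the "load it once per job into a static" idea. The
loadGraph() helper is hypothetical, and the reuse knob
(mapred.job.reuse.jvm.num.tasks / JobConf.setNumTasksToExecutePerJvm) is
named from memory of 0.19.

import java.io.IOException;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;

public abstract class GraphMapperBase extends MapReduceBase {
    // Loaded at most once per task JVM; with JVM reuse enabled
    // (conf.setNumTasksToExecutePerJvm(-1)), every task of the job running
    // on this node sees the same instance.
    private static volatile Object graph;

    @Override
    public void configure(JobConf conf) {
        if (graph == null) {
            synchronized (GraphMapperBase.class) {
                if (graph == null) {
                    try {
                        Path[] local = DistributedCache.getLocalCacheFiles(conf);
                        graph = loadGraph(local[0]);   // hypothetical parser
                    } catch (IOException e) {
                        throw new RuntimeException("failed to load graph", e);
                    }
                }
            }
        }
    }

    // Hypothetical: build the in-memory graph from the localized cache file.
    protected abstract Object loadGraph(Path localFile) throws IOException;
}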
jerrro wrote:
Hello,

Is there a way to use the Distributed Cache with a pipes (C++ code) job? I
want to be able to access a file on the local disk on all of the data
nodes, so Hadoop would copy it to all data nodes before a MapReduce job.

Thanks.

In reply:
Hi,
First of all you need to copy the files ...
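Where the reply breaks off, a hedged sketch of the usual continuation for a
pipes job, under my own assumptions (file names, paths, and the property
names passed via -jobconf are illustrative): put the file in HDFS, register
it as a cache file with a "#linkname" fragment, and enable symlinking so the
C++ binary can simply open "lookup.dat" in its working directory.

import java.net.URI;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

public class PipesCacheSetup {
    public static void setup(JobConf conf) throws Exception {
        FileSystem fs = FileSystem.get(conf);
        // 1. Copy the file from local disk into HDFS so every node can fetch it.
        fs.copyFromLocalFile(new Path("/local/path/lookup.dat"),
                             new Path("/user/jerrro/cache/lookup.dat"));
        // 2. Register it with the distributed cache; "#lookup.dat" names the
        //    symlink created in each task's working directory.
        DistributedCache.addCacheFile(
                new URI("/user/jerrro/cache/lookup.dat#lookup.dat"), conf);
        DistributedCache.createSymlink(conf);
        // Roughly equivalent properties, e.g. for "hadoop pipes -jobconf ...":
        //   mapred.cache.files=/user/jerrro/cache/lookup.dat#lookup.dat
        //   mapred.create.symlink=yes
    }
}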