I am relatively new here and getting started with CDH3u1 (on VMware). The
namenode is not coming up due to the following error:
2011-10-25 22:47:00,547 INFO org.apache.hadoop.hdfs.server.common.Storage:
Cannot access storage directory /var/lib/hadoop-0.20/cache/hadoop/dfs/name
2011-10-25 22:47:00,549
Hi Steve,
You may use the shell command "hadoop fs -count" or call
FileSystem.getContentSummary(Path f) in Java.
Hope it helps.
Tsz-Wo
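For reference, a minimal Java sketch of the getContentSummary approach (assuming a reachable HDFS and the standard Hadoop client jars on the classpath; the DirSizes class name and the path argument are just for illustration):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.ContentSummary;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DirSizes {
    public static void main(String[] args) throws Exception {
        Path root = new Path(args[0]);  // e.g. a top-level directory to inspect
        FileSystem fs = FileSystem.get(new Configuration());
        for (FileStatus status : fs.listStatus(root)) {
            // getContentSummary walks the subtree and sums the file lengths.
            ContentSummary summary = fs.getContentSummary(status.getPath());
            System.out.printf("%12d bytes  %s%n", summary.getLength(), status.getPath());
        }
    }
}

This prints the total size of each child directory, which makes it easier to see which subtrees take up the space than the per-file view in the web UI.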
From: Steve Lewis
To: mapreduce-user ;
hdfs-u...@hadoop.apache.org
Sent: Tuesday, October 25, 2011 5:51 PM
Subject: Has
While I can see file sizes with the web interface, it is very difficult to
tell which directories are taking up space, especially when they are nested
several levels deep.
--
Steven M. Lewis PhD
4221 105th Ave NE
Kirkland, WA 98033
206-384-1340 (cell)
Skype lordjoe_com
Hi,
I am trying to create output files of a fixed size by using:
-Dmapred.max.split.size=6442450812 (6 GB)
But the problem is that the input data size and metadata vary, and I have to
adjust the above value manually to achieve a fixed size.
Is there a way I can programmatically determine the split size?
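One way is to measure the total input size first and derive the split size from it. A rough sketch, assuming the old (org.apache.hadoop.mapred) API from Hadoop 0.20 and an InputFormat that honors mapred.max.split.size; the SplitSizing class and the desiredSplits parameter are just for illustration:

import org.apache.hadoop.fs.ContentSummary;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

public class SplitSizing {
    // Derive mapred.max.split.size from the total input size so the job
    // produces roughly desiredSplits map tasks / output files.
    public static void configureSplits(JobConf conf, Path input, int desiredSplits) throws Exception {
        FileSystem fs = input.getFileSystem(conf);
        ContentSummary summary = fs.getContentSummary(input);
        long maxSplitBytes = summary.getLength() / desiredSplits + 1;
        conf.setLong("mapred.max.split.size", maxSplitBytes);
    }
}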
It looks like the input data is not split correctly. It always generates only
one map task and gives it to one of the nodes. I tried to pass parameters
like -D mapred.max.split.size but it doesn't seem to have any effect.
So the question would be: how to specify the maximum amount of input records
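A couple of things worth checking: a single non-splittable input (e.g. a gzip-compressed file) always produces one map task regardless of split settings, and as far as I know the old (org.apache.hadoop.mapred) FileInputFormat derives its split size from mapred.min.split.size and the requested number of map tasks rather than mapred.max.split.size. If the goal is to cap the number of input records per mapper rather than bytes, NLineInputFormat is one option; a rough sketch, assuming the old API shipped with Hadoop 0.20:

import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.NLineInputFormat;

public class RecordLimitedSplits {
    // Hand each mapper at most recordsPerMapper input lines.
    public static void configure(JobConf conf, int recordsPerMapper) {
        conf.setInputFormat(NLineInputFormat.class);
        conf.setInt("mapred.line.input.format.linespermap", recordsPerMapper);
    }
}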
> Is the configured number of tasks for JVM reuse a suggestion, or is it always honored? For
> example, if I've configured a JVM to be reused for 4 tasks, will a
> TaskTracker that has 8 tasks to process use 2 JVMs? Or does it decide whether it
> actually wants to reuse one, up to the configured maximum num
Hello All,
I have a few questions concerning the TaskTracker's JVM re-use that I couldn't
find details about:
Is the configured number of tasks for JVM reuse a suggestion, or is it always
honored? For example, if I've configured a JVM to be reused for 4 tasks, will a
TaskTracker that has 8
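For what it's worth, JVM reuse is controlled by mapred.job.reuse.jvm.num.tasks (default 1; -1 means a JVM may be reused for an unlimited number of tasks of the same job). A minimal sketch of setting it with the old API, assuming Hadoop 0.20's JobConf; the class name is illustrative:

import org.apache.hadoop.mapred.JobConf;

public class JvmReuseConfig {
    public static void configure(JobConf conf) {
        // Maximum number of tasks (of the same job) a single child JVM may run in sequence;
        // -1 removes the limit so the JVM is reused until the job finishes.
        conf.setNumTasksToExecutePerJvm(4);
        // Equivalent: conf.setInt("mapred.job.reuse.jvm.num.tasks", 4);
    }
}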
On Tue, Oct 25, 2011 at 8:35 AM, Radim Kolar wrote:
> On 25.10.2011 14:21, Niels Basjes wrote:
>
>> Why not do something very simple: Use the MD5 of the URL as the key you do
>> the sorting by.
>> This scales very easily and gives a highly randomized order.
>> Maybe not the optimal maximum distance, b
On 25.10.2011 14:21, Niels Basjes wrote:
Why not do something very simple: Use the MD5 of the URL as the key
you do the sorting by.
This scales very easily and gives a highly randomized order.
Maybe not the optimal maximum distance, but certainly a very good
distribution and very easy to build.
I tr
Why not do something very simple: Use the MD5 of the URL as the key you do
the sorting by.
This scales very easily and gives a highly randomized order.
Maybe not the optimal maximum distance, but certainly a very good
distribution and very easy to build.
Niels Basjes
2011/10/25 Radim Kolar
> Hi, i am hav
Hi, I am having a problem implementing an "unsort" for a crawler in map/reduce.
I have a list of URLs waiting to be fetched; they need to be reordered for
maximum distance between URLs from the same domain.
The idea is to do:
map URL -> domain, URL
test.com, http://www.test.com/page1.html
test.com, http://www.test
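A minimal sketch of the MD5-key approach Niels describes, assuming the old (org.apache.hadoop.mapred) API and commons-codec (which ships with Hadoop) for the hash; the class name and wiring are illustrative:

import java.io.IOException;
import org.apache.commons.codec.digest.DigestUtils;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class UrlShuffleMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {
    public void map(LongWritable offset, Text url,
                    OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
        // Key = MD5(url): sorting by the hash spreads URLs from the same domain
        // across the output in a pseudo-random order.
        output.collect(new Text(DigestUtils.md5Hex(url.toString())), url);
    }
}

With an identity reducer, the URLs come back out ordered by the MD5 key.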