I didn't have any problems using the scripts that are in CDH3 (beta, March
2010) to bring up and tear down Hadoop cluster instances with EC2.
I think there were some differences between the documentation and the actual
scripts, but it's been a few weeks and I don't have access to my notes right now.
Sorry to hijack, but after following this thread I had a related question about
the secondary location of dfs.name.dir.
Is the approach outlined below the preferred/suggested way to do this? Is this
what people mean when they say, "stick it on NFS"?
Thanks!
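For what it's worth, the "stick it on NFS" pattern I've seen is to give
dfs.name.dir a comma-separated list of directories, one on local disk and one
on an NFS mount, so the namenode writes its image and edit log to both. A
hypothetical hdfs-site.xml sketch (both paths are made up, not from this thread):

```xml
<property>
  <name>dfs.name.dir</name>
  <!-- the namenode writes fsimage/edits to every listed directory;
       losing one copy still leaves the other intact -->
  <value>/srv/hadoop/name,/mnt/nfs/hadoop/name</value>
</property>
```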
On May 17, 2010, at 11:14 PM, Todd Lipcon wrote:
Sorry for bothering everyone, I accidentally configured my dfs.data.dir and
mapred.local.dir to the same directory... Bad copy/paste job.
Thanks for everyone's help!
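In case it helps anyone who makes the same copy/paste mistake: the two
properties should point at different local directories. A minimal sketch (the
paths here are hypothetical, not from this thread):

```xml
<property>
  <name>dfs.data.dir</name>
  <value>/srv/hadoop/dfs/data</value>
</property>
<property>
  <name>mapred.local.dir</name>
  <value>/srv/hadoop/mapred/local</value>
</property>
```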
>>>> Yes, in this deployment, I'm attempting to share the hadoop files via NFS.
>>>> The log and pid directories are local.
>>>>
>>>> Thanks!
>>>>
>>>> --Andrew
>>>>
>>>> On May 12, 2010, at 7:40 PM, Jeff Zhang wrote:
>>>>
>>
at 6:51 PM, Jeff Zhang wrote:
>>
>>> It is not suggested to deploy Hadoop on NFS; there will be conflicts
>>> between data nodes, because NFS shares the same filesystem namespace.
>>>
>>>
>>>
>>> On Thu
On May 14, 2010, at 1:06 PM, Andrew Nguyen wrote:
> I'm pretty sure I just set my dfs.data.dir to be /srv/hadoop/dfs/1
> <property>
>   <name>dfs.data.dir</name>
>   <value>/srv/hadoop/dfs/1</value>
> </property>
> I don't have hadoop.tmp.dir set to anything so it's whatever the default is.
>
> I don'
I figured NFS was the fastest way to
propagate changes.
Thanks!
On May 14, 2010, at 9:17 AM, Allen Wittenauer wrote:
>
> On May 14, 2010, at 8:53 AM, Andrew Nguyen wrote:
>
>> Just to be clear, I'm only sharing the Hadoop binaries and config files via
>> NFS. I don
, 2010, at 6:51 PM, Jeff Zhang wrote:
> It is not suggested to deploy Hadoop on NFS; there will be conflicts
> between data nodes, because NFS shares the same filesystem namespace.
>
>
>
> On Thu, May 13, 2010 at 9:52 PM, Andrew Nguyen wrote:
>>
>> Yes,
Yes, in this deployment, I'm attempting to share the hadoop files via NFS. The
log and pid directories are local.
Thanks!
--Andrew
On May 12, 2010, at 7:40 PM, Jeff Zhang wrote:
> These 4 nodes share NFS?
>
>
> On Thu, May 13, 2010 at 8:19 AM, Andrew Nguyen wrote:
I'm working on bringing up a second test cluster and am getting these
intermittent errors on the DataNodes:
2010-05-12 17:17:15,094 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode:
java.io.FileNotFoundException: /srv/hadoop/dfs/1/current/VERSION (No such file
or directory)
at java
Chris,
Thanks for the heads up!
--Andrew
On May 3, 2010, at 10:45 AM, Mattmann, Chris A (388J) wrote:
> Hi Andrew,
>
> There has been some work in the Tika [1] project recently on looking at
> NetCDF4 [2] and HDF4/5 [3] and extracting metadata/text content from them.
> Though this doesn't di
Does anyone know of any existing work integrating HDF5
(http://www.hdfgroup.org/HDF5/whatishdf5.html) with Hadoop?
I don't know much about HDF5 but it was recently brought to my attention as a
way to store high-density scientific data. Since I've confirmed that having
Hadoop dramatically speed
As I may have mentioned, my main goal currently is the processing of
physiologic data using hadoop and MR. The steps are:
1. Convert ADC units to physical units (input is , output is
2. Perform a peak detection to detect the systolic blood pressure (input is
   , output is but the output is only a s
And, I'm getting the following errors:
10/04/15 06:00:50 INFO mapred.JobClient: Task Id :
attempt_201004150557_0001_m_00_1, Status : FAILED
java.io.IOException: Cannot open filename
/benchmarks/TestDFSIO/io_data/test_io_0
A bunch show up and then the job fails. Running the job directly on
I thought I saw a way to specify the block size for individual files using the
command-line using "hadoop dfs -put/copyFromLocal..." However, I can't seem to
find the reference anywhere.
I see that I can do it via the API but no references to a command-line
mechanism. Am I just remembering so
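If I'm remembering the same thing, it's the generic -D option that the shell
accepts, which lets you override dfs.block.size for a single put. A sketch
(file names are made up; 134217728 is just 128 MB):

```
hadoop fs -D dfs.block.size=134217728 -put localfile.dat /user/andrew/localfile.dat
```

Note this only affects files written by that command; existing files keep the
block size they were written with.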
money flow...
--Andrew
On Tue, 13 Apr 2010 10:29:06 -0700, Todd Lipcon wrote:
> On Mon, Apr 12, 2010 at 1:45 PM, Andrew Nguyen <
> andrew-lists-had...@ucsfcti.org> wrote:
>
>> I don't think you can :-). Sorry, they are 100Mbps NICs... I get
>> 95Mbit/sec from one node to another with iperf.
Correction, they are 100Mbps NICs...
iperf shows that we're getting about 95 Mbits/sec from one node to another.
On Apr 12, 2010, at 1:05 PM, Andrew Nguyen wrote:
> @Todd:
>
> I do need the sorting behavior, eventually. However, I'll try it with zero
> reducers
I guess my question below can be rephrased as, "What are the absolute minimum
hardware requirements for me to still see 'better-than-a-single-machine'
performance?"
Thanks!
On Apr 12, 2010, at 1:45 PM, Andrew Nguyen wrote:
> I don't think you can :-). Sorry, they are 100Mbps NICs...
I don't think you can :-). Sorry, they are 100Mbps NICs... I get 95Mbit/sec
from one node to another with iperf.
Should I still be expecting such dismal performance with just 100Mbps?
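For a quick sanity check on the arithmetic, 95 Mbit/s works out to roughly
12 MB/s of usable bandwidth:

```python
# Back-of-the-envelope: what the measured 95 Mbit/s link means in MB/s.
link_mbps = 95.0                         # iperf measurement from the thread
mb_per_sec = link_mbps * 1e6 / 8 / 1e6   # megabits/s -> megabytes/s
# Roughly 12 MB/s, well below the sequential throughput of a single
# commodity disk, so the network is the likely bottleneck for the
# shuffle phase and for HDFS replication writes.
print(mb_per_sec)
```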
On Apr 12, 2010, at 1:31 PM, Todd Lipcon wrote:
> On Mon, Apr 12, 2010 at 1:05 PM, Andrew Nguyen wrote:
Set the number of reducers to 0 and you'll end up with a map
> only job, which should be significantly faster.
>
> -Todd
>
> On Mon, Apr 12, 2010 at 9:43 AM, Andrew Nguyen <
> andrew-lists-had...@ucsfcti.org> wrote:
>
> > Hello,
> >
> > I recently set up a 5 node cluster
Hello,
I recently set up a 5 node cluster (1 master, 4 slaves) and am looking to use it
to process high volumes of patient physiologic data. As an initial exercise to
gain a better understanding, I have attempted to run the following problem
(which isn't the type of problem that Hadoop was real