Reading files from HDFS is very easy, since there is a URL-based mechanism
for that.
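For example, here is a minimal sketch of reading a file by its full hdfs:// URI
through the FileSystem API (the namenode host, port, and path below are
placeholders):

import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch: cat a text file stored in HDFS, addressed by a full hdfs:// URI.
public class HdfsCat {
  public static void main(String[] args) throws Exception {
    Path path = new Path("hdfs://namenode-host:9000/user/me/input.txt"); // placeholder
    Configuration conf = new Configuration();
    FileSystem fs = path.getFileSystem(conf);
    BufferedReader in = new BufferedReader(new InputStreamReader(fs.open(path)));
    String line;
    while ((line = in.readLine()) != null) {
      System.out.println(line);
    }
    in.close();
  }
}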
On 3/20/08 5:21 PM, "Michael Bieniosek" <[EMAIL PROTECTED]> wrote:
> If you want to talk to the hdfs from flash, your best bet is probably to set
> up a java server and talk to it over http. There's a webdav server patch here:
> https://issues.apache.org/jira/browse/HADOOP-496
If you want to talk to the hdfs from flash, your best bet is probably to set up
a java server and talk to it over http. There's a webdav server patch here:
https://issues.apache.org/jira/browse/HADOOP-496 (I worked on this a while, but
never finished it). I think some other people have written
On Mar 20, 2008, at 3:56 PM, Otis Gospodnetic wrote:
Hi,
The MapReduce tutorial mentions Combiners only in passing. Is
there a default Combiner or default combining behaviour?
No, there is *no* default combiner at all. It has to be explicitly
set in the JobConf to take effect.
Arun
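For concreteness, here is a minimal word-count-style sketch in the old
org.apache.hadoop.mapred API with the combiner set explicitly (class names are
only illustrative); without the setCombinerClass() line, no combining happens
at all:

import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class CombinerExample {

  public static class TokenMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    public void map(LongWritable key, Text value,
        OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
      StringTokenizer tok = new StringTokenizer(value.toString());
      while (tok.hasMoreTokens()) {
        word.set(tok.nextToken());
        output.collect(word, ONE);
      }
    }
  }

  public static class SumReducer extends MapReduceBase
      implements Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterator<IntWritable> values,
        OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
      int sum = 0;
      while (values.hasNext()) {
        sum += values.next().get();
      }
      output.collect(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws IOException {
    JobConf conf = new JobConf(CombinerExample.class);
    conf.setJobName("combiner-example");
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);
    conf.setMapperClass(TokenMapper.class);
    // No combiner runs unless this is set; reusing the reducer is only safe
    // because summing is associative and commutative.
    conf.setCombinerClass(SumReducer.class);
    conf.setReducerClass(SumReducer.class);
    conf.setInputPath(new Path(args[0]));
    conf.setOutputPath(new Path(args[1]));
    JobClient.runJob(conf);
  }
}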
Hi,
>The map/reduce tasks are not threads, they are run in separate JVMs
which are forked by the tasktracker.
I don't understand why. Is it a design decision to support task failures? I think
that, on the other hand, running a thread queue (of tasks) per job per JVM
would greatly improve performance, since f
If you want to output data to different files based on date or any value
parts, you may want to check
https://issues.apache.org/jira/browse/HADOOP-2906
Runping
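I haven't checked what the final API in that issue looks like, but assuming it
ends up providing something along the lines of a MultipleTextOutputFormat in
org.apache.hadoop.mapred.lib, the usage sketch would be to subclass it and
route each record by its key (the class below and its naming scheme are
hypothetical):

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat;

// Hypothetical sketch: write each record to an output file prefixed by its key
// (e.g. a date string), so one reducer can fan records out to several files.
public class DateSplitOutputFormat extends MultipleTextOutputFormat<Text, Text> {
  protected String generateFileNameForKeyValue(Text key, Text value, String name) {
    // "name" is the default part-XXXXX file name; prefix it with the key.
    return key.toString() + "-" + name;
  }
}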
> -Original Message-
> From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
> Sent: Thursday, March 20, 2008 4:00 PM
> To: core-us
Thank you, Doug and Ted, this pointed me in the right direction, which led to
a custom OutputFormat and a RecordWriter that opens and closes the
DataOutputStream based on the current key (if the current key differs from the
previous key, close the previous output and open a new one, then write).
As for pa
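In case it helps anyone else, here is a rough sketch of that open/close-per-key
RecordWriter pattern in the old org.apache.hadoop.mapred API (the class name and
file layout are made up, and a real version would also have to deal with task
retries and temporary output, which this ignores):

import java.io.DataOutputStream;
import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.OutputFormat;
import org.apache.hadoop.mapred.RecordWriter;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.util.Progressable;

// Sketch: reduce input arrives sorted, so each key's records are contiguous;
// roll over to a new file whenever the key changes.
public class KeyRollingOutputFormat implements OutputFormat<Text, Text> {

  public RecordWriter<Text, Text> getRecordWriter(FileSystem ignored, final JobConf job,
      final String name, Progressable progress) throws IOException {
    final Path outDir = job.getOutputPath();
    final FileSystem fs = outDir.getFileSystem(job);

    return new RecordWriter<Text, Text>() {
      private String currentKey = null;
      private DataOutputStream out = null;

      public void write(Text key, Text value) throws IOException {
        if (!key.toString().equals(currentKey)) {
          if (out != null) {
            out.close();                       // close the previous key's file
          }
          currentKey = key.toString();
          // one file per key, suffixed with the task name to keep reducers distinct
          out = fs.create(new Path(outDir, currentKey + "-" + name));
        }
        out.write(value.getBytes(), 0, value.getLength());
        out.writeByte('\n');
      }

      public void close(Reporter reporter) throws IOException {
        if (out != null) {
          out.close();
        }
      }
    };
  }

  public void checkOutputSpecs(FileSystem ignored, JobConf job) throws IOException {
    // sketch only: no output validation
  }
}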
Hi,
The MapReduce tutorial mentions Combiners only in passing. Is there a default
Combiner or default combining behaviour?
Concretely, I want to make sure that records are not getting combined behind
the scenes in some way without my seeing it, causing me to lose data. For
instance, if t
On Tue, 18 Mar 2008 19:53:04 -0500, Ted Dunning <[EMAIL PROTECTED]> wrote:
I think the original request was to limit the sum of maps and reduces
rather than limiting the two parameters independently.
Ted, yes, this is exactly what I'm looking for. I just found an issue that
seems to state th
http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/MapFileOutputFormat.html#getEntry(org.apache.hadoop.io.MapFile.Reader[],%20org.apache.hadoop.mapred.Partitioner,%20K,%20V)
MapFileOutputFormat#getEntry() does this.
Use MapFileOutputFormat#getReaders() to create the readers
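A lookup sketch putting those two together (the output path and key are
placeholders; the partitioner and key/value types must match what the job used):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.MapFile;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapred.MapFileOutputFormat;
import org.apache.hadoop.mapred.lib.HashPartitioner;

// Sketch: open all part-XXXXX MapFiles under the job's output dir, then let the
// partitioner pick the one file that can contain the key.
public class MapFileLookup {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path outputDir = new Path("my-job-output");           // placeholder
    FileSystem fs = outputDir.getFileSystem(conf);

    MapFile.Reader[] readers = MapFileOutputFormat.getReaders(fs, outputDir, conf);
    Text key = new Text("some-key");                      // placeholder
    Text value = new Text();
    Writable found = MapFileOutputFormat.getEntry(
        readers, new HashPartitioner<Text, Text>(), key, value);
    System.out.println(found == null ? "not found" : value.toString());

    for (MapFile.Reader reader : readers) {
      reader.close();
    }
  }
}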
Amareshwari, thanks for your help. This turned out to be user error (when
packaging my JAR, I inadvertently included a lib directory, so the libraries
actually existed in HDFS as ./lib/lib/perl..., when I was only expecting
./lib/perl...).
Thanks again,
Norbert
On Thu, Mar 20, 2008 at 3:03 AM, Ama
Hi all--
I would like to have a reducer generate a MapFile so that in later
processes I can look up the values associated with a few keys without
processing an entire sequence file. However, if I have N reducers, I
will generate N different map files, so to pick the right map file I
will need to u
you can't do this with the contrib/ec2 scripts/ami.
but passing the master private dns name to the slaves on boot as 'user-
data' works fine. when a slave starts, it contacts the master and
joins the cluster. there isn't any need for a slave to rsync from the
master, thus removing the depend
Chris,
What do you mean when you say to boot the slaves with "the master private name"?
===
Chris K Wensel <[EMAIL PROTECTED]> wrote:
I found it much better to start the master first, then boot the slaves
with the master private name.
i do not use the start|stop-all scripts
There is a C-language based API to access HDFS. You can find more
details at:
http://wiki.apache.org/hadoop/LibHDFS
If you download the Hadoop source code from
http://hadoop.apache.org/core/releases.html, you will see this API in
src/c++/libhdfs/hdfs.c
hope this helps,
dhruba
-Original Message-
Actually, I personally use the following "2 part" copy technique to copy
files to a cluster of boxes:
tar cf - myfile | dsh -f host-list-file -i -c -M tar xCfv /tmp -
The first tar packages myfile into a tar file.
dsh runs a tar that unpacks the stream (in the above case, on all boxes listed
in host-list-file, extracting under /tmp).
Yes, this isn't ideal for larger clusters. There's a jira to address
this: https://issues.apache.org/jira/browse/HADOOP-2410.
Tom
On 20/03/2008, Prasan Ary <[EMAIL PROTECTED]> wrote:
> Hi All,
> I have been trying to configure Hadoop on EC2 for a large cluster
> (100-plus machines). It seems
I found it much better to start the master first, then boot the slaves
with the master private name.
i do not use the start|stop-all scripts, so i do not need to maintain
the slaves file. thus i don't need to push private keys around to
support those scripts.
this lets me start 20 nodes, t
Hi,
Did you see the hadoop-0.16.0/src/contrib/ec2/bin/start-hadoop script? It
already contains such a part:
echo "Copying private key to slaves"
for slave in `cat slaves`; do
scp $SSH_OPTS $PRIVATE_KEY_PATH "[EMAIL PROTECTED]:/root/.ssh/id_rsa"
ssh $SSH_OPTS "[EMAIL PROTECTED]" "chmod 600 /root/.ssh/id_rsa"
done
Hi All,
I have been trying to configure Hadoop on EC2 for a large cluster (100-plus
machines). It seems that I have to copy the EC2 private key to all the machines
in the cluster so that they can make SSH connections.
For now it seems I have to run a script to copy the key file to each of the
E
Yes, this is a bug. It only occurs when a job's input path contains a {,} closure.
JobConf.getInputPaths treats the input path list as comma-separated, so it interprets
mr/input/glob/2008/02/{02,08} as two input paths: "mr/input/glob/2008/02/{02" and "08}".
Let's see how to fix it.
Hairong
On 3/20/08 9:43 AM, "Tom White" <[EMAIL PROTECTED]> wrote:
I'm trying to use file globbing to select various input paths, like so:
conf.setInputPath(new Path("mr/input/glob/2008/02/{02,08}"));
But this gives an exception:
Exception in thread "main" java.io.IOException: Illegal file pattern:
Expecting set closure character or end of range, or } for glob
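Until the brace glob is fixed, a simple workaround (same old JobConf API as the
snippet above; a sketch, not tested against that version) is to add each path
explicitly instead of globbing:

conf.addInputPath(new Path("mr/input/glob/2008/02/02"));
conf.addInputPath(new Path("mr/input/glob/2008/02/08"));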
Rong-en Fan wrote:
I have two questions regarding the mapfile in hadoop/hdfs. First, when using
MapFileOutputFormat as reducer's output, is there any way to change
the index interval (i.e., able to call setIndexInterval() on the
output MapFile)?
Not at present. It would probably be good to cha
Actually, the fs.trash.interval number has no significance on the client beyond
being zero or non-zero: if it is non-zero, the client does a rename into Trash
instead of a delete. The value specified in fs.trash.interval is used only by the
namenode to periodically remove files from Trash: the periodicity is the value
specified by fs.trash.interval in the namenode's configuration.
Hi,
I have two questions regarding the mapfile in hadoop/hdfs. First, when using
MapFileOutputFormat as reducer's output, is there any way to change
the index interval (i.e., able to call setIndexInterval() on the
output MapFile)?
Second, is it possible to tell what is the position in data file fo
Thank you for the clarification.
Here is another question.
If two different clients issue a "move to trash" with different intervals
(e.g. client #1 with fs.trash.interval = 60; client #2 with
fs.trash.interval = 120), what would happen?
Does the namenode keep track of all this info?
/Taeho
On
Norbert Burger wrote:
I'm trying to use the cacheArchive command-line option with the
hadoop-0.15.3-streaming.jar. I'm using the option as follows:
-cacheArchive hdfs://host:50001/user/root/lib.jar#lib
Unfortunately, my Perl scripts fail with an error consistent with not being
able to find th