Hi,
I wanted to know if it is possible to use different file systems for
MapReduce job input and output, i.e. have an M/R job's input reside on one
file system and its output written to another (e.g. input on HDFS, output on
KFS; input on HDFS, output on the local file system; or
Hi Naama,
Yes, it is possible. You can specify the paths using the APIs
FileInputFormat#setInputPaths() and FileOutputFormat#setOutputPath(),
and include the FileSystem URI in each path.
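For example, something like the following sketch with the old
org.apache.hadoop.mapred API (the host name, port, and paths are made up
for illustration):

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class CrossFsJob {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(CrossFsJob.class);
            conf.setJobName("cross-fs-example");
            // The URI scheme in each Path selects the file system:
            // input is read from HDFS ...
            FileInputFormat.setInputPaths(conf,
                new Path("hdfs://namenode:9000/user/naama/input"));
            // ... and output is written to the local file system.
            FileOutputFormat.setOutputPath(conf,
                new Path("file:///tmp/job-output"));
            JobClient.runJob(conf);
        }
    }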
Thanks,
Amareshwari
Hi, I am a new user. I need to develop a huge media gallery. My requirements
in a nutshell are high scalability in the number of users, reliability of
users' data (photos, videos, docs, etc. uploaded by users), and an internal
search engine.
I've seen some posts about the applicability of Hadoop on web
Thanks! Naama
Dmitry Pushkarev wrote:
Dear hadoop users,
I'm lucky to work in an academic environment where information security is
not a concern. However, I'm sure that most Hadoop users aren't.
Here is the question: how secure is Hadoop? (Or let's say, how foolproof?)
Right now hadoop is
You bring up some valid points. This would be a great topic for a
white paper. The first line of defense should be to apply inbound and
outbound iptables rules: only source IPs that have a direct need to
interact with the cluster should be allowed to. The same is true of
the web access. Only a
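As an illustration, rules along these lines (the trusted subnet is an
assumption; 50070 and 50030 are the default NameNode and JobTracker web UI
ports):

    # Allow an assumed trusted subnet to reach the web UIs
    iptables -A INPUT -s 10.0.0.0/24 -p tcp --dport 50070 -j ACCEPT
    iptables -A INPUT -s 10.0.0.0/24 -p tcp --dport 50030 -j ACCEPT
    # Drop everyone else
    iptables -A INPUT -p tcp --dport 50070 -j DROP
    iptables -A INPUT -p tcp --dport 50030 -j DROP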
Can you explain "The location of these splits is semi-arbitrary"? What if
the example was...
AAA|BBB|CCC|DDD
EEE|FFF|GGG|HHH
Does this mean the split might fall inside CCC, so that the first line ends
up as AAA|BBB|C and C|DDD? Is there a way to control this behavior to split
on my
On 10/6/08 6:39 AM, Steve Loughran [EMAIL PROTECTED] wrote:
Edward Capriolo wrote:
You bring up some valid points. This would be a great topic for a
white paper.
-a wiki page would be a start too
I was thinking about doing "Deploying Hadoop Securely" for an ApacheCon EU
talk, as by that
As far as I know, records will never be broken within a line, only between
rows; even if a byte-level split falls mid-line, the record reader
compensates so each mapper sees whole lines. To answer your question about
ways to control the splits, see below:
http://wiki.apache.org/hadoop/HowManyMapsAndReduces
http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/InputFormat.html
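If you want full control, one sketch (the class name is made up) is to make
files non-splittable, so a single mapper reads each file end to end:

    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.TextInputFormat;

    // Never split files: no byte-level split can land inside a record.
    public class NonSplittableTextInputFormat extends TextInputFormat {
        @Override
        protected boolean isSplitable(FileSystem fs, Path file) {
            return false;
        }
    }

Then set it with conf.setInputFormat(NonSplittableTextInputFormat.class).
Note this trades away parallelism for large files.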
Alex
Hi everyone!
I would like to implement Nagios health monitoring of a Hadoop grid.
Some of you have experience here; do you have any approach or advice I
could use?
At this time I've only been playing with the JSP files that Hadoop has
integrated into it, so I'm not sure if it could be a
I'm trying to index a large dataset using Hadoop+Lucene. I used the example
under hadoop/trunk/src/contrib/index/ for indexing. I'm unable to find a way
to search the index that was successfully built.
I tried copying over the index to one machine and merging them using
Please use the HBase mailing list for HBase-related questions:
http://hadoop.apache.org/hbase/mailing_lists.html#Users
Regarding your question, have you looked at
http://wiki.apache.org/hadoop/Hbase/HbaseRest ?
J-D
On Mon, Oct 6, 2008 at 12:05 AM, Trinh Tuan Cuong [EMAIL PROTECTED]
wrote:
Hi,
you might find http://katta.wiki.sourceforge.net/ interesting. If you
have any katta-related questions please use the katta mailing list.
Stefan
~~~
101tec Inc., Menlo Park, California
web: http://www.101tec.com
blog: http://www.find23.net
On Oct 6, 2008,
Hi all,
I have a weird problem running the wordcount example from Eclipse.
I was able to run the wordcount example from the command line like:
$ ...MyHadoop/bin/hadoop jar ../MyHadoop/hadoop-xx-examples.jar wordcount
myinputdir myoutputdir
However, if I try to run the wordcount
Hi,
I have a configuration file (similar to hadoop-site.xml) and I want to
include this file as a resource while running Map-Reduce jobs. Similarly, I
want to add a jar file that is required by the Mappers and Reducers.
ToolRunner.run(...) allows me to do this easily; my question is, can I add
these
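(For reference, the ToolRunner pattern being referred to looks roughly like
the following sketch; the class name is made up:)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    // ToolRunner parses generic options such as -conf and -libjars
    // before run() is called, so they end up in getConf().
    public class MyDriver extends Configured implements Tool {
        public int run(String[] args) throws Exception {
            Configuration conf = getConf(); // includes resources from -conf
            // ... set up and submit the job here ...
            return 0;
        }
        public static void main(String[] args) throws Exception {
            System.exit(ToolRunner.run(new Configuration(), new MyDriver(), args));
        }
    }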
We see this on Maps and only on incrementBytesRead (not on
incrementBytesWritten). It is on HDFS where we are seeing the time
spent. It seems that this is because incrementBytesRead is called
every time a record is read, while incrementBytesWritten is only
called when a buffer is spilled.
Hi,
I want to add a jar file (that is required by mappers and reducers) to the
classpath. Initially I had copied the jar file to all the slave nodes in the
$HADOOP_HOME/lib directory and it was working fine.
However, when I tried the -libjars option to add jar files:
$HADOOP_HOME/bin/hadoop jar
Hi Tarandeep,
the -libjars option does not add the jar on the client side. There is an
open JIRA for that (I don't remember which one)...
You have to add the jar to HADOOP_CLASSPATH on the client side so that it
gets picked up on the client side as well.
mahadev
On 10/6/08 2:30 PM,
So looking at the following mapper...
http://csvdatamix.svn.sourceforge.net/viewvc/csvdatamix/branches/datamix_mapreduce/src/com/datamix/pivot/PivotMapper.java?view=markup
On line 32, you can see the row split via a delimiter. On line 43, you can see
that the field index (the column index) is
Thanks Mahadev for the reply.
So that means I have to copy my jar file into the $HADOOP_HOME/lib folder on
all slave machines like before.
One more question: I am adding a conf file (just like hadoop-site.xml) via
the -conf option, and I am able to query parameters in mappers/reducers. But
is there a way
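(For reference, a rough sketch of how such parameters are read inside a
mapper with the old API; the parameter name is hypothetical:)

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class ParamMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, Text> {
        private String myParam;

        @Override
        public void configure(JobConf job) {
            // "my.custom.param" is a made-up key, set via -conf or -D
            myParam = job.get("my.custom.param", "default-value");
        }

        public void map(LongWritable key, Text value,
                        OutputCollector<Text, Text> out, Reporter reporter)
                throws IOException {
            out.collect(new Text(myParam), value);
        }
    }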
This mapper does follow my original suggestion, though I'm not familiar with
how the delimiter works in this example. Anyone else?
Alex
On 10/2/08 11:33 PM, Frank Singleton [EMAIL PROTECTED] wrote:
Just to clarify, this is for when the chown will modify all files' owner
attributes,
e.g. toggling all from frank:frank to hadoop:hadoop (see below).
When we converted from 0.15 to 0.16, we chown'ed all of our files. The
local
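(For reference, a bulk chown of that sort might look like this one-liner;
the path is an assumption, and it must be run as the HDFS superuser:)

    # Recursively change ownership of everything in HDFS
    bin/hadoop fs -chown -R hadoop:hadoop /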
Hey all,
I noticed something really funny about fuse-dfs: because super-user
privileges are required to run the getStats function in
FSNamesystem.java, my file systems show up as having 16 exabytes total
and 0 bytes free. If I mount fuse-dfs as root, then I get the correct
results from
Dears,
Sorry, I did not mean to cross-post. The previous article was
accidentally posted to the HBase user list. I would like to bring it back
to the Hadoop user list, since it is confusing me a lot and it is mainly
MapReduce related.
Currently running version hadoop-0.18.1 on 25 nodes. Map and
I think what Alex talked about as 'split' is the MapReduce system's action;
what you said about 'split' is your mapper's action.
I guess that your map/reduce application uses *TextInputFormat* to treat
your input file.
Your input file will first be split into a few InputSplits. These splits may
be
The number of mappers depends on your InputFormat.
The default InputFormat tries to treat every file block of a file as an
InputSplit,
and you will get the same number of mappers as the number of your
InputSplits.
Try configuring mapred.min.split.size to reduce the number of your mappers
if you want to.
And I
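(For example, a sketch of raising the minimum split size in the job driver;
the 128 MB value is arbitrary:)

    // Ask the InputFormat for splits of at least 128 MB
    jobConf.setLong("mapred.min.split.size", 128L * 1024 * 1024);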
Adding your jar files in the $HADOOP_HOME/lib folder works, but you would
have to restart all your tasktrackers to have your jar files loaded.
If you repackage your map-reduce jar file (e.g. hadoop-0.18.0-examples.jar)
with your jar file and run your job with the newly repackaged jar file, it
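(A sketch of one way to do the repackaging; the jar names are made up.
Hadoop also picks up jars placed in a lib/ directory inside the job jar:)

    # Add the dependency under lib/ inside the existing job jar
    mkdir lib
    cp /path/to/mylib.jar lib/
    jar uf myjob.jar lib/mylib.jar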
The easiest approach I can think of is to write a simple Nagios plugin that
checks whether the datanode JVM process is alive. Or you may
write a Nagios plugin that checks for error or warning messages in datanode
logs. (I am sure you can find quite a few log-checking Nagios plugins on
nagiosplugin.org.)
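(A minimal sketch of such a liveness check, following the Nagios exit-code
convention; using jps to find the process is an assumption:)

    #!/bin/sh
    # Nagios plugin convention: exit 0 = OK, 2 = CRITICAL
    if jps | grep -q DataNode; then
        echo "OK: DataNode process is running"
        exit 0
    else
        echo "CRITICAL: DataNode process not found"
        exit 2
    fi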
You can just add the jar to the environment variable HADOOP_CLASSPATH.
If using bash,
just do this:
export HADOOP_CLASSPATH=<path to your jar on the client>
And then use the -libjars option.
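(Putting it together, an end-to-end sketch; the jar names and paths are
made up:)

    export HADOOP_CLASSPATH=/home/me/mylib.jar
    bin/hadoop jar myjob.jar MyDriver -libjars /home/me/mylib.jar input output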
mahadev
Hi,
From 0.19, the jars added using -libjars are available on the client
classpath as well; this was fixed by HADOOP-3570.
Thanks
Amareshwari