Using different file systems for Map Reduce job input and output

2008-10-06 Thread Naama Kraus
Hi, I wanted to know if it is possible to use different file systems for Map Reduce job input and output, i.e. have a M/R job's input reside on one file system and the M/R output be written to another file system (e.g. input on HDFS, output on KFS; input on HDFS, output on the local file system; or

Re: Using different file systems for Map Reduce job input and output

2008-10-06 Thread Amareshwari Sriramadasu
Hi Naama, Yes. It is possible to specify this using the APIs FileInputFormat#setInputPaths() and FileOutputFormat#setOutputPath(). You can specify the FileSystem URI for the path. Thanks, Amareshwari Naama Kraus wrote: Hi, I wanted to know if it is possible to use different file systems for Map
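
A minimal sketch of the above with the pre-0.20 org.apache.hadoop.mapred API; the namenode address and the input/output paths below are made-up placeholders:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobConf;

    public class CrossFsJob {
      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(CrossFsJob.class);
        conf.setJobName("cross-filesystem-example");

        // The scheme and authority in the Path URI select the FileSystem,
        // so the input can live on HDFS...
        FileInputFormat.setInputPaths(conf,
            new Path("hdfs://namenode:9000/user/naama/input"));

        // ...while the output goes to another file system (here the local
        // one; a kfs:// URI would work the same way).
        FileOutputFormat.setOutputPath(conf,
            new Path("file:///tmp/job-output"));

        // ... set mapper, reducer, and key/value classes, then:
        // JobClient.runJob(conf);
      }
    }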

A scalable gallery with hadoop?

2008-10-06 Thread Alberto Cusinato
Hi, I am a new user. I need to develop a huge media gallery. My requirements in a nutshell are high scalability in the number of users, reliability of users' data (photos, videos, docs, etc. uploaded by users), and an internal search engine. I've seen some posts about the applicability of Hadoop on web

Re: Using different file systems for Map Reduce job input and output

2008-10-06 Thread Naama Kraus
Thanks ! Naama On Mon, Oct 6, 2008 at 10:27 AM, Amareshwari Sriramadasu [EMAIL PROTECTED] wrote: Hi Naama, Yes. It is possible to specify using the apis FileInputFormat#setInputPaths(), FileOutputFormat#setOutputPath(). You can specify the FileSystem uri for the path. Thanks,

Re: Hadoop and security.

2008-10-06 Thread Steve Loughran
Dmitry Pushkarev wrote: Dear hadoop users, I'm lucky to work in an academic environment where information security is not the question. However, I'm sure that most of the hadoop users aren't. Here is the question: how secure is hadoop? (or, let's say, how foolproof is it?) Right now hadoop is

Re: Hadoop and security.

2008-10-06 Thread Edward Capriolo
You bring up some valid points. This would be a great topic for a white paper. The first line of defense should be to apply inbound and outbound iptables rules. Only source IPs that have a direct need to interact with the cluster should be allowed to. The same is true with the web access. Only a

Re: architecture diagram

2008-10-06 Thread Terrence A. Pietrondi
Can you explain "The location of these splits is semi-arbitrary"? What if the example was... AAA|BBB|CCC|DDD EEE|FFF|GGG|HHH Does this mean the split might fall within CCC, such that the first line results in AAA|BBB|C and C|DDD? Is there a way to control this behavior to split on my

Re: Hadoop and security.

2008-10-06 Thread Allen Wittenauer
On 10/6/08 6:39 AM, Steve Loughran [EMAIL PROTECTED] wrote: Edward Capriolo wrote: You bring up some valid points. This would be a great topic for a white paper. -a wiki page would be a start too I was thinking about doing "Deploying Hadoop Securely" for an ApacheCon EU talk, as by that

Re: Hadoop and security.

2008-10-06 Thread Steve Loughran
Edward Capriolo wrote: You bring up some valid points. This would be a great topic for a white paper. -a wiki page would be a start too The first line of defense should be to apply inbound and outbound iptables rules. Only source IPs that have a direct need to interact with the cluster

Re: Hadoop and security.

2008-10-06 Thread Steve Loughran
Allen Wittenauer wrote: On 10/6/08 6:39 AM, Steve Loughran [EMAIL PROTECTED] wrote: Edward Capriolo wrote: You bring up some valid points. This would be a great topic for a white paper. -a wiki page would be a start too I was thinking about doing Deploying Hadoop Securely for a

Re: architecture diagram

2008-10-06 Thread Alex Loddengaard
As far as I know, splits will never be made within a line, only between rows. To answer your question about ways to control the splits, see below: http://wiki.apache.org/hadoop/HowManyMapsAndReduces http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/InputFormat.html Alex
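
Splits are computed at byte offsets, but the line-oriented record reader re-aligns on line boundaries, so a row is never handed to two mappers. If you want to stop a file from being split at all (e.g. one mapper per whole file), a sketch using the 0.18-era mapred API is below; the class name is made up:

    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.TextInputFormat;

    // Refuses to split files, so each file is read by exactly one mapper
    // and no row can ever straddle a split boundary.
    public class NonSplittingTextInputFormat extends TextInputFormat {
      @Override
      protected boolean isSplitable(FileSystem fs, Path file) {
        return false;
      }
    }

    // In the job driver:
    // conf.setInputFormat(NonSplittingTextInputFormat.class);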

nagios to monitor hadoop datanodes!

2008-10-06 Thread Gerardo Velez
Hi Everyone! I would like to implement Nagios health monitoring of a Hadoop grid. If some of you have experience here, do you have any approach or advice I could use? At this time I've only been playing with the JSP files that Hadoop has integrated into it, so I'm not sure if it could be a

Searching Lucene Index built using Hadoop

2008-10-06 Thread Saranath
I'm trying to index a large dataset using Hadoop+Lucene. I used the example under hadoop/trunk/src/contrib/index/ for indexing. I'm unable to find a way to search the index that was successfully built. I tried copying over the index to one machine and merging them using

Re: How to GET row name/column name in HBase using JAVA API

2008-10-06 Thread Jean-Daniel Cryans
Please use the HBase mailing list for HBase-related questions: http://hadoop.apache.org/hbase/mailing_lists.html#Users Regarding your question, have you looked at http://wiki.apache.org/hadoop/Hbase/HbaseRest ? J-D On Mon, Oct 6, 2008 at 12:05 AM, Trinh Tuan Cuong [EMAIL PROTECTED] wrote:

Re: Searching Lucene Index built using Hadoop

2008-10-06 Thread Stefan Groschupf
Hi, you might find http://katta.wiki.sourceforge.net/ interesting. If you have any katta-related questions, please use the katta mailing list. Stefan ~~~ 101tec Inc., Menlo Park, California web: http://www.101tec.com blog: http://www.find23.net On Oct 6, 2008,

Weird problem running wordcount example from within Eclipse

2008-10-06 Thread Ski Gh3
Hi all, I have a weird problem regarding running the wordcount example from Eclipse. I was able to run the wordcount example from the command line like: $ ...MyHadoop/bin/hadoop jar ../MyHadoop/hadoop-xx-examples.jar wordcount myinputdir myoutputdir However, if I try to run the wordcount

Questions regarding adding resource via Configuration

2008-10-06 Thread Tarandeep Singh
Hi, I have a configuration file (similar to hadoop-site.xml) and I want to include this file as a resource while running Map-Reduce jobs. Similarly, I want to add a jar file that is required by the Mappers and Reducers. ToolRunner.run(...) allows me to do this easily; my question is, can I add these
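
For reference, a hedged sketch of a driver that lets ToolRunner handle the generic -conf and -libjars options and also adds a resource file programmatically via Configuration#addResource(); the class name and the path are illustrative:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class MyJob extends Configured implements Tool {
      public int run(String[] args) throws Exception {
        // ToolRunner has already parsed generic options such as -conf and
        // -libjars and applied them to getConf().
        JobConf conf = new JobConf(getConf(), MyJob.class);

        // A resource file can also be added programmatically; values from it
        // are then visible to mappers/reducers through the job configuration.
        conf.addResource(new Path("/path/to/my-conf.xml")); // made-up path

        // ... set input/output paths, mapper and reducer classes, then:
        // JobClient.runJob(conf);
        return 0;
      }

      public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new MyJob(), args));
      }
    }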

Re: Turning off FileSystem statistics during MapReduce

2008-10-06 Thread Nathan Marz
We see this on Maps and only on incrementBytesRead (not on incrementBytesWritten). It is on HDFS where we are seeing the time spent. It seems that this is because incrementBytesRead is called every time a record is read, while incrementBytesWritten is only called when a buffer is spilled.

Add jar file via -libjars - giving errors

2008-10-06 Thread Tarandeep Singh
Hi, I want to add a jar file (that is required by mappers and reducers) to the classpath. Initially I had copied the jar file to all the slave nodes in the $HADOOP_HOME/lib directory and it was working fine. However, when I tried the -libjars option to add jar files - $HADOOP_HOME/bin/hadoop jar

Re: Add jar file via -libjars - giving errors

2008-10-06 Thread Mahadev Konar
Hi Tarandeep, the -libjars option does not add the jar on the client side. There is an open JIRA for that (I don't remember which one)... You have to add the jar to HADOOP_CLASSPATH on the client side so that it gets picked up on the client side as well. mahadev On 10/6/08 2:30 PM,

Re: architecture diagram

2008-10-06 Thread Terrence A. Pietrondi
So looking at the following mapper... http://csvdatamix.svn.sourceforge.net/viewvc/csvdatamix/branches/datamix_mapreduce/src/com/datamix/pivot/PivotMapper.java?view=markup On line 32, you can see the row split via a delimiter. On line 43, you can see that the field index (the column index) is

Re: Add jar file via -libjars - giving errors

2008-10-06 Thread Tarandeep Singh
Thanks Mahadev for the reply. So that means I have to copy my jar file into the $HADOOP_HOME/lib folder on all slave machines like before. One more question: I am adding a conf file (just like hadoop-site.xml) via the -conf option and I am able to query parameters in mappers/reducers. But is there a way

Re: architecture diagram

2008-10-06 Thread Alex Loddengaard
This mapper does follow my original suggestion, though I'm not familiar with how the delimiter works in this example. Anyone else? Alex On Mon, Oct 6, 2008 at 2:55 PM, Terrence A. Pietrondi [EMAIL PROTECTED] wrote: So looking at the following mapper...

Re: is 12 minutes ok for dfs chown -R on 45000 files ?

2008-10-06 Thread Allen Wittenauer
On 10/2/08 11:33 PM, Frank Singleton [EMAIL PROTECTED] wrote: Just to clarify, this is for when the chown will modify all files' owner attributes, e.g. toggle all from frank:frank to hadoop:hadoop (see below). When we converted from 0.15 to 0.16, we chown'ed all of our files. The local

Why is super user privilege required for FS statistics?

2008-10-06 Thread Brian Bockelman
Hey all, I noticed something really funny about fuse-dfs: because super-user privileges are required to run the getStats function in FSNamesystem.java, my file systems show up as having 16 exabytes total and 0 bytes free. If I mount fuse-dfs as root, then I get the correct results from

Map and Reduce numbers are not restricted by setNumMapTasks and setNumReduceTasks, JobConf related?

2008-10-06 Thread Andy Li
Dears, Sorry, I did not mean to cross post, but the previous article was accidentally posted to the HBase user list. I would like to bring it back to the Hadoop user list since it is confusing me a lot and it is mainly MapReduce related. Currently running version hadoop-0.18.1 on 25 nodes. Map and

Re: architecture diagram

2008-10-06 Thread Samuel Guo
I think the 'split' Alex talked about is the MapReduce system's action, while the 'split' you described is your mapper's action. I guess that your map/reduce application uses *TextInputFormat* to read your input file. Your input file will first be split into a few splits; these splits may be

Re: Map and Reduce numbers are not restricted by setNumMapTasks and setNumReduceTasks, JobConf related?

2008-10-06 Thread Samuel Guo
The number of mappers depends on your InputFormat. The default InputFormat tries to treat every file block of a file as an InputSplit, and you will get the same number of mappers as the number of your InputSplits. Try configuring mapred.min.split.size to reduce the number of mappers if you want to. And I
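
An illustrative sketch of the knobs mentioned above, using the 0.18-era JobConf API; the size and count are arbitrary example values:

    import org.apache.hadoop.mapred.JobConf;

    public class SplitTuning {
      public static void tune(JobConf conf) {
        // setNumMapTasks() is only a hint: the real map count equals the
        // number of InputSplits produced by the InputFormat. Raising the
        // minimum split size yields larger splits and therefore fewer mappers.
        conf.setLong("mapred.min.split.size", 256L * 1024 * 1024); // 256 MB, example value

        // The reduce-task count, by contrast, is honoured exactly.
        conf.setNumReduceTasks(8); // example value
      }
    }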

Re: Add jar file via -libjars - giving errors

2008-10-06 Thread Taeho Kang
Adding your jar files in the $HADOOP_HOME/lib folder works, but you would have to restart all your tasktrackers to have your jar files loaded. If you repackage your map-reduce jar file (e.g. hadoop-0.18.0-examples.jar) with your jar file and run your job with the newly repackaged jar file, it

Re: nagios to monitor hadoop datanodes!

2008-10-06 Thread Taeho Kang
The easiest approach I can think of is to write a simple Nagios plugin that checks if the datanode JVM process is alive. Or you may write a Nagios plugin that checks for error or warning messages in datanode logs. (I am sure you can find quite a few log-checking Nagios plugins on nagiosplugin.org)

Re: Add jar file via -libjars - giving errors

2008-10-06 Thread Mahadev Konar
You can just add the jar to the env variable HADOOP_CLASSPATH. If using bash, just do this: export HADOOP_CLASSPATH=<path to your classpath on the client>, and then use the -libjars option. mahadev On 10/6/08 2:55 PM, Tarandeep Singh [EMAIL PROTECTED] wrote: thanks Mahadev for the reply.

Re: Add jar file via -libjars - giving errors

2008-10-06 Thread Amareshwari Sriramadasu
Hi, From 0.19, the jars added using -libjars are available on the client classpath also; this was fixed by HADOOP-3570. Thanks Amareshwari Mahadev Konar wrote: Hi Tarandeep, the -libjars option does not add the jar on the client side. There is an open JIRA for that (I don't remember which one)...