Regarding a query on the support of Hadoop on Windows

2008-04-21 Thread Anish Damodaran
Hello Sir, I'm currently evaluating Hadoop for Windows. I would like to know the following: 1. Is it possible for us to use Hadoop without Cygwin as of now? If not, how feasible is it to modify the scripts to support Windows? 2. Does the efficiency decrease on account of the fact that Hadoop

Re: How to instruct Job Tracker to use certain hosts only

2008-04-21 Thread Owen O'Malley
On Apr 18, 2008, at 1:52 PM, Htin Hlaing wrote: I would like the first job to run on all the compute hosts in the cluster (which is the default), and then I would like to run the second job on only a subset of the hosts (due to a licensing issue). One option would be to set
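One way to realize such a split — sketched here as an illustration, not a reconstruction of Owen's truncated reply — is to run a second JobTracker whose mapred.hosts file lists only the licensed nodes, and submit the restricted job to that tracker (the file path below is hypothetical):

    <!-- hadoop-site.xml of the restricted JobTracker -->
    <property>
      <name>mapred.hosts</name>
      <!-- hypothetical path; the file lists one licensed hostname per line -->
      <value>/path/to/licensed-hosts</value>
    </property>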

Re: Run DfsShell command after your job is complete?

2008-04-21 Thread lohit
Yes, FsShell.java implements most of the shell commands. You could also use the FileSystem API http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/fs/FileSystem.html Simple example: http://wiki.apache.org/hadoop/HadoopDfsReadWriteExample Thanks, Lohit - Original Message From:
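A minimal sketch of the FileSystem API route lohit mentions, for pulling a finished job's output out of HDFS from the driver class (the paths and class name are illustrative, not from the thread):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CopyOutputToLocal {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // In-process equivalent of "bin/hadoop dfs -copyToLocal",
        // run right after JobClient.runJob() returns.
        fs.copyToLocalFile(new Path("/user/me/job-output"),
                           new Path("/tmp/job-output"));
      }
    }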

Run DfsShell command after your job is complete?

2008-04-21 Thread Kayla Jay
Hello - Is there any way to run a DfsShell command after your job is complete, within that same job run/main class? I.e., after you're done with the maps and the reduces, I want to move data directly out of HDFS into the local file system to load it into a database. Can you run a DfsShell within your job

Re: jar files on NFS instead of DistributedCache

2008-04-21 Thread Ted Dunning
Another option that is little mentioned is WebDAV. This software, in particular, makes this pretty danged simple. http://could.it/main/a-simple-approach-to-webdav.html Since most of these requests involve read-only loads, HTTP access begins to be of real interest. On 4/21/08 4:59 PM, "Allen

Re: jar files on NFS instead of DistributedCache

2008-04-21 Thread Allen Wittenauer
On 4/21/08 2:18 PM, "Ted Dunning" <[EMAIL PROTECTED]> wrote: > I agree with the "fair and balanced" part. I always try to keep my clusters > fair and balanced! > > Joydeep should mention his background. In any case, I agree that high-end > filers may provide good enough NFS service, but I would

Re: jar files on NFS instead of DistributedCache

2008-04-21 Thread Ted Dunning
I agree with the "fair and balanced" part. I always try to keep my clusters fair and balanced! Joydeep should mention his background. In any case, I agree that high-end filers may provide good enough NFS service, but I would also contend that HDFS has been better for me than NFS from generic s

RE: jar files on NFS instead of DistributedCache

2008-04-21 Thread Joydeep Sen Sarma
As opposed to 200 boxes all not being able to talk to the NameNode? Or the JobTracker? I think this is a topic that requires a little nuance. If there's a small cluster and a reliable (NetApp) filer, then getting jars off it seems like a good alternative to consider. In 8 months of all of our u

Re: Using ArrayWritable of type IntWritable

2008-04-21 Thread Doug Cutting
CloudyEye wrote: What else do I have to override in "ArrayWritable" to get the IntWritable values written to the output files by the reducers? public String toString(); Doug
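Putting Doug's answer together with the class from CloudyEye's original message, the override might look like the following; the space-separated formatting is an assumption, not something the thread specifies:

    import org.apache.hadoop.io.ArrayWritable;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Writable;

    public class IntArrayWritable extends ArrayWritable {
      public IntArrayWritable() {
        super(IntWritable.class);
      }

      // Without this override, reducers write Object's default string
      // form to the output files instead of the array's values.
      @Override
      public String toString() {
        StringBuilder sb = new StringBuilder();
        for (Writable value : get()) {
          if (sb.length() > 0) {
            sb.append(' ');  // separator is an assumption
          }
          sb.append(((IntWritable) value).get());
        }
        return sb.toString();
      }
    }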

Re: incremental re-execution

2008-04-21 Thread Ted Dunning
It isn't that bad. Remember that the input is sparse. Actually, the size of the original data provides a sharp bound on the size of the semi-aggregated data. In practice, the semi-aggregated data will have 1/k as many records as the original data, where k is the average count. The records in the semi

Re: incremental re-execution

2008-04-21 Thread Shirley Cohen
Hi Ted, Thanks for your example. It's very interesting to learn about specific map-reduce applications. It's non-obvious to me that it's a good idea to combine two map-reduce pairs by using the cross product of the intermediate states - you might wind up building an O(n^2) intermediate dat

Re: Hadoop "remembering" old mapred.map.tasks

2008-04-21 Thread Otis Gospodnetic
It turns out Hadoop was not remembering anything and the answer is in the FAQ: http://wiki.apache.org/hadoop/FAQ#13 Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: Otis Gospodnetic <[EMAIL PROTECTED]> > To: core-user@hadoop.apache.org > Sen
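The FAQ's point, restated: mapred.map.tasks is only a hint, because the InputFormat derives the actual map count from the input splits, which is why an "old" value can appear to stick around. A minimal sketch with the old JobConf API (class name and counts are illustrative):

    import org.apache.hadoop.mapred.JobConf;

    public class TaskCountHint {
      public static void main(String[] args) {
        JobConf conf = new JobConf(TaskCountHint.class);
        // Only a suggestion: the InputFormat still computes the real
        // number of maps from the splits (typically one per DFS block).
        conf.setNumMapTasks(20);
        // The reduce count, by contrast, is honored as given.
        conf.setNumReduceTasks(5);
      }
    }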

Re: datanode files list

2008-04-21 Thread Konstantin Shvachko
> * It involves the NameNode. There is no other way. Namespace (file & directory) information is stored only on the name-node. Data-nodes know only about their blocks, but not about the files those blocks belong to. Shimi K wrote: I am using Hadoop HDFS as a distributed file system. On each DFS node I hav

Re: datanode files list

2008-04-21 Thread lohit
> > Right now I'm calling the NameNode in order to get the list of all the files > > in the cluster. For each file I check if it is a local file (one of the > > locations is the host of the node), and if it is, I read it. Instead of all datanodes trying to get the list of locations, you could use FileSyst
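A sketch of checking block locality through the FileSystem API. Note this uses the getFileBlockLocations signature from a later Hadoop than the 0.16 vintage of this thread, and the data path is illustrative:

    import java.net.InetAddress;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class LocalBlockScan {
      public static void main(String[] args) throws Exception {
        String localHost = InetAddress.getLocalHost().getHostName();
        FileSystem fs = FileSystem.get(new Configuration());
        for (FileStatus stat : fs.listStatus(new Path("/user/me/data"))) {
          boolean local = false;
          // Ask which hosts hold each block of this file.
          for (BlockLocation block :
               fs.getFileBlockLocations(stat, 0, stat.getLen())) {
            for (String host : block.getHosts()) {
              if (host.equals(localHost)) {
                local = true;
              }
            }
          }
          if (local) {
            System.out.println(stat.getPath() + " has at least one local block");
          }
        }
      }
    }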

Re: Any API used to get the last modified time of the File in HDFS?

2008-04-21 Thread Shimi K
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

String fileName = "file.txt";
Configuration config = new Configuration();
FileSystem fs = FileSystem.get(config);
Path filePath = new Path(fileName);
FileStatus fileStatus = fs.getFileStatus(filePath);
long modificationTime = fileStatus.getModificationTime();

On Mon, Apr 21, 2008 at 5:35 AM, Samuel

Re: datanode files list

2008-04-21 Thread Shimi K
Do you remember the "Caching frequently map input files" thread? http://mail-archives.apache.org/mod_mbox/hadoop-core-user/200802.mbox/[EMAIL PROTECTED] On Mon, Apr 21, 2008 at 8:31 PM, Ted Dunning <[EMAIL PROTECTED]> wrote: > > This is kind of odd that you are doing this. It really sounds lik

Re: datanode files list

2008-04-21 Thread Ted Dunning
It is kind of odd that you are doing this. It really sounds like a replication of what Hadoop is doing. Why not just run a map process and have Hadoop figure out which blocks are where? Can you say more about *why* you are doing this, not just what you are trying to do? On 4/21/08 10:28 AM

Re: datanode files list

2008-04-21 Thread Shimi K
I am using Hadoop HDFS as a distributed file system. On each DFS node I have another process which needs to read the local HDFS files. Right now I'm calling the NameNode in order to get the list of all the files in the cluster. For each file I check if it is a local file (one of the locations is th

Re: jar files on NFS instead of DistributedCache

2008-04-21 Thread Ted Dunning
+1 to that. Better to be able to store jar files in HDFS. On 4/21/08 3:20 AM, "Steve Loughran" <[EMAIL PROTECTED]> wrote: > Joydeep Sen Sarma wrote: >> i would love this feature. it does not exist currently. if we set the >> classpath for the tasktracker - then as mentioned - it's for all the
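For comparison, a minimal sketch of the DistributedCache route the thread is weighing against NFS, for a jar that already sits in HDFS (the path and class name are illustrative, not from the thread):

    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.JobConf;

    public class ShipJarFromHdfs {
      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(ShipJarFromHdfs.class);
        // Add a jar stored in HDFS to every task's classpath; the
        // framework localizes it on each node instead of every task
        // reaching over NFS for it.
        DistributedCache.addFileToClassPath(new Path("/libs/mylib.jar"), conf);
        // ... set mapper, reducer, input/output, then JobClient.runJob(conf)
      }
    }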

Re: datanode files list

2008-04-21 Thread Ted Dunning
Datanodes don't necessarily contain complete files. It is possible to enumerate all files and to find out which datanodes host different blocks from these files. What did you need to do? On 4/21/08 2:11 AM, "Shimi K" <[EMAIL PROTECTED]> wrote: > Is there a way to get the list of files on each

Hadoop 0.16.3 problems submitting a job

2008-04-21 Thread Andreas Kostyrka
Hi! When I'm trying to submit a streaming job to my EC2/S3 cluster, I'm getting the following errors: additionalConfSpec_:null null=@@@userJobConfProps_.get(stream.shipped.hadoopstreaming packageJobJar: [/tmp/hadoop-hadoop/hadoop-unjar53992/] [] /tmp/streamjob53993.jar tmpDir=null 08/04/2

Re: Newbie quick questions :-)

2008-04-21 Thread Allen Wittenauer
On 4/21/08 3:36 AM, "vikas" <[EMAIL PROTECTED]> wrote: Most of your questions have been answered by Luca, from what I can see, so let me tackle the rest a bit... > 4) Let us suppose I want to shut down one datanode for maintenance purposes. > Is there any way to inform Hadoop saying that th
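The usual mechanism here — offered as a sketch, since Allen's reply is truncated — is decommissioning via an exclude file; the paths and hostname below are hypothetical:

    <!-- hadoop-site.xml -->
    <property>
      <name>dfs.hosts.exclude</name>
      <!-- hypothetical path to the exclude file -->
      <value>/path/to/exclude-file</value>
    </property>

Then add the node to that file and tell the NameNode to re-read it, which drains the node's blocks before it goes down:

    echo datanode07.example.com >> /path/to/exclude-file
    bin/hadoop dfsadmin -refreshNodes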

Re: Newbie quick questions :-)

2008-04-21 Thread Luca
vikas wrote: Hi, I'm new to Hadoop and aiming to develop a good amount of code with it. I have some quick questions; it would be highly appreciated if someone could answer them. I was able to run Hadoop in the Cygwin environment and run the examples both in standalone mode and in a 2-node cluster.

Using ArrayWritable of type IntWritable

2008-04-21 Thread CloudyEye
Hi, from the API, I should create a new class as follows:

    public class IntArrayWritable extends ArrayWritable {
      public IntArrayWritable() {
        super(IntWritable.class);
      }
    }

In the reducer, when executing OutputCollector.collect(WritableComparable key, IntArrayW

Re: Error in start up

2008-04-21 Thread Aayush Garg
Could anyone please help me with the error below? I am not able to start HDFS because of it. Thanks, On Sat, Apr 19, 2008 at 7:25 PM, Aayush Garg <[EMAIL PROTECTED]> wrote: > I have my hadoop-site.xml correct!! but it creates an error in this way > > > On Sat, Apr 19, 2008 at 6:35 PM, Stuart Sierr

Re: Splitting in various files

2008-04-21 Thread Aayush Garg
I just tried the same thing (mapred.task.id) as you suggested, but I am getting one file named null in my directory. On Mon, Apr 21, 2008 at 8:33 AM, Amar Kamat <[EMAIL PROTECTED]> wrote: > Aayush Garg wrote: > > > Could anyone please tell? > > > > On Sat, Apr 19, 2008 at 1:33 PM, Aayush Garg <[EMAIL P
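One plausible reading of the file named "null" — my inference, not Amar's truncated reply — is that mapred.task.id is only set inside a running task, so reading it from a fresh Configuration or in the driver yields null. A sketch of reading it in the right place with the old API:

    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;

    public class PerTaskNames extends MapReduceBase {
      private String taskId;

      @Override
      public void configure(JobConf job) {
        // Set by the framework inside a running task; null anywhere
        // else, which would produce output paths containing "null".
        taskId = job.get("mapred.task.id");
      }
    }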

RE: datanode files list

2008-04-21 Thread dhruba Borthakur
You should be able to run "bin/hadoop fsck -files -blocks -locations /" and get a listing of all files and the datanode(s) that each block of the file resides in. Thanks, dhruba -Original Message- From: Shimi K [mailto:[EMAIL PROTECTED] Sent: Monday, April 21, 2008 2:12 AM To: core-user@

Newbie quick questions :-)

2008-04-21 Thread vikas
Hi, I'm new to Hadoop and aiming to develop a good amount of code with it. I have some quick questions; it would be highly appreciated if someone could answer them. I was able to run Hadoop in the Cygwin environment and run the examples both in standalone mode and in a 2-node cluster. 1) How can I ov

Re: jar files on NFS instead of DistributedCache

2008-04-21 Thread Steve Loughran
Joydeep Sen Sarma wrote: I would love this feature. It does not exist currently. If we set the classpath for the tasktracker then, as mentioned, it's for all the tasks. If the classpath can be set on a per-task basis, that works as an excellent solution with an NFS-based environment for spec

Re: Interleaving maps/reduces from multiple jobs on the same tasktracker

2008-04-21 Thread Amar Kamat
Amar Kamat wrote: Jiaqi Tan wrote: Hi, Will Hadoop ever interleave multiple maps/reduces from different jobs on the same tasktracker? No. Suppose I have 2 jobs submitted to a jobtracker, one after the other. Must all maps/reduces from the first submitted job be completed before the tasktra

datanode files list

2008-04-21 Thread Shimi K
Is there a way to get the list of files on each datanode? I need to be able to get all the names of the files on a specific datanode. Is there a way to do it?