Re: data partitioning question

2008-08-04 Thread Shirley Cohen
Thanks, Qin. It sounds like you're saying that this type of partitioning needs its own map-reduce step. I was hoping it could be done in the InputFormat class :)) Shirley On Aug 4, 2008, at 2:49 PM, Qin Gao wrote: For the first question, I think it is better to do it at reduce stage, because

Re: Examples of using DFS without MapReduce

2008-08-04 Thread Michael Bieniosek
To be honest, I have permissions turned off on my DFS (set the config variable "dfs.permissions" to be "false"). Poking around in the source code, from hadoop/core/trunk/src/core/org/apache/hadoop/security/UnixUserGroupInformation.java it looks like you can set the config variable "hadoop.job.u
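
A minimal sketch of choosing the identity the DFS client presents, assuming the truncated variable above is "hadoop.job.ugi" (a comma-separated user and group list) and that the namenode either has dfs.permissions set to false or accepts the supplied user:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class DfsAsUser {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "namenode.example.com:9000"); // hypothetical namenode
        // "hadoop.job.ugi" is our reading of the truncated variable name above;
        // the value is "user,group1[,group2...]".
        conf.set("hadoop.job.ugi", "hadoopuser,hadoopgroup");
        FileSystem fs = FileSystem.get(conf);
        fs.mkdirs(new Path("/user/hadoopuser/scratch"));
      }
    }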

Re: mapper input file name

2008-08-04 Thread Kevin
OK. I guess I found out how: override the "configure" method of the user-defined Map class so that you can take note of the filename. -Kevin On Mon, Aug 4, 2008 at 3:53 PM, Kevin <[EMAIL PROTECTED]> wrote: > Is it possible to get this information in user defined map function? > i.e., how do we get t
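
For reference, a minimal sketch of this approach against the 0.18-era org.apache.hadoop.mapred API; "map.input.file" is the job property that carries the path of the file behind the current split:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class FileNameMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {
      private String inputFile;

      // configure() hands the mapper its JobConf before map() is called.
      public void configure(JobConf job) {
        inputFile = job.get("map.input.file");
      }

      public void map(LongWritable key, Text value,
                      OutputCollector<Text, Text> output, Reporter reporter)
          throws IOException {
        // Emit the source file name alongside each record.
        output.collect(new Text(inputFile), value);
      }
    }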

Re: having different HADOOP_HOME for master and slaves?

2008-08-04 Thread Meng Mao
assumption -- if I run stop-all.sh _successfully_ on a Hadoop deployment (which means every node in the grid is using the same path to Hadoop), then that Hadoop installation becomes invisible, and then any other Hadoop deployment could start up and take its place on the grid. Let me know if this as

Re: mapper input file name

2008-08-04 Thread Kevin
Is it possible to get this information in user defined map function? i.e., how do we get the JobConf object in map() function? Another way is to subclass RecordReader to embed file-name in the data, which does not look simple. -Kevin On Sun, Aug 3, 2008 at 10:17 PM, Amareshwari Sriramadasu <[E

Re: Examples of using DFS without MapReduce

2008-08-04 Thread Kevin
Thank you! The java code is exactly what I want. Following your code, I encounter the user permission issue when trying to write to a file. I wonder if the user id could be manipulated in the protocol. -Kevin On Mon, Aug 4, 2008 at 2:27 PM, Michael Bieniosek <[EMAIL PROTECTED]> wrote: > You ca

Re: Examples of using DFS without MapReduce

2008-08-04 Thread Michael Bieniosek
You can make shell calls:

    hadoop/bin/hadoop fs -fs namenode.example.com:1 -ls /

If you're in java, you can use the org.apache.hadoop.fs.FileSystem class:

    Configuration config = new Configuration();
    config.set("fs.default.name", "namenode.example.com:1")
    FileSystem fs = FileSystem.get(con
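
A self-contained sketch rounding out the snippet above. The port after the colon is truncated in the archive, so the 9000 below is only a placeholder; the FileSystem calls are the standard 0.18-era API:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class DfsExample {
      public static void main(String[] args) throws Exception {
        Configuration config = new Configuration();
        config.set("fs.default.name", "namenode.example.com:9000"); // assumed port
        FileSystem fs = FileSystem.get(config);

        // Programmatic equivalent of `hadoop fs -ls /`.
        for (FileStatus stat : fs.listStatus(new Path("/"))) {
          System.out.println(stat.getPath());
        }

        // Write a small file to the DFS.
        FSDataOutputStream out = fs.create(new Path("/tmp/hello.txt"));
        out.writeBytes("hello dfs\n");
        out.close();
      }
    }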

Examples of using DFS without MapReduce

2008-08-04 Thread Kevin
Hi there, I am trying to use the DFS of hadoop in other applications. It is not clear to me how that could be carried out easily. Could any one give a direction to go or examples? Thank you. -Kevin

Re: having different HADOOP_HOME for master and slaves?

2008-08-04 Thread Meng Mao
I see. I think I could also modify the hadoop-env.sh in the new conf/ folders per datanode to point to the right place for HADOOP_HOME. On Mon, Aug 4, 2008 at 3:21 PM, Allen Wittenauer <[EMAIL PROTECTED]> wrote: > > > > On 8/4/08 11:10 AM, "Meng Mao" <[EMAIL PROTECTED]> wrote: > > I suppose I cou

Re: data partitioning question

2008-08-04 Thread Qin Gao
For the first question, I think it is better to do it at the reduce stage, because the partitioner only considers the size of blocks in bytes. Instead you can output the intermediate key/value pair like this:

    key: 1 if C = 1, 3, 5, or 7; 0 otherwise
    value: the tuple

In the reducer you can have a reducer deal w
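
A minimal sketch of the tagging step Qin describes, assuming each input line carries one tuple as comma-separated values with C in the third column (the text format is an assumption, not from the thread):

    import java.io.IOException;
    import java.util.Arrays;
    import java.util.HashSet;
    import java.util.Set;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class TagByCMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, IntWritable, Text> {
      private static final Set<String> TARGET_C =
          new HashSet<String>(Arrays.asList("1", "3", "5", "7"));

      public void map(LongWritable offset, Text line,
                      OutputCollector<IntWritable, Text> output, Reporter reporter)
          throws IOException {
        // Assumes one tuple per line as comma-separated values, C last.
        String[] fields = line.toString().split(",");
        String c = fields[2].trim();
        // key: 1 if C is 1, 3, 5, or 7; 0 otherwise. value: the whole tuple.
        int tag = TARGET_C.contains(c) ? 1 : 0;
        output.collect(new IntWritable(tag), line);
      }
    }

With two reduce tasks and the default HashPartitioner, the 0 and 1 key groups land in different partitions, so each reducer sees exactly one of the two ranges.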

data partitioning question

2008-08-04 Thread Shirley Cohen
Hi, I want to implement some data partitioning logic where a mapper is assigned a specific range of values. Here is a concrete example of what I have in mind. Suppose I have attributes A, B, C and the following tuples:

    (A, B, C)
    (1, 3, 1)
    (1, 2, 2)
    (1, 2, 3)
    (12, 3, 4)
    (12, 2, 5)
    (12, 8, 6

Re: having different HADOOP_HOME for master and slaves?

2008-08-04 Thread Allen Wittenauer
On 8/4/08 11:10 AM, "Meng Mao" <[EMAIL PROTECTED]> wrote: > I suppose I could, for each datanode, symlink things to point to the actual > Hadoop installation. But really, I would like the setup that is hinted as > possible by statement 1). Is there a way I could do it, or should that bit > of do

having different HADOOP_HOME for master and slaves?

2008-08-04 Thread Meng Mao
I'm trying to set up 2 Hadoop installations on my master node, one of which will have permissions that allow more users to run Hadoop. But I don't really need anything different on the datanodes, so I'd like to keep those as-is. With that switch, the HADOOP_HOME on the master will be different from

Re: EOFException while starting name node

2008-08-04 Thread lohit
We had seen similar exceptions reported earlier by others on the list. What you might want to try is to use a hex editor or equivalent to open up 'edits' and get rid of the last record. In most such cases the last record is incomplete, which is why your namenode is not starting. Once you update your edit

Re: EOFException while starting name node

2008-08-04 Thread steph
2008-08-03 21:58:33,108 INFO org.apache.hadoop.ipc.Server: Stopping server on 9000
2008-08-03 21:58:33,109 ERROR org.apache.hadoop.dfs.NameNode: java.io.EOFException
    at java.io.DataInputStream.readFully(DataInputStream.java:178)
    at org.apache.hadoop.io.UTF8.readFields(UTF8.ja

Re: EOFException while starting name node

2008-08-04 Thread steph
I have the same thing:

    ERROR dfs.NameNode: java.io.EOFException
        at java.io.DataInputStream.readInt(DataInputStream.java:375)
        at org.apache.hadoop.io.ArrayWritable.readFields(ArrayWritable.java:87)
        at org.apache.hadoop.dfs.FSEditLog.loadFSEdits(FSEditLog.java:455)
        at o

EOFException while starting name node

2008-08-04 Thread Wanjari, Amol
I'm getting the following exceptions while starting the name node -

    ERROR dfs.NameNode: java.io.EOFException
        at java.io.DataInputStream.readInt(DataInputStream.java:375)
        at org.apache.hadoop.io.ArrayWritable.readFields(ArrayWritable.java:87)
        at org.apache.hadoop.dfs.FSEditL

RE: How can I control Number of Mappers of a job?

2008-08-04 Thread Goel, Ankur
This can be done very easily by setting the number of mappers you want via jobConf.setNumMapTasks() and using the input format MultiFileWordCount.MyInputFormat.class, which is a concrete implementation of MultiFileInputFormat. -Original Message- From: Jason Venner [mailto:[EMAIL PROTECTED] Sent: Sa
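
A sketch of the driver wiring Ankur describes, against the 0.18-era mapred API. The mapper class, reducer, and output types below are borrowed from the MultiFileWordCount example itself and are assumptions beyond what the message names:

    import org.apache.hadoop.examples.MultiFileWordCount;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.lib.LongSumReducer;

    public class FixedMapCountJob {
      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(FixedMapCountJob.class);
        conf.setJobName("fixed-map-count");

        // MultiFileInputFormat packs many files into each split, so the
        // requested map count is honored instead of one map per file/block.
        conf.setInputFormat(MultiFileWordCount.MyInputFormat.class);
        conf.setNumMapTasks(10); // the number of mappers we want

        // MapClass is MultiFileWordCount's companion mapper (assumed here).
        conf.setMapperClass(MultiFileWordCount.MapClass.class);
        conf.setReducerClass(LongSumReducer.class);
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(LongWritable.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
      }
    }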