Re: Hadoop Installation

2008-11-25 Thread Steve Loughran
Mithila Nagendra wrote: Hey Steve I deleted whatever I needed to.. still no luck.. You said that the classpath might be messed up.. Is there some way I can reset it? For the root user? What path do I set it to? Let's start with: what kind of machine is this, Windows or Linux? If Linux,

Datanode log for errors

2008-11-25 Thread Taeho Kang
Hi, I have encountered some IOExceptions in the Datanode while some intermediate/temporary map-reduce data is being written to HDFS. 2008-11-25 18:27:08,070 INFO org.apache.hadoop.dfs.DataNode: writeBlock blk_-460494523413678075 received exception java.io.IOException: Block blk_-460494523413678075 is

Re: Getting Reduce Output Bytes

2008-11-25 Thread Sharad Agarwal
Is there an easy way to get Reduce Output Bytes? Reduce output bytes are not available directly, but they can perhaps be inferred from the filesystem read/write bytes counters.
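
A minimal sketch of that inference against the 0.18/0.19-era mapred API. The counter group and name used here ("FileSystemCounters", "HDFS_BYTES_WRITTEN") are assumptions that vary across Hadoop versions, so check the counters your own jobs actually report:

    // Hypothetical: approximate reduce output bytes from filesystem
    // counters after the job completes. Only valid if the reducers
    // write nothing to HDFS besides their final output.
    import org.apache.hadoop.mapred.Counters;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.RunningJob;

    public class ReduceOutputBytes {
      public static long run(JobConf conf) throws Exception {
        RunningJob job = JobClient.runJob(conf);   // blocks until done
        Counters counters = job.getCounters();
        // Counter group/name are version-dependent assumptions.
        return counters.findCounter("FileSystemCounters",
                                    "HDFS_BYTES_WRITTEN").getCounter();
      }
    }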

java.lang.OutOfMemoryError: Direct buffer memory

2008-11-25 Thread tim robertson
Hi all, I am doing a very simple Map that determines an integer value to assign to an input (1-64000). The reduction does nothing, but I then use this output formatter to put the data in a file per Key. public class CellBasedOutputFormat extends MultipleTextOutputFormat<WritableComparable,
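
For context, a minimal sketch of what an output format like the truncated one above typically looks like; the generic parameters and the file-naming scheme are guesses, not the poster's actual code:

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.WritableComparable;
    import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat;

    // One output file per key: records are routed by overriding
    // generateFileNameForKeyValue from the MultipleOutputFormat base.
    public class CellBasedOutputFormat
        extends MultipleTextOutputFormat<WritableComparable, Text> {
      @Override
      protected String generateFileNameForKeyValue(WritableComparable key,
                                                   Text value, String name) {
        return "cell_" + key.toString();   // hypothetical naming scheme
      }
    }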

Re: Getting Reduce Output Bytes

2008-11-25 Thread Paco NATHAN
Hi Lohit, Our team collects those kinds of measurements using this patch: https://issues.apache.org/jira/browse/HADOOP-4559 Some example Java code in the comments shows how to access the data, which is serialized as JSON. Looks like the red_hdfs_bytes_written value would give you that.

Re: Block placement in HDFS

2008-11-25 Thread Dhruba Borthakur
Hi Dennis, There were some discussions on this topic earlier: http://issues.apache.org/jira/browse/HADOOP-3799 Do you have any specific use-case for this feature? thanks, dhruba On Mon, Nov 24, 2008 at 10:22 PM, Owen O'Malley [EMAIL PROTECTED] wrote: On Nov 24, 2008, at 8:44 PM, Mahadev

Hadoop complex calculations

2008-11-25 Thread Chris Quach
Hi, I'm testing Hadoop to see if we could use it for complex calculations alongside the 'standard' implementation. I've set up a grid with 10 nodes, and if I run the RandomTextWriter example only 2 nodes are used as mappers, even though I specified that 10 mappers be used. The other nodes are used for storage,

Re: Getting Reduce Output Bytes

2008-11-25 Thread Lohit
Thanks Sharad and Paco. Lohit On Nov 25, 2008, at 5:34 AM, Paco NATHAN [EMAIL PROTECTED] wrote: Hi Lohit, Our team collects those kinds of measurements using this patch: https://issues.apache.org/jira/browse/HADOOP-4559 Some example Java code in the comments shows how to access the data,

Re: Hadoop Installation

2008-11-25 Thread Mithila Nagendra
Hey Steve The version is: Linux enpc3740.eas.asu.edu 2.6.9-67.0.20.EL #1 Wed Jun 18 12:23:46 EDT 2008 i686 i686 i386 GNU/Linux; this is what I got when I used the command uname -a. On Tue, Nov 25, 2008 at 1:50 PM, Steve Loughran [EMAIL PROTECTED] wrote: Mithila Nagendra wrote: Hey Steve I

Re: Hadoop Installation

2008-11-25 Thread Steve Loughran
Mithila Nagendra wrote: Hey Steve The version is: Linux enpc3740.eas.asu.edu 2.6.9-67.0.20.EL #1 Wed Jun 18 12:23:46 EDT 2008 i686 i686 i386 GNU/Linux; this is what I got when I used the command uname -a. On Tue, Nov 25, 2008 at 1:50 PM, Steve Loughran [EMAIL PROTECTED] wrote: Mithila Nagendra

Question about ChainMapper and ChainReducer

2008-11-25 Thread Tarandeep Singh
Hi, I would like to know how ChainMapper and ChainReducer save IO. The doc says the output of the first mapper becomes the input of the second, and so on. So does this mean the output of the first map is *not* written to HDFS, and a second map process is started that operates on the data generated by
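
A hedged sketch of how a chain is wired up with the 0.19 API; AMap and BMap here are made-up mappers, not from the thread. Records flow from AMap to BMap in memory inside the same map task (byValue=true passes copies, the safe choice), so nothing extra is materialized between them:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.*;
    import org.apache.hadoop.mapred.lib.ChainMapper;

    public class ChainExample {
      // First mapper in the chain: lower-cases each input line.
      public static class AMap extends MapReduceBase
          implements Mapper<LongWritable, Text, Text, Text> {
        public void map(LongWritable key, Text value,
                        OutputCollector<Text, Text> out, Reporter reporter)
            throws IOException {
          out.collect(new Text(value.toString().toLowerCase()), value);
        }
      }

      // Second mapper: consumes AMap's output directly, same task JVM.
      public static class BMap extends MapReduceBase
          implements Mapper<Text, Text, Text, Text> {
        public void map(Text key, Text value,
                        OutputCollector<Text, Text> out, Reporter reporter)
            throws IOException {
          out.collect(key, value);
        }
      }

      public static void configureChain(JobConf job) {
        ChainMapper.addMapper(job, AMap.class, LongWritable.class, Text.class,
            Text.class, Text.class, true, new JobConf(false));
        ChainMapper.addMapper(job, BMap.class, Text.class, Text.class,
            Text.class, Text.class, true, new JobConf(false));
      }
    }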

Lookup HashMap available within the Map

2008-11-25 Thread tim robertson
Hi all, If I want to have an in-memory lookup HashMap that is available in my Map class, where is the best place to initialise this, please? I have a shapefile with polygons, and I wish to create the polygon objects in memory on each node's JVM and have the map able to pull back the objects by id

Re: Lookup HashMap available within the Map

2008-11-25 Thread Alex Loddengaard
You should use the DistributedCache: http://www.cloudera.com/blog/2008/11/14/sending-files-to-remote-task-nodes-with-hadoop-mapreduce/ and http://hadoop.apache.org/core/docs/current/mapred_tutorial.html#DistributedCache Hope this helps! Alex On Tue, Nov 25, 2008 at 11:09 AM, tim robertson
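
A sketch of the submit-time half of the DistributedCache flow those links describe: ship a file that is already in HDFS out to every task node. The HDFS path here is a made-up example:

    import java.net.URI;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.mapred.JobConf;

    public class CacheSetup {
      public static void addShapefile(JobConf conf) throws Exception {
        // The file must already exist in HDFS; each task node
        // receives a local copy of it before the tasks start.
        DistributedCache.addCacheFile(
            new URI("/user/tim/shapes/polygons.shp"), conf);
      }
    }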

Re: Lookup HashMap available within the Map

2008-11-25 Thread tim robertson
Hi Thanks Alex - this will allow me to share the shapefile, but I need to read it, parse it and store the objects in the index only once per job per JVM. Is Mapper.configure() the best place to do this? I.e. will it only be called once per job? Thanks Tim On Tue, Nov 25, 2008 at 8:12

Re: Lookup HashMap available within the Map

2008-11-25 Thread Doug Cutting
tim robertson wrote: Thanks Alex - this will allow me to share the shapefile, but I need to read it, parse it and store the objects in the index only once per job per JVM. Is Mapper.configure() the best place to do this? I.e. will it only be called once per job? In 0.19, with
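
One way to reconcile "configure() runs once per task" with "parse only once per JVM" is a static guard, sketched below under the assumption that the shapefile was shipped via the DistributedCache; parseShapefile() is a placeholder, not a real API:

    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;

    // Base class for a mapper needing the lookup map; a real mapper
    // would extend this and implement Mapper.
    public class PolygonLookup extends MapReduceBase {
      private static Map<String, Object> polygonsById;   // shared per JVM

      @Override
      public void configure(JobConf job) {
        synchronized (PolygonLookup.class) {
          if (polygonsById == null) {          // only the first task pays
            try {
              Path[] cached = DistributedCache.getLocalCacheFiles(job);
              polygonsById = parseShapefile(cached[0]);
            } catch (IOException e) {
              throw new RuntimeException("failed to load shapefile", e);
            }
          }
        }
      }

      // Placeholder: real code would parse the local shapefile copy
      // into an id -> polygon map here.
      private static Map<String, Object> parseShapefile(Path local) {
        return new HashMap<String, Object>();
      }
    }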

Re: Lookup HashMap available within the Map

2008-11-25 Thread tim robertson
Hi Doug, Thanks - it is not so much that I want to run in a single JVM - I do want a bunch of machines doing the work; it is just that I want them all to have this in-memory lookup index, configured once per job. Is there some hook somewhere that I can use to trigger a read from the distributed cache, or

Re: Block placement in HDFS

2008-11-25 Thread Pete Wyckoff
FYI - Owen is referring to: https://issues.apache.org/jira/browse/HADOOP-2559 On 11/24/08 10:22 PM, Owen O'Malley [EMAIL PROTECTED] wrote: On Nov 24, 2008, at 8:44 PM, Mahadev Konar wrote: Hi Dennis, I don't think that is possible to do. No, it is not possible. The block placement

Re: Lookup HashMap available within the Map

2008-11-25 Thread tim robertson
Thanks Chris, I have a different test running; then I will implement that. Might give Cascading a shot for what I am doing. Cheers Tim On Tue, Nov 25, 2008 at 9:24 PM, Chris K Wensel [EMAIL PROTECTED] wrote: Hey Tim The .configure() method is what you are looking for, I believe. It is

Re: Lookup HashMap available within the Map

2008-11-25 Thread Chris K Wensel
Cool. If you need a hand with Cascading stuff, feel free to ping me on the mailing list or #cascading IRC; lots of other friendly folk there already. ckw On Nov 25, 2008, at 12:35 PM, tim robertson wrote: Thanks Chris, I have a different test running; then I will implement that. Might give

Problems running TestDFSIO to a non-default directory

2008-11-25 Thread Joel Welling
Hi Konstantin (et al.); A while ago you gave me the following trick to run TestDFSIO to an output directory other than the default: just use -Dtest.build.data=/output/dir to pass the new directory to the executable. I recall this working, but it is failing now under 0.18.1, and looking at it I

64 bit namenode and secondary namenode 32 bit datanode

2008-11-25 Thread Sagar Naik
I am trying to migrate from a 32-bit JVM to a 64-bit JVM for the namenode only. *Setup:* NN - 64-bit; secondary namenode (instance 1) - 64-bit; secondary namenode (instance 2) - 32-bit; datanode - 32-bit. From the mailing list I deduced that NN 64-bit and datanode 32-bit

Re: 64 bit namenode and secondary namenode 32 bit datanode

2008-11-25 Thread Allen Wittenauer
On 11/25/08 3:58 PM, Sagar Naik [EMAIL PROTECTED] wrote: I am trying to migrate from a 32-bit JVM to a 64-bit JVM for the namenode only. *Setup:* NN - 64-bit; secondary namenode (instance 1) - 64-bit; secondary namenode (instance 2) - 32-bit; datanode - 32-bit

Re: 64 bit namenode and secondary namenode 32 bit datanode

2008-11-25 Thread lohit
I might be wrong, but my assumption is that running the SN in either 64- or 32-bit mode shouldn't matter. But I am curious how two instances of the secondary namenode are set up; will both of them talk to the same NN and run in parallel? What are the advantages here? Wondering if there are chances of image corruption.

Re: 64 bit namenode and secondary namenode 32 bit datanode

2008-11-25 Thread Sagar Naik
lohit wrote: I might be wrong, but my assumption is that running the SN in either 64- or 32-bit mode shouldn't matter. But I am curious how two instances of the secondary namenode are set up; will both of them talk to the same NN and run in parallel? What are the advantages here? I just have multiple entries master

Re: 64 bit namenode and secondary namenode 32 bit datanode

2008-11-25 Thread lohit
Well, if I think about it, image corruption might not happen, since each checkpoint initiation would have a unique number. I was just wondering what would happen in this case. Consider this scenario: Time 1 -- SN1 asks NN for the image and edits to merge; Time 2 -- SN2 asks NN for the image and edits to merge; Time 2

Filesystem closed errors

2008-11-25 Thread Bryan Duxbury
I have an app that runs for a long time with no problems, but when I signal it to shut down, I get errors like this: java.io.IOException: Filesystem closed at org.apache.hadoop.dfs.DFSClient.checkOpen(DFSClient.java:196) at

Re: Block placement in HDFS

2008-11-25 Thread Hyunsik Choi
Hi All, I am trying to divide some data into partitions explicitly (like regions in HBase). I wonder whether the following way to do it is the best method. For example, when we assume a block size of 64MB, is the file portion corresponding to 0~63MB allocated to the first block? I have three questions as follows: Is the

HDFS directory listing from the Java API?

2008-11-25 Thread Shane Butler
Hi all, Can someone please guide me on how to get a directory listing of files on HDFS using the Java API (0.19.0)? Regards, Shane
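
A straightforward way to do this with 0.19's FileSystem API; the directory path is an example:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ListDir {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration(); // reads hadoop-site.xml
        FileSystem fs = FileSystem.get(conf);
        // listStatus returns one FileStatus per directory entry.
        for (FileStatus stat : fs.listStatus(new Path("/user/shane"))) {
          System.out.println(stat.getPath() + (stat.isDir() ? " (dir)" : ""));
        }
      }
    }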

Re: Filesystem closed errors

2008-11-25 Thread David B. Ritch
Do you have speculative execution enabled? I've seen error messages like this caused by speculative execution. David Bryan Duxbury wrote: I have an app that runs for a long time with no problems, but when I signal it to shut down, I get errors like this: java.io.IOException: Filesystem

Re: Filesystem closed errors

2008-11-25 Thread Hong Tang
Does your code ever call fs.close()? If so, https://issues.apache.org/jira/browse/HADOOP-4655 might be relevant to your problem. On Nov 25, 2008, at 9:07 PM, David B. Ritch wrote: Do you have speculative execution enabled? I've seen error messages like this caused by speculative execution.
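
The hazard behind that question, sketched under the assumption that fs.default.name points at HDFS: FileSystem.get() hands back a cached instance shared across the JVM, so one caller's close() invalidates everyone else's handle:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class SharedFsHazard {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs1 = FileSystem.get(conf);
        FileSystem fs2 = FileSystem.get(conf); // same cached object as fs1
        fs1.close();                           // closes it for fs2 too
        fs2.exists(new Path("/"));             // "Filesystem closed"
      }
    }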

How to retrieve rack ID of a datanode

2008-11-25 Thread Ramya R
Hi all, I want to retrieve the Rack ID of every datanode. How can I do this? I tried using getNetworkLocation() in org.apache.hadoop.hdfs.protocol.DatanodeInfo. I am getting /default-rack as the output for all datanodes. Any advice? Thanks in advance, Ramya

Re: How to retrieve rack ID of a datanode

2008-11-25 Thread Amar Kamat
Ramya R wrote: Hi all, I want to retrieve the Rack ID of every datanode. How can I do this? I tried using getNetworkLocation() in org.apache.hadoop.hdfs.protocol.DatanodeInfo. I am getting /default-rack as the output for all datanodes. Have you set up the cluster to be rack-aware?

Re: 64 bit namenode and secondary namenode 32 bit datanode

2008-11-25 Thread Dhruba Borthakur
The design is such that running multiple secondary namenodes should not corrupt the image (modulo any bugs). Are you seeing image corruptions when this happens? You can run all or any daemons in 32-bit or 64-bit mode; you can mix and match. If you have many millions of files, then you might

RE: How to retrieve rack ID of a datanode

2008-11-25 Thread Ramya R
Hi Lohit, I have not set up the datanodes to tell the namenode which rack they belong to. Can you please tell me how to do it? Is it via setNetworkLocation()? My intention is to kill the datanodes in a given rack, so it would be useful even if I could obtain the subnet each datanode belongs to. Thanks

Switching to HBase from HDFS

2008-11-25 Thread Shimi K
I have a system which uses HDFS to store files on multiple nodes. On each HDFS node machine I have another application which reads the local files. Until now my system worked only with files, HDFS seemed like the right solution, and everything worked fine. Now I need to save additional information

Re: How to retrieve rack ID of a datanode

2008-11-25 Thread Yi-Kai Tsai
hi Ramya Set up topology.script.file.name in your hadoop-site.xml and provide the script. Check http://hadoop.apache.org/core/docs/current/cluster_setup.html, the Hadoop Rack Awareness section. Hi Lohit, I have not set up the datanodes to tell the namenode which rack they belong to. Can you please tell me
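
Concretely, that configuration looks something like the following; the script path and the mapping policy (rack by third IP octet) are illustrative assumptions, not from the thread:

    <property>
      <name>topology.script.file.name</name>
      <value>/path/to/rack-map.sh</value>
    </property>

and a toy script — it is invoked with datanode IPs/hostnames as arguments and must print one rack path per argument:

    #!/bin/bash
    # rack-map.sh: map each argument to a rack by its third IP octet.
    for host in "$@"; do
      subnet=$(echo "$host" | cut -d. -f3)
      echo "/rack-${subnet:-default}"
    done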

Re: Switching to HBase from HDFS

2008-11-25 Thread Yi-Kai Tsai
Hi Shimi HBase (or BigTable) is a sparse, distributed, persistent, multidimensional sorted map; Jim R. Wilson has an excellent article for understanding it: http://jimbojw.com/wiki/index.php?title=Understanding_HBase_and_BigTable I have a system which uses HDFS to store files on multiple

How do we get an old version of Hadoop

2008-11-25 Thread Rashid Ahmad
Dear Friends, How do we get an old version of Hadoop? -- Regards, Rashid Ahmad

how can I decommission nodes on-the-fly?

2008-11-25 Thread Jeremy Chow
Hi list, I added a property dfs.hosts.exclude to my conf/hadoop-site.xml, then refreshed my cluster with the command bin/hadoop dfsadmin -refreshNodes. It showed that this can only shut down the DataNode process, but not the TaskTracker process, on each slave specified in

Re: how can I decommission nodes on-the-fly?

2008-11-25 Thread Amareshwari Sriramadasu
Jeremy Chow wrote: Hi list, I added a property dfs.hosts.exclude to my conf/hadoop-site.xml, then refreshed my cluster with the command bin/hadoop dfsadmin -refreshNodes. It showed that this can only shut down the DataNode process, but not the TaskTracker process, on each
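
For reference, the DFS-side exclusion described in the thread looks roughly like this in hadoop-site.xml; the file path is a placeholder. The JobTracker honors an analogous mapred.hosts.exclude property for TaskTrackers, though how and when that list is re-read varies by version, so check the docs for your release:

    <property>
      <name>dfs.hosts.exclude</name>
      <!-- placeholder path: a file listing hosts to decommission -->
      <value>/path/to/excludes</value>
    </property>

    <property>
      <name>mapred.hosts.exclude</name>
      <value>/path/to/excludes</value>
    </property>

After editing the exclude file, bin/hadoop dfsadmin -refreshNodes (as used in the thread) tells the NameNode to re-read the DFS list.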