where is the hdfs api doc?

2009-11-18 Thread Y G
Hi guys, I want to take a look at the HDFS API doc but I cannot find it in the dist docs dir. There is no org.apache.hadoop.hdfs API doc. I think maybe I missed something. -- Sent from my mobile device - Be happy every day, stay healthy

Re: showing hadoop status UI

2009-11-18 Thread Steve Loughran
Mark N wrote: I want to show the status of M/R jobs on a user interface. Should I read the default Hadoop counters to display some kind of map/reduce task status? I could read the status of map/reduce tasks using JobClient (Hadoop default counters). I can then have a Java web service exposing these
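
A minimal sketch of the approach Mark describes, assuming the old org.apache.hadoop.mapred JobClient API (0.19/0.20): poll the jobs that are still running and print their progress and built-in counters. The web-service layer on top is left out, and the class name here is only illustrative.

    import org.apache.hadoop.mapred.Counters;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.JobStatus;
    import org.apache.hadoop.mapred.RunningJob;

    public class JobStatusReader {
        public static void main(String[] args) throws Exception {
            JobClient client = new JobClient(new JobConf());
            // jobsToComplete() returns the jobs that are still queued or running.
            for (JobStatus status : client.jobsToComplete()) {
                RunningJob job = client.getJob(status.getJobID());
                System.out.printf("%s  map %.0f%%  reduce %.0f%%%n",
                        status.getJobID(),
                        job.mapProgress() * 100,
                        job.reduceProgress() * 100);
                // The built-in counters (records read/written, spilled records, ...) come with the job.
                Counters counters = job.getCounters();
                System.out.println(counters);
            }
        }
    }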

Re: How to handle imbalanced data in hadoop ?

2009-11-18 Thread Amogh Vasekar
Hi, this is the time for all three phases of the reducer, right? I think it's due to the constant spilling to disk for a single key, since the map partitions couldn't be held in memory due to the buffer limit. Did the other reducers have numerous keys with a low number of values (i.e. smaller partitions)? Thanks

Re: new MR API: MultipleOutputFormat

2009-11-18 Thread Amogh Vasekar
MultipleOutputFormat and MOS are to be merged: http://issues.apache.org/jira/browse/MAPREDUCE-370 Amogh On 11/18/09 12:03 PM, "Y G" wrote: In the old MR API there is a MultipleOutputFormat class which I can use to customize the reduce output file name. It's very useful for me, but I can't find it in the new API.
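
For reference, a minimal sketch of the old-API way to customize reduce output file names by subclassing org.apache.hadoop.mapred.lib.MultipleTextOutputFormat; the KeyBasedOutputFormat name and the key-per-directory layout are only illustrative. The driver would then select it with conf.setOutputFormat(KeyBasedOutputFormat.class); MAPREDUCE-370 tracks bringing the equivalent to the new API.

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat;

    // Writes each reduce output record into a directory named after its key,
    // e.g. records with key "2009-11-18" end up under 2009-11-18/part-00000.
    public class KeyBasedOutputFormat extends MultipleTextOutputFormat<Text, Text> {

        @Override
        protected String generateFileNameForKeyValue(Text key, Text value, String name) {
            // "name" is the default leaf name such as "part-00000"; prefix it with the key.
            return key.toString() + "/" + name;
        }
    }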

Re: new MR API: MultipleOutputFormat

2009-11-18 Thread Y G
Thank you, Amogh. I will patch it. - Be happy every day, stay healthy Sent from Nanjing, Jiangsu, China Stephen Leacock - "I detest life-insurance agents: they always argue that I shall some day die, which is not so." - http://www.brainyquote.com/quotes/authors/s/stephen_leacock.html 2009/11/18 Amogh Vasekar : >

Re: where is the hdfs api doc?

2009-11-18 Thread bharath v
Look in {hadoop-home}/docs/java/ ! On Wed, Nov 18, 2009 at 3:26 PM, Y G wrote: > Hi guys > I want to take a look at the HDFS API doc but I cannot find it in the > dist docs dir. There is no org.apache.hadoop.hdfs API doc. I think > maybe I missed something. > > -- > Sent from my mobile device > > - > Be happy every day

Re: where is the hdfs api doc?

2009-11-18 Thread Y G
Hi bharath, my version is 0.20.1 and there is no {hadoop-home}/docs/java/ dir. I also cannot find the HDFS API in hadoop-0.20.1\docs\api\org\apache\hadoop - Be happy every day, stay healthy Sent from Nanjing, Jiangsu, China Marie von Ebner-Eschenbach

Re: New graphic interface for Hadoop - Contains: FileManager, Daemon Admin, Quick Stream Job Setup, etc

2009-11-18 Thread Ed Kohlwey
The tool looks interesting. You should consider providing the source for it. Is it written in a language that can run on platforms besides Windows? On Nov 17, 2009 10:40 AM, "Cubic" wrote: Hi list. This tool is a graphic interface for Hadoop. It may improve your productivity quite a bit, especi

Re: Hadoop 0.19.2 and Ganglia 3.1.3

2009-11-18 Thread John Martyniak
So if I roll back to Ganglia 3.0.x, does this problem go away and everything should work? -John On Nov 17, 2009, at 9:08 PM, Brian Bockelman wrote: Hey John, You need the latest version of this patch for the 0.19.x branch: http://issues.apache.org/jira/browse/HADOOP-4675 Sadly, the patch i

java.io.FileNotFoundException: File file:/tmp/.../job.xml does not exist

2009-11-18 Thread Christoph
Hi all, I'm hitting the same issue as first described in: http://mail-archives.apache.org/mod_mbox/hadoop-core-user/200905.mbox/%3cbd0e4dee0905181624o42c48404l9bcf8678e62a0...@mail.gmail.com%3e But there is no answer yet... I'm using Hadoop 0.2 and receive the following when I try to run a ma

names or ips in rack awareness script?

2009-11-18 Thread David J. O'Dell
I'm trying to figure out if I should use IP addresses or DNS names in my rack awareness script. It's easier for me to use DNS names because we have the row and rack number in the name, which means I can dynamically determine the rack without having to manually update the list when adding nodes.

Re: names or ips in rack awareness script?

2009-11-18 Thread Michael Thomas
IPs are passed to the rack awareness script. We use 'dig' to do the reverse lookup to find the hostname, as we also embed the rack id in the worker node hostnames. --Mike On 11/18/2009 08:20 AM, David J. O'Dell wrote: I'm trying to figure out if I should use IP addresses or DNS names in my r
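
A sketch of the same idea done in Java instead of an external script, assuming the org.apache.hadoop.net.DNSToSwitchMapping interface configured via the topology.node.switch.mapping.impl property; the hostname pattern ("node-r12-03") is hypothetical, not Mike's actual naming scheme.

    import java.net.InetAddress;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.hadoop.net.DNSToSwitchMapping;

    // Maps datanode addresses to rack paths by reverse-resolving each IP to a
    // hostname and parsing a rack id embedded in the name, e.g. "node-r12-03" -> /r12.
    public class HostnameRackMapping implements DNSToSwitchMapping {

        public List<String> resolve(List<String> names) {
            List<String> racks = new ArrayList<String>(names.size());
            for (String name : names) {
                try {
                    // The framework may hand over an IP or a hostname; normalise to a hostname first.
                    String host = InetAddress.getByName(name).getCanonicalHostName();
                    String[] parts = host.split("-");
                    racks.add(parts.length > 1 ? "/" + parts[1] : "/default-rack");
                } catch (Exception e) {
                    racks.add("/default-rack");
                }
            }
            return racks;
        }
    }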

Re: New graphic interface for Hadoop - Contains: FileManager, Daemon Admin, Quick Stream Job Setup, etc

2009-11-18 Thread CubicDesign
Ed Kohlwey wrote: Is it written in a language that can run on platforms besides Windows? The language is Lazarus, so it can run on Win, Mac, Linux. On Windows it will need something like PuTTY or any other SSH client.

Re: java.io.FileNotFoundException: File file:/tmp/.../job.xml does not exist

2009-11-18 Thread Ahmad Ali Iqbal
Dear Christoph, my understanding is that this error is due to directory access. For simplicity I would suggest that you set up your namenode and datanode from the Hadoop main directory and then run the example. -- Ahmad Ali Iqbal http://member.acm.org/~ahmad.iqbal On Thu, Nov 19, 2009 at 3:14 AM, Christop

Re: names or ips in rack awareness script?

2009-11-18 Thread Edward Capriolo
On Wed, Nov 18, 2009 at 11:28 AM, Michael Thomas wrote: > IPs are passed to the rack awareness script. We use 'dig' to do the reverse > lookup to find the hostname, as we also embed the rack id in the worker node > hostnames. > > --Mike > > On 11/18/2009 08:20 AM, David J. O'Dell wrote: >> >> I'm

how to run the mapreduce job from localfilesystem but in hadoop cluster(fully distributed mode)

2009-11-18 Thread Roshan Karki
Hi, I am Roshan and I have set up Hadoop in fully distributed mode (a Hadoop cluster). The default file system is HDFS. I have a two-node cluster. The MapReduce job works fine when I give the input file from an HDFS location, and the output is also generated in HDFS when running the WordCount example. My h
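
One common approach (a sketch, not a confirmed answer from the list): point the input path at a file:// URI so the job reads from the local filesystem while still running on the cluster. The paths and driver name below are hypothetical, and the caveat in the comments matters on a multi-node cluster.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class LocalInputDriver {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = new Job(conf, "wordcount-local-input");
            job.setJarByClass(LocalInputDriver.class);
            // ... mapper/reducer/output types set as in the normal WordCount example ...

            // A file:// URI reads straight from the local filesystem instead of HDFS.
            // Caveat: on a multi-node cluster this path must be readable on every node
            // that runs a map task (e.g. an NFS mount); otherwise copy the data into HDFS first.
            FileInputFormat.addInputPath(job, new Path("file:///home/roshan/input"));
            FileOutputFormat.setOutputPath(job, new Path("output"));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }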

execute multiple MR jobs

2009-11-18 Thread Gang Luo
Hi all, I am going to execute multiple MapReduce jobs in sequence, but whether or not to execute a job in that sequence cannot be determined beforehand; it depends on the result of the previous job. Does anyone have ideas on how to do this "dynamically"? P.S. I guess Cascading could he

Re: How to handle imbalanced data in hadoop ?

2009-11-18 Thread Runping Qi
Is it true that most of the 17 minutes for the reducer with the 10 identical keys was taken by the sort phase? If so, that means the sorting algorithm does not handle this special case well. On Wed, Nov 18, 2009 at 11:16 AM, Pankil Doshi wrote: > Hey Todd, > > I will attach dataset and java

Join Documentation Correct?

2009-11-18 Thread Edmund Kohlwey
I'm using Cloudera's distribution for Hadoop 0.20.1 + 133. The javadocs for package org.apache.hadoop.mapred.join state: "For a given key, each operation will consider the cross product of all values for all sources at that node." I'm doing an inner join between two tables with a text key. One
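
For context, a sketch of how such a map-side inner join is typically wired up with org.apache.hadoop.mapred.join, assuming two sorted, identically-partitioned text inputs; the class name and paths are hypothetical, not Edmund's actual job.

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.KeyValueTextInputFormat;
    import org.apache.hadoop.mapred.join.CompositeInputFormat;

    public class InnerJoinSetup {
        public static void configure(JobConf job, Path left, Path right) {
            job.setInputFormat(CompositeInputFormat.class);
            // Build the join expression: an inner join of the two sorted,
            // identically-partitioned inputs, each read as key/TAB/value text.
            job.set("mapred.join.expr",
                    CompositeInputFormat.compose("inner",
                            KeyValueTextInputFormat.class, left, right));
            // The mapper then receives (key, TupleWritable) pairs, where the tuple
            // holds the matching value(s) from each input source for that key.
        }
    }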

SVG Logo?

2009-11-18 Thread Edmund Kohlwey
Does anyone know if the hadoop logo is available as an SVG?

Re: names or ips in rack awareness script?

2009-11-18 Thread Allen Wittenauer
On 11/18/09 10:02 AM, "Edward Capriolo" wrote: > It was never clear to me what would be needed, IP vs hostname. I > specified IPs, short hostnames, and long hostnames just to be safe. And > you know things sometimes change with Hadoop ::wink-wink:: IIRC, everything is pretty much passed around as I

Re: How to handle imbalanced data in hadoop ?

2009-11-18 Thread Pankil Doshi
Yeah, that's true. Though it depends on my cluster configuration, the other reducers (0 to 8) also have 10 keys each to handle, in which the keys are different, and they get done in 1 min 30 sec on average. But the 9th reducer, which gets all 10 keys the same, takes 17 mins. Pankil On Wed, Nov 18, 2009 at 3:34 PM
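
A common mitigation for this kind of skew (a sketch of the general technique, not the partitioner actually used in this thread): spread the hot key's records over all reducers by partitioning on the value as well, then combine the partial results in a follow-up job. The hot key string below is a placeholder.

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.Partitioner;

    // Old-API partitioner: records for one known "hot" key are spread across every
    // reducer (so no single reducer does all the work); everything else hashes as usual.
    // The reducer output for the hot key then needs a second aggregation pass.
    public class SkewAwarePartitioner implements Partitioner<Text, Text> {
        private static final String HOT_KEY = "the-one-key-with-huge-fanout"; // placeholder

        public void configure(JobConf job) { }

        public int getPartition(Text key, Text value, int numPartitions) {
            if (key.toString().equals(HOT_KEY)) {
                // Spread the hot key's values across all reducers.
                return (value.hashCode() & Integer.MAX_VALUE) % numPartitions;
            }
            // Everything else behaves like the default HashPartitioner.
            return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
        }
    }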

Re: How to handle imbalanced data in hadoop ?

2009-11-18 Thread Todd Lipcon
Hi Pankil, Thanks for sending these along. I'll try to block out some time this week to take a look. -Todd On Wed, Nov 18, 2009 at 11:16 AM, Pankil Doshi wrote: > Hey Todd, > > I will attach dataset and java source used by me. Make sure you use with 10 > reducers and also use partitioner clas

Fw: Alternative distributed filesystem.

2009-11-18 Thread Reshu Jain
Hi I wanted to propose IBM's Global Parallel File System™ (GPFS™ ) as the distributed filesystem. GPFS™ is well known for its unmatched scalability and leadership in file system performance, and it is now IBM’s premier storage virtualization solution. More information at http://www-03.ibm.com/s

Re: Hadoop 0.19.2 and Ganglia 3.1.3

2009-11-18 Thread Y G
Yes. 2009/11/18, John Martyniak : > So if I rollback to Ganglia 3.0.x does this problem go away, and > everything should work? > > -John > > On Nov 17, 2009, at 9:08 PM, Brian Bockelman wrote: > >> Hey John, >> >> You need the latest version of this patch for the 0.19.x branch: >> >> http://issues.

Re: execute multiple MR jobs

2009-11-18 Thread Amogh Vasekar
Hi, the JobClient (.18) / Job (.20) class APIs should help you achieve this. Amogh On 11/19/09 1:40 AM, "Gang Luo" wrote: HI all, I am going to execute multiple mapreduce jobs in sequence, but whether or not to execute a job in that sequence could not be determined beforehand, but depend on the r
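
A hedged sketch of what Amogh suggests with the 0.20 Job API: run the first job, block on waitForCompletion, then inspect its result (here, a counter) to decide whether the next job runs. The buildFirstJob/buildSecondJob helpers and the counter names are placeholders.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class ConditionalChain {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();

            Job first = buildFirstJob(conf);          // placeholder: configure job 1
            if (!first.waitForCompletion(true)) {     // blocks until job 1 finishes
                System.exit(1);
            }

            // Inspect job 1's outcome, e.g. a counter it incremented, to decide what runs next.
            long interesting = first.getCounters()
                    .findCounter("myapp", "RECORDS_NEEDING_SECOND_PASS").getValue();

            if (interesting > 0) {
                Job second = buildSecondJob(conf);    // placeholder: configure job 2
                System.exit(second.waitForCompletion(true) ? 0 : 1);
            }
        }

        private static Job buildFirstJob(Configuration conf) throws Exception {
            Job job = new Job(conf, "first-pass");
            // ... set mapper, reducer, input/output paths ...
            return job;
        }

        private static Job buildSecondJob(Configuration conf) throws Exception {
            Job job = new Job(conf, "second-pass");
            // ... set mapper, reducer, input/output paths ...
            return job;
        }
    }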

Re: where is the hdfs api doc?

2009-11-18 Thread Y G
I checked the src build.xml and found that the HDFS API docs are not built by the general public-release javadoc target; they are only generated by the ant javadoc-dev target. So for HDFS development they need to be generated manually. - Be happy every day, stay healthy Ogden Nash - "The trouble