Re: Percent progress of map/reduce in JobClient

2008-06-04 Thread Tanton Gibbs
Doesn't seem to be, to me. Seems to be an indicator of records On Wed, Jun 4, 2008 at 5:53 PM, Daniel Blaisdell <[EMAIL PROTECTED]> wrote: > Is the map progress indicator computed as a percentage of maps completed? > > -Daniel > > On Wed, Jun 4, 2008 at 6:51 PM, Tanton Gibbs <[EMAIL PROTECTED]>

Re: Monthly user group meeting

2008-06-04 Thread Otis Gospodnetic
Hi, Any chance the videos will be taken *and* made available outside Yahoo? The videos from the Hadoop summit are still not available: http://developer.yahoo.com/blogs/hadoop/2008/04/hadoop_summit_slides_and_video.html And at this point it looks like they never will be available :( Thanks, Ot

Re: [core-user] Help deflating output files

2008-06-04 Thread Jim R. Wilson
Has someone already written a generic deflator program? It would be a great util to add to the core :) -- Jim On Wed, Jun 4, 2008 at 7:27 PM, Runping Qi <[EMAIL PROTECTED]> wrote: > > You can run another map-only job to read and convert the deflated files and > write them out in the format you want.

RE: [core-user] Help deflating output files

2008-06-04 Thread Runping Qi
You can run another map-only job to read and convert the deflated files and write them out in the format you want. Runping > -----Original Message----- > From: Jim R. Wilson [mailto:[EMAIL PROTECTED] > Sent: Wednesday, June 04, 2008 4:13 PM > To: core-user@hadoop.apache.org > Subject: [core-user] H

Re: compressed/encrypted file

2008-06-04 Thread Parand Darugar
- Original Message - From: [EMAIL PROTECTED] <[EMAIL PROTECTED]> To: core-user@hadoop.apache.org Sent: Wed Jun 04 15:06:42 2008 Subject: Re: compressed/encrypted file You can compress / decompress at many points: --prior to mapping --after mapping --after reducing (I've been experim

[core-user] Help deflating output files

2008-06-04 Thread Jim R. Wilson
Hi all, I'm using hadoop-streaming to execute Python jobs in an EC2 cluster. The output directory in HDFS has part-0.deflate files - how can I deflate them back into regular text? In my hadoop-site.xml, I unfortunately have: mapred.output.compress true mapred.output.compression.type
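One way to turn such a file back into plain text outside Hadoop is sketched below. It assumes the .deflate part files were written by Hadoop's default codec and are therefore ordinary zlib streams; that is an assumption, not something the thread confirms. The function name inflate_stream is hypothetical.

```python
import zlib


def inflate_stream(src, dst, chunk_size=64 * 1024):
    """Decompress a zlib stream from the binary file object src into dst.

    Assumption: the input is a single zlib stream, which is what the
    .deflate part files are believed to contain.
    """
    decompressor = zlib.decompressobj()
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        # Decompress incrementally so large part files need not fit in memory.
        dst.write(decompressor.decompress(chunk))
    # Emit any bytes still buffered inside the decompressor.
    dst.write(decompressor.flush())


# Possible use as a filter (not executed here):
#   import sys
#   inflate_stream(sys.stdin.buffer, sys.stdout.buffer)
```

Used as a filter, this could sit at the end of a pipe that cats a part file out of HDFS. For many files, the map-only job suggested elsewhere in the thread is probably the better fit.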

Re: Percent progress of map/reduce in JobClient

2008-06-04 Thread Daniel Blaisdell
Is the map progress indicator computed as a percentage of maps completed? -Daniel On Wed, Jun 4, 2008 at 6:51 PM, Tanton Gibbs <[EMAIL PROTECTED]> wrote: > From what I've read, there are three reduce phases 1. copy 2. sort 3. > reduce > From 0 - 33% is the copy phase. I guess if you don't need

Re: compressed/encrypted file

2008-06-04 Thread Arun C Murthy
Haijun, On Jun 4, 2008, at 3:45 PM, Haijun Cao wrote: Mile, Thanks. "If your inputs to maps are compressed, then you don't get any automatic assignment of mappers to your data: each gzipped file gets assigned a mapper." <--- this is the case I am talking about. With the current compres

Re: Percent progress of map/reduce in JobClient

2008-06-04 Thread Tanton Gibbs
From what I've read, there are three reduce phases: 1. copy 2. sort 3. reduce. From 0 - 33% is the copy phase. I guess if you don't need that phase it could skip this completely. After 33%, it waits until it is done sorting before outputting status again at 66%, then it updates regularly during th
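The three-phase description above can be sketched as simple arithmetic. This only models what the message describes (each phase contributing one third of the reported percentage); it is not taken from Hadoop's source, and the names PHASES and reduce_progress are hypothetical.

```python
# Phases in the order the message lists them; each is assumed to
# account for one third of the overall reduce progress.
PHASES = ("copy", "sort", "reduce")


def reduce_progress(phase, fraction_done):
    """Return overall reduce progress in [0, 1].

    phase is one of PHASES; fraction_done is how far that phase has
    got, from 0.0 to 1.0.
    """
    if not 0.0 <= fraction_done <= 1.0:
        raise ValueError("fraction_done must be between 0.0 and 1.0")
    index = PHASES.index(phase)  # raises ValueError for an unknown phase
    return (index + fraction_done) / len(PHASES)
```

Under this model a finished copy phase reports about 33%, a finished sort about 66%, and only the last phase moves the figure from 66% to 100%.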

RE: compressed/encrypted file

2008-06-04 Thread Haijun Cao
Mile, Thanks. "If your inputs to maps are compressed, then you don't get any automatic assignment of mappers to your data: each gzipped file gets assigned a mapper." <--- this is the case I am talking about. Haijun -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] O

Re: compressed/encrypted file

2008-06-04 Thread Miles Osborne
You can compress / decompress at many points: --prior to mapping --after mapping --after reducing (I've been experimenting with all these options; we have been crawling blogs every day since Feb and we store on DFS compressed sets of posts) If your inputs to maps are compressed, then you don't

Monthly user group meeting

2008-06-04 Thread Ajay Anand
The next user group meeting is scheduled for June 18th from 6-7:30 pm at the Yahoo! Mission College campus (2821 Mission College, Santa Clara). Registration, driving directions etc are at http://upcoming.yahoo.com/event/760573/ Agenda: 1) Hadoop at Facebook, Hive - Jeff Hammerbacher 2)

compressed/encrypted file

2008-06-04 Thread Haijun Cao
If a file is compressed and encrypted, then is it still possible to split it and run mappers in parallel? Do people compress their files stored in hadoop? If yes, how do you go about processing them in parallel? Thanks Haijun

Percent progress of map/reduce in JobClient

2008-06-04 Thread Stuart Sierra
How does Hadoop decide when to update the "percent complete" for map/reduce tasks? I've been running a small job (~150 MB) on a pseudo-distributed cluster. "bin/hadoop jar" prints: 08/06/04 17:02:16 INFO mapred.JobClient: map 0% reduce 0% 08/06/04 17:05:52 INFO mapred.JobClient: map 100% reduc

Re: confusing about decommission in HDFS

2008-06-04 Thread lohit
The 3 steps you mentioned, were they done while the namenode was still running? I think (I might be wrong as well) that the config is read only once, when the namenode is started. So, you should have defined the dfs.hosts.exclude file beforehand. When you want to refresh, you just updated the file alr

Re: Stackoverflow

2008-06-04 Thread Chris Douglas
The pivot selection is the median of the first, middle, and last elements; it should be the best choice for sorted data. It's still possible to pick bad pivots, but data that forces hundreds of consecutive bad pivot selections should be exceedingly rare. -C On Jun 4, 2008, at 9:24 AM, Doug
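As a rough illustration of the pivot rule described above, here is a minimal median-of-three sketch. It is not Hadoop's QuickSort code; the function name and the inclusive lo/hi convention are assumptions made for the example.

```python
def median_of_three_index(items, lo, hi):
    """Return the index of the median of items[lo], items[mid], items[hi].

    lo and hi are inclusive bounds of the range being partitioned.
    """
    mid = (lo + hi) // 2
    # Pair each candidate value with its index, then take the middle pair.
    candidates = sorted([(items[lo], lo), (items[mid], mid), (items[hi], hi)])
    return candidates[1][1]
```

On already sorted input the middle element is always chosen, which splits the range evenly. That is why this rule behaves well on sorted data, while adversarial inputs can still produce bad pivots.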

Re: Upgrade from 0.16.3 to 0.17.0

2008-06-04 Thread Tanton Gibbs
1) Yes, that is normal. You have to manually finalize the upgrade. 2) Probably, because (as I understand it), it keeps a backup of the pre-upgraded state. 3) you can use hadoop dfsadmin -finalizeUpgrade to finalize it. See here: http://wiki.apache.org/hadoop/Hadoop_Upgrade 4) I assume the finaliz

checking per-node health (jobs, tasks, failures)?

2008-06-04 Thread Meng Mao
I'm trying to implement Nagios health monitoring of a Hadoop grid. If anyone has general tips to share, those would be welcome, too. For those who don't know, Nagios is monitoring software that organizes and manages checking of services. As best as I know, the easiest, most decoupled way to monito

Re: Stackoverflow

2008-06-04 Thread Doug Cutting
Andreas Kostyrka wrote: java.lang.StackOverflowError at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.compare(MapTask.java:494) at org.apache.hadoop.util.QuickSort.fix(QuickSort.java:29) at org.apache.hadoop.util.QuickSort.sort(QuickSort.java:58) at org.apache.

confusing about decommission in HDFS

2008-06-04 Thread Xiangna Li
Hi, I am trying to decommission a node with the following steps: (1) write the hostname of the node to be decommissioned in a file to serve as the exclude file. (2) specify the exclude file in the configuration parameter dfs.hosts.exclude. (3) run "bin/hadoop dfsadmin -refreshNodes". It

Re: hadoop on EC2

2008-06-04 Thread Chris K Wensel
These are the FoxyProxy wildcards I use *compute-1.amazonaws.com* *.ec2.internal* *.compute-1.internal* and w/ hadoop 0.17.0, just type (after booting your cluster) hadoop-ec2 proxy to start the tunnel for that cluster On Jun 3, 2008, at 11:26 PM, James Moore wrote: On Tue, Jun 3, 2008 at
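To see which URLs the wildcards above would send through the tunnel, the following sketch approximates them with Python's fnmatch. Treating "*" as "any run of characters" is an assumption about how FoxyProxy interprets these patterns, and the host names in the usage note are made up for illustration.

```python
from fnmatch import fnmatchcase

# The three wildcard patterns quoted in the message.
PROXY_PATTERNS = (
    "*compute-1.amazonaws.com*",
    "*.ec2.internal*",
    "*.compute-1.internal*",
)


def uses_proxy(url):
    """Return True if url matches any of the wildcard patterns."""
    return any(fnmatchcase(url, pattern) for pattern in PROXY_PATTERNS)
```

For example, uses_proxy("http://ip-10-0-0-1.ec2.internal:50060/") is true under this approximation, while an unrelated URL such as "http://example.com/" is not proxied.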

RE: Stackoverflow

2008-06-04 Thread Devaraj Das
Hi Andreas, Here is what I did: bin/hadoop jar build/hadoop-0.18.0-dev-examples.jar randomtextwriter -Dtest.randomtextwrite.min_words_key=40 -Dtest.randomtextwrite.max_words_key=50 -Dtest.randomtextwrite.maps_per_host=1 textinput (this would generate 1GB of text data with pretty long sentences. R

Upgrade from 0.16.3 to 0.17.0

2008-06-04 Thread Iván de Prado
I have upgraded from 0.16.3 to 0.17.0 correctly. But after a few days the disk usage has increased. I have noticed that there are two folders in the data nodes: - current -> With version -13 - previous -> With version -11 And I have this message in the HDFS webapp: Upgrade for version -13 ha

Re: hadoop on EC2

2008-06-04 Thread Steve Loughran
Andreas Kostyrka wrote: Well, the basic "trouble" with EC2 is that clusters usually are not networks in the TCP/IP sense. This makes it painful to decide which URLs should be resolved where. Plus to make it even more painful, you cannot easily run it with one simple SOCKS server, because you

Re: Stackoverflow

2008-06-04 Thread Steve Loughran
Andreas Kostyrka wrote: Ok, a new dead job: ;( This time after 2.4GB/11,3M lines ;( Any idea what I could do to debug this? (No idea how to go at debugging a Java process that is distributed and does GBs of data. It's one of the big problems of distributed computing; distributed debugging How

RE: setrep

2008-06-04 Thread Haijun Cao
Lohit, Thanks for the explanation. If that's the case, then it is not slower than expected. Haijun -Original Message- From: lohit [mailto:[EMAIL PROTECTED] Sent: Wed 6/4/2008 2:11 AM To: core-user@hadoop.apache.org Subject: Re: setrep >It seems that setrep won't force replicatio