Doesn't seem to be, to me. Seems to be an indicator of records
On Wed, Jun 4, 2008 at 5:53 PM, Daniel Blaisdell <[EMAIL PROTECTED]> wrote:
> Is the map progress indicator computed as a percentage of maps completed?
>
> -Daniel
>
> On Wed, Jun 4, 2008 at 6:51 PM, Tanton Gibbs <[EMAIL PROTECTED]>
Hi,
Any chance the videos will be taken *and* made available outside Yahoo?
The videos from the Hadoop summit are still not available:
http://developer.yahoo.com/blogs/hadoop/2008/04/hadoop_summit_slides_and_video.html
And at this point it looks like they never will be available :(
Thanks,
Ot
Has someone already written a generic deflator program? It would be a
great util to add to the core :)
-- Jim
On Wed, Jun 4, 2008 at 7:27 PM, Runping Qi <[EMAIL PROTECTED]> wrote:
>
> You can run another map-only job to read and convert the deflated files and
> write them out in the format you want.
You can run another map-only job to read and convert the deflated files and
write them out in the format you want.
Runping
> -Original Message-
> From: Jim R. Wilson [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, June 04, 2008 4:13 PM
> To: core-user@hadoop.apache.org
> Subject: [core-user] H
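As a rough sketch of the map-only pass Runping describes -- this is not code
from the thread, and the class name and argument layout are made up -- a job in
the 0.17-era mapred API could look roughly like this. The default
TextInputFormat decompresses *.deflate input through the codec factory, and
with zero reduces there is no sort/shuffle, so the job just rewrites the lines
uncompressed:

import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Hypothetical "InflateJob": reads compressed text output and writes it back
// out as plain text.
public class InflateJob extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, NullWritable> {

  public void map(LongWritable offset, Text line,
                  OutputCollector<Text, NullWritable> out, Reporter reporter)
      throws IOException {
    out.collect(line, NullWritable.get());   // keep the line, drop the byte offset
  }

  public static void main(String[] args) throws IOException {
    JobConf conf = new JobConf(InflateJob.class);
    conf.setJobName("inflate");
    conf.setMapperClass(InflateJob.class);
    conf.setNumReduceTasks(0);                        // map-only job
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(NullWritable.class);
    conf.setBoolean("mapred.output.compress", false); // write plain text this time
    FileInputFormat.setInputPaths(conf, new Path(args[0]));  // dir with .deflate files
    FileOutputFormat.setOutputPath(conf, new Path(args[1])); // where the text copy goes
    JobClient.runJob(conf);
  }
}

For the hadoop-streaming case elsewhere in this thread, the same effect should
be possible with an identity mapper (e.g. cat), zero reduces, and
mapred.output.compress set to false for that one job.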
- Original Message -
From: [EMAIL PROTECTED] <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Wed Jun 04 15:06:42 2008
Subject: Re: compressed/encrypted file
You can compress / decompress at many points:
--prior to mapping
--after mapping
--after reducing
(I've been experim
Hi all,
I'm using hadoop-streaming to execute Python jobs in an EC2 cluster.
The output directory in HDFS has part-0.deflate files - how can I
decompress them back into regular text?
In my hadoop-site.xml, I unfortunately have:
  mapred.output.compress = true
  mapred.output.compression.type
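For pulling individual part-*.deflate files back to plain text without running
another job, one option is to open them through Hadoop's compression codecs.
This is only a sketch (the class name is made up; it assumes the standard codec
factory maps the .deflate extension to DefaultCodec):

import java.io.InputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;

// Hypothetical helper: inflate one part-*.deflate file from HDFS to stdout.
public class InflateFile {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path in = new Path(args[0]);                    // path to a part-*.deflate file
    FileSystem fs = in.getFileSystem(conf);
    // The factory picks the codec from the file extension.
    CompressionCodec codec = new CompressionCodecFactory(conf).getCodec(in);
    InputStream raw = fs.open(in);
    InputStream stream = (codec == null) ? raw : codec.createInputStream(raw);
    IOUtils.copyBytes(stream, System.out, conf, false);
    stream.close();
  }
}

Newer releases may also handle this via the fs -text shell command, but I have
not checked whether that works for .deflate in 0.17.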
Is the map progress indicator computed as a percentage of maps completed?
-Daniel
On Wed, Jun 4, 2008 at 6:51 PM, Tanton Gibbs <[EMAIL PROTECTED]> wrote:
> From what I've read, there are three reduce phases: 1. copy, 2. sort, 3. reduce.
> From 0 - 33% is the copy phase. I guess if you don't need
Haijun,
On Jun 4, 2008, at 3:45 PM, Haijun Cao wrote:
Miles, Thanks.
"If your inputs to maps are compressed, then you don't get any
automatic
assignment of mappers to your data: each gzipped file gets assigned a
mapper." <--- this is the case I am talking about.
With the current compres
From what I've read, there are three reduce phases: 1. copy, 2. sort, 3. reduce.
From 0 - 33% is the copy phase. I guess if you don't need that phase
it could skip this completely.
After 33%, it waits until it is done sorting before outputting status
again at 66%, then it updates regularly during th
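As a back-of-the-envelope illustration of that split (this is not the actual
Hadoop progress code), the displayed reduce percentage behaves roughly like an
equal-weight average of the three phases:

// Illustration only: three equally weighted reduce phases.
public class ReduceProgress {
  // copy, sort, reduce are each 0.0 - 1.0 for their own phase
  static float overall(float copy, float sort, float reduce) {
    return (copy + sort + reduce) / 3.0f;
  }
  public static void main(String[] args) {
    System.out.println(overall(1.0f, 0.0f, 0.0f)); // copy done      -> ~33%
    System.out.println(overall(1.0f, 1.0f, 0.0f)); // copy+sort done -> ~66%
  }
}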
Miles, Thanks.
"If your inputs to maps are compressed, then you don't get any automatic
assignment of mappers to your data: each gzipped file gets assigned a
mapper." <--- this is the case I am talking about.
Haijun
-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] O
You can compress / decompress at many points:
--prior to mapping
--after mapping
--after reducing
(I've been experimenting with all these options; we have been crawling blogs
every day since Feb and we store on DFS compressed sets of posts)
If your inputs to maps are compressed, then you don't
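A sketch of the knobs behind those three points, using the 0.17-era property
names quoted elsewhere in this thread (treat the exact calls as my
recollection, not gospel):

import org.apache.hadoop.io.compress.DefaultCodec;
import org.apache.hadoop.mapred.JobConf;

public class CompressionPoints {
  public static void configure(JobConf conf) {
    // 1) prior to mapping: nothing to set -- gzipped/deflated inputs are
    //    decompressed automatically, but a compressed file cannot be split,
    //    so each one goes to a single mapper.
    // 2) after mapping: compress the intermediate map output sent to reducers.
    conf.setCompressMapOutput(true);
    conf.setMapOutputCompressorClass(DefaultCodec.class);
    // 3) after reducing: compress the final job output.
    conf.setBoolean("mapred.output.compress", true);
    conf.set("mapred.output.compression.type", "BLOCK"); // SequenceFile output only
  }
}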
The next user group meeting is scheduled for June 18th from 6-7:30 pm at
the Yahoo! Mission College campus (2821 Mission College, Santa Clara).
Registration, driving directions etc are at
http://upcoming.yahoo.com/event/760573/
Agenda:
1) Hadoop at Facebook, Hive - Jeff Hammerbacher
2)
If a file is compressed and encrypted, then is it still possible to split it
and run mappers in parallel?
Do people compress their files stored in hadoop? If yes, how do you go about
processing them in parallel?
Thanks
Haijun
How does Hadoop decide when to update the "percent complete" for
map/reduce tasks? I've been running a small job (~150 MB) on a
pseudo-distributed cluster. "bin/hadoop jar" prints:
08/06/04 17:02:16 INFO mapred.JobClient: map 0% reduce 0%
08/06/04 17:05:52 INFO mapred.JobClient: map 100% reduc
The 3 steps you mentioned, were they done while the namenode was still running?
I think (I might be wrong as well) that the config is read only once, when the
namenode is started. So, you should have defined the dfs.hosts.exclude file
beforehand.
When you want to refresh, you just update the file alr
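Concretely, that means having something like this in hadoop-site.xml before the
namenode starts (the path below is just an example), and only then editing the
file and refreshing:

<property>
  <name>dfs.hosts.exclude</name>
  <value>/path/to/exclude-file</value>
</property>

# later, to decommission: add the node's hostname to /path/to/exclude-file, then
bin/hadoop dfsadmin -refreshNodes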
The pivot selection is the median of the first, middle, and last
elements; it should be the best choice for sorted data. It's still
possible to pick bad pivots, but data that forces hundreds of
consecutive bad pivot selections should be exceedingly rare. -C
On Jun 4, 2008, at 9:24 AM, Doug
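For anyone curious, median-of-three pivot selection looks roughly like this (an
illustration, not the Hadoop QuickSort source):

// Pick the median of the first, middle and last elements, which behaves well
// on already-sorted input.
public class MedianOfThree {
  static int pivotIndex(int[] a, int lo, int hi) {      // hi inclusive
    int mid = lo + (hi - lo) / 2;
    int x = a[lo], y = a[mid], z = a[hi];
    if ((x <= y && y <= z) || (z <= y && y <= x)) return mid;
    if ((y <= x && x <= z) || (z <= x && x <= y)) return lo;
    return hi;
  }
  public static void main(String[] args) {
    int[] sorted = {1, 2, 3, 4, 5, 6, 7};
    System.out.println(pivotIndex(sorted, 0, sorted.length - 1)); // 3 (the middle)
  }
}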
1) Yes, that is normal. You have to manually finalize the upgrade.
2) Probably, because (as I understand it), it keeps a backup of the
pre-upgraded state.
3) you can use hadoop dfsadmin -finalizeUpgrade to finalize it. See
here: http://wiki.apache.org/hadoop/Hadoop_Upgrade
4) I assume the finaliz
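So, assuming I have the command names right, the sequence on the namenode would
be roughly:

bin/hadoop dfsadmin -upgradeProgress status   # check whether an upgrade is still pending
bin/hadoop dfsadmin -finalizeUpgrade          # drops the 'previous' copy and frees the space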
I'm trying to implement Nagios health monitoring of a Hadoop grid.
If anyone has general tips to share, those would be welcome, too.
For those who don't know, Nagios is monitoring software that organizes and
manages checking of services.
As best I know, the easiest, most decoupled way to monito
Andreas Kostyrka wrote:
java.lang.StackOverflowError
at
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.compare(MapTask.java:494)
at org.apache.hadoop.util.QuickSort.fix(QuickSort.java:29)
at org.apache.hadoop.util.QuickSort.sort(QuickSort.java:58)
at org.apache.
hi,
I tried to decommission a node by following these steps:
(1) write the hostname of the node to be decommissioned in a file to serve as
the exclude file.
(2) specify that exclude file via the configuration parameter dfs.hosts.exclude.
(3) run "bin/hadoop dfsadmin -refreshNodes".
It
These are the FoxyProxy wildcards I use
*compute-1.amazonaws.com*
*.ec2.internal*
*.compute-1.internal*
and w/ hadoop 0.17.0, just type (after booting your cluster)
hadoop-ec2 proxy
to start the tunnel for that cluster
On Jun 3, 2008, at 11:26 PM, James Moore wrote:
On Tue, Jun 3, 2008 at
Hi Andreas,
Here is what I did:
bin/hadoop jar build/hadoop-0.18.0-dev-examples.jar randomtextwriter
-Dtest.randomtextwrite.min_words_key=40
-Dtest.randomtextwrite.max_words_key=50
-Dtest.randomtextwrite.maps_per_host=1 textinput
(this would generate 1GB of text data with pretty long sentences. R
I have upgraded from 0.16.3 to 0.17.0 correctly. But after a few days
the disk usage has increased. I have noticed that there are two
folders on the data nodes:
- current -> With version -13
- previous -> With version -11
And I have this message in the HDFS webapp:
Upgrade for version -13 ha
Andreas Kostyrka wrote:
Well, the basic "trouble" with EC2 is that clusters usually are not networks
in the TCP/IP sense.
This makes it painful to decide which URLs should be resolved where.
Plus to make it even more painful, you cannot easily run it with one simple
SOCKS server, because you
Andreas Kostyrka wrote:
Ok, a new dead job: ;(
This time after 2.4GB / 11.3M lines ;(
Any idea what I could do to debug this?
(No idea how to go about debugging a Java process that is distributed and does
GBs of data.)
It's one of the big problems of distributed computing; distributed debugging
How
Lohit,
Thanks for the explanation. If that's the case, then it is not slower than
expected.
Haijun
-Original Message-
From: lohit [mailto:[EMAIL PROTECTED]
Sent: Wed 6/4/2008 2:11 AM
To: core-user@hadoop.apache.org
Subject: Re: setrep
>It seems that setrep won't force replicatio