That's a great way of putting it: "increasing capacity shifts the
equilibrium". If you work some examples you'll find that it doesn't take
many iterations for a workflow to converge to its stable runtime. There is
some minimum capacity you need for there to be an equilibrium though, or
else the run
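A minimal sketch of the recurrence I take this to mean (the exact model and all
numbers below are my own assumptions, not taken from the post): if each run has
a fixed overhead plus the time to process the data that arrived during the
previous run, the runtime converges to a fixed point only while data is
processed faster than it arrives.

// Illustration only: T_next = overhead + (incomingRate / processingRate) * T_prev,
// which settles at overhead / (1 - incomingRate/processingRate) when
// incomingRate < processingRate -- the "minimum capacity" needed for an equilibrium.
public class WorkflowEquilibrium {
    public static void main(String[] args) {
        double overhead = 0.5;         // fixed hours of per-run overhead (assumed)
        double incomingRate = 10.0;    // GB of new data arriving per hour (assumed)
        double processingRate = 40.0;  // GB the cluster processes per hour (assumed)
        double runtime = 1.0;          // hours, arbitrary starting guess
        for (int i = 1; i <= 10; i++) {
            runtime = overhead + (incomingRate / processingRate) * runtime;
            System.out.printf("iteration %d: runtime = %.4f hours%n", i, runtime);
        }
        // Converges to 0.5 / (1 - 0.25) = 0.6667 hours within a few iterations;
        // push incomingRate toward processingRate and the equilibrium blows up.
    }
}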
*Matthew McCullough to Speak on Dividing and Conquering Hadoop at GIDS 2010*
Great Indian Developer Summit 2010 – Gold Standard for India's Software
Developer Ecosystem
*Bangalore, January 04, 2010*: Moore's law has finally hit the wall and
CPU speeds have actually decreased in the last few
org.apache.hadoop.mapreduce.lib.output.MultipleOutputs is not part of the
released version of 0.20.1 right? Is this expected to be part of 0.20.2 or
later?
2010/1/5 Amareshwari Sri Ramadasu
> In branch 0.21, you can get the functionality of both
> org.apache.hadoop.mapred.lib.MultipleOutputs an
Thanks. What I mean is, the combiner doesn't "intentionally" re-read spilled
records back into memory just to combine them. But it does happen that some
records will be re-read for the sort. I think the combiner should work on those records.
-Gang
----- Original Message -----
From: Ted Xu
To: common-user@hadoop
In branch 0.21, you can get the functionality of both
org.apache.hadoop.mapred.lib.MultipleOutputs and
org.apache.hadoop.mapred.lib.MultipleOutputFormat in
org.apache.hadoop.mapreduce.lib.output.MultipleOutputs. Please see
MAPREDUCE-370 for more details.
Thanks
Amareshwari
On 1/5/10 5:56 PM, "
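For anyone wanting to see what that looks like, here is a rough sketch of using
the 0.21-style class (method names as I recall them from MAPREDUCE-370; treat
this as an illustration and check the Javadoc for the exact signatures):

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

public class PerFileReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
  private MultipleOutputs<Text, IntWritable> mos;

  @Override
  protected void setup(Context context) {
    mos = new MultipleOutputs<Text, IntWritable>(context);
  }

  @Override
  protected void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    for (IntWritable v : values) {
      // Old mapred.lib.MultipleOutputs style: write to a named output that the
      // driver declared with MultipleOutputs.addNamedOutput(job, "summary", ...).
      mos.write("summary", key, v);
      // Old MultipleOutputFormat style: choose the output file name per record.
      mos.write(key, v, "per-key/" + key.toString());
    }
  }

  @Override
  protected void cleanup(Context context) throws IOException, InterruptedException {
    mos.close();
  }
}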
Thanks Steve. I could solve the problem by moving the set() methods before
job creation, as Amogh suggested. However, I will also try your solution.
On Tue, Jan 5, 2010 at 1:24 PM, Steve Kuo wrote:
> There seemed to be a change between 0.20 and 0.19 API in that 0.20 no
> longer
> set "map.input.
Hi Gang,
> My understanding of this is that the combiner has to re-read some records
> which have already been spilled to disk and combine them with those records
> which come later.
>
I believe the combine operation is done before map spill and after reduce
merge. Combine only occurs in the memor
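A minimal illustration of why that matters (a generic sum combiner, not code
from this thread): the framework may apply the combiner during each spill and
again while merging spills, so it can legitimately see more input records than
the map emitted, and the combine function has to tolerate being fed its own
output.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class SumCombiner extends Reducer<Text, IntWritable, Text, IntWritable> {
  private final IntWritable sum = new IntWritable();

  @Override
  protected void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int total = 0;
    for (IntWritable v : values) {
      total += v.get();   // summing is associative, so running it 0..N times is safe
    }
    sum.set(total);
    context.write(key, sum);
  }
  // Wired up in the driver with job.setCombinerClass(SumCombiner.class);
}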
On Sunday 03 January 2010 11:30:29 Nathan Marz wrote:
> I did some analysis on the performance of Hadoop-based workflows. Some of
> the results are counter-intuitive so I thought the community at large would
> be interested:
>
> http://nathanmarz.com/blog/hadoop-mathematics/
>
> Would love to he
Dear Hadoop and Pig Users,
This is just to let you know that the submission deadline for ICS'10 (
http://www.ics-conference.org/) is two weeks from today. ICS is a
premier forum for research in cloud/distributed computing and for most
of the work/research we do in CCDI. The CFP of the conferen
Hi all,
Happy new year!
RSVP is now open for the first 2010 Bay Area Hadoop user group at the Yahoo!
Sunnyvale Campus, planned for Jan 20th.
Registration is available here
http://www.meetup.com/hadoop/calendar/12229988/
Agenda will be posted soon.
Looking forward to seeing you there
Dekel
Hi all,
when I run a mapreduce job with a combiner, I find that the combiner input # >
map output #, and the combiner output # > reduce input #. My understanding of this
is that the combiner has to re-read some records which have already been
spilled to disk and combine them with those records which
On Mon, Dec 21, 2009 at 11:57 AM, dave bayer wrote:
>
> On Nov 25, 2009, at 11:27 AM, David J. O'Dell wrote:
>
> I've intermittently seen the following errors on both of my clusters; it
>> happens when writing files.
>> I was hoping this would go away with the new version but I see the same
>> b
try a wide audience...
the number from the Reduce output records counter doesn't match the actual
# of records in the output files, although after I reran the job, it
did match. Any idea what could be wrong?
Thanks,
Yonggang
Hello Amogh,
Thanks a lot for the reply. Moving the set() methods before Job creation
solved my problem. I think it should be mentioned somewhere in the API docs
or tutorial.
Regards,
Farhan
On Tue, Jan 5, 2010 at 6:09 AM, Amogh Vasekar wrote:
> Hi,
>
> 1. map.input.file in new API is contenti
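For the archives, a small sketch of the ordering issue as I understand it (the
property names below are made up for the example): Job copies the Configuration
when it is constructed, so set() calls made on the original conf afterwards
never reach the tasks; either set values before constructing the Job or set
them on job.getConfiguration().

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class ConfOrdering {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("my.custom.param", "value");    // visible to tasks: set BEFORE new Job(conf)

    Job job = new Job(conf, "example");      // Job takes its own copy of conf here

    conf.set("my.late.param", "too-late");   // never seen by tasks: the copy already exists
    job.getConfiguration().set("my.late.param", "ok");  // modify the Job's copy instead
  }
}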
On Jan 5, 2010, at 7:44 AM, Yu Xi wrote:
Could any hadoop gurus tell me what kinds of security mechanisms are
already (or planned to be) implemented in the hadoop filesystem?
It looks like you've found the ones that are already there. You can
see my slides about it here:
http://www.slideshare.
There seemed to be a change between the 0.19 and 0.20 APIs in that 0.20 no longer
sets "map.input.file". config.set(), as far as I can tell, should work. However,
I use the following to pass the parameters.
String[] params = new String[] { "-D", "tag1=string_value", ...}
ToolRunner(new Configuration()
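The snippet above is cut off, so here is roughly how I believe the full pattern
looks (the class name MyTool is hypothetical): ToolRunner hands the arguments to
GenericOptionsParser, which strips the -D options and loads them into the
Configuration that run() receives.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyTool extends Configured implements Tool {
  @Override
  public int run(String[] remainingArgs) throws Exception {
    String tag1 = getConf().get("tag1");   // was passed on the command line as -D tag1=string_value
    System.out.println("tag1 = " + tag1);
    return 0;
  }

  public static void main(String[] args) throws Exception {
    String[] params = new String[] { "-D", "tag1=string_value" };
    System.exit(ToolRunner.run(new Configuration(), new MyTool(), params));
  }
}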
Hi list,
Could any hadoop gurus tell me what kinds of security mechanisms are
already (or planned to be) implemented in the hadoop filesystem?
I know there is some kind of Linux-like 9-bit (i.e. owner, group, other) access
control in hdfs. Unfortunately there are no user authentication
modules. See
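As a small illustration of that 9-bit model (the path and names below are
invented for the example), permissions can be manipulated through the
FileSystem API:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class HdfsPermissions {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path p = new Path("/user/alice/report.csv");           // hypothetical path
    fs.setPermission(p, new FsPermission((short) 0750));   // rwxr-x--- for owner/group/other
    fs.setOwner(p, "alice", "analysts");                   // note: needs superuser privileges
  }
}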
Hi all!
I set up a cluster on a remote machine on EC2, and configured mapreduce and
hdfs on "localhost" by specifying the properties in core-site, hdfs-site, and
mapred-site.xml as the quick start guide shows.
I can see the web UI and general task information fine, but when I turned to the logs
information, or brows
Time for another Boston Hadoop Meetup. Next one will be in two weeks,
on Tuesday, January 19th, 7 pm, at the HubSpot offices:
http://www.meetup.com/bostonhadoop/calendar/12227906/
(HubSpot is at 1 Broadway, Cambridge on the fifth floor. There Will
Be Food. There Will Be Beer.)
As before
Hi again.
By the way, I forgot to mention that I do the tests on the same machines that
serve as DataNodes, i.e. the same machine acts both as a client and as a
DataNode.
Regards.
Hi.
Also, It would be interesting to know "data.replication" setting you have
> for this benchmark?
>
>
data.replication = 2
A bit off topic - is it safe to have such a number? About a year ago I heard
only 3-way replication was fully tested, while 2-way had some issues - was
it fixed in subsequent
Hi.
Well, that all depends on many details, but:
>
> -) are you really using 4 discs (configured correctly as data
> directories?)
>
>
Yes, 4 directories, one per disk.
> -) What hdd/connection technology?
>
>
SATA 3Gb/s
> -) And 77MB/s would match up curiously well with 1Gbit networking
Hi.
Can you provide more information about your workload and the
> environment? eg are you running t.o.a.h.h.BenchmarkThroughput,
> TestDFSIO, or timing hadoop fs -put/get to transfer data to hdfs from
> another machine, looking at metrics, etc. What else is running on the
> cluster? Have you prof
Hi,
We're looking to convert some Ruby/C libxml XML processing code over
to Hadoop. Currently reports are transformed into a CSV output that is
then easier to consume for the downstream systems. We already use
Hadoop (streaming) quite extensively for the rest of our daily batches
so we'd
I'm afraid you have to write it yourself, since there are no equivalent
classes in the new API.
2009/12/28 Huazhong Ning
> Hi all,
>
> I need your help on multiple file output. I have many big files and I hope
> the processing result of each file is outputted to a separate file. I know
> in the o
Hi,
1. map.input.file in the new API is contentious. It doesn't seem to be serialized in
.20 ( https://issues.apache.org/jira/browse/HADOOP-5973 ). As of now you can
use ((FileSplit) context.getInputSplit()).getPath(); there was a post on this
some time back (a short sketch follows below).
2. for your own variables in conf, please
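A short sketch of the FileSplit workaround from point 1 (the class and field
names here are illustrative, not from the original post):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class PathAwareMapper extends Mapper<LongWritable, Text, Text, Text> {
  private String inputFile;

  @Override
  protected void setup(Context context) {
    // Recover what map.input.file used to provide in the old API.
    inputFile = ((FileSplit) context.getInputSplit()).getPath().toString();
  }

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    context.write(new Text(inputFile), value);  // tag each record with its source file
  }
}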
Hi all,
We are organising another open source search social evening (OSSSE?) in
London on Tuesday the 12th of January.
The plan is to get together and chat about search technology, from
Lucene to Solr, Hadoop, Mahout, Xapian, Ferret and the like - bringing
together people from across the field t
Thanks, it works.
Jeff Zhang
On Tue, Jan 5, 2010 at 5:00 PM, Amareshwari Sri Ramadasu <
amar...@yahoo-inc.com> wrote:
> Restarting the trackers makes them un-blacklisted.
>
> -Amareshwari
>
> On 1/5/10 2:27 PM, "Jeff Zhang" wrote:
>
> Hi all,
>
> Two of my nodes are in the blacklist, and I wan
Restarting the trackers makes them un-blacklisted.
-Amareshwari
On 1/5/10 2:27 PM, "Jeff Zhang" wrote:
Hi all,
Two of my nodes are in the blacklist, and I want to reuse them again. How
can I do that ?
Thank you.
Jeff Zhang
Hi all,
Two of my nodes are in the blacklist, and I want to reuse them again. How
can I do that ?
Thank you.
Jeff Zhang