Better to provide a summary as well as the link
On Friday, February 18, 2011, Shrinivas Joshi wrote:
> There seems to be a wiki page already intended for capturing information on
> disks in Hadoop environment. http://wiki.apache.org/hadoop/DiskSetup
>
> Do we just want to link the thread on HDD
There seems to be a wiki page already intended for capturing information on
disks in a Hadoop environment: http://wiki.apache.org/hadoop/DiskSetup
Do we just want to link the thread on HDD recommendations from this wiki
page?
-Shrinivas
On Tue, Feb 15, 2011 at 11:48 AM, zGreenfelder wrote:
> unto
On Fri, Feb 18, 2011 at 14:35, Ted Dunning wrote:
> I just read the malstone report. They report times for a Java version that
> is many (5x) times slower than for a streaming implementation. That single
> fact indicates that the Java code is so appallingly bad that this is a very
> bad benchmar
I just read the MalStone report. They report times for a Java version that
is many times (5x) slower than for a streaming implementation. That single
fact indicates that the Java code is so appallingly bad that this is a very
bad benchmark.
On Fri, Feb 18, 2011 at 2:27 PM, Jim Falgout wrote:
>
Thanks Jim. MRBench mentioned in this paper
http://dcslab.snu.ac.kr/~khjeon/papers/2008/icpads_mrbench.pdf looks like a
map/reduce port of the TPC-H workload. BTW, MRBench mentioned in the above paper
and the one in mapred/src/test/mapred/org/apache/hadoop/mapred/MRBench.java
look different to me. Is t
We use MalStone and TeraSort. For Hive, you can use TPC-H, at least the data
and the queries, if not the query generator. There is a Jira issue in Hive that
discusses the TPC-H "benchmark" if you're interested. Sorry, I don't remember
the issue number offhand.
-Original Message-
From: S
Thank you,
Mark
On Fri, Feb 18, 2011 at 4:23 PM, Eric Sammer wrote:
> Mark:
>
> You have a few options. You can:
>
> 1. Package dependent jars in a lib/ directory of the jar file.
> 2. Use something like Maven's assembly plugin to build a self contained
> jar.
>
> Either way, I'd strongly recomm
Mark:
You have a few options. You can:
1. Package dependent jars in a lib/ directory of the jar file.
2. Use something like Maven's assembly plugin to build a self-contained jar.
Either way, I'd strongly recommend using something like Maven to build your
artifacts so they're reproducible and in
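As a rough illustration of option 1 above, here is a minimal sketch in plain java.util.jar (no Hadoop or Maven required) of what the finished job jar's layout looks like. The class name and dependency jar name are made up for the example:

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

/** Sketch: build a job jar whose dependencies sit under lib/ inside the jar. */
public class JobJarSketch {

    /** Writes a jar containing a (dummy) job class plus a nested dependency jar. */
    public static void build(File out) {
        Manifest mf = new Manifest();
        mf.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
        try (JarOutputStream jos = new JarOutputStream(new FileOutputStream(out), mf)) {
            // The job's own classes live at the top level, as usual.
            jos.putNextEntry(new JarEntry("com/example/MyJob.class"));
            jos.write(new byte[0]);          // placeholder bytes, not a real class
            jos.closeEntry();
            // Dependent jars go under lib/; Hadoop adds them to the task classpath.
            jos.putNextEntry(new JarEntry("lib/some-dependency.jar"));
            jos.write(new byte[0]);          // placeholder, not a real jar
            jos.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Lists entry names so the layout can be inspected. */
    public static List<String> entries(File jar) {
        List<String> names = new ArrayList<>();
        try (JarFile jf = new JarFile(jar)) {
            jf.stream().forEach(e -> names.add(e.getName()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }
}
```

With option 2, Maven's assembly (or shade) plugin produces an equivalent artifact for you, which is part of why a reproducible build tool is the stronger recommendation here.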
Hi,
I have a script that I use to re-package all the jars (which are output in a
dist directory by NetBeans) - and it structures everything correctly into a
single jar for running a MapReduce job. Here it is below, but I am not sure
if it is the best practice. Besides, it hard-codes my paths. I am
Thanks Ted and Jim :)
Maha
On Feb 18, 2011, at 11:55 AM, Jim Falgout wrote:
> That's right. The TextInputFormat handles situations where records cross
> split boundaries. What your mapper will see is "whole" records.
>
> -Original Message-
> From: maha [mailto:m...@umail.ucsb.edu]
> S
MalStone looks like a very narrow benchmark.
Terasort is also a very narrow and somewhat idiosyncratic benchmark, but it
has the characteristic that lots of people use it.
You should add PigMix to your list. There are Java versions of the problems in
PigMix that make a pretty good set of benchmarks.
Which workloads are used for serious benchmarking of Hadoop clusters? Do you
care about any of the following workloads:
TeraSort, GridMix v1, v2, or v3, MalStone, CloudBurst, MRBench, NNBench, or
sample apps shipped with the Hadoop distro like PiEstimator, dbcount, etc.?
Thanks,
-Shrinivas
That's right. The TextInputFormat handles situations where records cross split
boundaries. What your mapper will see is "whole" records.
-Original Message-
From: maha [mailto:m...@umail.ucsb.edu]
Sent: Friday, February 18, 2011 1:14 PM
To: common-user
Subject: Quick question
Hi all,
The input is effectively split by lines, but under the covers, the actual
splits are by byte. Each mapper will cleverly scan from the specified start
to the next line after the start point. At the end, it will over-read to
the end of the line that is at or after the end of its specified region. Thi
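That scan-and-over-read rule can be sketched in plain Java with no Hadoop dependencies. This is an illustrative simulation, not the actual LineRecordReader code; the method name and byte-offset handling are invented for the example:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Sketch of how a mapper's byte-range split maps onto whole lines. */
public class SplitSim {

    /** Returns the complete lines a mapper for the byte range [start, end) would see. */
    public static List<String> recordsForSplit(byte[] data, int start, int end) {
        int pos = start;
        if (start != 0) {
            // A split that starts mid-line skips ahead: the partial line
            // belongs to the previous split, which over-reads to finish it.
            while (pos < data.length && data[pos] != '\n') pos++;
            pos++;  // step past the newline itself
        }
        List<String> records = new ArrayList<>();
        // Read every line that *starts* before the split's end, even if its
        // bytes run past the end -- that is the over-read described above.
        while (pos < end && pos < data.length) {
            int lineStart = pos;
            while (pos < data.length && data[pos] != '\n') pos++;
            records.add(new String(data, lineStart, pos - lineStart, StandardCharsets.UTF_8));
            pos++;  // skip the terminating newline
        }
        return records;
    }
}
```

Splitting "a\nbb\nccc\n" at byte 3 gives one mapper ["a", "bb"] (the line "bb" crosses the boundary and is over-read) and the other mapper ["ccc"]: every line is seen exactly once, whole.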
Hi all,
I want to check if the following statement is right:
If I use TextInputFormat to process a text file with 2000 lines (each ending
with \n) using 20 mappers, then each map will have a sequence of COMPLETE
LINES.
In other words, the input is not split byte-wise but by lines.
Is th
> On Thu, Feb 17, 2011 at 12:09 AM, Aaron Baff wrote:
>> I'm submitting jobs via JobClient.submitJob(JobConf), and then waiting until
>> it completes with RunningJob.waitForCompletion(). I then want to get how
>> long the entire MR takes, which appears to need the JobStatus since
>> RunningJob
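One Hadoop-free fallback for wall-clock timing (a sketch only; the start/finish times reported through the job-status APIs, where available, reflect the cluster's own view) is simply to time the blocking call yourself:

```java
/** Sketch: measure the wall-clock duration of a blocking call. */
public class JobTimer {

    /** Runs the given action and returns its elapsed time in milliseconds. */
    public static long timeMillis(Runnable action) {
        long begin = System.nanoTime();
        action.run();  // e.g. wrap the waitForCompletion() call here in real code
        return (System.nanoTime() - begin) / 1_000_000;
    }
}
```

Note this measures client-observed wall time, which includes submission and scheduling latency, not just the job's execution on the cluster.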