Hi All
I want to profile a fraction of the tasks in a job, so I configured my
job as in [1].
However, I could not find the hprof profiler output on the host from
which I submitted my job. (I use MRv2 with YARN, CDH 4.1.2.)
Where can I find the hprof profiler output?
[1]
job.setProfileEnabled(true);
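A minimal sketch of the kind of setup [1] refers to, assuming the
standard MRv2 Job profiling API (the hprof parameters and the task
range below are illustrative, not from the original mail):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "profiled-job");
    job.setProfileEnabled(true);            // sets mapreduce.task.profile
    // %s below is expanded to the per-task profile output file
    job.setProfileParams("-agentlib:hprof=cpu=samples,heap=sites,depth=6,"
        + "force=n,thread=y,verbose=n,file=%s");
    job.setProfileTaskRange(true, "0-2");   // profile only the first three maps

If nothing shows up on the submitting host, the raw hprof output should
at least be present as profile.out in the profiled task attempts'
container log directories on the NodeManager hosts.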
Hi All
I want to report that the MRUnit download URLs are unavailable.
http://www.apache.org/dyn/closer.cgi/incubator/mrunit/
Could anyone give me another available URL?
Regards
Thank you.
http://mrunit.apache.org/general/downloads.html
On Jun 16, 2013 8:20 PM, "YouPeng Yang" wrote:
> Hi All
>
> I want to report that the MRUnit download URLs are unavailable.
> http://www.apache.org/dyn/closer.cgi/incubator/mrunit/
>
> Could anyone give me another available URL?
>
> Regards
>
>
Do the HDFS file-reader classes perform internal buffering?
Thanks
John
Yes, they maintain an internal buffer of the size configured by
io.file.buffer.size (4 KB by default), for both reads and writes.
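A quick illustration, with a made-up path and size (fs.open() also has
an overload taking an explicit buffer size):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    Configuration conf = new Configuration();
    conf.setInt("io.file.buffer.size", 64 * 1024);  // raise the 4 KB default
    FileSystem fs = FileSystem.get(conf);
    // Reads go through the internal buffer; no BufferedInputStream needed.
    FSDataInputStream in = fs.open(new Path("/data/example.txt"));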
On Sun, Jun 16, 2013 at 7:03 PM, John Lilley wrote:
> Do the HDFS file-reader classes perform internal buffering?
>
> Thanks
>
> John
>
>
>
>
--
Harsh J
This is a question for the Hive/Pig lists to answer best.
Note though that they only compile a plan, not the code. The code is
available already; the compiled plan just structures the execution
flow. If you take a look at the sources, you'll find the bits and
pieces that get linked together depend
Hi,
I was wondering if it is possible in Hadoop to assign the same partition
numbers to the map outputs. I am running a map-only job (with zero
reducers), and Hadoop shuffles the partitions in the output: i.e.
input/part-m-X is processed by task number Y and hence generates
output/part-m-000
On Tue, Jun 11, 2013 at 11:22 PM, Ramya S wrote:
> Hi,
>
> When will the stable version of hadoop-2.0.5-alpha be released?
hadoop-2.0.5-alpha was released last week and can be obtained
either in its source form:
http://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-2.0.5-alpha/
or
Hi all,
I'm trying to set up WebHDFS on Hadoop 1.2.0 with security.
I added the following to hdfs-site.xml:

<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>
<property>
  <name>dfs.web.authentication.kerberos.principal</name>
  <value>HTTP/master.hadoop.lo...@hadoop.lrz.de</value>
</property>
<property>
  <name>dfs.web.authentication.kerberos.keytab</name>
  <value>/home/
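Once WebHDFS and SPNEGO are in place, a quick way to test is the
documented Kerberos-authenticated curl call (the hostname and the
default NameNode HTTP port 50070 are assumptions here):

    curl -i --negotiate -u : "http://master.hadoop.lrz.de:50070/webhdfs/v1/tmp?op=LISTSTATUS"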
Hive serializes the entire plan into an XML file; if you set the log4j
settings to debug, you should get the locations of the files it
generates before launching the job.
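One quick way to flip that switch, assuming the standard
hive.root.logger property:

    hive -hiveconf hive.root.logger=DEBUG,console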
On Sun, Jun 16, 2013 at 11:08 AM, Harsh J wrote:
> This is a question for the Hive/Pig lists to answer best.
>
> Note though t
Edward is right. With log4j you can see that. Here you have an example:
https://github.com/apache/hadoop-common/blob/HADOOP-3628/conf/log4j.properties
The relevant info in the docs:
http://hadoop.apache.org/docs/stable/cluster_setup.html#Logging
Some working examples:
http://stackoverflow.com/
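A minimal fragment in the style of the linked log4j.properties (the
property name may differ across versions):

    hadoop.root.logger=DEBUG,console
    log4j.rootLogger=${hadoop.root.logger}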
I don't think this can be done in a single map/reduce pass.
Here the author discusses an implementation in Pig:
http://techblug.wordpress.com/2011/08/07/transitive-closure-in-pig/
john
From: parnab kumar [mailto:parnab.2...@gmail.com]
Sent: Thursday, June 13, 2013 10:42 PM
To: user@hadoop.apache.org
Subject: how to design the mapper and reducer for the below problem
Sorry, this is the link I meant:
http://hortonworks.com/blog/transitive-closure-in-apache-pig/
john
From: John Lilley [mailto:john.lil...@redpoint.net]
Sent: Sunday, June 16, 2013 1:02 PM
To: user@hadoop.apache.org
Subject: RE: how to design the mapper and reducer for the below problem
I don't think this can be done in a single map/reduce pass.
You basically have a "record similarity scoring and linking" problem -- common
in data-quality software like ours. This could be thought of as computing the
cross-product of all records, counting the number of hash keys in common, and
then outputting those that exceed a threshold. This is very
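The counting step described above can be sketched in plain Java. Rather
than a literal cross-product, this variant inverts the index so only
records sharing at least one hash key get paired (all names here are
illustrative, not from the thread):

    import java.util.*;

    // recordKeys: record id -> its set of hash keys
    static Set<String> similarPairs(Map<String, Set<String>> recordKeys,
                                    int threshold) {
        Map<String, List<String>> byKey = new HashMap<>();  // key -> record ids
        recordKeys.forEach((id, keys) -> keys.forEach(
            k -> byKey.computeIfAbsent(k, x -> new ArrayList<>()).add(id)));

        Map<String, Integer> shared = new HashMap<>();      // "idA|idB" -> count
        for (List<String> ids : byKey.values()) {
            Collections.sort(ids);                          // canonical pair order
            for (int i = 0; i < ids.size(); i++)
                for (int j = i + 1; j < ids.size(); j++)
                    shared.merge(ids.get(i) + "|" + ids.get(j), 1, Integer::sum);
        }
        Set<String> out = new HashSet<>();
        shared.forEach((pair, n) -> { if (n >= threshold) out.add(pair); });
        return out;                                         // pairs over threshold
    }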
On further thought, it would be simpler to augment Reducer1 to spill to
disk when the data does not fit into memory. Nested looping over the
disk file is sequential and will be fast. Then you can avoid the
distributed join.
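A rough sketch of that spill-over pattern inside reduce() (the cap and
the compare() step are hypothetical; Reducer1's real types aren't shown
in the thread):

    List<String> inMemory = new ArrayList<>();
    File spill = File.createTempFile("reducer1", ".spill");
    try (PrintWriter out = new PrintWriter(new FileWriter(spill))) {
        for (Text v : values) {                     // reducer's value iterator
            if (inMemory.size() < MAX_IN_MEMORY) inMemory.add(v.toString());
            else out.println(v);                    // sequential spill to disk
        }
    }
    try (BufferedReader in = new BufferedReader(new FileReader(spill))) {
        for (String line; (line = in.readLine()) != null; )
            for (String m : inMemory)
                compare(m, line);                   // hypothetical pairing step
    }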
john
From: John Lilley [mailto:john.lil...@redpoint.net]
Sent: Sunday, June 16, 2013
Both Pig and Hive have an 'explain plan' command that prints a schematic
version of the plan. This might make it easier to see what M/R
algorithms are used. Mostly the data goes through single-threaded
transforms inside a mapper or reducer.
https://cwiki.apache.org/Hive/languagemanual-explain.html
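For example (the statements themselves are illustrative):

    hive> EXPLAIN SELECT key, count(1) FROM src GROUP BY key;

    grunt> grouped = GROUP src BY key;
    grunt> explain grouped;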
On 06/
If you are using TextOutputFormat for your job, its getRecordWriter() method
(RecordWriter
org.apache.hadoop.mapreduce.lib.output.TextOutputFormat.getRecordWriter(TaskAttemptContext job)
throws IOException, InterruptedException) uses
FileOutputFormat.getDefaultWorkFile() to generate the file name.
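If the goal is to control those names (e.g. to keep the input partition
number, as asked earlier in the digest), one option is to subclass
TextOutputFormat and override getDefaultWorkFile(); the naming scheme
below is only a sketch, and the suffix property is a made-up hook:

    import java.io.IOException;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.TaskAttemptContext;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

    public class NamedPartOutputFormat extends TextOutputFormat<LongWritable, Text> {
        @Override
        public Path getDefaultWorkFile(TaskAttemptContext context, String extension)
                throws IOException {
            FileOutputCommitter committer =
                (FileOutputCommitter) getOutputCommitter(context);
            // Made-up property: carries the desired part number for this output.
            String suffix = context.getConfiguration().get("example.part.suffix", "00000");
            return new Path(committer.getWorkPath(), "part-m-" + suffix + extension);
        }
    }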