hprof profiler output location

2013-06-16 Thread YouPeng Yang
Hi All, I want to profile a fraction of the tasks in a job, so I configured my job as [1]. However, I could not get the hprof profiler output on the host from which I submitted the job. (I use MRv2 with YARN, CDH4.1.2.) Where can I find the hprof profiler output? [1] job.setProfileEnabl
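
For illustration, a minimal sketch of the kind of configuration being described (assuming the MRv2 Job API; the task ranges and hprof parameters are placeholders, not taken from the original mail):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    Job job = Job.getInstance(new Configuration(), "profiled-job");
    job.setProfileEnabled(true);             // turn hprof on for selected tasks
    job.setProfileTaskRange(true, "0-2");    // profile only map tasks 0..2
    job.setProfileTaskRange(false, "0-2");   // profile only reduce tasks 0..2
    job.setProfileParams(
        "-agentlib:hprof=cpu=samples,heap=sites,depth=6,force=n,thread=y,verbose=n,file=%s");

The %s placeholder is substituted with a per-task output path, so the profile output is written next to the task logs on the worker nodes; whether it is also copied back to the submitting host depends on the client and version.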

MRunit DOWNLOAD URLs are unavailable

2013-06-16 Thread YouPeng Yang
Hi All, I want to report that the MRunit download URLs are unavailable. http://www.apache.org/dyn/closer.cgi/incubator/mrunit/ Could anyone give me another available URL? Regards. Thank you.

Re: MRunit DOWNLOAD URLs are unavailable

2013-06-16 Thread Jagat Singh
http://mrunit.apache.org/general/downloads.html On Jun 16, 2013 8:20 PM, "YouPeng Yang" wrote: > Hi All > > I want to report that the MRunit download URLs are unavailable. > http://www.apache.org/dyn/closer.cgi/incubator/mrunit/ > > Could anyone give me another available URL? > > Regards > >

HDFS file reader and buffering

2013-06-16 Thread John Lilley
Do the HDFS file-reader classes perform internal buffering? Thanks John

Re: HDFS file reader and buffering

2013-06-16 Thread Harsh J
Yes, they maintain a buffer equal to the configurable io.file.buffer.size (4k by default) for both reads and writes. On Sun, Jun 16, 2013 at 7:03 PM, John Lilley wrote: > Do the HDFS file-reader classes perform internal buffering? > > Thanks > > John > > > > -- Harsh J
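
A minimal sketch of where that setting applies (assumption: the standard Configuration/FileSystem API; the 128k value and the path are illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    Configuration conf = new Configuration();
    conf.setInt("io.file.buffer.size", 131072);   // raise the 4k default to 128k
    FileSystem fs = FileSystem.get(conf);
    FSDataInputStream in = fs.open(new Path("/data/input.txt"));  // reads go through the buffer

FileSystem#open(Path, int) also accepts an explicit per-stream buffer size that overrides the configured default.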

Re: how to get the mapreduce code which was pig/hive script translated to?

2013-06-16 Thread Harsh J
This is a question for the Hive/Pig lists to answer best. Note though that they only compile a plan, not the code. The code is available already; the compiled plan just structures the execution flow. If you take a look at the sources, you'll find the bits and pieces that get linked together depend

Assigning the same partition number to the mapper output

2013-06-16 Thread Maysam Hossein Yabandeh
Hi, I was wondering if it is possible in Hadoop to assign the same partition numbers to the map outputs. I am running a map-only job (with zero reducers) and Hadoop shuffles the partition numbers in the output: i.e., input/part-m-X is processed by task number Y and hence generates output/part-m-000

Re: About hadoop-2.0.5 release

2013-06-16 Thread Roman Shaposhnik
On Tue, Jun 11, 2013 at 11:22 PM, Ramya S wrote: > Hi, > > When will the stable version of hadoop-2.0.5-alpha be released? hadoop-2.0.5-alpha was released last week and can be obtained either in its source form: http://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-2.0.5-alpha/ or

webhdfs kerberos checksum failed

2013-06-16 Thread Lanati, Matteo
Hi all, I'm trying to set up webhdfs on Hadoop 1.20 with security. I added the following to hdfs-site.xml: dfs.webhdfs.enabled = true; dfs.web.authentication.kerberos.principal = HTTP/master.hadoop.lo...@hadoop.lrz.de; dfs.web.authentication.kerberos.keytab = /home/

Re: how to get the mapreduce code which was pig/hive script translated to?

2013-06-16 Thread Edward Capriolo
Hive serializes the entire plan into an XML file. If you set the log4j settings to debug, you should get the locations of the files it generates before launching the job. On Sun, Jun 16, 2013 at 11:08 AM, Harsh J wrote: > This is a question for the Hive/Pig lists to answer best. > > Note though t

Re: how to get the mapreduce code which was pig/hive script translated to?

2013-06-16 Thread Marcos Luis Ortiz Valmaseda
Edward is right. With log4j, you can see that. Here is an example: https://github.com/apache/hadoop-common/blob/HADOOP-3628/conf/log4j.properties The relevant info is in the docs: http://hadoop.apache.org/docs/stable/cluster_setup.html#Logging Some working examples: http://stackoverflow.com/
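
As a concrete illustration (an assumption, not from the thread: Hive's log4j 1.x logging and the org.apache.hadoop.hive logger prefix), the level can also be raised programmatically:

    import org.apache.log4j.Level;
    import org.apache.log4j.Logger;

    // Raise Hive's loggers to DEBUG so the locations of the generated
    // plan files are printed before the job launches.
    Logger.getLogger("org.apache.hadoop.hive").setLevel(Level.DEBUG);

The equivalent hive-log4j.properties entry would be along the lines of log4j.logger.org.apache.hadoop.hive=DEBUG.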

RE: how to design the mapper and reducer for the below problem

2013-06-16 Thread John Lilley
I don't think this can be done in a single map/reduce pass. Here the author discusses an implementation in Pig: http://techblug.wordpress.com/2011/08/07/transitive-closure-in-pig/ john From: parnab kumar [mailto:parnab.2...@gmail.com] Sent: Thursday, June 13, 2013 10:42 PM To: user@hadoop.apache.org S

RE: how to design the mapper and reducer for the below problem

2013-06-16 Thread John Lilley
Sorry, this is the link I meant: http://hortonworks.com/blog/transitive-closure-in-apache-pig/ john From: John Lilley [mailto:john.lil...@redpoint.net] Sent: Sunday, June 16, 2013 1:02 PM To: user@hadoop.apache.org Subject: RE: how to design the mapper and reducer for the below problem I don't th

RE: How to design the mapper and reducer for the following problem

2013-06-16 Thread John Lilley
You basically have a "record similarity scoring and linking" problem -- common in data-quality software like ours. This could be thought of as computing the cross-product of all records, counting the number of hash keys in common, and then outputting those that exceed a threshold. This is very
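
A sketch of the pairing step described above (not from the thread; it assumes an upstream mapper has already emitted (hashKey, recordId) pairs, and all class and variable names are illustrative):

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class CandidatePairReducer extends Reducer<Text, Text, Text, IntWritable> {
      private static final IntWritable ONE = new IntWritable(1);

      @Override
      protected void reduce(Text hashKey, Iterable<Text> recordIds, Context context)
          throws IOException, InterruptedException {
        // Collect the ids of all records sharing this hash key.
        List<String> ids = new ArrayList<String>();
        for (Text id : recordIds) {
          ids.add(id.toString());
        }
        // Emit each unordered pair once, with a count of 1.
        for (int i = 0; i < ids.size(); i++) {
          for (int j = i + 1; j < ids.size(); j++) {
            String a = ids.get(i), b = ids.get(j);
            Text pair = new Text(a.compareTo(b) < 0 ? a + "\t" + b : b + "\t" + a);
            context.write(pair, ONE);
          }
        }
      }
    }

A second job would then sum the counts per pair and keep only the pairs whose count of shared hash keys exceeds the threshold. The loop is quadratic in group size, so heavily skewed hash keys need separate handling.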

RE: How to design the mapper and reducer for the following problem

2013-06-16 Thread John Lilley
On further thought, it would be simpler to augment Reducer1 to use disk when it does not fit into memory. Nested looping over the disk file is sequential and will be fast. Then you can avoid the distributed join. john From: John Lilley [mailto:john.lil...@redpoint.net] Sent: Sunday, June 16, 2
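
A sketch of that spill-to-disk idea (not from the thread; names are illustrative, and for brevity it always spills, where a real implementation would only do so once an in-memory list outgrows some threshold):

    import java.io.BufferedReader;
    import java.io.BufferedWriter;
    import java.io.File;
    import java.io.FileReader;
    import java.io.FileWriter;
    import java.io.IOException;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class SpillingPairReducer extends Reducer<Text, Text, Text, Text> {
      @Override
      protected void reduce(Text key, Iterable<Text> values, Context context)
          throws IOException, InterruptedException {
        // The values iterator is single-pass, so copy the group to a local file.
        File spill = File.createTempFile("group-", ".spill");
        try (BufferedWriter w = new BufferedWriter(new FileWriter(spill))) {
          for (Text v : values) {
            w.write(v.toString());
            w.newLine();
          }
        }
        // Nested loop over the spill file: both passes read sequentially.
        try (BufferedReader outer = new BufferedReader(new FileReader(spill))) {
          String a;
          long i = 0;
          while ((a = outer.readLine()) != null) {
            long j = 0;
            try (BufferedReader inner = new BufferedReader(new FileReader(spill))) {
              String b;
              while ((b = inner.readLine()) != null) {
                if (j++ > i) {              // each unordered pair only once
                  context.write(new Text(a), new Text(b));
                }
              }
            }
            i++;
          }
        }
        spill.delete();
      }
    }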

Re: how to get the mapreduce code which was pig/hive script translated to?

2013-06-16 Thread Lance Norskog
Both Pig and Hive have an 'explain plan' command that prints a schematic version of the plan. This might make it easier to see which M/R algorithms are used. Mostly the data goes through single-threaded transforms inside a mapper or reducer. https://cwiki.apache.org/Hive/languagemanual-explain.html On 06/

RE: Assigning the same partition number to the mapper output

2013-06-16 Thread Devaraj k
If you are using TextOutputFormat for your job, the getRecordWriter() method (i.e., RecordWriter org.apache.hadoop.mapreduce.lib.output.TextOutputFormat.getRecordWriter(TaskAttemptContext job) throws IOException, InterruptedException) uses FileOutputFormat.getDefaultWorkFile() for generating the fi
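
The message is truncated here, but building on that description: since getDefaultWorkFile() derives the part number from the task id, a common workaround (a sketch, not from this thread; the class name is illustrative) is to bypass it with MultipleOutputs and name each output after its input split:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileSplit;
    import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

    public class PartitionPreservingMapper
        extends Mapper<LongWritable, Text, NullWritable, Text> {
      private MultipleOutputs<NullWritable, Text> out;
      private String baseName;

      @Override
      protected void setup(Context context) {
        out = new MultipleOutputs<NullWritable, Text>(context);
        // Recover the partition number of the *input* file,
        // e.g. "part-m-00042" -> "part-00042".
        String inputName = ((FileSplit) context.getInputSplit()).getPath().getName();
        baseName = "part-" + inputName.replaceAll("\\D", "");
      }

      @Override
      protected void map(LongWritable key, Text value, Context context)
          throws IOException, InterruptedException {
        // MultipleOutputs still appends a task suffix, so the file comes out
        // as e.g. part-00042-m-00007; the input partition number survives in
        // the leading portion of the name.
        out.write(NullWritable.get(), value, baseName);
      }

      @Override
      protected void cleanup(Context context) throws IOException, InterruptedException {
        out.close();
      }
    }

In the driver, LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class) avoids the empty default part files that would otherwise still be created.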