Re: how to figure out the range of a split that failed?

2010-07-05 Thread edward choi
Thanks for the tip. I have actually already tried your method. The command I wrote is like the one below: cerr << "reporter:counter:SkippingTaskCounters,MapProcessedRecords,1\n"; This actually produced some skipped records in the skip folder. But the problem is that the skipped records' text was all messed up

Re: how to figure out the range of a split that failed?

2010-07-05 Thread Sharad Agarwal
To be precise, you have to write to the error stream -> for map: reporter:counter:SkippingTaskCounters,MapProcessedRecords, for reduce: reporter:counter:SkippingTaskCounters,ReduceProcessedGroups, edward choi wrote: Thanks for the response. I went to the web page you told me about and several other pages
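
For a streaming job, the counter is reported simply by printing that line to stderr. A minimal sketch of a streaming identity mapper doing this, assuming a Java executable (any language that can write to stderr works; the class name is made up):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;

    public class SkipCounterMapper {
        public static void main(String[] args) throws Exception {
            BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
            String line;
            while ((line = in.readLine()) != null) {
                // Identity map: pass the record through on stdout.
                System.out.println(line);
                // Tell the framework this record was processed, so skip mode
                // can narrow down the bad record on retries. The last field
                // after the counter name is the increment amount.
                System.err.println("reporter:counter:SkippingTaskCounters,MapProcessedRecords,1");
            }
        }
    }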

Re: Does hadoop need to have ZooKeeper to work?

2010-07-05 Thread Edward Capriolo
On Mon, Jun 28, 2010 at 10:26 AM, Pierre ANCELOT wrote:
> Hive depends on zookeeper though, if you plan to have it.
>
> On Mon, Jun 28, 2010 at 4:23 PM, Eason.Lee wrote:
>> No, they are separate projects!
>> they don't depend on each other~~
>>
>> 2010/6/28 legolas
>>>
>>> Hi,
>>>

Re: Hadoop Shutdown Hook

2010-07-05 Thread Jeff Hammerbacher
Hey Arv, CDH2 and CDH3 both have HADOOP-4829: see http://archive.cloudera.com/cdh/2/hadoop-0.20.1+169.89.releasenotes.html and http://archive.cloudera.com/cdh/3/hadoop-0.20.2+320.releasenotes.html. Alternatively, you can download an Apache 0.21 release candidate at http://people.apache.org/~tomwh

Re: What is it??? help required

2010-07-05 Thread Alex Loddengaard
Hi Ahmad,
On Sat, Jul 3, 2010 at 11:21 AM, Ahmad Shahzad wrote:
> 1) What is the purpose of the HttpServer that is started at port 50060, with Jetty bound to it?
This is used for web UI status (just like the JobTracker and NameNode web UIs), along with map -> reduce intermediate data trans

Hadoop Shutdown Hook

2010-07-05 Thread Arv Mistry
Hi folks, I need to be able to override the hadoop shutdown hook. The following jira https://issues.apache.org/jira/browse/HADOOP-4829 describes the fix as being implemented in 0.21.0. When will that release be available? In the meantime, is there a workaround? The Jira describes using Java reflection
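
A hedged sketch of the Java-reflection workaround the Jira hints at: grab FileSystem's private shutdown-hook thread and unregister it. The field name "clientFinalizer" is assumed from the 0.20-era FileSystem source (in some versions it lives on the inner Cache class instead), so verify against the exact version you run:

    import java.lang.reflect.Field;
    import org.apache.hadoop.fs.FileSystem;

    public class ShutdownHookWorkaround {
        public static void removeFsShutdownHook() throws Exception {
            // Assumed field name; check your FileSystem source first.
            Field f = FileSystem.class.getDeclaredField("clientFinalizer");
            f.setAccessible(true);
            Thread hook = (Thread) f.get(null); // static field, so null target
            if (hook != null) {
                Runtime.getRuntime().removeShutdownHook(hook);
            }
            // The caller now owns cleanup and should arrange to call
            // FileSystem.closeAll() at a time of its choosing.
        }
    }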

Re: Hashing two relations

2010-07-05 Thread Gang Luo
Actually, I meant hash join when I said default reduce join. The hash partitioner will shuffle records from mappers to reducers. Each reducer receives a hash partition which involves many keys. The reduce method will process one key (and all the associated records) at a time. What you can do is
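
One common way to turn that per-key processing into a join is to tag each record with its source relation in the mapper and build the in-memory side inside each reduce() call. A sketch under that assumption (the "R|"/"S|" tag prefix and Text types are invented for illustration):

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class HashJoinReducer extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            // The iterable can only be consumed once, so buffer both sides;
            // with a secondary sort you could buffer only the smaller one.
            List<String> rRows = new ArrayList<String>();
            List<String> sRows = new ArrayList<String>();
            for (Text v : values) {
                String s = v.toString();
                if (s.startsWith("R|")) rRows.add(s.substring(2));
                else sRows.add(s.substring(2));
            }
            // Probe: emit the matching pairs for this join key.
            for (String r : rRows) {
                for (String s : sRows) {
                    context.write(key, new Text(r + "\t" + s));
                }
            }
        }
    }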

Re: Partitioned Datasets Map/Reduce

2010-07-05 Thread abc xyz
Thanks Aaron. The first option sounds good. How can I make sure the partition numbers are written to a single file while I am writing each partition to a separate file? I mean, OK, after the custom partitioner, an identity reducer would work to write the part-x file for each partition, but how

Re: why my Reduce Class does not work?

2010-07-05 Thread Vitaliy Semochkin
Thank you very much Ken. The problem was a missing generic declaration (Eclipse failed to override the method and I didn't notice the mistake): instead of public void reduce(Text key, Iterable values, Reducer.Context context) throws IOException, InterruptedException it should be public void reduce(Te
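
For reference, a sketch of the corrected class; the concrete value type is elided in the message above, so IntWritable is assumed here:

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        // If the class extends a raw Reducer, a typed reduce() is merely an
        // overload, not an override, and the default identity reduce runs
        // instead. The @Override annotation catches this at compile time.
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }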

Re: Hashing two relations

2010-07-05 Thread abc xyz
The default reduce join is the sort-merge join. I want to implement a hash join on the reduce side for some experimenting. I want to get a partition from each relation, build an in-memory hash table for one, and probe it with the partition from the other relation (like the Grace hash-join algorithm). Any sugges

Re: Partitioned Datasets Map/Reduce

2010-07-05 Thread Aaron Kimball
One possibility: write out all the partition numbers (one per line) to a single file, then use the NLineInputFormat to make each line its own map task. Then in the mapper itself, you will get a key of "0" or "1" or "2" etc. Then explicitly open /dataset1/part-(n) and /dataset2/part-(n) in your
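
A hedged sketch of that wiring with the 0.20-era (org.apache.hadoop.mapred) API; the dataset paths come from this thread, while the join logic and exact part-file naming (e.g. zero-padded part-00000) are left as placeholders:

    import java.io.IOException;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class PartitionMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, Text> {
        private JobConf conf;

        @Override
        public void configure(JobConf conf) { this.conf = conf; }

        public void map(LongWritable offset, Text value,
                        OutputCollector<Text, Text> out, Reporter reporter)
                throws IOException {
            // With NLineInputFormat at one line per split, this task sees
            // exactly one line: its partition number.
            String n = value.toString().trim();
            FileSystem fs = FileSystem.get(conf);
            FSDataInputStream a = fs.open(new Path("/dataset1/part-" + n));
            FSDataInputStream b = fs.open(new Path("/dataset2/part-" + n));
            try {
                // ... read both partitions and emit joined records ...
            } finally {
                a.close();
                b.close();
            }
        }
    }

    // Driver fragment:
    //   job.setInputFormat(org.apache.hadoop.mapred.lib.NLineInputFormat.class);
    //   job.setInt("mapred.line.input.format.linespermap", 1);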

Re: create error

2010-07-05 Thread Aaron Kimball
Is there a reason you're using that particular interface? That's very low-level. See http://wiki.apache.org/hadoop/HadoopDfsReadWriteExample for the proper API to use.
- Aaron
On Sat, Jul 3, 2010 at 1:36 AM, Vidur Goyal wrote:
> Hi,
> I am trying to create a file in HDFS. I am calling create
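
For comparison, a minimal sketch of the higher-level API that wiki page demonstrates (path and payload are placeholders):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsCreateExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration(); // reads core-site.xml from the classpath
            FileSystem fs = FileSystem.get(conf);
            FSDataOutputStream out = fs.create(new Path("/tmp/example.txt"));
            try {
                out.writeUTF("hello hdfs");
            } finally {
                out.close();
            }
        }
    }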

Re: Text files vs. SequenceFiles

2010-07-05 Thread Aaron Kimball
David, I think you've more-or-less outlined the pros and cons of each format (though do see Alex's important point regarding SequenceFiles and compression). If everyone who worked with Hadoop clearly favored one or the other, we probably wouldn't include support for both formats by default. :) Nei
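
To make the compression point concrete, a hedged sketch of writing a block-compressed SequenceFile, which stays splittable in a way a gzipped text file does not (path and record are placeholders):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    public class SeqFileWriteExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            Path path = new Path("/tmp/example.seq");
            // BLOCK compression compresses runs of records together, giving
            // better ratios than per-record compression.
            SequenceFile.Writer writer = SequenceFile.createWriter(
                    fs, conf, path, Text.class, IntWritable.class,
                    SequenceFile.CompressionType.BLOCK);
            try {
                writer.append(new Text("key"), new IntWritable(1));
            } finally {
                writer.close();
            }
        }
    }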