Thanks for the tip.
I have actually already tried your method. The command I wrote is as below:
cerr << "reporter:counter:SkippingTaskCounters,MapProcessedRecords,1\n";
This actually produced some skipped records in the skip folder. But the problem
is that the skipped records' text was all messed up
To be precise, you have to write to the error stream:
for map:
reporter:counter:SkippingTaskCounters,MapProcessedRecords,
for reduce:
reporter:counter:SkippingTaskCounters,ReduceProcessedGroups,
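For illustration, here is a minimal sketch of a streaming mapper written in Java
that emits that counter line on stderr once per processed record (the class name
and the trivial output format are my own, not from this thread):

import java.io.BufferedReader;
import java.io.InputStreamReader;

// Hypothetical streaming mapper: reads records from stdin, writes key\tvalue
// pairs to stdout, and reports each successfully processed record to the
// SkippingTaskCounters group on stderr so skip mode can isolate bad records.
public class SkipAwareMapper {
    public static void main(String[] args) throws Exception {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        String line;
        while ((line = in.readLine()) != null) {
            // ... real record processing would go here ...
            System.out.println(line.length() + "\t" + line);
            // One counter increment per processed record, on the error stream.
            System.err.println(
                "reporter:counter:SkippingTaskCounters,MapProcessedRecords,1");
        }
    }
}

The C++ cerr line quoted above does the same thing; the essential points are that
the counter line goes to the task's error stream and is emitted once per record.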
edward choi wrote:
Thanks for the response. I went to the web page you told me about and
several other pages
On Mon, Jun 28, 2010 at 10:26 AM, Pierre ANCELOT wrote:
> Hive depends on zookeeper though, if you plan to have it.
>
>
>
> On Mon, Jun 28, 2010 at 4:23 PM, Eason.Lee wrote:
>
>> No, they are separate projects!
>> They don't depend on each other~~
>>
>> 2010/6/28 legolas
>>
>> >
>> > Hi,
>> >
>>
Hey Arv,
CDH2 and CDH3 both have HADOOP-4829: see
http://archive.cloudera.com/cdh/2/hadoop-0.20.1+169.89.releasenotes.html and
http://archive.cloudera.com/cdh/3/hadoop-0.20.2+320.releasenotes.html.
Alternatively, you can download an Apache 0.21 release candidate at
http://people.apache.org/~tomwh
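Once you are on a build that includes HADOOP-4829, the hook can be turned off
through the configuration; a rough sketch (I believe the switch is
fs.automatic.close, but check the release notes of your exact version):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class DisableFsShutdownHook {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // HADOOP-4829: stop FileSystem from registering its own shutdown hook,
        // so the application decides when file systems get closed.
        conf.setBoolean("fs.automatic.close", false);

        FileSystem fs = FileSystem.get(conf);
        // ... use fs as usual ...

        // With the automatic hook disabled, close everything explicitly,
        // for example from your own shutdown hook.
        FileSystem.closeAll();
    }
}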
Hi Ahmad,
On Sat, Jul 3, 2010 at 11:21 AM, Ahmad Shahzad wrote:
>
> 1) What is the purpose of the HttpServer that is started at port 50060, with
> jetty bound to it?
>
This is used for the web UI status (just like the jobtracker and namenode web
UIs), along with map -> reduce intermediate data transfer.
Hi folks,
I need to be able to override the Hadoop shutdown hook. The following jira,
https://issues.apache.org/jira/browse/HADOOP-4829, describes the fix as being
implemented in 0.21.0. When will that release be available?
In the meantime, is there a workaround? The Jira describes using Java
reflection
Actually I meant hash join when I said default reduce join. The hash partitioner
will shuffle records from mappers to reducers. Each reducer receives a hash
partition which contains many keys. The reduce method will process one key (and
all the associated records) at a time. What you can do is
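The suggestion above is cut off, but purely as an illustration of the mechanism
being described (one key and all its records per reduce() call), here is a
sketch of a tagged reduce-side join; the "A"/"B" source tags and all class names
are mine, not from the thread:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Assumes the mappers prefix each value with "A\t" or "B\t" depending on which
// dataset it came from, and emit the join key as the map output key.
public class TaggedJoinReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        List<String> sideA = new ArrayList<String>();
        List<String> sideB = new ArrayList<String>();
        for (Text v : values) {
            String s = v.toString();
            if (s.startsWith("A\t")) {
                sideA.add(s.substring(2));
            } else {
                sideB.add(s.substring(2));
            }
        }
        // Join the two sides for this key; both sides are held in memory here,
        // which is fine per key but is not what a grace-style, partition-wide
        // hash join would do (that would buffer across keys instead).
        for (String a : sideA) {
            for (String b : sideB) {
                context.write(key, new Text(a + "\t" + b));
            }
        }
    }
}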
Thanks Aaron. The first option sounds good.
How can I ensure that the partition numbers are written to a single file while I
am writing each partition to a separate file? I mean, OK, after the custom
partitioner, an identity reducer would work to write the part-x file for
each partition, but how
Thank you very much, Ken.
The problem was a missing generic declaration
(Eclipse failed to override the method and I didn't notice the mistake).
instead of
public void reduce(Text key, Iterable values, Reducer.Context
context) throws IOException, InterruptedException
it should be
public void reduce(Te
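For anyone hitting the same problem, a small self-contained example of what the
override looks like once the generics are declared (the concrete types here are
just an example; the point is that the signature must match the class's type
parameters, otherwise @Override fails and the method becomes a mere overload):

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    // With the type parameters declared on the class, this signature matches
    // the base class's reduce() and the @Override annotation compiles.
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        context.write(key, new IntWritable(sum));
    }
}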
The default reduce join is the sort-merge join. I want to have a hash join on
the reduce side for some experiments. I want to get a partition from each
table, build an in-memory hash table for one, and probe the partition
from the other table against it (like the grace-join algorithm). Any suggestions
One possibility: write out all the partition numbers (one per line) to a
single file, then use the NLineInputFormat to make each line its own map
task. Then in your mapper itself, you will get a key of "0" or "1" or "2"
etc. Then explicitly open /dataset1/part-(n) and /dataset2/part-(n) in your
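A rough sketch of the mapper side of that idea; the dataset paths, the part-file
naming, and the class name are assumptions for illustration:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical mapper for the NLineInputFormat approach: each input line holds
// one partition number, so each map task handles exactly one pair of part files.
public class PartitionPairMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        int n = Integer.parseInt(line.toString().trim());
        String part = String.format("part-%05d", n);  // naming is an assumption

        FileSystem fs = FileSystem.get(context.getConfiguration());
        Path left = new Path("/dataset1/" + part);
        Path right = new Path("/dataset2/" + part);

        // Open the two matching partitions; the real join logic goes inside.
        BufferedReader a = new BufferedReader(new InputStreamReader(fs.open(left)));
        BufferedReader b = new BufferedReader(new InputStreamReader(fs.open(right)));
        try {
            // ... build a hash table from one side and probe with the other ...
        } finally {
            a.close();
            b.close();
        }
    }
}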
Is there a reason you're using that particular interface? That's very
low-level.
See http://wiki.apache.org/hadoop/HadoopDfsReadWriteExample for the proper
API to use.
- Aaron
On Sat, Jul 3, 2010 at 1:36 AM, Vidur Goyal wrote:
> Hi,
>
> I am trying to create a file in HDFS. I am calling create
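For comparison, roughly what the higher-level route from that wiki page looks
like (the path and contents below are made up):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsCreateExample {
    public static void main(String[] args) throws Exception {
        // Picks up fs.default.name from the core-site.xml on the classpath.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path file = new Path("/tmp/example.txt");  // made-up path
        FSDataOutputStream out = fs.create(file);  // overwrites by default
        try {
            out.writeUTF("hello hdfs");
        } finally {
            out.close();
        }
    }
}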
David,
I think you've more-or-less outlined the pros and cons of each format
(though do see Alex's important point regarding SequenceFiles and
compression). If everyone who worked with Hadoop clearly favored one or the
other, we probably wouldn't include support for both formats by default. :)
Nei
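On Alex's SequenceFile-and-compression point, a hedged sketch of writing a
block-compressed SequenceFile with the 0.20-era API (key/value types and the
path are arbitrary):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SequenceFileWriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path path = new Path("/tmp/data.seq");  // made-up path

        // Block compression compresses runs of keys and values together, which
        // usually gives better ratios than per-record compression.
        SequenceFile.Writer writer = SequenceFile.createWriter(
            fs, conf, path, Text.class, IntWritable.class,
            SequenceFile.CompressionType.BLOCK);
        try {
            writer.append(new Text("example-key"), new IntWritable(42));
        } finally {
            writer.close();
        }
    }
}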