how to turn off sorting

2010-11-30 Thread exception
Hi all, I want to create an index for a bunch of logs. Our logs are line-based text files. Each line contains a time, a uid and some other data. I want to create an index on uid. So for each line, the mapper will emit a pair. And the result is sorted by uid. The reducer will combine all the offsets for a single
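
The indexing pattern described in this preview can be sketched locally in plain Python, without Hadoop. The message is truncated, so several details here are assumptions rather than the poster's actual job: the emitted pair is taken to be (uid, byte offset of the line), fields are taken to be whitespace-separated with uid second, and the names `map_line` and `build_uid_index` are hypothetical.

```python
from collections import defaultdict


def map_line(offset, line):
    """Map step: emit (uid, offset) for one log line.

    Assumes whitespace-separated fields in the order: time, uid, other data.
    """
    fields = line.split()
    if len(fields) >= 2:
        yield fields[1], offset


def build_uid_index(log_bytes):
    """Reduce step, simulated in memory: collect all offsets for each uid.

    Returns a dict ordered by uid, mirroring the sorted-by-key output
    mentioned in the message.
    """
    index = defaultdict(list)
    offset = 0
    for raw in log_bytes.splitlines(keepends=True):
        for uid, off in map_line(offset, raw.decode("utf-8")):
            index[uid].append(off)
        # Advance by the raw byte length so offsets stay valid for seeking.
        offset += len(raw)
    return dict(sorted(index.items()))
```

Calling `build_uid_index` on the raw bytes of a log file yields one entry per uid, each holding the byte offsets at which that uid's lines begin.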

Re: FILE_BYTES_WRITTEN and HDFS_BYTES_WRITTEN

2010-11-30 Thread Niels Basjes
For some parts of a task, the system stores information on the local (non-HDFS) file system of the node that is actually running the job. Those bytes are reported under the FILE_... counters. Data written to HDFS is reported under the HDFS_... counters. HTH 2010/11/30 : > When an hadoop MapReduce example is executed, at the end of the example it's > sho

FILE_BYTES_WRITTEN and HDFS_BYTES_WRITTEN

2010-11-30 Thread psdc1978
When a Hadoop MapReduce example is executed, a table is shown at the end with all the information about the execution, such as the number of Map and Reduce tasks executed and the number of bytes read and written. This information includes two fields, FILE_BYTES_WRITTEN and H

GridMix2 preparations

2010-11-30 Thread psdc1978
Hi, To run gridmix2 (rungridmix_2) at ${HADOOP_HOME}/src/benchmarks/gridmix2, do I first need to run the generateGridmix2data.sh script? Thanks, Pedro

Re: Problems running GridMix2

2010-11-30 Thread Pedro Costa
The command that I launch to run GridMix2 is: $>./rungridmix_2 Here's the content of the file: [code]
#!/usr/bin/env bash
## Environment configuration
GRID_DIR=$(dirname "$0")
GRID_DIR=$(cd "$GRID_DIR"; pwd)
source $GRID_DIR/gridmix-env-2
Date=$(date +%F-%H-%M-%S-%N)
echo $Date > $1_start.out

Problems running GridMix2

2010-11-30 Thread psdc1978
Hi, 1 - I'm trying to run GridMix2 (rungridmix_2) on a cluster, but nothing happens. No job is created. Only this message appears: [code] GridMix results: Total num of Jobs: 0 ExecutionTime: 0 [/code] Is there a way to know what is happening with gridmix? Here's the gridmix-env-2