Hi all,
I want to create an index for a bunch of logs. Our logs are line-based text files.
Each line contains a time, a uid and some other data. I want to create an index on
uid. So for each line, the mapper will emit a (uid, offset) pair, and the result is
sorted by uid. The reducer will combine all the offsets for a single
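
To make the idea concrete, here is a minimal local sketch of the map and reduce steps described above. It is not Hadoop code. It assumes whitespace-separated fields with the time first and the uid second, and the names map_line, reduce_uid and build_index are hypothetical. With Hadoop's TextInputFormat the mapper already receives each line's byte offset as its key; the loop below just imitates that.

```python
from collections import defaultdict


def map_line(offset, line):
    """Map step: emit one (uid, offset) pair per well-formed log line."""
    fields = line.split()
    if len(fields) < 2:  # assumed layout: "<time> <uid> <other data...>"
        return []
    return [(fields[1], offset)]


def reduce_uid(uid, offsets):
    """Reduce step: combine all offsets seen for one uid into a sorted list."""
    return uid, sorted(offsets)


def build_index(data):
    """Simulate map, shuffle (group by uid) and reduce over raw log bytes."""
    grouped = defaultdict(list)
    offset = 0
    for raw in data.splitlines(keepends=True):
        for uid, off in map_line(offset, raw.decode("utf-8")):
            grouped[uid].append(off)
        offset += len(raw)  # byte offset of the next line
    return dict(reduce_uid(uid, offs) for uid, offs in sorted(grouped.items()))


# Made-up sample log, for illustration only.
sample = b"1291104000 u42 login\n1291104005 u7 view\n1291104009 u42 logout\n"
index = build_index(sample)
```

Seeking to any stored offset in the original file lands on the start of a line for that uid, which is what the index is for.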
For some parts of a task, the system stores information on the local
(non-HDFS) file system of the node that is actually running the job.
That is what the FILE_.. fields report. Data written to HDFS shows up under HDFS_...
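
If it helps to see the two groups side by side, here is a small sketch that splits such fields by prefix. It assumes the summary has been captured as text with entries of the form NAME=value. The exact layout can differ between Hadoop versions, the numbers below are made up, and split_counters is a hypothetical helper.

```python
import re

# Made-up sample in the assumed NAME=value style; the values are illustrative only.
sample_output = """\
FILE_BYTES_READ=1024
FILE_BYTES_WRITTEN=2048
HDFS_BYTES_READ=4096
HDFS_BYTES_WRITTEN=512
"""


def split_counters(text):
    """Group FILE_* (local file system) and HDFS_* fields by their prefix."""
    groups = {"FILE": {}, "HDFS": {}}
    for match in re.finditer(r"\b(FILE|HDFS)_(\w+)=(\d+)", text):
        fs, name, value = match.groups()
        groups[fs][name] = int(value)
    return groups


counters = split_counters(sample_output)
```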
HTH
2010/11/30 :
> When an hadoop MapReduce example is executed, at the end of the example it's
> sho
When a Hadoop MapReduce example is executed, a table is shown at the end with
all the information about the execution, such as the number of Map and Reduce
tasks executed and the number of bytes read and written.
This information includes two fields, FILE_BYTES_WRITTEN and
H
Hi,
To run gridmix2 (rungridmix_2) at ${HADOOP_HOME}/src/benchmarks/gridmix2,
do I first need to run the generateGridmix2data.sh script?
Thanks,
Pedro
The command I use to run GridMix2 is:
$>./rungridmix_2
Here's the content of the file:
[code]
#!/usr/bin/env bash
## Environment configuration
GRID_DIR=$(dirname "$0")
GRID_DIR=$(cd "$GRID_DIR"; pwd)
source $GRID_DIR/gridmix-env-2
Date=$(date +%F-%H-%M-%S-%N)
echo $Date > $1_start.out
Hi,
1 - I'm trying to run GridMix2 (rungridmix_2) on a cluster, but nothing
happens. No job is created. Only this message appears:
[code]
GridMix results:
Total num of Jobs: 0
ExecutionTime: 0
[/code]
Is there a way to know what is happening with gridmix?
Here's the gridmix-env-2