Either use an instance variable or a Combiner. The latter is correct
if you want the top-n per key from the mapper.
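For illustration, here is a rough sketch of the instance-variable approach
against the mapreduce API (the key/value types, the record format parsed in
map(), and N = 10 are all assumptions, not from the original post):

import java.io.IOException;
import java.util.Map;
import java.util.TreeMap;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Keeps a bounded TreeMap in an instance variable and emits its contents
// from cleanup(), so each mapper outputs only its top N records.
public class TopNMapper extends Mapper<LongWritable, Text, LongWritable, Text> {
  private static final int N = 10;                    // assumed cutoff
  private final TreeMap<Long, Text> top = new TreeMap<Long, Text>();

  @Override
  protected void map(LongWritable key, Text value, Context context) {
    long score = Long.parseLong(value.toString());    // assumed record format
    top.put(score, new Text(value));                  // copy: Text objects are reused
    if (top.size() > N) {
      top.remove(top.firstKey());                     // drop the smallest entry
    }
  }

  @Override
  protected void cleanup(Context context) throws IOException, InterruptedException {
    for (Map.Entry<Long, Text> e : top.entrySet()) {
      context.write(new LongWritable(e.getKey()), e.getValue());
    }
  }
}

(Ties on the score silently overwrite each other in this sketch; a real job
would need to handle that.)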
On Wed, Jan 12, 2011 at 10:03 AM, Rakesh Davanum wrote:
> Hi,
>
> I have a sort job consisting of only the Mapper (no Reducer) task. I want my
> results to contain only the top n r
My S3 secret key has a slash in it. After replacing the / with %2F I
can use it as a filesystem URL in something like:
$ hadoop fs -fs s3n://$KEY:$sec...@$bucket -ls /
Found 1 items
drwxrwxrwx - 0 1969-12-31 16:00 /remote
But when I try a distcp, it crashes with:
$ had
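One workaround worth trying here (an untested sketch; the bucket and
destination path are placeholders) is to keep the slash-containing secret out
of the URL entirely and pass the credentials as configuration properties:

$ hadoop distcp \
    -D fs.s3n.awsAccessKeyId=$KEY \
    -D fs.s3n.awsSecretAccessKey=$SECRET \
    s3n://$bucket/remote hdfs://namenode/path

The same two properties can also be set once in core-site.xml.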
Those are created by fsck and will come back. They belong to the
filesystem and you shouldn't delete them.
Instead, create subdirectories on those mount points and use them as
DFS directories, e.g. create and use /mnt/dfs instead of /mnt.
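For instance (a sketch; the mount points, and the assumption that these are
DataNode data directories, are mine), with data disks mounted at /mnt1 and /mnt2:

$ mkdir /mnt1/dfs /mnt2/dfs

and in conf/hdfs-site.xml:

<property>
  <name>dfs.data.dir</name>
  <value>/mnt1/dfs,/mnt2/dfs</value>
</property>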
Cheers,
Anthony
On Mon, Sep 28, 2009 at 6:33 PM, Stas Os
First set up your cluster and the client machine as per the getting
started guide. Synchronize your configuration files everywhere.
The FileSystem class has the API for interacting with HDFS:
http://hadoop.apache.org/common/docs/r0.20.1/api/org/apache/hadoop/fs/FileSystem.html
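A minimal client-side sketch of that API (the path is a placeholder; it
assumes the cluster's conf directory is on the classpath so fs.default.name
points at your namenode):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListHdfsDir {
  public static void main(String[] args) throws Exception {
    // Picks up fs.default.name from the core-site.xml on the classpath.
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // List the contents of an HDFS directory (placeholder path).
    for (FileStatus status : fs.listStatus(new Path("/user/someuser"))) {
      System.out.println(status.getPath() + "\t" + status.getLen());
    }
    fs.close();
  }
}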
The JobConf a
rt later.
>
> On Mon, Sep 21, 2009 at 12:15 PM, Jeff Zhang wrote:
>
>> My cluster has been running for several months.
>>
>> Is this a bug in Hadoop? I think Hadoop is supposed to run for a long time.
>>
>> And will I lose data if I manually kill the process?
How long has the cluster been running? I have run into this problem
when tmpwatch deleted all of the pid files from /tmp because they were
more than n days old.
If that is the case, you will have to manually kill all of the
processes yourself.
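If tmpwatch turns out to be the cause, one way to avoid a repeat (a sketch;
the directory is an assumption) is to move the pid files out of /tmp in
conf/hadoop-env.sh before the next restart:

# conf/hadoop-env.sh
export HADOOP_PID_DIR=/var/hadoop/pids

In the meantime, jps will list the NameNode, DataNode, JobTracker and
TaskTracker process ids so you can kill each daemon by hand.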
Cheers,
Anthony
On Sun, Sep 20, 2009 at 10:19 PM, Je
start-all.sh is a shell script; it can't be run via the hadoop command. Just
run it directly from your shell like this:
$ bin/start-all.sh
Cheers,
Anthony
On Thu, Sep 17, 2009 at 4:30 PM, Simon Chu wrote:
> had...@zoe:/opt/hadoop-0.18.3> hadoop start-all.sh
> Exception in thread "main" java.lang.NoClassDefFoundError
fuse.h should come with the FUSE software, not Hadoop. It should be
somewhere like /usr/include/fuse.h on a Linux machine, or possibly
/usr/local/include/fuse.h.
Did you install FUSE from source? If not, you probably need something
like Debian's libfuse-dev package installed by your operating system.
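On a Debian or Ubuntu machine that would be something along the lines of
(package name taken from above; exact names vary by distribution and release):

$ sudo apt-get install libfuse-dev
$ ls /usr/include/fuse.h   # the header should now be present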
For the Thrift server bug, the best way to get it fixed is to file a
bug report at http://issues.apache.org/jira
HBase 0.20 is out, download here:
http://hadoop.apache.org/hbase/releases.html
There is an HBase mailing list, hbase-u...@hadoop.apache.org.
And yes, I believe you do still need to ke
There is nothing really preventing you from filling your HDFS with a
lot of very small files*, so it would depend on your use case;
however, typical usage of Hadoop calls for as large a block size as
possible, in order to stream very large files off disk efficiently.
* Except name
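For reference, the block size is controlled by a single property in
hdfs-site.xml (a 0.20-era sketch; 134217728 bytes = 128 MB is just an example
value, not a recommendation):

<property>
  <name>dfs.block.size</name>
  <value>134217728</value>  <!-- 128 MB -->
</property>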
Yes, just run something along the lines of:
hadoop distcp hdfs://local-namenode/path hdfs://ec2-namenode/path
on the job tracker of a MapReduce cluster.
Make sure that your EC2 security group setup allows HDFS access from
the local HDFS cluster and from wherever you run the MapReduce job. Also,
I b