Block sizes are per-file, not set permanently on HDFS. So create
your files with a sufficiently large block size (2G is fine if it suits
your use case). This way you won't have block splits, as you
desire.
For example, to upload a file via the shell with a tweaked block size, I'd do:
hadoop
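Something along these lines should do it (the paths below are just placeholders, and dfs.block.size is the 0.20-era property name; newer releases call it dfs.blocksize):

  # upload with a per-file block size of 2 GB (2147483648 bytes)
  hadoop fs -D dfs.block.size=2147483648 -put /local/data/bigfile.dat /user/donal/bigfile.dat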
Thanks! That's exactly what I want.
And Ted, what do you mean by snapshots and mirrors?
On 2011/11/8 16:21, Harsh J wrote:
Block sizes are per-file, not set permanently on HDFS. So create
your files with a sufficiently large block size (2G is fine if it suits
your use case). This way you won't have block splits, as you desire.
- Original Message -
From: donal0412 donal0...@gmail.com
Date: Tuesday, November 8, 2011 1:04 pm
Subject: dfs.write.packet.size set to 2G
To: hdfs-user@hadoop.apache.org
Hi,
I want to store lots of files in HDFS; each file is <= 2G.
I don't want the files to be split into blocks, because
Hi all,
I was trying to configure Eclipse SDK Version 3.7.1 with a remote Hadoop 0.20.2 cluster.
I am able to configure Eclipse so that I can browse HDFS from it. But when I try
to execute a MapReduce job using "Run on Hadoop", nothing happens and I get the
following error in the Eclipse logs:
Message : Plug-in
By snapshots, I mean that you can freeze a copy of a portion of the
file system for later use as a backup or reference. By mirror, I mean that
a snapshot can be transported to another location in the same cluster or to
another cluster, and the mirrored image will be updated atomically to the
Hi,
I resolved the issue by doing the following:
1. Get hadoop-eclipse-plugin-0.20.203.0
2. Rename it to: hadoop-0.20.2-eclipse-plugin
3. Place this plugin jar in the Eclipse plugins directory
4. Restart Eclipse
This worked for me, and I am now able to execute MR jobs through my Eclipse.
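A rough shell equivalent of those steps, assuming Eclipse is installed under /opt/eclipse and that the renamed file keeps its .jar extension (both of those are my assumptions):

  # copy the 0.20.203.0 plugin jar into the Eclipse plugins directory under the new name
  cp hadoop-eclipse-plugin-0.20.203.0.jar /opt/eclipse/plugins/hadoop-0.20.2-eclipse-plugin.jar
  # then restart Eclipse so it picks up the plugin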
Thanks
That's a good point. What if HDFS is used as an archive? We don't really use
it for MapReduce, more for archival purposes.
On Mon, Nov 7, 2011 at 7:53 PM, Ted Dunning tdunn...@maprtech.com wrote:
3x replication has two effects. One is reliability. This is probably
more important in large
For archival purposes, you don't need speed (mostly). That eliminates one
argument for 3x replication.
If you have RAID-5 or RAID-6 on your storage nodes, then you eliminate most
of your disk failure costs at the cluster level. This gives you something
like 2.2x replication cost.
You can also
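For what it's worth, per-path replication can be dialed down from the shell, so archival directories don't have to carry the default 3x (the path and factor below are just examples):

  # set replication to 2 for everything under /archive
  hadoop fs -setrep -R 2 /archive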
Just recently my Hadoop jobs started failing with "Could not obtain block".
I recently restarted the cluster, but this error has killed 3 jobs.
I need some hints on how to diagnose and fix the problem.
Caused by: java.io.IOException: Could not obtain block:
blk_8697778223665513111_1917303
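One way to start narrowing this down, assuming you can run fsck against the data the failing jobs were reading (the path below is a placeholder):

  # report missing/corrupt blocks and where the replicas are supposed to live
  hadoop fsck /path/to/job/input -files -blocks -locations
  # overall datanode health: dead nodes, remaining capacity, under-replicated blocks
  hadoop dfsadmin -report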
facing the following
if /bin/sh ./libtool --mode=compile --tag=CC gcc
-DPACKAGE_NAME=\"libhdfs\" -DPACKAGE_TARNAME=\"libhdfs\"
-DPACKAGE_VERSION=\"0.1.0\" -DPACKAGE_STRING=\"libhdfs 0.1.0\"
-DPACKAGE_BUGREPORT=\"omal...@apache.org\" -DPACKAGE=\"libhdfs\"
-DVERSION=\"0.1.0\" -DSTDC_HEADERS=1