Block Size

2011-09-29 Thread lessonz
I'm new to Hadoop, and I'm trying to understand the implications of a 64M block size in the HDFS. Is there a good reference that enumerates the implications of this decision and its effects on files stored in the system as well as map-reduce jobs? Thanks.

block size

2011-09-20 Thread hao.wang
Hi All: I have lots of small files stored in HDFS. My HDFS block size is 128M. Each file is significantly smaller than the HDFS block size. I want to know whether each small file uses 128M in HDFS? regards 2011-09-21 hao.wang

DFS block size

2009-11-14 Thread Hrishikesh Agashe
Hi, Default DFS block size is 64 MB. Does this mean that if I put a file less than 64 MB on HDFS, it will not be divided any further? I have lots and lots of XMLs and I would like to process them directly. Currently I am converting them to Sequence files (10 XMLs per sequence file) and the

Re: Block Size

2011-09-29 Thread Chris Smith
On 29 September 2011 18:39, lessonz wrote: > I'm new to Hadoop, and I'm trying to understand the implications of a 64M > block size in the HDFS. Is there a good reference that enumerates the > implications of this decision and its effects on files stored in the system > as w

Re: Block Size

2011-09-29 Thread Uma Maheswara Rao G 72686
hi, Here is some useful info: A small file is one which is significantly smaller than the HDFS block size (default 64MB). If you’re storing small files, then you probably have lots of them (otherwise you wouldn’t turn to Hadoop), and the problem is that HDFS can’t handle lots of files. Every

Re: block size

2011-09-20 Thread Joey Echeverria
overhead by having to track a larger number of small files. So, if you can merge files, it's best practice to do so. -Joey On Tue, Sep 20, 2011 at 9:54 PM, hao.wang wrote: > Hi All: >   I have lots of small files stored in HDFS. My HDFS block size is 128M. Each > file is significantl
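
A minimal sketch of one crude way to merge, assuming hypothetical paths and plain-text files (simple concatenation is only valid for formats like text; SequenceFiles would need a real merge job):

    # pull the small files down as one concatenated local file, then re-upload it
    hadoop fs -getmerge /user/hao/small-files merged.txt
    hadoop fs -put merged.txt /user/hao/merged.txt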

Re: DFS block size

2009-11-14 Thread Amogh Vasekar
Replies inline. On 11/14/09 9:55 PM, "Hrishikesh Agashe" wrote: Hi, Default DFS block size is 64 MB. Does this mean that if I put a file less than 64 MB on HDFS, it will not be divided any further? --Yes, the file will be stored in a single block per replica. I have lots and lots of

Re: DFS block size

2009-11-15 Thread Jeff Hammerbacher
> > Cloudera has a pretty detailed blog on this. > Indeed. See http://www.cloudera.com/blog/2009/02/02/the-small-files-problem/. The post is getting a bit long in the tooth but should contain some useful information for you. Regards, Jeff

Per-file block size

2010-04-13 Thread Andrew Nguyen
I thought I saw a way to specify the block size for individual files using the command-line using "hadoop dfs -put/copyFromLocal..." However, I can't seem to find the reference anywhere. I see that I can do it via the API but no references to a command-line mechanism. Am I j

change HDFS block size

2010-09-07 Thread Gang Luo
Hi all, I need to change the block size (from 128m to 64m) and have to shut down the cluster first. I was wondering what will happen to the current files on HDFS (with 128M block size). Are they still there and usable? If so, what is the block size of those legacy files? Thanks, -Gang

Re: Re: block size

2011-09-20 Thread hao.wang
Hi, Joey: Thanks for your help! 2011-09-21 hao.wang From: Joey Echeverria Sent: 2011-09-21 10:10:54 To: common-user Cc: Subject: Re: block size HDFS blocks are stored as files in the underlying filesystem of your datanodes. Those files do not take a fixed amount of space, so if you

Re: Per-file block size

2010-04-13 Thread Amogh Vasekar
Hi, Pass the -D property on the command line, e.g.: hadoop fs -Ddfs.block.size= . You can check whether it was actually set the way you wanted with hadoop fs -stat %o . HTH, Amogh On 4/14/10 9:01 AM, "Andrew Nguyen" wrote: I thought I saw a way to specify the block size for individual files
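
A minimal sketch of the commands Amogh describes, with hypothetical paths; dfs.block.size is given in bytes (128 MB here):

    # upload a single file with its own block size
    hadoop fs -Ddfs.block.size=134217728 -put myfile.xml /user/andrew/myfile.xml
    # print the block size HDFS recorded for that file
    hadoop fs -stat %o /user/andrew/myfile.xml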

Re: change HDFS block size

2010-09-07 Thread Jeff Zhang
Those legacy files won't change block size (the NameNode has the mapping between blocks and files); only newly added files will have the new block size. On Tue, Sep 7, 2010 at 7:27 PM, Gang Luo wrote: > Hi all, > I need to change the block size (from 128m to 64m) and have to sh

Re: change HDFS block size

2010-09-08 Thread Alex Kozlov
The block size is a per-file property, so it will change only for the newly created files. If you want to change the block size for the 'legacy' files, you'll need to recreate them, for example with the distcp command (for the new block size 512M): * hadoop distcp -D dfs.block
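
A minimal sketch of Alex's distcp approach with hypothetical paths; 536870912 is his 512 MB example, while Gang's 64 MB target would be 67108864:

    # copy the old files into a new directory, writing them with the new block size
    hadoop distcp -Ddfs.block.size=536870912 /user/gang/data /user/gang/data-newblocks
    # after verifying the copy, the original directory can be removed and the copy renamed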

Re: change HDFS block size

2010-09-08 Thread Gang Luo
That makes sense. Thanks Alex and Jeff. -Gang - Original Message From: Alex Kozlov To: common-user@hadoop.apache.org Sent: 2010/9/8 (Wed) 1:31:14 PM Subject: Re: change HDFS block size The block size is a per-file property, so it will change only for the newly created files. If you want to

HDFS block size v.s. mapred.min.split.size

2011-02-17 Thread Boduo Li
Hi, I'm recently benchmarking Hadoop. I know two ways to control the input data size for each map task: by changing the HDFS block size (which requires reloading data into HDFS), or by setting mapred.min.split.size. For my benchmarking task, I need to change the input size for a map

RE: HDFS block size v.s. mapred.min.split.size

2011-02-17 Thread Jim Falgout
Generally, if you have large files, setting the block size to 128M or larger is helpful. You can do that on a per file basis or set the block size for the whole filesystem. The larger block size cuts down on the number of map tasks required to handle the overall data size. I've experim

Re: HDFS block size v.s. mapred.min.split.size

2011-02-17 Thread Koji Noguchi
> (mapred.min.split.size can only be set larger than the HDFS block size) > I haven't tried this on the new mapreduce API, but -Dmapred.min.split.size= -Dmapred.map.tasks=1 I think this would let you set a split size smaller than the hdfs block size :) Koji On 2/17/11 2:
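
A hedged sketch of Koji's suggestion for the old mapred API, where the split size works out to max(mapred.min.split.size, min(totalSize / mapred.map.tasks, blockSize)); the jar, class, values, and paths are hypothetical, and the job driver must use ToolRunner for the -D options to apply:

    # a large mapred.map.tasks pushes the goal split size below the block size
    hadoop jar bench.jar BenchJob \
        -Dmapred.min.split.size=16777216 \
        -Dmapred.map.tasks=1000 \
        /bench/input /bench/output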

io.sort.mb based on HDFS block size

2011-04-12 Thread Shrinivas Joshi
Looking at workloads like TeraSort where intermediate map output is proportional to HDFS block size, I was wondering whether it would be beneficial to have a mechanism for setting buffer spaces like io.sort.mb to be a certain factor of HDFS block size? I am sure there are other config parameters

Re: io.sort.mb based on HDFS block size

2011-04-14 Thread 顾荣
shrunk to some degree that we cannot predict. In a word, the data's final size is uncertain, so using this fact to configure the HDFS block size is kind of meaningless. Good Luck Walker Gu. 2011/4/13 Shrinivas Joshi > Looking at workloads like TeraSort where intermediate map output is > prop

Re: io.sort.mb based on HDFS block size

2011-04-14 Thread Shrinivas Joshi
Hi Walker, Thanks for your feedback. I was actually thinking that io.sort.mb could be some factor of block size and not equal to block size. This will avoid re-tuning of sort buffer sizes and spill threshold values for different HDFS block sizes. Am I missing something? Thanks, -Shrinivas On
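
A hedged, purely illustrative sketch of the pairing Shrinivas has in mind: 128 MB blocks with io.sort.mb set to a fixed factor of that, passed together so neither needs separate re-tuning (the jar, class, factor, and paths are hypothetical; the job must use ToolRunner and have enough map-task heap for the larger sort buffer):

    hadoop jar sort-bench.jar SortJob \
        -Ddfs.block.size=134217728 \
        -Dio.sort.mb=160 \
        /sort/input /sort/output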

Re: io.sort.mb based on HDFS block size

2011-04-16 Thread 顾荣
ll be combined, if there is a combine function. So the data size is really uncertain during the process. From HDFS's perspective, it just sees the data come group by group, with no idea about io.sort.mb, which is the buffer's total size. That's why I think setting the HDFS block size to c

Does changing the block size of MiniDFSCluster work?

2011-04-12 Thread Jason Rutherglen
I'm using the append 0.20.3 branch and am wondering why the following fails, where setting the block size either in the Configuration or the DFSClient.create method causes a failure later on when writing a file out. Configuration conf = new Configuration(); long blockSize = (long)32 * 1024 *

Can I change the block size and then restart?

2009-11-19 Thread Raymond Jennings III
Can I just change the block size in the config and restart or do I have to reformat? It's okay if what is currently in the file system stays at the old block size if that's possible ?

change hdfs block size for file existing on HDFS

2012-06-26 Thread Anurag Tangri
Hi, We have a situation where all the files that we have use a 64 MB block size. I want to change these files (output of a map job mainly) to 128 MB blocks. What would be a good way to do this migration from 64 MB to 128 MB block files? Thanks, Anurag Tangri

BUG: Anyone use block size more than 2GB before?

2010-10-18 Thread elton sky
Hello, In org.apache.hadoop.hdfs.DFSClient.DFSOutputStream.writeChunk(byte[] b, int offset, int len, byte[] checksum) the second-to-last line is: int psize = Math.min((int)(blockSize-bytesCurBlock), writePacketSize); When I use a blockSize bigger than 2GB, which is out of the boundary of integer

Re: Can I change the block size and then restart?

2009-11-19 Thread Edward Capriolo
On Thu, Nov 19, 2009 at 11:24 AM, Raymond Jennings III wrote: > Can I just change the block size in the config and restart or do I have to > reformat?  It's okay if what is currently in the file system stays at the old > block size if that's possible ? >

Re: change hdfs block size for file existing on HDFS

2012-06-26 Thread Bejoy KS
Hi Anurag, The easiest option would be, in your map reduce job, to set dfs.block.size to 128 MB. --Original Message-- From: Anurag Tangri To: hdfs-u...@hadoop.apache.org To: common-user@hadoop.apache.org ReplyTo: common-user@hadoop.apache.org Subject: change hdfs block size for file

Re: change hdfs block size for file existing on HDFS

2012-06-26 Thread Bejoy Ks
b set the dfs.block.size > to 128 mb > > --Original Message-- > From: Anurag Tangri > To: hdfs-u...@hadoop.apache.org > To: common-user@hadoop.apache.org > ReplyTo: common-user@hadoop.apache.org > Subject: change hdfs block size for file existing on HDFS > Sent: Ju

Re: change hdfs block size for file existing on HDFS

2012-06-26 Thread Harsh J
s for dfs.block.size. MR programs should carry it as well, and you may verify that by checking a job.xml of a job. If it doesn't have the proper value, ensure the submitting user has proper configs with the block size you want them to use. However, folks can still override client configs if th
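
One quick way to check what block size existing files actually have (the path is hypothetical):

    # fsck lists the blocks behind each file, so a rewrite to 128 MB blocks is easy to confirm
    hadoop fsck /user/anurag/job-output -files -blocks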

Re: BUG: Anyone use block size more than 2GB before?

2010-10-18 Thread Allen Wittenauer
On Oct 18, 2010, at 3:33 AM, elton sky wrote: > > > When I use blockSize bigger than 2GB, which is out of the boundary of > integer something weird would happen. For example, for a 3GB block it will > create more than 2Million packets. > > Anyone noticed this before? https://issues.apache.o

RE: BUG: Anyone use block size more than 2GB before?

2010-10-18 Thread Michael Segel
Ok, I'll bite. Why would you want to use a block size of > 2GB? > Date: Mon, 18 Oct 2010 21:33:34 +1100 > Subject: BUG: Anyone use block size more than 2GB before? > From: eltonsky9...@gmail.com > To: common-user@hadoop.apache.org > > Hello, > > In > hd

Re: BUG: Anyone use block size more than 2GB before?

2010-10-18 Thread elton sky
>Why would you want to use a block size of > 2GB? For keeping a map's input split in a single block~ On Tue, Oct 19, 2010 at 9:07 AM, Michael Segel wrote: > > Ok, I'll bite. > Why would you want to use a block size of > 2GB? > > > > > Date: Mon, 18 Oct 2010 2

Re: BUG: Anyone use block size more than 2GB before?

2010-10-18 Thread Owen O'Malley
Block sizes larger than 2**31 are known to not work. I haven't ever tracked down the problem, just set my block size to be smaller than that. -- Owen

Re: BUG: Anyone use block size more than 2GB before?

2010-10-18 Thread James Seigel
If there is a hard requirement for an input split being one block, you could just make your input split fit a smaller block size. Just saying, in case you can't overcome the 2G ceiling :) Sent from my mobile. Please excuse the typos. On 2010-10-18, at 5:08 PM, "elton sky" wrote

Re: BUG: Anyone use block size more than 2GB before?

2010-10-18 Thread Allen Wittenauer
On Oct 18, 2010, at 4:08 PM, elton sky wrote: >> Why would you want to use a block size of > 2GB? > For keeping a maps input split in a single block~ Just use mapred.min.split.size + multifileinputformat.

Re: BUG: Anyone use block size more than 2GB before?

2010-10-18 Thread elton sky
I am curious, any specific reason to make it smaller than 2**31? On Tue, Oct 19, 2010 at 10:27 AM, Owen O'Malley wrote: > Block sizes larger than 2**31 are known to not work. I haven't ever tracked > down the problem, just set my block size to be smaller than that. > > -- Owen >

Re: BUG: Anyone use block size more than 2GB before?

2010-10-19 Thread Steve Loughran
On 18/10/10 23:07, Michael Segel wrote: Ok, I'll bite. Why would you want to use a block size of > 2GB? 1. Some of the events coming off large physics devices are single self-contained files of 3+ GB size; having a block size which puts an event in a single block guarantees locality

Re: BUG: Anyone use block size more than 2GB before?

2010-10-21 Thread M. C. Srivas
I thought the petasort benchmark you published used 12.5G block sizes. How did you make that work? On Mon, Oct 18, 2010 at 4:27 PM, Owen O'Malley wrote: > Block sizes larger than 2**31 are known to not work. I haven't ever tracked > down the problem, just set my block size to

Re: BUG: Anyone use block size more than 2GB before?

2010-10-21 Thread Owen O'Malley
The block sizes were 2G. The input format made splits that were more than a block because that led to better performance. -- Owen

Re: BUG: Anyone use block size more than 2GB before?

2010-10-21 Thread Alex Kozlov
Hmm, this is interesting: how did it manage to keep the blocks local? Why was performance better? On Thu, Oct 21, 2010 at 11:43 AM, Owen O'Malley wrote: > The block sizes were 2G. The input format made splits that were more than a > block because that led to better performance. > > -- Owen >

Re: BUG: Anyone use block size more than 2GB before?

2010-10-21 Thread Milind A Bhandarkar
If a file of say, 12.5 GB were produced by a single task with replication 3, the default replication policy will ensure that the first replica of each block will be created on local datanode. So, there will be one datanode in the cluster that contains one replica of all blocks of that file. Map

Re: BUG: Anyone use block size more than 2GB before?

2010-10-21 Thread elton sky
Milind, You are right. But that only happens when your client is one of the data nodes in HDFS; otherwise a random node will be picked for the first replica. On Fri, Oct 22, 2010 at 3:37 PM, Milind A Bhandarkar wrote: > If a file of say, 12.5 GB were produced by a single task with replication

Re: BUG: Anyone use block size more than 2GB before?

2010-10-21 Thread Milind A Bhandarkar
That's correct. That is why teragen, the program that generates data to be sorted in terasort is a MR program :-) - Milind On Oct 21, 2010, at 9:47 PM, elton sky wrote: > Milind, > > You are right. But that only happens when your client is one of the data > nodes in HDFS. otherwise a random no

Setting a larger block size at runtime in the DFSClient

2011-04-12 Thread Jason Rutherglen
Are there performance implications to setting the block size to 1 GB or higher (via the DFSClient.create method)?

Re: Setting a larger block size at runtime in the DFSClient

2011-04-12 Thread Harsh J
Hey Jason, On Tue, Apr 12, 2011 at 7:06 PM, Jason Rutherglen wrote: > Are there performance implications to setting the block size to 1 GB > or higher (via the DFSClient.create method)? You'll be streaming 1 complete GB per block to a DN with that value (before the next block gets s

Re: Setting a larger block size at runtime in the DFSClient

2011-04-12 Thread Jason Rutherglen
Harsh, thanks, and sounds good! On Tue, Apr 12, 2011 at 7:08 AM, Harsh J wrote: > Hey Jason, > > On Tue, Apr 12, 2011 at 7:06 PM, Jason Rutherglen > wrote: >> Are there performance implications to setting the block size to 1 GB >> or higher (via the DFSClient.creat

Any possible to set hdfs block size to a value smaller than 64MB?

2010-05-18 Thread Pierre ANCELOT
Hi, I'm porting a legacy application to hadoop and it uses a bunch of small files. I'm aware that having such small files ain't a good idea but I'm not making the technical decisions and the port has to be done for yesterday... Of course such small files are a problem, loading 64MB blocks for a few

Change block size from 64M to 128M does not work on Hadoop-0.21

2011-05-04 Thread He Chen
Hi all I met a problem about changing the block size from 64M to 128M. I am sure I modified the correct configuration file, hdfs-site.xml, because I can change the replication number correctly. However, it does not work for the block size change. For example: I change the dfs.block.size to 134217728

Re: Any possible to set hdfs block size to a value smaller than 64MB?

2010-05-18 Thread Brian Bockelman
Hey Pierre, These are not traditional filesystem blocks - if you save a file smaller than 64MB, you don't lose 64MB of file space.. Hadoop will use 32KB to store a 32KB file (ok, plus a KB of metadata or so), not 64MB. Brian On May 18, 2010, at 7:06 AM, Pierre ANCELOT wrote: > Hi, > I'm port

Re: Any possible to set hdfs block size to a value smaller than 64MB?

2010-05-18 Thread Pierre ANCELOT
Hi, thanks for this fast answer :) If so, what do you mean by blocks? If a file has to be split, it will be split when larger than 64MB? On Tue, May 18, 2010 at 2:34 PM, Brian Bockelman wrote: > Hey Pierre, > > These are not traditional filesystem blocks - if you save a file smaller > th

Re: Any possible to set hdfs block size to a value smaller than 64MB?

2010-05-18 Thread Pierre ANCELOT
... and by slices of 64MB then I mean... ? On Tue, May 18, 2010 at 2:38 PM, Pierre ANCELOT wrote: > Hi, thanks for this fast answer :) > If so, what do you mean by blocks? If a file has to be splitted, it will be > splitted when larger than 64MB? > > > > > > On Tue, May 18, 2010 at 2:34 PM, Bria

Re: Any possible to set hdfs block size to a value smaller than 64MB?

2010-05-18 Thread Brian Bockelman
On May 18, 2010, at 7:38 AM, Pierre ANCELOT wrote: > Hi, thanks for this fast answer :) > If so, what do you mean by blocks? If a file has to be splitted, it will be > splitted when larger than 64MB? > For every 64MB of the file, Hadoop will create a separate block. So, if you have a 32KB fil

Re: Any possible to set hdfs block size to a value smaller than 64MB?

2010-05-18 Thread Pierre ANCELOT
Okay, thank you :) On Tue, May 18, 2010 at 2:48 PM, Brian Bockelman wrote: > > On May 18, 2010, at 7:38 AM, Pierre ANCELOT wrote: > > > Hi, thanks for this fast answer :) > > If so, what do you mean by blocks? If a file has to be splitted, it will > be > > splitted when larger than 64MB? > > > >

Re: Any possible to set hdfs block size to a value smaller than 64MB?

2010-05-18 Thread Patrick Angeles
Pierre, Adding to what Brian has said (some things are not explicitly mentioned in the HDFS design doc)... - If you have small files that take up < 64MB you do not actually use the entire 64MB block on disk. - You *do* use up RAM on the NameNode, as each block represents meta-data that needs to b
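
A small illustration of the distinction Patrick and Brian draw, with a hypothetical file: the block size is just a per-file attribute, while disk usage tracks the real length:

    hadoop fs -du /user/pierre/small-input/part-0001       # actual bytes stored
    hadoop fs -stat %o /user/pierre/small-input/part-0001  # still reports the 64 MB block size attribute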

Re: Any possible to set hdfs block size to a value smaller than 64MB?

2010-05-18 Thread Pierre ANCELOT
Thank you, Any way I can measure the startup overhead in terms of time? On Tue, May 18, 2010 at 4:27 PM, Patrick Angeles wrote: > Pierre, > > Adding to what Brian has said (some things are not explicitly mentioned in > the HDFS design doc)... > > - If you have small files that take up < 64MB you

Re: Any possible to set hdfs block size to a value smaller than 64MB?

2010-05-18 Thread Patrick Angeles
Should be evident in the total job running time... that's the only metric that really matters :) On Tue, May 18, 2010 at 10:39 AM, Pierre ANCELOT wrote: > Thank you, > Any way I can measure the startup overhead in terms of time? > > > On Tue, May 18, 2010 at 4:27 PM, Patrick Angeles >wrote: > >

Re: Any possible to set hdfs block size to a value smaller than 64MB?

2010-05-18 Thread He Chen
If you know how to use AspectJ to do aspect oriented programming, you can write an aspect class. Let it just monitor the whole process of MapReduce. On Tue, May 18, 2010 at 10:00 AM, Patrick Angeles wrote: > Should be evident in the total job running time... that's the only metric > that really ma

Re: Any possible to set hdfs block size to a value smaller than 64MB?

2010-05-18 Thread Nyamul Hassan
This is a very interesting thread to us, as we are thinking about deploying HDFS as massive online storage for an online university, and then serving the video files to students who want to view them. We cannot control the size of the videos (and some class work files), as they will mostly be

Re: Any possible to set hdfs block size to a value smaller than 64MB?

2010-05-18 Thread Brian Bockelman
Hey Hassan, 1) The overhead is pretty small, measured in a small number of milliseconds on average 2) HDFS is not designed for "online latency". Even though the average is small, if something "bad happens", your clients might experience a lot of delays while going through the retry stack. The

Re: Any possible to set hdfs block size to a value smaller than 64MB?

2010-05-18 Thread Konstantin Shvachko
You can also get some performance numbers and answers to the block size dilemma problem here: http://developer.yahoo.net/blogs/hadoop/2010/05/scalability_of_the_hadoop_dist.html I remember some people were using Hadoop for storing or streaming videos. Don't know how well that worked. It

Re: Any possible to set hdfs block size to a value smaller than 64MB?

2010-05-18 Thread Konstantin Boudnik
I had an experiment with a block size of 10 bytes (sic!). This was _very_ slow on the NN side. Writing 5 MB took 25 minutes or so :( No fun to say the least... On Tue, May 18, 2010 at 10:56AM, Konstantin Shvachko wrote: > You can also get some performance numbers and answers to

Re: Any possible to set hdfs block size to a value smaller than 64MB?

2010-05-18 Thread Brian Bockelman
tem with 1 trillion files. On May 18, 2010, at 12:56 PM, Konstantin Shvachko wrote: > You can also get some performance numbers and answers to the block size > dilemma problem here: > > http://developer.yahoo.net/blogs/hadoop/2010/05/scalability_of_the_hadoop_dist.html > > I re

Re: Any possible to set hdfs block size to a value smaller than 64MB?

2010-05-18 Thread Pierre ANCELOT
Thanks for the sarcasm, but with 3 small files, and so 3 Mapper instantiations, even though it's not (and never did I say it was) the only metric that matters, it seems to me like something very interesting to check out... I have a hierarchy over me and they will be happy to understand my choices

Re: Any possible to set hdfs block size to a value smaller than 64MB?

2010-05-18 Thread Patrick Angeles
That wasn't sarcasm. This is what you do: - Run your mapreduce job on 30k small files. - Consolidate your 30k small files into larger files. - Run mapreduce on the larger files. - Compare the running times. The difference in runtime is made up by your task startup and seek overhead. If you want to
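
A sketch of that comparison with a hypothetical job and paths; the delta between the two runs approximates the per-task startup and seek overhead:

    time hadoop jar my-job.jar MyJob /data/small-files   /out/small
    time hadoop jar my-job.jar MyJob /data/consolidated  /out/large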

RE: Any possible to set hdfs block size to a value smaller than 64MB?

2010-05-18 Thread Jones, Nick
I'm not familiar with how to use/create them, but shouldn't a HAR (Hadoop Archive) work well in this situation? I thought it was designed to collect several small files together through another level of indirection, avoiding the NN load and the need to decrease the HDFS block size. Nick Jones ---
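
A minimal sketch of building such an archive, with hypothetical paths; the archive is created by a MapReduce job and leaves the block size untouched while cutting the NameNode object count:

    hadoop archive -archiveName small.har -p /user/nick small-files /user/nick/archives
    hadoop fs -ls har:///user/nick/archives/small.har   # files remain addressable through the har:// scheme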

Re: Any possible to set hdfs block size to a value smaller than 64MB?

2010-05-18 Thread Todd Lipcon
avoid the > NN load and decreasing the HDFS block size. > > Yes, or CombineFileInputFormat. JVM reuse also helps somewhat, so long as you're not talking about hundreds of thousands of files (in which case it starts to hurt JT load with that many tasks in jobs) There are a number of ways

Re: Any possible to set hdfs block size to a value smaller than 64MB?

2010-05-19 Thread Pierre ANCELOT
Okay, sorry then, I misunderstood. I think I could as well run it on empty files; I would only get the task startup overhead. Thank you. On Tue, May 18, 2010 at 11:36 PM, Patrick Angeles wrote: > That wasn't sarcasm. This is what you do: > > - Run your mapreduce job on 30k small files. > - Consolidate

Re: Any possible to set hdfs block size to a value smaller than 64MB?

2010-05-19 Thread Konstantin Shvachko
tions, but I suspect they become important over the next 10 years. Brian PS - I starting thinking along these lines during MSST when the LLNL guy was speculating about what it meant to "fsck" a file system with 1 trillion files. On May 18, 2010, at 12:56 PM, Konstantin Shvachko

Re: Any possible to set hdfs block size to a value smaller than 64MB?

2010-05-20 Thread Nyamul Hassan
central catalog. If you start with a POSIX filesystem namespace (and the >> guarantees it implies), what rules must you relax in order to arrive at DNS? >> On the scale of managing million (billion? ten billion? trillion?) files, > are any of the assumptions relevant? >

Re: Change block size from 64M to 128M does not work on Hadoop-0.21

2011-05-04 Thread Harsh J
Your client (put) machine must have the same block size configuration during upload as well. Alternatively, you may do something explicit like `hadoop dfs -Ddfs.block.size=size -put file file` On Thu, May 5, 2011 at 12:59 AM, He Chen wrote: > Hi all > > I met a problem about changing b
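
Spelled out with hypothetical file names (134217728 bytes = 128 MB):

    # force the block size for this upload regardless of the client's hdfs-site.xml
    hadoop dfs -Ddfs.block.size=134217728 -put data.seq /user/chen/data.seq
    hadoop fs -stat %o /user/chen/data.seq   # expect 134217728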

Re: Change block size from 64M to 128M does not work on Hadoop-0.21

2011-05-04 Thread He Chen
not. Chen On Wed, May 4, 2011 at 3:00 PM, Harsh J wrote: > Your client (put) machine must have the same block size configuration > during upload as well. > > Alternatively, you may do something explicit like `hadoop dfs > -Ddfs.block.size=size -put file file` > > On Thu, Ma

Re: Change block size from 64M to 128M does not work on Hadoop-0.21

2011-05-04 Thread He Chen
t a problem. > > I like your second solution. But I am not sure, whether the namenode > will divide those 128MB > > blocks to smaller ones in future or not. > > Chen > > On Wed, May 4, 2011 at 3:00 PM, Harsh J wrote: > >> Your client (put) machine must have the

Re: Change block size from 64M to 128M does not work on Hadoop-0.21

2011-05-04 Thread He Chen
But I am not sure, whether the namenode >> will divide those 128MB >> >> blocks to smaller ones in future or not. >> >> Chen >> >> On Wed, May 4, 2011 at 3:00 PM, Harsh J wrote: >> >>> Your client (put) machine must have the same block size config

Question regardin the block size and the way that a block is used in Hadoop

2011-03-12 Thread Florin Picioroaga
aller than a single block does not occupy a full block’s worth of underlying storage." I can understand that the physical space left over from the initial block size will be free. My question is: can the underlying operating system reuse/write this remaining free space? I'll look forward to your answers. Thank you, Florin

How I can assume the proper a block size if the input file size is dynamic?

2011-02-22 Thread Jun Young Kim
hi, all. I know the dfs.blocksize key can affect the performance of hadoop. In my case, I have thousands of directories containing many different-sized input files (file sizes range from 10K to 1G). In this case, how should I choose dfs.blocksize to get the best performance? 11/02/22

Re: Question regardin the block size and the way that a block is used in Hadoop

2011-03-12 Thread James Seigel
ot large enough to occupy the full > size of the block. > > From the statement (cite from the book) > "Unlike a filesystem for a single disk, a file in HDFS that is smaller than a > single block does not occupy a full block’s worth of underlying storage." I > can understa

Re: How I can assume the proper a block size if the input file size is dynamic?

2011-02-22 Thread Tish Heyssel
Yeah, That's not gonna work. You need to pre-process your input files to concatenate them into larger files and then set your dfs.blocksize accordingly. Otherwise your jobs will be slow, slow slow. tish On Tue, Feb 22, 2011 at 3:57 AM, Jun Young Kim wrote: > hi, all. > > I know dfs.blocksize

Re: How I can assume the proper a block size if the input file size is dynamic?

2011-02-22 Thread Jun Young Kim
currenly, I got a problem to reduce the output of mappers. 11/02/23 09:57:45 INFO input.FileInputFormat: Total input paths to process : 4157 11/02/23 09:57:47 WARN conf.Configuration: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps 11/02/23 09:57:47 INFO mapreduce.JobSubmitter: