Re: Understanding the relationship between block size and RPC / IPC length?

2019-11-08 Thread Wei-Chiu Chuang
`hdfs fsck -files -blocks -locations` and the largest block is of length `1342177728`. - Is there some overhead for RPC calls? Could a block of length `1342177728` be resulting in the original warning log at the top of this post? - My understanding is that the only way

Understanding the relationship between block size and RPC / IPC length?

2019-11-08 Thread Carey, Paul
is of length `1342177728`. - Is there some overhead for RPC calls? Could a block of length `1342177728` be resulting in the original warning log at the top of this post? - My understanding is that the only way a client writing to HDFS can specify a block size is via either `-Ddfs.blocksize
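
For reference, the commands under discussion look roughly like this (paths are placeholders, not taken from the thread):

    # list files, blocks and locations; block lengths appear in the output
    hdfs fsck /data/mydir -files -blocks -locations

    # re-write a file with a 128MB block size (dfs.blocksize is in bytes)
    hadoop fs -Ddfs.blocksize=134217728 -cp /data/mydir/bigfile /data/mydir/bigfile.copy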

Block size 1GB, 2GB

2018-09-06 Thread Sudhir Babu Pothineni
Hi, I am trying to use a larger block size like 1GB or 2GB. In our case the files are 5GB to 12GB in size and we process a whole file per mapper. Are there any side effects to using a larger block size like 1GB or 2GB? Like HDFS stability when doing replication? Thanks Sudhir
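
A minimal sketch of requesting a 1GB block size for a single write, without touching the cluster default (file and path names invented):

    # write with a 1GB block size (1073741824 bytes)
    hadoop fs -Ddfs.blocksize=1073741824 -put bigfile.dat /data/bigfile.dat

    # verify the resulting block layout
    hdfs fsck /data/bigfile.dat -files -blocks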

RE: Hdfs default block size

2017-05-22 Thread Sidharth Kumar
/hdfs-default.xml Regards Surendra *From:* Sidharth Kumar [mailto:sidharthkumar2...@gmail.com] *Sent:* 22 May 2017 19:36 *To:* common-u...@hadoop.apache.org *Subject:* Hdfs default block size Hi,

RE: Hdfs default block size

2017-05-22 Thread surendra lilhore
block size Hi, Can you kindly tell me what the default block size is in Apache Hadoop 2.7.3? Is it 64MB or 128MB? Thanks Sidharth

Hdfs default block size

2017-05-22 Thread Sidharth Kumar
Hi, Can you kindly tell me what the default block size is in Apache Hadoop 2.7.3? Is it 64MB or 128MB? Thanks Sidharth
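
One way to answer this on any given client (the printed value below assumes a stock 2.7.3 configuration):

    hdfs getconf -confKey dfs.blocksize
    # 134217728, i.e. 128MB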

Re: distcp fails with "source and target differ in block-size"

2016-05-24 Thread Chris Nauroth
There is also some discussion on that JIRA considering a checksum strategy independent of block size. I don't think anything was ever implemented though, and there would be some drawbacks to that approach. Sorry if this caused confusion. --Chris Nauroth On 5/24/16, 9:55 AM, "D

Re: distcp fails with "source and target differ in block-size"

2016-05-24 Thread Dmitry Sivachenko
h. The message to the user recommends either the -pb (preserve block size) or -skipCrc (skip checksum validation) as potential workarounds. The intent of that patch was not to silently proceed and report success when the block sizes are different, although there was some discus

Re: distcp fails with "source and target differ in block-size"

2016-05-24 Thread Chris Nauroth
Hello Dmitry, To clarify, the intent of MAPREDUCE-5065 was to message the user that using different block sizes on source and destination might cause a failure due to checksum mismatch. The message to the user recommends either the -pb (preserve block size) or -skipCrc (skip checksum validation

Re: distcp fails with "source and target differ in block-size"

2016-05-22 Thread Dmitry Sivachenko
or a long time. Are you certain that you passed a dfs.blocksize equal to what was used in the source files? Did all source files use the same block size? No, I am sure that I use -D dfs.blocksize=DifferentThanSourceBlockSize (I want to change it d

Re: distcp fails with "source and target differ in block-size"

2016-05-21 Thread Dmitry Sivachenko
? Did all source files use the same block size? No, I am sure that I use -D dfs.blocksize=DifferentThanSourceBlockSize (I want to change it during the copy). I am not sure that all source files use the same block size (there are thousands of them), but it is probably wrong to report an error when

Re: distcp fails with "source and target differ in block-size"

2016-05-20 Thread Chris Nauroth
Hello Dmitry, MAPREDUCE-5065 has been included in these branches for a long time. Are you certain that you passed a dfs.blocksize equal to what was used in the source files? Did all source files use the same block size? --Chris Nauroth On 5/20/16, 3:30 PM, "Dmitry Sivachenko"

distcp fails with "source and target differ in block-size"

2016-05-20 Thread Dmitry Sivachenko
Hello, When I copy files with distcp and -D dfs.blocksize=XXX (hadoop-2.7.2), it fails with a "Source and target differ in block-size" error even though MAPREDUCE-5065 was committed 3 years ago. Is it possible to merge this change into the 2.7 / 2.8 branches
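
The two workarounds discussed in this thread, sketched with placeholder cluster URIs (in distcp v2 the checksum-skipping flag is spelled -skipcrccheck and requires -update):

    # preserve source block sizes so checksums still match
    hadoop distcp -pb hdfs://oldcluster:8020/src hdfs://newcluster:8020/dst

    # or change the block size and skip checksum validation
    hadoop distcp -Ddfs.blocksize=268435456 -update -skipcrccheck hdfs://oldcluster:8020/src hdfs://newcluster:8020/dst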

Specifying replication factor block size during distcp

2015-08-19 Thread Varun Sharma
Hi, I am running a Distcp programmatically from one Hadoop cluster to another - using Hadoop 2.7 and distcp v2. I would like to set a custom block size and replication factor for my files. How can I achieve that? Thanks! Varun
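
Since distcp writes with the client-side configuration, both settings should be expressible as -D options; a hedged sketch (values and paths are illustrative, not from the thread):

    hadoop distcp -Ddfs.blocksize=268435456 -Ddfs.replication=2 \
      hdfs://srccluster:8020/path hdfs://dstcluster:8020/path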

Re: Specifying replication factor block size during distcp

2015-08-19 Thread Ted Yu
Hadoop 2.7 and distcp v2. I would like to set a custom block size and replication factor for my files. How can I achieve that? Thanks! Varun

Re: Specifying replication factor block size during distcp

2015-08-19 Thread nataraj jonnalagadda
Not sure if this feature is available. A workaround would be to update the replication factor and block size at the HDFS level and revert the changes after the distcp is complete. This is good for a one time copy. :-) On Wed, Aug 19, 2015 at 12:52 PM, Ted Yu yuzhih...@gmail.com wrote: I looked

Re: Smaller block size for more intense jobs

2015-05-13 Thread Marko Dinic
? Best regards, Marko On Wed 13 May 2015 06:17:58 AM CEST, Harshit Mathur wrote: Hi Marko, If your files are very small (less than the block size) then a lot of map tasks will be executed, but the initialization and other overheads degrade the overall performance, so it might appear that a single map

Re: Question about Block size configuration

2015-05-12 Thread Himawan Mahardianto
Thank you for the explanation. How many bytes will each piece of metadata consume in RAM if the block size is 64MB or smaller? I heard every piece of metadata is stored in RAM, right?

Smaller block size for more intense jobs

2015-05-12 Thread marko.dinic
Hello, I'm in doubt whether I should specify the block size to be smaller than 64MB in case my mappers need to do intensive computations. I know that it is better to have larger files, since replication and the NameNode are a weak point, but I don't have that much data, and the operations

Re: Question about Block size configuration

2015-05-12 Thread Drake민영근
Hi I think the metadata size is not greatly different. The problem is the number of blocks. If the block size is less than 64MB, more blocks are generated for the same file size (if 32MB, then 2x more blocks). And, yes, all metadata is in the namenode's heap memory. Thanks. Drake 민영근 Ph.D kt NexR

Re: Smaller block size for more intense jobs

2015-05-12 Thread Harshit Mathur
Hi Marko, If your files are very small (less than the block size) then a lot of map tasks will be executed, but the initialization and other overheads degrade the overall performance, so it might appear that a single map executes very fast but the overall job execution will take more time

Re: Question about Block size configuration

2015-05-11 Thread Krishna Kishore Bonagiri
The default HDFS block size of 64 MB means it is the maximum size of a block of data written to HDFS. So, if you write 4 MB files, they will still occupy only 1 block of 4 MB size, not more than that. If your file is more than 64MB, it gets split into multiple blocks. If you set the HDFS block

Re: Question about Block size configuration

2015-05-11 Thread Alexander Alten-Lorenz
: The default HDFS block size of 64 MB means it is the maximum size of a block of data written to HDFS. So, if you write 4 MB files, they will still occupy only 1 block of 4 MB size, not more than that. If your file is more than 64MB, it gets split into multiple blocks. If you set the HDFS block

Question about Block size configuration

2015-05-11 Thread Himawan Mahardianto
Hi guys, I have a couple of questions about HDFS block size: what will happen if I set my HDFS block size from the default 64 MB down to 2 MB per block? I am decreasing the block size because I want to store image files (jpeg, png etc) that are about 4MB each. What is your
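
A back-of-the-envelope sketch of the NameNode-memory consequences (the ~150 bytes per block object is a commonly quoted rule of thumb, not an exact figure):

    4MB file at 64MB blocks -> 1 block
    4MB file at  2MB blocks -> 2 blocks
    1,000,000 such images at 2MB blocks -> ~2,000,000 blocks
    ~2,000,000 blocks x ~150 bytes      -> ~300MB of NameNode heap for block objects alone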

can block size for namenode be different from datanode block size?

2015-03-25 Thread Dr Mich Talebzadeh
Hi, The block size for HDFS is currently set to 128MB by default. This is configurable. My point is that I assume this parameter in hadoop-core.xml sets the block size for both namenode and datanode. However, the storage and random access for metadata in the namenode is different and suits

Re: can block size for namenode be different from datanode block size?

2015-03-25 Thread Mirko Kämpf
Hi Mich, please see the comments in your text. 2015-03-25 15:11 GMT+00:00 Dr Mich Talebzadeh m...@peridale.co.uk: Hi, The block size for HDFS is currently set to 128MB by default. This is configurable. Correct, an HDFS client can override the cfg-property and define a different block

Re: can block size for namenode be different from wdatanode block size?

2015-03-25 Thread Mirko Kämpf
*ReplyTo: * user@hadoop.apache.org *Subject: *Re: can block size for namenode be different from datanode block size? Hi Mich, please see the comments in your text. 2015-03-25 15:11 GMT+00:00 Dr Mich Talebzadeh m...@peridale.co.uk: Hi, The block size for HDFS is currently set to 128MB

Re: can block size for namenode be different from wdatanode block size?

2015-03-25 Thread Mich Talebzadeh
? Regards, Mich Let your email find you with BlackBerry from Vodafone -Original Message- From: Mirko Kämpf mirko.kae...@gmail.com Date: Wed, 25 Mar 2015 15:20:03 To: user@hadoop.apache.orguser@hadoop.apache.org Reply-To: user@hadoop.apache.org Subject: Re: can block size for namenode

Re: can block size for namenode be different from wdatanode block size?

2015-03-25 Thread Mirko Kämpf
, 25 Mar 2015 16:08:02 + *To: *user@hadoop.apache.orguser@hadoop.apache.org; m...@peridale.co.uk *ReplyTo: * user@hadoop.apache.org *Subject: *Re: can block size for namenode be different from wdatanode block size? Correct, let's say you run the NameNode with just 1GB of RAM. This would

Re: can block size for namenode be different from wdatanode block size?

2015-03-25 Thread Mich Talebzadeh
...@gmail.com Date: Wed, 25 Mar 2015 16:08:02 To: user@hadoop.apache.orguser@hadoop.apache.org; m...@peridale.co.uk Reply-To: user@hadoop.apache.org Subject: Re: can block size for namenode be different from wdatanode block size? Correct, let's say you run the NameNode with just 1GB of RAM

Re: can block size for namenode be different from datanode block size?

2015-03-25 Thread Ravi Prakash
Hi Mich! The block size you are referring to is used only on the datanodes. The file that the namenode writes (fsimage OR editlog) is not chunked using this block size. HTH Ravi On Wednesday, March 25, 2015 8:12 AM, Dr Mich Talebzadeh m...@peridale.co.uk wrote: Hi, The block

Re: Can block size for namenode be different from wdatanode block size?

2015-03-25 Thread Harsh J
2. The block size is only relevant to DataNodes (DN). NameNode (NN) does not use this parameter Actually, as a configuration, it's only relevant to the client. See also http://www.quora.com/How-do-I-check-HDFS-blocksize-default-custom Other points sound about right, except the ability to do

Can block size for namenode be different from wdatanode block size?

2015-03-25 Thread Mich Talebzadeh
Thank you all for your contribution. I have summarised the findings as below 1. The Hadoop block size is a configurable parameter dfs.block.size in bytes. By default this is set to 134217728 bytes or 128MB 2. The block size is only relevant to DataNodes (DN). NameNode (NN) does

Default Block Size in HDFS

2015-02-22 Thread Krish Donald
Hi, I have read somewhere that the default block size in Hadoop 2.4 is 256MB. Is it correct? In which version was the default block size 128MB? Thanks Krish

Re: Default Block Size in HDFS

2015-02-22 Thread Ulul
://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml Cheers On Sun, Feb 22, 2015 at 11:11 AM, Krish Donald gotomyp...@gmail.com wrote: Hi, I have read somewhere that the default block size in Hadoop 2.4 is 256MB. Is it correct

Re: Default Block Size in HDFS

2015-02-22 Thread Ted Yu
As of Hadoop 2.6, default blocksize is 128 MB (look for dfs.blocksize) https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml Cheers On Sun, Feb 22, 2015 at 11:11 AM, Krish Donald gotomyp...@gmail.com wrote: Hi, I have read somewhere that default block size

HDFS block size question

2014-12-10 Thread Sajid Syed
Hello All, If the HDFS block size is set to 128MB on the cluster and on the client it is set to 64MB, what will be the size of the block when it is written to HDFS? Can anyone please point me to a link where I can find more information. Thanks Sajeeth

Re: HDFS block size question

2014-12-10 Thread Shahab Yunus
is not specific to this parameter – for example, the same thing happens with *dfs.replication* and others…. Regards, Shahab On Wed, Dec 10, 2014 at 3:27 PM, Sajid Syed sajid...@gmail.com wrote: Hello All, If the HDFS block size is set to 128MB on the cluster and on the client it is set to 64MB

RE: HDFS block size question

2014-12-10 Thread johny casanova
Hi, although the cluster has 128MB, the client always goes with the configuration local to it. So in this case it will use the 64MB. Date: Wed, 10 Dec 2014 15:27:07 -0500 Subject: HDFS block size question From: sajid...@gmail.com To: user@hadoop.apache.org Hello All, If the HDFS block
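
A quick sketch to confirm the client-wins behaviour (paths invented): write with an explicit client-side override, then inspect the blocks actually written:

    # the 64MB client-side value takes effect despite the 128MB cluster default
    hadoop fs -Ddfs.blocksize=67108864 -put data.bin /tmp/data.bin
    hdfs fsck /tmp/data.bin -files -blocks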

Re: issue about distcp Source and target differ in block-size. Use -pb to preserve block-sizes during copy.

2014-07-25 Thread Stanley Shi
Your client side was running at 14/07/24 18:35:58 INFO mapreduce.Job: T***, But you are pasting NN log at 2014-07-24 17:39:34,255; By the way, which version of HDFS are you using? Regards, *Stanley Shi,* On Fri, Jul 25, 2014 at 10:36 AM, ch huang justlo...@gmail.com wrote: 2014-07-24

issue about distcp Source and target differ in block-size. Use -pb to preserve block-sizes during copy.

2014-07-24 Thread ch huang
hi, mailing list: I try to copy data from my old cluster to a new cluster and I get an error; how do I handle this? 14/07/24 18:35:58 INFO mapreduce.Job: Task Id : attempt_1406182801379_0004_m_00_1, Status : FAILED Error: java.io.IOException: File copy failed:

Re: issue about distcp Source and target differ in block-size. Use -pb to preserve block-sizes during copy.

2014-07-24 Thread Stanley Shi
Would you please also paste the corresponding namenode log? Regards, *Stanley Shi,* On Fri, Jul 25, 2014 at 9:15 AM, ch huang justlo...@gmail.com wrote: hi, mailing list: I try to copy data from my old cluster to a new cluster and I get an error; how do I handle this? 14/07/24 18:35:58 INFO

Re: issue about distcp Source and target differ in block-size. Use -pb to preserve block-sizes during copy.

2014-07-24 Thread ch huang
2014-07-24 17:33:04,783 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hdfs (auth:SIMPLE) cause:org.apache.hadoop.ipc.StandbyException: Operation category READ is not supported in state standby 2014-07-24 17:33:05,742 WARN

Re: Block size

2014-01-03 Thread Shahab Yunus
[truncated Google Books link for "change hadoop default block size"] You can do it programmatically as well. http://stackoverflow.com/questions/2669800/changing-the-block-size-of-a-dfs

Re: Block size

2014-01-03 Thread David Sinclair
Change the dfs.block.size in hdfs-site.xml to the value you would like if you want all new files to have a different block size. On Fri, Jan 3, 2014 at 11:37 AM, Kurt Moesky kurtmoe...@gmail.com wrote: I see the default block size for HDFS is 64 MB, is this a value that can be changed

RE: Block size

2014-01-03 Thread German Florez-Larrahondo
Also note that the block size in recent releases is actually called dfs.blocksize as opposed to dfs.block.size, and that you can set it per job as well. In that scenario, just pass it as an argument to your job (e.g. hadoop bla -D dfs.blocksize=134217728) Regards From: David Sinclair

Re: Block size

2014-01-03 Thread Zhao, Xiaoguang
As I am new to hdfs, I was told that the minimum block size is 64M, is it correct? XG On 2014-01-04, at 3:12, German Florez-Larrahondo german...@samsung.com wrote: Also note that the block size in recent releases is actually called “dfs.blocksize” as opposed

Re: Block size

2014-01-03 Thread Harsh J
, Zhao, Xiaoguang xiaoguang.z...@honeywell.com wrote: As I am new to hdfs, I was told that the minimum block size is 64M, is it correct? XG On 2014-01-04, at 3:12, German Florez-Larrahondo german...@samsung.com wrote: Also note that the block size in recent releases is actually called “dfs.blocksize

RE: modify hdfs block size

2013-09-10 Thread Brahma Reddy Battula
You can change the block size of existing files with a command like hadoop distcp -Ddfs.block.size=$[256*1024*1024] /path/to/inputdata /path/to/inputdata-with-largeblocks. After this command completes, you can remove the original data From: kun yan
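
Spelled out end to end, that migration might look like this (verify the copy before removing anything; paths are the poster's placeholders):

    # re-write the data with 256MB blocks
    hadoop distcp -Ddfs.block.size=$[256*1024*1024] /path/to/inputdata /path/to/inputdata-with-largeblocks

    # once verified, swap the datasets
    hadoop fs -rm -r /path/to/inputdata
    hadoop fs -mv /path/to/inputdata-with-largeblocks /path/to/inputdata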

Re: modify hdfs block size

2013-09-10 Thread Vinayakumar B
to decide based on your usecase. Regards, Vinayakumar B On Sep 10, 2013 9:02 AM, kun yan yankunhad...@gmail.com wrote: Hi all Can I modify the HDFS data block size to 32MB? I know the default is 64MB thanks -- In the Hadoop world, I am just a novice exploring the entire Hadoop ecosystem; I hope one

modify hdfs block size

2013-09-09 Thread kun yan
Hi all Can I modify the HDFS data block size to 32MB? I know the default is 64MB thanks -- In the Hadoop world, I am just a novice exploring the entire Hadoop ecosystem; I hope one day I can contribute my own code YanBit yankunhad...@gmail.com

Re: modify hdfs block size

2013-09-09 Thread Harsh J
HDFS data block size to 32MB? I know the default is 64MB thanks -- In the Hadoop world, I am just a novice exploring the entire Hadoop ecosystem; I hope one day I can contribute my own code YanBit yankunhad...@gmail.com -- Harsh J

Re: modify hdfs block size

2013-09-09 Thread kun yan
AM, kun yan yankunhad...@gmail.com wrote: Hi all Can I modify the HDFS data block size to 32MB? I know the default is 64MB thanks -- In the Hadoop world, I am just a novice exploring the entire Hadoop ecosystem; I hope one day I can contribute my own code YanBit yankunhad

Why big block size for HDFS.

2013-03-31 Thread Rahul Bhattacharjee
is, no matter what block size we decide on, finally it gets written to the computer's HDD, which would be formatted and would have a block size in KBs; also, while writing to the FS (not HDFS), it is not guaranteed that the blocks we write are contiguous, so there would be disk seeks anyway

RE: Why big block size for HDFS.

2013-03-31 Thread John Lilley
From: Rahul Bhattacharjee [mailto:rahul.rec@gmail.com] Subject: Why big block size for HDFS. In many places it has been written that to avoid a huge number of disk seeks, we store big blocks in HDFS, so that once we seek to the location, there is only the data transfer rate which would

Re: Why big block size for HDFS.

2013-03-31 Thread Azuryy Yu
this correctly. My question is, no matter what block size we decide on, finally it gets written to the computer's HDD, which would be formatted and would have a block size in KBs; also, while writing to the FS (not HDFS), it is not guaranteed that the blocks that we write

Re: Why big block size for HDFS.

2013-03-31 Thread Rahul Bhattacharjee
[mailto:rahul.rec@gmail.com] *Subject:* Why big block size for HDFS. In many places it has been written that to avoid a huge number of disk seeks, we store big blocks in HDFS, so that once we seek to the location, there is only the data transfer rate which would be predominant, no more

block-size vs split-size

2012-11-27 Thread Kartashov, Andy
Guys, I understand that if not specified, the default block size of HDFS is 64MB. You can control this value by altering the dfs.block.size property and increasing the value to 64MB x 2 or 64MB x 4. Every time we make a change to this property we must reimport the data for the changes to take effect

Re: block-size vs split-size

2012-11-27 Thread Harsh J
Hi, Response inline. On Tue, Nov 27, 2012 at 8:35 PM, Kartashov, Andy andy.kartas...@mpac.ca wrote: Guys, I understand that if not specified, the default block size of HDFS is 64MB. You can control this value by altering the dfs.block.size property and increasing the value to 64MB x 2 or 64MB x 4

RE: block-size vs split-size

2012-11-27 Thread Kartashov, Andy
Thanks Harsh. I totally forgot about the locality thing. I take it, for the best performance it is better to leave the split size property alone and let the framework handle the splits on the basis of the block size. p.s. There were meant to be only 5 questions. Rgds, AK47 -Original Message

Re: block-size vs split-size

2012-11-27 Thread Mohammad Tariq
AM, Kartashov, Andy andy.kartas...@mpac.ca wrote: Thanks Harsh. I totally forgot about the locality thing. I take it, for the best performance it is better to leave the split size property alone and let the framework handle the splits on the basis of the block size. p.s. There were meant

block size

2012-11-20 Thread Kartashov, Andy
Guys, After changing the block size property from 64 to 128MB, will I need to re-import data or will running the hadoop balancer resize blocks in hdfs? Thanks, AK NOTICE: This e-mail message and any attachments are confidential, subject to copyright and may be privileged. Any unauthorized use

RE: block size

2012-11-20 Thread Kartashov, Andy
Cheers! From: Kai Voigt [mailto:k...@123.org] Sent: Tuesday, November 20, 2012 11:34 AM To: user@hadoop.apache.org Subject: Re: block size Hi, On 20.11.2012 at 17:31, Kartashov, Andy andy.kartas...@mpac.ca wrote: After changing the block size property from 64

HDFS block size

2012-11-16 Thread Pankaj Gupta
Hi, I apologize for asking a question that has probably been discussed many times before, but I just want to be sure I understand it correctly. My question is regarding the advantages of a large block size in HDFS. The Hadoop Definitive Guide provides a comparison with regular file systems

Re: HDFS block size

2012-11-16 Thread Andy Isaacson
understand, the data node stores data on a regular file system. If this is so, then how does having a bigger HDFS block size provide better seek performance, when the data will ultimately be read from a regular file system which has a much smaller block size? Suppose that HDFS stored data in smaller

Re: HDFS block size

2012-11-16 Thread Pankaj Gupta
on a regular file system. If this is so, then how does having a bigger HDFS block size provide better seek performance, when the data will ultimately be read from a regular file system which has a much smaller block size? Suppose that HDFS stored data in smaller blocks (64kb for example

Re: HDFS block size

2012-11-16 Thread Pankaj Gupta
. For HDFS, this is variable in size since blocks can be smaller than the max size. The key problem with a large size here is that it is relatively difficult to allow quick reading of the file during writing. With a smaller block size, the block can be committed in a way that the reader can read

Re: File block size use

2012-10-09 Thread Anna Lahoud
...@gmail.com wrote: ** Hi Anna If you want to increase the block size of existing files, you can use an Identity Mapper with no reducer. Set the min and max split sizes to your requirement (512Mb). Use SequenceFileInputFormat and SequenceFileOutputFormat for your job. Your job should be done

Re: File block size use

2012-10-09 Thread Anna Lahoud
Raj - I was not able to get this to work either. On Tue, Oct 2, 2012 at 10:52 AM, Raj Vishwanathan rajv...@yahoo.com wrote: I haven't tried it but this should also work hadoop fs -Ddfs.block.size=NEW BLOCK SIZE -cp src dest Raj -- *From:* Anna Lahoud

Re: File block size use

2012-10-09 Thread Raj Vishwanathan
Anna I misunderstood your problem. I thought you wanted to change the block size of every file. I didn't realize that you were aggregating multiple small files into a different, albeit smaller, set of larger files of a bigger block size to improve performance. I think as Chris suggested you

Re: File block size use

2012-10-02 Thread Anna Lahoud
Thank you. I will try today. On Tue, Oct 2, 2012 at 12:23 AM, Bejoy KS bejoy.had...@gmail.com wrote: ** Hi Anna If you want to increase the block size of existing files, you can use an Identity Mapper with no reducer. Set the min and max split sizes to your requirement (512Mb). Use

Re: File block size use

2012-10-02 Thread Raj Vishwanathan
I haven't tried it but this should also work hadoop fs -Ddfs.block.size=NEW BLOCK SIZE -cp src dest Raj From: Anna Lahoud annalah...@gmail.com To: user@hadoop.apache.org; bejoy.had...@gmail.com Sent: Tuesday, October 2, 2012 7:17 AM Subject: Re: File

File block size use

2012-10-01 Thread Anna Lahoud
and IdentityReducer. Although that approaches a better solution, it still requires that I know in advance how many reducers I need to get better file sizes. I was looking at the SequenceFile.Writer constructors and noticed that there are block size parameters that can be used. Using a writer constructed

Re: File block size use

2012-10-01 Thread Chris Nauroth
Hello Anna, If I understand correctly, you have a set of multiple sequence files, each much smaller than the desired block size, and you want to concatenate them into a set of fewer files, each one more closely aligned to your desired block size. Presumably, the goal is to improve throughput

Re: File block size use

2012-10-01 Thread Bejoy KS
Hi Anna If you want to increase the block size of existing files, you can use an Identity Mapper with no reducer. Set the min and max split sizes to your requirement (512Mb). Use SequenceFileInputFormat and SequenceFileOutputFormat for your job. Your job should be done. Regards Bejoy KS

change hdfs block size for file existing on HDFS

2012-06-26 Thread Anurag Tangri
Hi, We have a situation where all the files that we have use a 64 MB block size. I want to change these files (output of a map job mainly) to 128 MB blocks. What would be a good way to do this migration from 64 MB to 128 MB block files? Thanks, Anurag Tangri

Re: change hdfs block size for file existing on HDFS

2012-06-26 Thread Bejoy KS
Hi Anurag, The easiest option would be, in your map reduce job, to set dfs.block.size to 128 MB --Original Message-- From: Anurag Tangri To: hdfs-u...@hadoop.apache.org To: common-user@hadoop.apache.org ReplyTo: common-user@hadoop.apache.org Subject: change hdfs block size for file

Re: change hdfs block size for file existing on HDFS

2012-06-26 Thread Bejoy Ks
reduce job set the dfs.block.size to 128 MB --Original Message-- From: Anurag Tangri To: hdfs-u...@hadoop.apache.org To: common-user@hadoop.apache.org ReplyTo: common-user@hadoop.apache.org Subject: change hdfs block size for file existing on HDFS Sent: Jun 26, 2012 11:07 Hi, We

Re: change hdfs block size for file existing on HDFS

2012-06-26 Thread Bejoy KS
Hi Anurag, The easiest option would be, in your map reduce job, to set dfs.block.size to 128 MB --Original Message-- From: Anurag Tangri To: hdfs-user@hadoop.apache.org To: common-u...@hadoop.apache.org ReplyTo: common-u...@hadoop.apache.org Subject: change hdfs block size for file

Re: Newbie question on block size calculation

2012-02-23 Thread viva v
Thanks very much for the clarification. So I guess we'd ideally set the block size equal to the transfer rate for optimum results. If seek time has to be 0.5% of transfer time, would I set my block size at 200MB (higher than the transfer rate)? Conversely, if seek time has to be 2% of transfer time

Newbie question on block size calculation

2012-02-21 Thread viva v
Have just started getting familiar with Hadoop HDFS. Reading Tom White's book. The book describes an example related to HDFS block size. Here's a verbatim excerpt from the book: If the seek time is around 10 ms, and the transfer rate is 100 MB/s, then to make the seek time 1% of the transfer
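
Working the book's numbers through, together with the 0.5% and 2% variants asked about above:

    seek = 10 ms, transfer rate = 100 MB/s
    1% target:   transfer time = 10 ms / 0.01  = 1 s   -> block = 1 s   x 100 MB/s = 100 MB
    0.5% target: transfer time = 10 ms / 0.005 = 2 s   -> block = 2 s   x 100 MB/s = 200 MB
    2% target:   transfer time = 10 ms / 0.02  = 0.5 s -> block = 0.5 s x 100 MB/s = 50 MB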

Block Size

2011-09-29 Thread lessonz
I'm new to Hadoop, and I'm trying to understand the implications of a 64M block size in the HDFS. Is there a good reference that enumerates the implications of this decision and its effects on files stored in the system as well as map-reduce jobs? Thanks.

Re: Block Size

2011-09-29 Thread Chris Smith
On 29 September 2011 18:39, lessonz less...@q.com wrote: I'm new to Hadoop, and I'm trying to understand the implications of a 64M block size in the HDFS. Is there a good reference that enumerates the implications of this decision and its effects on files stored in the system as well as map

Re: Block Size

2011-09-29 Thread Uma Maheswara Rao G 72686
hi, Here is some useful info: A small file is one which is significantly smaller than the HDFS block size (default 64MB). If you’re storing small files, then you probably have lots of them (otherwise you wouldn’t turn to Hadoop), and the problem is that HDFS can’t handle lots of files. Every

block size

2011-09-20 Thread hao.wang
Hi All: I have lots of small files stored in HDFS. My HDFS block size is 128M. Each file is significantly smaller than the HDFS block size. I want to know whether each small file uses 128M in HDFS? regards 2011-09-21 hao.wang

Re: block size

2011-09-20 Thread Joey Echeverria
overhead by having to track a larger number of small files. So, if you can merge files, it's best practice to do so. -Joey On Tue, Sep 20, 2011 at 9:54 PM, hao.wang hao.w...@ipinyou.com wrote: Hi All:   I have lots of small files stored in HDFS. My HDFS block size is 128M. Each file
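
One way to see this on a live cluster (a sketch with an invented path): compare the file's logical size with the space it consumes, and count its blocks:

    # file size (and, on recent releases, space consumed across replicas)
    hdfs dfs -du -h /data/smallfile.txt

    # block count and lengths for the file
    hdfs fsck /data/smallfile.txt -files -blocks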

Re: Re: block size

2011-09-20 Thread hao.wang
Hi, Joey: Thanks for your help! 2011-09-21 hao.wang From: Joey Echeverria Sent: 2011-09-21 10:10:54 To: common-user Cc: Subject: Re: block size HDFS blocks are stored as files in the underlying filesystem of your datanodes. Those files do not take a fixed amount of space, so if you

RE: set reduced block size for a specific file

2011-08-30 Thread Ben Clay
Todd- Ouch. I'm stuck with 0.21 for the near future, so I'll just write a small app that copies a file using a different block size. For reference, the config dir override using the following command did not work either: HADOOP_CONF_DIR=mycustomconf bin/hadoop dfs -put /src/path /dest/path

Re: set reduced block size for a specific file

2011-08-27 Thread Ted Dunning
- Original Message - From: Ben Clay rbc...@ncsu.edu Date: Saturday, August 27, 2011 10:03 pm Subject: set reduced block size for a specific file To: hdfs-user@hadoop.apache.org I'd like to set a lowered block size for a specific file. IE, if HDFS is configured to use 64mb blocks, I'd like

Re: set reduced block size for a specific file

2011-08-27 Thread Allen Wittenauer
. 1. Copy $HADOOP_CONF_DIR or $HADOOP_HOME/conf to a dir 2. modify the hdfs-site.xml to have your new block size 3. Run the following: HADOOP_CONF_DIR=mycustomconf hadoop dfs -put file dir Convenient? No. Doable? Definitely.
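
The same recipe as a copy-pasteable sketch (the directory name is the poster's example):

    cp -r $HADOOP_HOME/conf mycustomconf
    # edit mycustomconf/hdfs-site.xml and set your new block size (dfs.block.size)
    HADOOP_CONF_DIR=mycustomconf hadoop dfs -put file dir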

RE: set reduced block size for a specific file

2011-08-27 Thread Ben Clay
I didn't even think of overriding the config dir. Thanks for the tip! -Ben -Original Message- From: Allen Wittenauer [mailto:a...@apache.org] Sent: Saturday, August 27, 2011 6:42 PM To: hdfs-user@hadoop.apache.org Cc: rbc...@ncsu.edu Subject: Re: set reduced block size for a specific

Re: set reduced block size for a specific file

2011-08-27 Thread Aaron T. Myers
for lack of features/bugs. 1. Copy $HADOOP_CONF_DIR or $HADOOP_HOME/conf to a dir 2. modify the hdfs-site.xml to have your new block size 3. Run the following: HADOOP_CONF_DIR=mycustomconf hadoop dfs -put file dir Convenient? No. Doable? Definitely.

Re: set reduced block size for a specific file

2011-08-27 Thread Praveen Sripati
! -Ben -Original Message- From: Allen Wittenauer [mailto:a...@apache.org] Sent: Saturday, August 27, 2011 6:42 PM To: hdfs-user@hadoop.apache.org Cc: rbc...@ncsu.edu Subject: Re: set reduced block size for a specific file On Aug 27, 2011, at 12:42 PM, Ted Dunning wrote

The problem about block size

2011-08-10 Thread 程笑
Hi, I have established a Hadoop cluster with one NameNode and two DataNodes. Now I have a question about block size. I set the block size to 64MB. I store one text file (50MB) on HDFS. Will this text file be split? If not, which DataNode stores the text file? I use MapReduce

Re: The problem about block size

2011-08-10 Thread bejoy . hadoop
If 64 MB is your hdfs block size then the 50 MB file won't be split; it would be stored in a single block in hdfs. AFAIK which data node, or rather which data nodes, is decided by the name node. The block would be replicated and stored; by default the replication factor is 3. So in your case

Block Size

2011-06-17 Thread snedix
Hi all, I want to ask a question: is there a reason why the block size should be set to some 2^N, for some integer N? Does it help with block defragmentation etc.? Thanks in advance..

Re: Block Size

2011-06-17 Thread elton sky
question: is there a reason why the block size should be set to some 2^N, for some integer N? Does it help with block defragmentation etc.? Thanks in advance..

Re: Block size in HDFS

2011-06-13 Thread Allen Wittenauer
FYI, I've added this to the FAQ since it comes up every so often.
