Re: Files vs blocks

2019-01-29 Thread Sudhir Babu Pothineni
Thanks Ramdas and Wei, memory is fine; my only worry is the ratio of files and directories to blocks, as Wei-Chiu mentioned. I will work on this; it's over-partitioned. > On Jan 29, 2019, at 5:02 PM, Ramdas Singh wrote: > > As a rule of thumb for sizing purposes, we should have 1000 MB memory fo

Re: Files vs blocks

2019-01-29 Thread Ramdas Singh
As a rule of thumb for sizing purposes, we should have 1000 MB memory for one million blocks. Thanks, Ramdas On Tue, Jan 29, 2019 at 5:53 PM Wei-Chiu Chuang wrote: > I don't feel this is strictly a small file issue (since I am not seeing > the average file size) > But it looks like your direc
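
The rule of thumb above (1000 MB of heap per one million blocks) can be turned into a quick estimate. This is only a minimal sketch: the function name `estimate_heap_gb` and the decimal MB-to-GB conversion are assumptions made for illustration, not part of any Hadoop tool.

```python
# Rule of thumb quoted in the thread: 1000 MB of NameNode heap per million blocks.
MB_PER_MILLION = 1000


def estimate_heap_gb(object_count, mb_per_million=MB_PER_MILLION):
    """Return a rough heap estimate in GB for the given object count.

    Assumption: 1 GB = 1000 MB (decimal), chosen only to keep the
    arithmetic easy to follow.
    """
    heap_mb = object_count / 1_000_000 * mb_per_million
    return heap_mb / 1000


if __name__ == "__main__":
    # Counts reported in the original post.
    blocks = 58_399_919
    total_objects = 144_385_717
    print(f"blocks only:       {estimate_heap_gb(blocks):.1f} GB")
    print(f"all file objects:  {estimate_heap_gb(total_objects):.1f} GB")
```

Applied to the block count alone the estimate is well below the 132 GB actually in use; applied to all file system objects it lands in the same range, which is presumably why the thread treats the heap usage as unremarkable.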

Re: Files vs blocks

2019-01-29 Thread Wei-Chiu Chuang
I don't feel this is strictly a small-file issue (since I am not seeing the average file size), but it looks like your directory/file ratio is way too low. I've seen that when Hive creates too many partitions. That can render Hive queries inefficient. On Tue, Jan 29, 2019 at 2:09 PM Sudhir Babu Pot

Re: Files vs blocks

2019-01-29 Thread Ramdas Singh
Hi Sudhir, According to my calculations, based on the total number of file system objects (144,385,717), heap usage comes out close to 132 GB. I think you are doing fine. Thanks, Ramdas On Tue, Jan 29, 2019 at 5:09 PM Sudhir Babu Pothineni wrote: > > One of Hadoop cluster I am working > > 85,985,789 files and

Files vs blocks

2019-01-29 Thread Sudhir Babu Pothineni
One of the Hadoop clusters I am working on has 85,985,789 files and directories, 58,399,919 blocks = 144,385,717 total file system objects. Heap memory used: 132.0 GB of 256 GB heap memory. The ratio of files to blocks seems oddly high to me, which points to a small-files problem, but the cluster worki
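
To make the ratio in question concrete, here is a small sketch using the two counts from the post. The helper names are hypothetical. The only fact relied on beyond the post is that every non-empty HDFS file occupies at least one block, so any excess of files and directories over blocks must be directories or zero-length files.

```python
def blocks_per_inode(files_and_dirs, blocks):
    """Blocks divided by the combined file-and-directory count."""
    return blocks / files_and_dirs


def min_blockless_inodes(files_and_dirs, blocks):
    """Lower bound on inodes that hold no block at all.

    Each non-empty file needs at least one block, so at least this many
    entries are directories or empty files.
    """
    return max(files_and_dirs - blocks, 0)


if __name__ == "__main__":
    # Counts reported in the post.
    files_and_dirs = 85_985_789
    blocks = 58_399_919
    print(f"blocks per file/directory: {blocks_per_inode(files_and_dirs, blocks):.2f}")
    print(f"directories or empty files (at least): {min_blockless_inodes(files_and_dirs, blocks):,}")
```

A value below 1 block per entry cannot be explained by small files alone, which is consistent with the over-partitioning explanation given in the replies.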