Re: [Discussion] Improve the reading/writing performance on the big tablestatus file

2020-09-04 Thread Ajantha Bhat
Hi David,
a) Compressing the table status file is good, but we need to check the decompression overhead and how much overall benefit we can get.
b) I suggest we keep multiple 10MB files (or a configurable size), then read them in a distributed way.
c) Once all the table status files are read, it is better to cache them at the driver
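Point (a) can be tried out directly. The sketch below is illustrative only (the segment-entry fields are assumptions modeled on the ~200-byte entries mentioned later in the thread, not CarbonData code): it builds a synthetic tablestatus-like payload, compares raw vs. gzip-compressed size, and times the decompression that (a) warns about.

```python
import gzip
import json
import time

def make_tablestatus(num_segments):
    # synthetic segment entries, roughly the shape/size discussed in the thread
    return json.dumps([
        {
            "segmentId": str(i),
            "status": "Success",
            "loadStartTime": 1599000000000 + i,
            "dataSize": "1073741824",
            "indexSize": "4096",
        }
        for i in range(num_segments)
    ]).encode("utf-8")

raw = make_tablestatus(10_000)
packed = gzip.compress(raw)

start = time.perf_counter()
restored = gzip.decompress(packed)
decompress_ms = (time.perf_counter() - start) * 1000

print(f"raw={len(raw)} bytes, gzip={len(packed)} bytes, "
      f"ratio={len(packed) / len(raw):.2%}, decompress={decompress_ms:.1f} ms")
```

Because tablestatus entries are highly repetitive JSON, gzip typically shrinks them substantially; whether the saved I/O outweighs the per-query decompression cost is exactly the trade-off point (a) asks to measure.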

Re: [Discussion] Improve the reading/writing performance on the big tablestatus file

2020-09-03 Thread akashrn5
Hi David, after discussing with you it is a bit clearer; let me summarize in a few lines.
*Goals*
1. reduce the size of the status file (which reduces the overall size by some MBs)
2. make the table status file less prone to failures, and fast to read during queries
*For the above goals with your solution

Re: [Discussion] Improve the reading/writing performance on the big tablestatus file

2020-09-03 Thread David CaiQiang
Hi Akash
2. the new tablestatus only stores the latest status file name, not all status files. The status file will store all segment metadata (just like the old tablestatus).
3. if we have a delta file, there is no need to read the status file for each query; reading only the delta file is enough if the status file has not changed
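The delta read path in point 3 can be sketched as follows. All file names, fields, and the `read_file` callback here are illustrative assumptions, not CarbonData code: if the pointer still names the same status file as the cached copy, a query replays only the small delta file instead of re-reading the full status file.

```python
import json

def read_segments(pointer, cache, read_file):
    """pointer: parsed tablestatus pointer; cache: {'name': ..., 'segments': ...}."""
    latest = pointer["statusFileName"]
    if cache.get("name") == latest:
        # status file unchanged since the last query: apply only the delta
        delta = json.loads(read_file(latest + ".delta"))
        cache["segments"].update(delta)
    else:
        # status file rotated (e.g. a new one was committed): do a full read
        cache["name"] = latest
        cache["segments"] = dict(json.loads(read_file(latest)))
    return cache["segments"]

# toy in-memory "files" standing in for the file system
files = {
    "status-uuid1": '{"0": "Success"}',
    "status-uuid1.delta": '{"1": "Success"}',
}
pointer = {"statusFileName": "status-uuid1"}
cache = {"name": None, "segments": {}}
read_segments(pointer, cache, files.get)             # first query: full read
segments = read_segments(pointer, cache, files.get)  # later query: delta only
```

The key property is that steady-state queries touch only the delta file, which stays small, while a full read happens only when the status file name in the pointer changes.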

Re: [Discussion] Improve the reading/writing performance on the big tablestatus file

2020-09-03 Thread akashrn5
Hi David,
Thanks for starting this discussion; I have some questions and inputs.
1. Solution 1 is just plain compression, where we get the benefit of size, but we will still face reliability issues in case of concurrency, so it can be -1.
2. Solution 2: writing and reading to a separate fil

Re: [Discussion] Improve the reading/writing performance on the big tablestatus file

2020-09-01 Thread David CaiQiang
add solution 4 to separate the status file by segment status
*solution 4:* based on solution 2, support status.inprogress
1) new tablestatus file format { "statusFileName":"status-uuid1", "inProgressStatusFileName":"status-uuid2.inprogress", "updateStatusFileName":"updatest
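A minimal sketch of the solution-4 pointer file implied by the truncated JSON above. The first two field names come from the mail itself; `"updatestatus-uuid3"` is a hypothetical placeholder for the value truncated at `updatest`, not the real format.

```python
import json

pointer = {
    "statusFileName": "status-uuid1",                       # latest committed status file
    "inProgressStatusFileName": "status-uuid2.inprogress",  # in-flight loads
    "updateStatusFileName": "updatestatus-uuid3",           # hypothetical; truncated in the mail
}

# The pointer itself stays tiny no matter how many segments the table has,
# so concurrent readers re-read only these few bytes to detect changes.
print(json.dumps(pointer, indent=2))
```

The design point is that the mutable, contended file (the pointer) is constant-size, while the large per-segment metadata lives in immutable named files that can be written once and never rewritten in place.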

Re: [Discussion] Improve the reading/writing performance on the big tablestatus file

2020-09-01 Thread Zhangshunyu
solution2, +1 - My English name is Sunday

[Discussion] Improve the reading/writing performance on the big tablestatus file

2020-09-01 Thread David CaiQiang
[Background] Currently the size of one segment metadata entry is about 200 bytes in the tablestatus file. If the table has 1 million segments and the mean segment size is 1GB (meaning the table size is 1PB), the size of the tablestatus file will reach 200MB. Any reading/writing operation on this table
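The sizing in the background can be checked with quick arithmetic (the 200-byte entry size and 1-million-segment count are the figures quoted in the mail; decimal units are assumed):

```python
ENTRY_BYTES = 200          # approximate size of one segment entry (from the mail)
NUM_SEGMENTS = 1_000_000   # 1 million segments
SEGMENT_SIZE_GB = 1        # mean segment size

tablestatus_bytes = ENTRY_BYTES * NUM_SEGMENTS
table_size_pb = NUM_SEGMENTS * SEGMENT_SIZE_GB / 1_000_000  # GB -> PB

print(f"tablestatus ~ {tablestatus_bytes / 1_000_000:.0f} MB")  # ~ 200 MB
print(f"table size ~ {table_size_pb:.0f} PB")                   # ~ 1 PB
```

So every query that reads the tablestatus file must parse a 200MB file end to end, which is what motivates the solutions discussed earlier in the thread.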