Re: [Discussion] Improve the reading/writing performance on the big tablestatus file

2020-09-04 Thread Ajantha Bhat
Hi David, a) Compressing the table status is good, but we need to check the decompression overhead and how much overall benefit we get. b) I suggest we keep multiple 10MB files (or a configurable size) and then read them in a distributed way. c) Once all the table status files are read, it is better to cache them at the driver
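The trade-off in point a) can be checked with a quick experiment. The following is a minimal sketch, not CarbonData code: it assumes the table status is a JSON list of segment entries, uses gzip as a stand-in codec, and the field names (`segmentId`, `status`, `loadStartTime`) and the helper functions are hypothetical.

```python
import gzip
import json
import time


def build_table_status(num_segments):
    """Build a synthetic list of segment entries (field names are hypothetical)."""
    return [
        {"segmentId": str(i), "status": "Success", "loadStartTime": 1599177600000 + i}
        for i in range(num_segments)
    ]


def compress_status(entries):
    """Serialize the entries to JSON and return (raw bytes, gzip-compressed bytes)."""
    raw = json.dumps(entries).encode("utf-8")
    return raw, gzip.compress(raw)


def decompress_status(blob):
    """Reverse of compress_status: gunzip and parse the JSON back into a list."""
    return json.loads(gzip.decompress(blob).decode("utf-8"))


entries = build_table_status(10000)
raw, packed = compress_status(entries)

# Time only the read path, since that is the overhead raised in point a).
start = time.perf_counter()
restored = decompress_status(packed)
elapsed_ms = (time.perf_counter() - start) * 1000

print(f"raw={len(raw)} bytes, gzip={len(packed)} bytes, "
      f"decompress+parse={elapsed_ms:.2f} ms")
```

The numbers depend on the real entry layout and the codec chosen, so this only shows how the size reduction and the read-side cost could be measured together.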

Re: [Discussion] Segment management enhance

2020-09-04 Thread Ajantha Bhat
Hi David, a) Recently we tested huge concurrent loads and compactions but never hit the issue of two loads using the same segment id (because of the table status lock in recordNewLoadMetadata), so I am not sure whether we really need to switch to UUID. b) And about the other segment interfaces, we have to refactor
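The argument in point a) is that a lock around id allocation already prevents two loads from getting the same segment id. A minimal sketch of that idea, assuming a simple in-memory counter; `SegmentIdAllocator` and `run_concurrent_loads` are hypothetical names and do not reproduce recordNewLoadMetadata:

```python
import threading


class SegmentIdAllocator:
    """Hands out increasing segment ids; the lock serializes allocation."""

    def __init__(self):
        self._lock = threading.Lock()
        self._next_id = 0

    def allocate(self):
        # Read and increment under the lock so two loads never see the same id.
        with self._lock:
            segment_id = self._next_id
            self._next_id += 1
            return segment_id


def run_concurrent_loads(allocator, num_loads):
    """Simulate num_loads concurrent loads, each requesting one segment id."""
    ids = []
    ids_lock = threading.Lock()

    def load():
        segment_id = allocator.allocate()
        with ids_lock:
            ids.append(segment_id)

    threads = [threading.Thread(target=load) for _ in range(num_loads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return ids


ids = run_concurrent_loads(SegmentIdAllocator(), 200)
print(len(ids), len(set(ids)))
```

A UUID would remove the need to hold the lock just to pick an id, which is the trade-off the thread is weighing.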

Re: [Discussion] Update feature enhancement

2020-09-04 Thread Ajantha Bhat
Hi David. Thanks for proposing this. *+1 from my side.* I have seen users with a 200K-segment table stored in the cloud. It will be really slow to reload all the segments where an update happened for indexes like SI, min-max, MV. So, it is good to write as a new segment and just load the new segment's indexes

Re: [Discussion] Update feature enhancement

2020-09-04 Thread David CaiQiang
Hi Akash, 3. An update operation contains an insert operation, so it will handle this issue the same way the insert operation does. - Best Regards David Cai

Re: [Discussion] Segment management enhance

2020-09-04 Thread Kunal Kapoor
Hi David, Then it is better to keep a mapping from the segment UUID to the virtual segment number in the table status file as well. Any API through which the user can get the segment details should return the virtual segment id instead of the UUID. On Fri, Sep 4, 2020 at 12:59 PM David CaiQiang wrote: > Hi
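The mapping suggested above could look roughly like the following. This is a minimal sketch under assumptions: virtual segment numbers are assigned in registration order starting at 0, and `SegmentIdMapping` with its methods is a hypothetical name, not an existing CarbonData interface.

```python
import uuid


class SegmentIdMapping:
    """Two-way mapping between internal segment UUIDs and virtual segment numbers."""

    def __init__(self):
        self._uuid_to_virtual = {}
        self._virtual_to_uuid = {}

    def register(self, segment_uuid):
        """Assign the next virtual number to a UUID; re-registering is a no-op."""
        if segment_uuid in self._uuid_to_virtual:
            return self._uuid_to_virtual[segment_uuid]
        virtual_id = len(self._uuid_to_virtual)
        self._uuid_to_virtual[segment_uuid] = virtual_id
        self._virtual_to_uuid[virtual_id] = segment_uuid
        return virtual_id

    def virtual_id(self, segment_uuid):
        """What a user-facing API would return instead of the UUID."""
        return self._uuid_to_virtual[segment_uuid]

    def internal_uuid(self, virtual_id):
        """Resolve a user-supplied virtual number back to the internal UUID."""
        return self._virtual_to_uuid[virtual_id]


mapping = SegmentIdMapping()
first = str(uuid.uuid4())
second = str(uuid.uuid4())
print(mapping.register(first), mapping.register(second))
```

Users would keep seeing small segment numbers, while the UUID stays internal to the load, compaction, and update paths.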

Re: [Discussion] Segment management enhance

2020-09-04 Thread David CaiQiang
Hi Kunal, 1. The user uses the SQL API or other interfaces. This UUID is a transaction id, and we already store the timestamp and other information in the segment metadata. This transaction id can be used in the loading/compaction/update operations. We can append this id to the log if nee

Re: [Discussion] Update feature enhancement

2020-09-04 Thread akashrn5
Hi David, 1. Yes, I already mentioned that it comes into the picture in the delete case, as update is (delete + insert). 2. Yes, we will be loading the single merge file into the cache, which can be a little better than the existing approach. 3. I didn't get the complete answer, actually: when exactly you plan to