Hi David,
a) Compressing the table status file is good, but we need to check the
decompression overhead and how much overall benefit we can get.
b) I suggest we keep multiple 10 MB files (or make the size configurable),
and then read them in a distributed way.
c) Once all the table status files are read, it is better to cache them at
the driver (see the sketch below).
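For (a) and (c), something like this rough sketch (the class and method
names are just illustrative, not the actual CarbonData API):

  import java.io.ByteArrayOutputStream;
  import java.io.FileInputStream;
  import java.io.IOException;
  import java.io.InputStream;
  import java.nio.charset.StandardCharsets;
  import java.util.Map;
  import java.util.concurrent.ConcurrentHashMap;
  import java.util.zip.GZIPInputStream;

  public class TableStatusCache {
    // Driver-side cache: table status file path -> decompressed content.
    private static final Map<String, String> CACHE = new ConcurrentHashMap<>();

    // Decompress a gzip table status file once, then serve it from the
    // cache, so the decompression overhead is paid only on the first read.
    public static String read(String path) throws IOException {
      String cached = CACHE.get(path);
      if (cached != null) {
        return cached;
      }
      try (InputStream in = new GZIPInputStream(new FileInputStream(path))) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
          out.write(buffer, 0, n);
        }
        String content = new String(out.toByteArray(), StandardCharsets.UTF_8);
        CACHE.put(path, content);
        return content;
      }
    }
  }

Of course the cache entry would also have to be invalidated whenever the
table status file is rewritten.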
Hi David,
a) Recently we tested heavy concurrent loads and compactions, but we never
faced the issue of two loads using the same segment id (because of the
table status lock in recordNewLoadMetadata; see the sketch below), so I am
not sure whether we really need to move to UUIDs.
b) And as for the other segment interfaces, we would have to refactor them
as well.
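To illustrate (a), the lock already serializes segment id allocation,
roughly like this (a simplified stand-in, not the actual
recordNewLoadMetadata code, which as far as I understand takes a file
based lock so that it also works across processes):

  import java.util.concurrent.locks.ReentrantLock;

  public class SegmentIdAllocator {
    // Stand-in for the table status lock.
    private final ReentrantLock tableStatusLock = new ReentrantLock();
    private int lastSegmentId = -1;

    // Id allocation and the table status update happen under the same
    // lock, so two concurrent loads can never get the same segment id.
    public int recordNewLoad() {
      tableStatusLock.lock();
      try {
        int newSegmentId = ++lastSegmentId;
        // ... write the new load entry into the table status file ...
        return newSegmentId;
      } finally {
        tableStatusLock.unlock();
      }
    }
  }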
Hi David. Thanks for proposing this.
*+1 from my side.*
I have seen users with tables of 200K segments stored in the cloud.
It would be really slow to reload all the segments where an update
happened for indexes like SI, min-max, and MV.
So it is good to write the update as a new segment
and load only the new segment's indexes, roughly as sketched below.
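Just to make the point concrete, a minimal sketch of what I mean (the
index type and loading here are only placeholders):

  import java.util.Map;
  import java.util.concurrent.ConcurrentHashMap;

  public class SegmentIndexCache {
    // segment id -> loaded index data (SI, min-max, MV, ...).
    private final Map<String, Object> indexBySegment = new ConcurrentHashMap<>();

    // After an update writes a new segment, only that segment's index is
    // built; the other cached entries (possibly 200K of them) stay untouched.
    public void onNewSegment(String newSegmentId) {
      indexBySegment.computeIfAbsent(newSegmentId, this::loadIndexFor);
    }

    private Object loadIndexFor(String segmentId) {
      // placeholder for reading the index files of a single segment
      return new Object();
    }
  }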
Hi Akash,
3. An update operation contains an insert operation, so the update will
handle this issue the same way the insert operation does (sketched below).
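Roughly like this (the method names are just for illustration):

  public class UpdateOperation {
    // An update is composed of delete + insert, so whatever the insert
    // path does about this issue automatically applies to update too.
    void update(String table, String filter, Object newRows) {
      delete(table, filter);  // mark the old rows as deleted
      insert(table, newRows); // writes a new segment, same as a plain insert
    }

    void delete(String table, String filter) { /* ... */ }

    void insert(String table, Object newRows) { /* ... */ }
  }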
-
Best Regards
David Cai
Hi David,
Then it is better if we also keep a mapping from the segment UUID to a
virtual segment number in the table status file.
Any API through which the user can get segment details should return
the virtual segment id instead of the UUID (see the sketch below).
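Something like this sketch (kept in memory here for brevity; the mapping
itself would live in the table status file):

  import java.util.Map;
  import java.util.UUID;
  import java.util.concurrent.ConcurrentHashMap;
  import java.util.concurrent.atomic.AtomicLong;

  public class SegmentIdMapping {
    private final Map<String, Long> uuidToVirtualId = new ConcurrentHashMap<>();
    private final AtomicLong nextVirtualId = new AtomicLong(0);

    // Internally everything is keyed by the UUID; the user only ever
    // sees the small monotonically increasing virtual segment number.
    public long register(UUID segmentUuid) {
      return uuidToVirtualId.computeIfAbsent(
          segmentUuid.toString(), k -> nextVirtualId.getAndIncrement());
    }

    // e.g. SHOW SEGMENTS would print this value instead of the UUID.
    public Long virtualIdOf(UUID segmentUuid) {
      return uuidToVirtualId.get(segmentUuid.toString());
    }
  }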
On Fri, Sep 4, 2020 at 12:59 PM David CaiQiang wrote:
> Hi
Hi Kunal,
1. The user uses the SQL API or other interfaces. This UUID is a
transaction id, and we already store the timestamp and other information
in the segment metadata.
This transaction id can be used in load/compaction/update operations.
We can append this id to the log if needed (a rough sketch below).
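For example, something like this for putting the transaction id into the
log (just an illustration, not the real logging setup):

  import java.util.UUID;

  public class TransactionContext {
    private static final ThreadLocal<String> TXN_ID = new ThreadLocal<>();

    // One UUID per load/compaction/update operation.
    public static String begin() {
      String id = UUID.randomUUID().toString();
      TXN_ID.set(id);
      return id;
    }

    // Prefix every log line of the operation with the transaction id, so
    // a whole operation can be traced by searching for a single id.
    public static void log(String message) {
      System.out.println("[txn=" + TXN_ID.get() + "] " + message);
    }
  }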
Hi David,
1. Yeah, as I already said, it will come into the picture in the delete
case, as an update is (delete + insert).
2. Yes, we will be loading the single merged file into the cache, which can
be a little better compared to the existing one.
3. I didn't get the complete answer actually: when exactly do you plan to