Re: [DISCUSSION] Support JOIN query with spatial index

2021-05-17 Thread David CaiQiang
+1 - Best Regards David Cai -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: [ANNOUNCE] Akash R Nilugal as new PMC for Apache CarbonData

2021-04-13 Thread David CaiQiang
Congratulations Akash - Best Regards David Cai -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: [VOTE] Apache CarbonData 2.1.1(RC2) release

2021-03-26 Thread David CaiQiang
+1 - Best Regards David Cai -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: [VOTE] Apache CarbonData 2.1.1(RC1) release

2021-03-18 Thread David CaiQiang
-1, please fix the pending defect and merge the completed PR first. - Best Regards David Cai -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: [Discussion] Taking the inputs for Segment Interface Refactoring

2021-02-18 Thread David CaiQiang
+1 - Best Regards David Cai -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: Improve carbondata CDC performance

2021-02-18 Thread David CaiQiang
+1, you can finish the implementation. How about using the following SQL instead of the cartesian join? SELECT df.filePath FROM targetTableBlocks df where exists (select 1 from srcTable where srcTable.value between df.min and df.max) - Best Regards David Cai -- Sent from:
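The EXISTS query above keeps a target block only when at least one source value falls inside that block's [min, max] range. A minimal Python sketch of that semantics follows. It assumes hypothetical in-memory lists standing in for targetTableBlocks and srcTable, and it is not CarbonData code:

```python
from bisect import bisect_left


def prune_blocks(blocks, src_values):
    """Return file paths of blocks whose [min, max] range holds a source value.

    blocks: iterable of (file_path, min_value, max_value) tuples (hypothetical layout).
    src_values: iterable of join-key values from the source table.
    """
    sorted_values = sorted(src_values)
    selected = []
    for file_path, lo, hi in blocks:
        # Find the first source value >= lo; the block matches if it is also <= hi.
        i = bisect_left(sorted_values, lo)
        if i < len(sorted_values) and sorted_values[i] <= hi:
            selected.append(file_path)
    return selected


blocks = [("part-0", 0, 99), ("part-1", 100, 199), ("part-2", 200, 299)]
print(prune_blocks(blocks, [5, 250]))  # ['part-0', 'part-2']
```

Sorting the source values once and probing with a binary search avoids comparing every source row with every block, which is the cost the cartesian join would pay.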

Re: Improve carbondata CDC performance

2021-02-18 Thread David CaiQiang
I mean you can push your logic into CarbonDataSourceScan as a dynamic runtime filter. Actually, CarbonDataSourceScan already uses min/max zone maps as an index filter to prune the block list (in the CarbonScanRDD.getPartition method). We can do more things on the join query. Here I assume the source

Re: Improve carbondata CDC performance

2021-02-17 Thread David CaiQiang
Hi Akash, You can enhance the runtime filter to improve the join performance. It has the rule to dynamically check whether the join can add the runtime filter or not. Better to push down the runtime filter into CarbonDataSourceScan, and better to avoid adding a UDF function to

Re: [Discussion]Presto Queries leveraging Secondary Index

2021-01-17 Thread David CaiQiang
hi Venu and Ajantha, For the new SI solution, I have some suggestions also. 1. agree to avoid query plan rewrite 2. push down the SI filter to the pruning step of the main table directly on the driver side, but we need a distributed job to improve performance 3. segment level usability for

Re: [DISCUSSION] Display the segment ID when carbondata load is successful

2021-01-17 Thread David CaiQiang
Hi Nihal, my suggestions are as follows: 1. include the normal output of the show segment command 2. add more information for loading, like numFiles, numRows, rawDataSize (maybe show segment needs this also; take care of CDC, which needs to update this information) - Best Regards David Cai -- Sent

Re: [DISCUSSION] Geo spatial index algorithm improvement and UDFs enhancement

2020-12-25 Thread David CaiQiang
+1 - Best Regards David Cai -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: [Discussion] Upgrade presto-sql to 333 version

2020-12-25 Thread David CaiQiang
+1 Are there other impacts? - Best Regards David Cai -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: [DISCUSSION]Improve Simple updates and delete performance in carbondata

2020-12-08 Thread David CaiQiang
Hi Akash, for the simple update and delete scenario, you can try to do it. During update/delete, 1) for an updated/deleted segment, there is no need to update segmentMetadataInfo. 2) for a newly inserted segment, you can summarize the blocklet level index into a segment level index, reading

Re: [DISCUSSION]Improve Simple updates and delete performance in carbondata

2020-11-27 Thread David CaiQiang
hi Akash, for the simple update case, can you do a test to confirm your inference after a fast change? - Best Regards David Cai -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: Size control of minor compaction

2020-11-23 Thread David CaiQiang
+1 It will take many resources and a long time to compact a large segment, and may not get a good result. Auto compaction is disabled, so we could give a large default value (maybe 1024GB); it will not impact the default behavior. A table level threshold is needed also. If the user wants
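A rough Python sketch of the size control described above. The function name, the dictionary layout, and the rule that a segment at or below the threshold is eligible are all assumptions made for illustration; only the 1024GB default and the table level override come from the message:

```python
DEFAULT_THRESHOLD_GB = 1024  # large default, so existing behavior is unchanged


def segments_to_compact(segment_sizes_gb, table_threshold_gb=None):
    """Return the segment ids small enough to take part in minor compaction.

    segment_sizes_gb: dict of segment id -> size in GB (hypothetical layout).
    table_threshold_gb: optional table level override of the default threshold.
    """
    if table_threshold_gb is None:
        threshold = DEFAULT_THRESHOLD_GB
    else:
        threshold = table_threshold_gb
    # Segments above the threshold are skipped; compacting them is costly.
    return [seg for seg, size in segment_sizes_gb.items() if size <= threshold]


sizes = {"0": 0.5, "1": 2, "2": 300}
print(segments_to_compact(sizes))       # ['0', '1', '2']
print(segments_to_compact(sizes, 100))  # ['0', '1']
```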

Re: [ANNOUNCE] Ajantha as new PMC for Apache CarbonData

2020-11-23 Thread David CaiQiang
Congratulations to Ajantha. - Best Regards David Cai -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: [DISCUSSION]Merge index property and operations improvement.

2020-11-23 Thread David CaiQiang
a) remove mergeIndex property and event listener, add mergeIndex as a part of loading/compaction transaction. b) if the merging index failed, loading/compaction should fail directly. c) keep merge_index command and mark it deprecated. for a new table, maybe it will do nothing.

Re: [Discussion] About carbon.si.segment.merge feature

2020-11-08 Thread David CaiQiang
hi Ajantha, Agree to remove "carbon.si.segment.merge" 1. dynamically decide the number of loading tasks Before loading the SI segment, it is easy to estimate the total size of this SI segment. So better to dynamically decide the number of loading tasks to avoid small carbon files in
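One possible way to decide the task number from the estimated size, sketched in Python. Dividing by a target file size is an assumption, and the function and parameter names are hypothetical, not a CarbonData API:

```python
def loading_task_count(estimated_segment_bytes, target_file_bytes):
    """Return how many loading tasks to launch so each writes about one target-sized file."""
    if target_file_bytes <= 0:
        raise ValueError("target_file_bytes must be positive")
    # Integer ceiling division; always launch at least one task.
    return max(1, -(-estimated_segment_bytes // target_file_bytes))


GB = 1024 ** 3
print(loading_task_count(10 * 1024 ** 2, GB))  # 1
print(loading_task_count(5 * GB + 1, GB))      # 6
```

A small SI segment then gets a single task instead of many tasks that each write a tiny file.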

Re: [Discussion] Partition Optimization

2020-11-04 Thread David CaiQiang
Agree with Vishal, better to test and confirm the difference. - Best Regards David Cai -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: [Discussion] Update feature enhancement

2020-11-04 Thread David CaiQiang
PR#3999 already implemented this enhancement, please take note. PR URL: https://github.com/apache/carbondata/pull/3999 - Best Regards David Cai -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: [Discussion] Taking the inputs for Segment Interface Refactoring

2020-10-18 Thread David CaiQiang
Before starting to refactor the segment interface, I list the segment-related features as follows. [table related] 1. get lock for table lock for tablestatus lock for updatedTablestatus 2. get lastModifiedTime of table [segment related] 1. segment datasource datasource: file format,other

Re: [Discussion] Segment management enhance

2020-10-09 Thread David CaiQiang
Hi Ramana, I agree with you. When writing the segment file, the system uses listFiles to collect all index files. In some cases, it will add stale index files into the segment file. We will try to fix it first. - Best Regards David Cai -- Sent from:

Re: [discuss]CarbonData update operation enhance

2020-09-22 Thread David CaiQiang
hi Linwood, 1. better to implement "Update feature enhancement" first, it will create a new segment to store new files. http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/Discussion-Update-feature-enhancement-td99769.html 2. clean deletedelta files now carbon need

Re: Clean files enhancement

2020-09-18 Thread David CaiQiang
agree with Ravindra, 1. stop all automatic data cleaning in load/insert/compact/update/delete... 2. when the clean files command cleans in-progress or uncertain data, we can move them to the data trash. This prevents deleting useful data by mistake; we have already found this issue in some scenarios. other

Re: Clean files enhancement

2020-09-15 Thread David CaiQiang
1. cleaning the in_progressing segment is very dangerous, please remove this part from code. After the user explicitly uses clean file command with an option "clean_in_progressing"="true", we check segment lock to clean segment. 2. if the status of a segment is mark_for_delete/compacted, we can

Re: Carbon merge should support update random columns each row

2020-09-10 Thread David CaiQiang
+1 maybe we need to use a delta file to store updated values instead of the deletedelta file - Best Regards David Cai -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: [Discussion] Update feature enhancement

2020-09-04 Thread David CaiQiang
Hi Akash, 3. The update operation contains an insert operation, so it will handle this issue the same way the insert operation does. - Best Regards David Cai -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: [Discussion] Segment management enhance

2020-09-04 Thread David CaiQiang
Hi Kunal, 1. The user uses SQL API or other interfaces. This UUID is a transaction id, and we already stored the timestamp and other information in the segment metadata. This transaction id can be used in the loading/compaction/update operation. We can append this id into the log if

Re: [Discussion] Update feature enhancement

2020-09-04 Thread David CaiQiang
Hi Akash, 1. the update operation still has "deletedelta" files, the same as before. horizontal compaction is still needed. 2. loading one carbonindexmerge file will be fast, and will not impact the query performance. (customer has faced this issue) 3. for insert/loading, it can trigger

[Discussion] Segment management enhance

2020-09-03 Thread David CaiQiang
[Background] 1. In some scenarios, two loading/compaction jobs may write data to the same segment, which results in data confusion and breaks some features. 2. Loading/compaction/update/delete operations need to clean stale data before execution. Cleaning

Re: [Discussion] Improve the reading/writing performance on the big tablestatus file

2020-09-03 Thread David CaiQiang
Hi Akash 2. the new tablestatus only stores the latest status file name, not all status files. the status file will store all segment metadata (just like the old tablestatus) 3. if we have a delta file, there is no need to read the status file for each query. only reading the delta file is enough if status file not

[Discussion] Update feature enhancement

2020-09-02 Thread David CaiQiang
[Background] Currently the update feature inserts the updated rows into the old segments where the data are updated. In the end, it needs to reload the indexes of related segments. [Motivation] If there are many updated segments, it will take a long time to reload the indexes again. So I suggest writing

Re: [Discussion] Improve the reading/writing performance on the big tablestatus file

2020-09-01 Thread David CaiQiang
add solution 4 to separate the status file by segment status *solution 4:* Based on solution 2, support status.inprogress 1) new tablestatus file format { "statusFileName":"status-uuid1", "inProgressStatusFileName": "status-uuid2.inprogess",

[Discussion] Improve the reading/writing performance on the big tablestatus file

2020-09-01 Thread David CaiQiang
[Background] Now the size of one segment metadata entry is about 200 bytes in the tablestatus file. if the table has 1 million segments and the mean size of segments is 1GB(means the table size is 1PB), the size of the tablestatus file will reach 200MB. Any reading/writing operation on this
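The figures above can be checked with a few lines of Python, assuming decimal units (1 GB = 10^9 bytes):

```python
ENTRY_BYTES = 200        # size of one segment metadata entry, from the message
SEGMENTS = 1_000_000     # number of segments in the table
SEGMENT_BYTES = 10 ** 9  # mean segment size of 1 GB (decimal units assumed)

tablestatus_bytes = ENTRY_BYTES * SEGMENTS
table_bytes = SEGMENTS * SEGMENT_BYTES

print(tablestatus_bytes / 10 ** 6, "MB")  # 200.0 MB
print(table_bytes / 10 ** 15, "PB")       # 1.0 PB
```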

Re: [Discussion]Query Regarding Task launch mechanism for data load operations

2020-08-14 Thread David CaiQiang
This mechanism will work fine for LOCAL_SORT loading of big data and the small cluster with big executor. If it doesn't match these conditions, better to consider a new solution to adapt to the generic scenario. I suggest refactoring NO_SORT, maybe we can check and improve the global_sort solution.

Re: [Disscuss] The precise of timestamp is limited to millisecond in carbondata, which is incompatiable with DB

2020-07-30 Thread David CaiQiang
agree with Ravi - Best Regards David Cai -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: [Discussion] Implement delete and update feature in carbondata SDK.

2020-07-30 Thread David CaiQiang
+1 Can we add a commit method to support multiple operations at once? CarbonSDKUID .delete(...) .delete(...) .update(...) .commit - Best Regards David Cai -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
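A small Python sketch of the chained style suggested above: operations are only recorded until commit applies them together. The class and method signatures here are hypothetical and only illustrate the idea; they are not the CarbonData SDK API:

```python
class PendingOperations:
    """Collects delete/update operations and hands them over in one commit."""

    def __init__(self):
        self._ops = []

    def delete(self, condition):
        # Record the operation and return self so calls can be chained.
        self._ops.append(("delete", condition))
        return self

    def update(self, condition, new_values):
        self._ops.append(("update", condition, new_values))
        return self

    def commit(self):
        # Hand back everything recorded so far and reset the batch.
        applied = list(self._ops)
        self._ops.clear()
        return applied


ops = PendingOperations()
applied = ops.delete("id = 1").delete("id = 2").update("id = 3", {"city": "x"}).commit()
print(len(applied))  # 3
```

Deferring the work to commit would let all three operations share one transaction instead of three.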

Re: [Discussion] Support the LIMIT operator for show segments command

2020-07-30 Thread David CaiQiang
+1 for solution 1, but will the limit statement get the head or the tail of the segment list? Or does it need an order by on some columns? Please describe the details. - Best Regards David Cai -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: [DISCUSSION]Remove the call to update the serde properties in case of alter scenarios

2020-07-30 Thread David CaiQiang
+1 for removing it. - Best Regards David Cai -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: [Discussion] SI support Complex Array Type

2020-07-30 Thread David CaiQiang
+1 for solution2 Can we support more than one array_contains by using SI join (like SI on primitive data type)? - Best Regards David Cai -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: [Discussion]Do we still need to support carbon.merge.index.in.segment property ?

2020-07-09 Thread David CaiQiang
update reply: The merging index should be a part of loading. It is not good to extract the merging index to an independent process, it brought the query issue (the system can't find the index files when/after merging). In my opinion, during loading, new .carbonindex files should be temporary,

Re: [Discussion]Do we still need to support carbon.merge.index.in.segment property ?

2020-07-09 Thread David CaiQiang
The merging index should be a part of loading. It is not good to extract the merging index to an independent process, it brought the query issue (the system can't find the index files when/after merging). In my opinion, during loading, new .carbonindex files should be temporary, we should

Re: [Discussion]Do we still need to support carbon.merge.index.in.segment property ?

2020-07-09 Thread David CaiQiang
Better to always merge index. -1 for 1, +1 for 2, - Best Regards David Cai -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: [VOTE] Apache CarbonData 2.0.1(RC1) release

2020-06-01 Thread David CaiQiang
+1 - Best Regards David Cai -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: [DISCUSSION] About global sort in 2.0.0

2020-05-31 Thread David CaiQiang
+1 and agree with Kunal - Best Regards David Cai -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: [VOTE] Apache CarbonData 2.0.0(RC3) release

2020-05-18 Thread David CaiQiang
+1 - Best Regards David Cai -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: [VOTE] Apache CarbonData 2.0.0(RC3) release

2020-05-18 Thread David CaiQiang
hi, Kunal another question about point 4: *4. A staged Maven repository is available for review at:* https://repository.apache.org/content/repositories/orgapachecarbondata-1062/ please check this URL, it doesn't contain javadoc jar.

Re: [VOTE] Apache CarbonData 2.0.0(RC3) release

2020-05-17 Thread David CaiQiang
I have a doubt about point 3: *3. The artifacts to be voted on are located here:* https://dist.apache.org/repos/dist/dev/carbondata/2.0.0-rc3/ why do we need to release two identical source packages? apache-carbondata-2.0.0-spark2.3-source-release.zip apache-carbondata-2.0.0-spark2.4-source-release.zip

Re: [Disscussion] Support GloabalSort in the CDC Flow

2020-05-12 Thread David CaiQiang
In my opinion, this is an issue if it can't work. Better to change the topic title to use 'question'/'issue' instead of 'discussion'. - Best Regards David Cai -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: [Disscussion] Remove 'Create Stream'

2020-05-12 Thread David CaiQiang
How about marking the stream SQL as experimental? Now in some cases, it is an easy way for the user to understand the streaming table. We can improve it in the future. - Best Regards David Cai -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: [Dissussion] Support FLOAT datatype in the CDC Flow

2020-05-10 Thread David CaiQiang
please check another topic: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/Discussion-Float-and-Double-compatibility-issue-with-external-segments-to-Carbon-td93870.html. if this is an issue, you can create an issue in carbondata jira. - Best Regards David Cai --

Re: [Discussion]Float and Double compatibility issue with external segments to Carbon

2020-05-07 Thread David CaiQiang
It is a historical legacy issue and easy to reuse the solution of the double data type. Suggest implementing the float data type independently. - Best Regards David Cai -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: Disable Adaptive encoding for Double and Float by default

2020-05-07 Thread David CaiQiang
I agree with Ravindra and I can try to fix it (I have mentioned in a PR review comment). - Best Regards David Cai -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: [ANNOUNCE] Kunal Kapoor as new PMC for Apache CarbonData

2020-05-07 Thread David CaiQiang
Congratulations Kunal - Best Regards David Cai -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: Carbon over-use cluster resources

2020-05-07 Thread David CaiQiang
Hi, manhua Now no_sort reuses the loading flow of local_sort. It is not a good solution and leads to the situation you have mentioned. In my opinion, we need to adjust the loading flow of no_sort, maybe like global_sort finally. In addition, the producer-consumer pattern in data encoding

Re: [VOTE] Apache CarbonData 2.0.0(RC2) release

2020-05-04 Thread David CaiQiang
-1 for me, based on the below points. 1. We need to update quick-start-guide.md for Carbon 2.0. For example, Carbon 2.0 supports the multi-tenant scenario, the carbon property "carbon.storelocation" should be deprecated. only the user who used "carbon.storelocation" in the previous version can

Re: [Discussion] Support SegmentLevel MinMax for better Pruning and less driver memory usage

2020-02-12 Thread David CaiQiang
+1 - Best Regards David Cai -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: [DISCUSSION] Multi-tenant support by refactoring datamaps

2020-02-12 Thread David CaiQiang
+1 please take care of the performance changes during refactoring datamaps - Best Regards David Cai -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: [Discussion] Support Secondary Index on Carbon Table

2020-02-06 Thread David CaiQiang
+1 - Best Regards David Cai -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: Propose feature change in CarbonData 2.0

2019-11-28 Thread David CaiQiang
+1 - Best Regards David Cai -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: How to define constraints and indexes in carbondata while creating a table.

2019-06-27 Thread David CaiQiang
doc: https://github.com/apache/carbondata/blob/master/docs/ddl-of-carbondata.md#sort-columns-configuration testcase:

Re: How to define constraints and indexes in carbondata while creating a table.

2019-06-25 Thread David CaiQiang
So far, CarbonData doesn't support primary keys, foreign keys, NOT NULL, etc. Table creation can use SORT_COLUMNS to create the main index, but secondary indexes are not supported. - Best Regards David Cai -- Sent from:

Re: Why metadata path didn't show up on my local disk

2019-04-24 Thread David CaiQiang
Maybe it used javax.jdo.option.ConnectionURL configuration. When hive,hadoop and spark don't set this configuration, it will use the parameter of getOrCreateCarbonSession. - Best Regards David Cai -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: [Discussion] is it necessary to support SORT_COLUMNS modification

2019-04-09 Thread David CaiQiang
please check JIRA and find the design doc: https://issues.apache.org/jira/browse/CARBONDATA-3347 - Best Regards David Cai -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: [DISCUSSION] Support Compaction for Range Sort

2019-04-08 Thread David CaiQiang
How will it compact Seg_0 and Seg_1 in the new compaction? For example: Seg_0 has 3 ranges (0-100), (100-200), (200-300) and Seg_1 has 2 ranges (50-150) and (250-300); - Best Regards David Cai -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: [VOTE] Apache CarbonData 1.5.3(RC1) release

2019-04-07 Thread David CaiQiang
+1 - Best Regards David Cai -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: 答复: spark streaming insert data error

2019-03-20 Thread David CaiQiang
All documents (including the streaming table) are under the following link. https://github.com/apache/carbondata/tree/master/docs You can find all examples in examples/spark2 module: example 1 (support Update/Delete)

Re: spark streaming insert data error

2019-03-20 Thread David CaiQiang
You can get table schema by CarbonTable.getCreateOrderColumn method. It will return the correct table schema. "name,city,id,salary" is the order of column storage, it is not the table schema. - Best Regards David Cai -- Sent from:

[Discussion] is it necessary to support SORT_COLUMNS modification

2019-03-13 Thread David CaiQiang
Hi all, Let's discuss whether it is necessary to support SORT_COLUMNS modification. *Background* "SORT_COLUMNS" is a table level property, and we can't change it after creating a table. *Motivation* When we want to optimize the query performance and found that it needs to

Re: Injection of custom rules in session

2019-03-07 Thread David CaiQiang
From Spark 2.2, Spark can inject extensions. for example: val spark = SparkSession .builder() ... .withExtensions(...) ... CarbonSession.CarbonBuilder uses default extensions to create CarbonSession. It doesn't inject parser, analyzer and so on. And the extensions variable is private

[Discussion] How to pass some options into Insert Into command

2019-02-19 Thread David CaiQiang
Hi all, For data loading, we can pass some options into load data command by using options clause, but insert into command can't. How to pass some options into Insert Into command? Some options are as follows: 1. implement options clause for insert into command 2. use hint

Re: [Discussion] DDLs to operate on CarbonLRUCache

2019-02-19 Thread David CaiQiang
+1 for 5,6, after point 5 estimated the cache size, the point 6 can modify the configuration dynamically. +1 for 3,4: maybe need to add a lock to sync the concurrent operations. If it wants to release cache, it will not need to restart the driver Maybe we also need to check how to use these

Re: [VOTE] Apache CarbonData 1.5.2(RC2) release

2019-02-01 Thread David CaiQiang
+1 - Best Regards David Cai -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: 【Discuss】load data cause GC overhead limit exceeded

2019-01-28 Thread David CaiQiang
maybe we can validate this property and limit it to less than the total memory size (or 60%...) of the driver side - Best Regards David Cai -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: [DISCUSS] Move to gitbox as per ASF infra team mail

2019-01-06 Thread David CaiQiang
+1 - Best Regards David Cai -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: [discussion] Open check code style of example module

2018-12-24 Thread David CaiQiang
In some examples, we need to print some info to the console. So we need to skip some code styles. - Best Regards David Cai -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: [carbondata-presto enhancements] support reading stream segment in presto

2018-12-18 Thread David CaiQiang
I will try to implement it. PR link: https://github.com/apache/carbondata/pull/3001 - Best Regards David Cai -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: [Discussion] Make 'no_sort' as default sort_scope and keep sort_columns as 'empty' by default

2018-12-16 Thread David CaiQiang
Better to support alter 'sort_columns' and 'sort_scope' also. After the table creation and data loading, the user can adjust 'sort_columns' and 'sort_scope'. - Best Regards David Cai -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: [Proposal] Thoughts on general guidelines to follow in Apache CarbonData community

2018-11-18 Thread David CaiQiang
+1 for 1,2,3,4,5,8,9,10 +0 for 6,7 - Best Regards David Cai -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: Throw NullPointerException occasionally when query from stream table

2018-11-06 Thread David CaiQiang
Where do we call SegmentPropertiesAndSchemaHolder.invalidate in handoff thread? - Best Regards David Cai -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: [DISCUSSION] Remove BTree related code

2018-08-23 Thread David CaiQiang
+0 for 1. delete 11 files Better to add Start/End keys to DataMapRow also. In my opinion, the union of Min/Max values and Start/End keys can work better. - Best Regards David Cai -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

[DISCUSSION] Implement file-level Min/Max index for streaming segment

2018-08-23 Thread David CaiQiang
Hi All, Currently, the filter queries on the streaming table always scan all streaming files, even though there are no data in streaming files that meet the filter conditions. So I try to support file-level min/max index on streaming segment. It helps to reduce the task number and improve

Re: CarbonStore Java & REST API proposal

2018-07-04 Thread David CaiQiang
+1 - Best Regards David Cai -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: S3 support

2018-06-22 Thread David CaiQiang
Hi Kunal, I have some questions. *Problem(Locking):* Does the memory lock support multiple drivers concurrently loading data to the same table? Maybe this limitation should be noted. *Problem(Write with append mode):* 1. atomicity After the overwrite operation failed, maybe the

Re: Use RowStreamParserImp as default value of config 'carbon.stream.parser'

2018-06-08 Thread David CaiQiang
+1, I agree with using RowStreamParserImpl by default. - Best Regards David Cai -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: Getting Different Encoding in timestamp and date datatype.

2018-03-22 Thread David CaiQiang
The direct dictionary ignores the millisecond part of the timestamp data. If milliseconds are not needed, the direct dictionary uses an integer to improve compression. - Best Regards David Cai -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
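A tiny Python illustration of the effect, assuming second granularity and a hypothetical helper name. It is not the CarbonData encoder, only the idea of dropping milliseconds to get a smaller integer key:

```python
def direct_dictionary_key(epoch_millis):
    """Map a timestamp in milliseconds to an integer key with second granularity."""
    return epoch_millis // 1000


a = 1521676800123
b = 1521676800999
# Both timestamps fall in the same second, so they share one key.
print(direct_dictionary_key(a) == direct_dictionary_key(b))  # True
# Decoding the key gives back the timestamp without its millisecond part.
print(direct_dictionary_key(a) * 1000)  # 1521676800000
```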

Re: Getting Different Encoding in timestamp and date datatype.

2018-03-21 Thread David CaiQiang
Hi Jatin, Timestamp column is non-dictionary by default. After adding the Timestamp column to the table property 'dictionary_include', it will have the same encoding list. - Best Regards David Cai -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: [VOTE] Apache CarbonData 1.3.1(RC1) release

2018-03-05 Thread David CaiQiang
+1 - Best Regards David Cai -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: [Discussion] Implement Lucene DataMap to support full text search

2018-02-09 Thread David CaiQiang
It will be an independent module. The layout maybe like this: carbondata |_ datamap |__ lucene - Best Regards David Cai -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: Should CarbonData need to integrate with Spark Streaming too?

2018-01-17 Thread David CaiQiang
+1 for 2). The same as integration with Structured Streaming - Best Regards David Cai -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: Initiating Apache CarbonData-1.3.0 Release

2017-12-25 Thread David CaiQiang
+1 - Best Regards David Cai -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: Problem with with writing the loadStartTime in "dd-MM-yyyy HH:mm:ss:SSS" format

2017-12-19 Thread David CaiQiang
+1 - Best Regards David Cai -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: [DISCUSSION] Refactory on spark related modules

2017-12-05 Thread David CaiQiang
+1 - Best Regards David Cai -- Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: [VOTE] Apache CarbonData 1.2.0(RC3) release

2017-09-24 Thread David CaiQiang
+1 Release this package as Apache CarbonData 1.2.0 1. Release There are important new features and the integration of new platform 2. The tag " mvn clean -DskipTests -Pspark-2.1 -Pbuild-with-format package" passed "mvn clean -DskipTests -Pspark-2.1 -Pbuild-with-format install" passed 3.

Re: [DISCUSSION] Update the function of show segments

2017-09-20 Thread David CaiQiang
I agree with Jacky. I think enhanced segment metadata will help us to understand the table. I suggest the following properties for segment metadata: 1. total data file size 2. total index file size 3. data file count 4. index file count 5. last modified time (last update time) Through these

Re: [DISCUSSION] About data backward compatibility

2017-08-14 Thread David CaiQiang
I agree with Ravindra, now is the time to implement migration tool. - Best Regards David Cai -- View this message in context: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/DISCUSSION-About-data-backward-compatibility-tp20183p20219.html Sent from the Apache

Re: [DISCUSSION] Interfaces for index frame work

2017-08-14 Thread David CaiQiang
+1 - Best Regards David Cai -- View this message in context: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/DISCUSSION-Interfaces-for-index-frame-work-tp13274p20218.html Sent from the Apache CarbonData Dev Mailing List archive mailing list archive at Nabble.com.

Re: problem with branch-1.1

2017-06-26 Thread David CaiQiang
The spark core version of hdp2.6.0-spark2.1.0 is spark 2.1.1. In spark 2.1.1, CatalystConf was already removed. We raised PR to support it and will merge it later. https://github.com/apache/carbondata/pull/1096 https://github.com/apache/carbondata/pull/1017 And the command will be "mvn

Re: [VOTE] Presto integration version :Re: [DISCUSSION] Whether Carbondata should work with Presto in the next release version(1.2.0)

2017-06-13 Thread David CaiQiang
+1 for supporting presto integration. - Best Regards David Cai -- View this message in context: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/VOTE-Presto-integration-version-Re-DISCUSSION-Whether-Carbondata-should-work-with-Presto-in-the-next-tp14906p14907.html

Re: About ColumnGroup feature

2017-06-12 Thread David CaiQiang
+1 for A As far as I know, so far the ColumnGroup feature can't improve performance very well, and it has become a nearly useless feature. If necessary, we need to redesign this feature to keep the code clean and tune it well to improve performance. - Best Regards David Cai -- View this message in context:
