Re: Propose feature change in CarbonData 2.0

2019-12-03 Thread ravipesala
Hi, Thank you for proposing. Please check my comments below. 1. Global dictionary: It was one of the prime features when CarbonData was initially released to Apache. Even though Spark has introduced Tungsten, it still has benefits such as compression and faster filtering and aggregation queries. But after the

Re: [DISCUSSION] Page Level Bloom Filter

2019-11-26 Thread ravipesala
Hi Manhua, Even at the page level, the row count will probably not be available from the next version onwards; it would be decided by size, not by count. The code is already merged, and we are keeping the count-based page configuration temporarily for backward compatibility. So at any place, we will not

Re: [DISCUSSION] Page Level Bloom Filter

2019-11-25 Thread ravipesala
Hi Manhua, The main problem with this approach is that we cannot save any IO, as our IO unit is the blocklet, not the page. Once the data is already in memory, I really don't think we can gain performance with a bloom filter at the page level. I feel the solution would be efficient only if the IO is saved somewhere. Our min/max index is
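To make the IO argument concrete, a minimal Scala model of the read path (illustrative types and names, not CarbonData internals): blocklet-level pruning runs before the read and can skip IO, whereas any page-level check runs only after the whole blocklet is in memory.

    // Illustrative model only; real CarbonData readers look nothing like this.
    case class Page(values: Array[Int])
    case class Blocklet(pages: Seq[Page], min: Int, max: Int)

    // Blocklet-level pruning happens BEFORE the read, so it can save IO.
    def shouldRead(b: Blocklet, target: Int): Boolean =
      target >= b.min && target <= b.max

    def scan(blocklets: Seq[Blocklet], target: Int): Seq[Int] =
      blocklets.filter(shouldRead(_, target))          // IO saved here
               .flatMap(_.pages)                       // blocklet already read into memory
               .flatMap(_.values.filter(_ == target))  // page-level filter: CPU only, no IO saved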

[ANNOUNCE] Apache CarbonData 1.5.4 release

2019-06-10 Thread ravipesala
Hi All, The Apache CarbonData community is pleased to announce the release of version 1.5.4 in The Apache Software Foundation (ASF). CarbonData is a high-performance

Re: [Discussion] Migrate CarbonData to support PrestoSQL

2019-05-08 Thread ravipesala
Hi, It is better to move to PrestoSQL, as that community is more active compared to PrestoDB. But we should consider the current users of PrestoDB as well. Maintaining two modules in CarbonData is not a viable solution, as the maintenance becomes difficult. I feel we can wait for one more version

RE: Re:[DISCUSSION] Support Incremental load in datamap and other MV datamap enhancement

2019-02-19 Thread ravipesala
Hi Akash, There is a difference between index datamaps (like bloom) and OLAP datamaps (like MV). Index datamaps are used only for pruning the data, while OLAP datamaps are used as pre-computed data that can be fetched directly per query. In the OLAP datamap case, lazy build or deferred build makes
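For concreteness, roughly how a deferred (lazy) build looks in the MV DDL of that era; this is a sketch against the 1.5.x grammar (verify against your CarbonData version), assuming `spark` is a CarbonSession-enabled SparkSession:

    // The MV is registered but its pre-computed data is not populated at
    // creation time; it is filled later by an explicit rebuild.
    spark.sql(
      """CREATE DATAMAP sales_agg
        |USING 'mv'
        |WITH DEFERRED REBUILD
        |AS SELECT country, sum(amount) FROM sales GROUP BY country""".stripMargin)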

Re: [Discussion] Refactor dynamic configuration

2019-01-06 Thread ravipesala
Hi, Please check my views on it. The basic design should ensure a clear separation between modules. For example, Spark-based configurations mean nothing to Presto, so every module can have its own conf and constants classes. 1. No need for CarbonProperties and CarbonCommonConstants. 2. Should have
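As a sketch of the separation being proposed (all names and keys below are hypothetical; this shows the shape, not actual CarbonData classes), each module would own its configuration instead of sharing one global CarbonProperties / CarbonCommonConstants:

    // Hypothetical per-module constants objects, in Scala.
    object CarbonCoreConf {
      val BlockletSizeMb = "carbon.core.blocklet.size.mb"   // core-only key
    }

    object CarbonSparkConf {
      val BadRecordsPath = "carbon.spark.badrecords.path"   // Spark-only key
    }

    object CarbonPrestoConf {
      val SplitSize = "carbon.presto.split.size"            // Presto-only key
    }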

[Discussion] Supporting Hive Metastore in Presto CarbonData.

2018-12-17 Thread ravipesala
Hi, The current Carbon Presto integration added a new Presto connector that takes the carbon store folder and lists the databases and tables from the folders. This implementation has many issues, such as: 1. DB and table always need to be in a specific order, and the names of the folders should always

Re: [Discussion] Make 'no_sort' as default sort_scope and keep sort_columns as 'empty' by default

2018-12-17 Thread ravipesala
Hi, +1 for making 'no_sort' the default sort_scope. 1. Regarding removing the empty SORT_COLUMNS option: I don't think we should change the current behaviour, as some users might already be using it in their scripts; if we remove the empty SORT_COLUMNS option, their scripts will start failing after an upgrade. It
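To make the two cases concrete, a sketch against the 1.5.x-era DDL (STORED BY 'carbondata' per the docs of that time; verify against your version), assuming `spark` is a CarbonSession-enabled SparkSession:

    // Proposed default: no SORT_COLUMNS property means no_sort.
    spark.sql("CREATE TABLE t1 (id INT, name STRING) STORED BY 'carbondata'")

    // Existing scripts may set SORT_COLUMNS explicitly to empty, which is
    // why removing that option would break them on upgrade.
    spark.sql(
      """CREATE TABLE t2 (id INT, name STRING)
        |STORED BY 'carbondata'
        |TBLPROPERTIES ('SORT_COLUMNS'='')""".stripMargin)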

Re: [carbondata-presto enhancements] support reading pre-aggregate table in presto

2018-12-14 Thread ravipesala
Hi, Yes, we have a plan. But it will take some time to bring this to the Presto integration, as we first need to bring up and stabilize the MV module, and we also need to analyze how to update the query plan to use pre-agg tables in Presto. Regards, Ravindra.

Re: [carbondata-presto enhancements] support reading carbon SDK writer output in presto

2018-12-10 Thread ravipesala
Hi Jacky, In the Spark integration we have two approaches: one with very deep integration, and one with shallow integration using Spark's FileFormat. For the deep integration we use the datasource name carbondata; this name is also registered to Java services, so anything which comes with this
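A sketch of the two write paths being contrasted (format names as used in the 1.5.x-era datasource modules; verify against your version, and note that the deep path expects a CarbonSession):

    import org.apache.spark.sql.{DataFrame, SaveMode}

    // Deep integration: the registered datasource name "carbondata".
    def writeDeep(df: DataFrame): Unit =
      df.write.format("carbondata")
        .option("tableName", "t_deep")
        .mode(SaveMode.Overwrite)
        .save()

    // Shallow integration: Spark's FileFormat-based source, writing to a path.
    def writeShallow(df: DataFrame, path: String): Unit =
      df.write.format("carbon")
        .mode(SaveMode.Overwrite)
        .save(path)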

Re: [carbondata-presto enhancements] support reading carbon SDK writer output in presto

2018-12-10 Thread ravipesala
+1 Yes Jacky, he is not going to add any new plugin. Depending on the folder structure and table status, he determines whether it is transactional or non-transactional inside the same plugin. PR https://github.com/apache/carbondata/pull/2982/ has already been raised for it. Regards, Ravindra.

Re: [DISCUSS] Support transactional table in SDK

2018-12-07 Thread ravipesala
Hi Jacky, It's a good idea to support writing transactional tables from the SDK. But we need to add the following limitations as well: 1. It can work on file systems which can take an append lock, like HDFS. 2. Compaction and delete segment cannot be done on online segments until they are converted to the

Re: SDK support LOCAL_DICTIONARY_INCLUDE and LOCAL_DICTIONARY_EXCLUDE

2018-12-06 Thread ravipesala
I agree with @kumarvishal; better not to add more options, as it confuses the user. We'd better fall back automatically depending on the size of the dictionary.

Re: [Discussion]Alter table column rename feature

2018-12-06 Thread ravipesala
+1 Please make sure the DDL is consistent with Hive; no need to add any new DDL. Regards, Ravindra
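As a one-line illustration, the Hive-compatible rename shape being referred to (table and column names are placeholders; assumes a Spark session that parses Hive-style DDL):

    // Hive-compatible column rename; no CarbonData-specific grammar added.
    spark.sql("ALTER TABLE t CHANGE old_col new_col INT")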

RE: [DISCUSSION] Support DataLoad using Json for CarbonSession

2018-12-06 Thread ravipesala
+1 for JSON loading from the CarbonSession LOAD command. @xuchuanyin There is a reason why we are not completely depending on the Spark datasource for loading data. We have a specific feature called bad record handling; if we load data directly through Spark, I don't think we can get the bad records present in
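A sketch of the bad record handling feature mentioned here, via the LOAD command (option names per the CarbonData DML docs of that era; paths are placeholders, and `spark` is assumed to be a CarbonSession):

    // Redirect malformed rows to a separate location instead of failing
    // or silently dropping them.
    spark.sql(
      """LOAD DATA INPATH 'hdfs://ns/data/input.csv' INTO TABLE t
        |OPTIONS (
        |  'BAD_RECORDS_LOGGER_ENABLE'='true',
        |  'BAD_RECORDS_ACTION'='REDIRECT',
        |  'BAD_RECORD_PATH'='hdfs://ns/data/badrecords'
        |)""".stripMargin)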

Re: field search

2018-09-21 Thread ravipesala
Hi, Since CarbonData is columnar, we can filter on individual columns and use the output of one column's filter to filter the remaining columns. This is what we call bitset pipelining. So in your case, if column A yields row 1 after its filter, then only that row 1 is used to filter the remaining columns. If my
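A minimal Scala illustration of bitset pipelining (a toy model, not CarbonData internals): the filter on column A produces a BitSet, and column B's predicate is evaluated only at the positions still set.

    import java.util.BitSet

    val colA = Array(10, 20, 10, 30)
    val colB = Array("x", "y", "z", "x")

    // Filter on column A first: rows 0 and 2 survive.
    val hits = new BitSet(colA.length)
    for (i <- colA.indices if colA(i) == 10) hits.set(i)

    // Column B's filter touches only surviving rows, not the whole column.
    var i = hits.nextSetBit(0)
    while (i >= 0) {
      if (colB(i) != "x") hits.clear(i)
      i = hits.nextSetBit(i + 1)
    }
    // hits now marks rows matching A == 10 AND B == "x" (row 0 only).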

Re: CarbonWriterBuild issue

2018-09-21 Thread ravipesala
+1 I feel it is better to remove the transactional flag from the SDK API, as it is currently redundant. We'd better support it in a better way in the future.

Re: Low Performance of full scan.

2018-09-20 Thread ravipesala
Hi, Thanks for testing the performance. We have also observed this performance difference and are working to improve it. Please check my latest discussion (http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/CarbonData-Performance-Optimization-td62950.html) to improve

Re: Feature Proposal: CarbonCli tool

2018-09-05 Thread ravipesala
Hi, I have the following doubts and suggestions for this tool. 1. In which module are you planning to keep this tool? Ideally, it should be under the tools folder, and going forward we can add more tools like this under it. 2. Which file's schema are you printing? Are you randomly choosing the file to

Re: [VOTE] Apache CarbonData 1.4.1(RC2) release

2018-08-14 Thread ravipesala
Hi all, The PMC vote has passed for the Apache CarbonData 1.4.1 release; the result is as below: +1 (binding): 3 (Liang Chen, Kumar Vishal, Ravindra) +1 (non-binding): 5 Thanks all for your votes. Regards, Ravindra

Re: [Discussion] Refactor Segment Management Interface.

2018-08-14 Thread ravipesala
Hi, I have fixed the review comments and updated the design document. Please check the V2 version of the document in the JIRA: https://issues.apache.org/jira/browse/CARBONDATA-2827 Regards, Ravi

Re: Can we create pre-aggregation for non-carbon format tables?

2018-08-06 Thread ravipesala
Hi, In the current pre-aggregate design, we cannot create pre-aggregate tables on non-carbon format tables. But we have another module called MV, where we will add the functionality to allow other format tables as Materialized Views (MV) as well. But it may take some more time for this feature to stabilize. Ravindra

Re: Questions about rebuilding datamap

2018-08-06 Thread ravipesala
Hi, REBUILD DATAMAP is implemented only for full refresh, not for incremental data loading; that is why it tries to refresh all the segments irrespective of whether they are already built or not. We are planning incremental rebuilding of the datamap in the next version. I feel we can block
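For reference, the command under discussion in its 1.5.x-era shape (verify the exact grammar against your version; names are placeholders, and `spark` is assumed to be a CarbonSession):

    // Currently a full refresh: every segment of the datamap is recomputed,
    // whether or not it was already built.
    spark.sql("REBUILD DATAMAP dm ON TABLE t")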

Re: Refactored SegmentPropertiesFetcher to resolve pruning problem post the carbon schema restructure.

2018-05-04 Thread ravipesala
Yes Shahid, you are right. In the update scenario there is a chance of creating new data within the same segment, and it will lead to wrong data if the schema is different. It is not the same case for delete, as we don't have any schema for it. I feel it is always better to create a new segment even for

Re: Grammar about supporting string longer than 32000 characters

2018-05-04 Thread ravipesala
In the case of a dataframe, we can take varchar(max) as the default.

Re: Grammar about supporting string longer than 32000 characters

2018-05-02 Thread ravipesala
Hi, I agree with option 2, but not with a new datatype; use varchar(size). There are more optimizations we can do with the varchar(size) datatype, like: 1. If the size is small (less than 8 bytes), then we can write with a fixed-length encoder instead of LV (length-value) encoding; it can save a lot of space and memory. 2. If the
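A back-of-envelope Scala illustration of point 1, assuming (purely for illustration) a 4-byte length header per value in the LV encoding:

    // LV encoding pays a length header per value; fixed-length does not,
    // but pads every value to the declared width.
    def lvBytes(values: Seq[String]): Int = values.map(v => 4 + v.length).sum
    def fixedBytes(values: Seq[String], width: Int): Int = values.length * width

    val vs = Seq("abc", "de", "abcdefgh")   // a varchar(8) column
    println(lvBytes(vs))        // 25 bytes: (4+3) + (4+2) + (4+8)
    println(fixedBytes(vs, 8))  // 24 bytes, with no per-value length to decode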

Re: insert carbondata table failed

2017-09-18 Thread ravipesala
Hello, I don't get much from the logs, but the error seems related to a memory issue in Spark. From your old emails I gather that you are using a 3-node cluster. Do all 3 nodes have NodeManagers and DataNodes? It is better to give a smaller number of executors and provide more memory to each, as below.
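The concrete settings from the original reply are truncated above; purely as an illustration of the shape of such tuning (all values hypothetical, for a 3-node YARN cluster), fewer but larger executors in Scala/Spark:

    import org.apache.spark.sql.SparkSession

    // Fewer, larger executors: one per node, with generous heap and overhead.
    val spark = SparkSession.builder()
      .appName("carbon-load")
      .config("spark.executor.instances", "3")              // one executor per node
      .config("spark.executor.cores", "4")
      .config("spark.executor.memory", "12g")               // more heap per executor
      .config("spark.yarn.executor.memoryOverhead", "2048") // off-heap headroom (MB)
      .getOrCreate()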