Re: [VOTE] Apache CarbonData 2.3.0(RC2) release

2022-01-20 Thread Ravindra Pesala
+1 Regards, Ravindra. On Wed, 19 Jan 2022 at 23:57, Kunal Kapoor wrote: > Hi All, > I submit the Apache CarbonData 2.3.0(RC2) for your vote. > > > *1.Release Notes:* > >

Re: [DISCUSSION]Carbondata Streamer tool and Schema change capture in CDC merge

2021-09-01 Thread Ravindra Pesala
+1 I would like a few clarifications regarding the design. 1. Generally CDC includes IUD operations, so how are you planning to handle them? Are you planning to use the merge command? If yes, how frequently do you want to merge? 2. How can you ensure Kafka exactly-once semantics (how can you

Re: [VOTE] Apache CarbonData 2.2.0(RC2) release

2021-08-04 Thread Ravindra Pesala
+1 Regards, Ravindra On Wed, 4 Aug 2021 at 7:23 PM, Liang Chen wrote: > +1 > > Regards > Liang > > On Tue, 3 Aug 2021 at 1:07 PM, Ajantha Bhat wrote: > > > +1 > > > > Regards, > > Ajantha > > > > On Mon, Aug 2, 2021 at 9:03 PM Venkata Gollamudi > > wrote: > > > > > +1 > > > > > > Regards, > > > Venkata

Re: [VOTE] Apache CarbonData 2.2.0(RC1) release

2021-07-08 Thread Ravindra Pesala
-1 I suggest PR 4148 to be merged before release. Regards, Ravindra. On Thu, 8 Jul 2021 at 5:04 PM, Jacky Li wrote: > -1, > > I suggest following PR to be merged before release > #4148 > #4157 > #4158 > #4162 > > Regards, > Jacky Li > > > > On 6 Jul 2021, at 3:14 PM, Akash Nilugal wrote: > > > > Hi All,

Re: [Design Discussion] Transaction manager, time travel and segment interface refactoring

2021-04-28 Thread Ravindra Pesala
+1 Much needed feature and interface refactoring. Thanks for working on it. Regards, Ravindra. On Thu, 22 Apr 2021 at 2:36 PM, Ajantha Bhat wrote: > Hi All, > In this thread, I am continuing the below discussion along with the > Transaction Manager and Time Travel feature design. > >

Re: [VOTE] Apache CarbonData 2.1.1(RC2) release

2021-03-29 Thread Ravindra Pesala
+1 Regards, Ravindra. On Fri, 26 Mar 2021 at 11:02 PM, Indhumathi wrote: > +1 > > Regards > Indhumathi M > > > > -- > Sent from: > http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/ > -- Thanks & Regards, Ravi

Re: Improve carbondata CDC performance

2021-03-11 Thread Ravindra Pesala
+1 Instead of doing the cartesian join, we can broadcast the sorted min/max with file paths and do the binary search inside the map function. Thank you On Wed, 24 Feb 2021 at 13:02, akashrn5 wrote: > Hi Venu, > > Thanks for your review. > > I have replied the same in the document. > you are
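
The suggestion above (sorted min/max with file paths, then a binary search per record) can be sketched roughly as follows. This is only an illustration under assumptions, not the CarbonData implementation: `build_index` and `candidate_files` are hypothetical names, and the index is assumed to be small enough to broadcast to every task and call from inside the map function.

```python
from bisect import bisect_right
from itertools import accumulate


def build_index(file_stats):
    """Build a lookup structure from (min_key, max_key, file_path) tuples.

    Hypothetical helper: in a Spark job this structure would be built on the
    driver and broadcast, instead of joining every row against every file.
    """
    ordered = sorted(file_stats, key=lambda s: s[0])
    mins = [s[0] for s in ordered]
    # prefix_max[i] is the largest max_key among ordered[0..i]; it lets the
    # backward scan stop early even when file ranges overlap.
    prefix_max = list(accumulate((s[1] for s in ordered), max))
    return ordered, mins, prefix_max


def candidate_files(index, key):
    """Return paths of files whose [min_key, max_key] range contains key."""
    ordered, mins, prefix_max = index
    # Binary search: files at positions [0, hi) have min_key <= key.
    hi = bisect_right(mins, key)
    matches = []
    i = hi - 1
    while i >= 0 and prefix_max[i] >= key:
        if ordered[i][1] >= key:
            matches.append(ordered[i][2])
        i -= 1
    return matches[::-1]  # restore ascending min_key order


if __name__ == "__main__":
    idx = build_index([(10, 20, "a"), (1, 5, "b"), (15, 30, "c"), (40, 50, "d")])
    print(candidate_files(idx, 17))
```

Each lookup costs one binary search plus a short scan over overlapping ranges, so the per-row work no longer grows with the total number of files in the way a cartesian join does.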

Re: [DISCUSSION]Improve Simple updates and delete performance in carbondata

2020-12-10 Thread Ravindra Pesala
+1 I am looking forward to this feature as most of the update/delete operations are simple and it can simplify and improve the performance as well. Thank you. On Thu, 19 Nov 2020 at 19:41, Akash Nilugal wrote: > Hi Community, > > Carbondata supports update and delete using spark. So basically

Re: [VOTE] Apache CarbonData 2.1.0(RC2) release

2020-11-04 Thread Ravindra Pesala
+1 On Wed, 4 Nov 2020 at 12:17 PM, Kunal Kapoor wrote: > Hi All, > > I submit the Apache CarbonData 2.1.0(RC2) for your vote. > > > *1.Release Notes:* > >

Re: [ANN] Indhumathi as new Apache CarbonData committer

2020-10-07 Thread Ravindra Pesala
Congrats Indhumathi! On Wed, 7 Oct 2020 at 10:29 AM, manish gupta wrote: > Congratulations Indhumathi > > Regards > Manish Gupta > > On Wed, 7 Oct 2020 at 10:23 AM, brijoobopanna > wrote: > > > Congrats Indhumathi, best of luck for your new role in the community > > > > > > > > -- > > Sent

Re: Clean files enhancement

2020-09-24 Thread Ravindra Pesala
Hi Vikram, +1 It is good to remove the automatic cleanup. But I am still worried about the clean files command executed by the user as well. We need to enhance the clean files command with a dry run that prints which segments are going to be deleted and which are left. If the user is OK with the dry run

Re: Clean files enhancement

2020-09-17 Thread Ravindra Pesala
-1 I don’t see any reason why we should use trash. How does it change the behaviour? 1. Are you still going with automatic cleanup? If yes, then you are adding extra time to move the data to trash (for the S3 file system). 2. Even if you move the data and keep the time to live as 3 days in trash, what

Re: Clean files enhancement

2020-09-15 Thread Ravindra Pesala
+1 to Vishal's proposal. It is not safe to clean up automatically without ensuring data integrity. Let’s enhance the clean command to do a sanity check before removing data. Deleting data should be an administrative task, not an automatic framework feature. The user can call it when he needs

Re: [DISCUSSION] Parallel compaction and update

2020-09-14 Thread Ravindra Pesala
Hi Nihal, I appreciate the design, but I don’t want to implement features without proper segment interfacing in place. Implementing this type of feature without segment refactoring will make the code messier. Once we bring the proper segment interfacing and transaction

Re: [Discussion] Update feature enhancement

2020-09-14 Thread Ravindra Pesala
+1 Partition loading already uses a new segment to write the update delta data. It is better to make this consistent everywhere. Creating a new segment simplifies the design. On Mon, 14 Sep 2020 at 1:48 AM, Venkata Gollamudi wrote: > Hi David, > > +1 > > > > Initially when segments concept is

Re: [Discussion] SI support Complex Array Type

2020-08-02 Thread Ravindra Pesala
Hi All, +1 for solution 2. But don't store rowid as it makes the storage very big and it gives a very slow performance. Let's go with the current model of SI which stores till blocklet level. Don't make things complicated by storing rowid. Solution 1 makes the scan slower as it needs to construct

Re: [Disscuss] The precise of timestamp is limited to millisecond in carbondata, which is incompatiable with DB

2020-07-15 Thread Ravindra Pesala
Hi, I think it is bigger than just changing to DateTimeFormatter. As of now, carbon uses only 64 bits to store a timestamp, so it can accommodate precision up to milliseconds. In order to support precision up to nanoseconds, we need to use 96 bits. If you check spark-parquet, it uses 96 bits to store a timestamp. It would
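
The bit-width trade-off mentioned above can be made concrete with a little arithmetic. The sketch below makes no claim about how CarbonData stores timestamps; `signed_range_years` and `split_int96_style` are hypothetical helpers. It shows how the reach of a signed 64-bit counter shrinks as the tick gets finer, and how the legacy Parquet INT96 timestamp avoids that by splitting the value into a Julian day number (4 bytes) and nanoseconds within the day (8 bytes).

```python
SECONDS_PER_YEAR = 365.2425 * 24 * 3600  # mean Gregorian year


def signed_range_years(bits, ticks_per_second):
    """Years reachable on each side of the epoch by a signed tick counter."""
    max_ticks = 2 ** (bits - 1) - 1
    return max_ticks / ticks_per_second / SECONDS_PER_YEAR


JULIAN_DAY_OF_EPOCH = 2440588        # Julian day number of 1970-01-01
NANOS_PER_DAY = 86_400 * 10 ** 9


def split_int96_style(epoch_nanos):
    """Split nanoseconds since 1970-01-01 into (julian_day, nanos_of_day)."""
    days, nanos_of_day = divmod(epoch_nanos, NANOS_PER_DAY)
    return JULIAN_DAY_OF_EPOCH + days, nanos_of_day


if __name__ == "__main__":
    print("64-bit millis, years each side of epoch:", signed_range_years(64, 10 ** 3))
    print("64-bit nanos,  years each side of epoch:", signed_range_years(64, 10 ** 9))
    print(split_int96_style(NANOS_PER_DAY + 5))
```

With millisecond ticks a 64-bit value reaches hundreds of millions of years, while with nanosecond ticks it reaches only a few hundred years either side of the epoch; the wider day-plus-nanos layout keeps nanosecond precision without narrowing the date range.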

Re: [Discussion]Do we still need to support carbon.merge.index.in.segment property ?

2020-07-09 Thread Ravindra Pesala
Hi, +1 I agree with Vishal, let's deprecate the configuration and keep it as internal. Regards, Ravindra. On Fri, 10 Jul 2020 at 01:54, Ajantha Bhat wrote: > Hi, > I didn't reply to deprecation. *+1 for deprecating it*. > > *And +1 for issue fix also.* > Issue fix, I didn't mean when

Re: [VOTE] Apache CarbonData 2.0.0(RC3) release

2020-05-17 Thread Ravindra Pesala
+1 Regards, Ravindra. On Sun, 17 May 2020 at 9:24 PM, Ajantha Bhat wrote: > +1 > > Regards, > Ajantha > > > > On Sun, 17 May, 2020, 6:41 pm Jacky Li, wrote: > > > +1 > > > > Regards, > > Jacky > > > > > > > On 17 May 2020, at 4:50 PM, Kunal Kapoor wrote: > > > > > > Hi All, > > > > > > I submit the

Re: [Dissussion] Support FLOAT datatype in the CDC Flow

2020-05-11 Thread Ravindra Pesala
Hi, CDC can support all primitive data types. If it is failing in a particular scenario, please raise a Jira with a proper test case to reproduce the problem. Thank you Regards, Ravindra. On Mon, 11 May 2020 at 11:35 AM, haomarch wrote: > We don't support FLOAT datatype in the CDC Flow. This is

Re: Disable Adaptive encoding for Double and Float by default

2020-03-25 Thread Ravindra Pesala
ding that decimal points from every float and double > value [*PrimitivePageStatsCollector.getDecimalCount(double)*] * > *where we convert to string and use substring().* > > so I want to disable adaptive encoding for double and float by default. > > Thanks, > Ajantha > > On Wed, Mar 25, 2020 at 11:37 AM

Re: Disable Adaptive encoding for Double and Float by default

2020-03-25 Thread Ravindra Pesala
Hi, It increases the store size. Can you give me performance figures with and without these changes? Please also provide the store size impact if we disable it. Regards, Ravindra. On Wed, 25 Mar 2020 at 1:51 PM, Ajantha Bhat wrote: > Hi all, > > I have done insert into flow profiling

Re: What is the transaction ability of CarbonData? Does it support the transaction like this.

2020-02-17 Thread Ravindra Pesala
Hi , Yes, you are right. Carbon supports the way you expected. It can either give the data before overwrite or after overwrite in another session when you run query concurrently. It never gives `FileNotFoundException`. Regards, Ravindra. On Mon, 17 Feb 2020, 20:14 李书明, wrote: > Hi

Re: Discussion: change default compressor to ZSTD

2020-02-07 Thread Ravindra Pesala
Hi Jacky, As per the original PR https://github.com/apache/carbondata/pull/2628 , query performance decreased by 20% to 50% compared to Snappy, so I am concerned about the performance. It would be better to have a proper TPC-H performance report on the regular cluster, like we do for every version, and

Re: [Discussion] Support Secondary Index on Carbon Table

2020-02-05 Thread Ravindra Pesala
+1 Regards, Ravindra. On Wed, 5 Feb 2020 at 8:03 PM, Indhumathi M wrote: > Hi Community, > > Currently we have datamaps like,* default datamaps* which are block and > blocklet and *coarse grained datamaps* like bloom, and *fine grained > datamaps* like lucene > which helps in better pruning

Re: Optimize and refactor insert into command

2020-01-02 Thread Ravindra Pesala
Hi, +1 It’s a long pending work. Most welcome. Regards, Ravindra. On Fri, 20 Dec 2019 at 7:55 AM, Ajantha Bhat wrote: > Currently carbondata "insert into" uses the CarbonLoadDataCommand itself. > Load process has steps like parsing and converter step with bad record > support. > Insert into

Re: [VOTE] Apache CarbonData 1.6.1(RC1) release

2019-10-14 Thread Ravindra Pesala
+1 On Sat, 12 Oct 2019 at 9:18 AM, 恩爸 <441586...@qq.com> wrote: > +1. > But it better remove CARBONDATA-3540 and CARBONDATA-3544 from release > notes, these two improvements are not included in 1.6.1. > > > > > --Original-- > From:"kunalkapoor [via Apache

Re: [DISCUSSION] Support Time Series for MV datamap and autodatamap loading of timeseries datamaps

2019-10-07 Thread Ravindra Pesala
void the problem I mentioned above with datamaps loaded > in cache. > > 4. I agree, your point is a valid one. I will do more analysis on this based on > the user use cases and then we can decide finally. That would be better. > > Please give your inputs/suggestions on the abo

Re: [DISCUSSION] Support Time Series for MV datamap and autodatamap loading of timeseries datamaps

2019-10-06 Thread Ravindra Pesala
> cannot avoid aggregations from main table. > > > Regards, > Akash R Nilugal > > On 2019/10/04 11:35:46, Ravindra Pesala wrote: >> Hi Akash, >> >> I have following suggestions. >> >> 1. I think it is redundant to use granularity inside create

Re: [DISCUSSION] Support Time Series for MV datamap and autodatamap loading of timeseries datamaps

2019-10-04 Thread Ravindra Pesala
Hi Akash, I have the following suggestions. 1. I think it is redundant to use granularity inside create datamap; the user can use the respective granularity UDF in his query, like time(1h) or time(1d). 2. Better to create separate RP commands and let the user add the RP on the datamap or even on the main

Re: [ANNOUNCE] Ajantha as new Apache CarbonData committer

2019-10-03 Thread Ravindra Pesala
Congrats Ajantha and welcome. Regards, Ravindra. > On 3 Oct 2019, at 8:00 PM, Liang Chen wrote: > > Hi > > > We are pleased to announce that the PMC has invited Ajantha as new Apache > CarbonData committer and the invite has been accepted! > > Congrats to Ajantha and welcome aboard. > >

[DISCUSSION] Support heterogeneous format segments in carbondata

2019-09-10 Thread Ravindra Pesala
Hi All, This discussion is regarding support of other formats in carbon. Existing customers already use other formats like Parquet, ORC etc., but if they want to migrate to carbon there is no proper solution at hand. So this feature allows all the old data to be added as a segment to carbondata.

Re: [DISCUSSION] implement MERGE INTO statement

2019-08-31 Thread Ravindra Pesala
Hi David, +1 It is better to follow the hive syntax rather than having our own. Please check it https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Merge And also it is better to have design document explaining the changes to be done on current IUD. Regards,

Re: Adapt to SparkSessionExtensions

2019-08-31 Thread Ravindra Pesala
Hi, I think it is better to work on the master branch instead of 2.0 branch. It will avoid the rebase cost and unnecessary confusion. it is better to go with proper version quality. Regards, Ravindra. On Mon, 26 Aug 2019 at 8:13 PM, Jacky Li wrote: > I have created branch-2.0, let's work on

Time travel/versioning on carbondata.

2019-08-23 Thread Ravindra Pesala
Hi All, CarbonData allows storing data incrementally and performing Update/Delete operations on the stored data. But the user can only ever access the latest state of the data at that point in time. In the current system, it is not possible to access an old version of the data. And it is not possible to

Re: [DISCUSSION] Cache Pre Priming

2019-08-23 Thread Ravindra Pesala
Hi Akash, +1 for Vishal's suggestion. Better to focus on load data cache sync. Regards, Ravindra. On Fri, 23 Aug 2019 at 16:35, Akash Nilugal wrote: > Hi vishal, > > Your point is correct, we can focus on just loading to cache after data > load is finished (Async Operation). > for DDL support,

Re: [VOTE] Apache CarbonData 1.6.0(RC3) release

2019-08-14 Thread Ravindra Pesala
+1 Regards, Ravindra. On Tue, 13 Aug 2019 at 17:12, Raghunandan S < carbondatacontributi...@gmail.com> wrote: > Hi > > > I submit the Apache CarbonData 1.6.0 (RC3) for your vote. > > > 1.Release Notes: > > > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12320220=12344965 > >

Re: Apache CarbonData 2 RoadMap Feedback

2019-07-18 Thread Ravindra Pesala
Hi, Yes, Flink and CarbonData integration will definitely attract more users. We welcome any contributions in that direction. Regards, Ravindra. On Thu, 18 Jul 2019 at 07:55, 蒋晓峰 wrote: > Hi Community, > > > > >I have already read CarbonData 2 roadmap.I consider that integration > with

Re: [Discussion] Roadmap for Apache CarbonData 2

2019-07-18 Thread Ravindra Pesala
s stablility. > Maybe we can consider about this also. > > > > > - > Regards > Manhua > > > > ---Original--- > From: "Ravindra Pesala" > Date: Tue, Jul 16, 2019 22:31 PM > To: "dev"; > Subject: [Discussion] Roadmap fo

[Discussion] Roadmap for Apache CarbonData 2

2019-07-16 Thread Ravindra Pesala
Hi Community, Three years have passed since the launch of the Apache CarbonData project, and CarbonData has become a popular data management solution for various scenarios. As new workloads like AI and new runtime environments like the cloud are emerging quickly, I think we are reaching a point that

[VOTE] Apache CarbonData 1.6.0(RC1) release

2019-07-15 Thread Ravindra Pesala
Hi I submit the Apache CarbonData 1.6.0 (RC1) for your vote. 1.Release Notes: https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12320220=12344965 Some key features and improvements in this release: 1. Supported Index Server to distribute the index cache and parallelize the

Re: [VOTE] Apache CarbonData 1.5.4(RC1) release

2019-05-29 Thread Ravindra Pesala
Hi all PMC vote has passed for Apache Carbondata 1.5.4 release, the result as below: +1(binding): 4(Jacky, Kumar Vishal, Ravindra, Liang Chen) +1(non-binding) : 3 Thanks all for your vote. Regards, Ravindra. On Wed, 29 May 2019 at 15:28, Liang Chen wrote: > +1 > > Regards > Liang > >

[VOTE] Apache CarbonData 1.5.4(RC1) release

2019-05-17 Thread Ravindra Pesala
Hi I submit the Apache CarbonData 1.5.4 (RC1) for your vote. 1.Release Notes: https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12320220=12345388 Some key features and improvements in this release: 1. Supported alter SORT_COLUMNS property on the table to allow changing sort

Re: [VOTE] Apache CarbonData 1.5.3(RC1) release

2019-04-08 Thread Ravindra Pesala
+1 Regards, Ravindra. On Wed, 3 Apr 2019 at 1:23 PM, Raghunandan S < carbondatacontributi...@gmail.com> wrote: > Hi > > > I submit the Apache CarbonData 1.5.3 (RC1) for your vote. > > > 1.Release Notes: > > > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12320220=12344322 > >

Re: [VOTE] Apache CarbonData 1.5.2(RC2) release

2019-02-01 Thread Ravindra Pesala
+1 Regards, Ravindra. On Wed, 30 Jan 2019 at 10:54 PM, Raghunandan S < carbondatacontributi...@gmail.com> wrote: > Hi > > > I submit the Apache CarbonData 1.5.2 (RC2) for your vote. > > > 1.Release Notes: > > > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12320220=12344321 >

Re: [DISCUSS] Move to gitbox as per ASF infra team mail

2019-01-05 Thread Ravindra Pesala
+1 Regards, Ravindra. On Sun, 6 Jan 2019 at 01:45, Kumar Vishal wrote: > +1 > Regards > Kumar Vishal > > On Sat, 5 Jan 2019 at 10:23, xuchuanyin wrote: > > > +1 > > > > seems the committers only need to change the url for asf repo, that's OK. > > > > On 5/1/2019 10:08, Liang Chen wrote: > > >

Re: [ANNOUNCE] Chuanyin Xu as new PMC for Apache CarbonData

2019-01-01 Thread Ravindra Pesala
Congrats. Regards, Ravindra. On Wed, 2 Jan 2019 at 05:49, Liang Chen wrote: > Hi > > We are pleased to announce that Chuanyin Xu as new PMC for Apache > CarbonData. > > Congrats to Chuanyin Xu! > > Apache CarbonData PMC > -- Thanks & Regards, Ravi

[ANNOUNCE] Apache CarbonData 1.5.1 release

2018-12-04 Thread Ravindra Pesala
Hi, Apache CarbonData community is pleased to announce the release of the Version 1.5.1 in The Apache Software Foundation (ASF). CarbonData is a high-performance data solution that supports various data analytic scenarios, including BI analysis, ad-hoc SQL query, fast filter lookup on detail

Re: [VOTE] Apache CarbonData 1.5.1(RC2) release

2018-12-04 Thread Ravindra Pesala
Hi all PMC vote has passed for Apache Carbondata 1.5.1 release, the result as below: +1(binding): 4(Liang Chen, Jacky, Kumar Vishal, Ravindra) +1(non-binding) : 5 Thanks all for your vote. Regards, Ravindra On Tue, 4 Dec 2018 at 16:57, Bhavya Aggarwal wrote: > +1 > > Regards > Bhavya > >

[VOTE] Apache CarbonData 1.5.1(RC2) release

2018-11-30 Thread Ravindra Pesala
Hi I submit the Apache CarbonData 1.5.1 (RC2) for your vote. 1.Release Notes: https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12320220=12344320 Some key features and improvements in this release: 1. Optimized scan performance by avoiding multiple data copies and

[Discussion] Bloom memory and pruning optimisation using hierarchical pruning.

2018-11-28 Thread Ravindra Pesala
Hi, Problem: The current bloom filter is calculated at the blocklet level, so if the cardinality of a column is high and many blocklets are loaded, the bloom size becomes large. In a few of our use cases it has grown to 60 GB, and it might increase when data grows or add more
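
To see why per-blocklet bloom filters add up, here is a rough sizing sketch using the textbook Bloom filter formula m = -n ln(p) / (ln 2)^2. The inputs (distinct values per blocklet, false-positive rate, number of blocklets) are hypothetical and are not CarbonData defaults; the point is only that total memory scales linearly with both cardinality and blocklet count.

```python
import math


def bloom_bits(n_items, false_positive_rate):
    """Optimal bit count for a standard Bloom filter holding n_items."""
    return math.ceil(-n_items * math.log(false_positive_rate) / (math.log(2) ** 2))


def total_bloom_bytes(num_blocklets, distinct_per_blocklet, false_positive_rate):
    """Total bytes when one independent filter is kept per blocklet."""
    per_blocklet_bytes = math.ceil(bloom_bits(distinct_per_blocklet, false_positive_rate) / 8)
    return num_blocklets * per_blocklet_bytes


if __name__ == "__main__":
    # Hypothetical workload: 1 million blocklets, 32,000 distinct values each.
    size = total_bloom_bytes(1_000_000, 32_000, 0.01)
    print(f"{size / 2 ** 30:.1f} GiB")
```

Doubling either the number of blocklets or the per-blocklet cardinality doubles the total, which is the growth pattern the hierarchical pruning proposal is trying to contain.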

Re: [proposal] Parallelize block pruning of default datamap in driver for filter query processing.

2018-11-22 Thread Ravindra Pesala
+1, It will be helpful for pruning millions of data files in less time. Please try to generalize for all datamaps. Thanks & Regards Ravindra On Fri, 23 Nov 2018 at 10:24, Ajantha Bhat wrote: > @xuchuanyin > Yes, I will be handling this for all types of datamap pruning in the same > flow when I

[VOTE] Apache CarbonData 1.5.1(RC1) release

2018-11-21 Thread Ravindra Pesala
Hi I submit the Apache CarbonData 1.5.1 (RC1) for your vote. 1.Release Notes: https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12320220=12344320 Some key features and improvements in this release: 1. Optimized scan performance by avoiding multiple data copies and

[Proposal] Thoughts on general guidelines to follow in Apache CarbonData community

2018-11-16 Thread Ravindra Pesala
Hi Please find our thoughts on the guidelines we can follow in the community to ensure the quality of Carbondata and make the community more collaborative. Let us discuss here. 1.Let us discuss all the features in the community before starting design. Let us attach the design document in

Re: Throw 'NoSuchElementException: None.get' error when use CarbonSession to read parquet.

2018-11-16 Thread Ravindra Pesala
Hi, I will check and fix it. Regards, Ravindra On Fri, 16 Nov 2018 at 09:24, xm_zzc <441586...@qq.com> wrote: > Hi: > Please help. I used CarbonSession to read parquet and it throws > 'NoSuchElementException: None.get' error, reading carbondata files is ok. > *Env*: local mode, Spark 2.3 +

[ANNOUNCE] Apache CarbonData 1.5.0 release

2018-10-16 Thread Ravindra Pesala
Hi, Apache CarbonData community is pleased to announce the release of the Version 1.5.0 in The Apache Software Foundation (ASF). CarbonData is a high-performance data solution that supports various data analytic scenarios, including BI analysis, ad-hoc SQL query, fast filter lookups on detail

Re: [VOTE] Apache CarbonData 1.5.0(RC2) release

2018-10-15 Thread Ravindra Pesala
Hi all PMC vote has passed for Apache Carbondata 1.5.0 release, the result as below: +1(binding): 4(Liang Chen, JB, Kumar Vishal, Ravindra) +1(non-binding) : 3 Thanks all for your vote. Regards, Ravindra On Fri, 12 Oct 2018 at 00:20, Kumar Vishal wrote: > +1 > Regards > Kumar Vishal > >

Re: [ISSUE] carbondata1.5.0 and spark 2.3.2 query plan issue

2018-10-02 Thread Ravindra Pesala
Hi Aaron, The CarbonData profiler is an untested feature added in an old version, so it might be broken or might not add correct information during the explain command. We will try to correct it in the next version; meanwhile, can you please check and make sure that the data you are getting from the query is

Re: [ANNOUNCE] Raghunandan as new committer of Apache CarbonData

2018-09-26 Thread Ravindra Pesala
Congrats Raghu On Wed, 26 Sep 2018 at 12:53, sujith chacko wrote: > Congratulations Raghu > > On Wed, 26 Sep 2018 at 12:44 PM, Rahul Kumar > wrote: > > > congrats Raghunandan !! > > > > > > Rahul Kumar > > *Sr. Software Consultant* > > *Knoldus Inc.* > > m: 9555480074 > > w: www.knoldus.com

[VOTE] Apache CarbonData 1.5.0(RC1) release

2018-09-25 Thread Ravindra Pesala
Hi I submit the Apache CarbonData 1.5.0 (RC1) for your vote. 1.Release Notes: https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12320220=12341006 Some key features and improvements in this release: 1. Supported carbon as Spark datasource using Spark's Fileformat

Re: Change the 'comment' content for column when execute command 'desc formatted table_name'

2018-08-21 Thread Ravindra Pesala
Yes, I agree with Liang. We need not consider showing the SQL in describe table in the case of CTAS. Regards Ravindra On Tue, 21 Aug 2018 at 20:47, Raghunandan S < carbondatacontributi...@gmail.com> wrote: > Hi, > In opinion it is not required to show the original select sql. Also is > there a way to

[DISCUSSION] Support Standard Spark's FileFormat interface in Carbondata

2018-08-21 Thread Ravindra Pesala
Hi, Current Carbondata has deep integration with Spark to provide optimizations in performance and also supports features like compaction, IUD, data maps and metadata management etc. This type of integration forces user to use CarbonSession instance to use carbon even for read and write

[ANNOUNCE] Apache CarbonData 1.4.1 release

2018-08-15 Thread Ravindra Pesala
Hi, Apache CarbonData community is pleased to announce the release of the Version 1.4.1 in The Apache Software Foundation (ASF). CarbonData is a high-performance data solution that supports various data analytic scenarios, including BI analysis, ad-hoc SQL query, fast filter lookups on detail

[VOTE] Apache CarbonData 1.4.1(RC2) release

2018-08-09 Thread Ravindra Pesala
Hi I submit the Apache CarbonData 1.4.1 (RC2) for your vote. 1.Release Notes: https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12320220=12343148 Some key features and improvements in this release: 1. Supported Local dictionary to improve IO and query performance.

[Discussion] Refactor Segment Management Interface.

2018-08-03 Thread Ravindra Pesala
Hi, *Carbon uses tablestatus file to record segment status and details of each segment during each load. This tablestatus enables carbon to support concurrent loads and reads without data inconsistency or corruption. So it is a very important feature of carbondata and we should have clean

Re: [Discussion] Blocklet DataMap caching in driver

2018-06-21 Thread Ravindra Pesala
Hi Manish, Thanks for proposing solutions to the driver memory problem. +1 for solution 1, but it may not be the complete solution. We should also have solution 2 to solve the driver memory issue completely. I think in the very near future we should have solution 2 as well. I have a few doubts and

Re: Complex DataType Enhancements

2018-06-14 Thread Ravindra Pesala
Hi Sounak, Are you planning to do predicate pushdown or projection pushdown for struct type? I guess adaptive encoding is only possible for integral datatypes like long, int and short, not for all datatypes. So please list what types of encoding you are planning for complex types. Regards,

Re: [Discussion] Carbon Local Dictionary Support

2018-06-04 Thread Ravindra Pesala
Hi Vishal, +1 Thank you for starting a discussion on it. It will be a very helpful feature to improve query performance and reduce the memory footprint. Please add the design document for the same. Regards, Ravindra. On 5 June 2018 at 09:22, xuchuanyin wrote: > Hi, Kumar: > Local

[VOTE] Apache CarbonData 1.4.0(RC2) release

2018-05-22 Thread Ravindra Pesala
Hi I submit the Apache CarbonData 1.4.0 (RC2) for your vote. 1.Release Notes: *https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12320220=1234100

Re: Change the 'comment' content for column when execute command 'desc formatted table_name'

2018-04-26 Thread Ravindra Pesala
I agree with Liang's suggestion to align the information with table schema. And I have one suggestion related to NO_INVERTED_INDEX: instead of mentioning the no-inverted-index columns, it is better to mention which columns are inverted index columns. It is very hard for the user to understand which are inverted index

Re: [VOTE] Apache CarbonData 1.3.1(RC1) release

2018-03-09 Thread Ravindra Pesala
Hi all PMC vote has passed for Apache Carbondata 1.3.1 release, the result as below: +1(binding): 5(Liang Chen, Jacky, David, JB, Ravindra) +1(non-binding) : 1 Thanks all for your vote. Regards, Ravindra On 5 March 2018 at 15:10, David CaiQiang wrote: > +1 > > > >

[VOTE] Apache CarbonData 1.3.1(RC1) release

2018-03-04 Thread Ravindra Pesala
Hi I submit the Apache CarbonData 1.3.1 (RC1) for your vote. 1.Release Notes: *https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12320220=12342754 * Some key improvement in this patch release:

Re: About bucket feature in carbon

2018-02-09 Thread Ravindra Pesala
Jacky > > > > On 9 Feb 2018, at 3:49 PM, Ravindra Pesala <ravi.pes...@gmail.com> wrote: > > > > Hi Likun, > > > > I feel it is better to change the implementation to use sparks bucketing > > generation just like how standard hive partitions gener

Re: About bucket feature in carbon

2018-02-08 Thread Ravindra Pesala
Hi Likun, I feel it is better to change the implementation to use Spark's bucketing generation, just like how standard Hive partitions are generated. It will be easy to change it after the partition feature is implemented. And it is a useful feature for joining big tables and hash based buckets and

Re: [VOTE] Apache CarbonData 1.3.0(RC2) release

2018-02-06 Thread Ravindra Pesala
Hi all PMC vote has passed for Apache Carbondata 1.3.0 release, the result as below: +1(binding): 5(Liang Chen, Jacky, Kumar Vishal, JB, Ravindra) +1(non-binding) : 3 Thanks all for your vote. Regards, Ravindra On 5 February 2018 at 14:05, xm_zzc <441586...@qq.com> wrote: > +1 > > The

[VOTE] Apache CarbonData 1.3.0(RC2) release

2018-02-03 Thread Ravindra Pesala
Hi I submit the Apache CarbonData 1.3.0 (RC2) for your vote. 1.Release Notes: *https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12320220=12341004 * Some key improvement in this patch release:

[VOTE] Apache CarbonData 1.3.0(RC1) release

2018-01-09 Thread Ravindra Pesala
Hi I submit the Apache CarbonData 1.3.0 (RC1) for your vote. 1.Release Notes: *https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12320220=12341004 * Some key improvement in this patch release:

Re: [ANNOUNCE] Kunal Kapoor as new Apache CarbonData committer

2018-01-08 Thread Ravindra Pesala
Congrats Kunal Regards, Ravindra On 8 January 2018 at 20:29, xm_zzc <441586...@qq.com> wrote: > Congratulations Kunal !! > > > > -- > Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556. > n5.nabble.com/ > -- Thanks & Regards, Ravi

Initiating Apache CarbonData-1.3.0 Release

2017-12-23 Thread Ravindra Pesala
Hi All, We are initiating CarbonData 1.3.0 release so no new features are allowed to commit on master branch till the release is done. We will stabilize the code and only defect fixes are allowed to commit. Please let us know if any urgent features need to be merged into 1.3.0 version so that we

Re: [Discussion] Support Spark/Hive based partition in carbon

2017-12-09 Thread Ravindra Pesala
d segment management. > > @Jacky @Ravindra @chenliang @David CaiQiang can you estimate the impact? > > > > Best regards! > Yuhai Cen > > > On 5 Dec 2017 at 15:29, Ravindra Pesala <ravi.pes...@gmail.com> wrote: > Hi Jacky, > > Here we have the main probl

Re: Reply: [DISCUSSION] Refactory on spark related modules

2017-12-05 Thread Ravindra Pesala
Hi Jacky, I don't think it's a good idea to create new modules for spark2.1 and spark2.2 versions. We should not create a module for every Spark minor version. Earlier we had the modules spark and spark2 because of a major version change in which a lot of interfaces changed. If it is a

Re: [Discussion] Support Spark/Hive based partition in carbon

2017-12-04 Thread Ravindra Pesala
ere any problem if use the hive/spark folder > structure? > > > > > > > > > Best regards! > Yuhai Cen > > > On 4 Dec 2017 at 14:09, Ravindra Pesala <ravi.pes...@gmail.com> wrote: > Hi, > > > Please find the design document for

Re: [Discussion] Support Spark/Hive based partition in carbon

2017-12-03 Thread Ravindra Pesala
Hi, Please find the design document for standard partition support in carbon. https://docs.google.com/document/d/1NJo_Qq4eovl7YRuT9O7yWTL0P378HnC8WT0-6pkQ7GQ/edit?usp=sharing Regards, Ravindra. On 27 November 2017 at 17:36, cenyuhai11 wrote: > The datasource api still

[Discussion] Support Spark/Hive based partition in carbon

2017-11-21 Thread Ravindra Pesala
Partition features of Spark: 1. Creating table with partition CREATE [TEMPORARY] TABLE [IF NOT EXISTS] [db_name.]table_name [(col_name1 col_type1 [COMMENT col_comment1], ...)] USING datasource [OPTIONS (key1=val1, key2=val2, ...)] [PARTITIONED BY (col_name1, col_name2, ...)]

Re: [Discussion] Support pre-aggregate table to improve OLAP performance

2017-11-06 Thread Ravindra Pesala
Hi Bill, Please find my comments. 1. We are not supporting join queries in this design so it will be always one parent table for an aggregate table. We may consider the join queries for creating aggregation queries in future. 2. Aggregation column name will be created internally and it would be

Re: [DISCUSSION] Optimize the default value for some parameters

2017-10-25 Thread Ravindra Pesala
Hi Liang, Now TABLE_BLOCKSIZE only limits the size of a carbondata file. It is not considered when allocating tasks, so the size of TABLE_BLOCKSIZE does not matter. But yes, we can consider setting it to 512M. We can also change the default blocklet size (carbon.blockletgroup.size.in.mb) to

Re: [DISCUSSION] Optimize the default value for some parameters

2017-10-25 Thread Ravindra Pesala
Hi, Yes, it is a good suggestion; we can plan to set the number of loading cores dynamically as per the available executor cores. Can you please raise a Jira for it? Regards, Ravindra On 25 October 2017 at 12:08, xm_zzc <441586...@qq.com> wrote: > Hi: > If we are using carbondata + spark to

[Discussion] Merging carbonindex files for each segments and across segments

2017-10-20 Thread Ravindra Pesala
Hi, Problem: The first query on a carbon table is very slow, because many small carbonindex files must be read and cached in the driver the first time. Many carbonindex files are created in two cases. Case 1: Loading data in a large cluster. For example, if the cluster size is 100

Re: [Discussion] Carbon Store abstraction

2017-10-20 Thread Ravindra Pesala
Hi Jacky, Thank you for steering this activity. Yes, there is a need to refactor code to get the store management out of spark integration module. It becomes difficult to add another integration module if there is no clear API for store management. Please find my comments. 1. Is it really

Re: [Discussion] Support pre-aggregate table to improve OLAP performance

2017-10-17 Thread Ravindra Pesala
Hi Bhavya, For pre-aggregate table load, we will not delete old data and recalculate the aggregation every time. We load the aggregation tables incrementally along with the main table. Suppose we create an aggregation table on the main table; then the aggregation table is calculated and loaded with
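
The incremental behaviour described in this message can be illustrated with a toy model. The following Python sketch is only an illustration under stated assumptions, not CarbonData code: the class `PreAggregateTable`, its methods, and the choice of SUM as the aggregate are all hypothetical. It assumes that each load of the main table produces one new segment and that the aggregate table gains a matching segment computed from the new rows alone.

```python
from collections import defaultdict


class PreAggregateTable:
    """Toy model (hypothetical): a main table plus a SUM aggregate kept per segment."""

    def __init__(self):
        self.main_segments = []  # one list of (key, value) rows per load
        self.agg_segments = []   # one {key: partial_sum} dict per load

    def load(self, rows):
        """Load one batch: add a main segment and aggregate only the new rows."""
        partial = defaultdict(int)
        for key, value in rows:
            partial[key] += value
        self.main_segments.append(list(rows))
        self.agg_segments.append(dict(partial))

    def query_sum(self):
        """Answer SUM(value) GROUP BY key by combining the per-segment partial sums."""
        total = defaultdict(int)
        for segment in self.agg_segments:
            for key, value in segment.items():
                total[key] += value
        return dict(total)
```

In this sketch, earlier aggregate segments are never rewritten when new data arrives; a query combines the partial results of all segments.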

Re: DataMap Interface requires `IndexColumns` as Input

2017-10-09 Thread Ravindra Pesala
Hi, The indexed columns on which a datamap is created are available in DataMapFactory; you can check the getMeta method. Using the filter expression tree during pruning, we can get the filter columns and prune with the related datamap. Please don't refer to PR 1399 yet as it is still incomplete and many things

Re: [DISCUSSION] support user specified segment reading for query

2017-10-05 Thread Ravindra Pesala
Hi, Instead of using a SET command to select segments, why don't you use a query hint? With a query hint we can mention the segments inside the query itself. For example: SELECT /*+SEGMENTS(1,3,5) */ * FROM t1. With the above custom hint we can query from the selected segments only. This
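
The query-hint proposal in this message could be sketched as follows. This is a minimal Python illustration, not CarbonData's parser: the function name `extract_segment_hint` and the regular expression are assumptions made for the example, and the hint syntax is taken from the `/*+SEGMENTS(1,3,5) */` form quoted in the message.

```python
import re

# Assumed hint shape: a /*+ SEGMENTS(...) */ comment holding comma-separated segment ids.
_SEGMENTS_HINT = re.compile(
    r"/\*\+\s*SEGMENTS\(\s*([0-9\s,]*)\)\s*\*/", re.IGNORECASE
)


def extract_segment_hint(sql):
    """Return the segment ids named in a SEGMENTS hint, or None if there is no hint."""
    match = _SEGMENTS_HINT.search(sql)
    if match is None:
        return None
    # Split the captured id list on commas and ignore empty tokens.
    return [int(token) for token in match.group(1).split(",") if token.strip()]
```

A caller could use the returned ids to restrict the scan to those segments and fall back to all segments when the function returns None.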

[ANNOUNCE] Apache CarbonData 1.2.0 release

2017-09-29 Thread Ravindra Pesala
Hi All, The Apache CarbonData PMC team is happy to announce the release of Apache CarbonData version 1.2.0 1.Release Notes: *https://cwiki.apache.org/confluence/display/CARBONDATA/Apache+CarbonData+1.2.0+Release

[VOTE] Apache CarbonData 1.2.0(RC3) release

2017-09-22 Thread Ravindra Pesala
Hi, I submit the Apache CarbonData 1.2.0 (RC3) for your vote. 1. Release Notes: *https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12320220=12340260 * Some key improvements in this patch release:

Re: [DISCUSSION] Update the function of show segments

2017-09-21 Thread Ravindra Pesala
Hi, I agree with Jacky and David. But I suggest keeping the current 'show segments' command unchanged, providing only brief information about segments, and adding a new extended command like `extended show segments` to provide the additional information that power users need. Regards, only

Re: Fw:carbonthriftserver can not be load many times

2017-09-13 Thread Ravindra Pesala
Hi, I have a doubt here. Are steps 1 and 2 done through one beeline session, and steps 3, 4 and 5 from another beeline session? Also, can you check whether the same issue exists on the current master branch? Regards, Ravindra. On 13 September 2017 at 15:14, dylan

Re: [ANNOUNCE] Lu Cao as new Apache CarbonData committer

2017-09-13 Thread Ravindra Pesala
Congratulations Lu Cao. Welcome Regards, Ravindra On Wed, 13 Sep 2017 at 7:44 PM, Bhavya Aggarwal wrote: > Congrats Lu Cao .. > > Thanks and regards > Bhavya > > On Wed, Sep 13, 2017 at 7:30 PM, Raghunandan S < > carbondatacontributi...@gmail.com> wrote: > > > Congrats lu

Re: Fw:carbonthriftserver can not be load many times

2017-09-12 Thread Ravindra Pesala
Hi, This is not the expected behavior of carbondata; it must be a bug. Usually, after an update the cache is refreshed for the next query. Please provide the following information. 1. The Carbondata and Spark versions you are using. 2. A test case to reproduce this issue. Regards, Ravindra. On 12 September 2017 at

Re: Add an option such as 'carbon.update.storage.level' to configurate the storage level when updating data with 'carbon.update.persist.enable'='true'

2017-09-07 Thread Ravindra Pesala
Hi, I don't see any problem in adding the option, although MEMORY_AND_DISK is the preferable one. You can keep it as a developer option only; there is no need to expose it to the user. Regards, Ravindra. On 5 September 2017 at 11:15, xm_zzc <441586...@qq.com> wrote: > I have searched and there was no

[VOTE] Apache CarbonData 1.2.0(RC1) release

2017-09-01 Thread Ravindra Pesala
Hi, I submit the Apache CarbonData 1.2.0 (RC1) for your vote. 1. Release Notes: *https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12320220=12340260 * Some key improvements in this patch release:
