Re: [Discussion] Implement Partition Table Feature

2017-04-15 Thread Jacky Li
> On 15 April 2017, at 12:00 PM, Jacky Li wrote: > > Hi Cao Lu, > > The overall design looks good to me, I just have the following points to > confirm: > 1. Is there a delete partition DDL? > 2. For the data loading part, does it need to do a global shuffle before the actual > data loading?

Re: [Discussion] Implement Partition Table Feature

2017-04-14 Thread Jacky Li
Hi Cao Lu, The overall design looks good to me, I just have the following points to confirm: 1. Is there a delete partition DDL? 2. For the data loading part, does it need to do a global shuffle before the actual data loading? And the partition key should not be included in the SORT_COLUMNS option, right? If yes
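For context, a rough Scala/Spark SQL sketch of the DDL shape under discussion — the partition syntax is hypothetical since the feature was still being designed at the time, and the table, column, and partition names are made up. The point is that the partition column forms its own key and stays out of SORT_COLUMNS:

```scala
// Hypothetical DDL sketch for the partition feature being discussed;
// the exact PARTITIONED BY / DROP PARTITION syntax had not been finalized.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("partition-sketch").getOrCreate()

spark.sql(
  """CREATE TABLE sales (
    |  order_id BIGINT,
    |  amount DOUBLE,
    |  city STRING
    |)
    |PARTITIONED BY (country STRING)
    |STORED BY 'carbondata'
    |TBLPROPERTIES ('SORT_COLUMNS' = 'city,order_id')""".stripMargin)

// A "delete partition" DDL, as asked about above (again, hypothetical syntax).
spark.sql("ALTER TABLE sales DROP PARTITION (country = 'US')")
```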

Re: [jira] [Created] (CARBONDATA-836) Error in load using dataframe - columns containing comma

2017-04-11 Thread Jacky Li
Hi Sanoj, This is because the CarbonData loading flow needs to scan the input data twice (once to generate the global dictionary, once for the actual loading). If a user is writing a DataFrame to CarbonData and computing that DataFrame is costly, it is better to save it as a temporary CSV
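A minimal sketch of that suggestion, assuming a Spark 2.x CarbonSession and made-up paths and table names: materialize the expensive DataFrame once as CSV, then load the CSV so that both scans of the loading flow read cheap data.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.CarbonSession._

val carbon = SparkSession.builder()
  .appName("df-to-carbon")
  .getOrCreateCarbonSession("hdfs://namenode:8020/user/carbon/store")

// Placeholder for a costly computation.
val expensiveDf = carbon.read.parquet("/input/raw").groupBy("city").count()

// Materialize once as CSV so the dictionary scan and the load scan both read it cheaply.
val tmpCsv = "/tmp/carbon_staging_csv"
expensiveDf.write.option("header", "true").mode("overwrite").csv(tmpCsv)

carbon.sql(s"LOAD DATA INPATH '$tmpCsv' INTO TABLE city_counts")
```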

[jira] [Created] (CARBONDATA-882) Add no sort support in dataframe writer

2017-04-06 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-882: --- Summary: Add no sort support in dataframe writer Key: CARBONDATA-882 URL: https://issues.apache.org/jira/browse/CARBONDATA-882 Project: CarbonData Issue Type

Re: [DISCUSSION]support new feature: Partition Table

2017-04-05 Thread Jacky Li
Comments inline > On 1 April 2017, at 5:06 PM, a wrote: > > additional suggestion: > 1. support at least two-level partition I think we can let the user specify the partition columns; multiple columns together can form a partition key. Is this what you mean by two-level partition? Generally speaking, p

Re: [DISCUSSION]implement delta encoding for numeric type column in SORT_COLUMNS

2017-04-05 Thread Jacky Li
> On 5 April 2017, at 6:31 PM, QiangCai wrote: > > Hi all, > > Now we plan to implement delta encoding for the numeric type columns in > SORT_COLUMNS. > > 1. use delta encoding to encode the numeric type data > I think the adaptive data type conversion still applies here, right? > 2. write presen
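A toy illustration of the delta-encoding idea being discussed (not the project's actual encoder): once a numeric sort column is sorted, storing the first value plus successive differences produces small numbers that compress well, and decoding is just a running sum.

```scala
// Toy delta encoding/decoding of a sorted numeric column; illustrative only.
val sorted = Array(100L, 102L, 102L, 107L, 110L)

val base   = sorted.head
val deltas = sorted.sliding(2).map { case Array(a, b) => b - a }.toArray  // Array(2, 0, 5, 3)

// Decoding is a running sum starting from the base value.
val decoded = deltas.scanLeft(base)(_ + _)
assert(decoded.sameElements(sorted))
```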

Re: [VOTE] Apache CarbonData 1.1.0-incubating (RC1) release

2017-04-05 Thread Jacky Li
I think it is better to resolve the following issues before the 1.1.0 release. Documentation should be synchronized: [CARBONDATA-865], [CARBONDATA-862]. Bug: [CARBONDATA-870]

Re: [DISCUSSION]: (New Feature) Streaming Ingestion into CarbonData

2017-03-29 Thread Jacky Li
ne in phase 1. Maintain append > offsets > and metadata information. > Is the streaming data file format implemented in this phase? > AA>> I think we can directly leverage the existing V3 format without many > changes in the basic writer/reader framework, in that case implementi

Re: [DISCUSSION]: (New Feature) Streaming Ingestion into CarbonData

2017-03-28 Thread Jacky Li
Hi Aniket, This feature looks great, and the overall plan also seems fine to me. Thanks for proposing it. I have some doubts inline. > On 27 March 2017, at 6:34 PM, Aniket Adnaik wrote: > > Hi All, > > I would like to open up a discussion for a new feature to support streaming > ingestion in CarbonData. >

[jira] [Created] (CARBONDATA-829) DICTIONARY_EXCLUDE is not working when using Spark Datasource DDL

2017-03-27 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-829: --- Summary: DICTIONARY_EXCLUDE is not working when using Spark Datasource DDL Key: CARBONDATA-829 URL: https://issues.apache.org/jira/browse/CARBONDATA-829 Project

[jira] [Created] (CARBONDATA-827) Query statistics log format is incorrect

2017-03-27 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-827: --- Summary: Query statistics log format is incorrect Key: CARBONDATA-827 URL: https://issues.apache.org/jira/browse/CARBONDATA-827 Project: CarbonData Issue Type

Re: data not input hive

2017-03-27 Thread Jacky Li
Hi, Carbon does not support loading data using Hive yet. You can use Spark to load. Regards, Jacky > On 27 March 2017, at 2:17 PM, 风云际会 <1141982...@qq.com> wrote: > > spark 2.1.0 > hive 1.2.1 > Couldn't find corresponding Hive SerDe for data source provider > org.apache.spark.sql.CarbonSource. Persisting data
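A minimal sketch of the Spark loading path, assuming the CarbonSession builder from the spark2 integration; the store path, table, and CSV location are placeholders.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.CarbonSession._

val carbon = SparkSession.builder()
  .appName("carbon-load")
  .getOrCreateCarbonSession("hdfs://namenode:8020/user/carbon/store")

// Create the table and load a CSV file through Spark (not Hive).
carbon.sql("CREATE TABLE IF NOT EXISTS t1 (id INT, name STRING) STORED BY 'carbondata'")
carbon.sql("LOAD DATA INPATH 'hdfs://namenode:8020/data/t1.csv' INTO TABLE t1")
```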

[jira] [Created] (CARBONDATA-823) Refactory of data write step

2017-03-26 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-823: --- Summary: Refactory of data write step Key: CARBONDATA-823 URL: https://issues.apache.org/jira/browse/CARBONDATA-823 Project: CarbonData Issue Type

[jira] [Created] (CARBONDATA-820) Redundant BitSet created in data load

2017-03-25 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-820: --- Summary: Redundant BitSet created in data load Key: CARBONDATA-820 URL: https://issues.apache.org/jira/browse/CARBONDATA-820 Project: CarbonData Issue Type

[jira] [Created] (CARBONDATA-812) make vectorized reader as default reader

2017-03-23 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-812: --- Summary: make vectorized reader as default reader Key: CARBONDATA-812 URL: https://issues.apache.org/jira/browse/CARBONDATA-812 Project: CarbonData Issue Type

Re: [PROPOSAL] Update on the Jenkins CarbonData job

2017-03-18 Thread Jacky Li
+1 > On 17 March 2017, at 10:48 PM, Jean-Baptiste Onofré wrote: > > Hi guys, > > Tomorrow I plan to update our jobs on Apache Jenkins as follows: > > - carbondata-master-spark-1.5 building master branch with Spark 1.5 profile > - carbondata-master-spark-1.6 building master branch with Spark 1.6 profi

Re: Question related to lazy decoding optimzation

2017-03-14 Thread Jacky Li
>> I watched one session of "Apache CarbonData" at Spark Summit 2017. The >> video is here: https://www.youtube.com/watch?v=lhsAg2H_GXc. >>

Re: Removing of kettle code from Carbondata

2017-03-14 Thread Jacky Li
+1 for removing the kettle code. And how about the sorter implementation: currently the default property for off-heap sort (ENABLE_UNSAFE_SORT) is false; how about making it true, and we can also remove the heap sorter in the future after off-heap sort is fully tested. Regards, Jacky > On 13 March 2017
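For reference, a sketch of how that switch can be flipped programmatically; the property key string below is an assumption based on the ENABLE_UNSAFE_SORT constant mentioned in the thread, and the same setting can equally go into carbon.properties.

```scala
import org.apache.carbondata.core.util.CarbonProperties

// Assumed key for ENABLE_UNSAFE_SORT; the default was "false" at the time of this thread.
CarbonProperties.getInstance().addProperty("enable.unsafe.sort", "true")
```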

Re: column auto mapping when loading data from csv file

2017-03-14 Thread Jacky Li
Hi Yinwei, I am OK with this new feature if there is an option in the load script to enable it, so users can explicitly enable it if they want, without changing the current two choices. Regards, Jacky > On 13 March 2017, at 10:18 AM, Yinwei Li <251469...@qq.com> wrote: > > Hi all, > > > when loading data from
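For comparison with the proposed auto-mapping, a sketch of how the column list is supplied explicitly today through the FILEHEADER load option (used when the CSV itself has no header row); the session, table, and column names are illustrative.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.CarbonSession._

val carbon = SparkSession.builder()
  .getOrCreateCarbonSession("hdfs://namenode:8020/user/carbon/store")

// Explicit column mapping today; the proposal would add an option to derive it automatically.
carbon.sql(
  """LOAD DATA INPATH 'hdfs://namenode:8020/data/users.csv' INTO TABLE users
    |OPTIONS ('DELIMITER'=',', 'FILEHEADER'='id,name,city')""".stripMargin)
```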

[jira] [Created] (CARBONDATA-747) Add simple performance test for spark2.1 carbon integration

2017-03-05 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-747: --- Summary: Add simple performance test for spark2.1 carbon integration Key: CARBONDATA-747 URL: https://issues.apache.org/jira/browse/CARBONDATA-747 Project: CarbonData

[jira] [Created] (CARBONDATA-746) Support spark-sql CLI for spark2.1 carbon integration

2017-03-05 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-746: --- Summary: Support spark-sql CLI for spark2.1 carbon integration Key: CARBONDATA-746 URL: https://issues.apache.org/jira/browse/CARBONDATA-746 Project: CarbonData

Re: Improving Non-dictionary storage & performance.

2017-03-02 Thread Jacky Li
you mentioned we can suggest 2-pass for first load and subsequent loads > will use single-pass to improve the performance. > > Regards, > Ravindra. > > On 2 March 2017 at 06:48, Jacky Li wrote: > >> Hi Ravindra & Vishal, >> >> Yes, I think these works

Re: [DISCUSS] For the dimension default should be no dictionary

2017-03-02 Thread Jacky Li
>>> focus on reducing the store size and improve the filter queries on >>> non-dictionary columns. Even memory usage is higher while querying the >>> non-dictionary columns. >>> >>> Regards, >>> Ravindra. >>> >>> On 1 March 2017 at

Re: Improving Non-dictionary storage & performance.

2017-03-01 Thread Jacky Li
Hi Ravindra & Vishal, Yes, I think this work needs to be done before switching to no-dictionary as the default. So as of now, we should use dictionary as the default. I think we can suggest users do loading as follows: 1. First load: use 2-pass mode to load; the first scan should discover the cardinality, and
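A sketch of that suggested load sequence, assuming the SINGLE_PASS load option from the single-pass loading work discussed elsewhere in this archive; the table name and paths are illustrative.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.CarbonSession._

val carbon = SparkSession.builder()
  .getOrCreateCarbonSession("hdfs://namenode:8020/user/carbon/store")

// 1. First load: default 2-pass mode, so the dictionary scan discovers the cardinality.
carbon.sql("LOAD DATA INPATH 'hdfs://namenode:8020/data/day1.csv' INTO TABLE sales")

// 2. Subsequent loads: reuse the existing dictionary and skip the extra scan.
carbon.sql(
  """LOAD DATA INPATH 'hdfs://namenode:8020/data/day2.csv' INTO TABLE sales
    |OPTIONS ('SINGLE_PASS'='TRUE')""".stripMargin)
```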

Re: [DISCUSS] Graduation to a TLP (Top Level Project)

2017-03-01 Thread Jacky Li
+1 Thanks JB for driving it. I am super excited! Regards, Jacky > On 2 March 2017, at 12:33 AM, Naresh P R wrote: > > +1 > > Thanks for your guidance JB > > Regards, > Naresh P R > > On Mar 1, 2017 3:50 PM, "Jean-Baptiste Onofré" wrote: > > Hi Liang, > > We are now good. I will update pull requests an

Re: [DISCUSS] For the dimension default should be no dictionary

2017-02-28 Thread Jacky Li
> >> ------- >> For example, SORT_COLUMNS="C1,C2,C3", means C1,C2,C3 is MDK and encoded as >> Inverted Index and with Minmax Index >> >> Regards >> Liang >

Re: [DISCUSS] For the dimension default should be no dictionary

2017-02-28 Thread Jacky Li
> For example, SORT_COLUMNS="C1,C2,C3", means C1,C2,C3 is MDK and encoded as > Inverted Index and with Minmax Index > Sort it using original value > Regards > Liang > > 2017-02-28 19:35 GMT+08:00 Jacky Li : > >> Yes, first we should simplify the DDL optio

Re: [DISCUSS] For the dimension default should be no dictionary

2017-02-28 Thread Jacky Li
Yes, first we should simplify the DDL options. I propose the following options; please check whether they miss any scenario. 1. SORT_COLUMNS, or SORT_KEY This indicates three things: 1) All columns specified in the option will be used to construct the Multi-Dimensional Key, and data will be sorted along this key

Re: [DISCUSS] For the dimension default should be no dictionary

2017-02-28 Thread Jacky Li
Yes, first we should simplify the DDL options. I propose the following options; please check whether they miss any scenario. 1. SORT_COLUMNS, or SORT_KEY This indicates three things: 1) All columns specified in the option will be used to construct the Multi-Dimensional Key, and data will be sorted along this k
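A sketch of the SORT_COLUMNS proposal in DDL form — the listed columns form the Multi-Dimensional Key and the data is sorted along it. The syntax follows the option name discussed in this thread, and the table and column names are made up.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.CarbonSession._

val carbon = SparkSession.builder()
  .getOrCreateCarbonSession("hdfs://namenode:8020/user/carbon/store")

// c1, c2, c3 together form the MDK; other columns are stored but not part of the sort key.
carbon.sql(
  """CREATE TABLE events (
    |  c1 STRING,
    |  c2 STRING,
    |  c3 INT,
    |  payload STRING
    |)
    |STORED BY 'carbondata'
    |TBLPROPERTIES ('SORT_COLUMNS'='c1,c2,c3')""".stripMargin)
```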

[ANNOUNCE] Apache CarbonData 1.0.0-incubating released

2017-01-29 Thread Jacky Li
Hi All, The Apache CarbonData PMC team is happy to announce the release of Apache CarbonData version 1.0.0-incubating. Apache CarbonData (incubating) is an indexed columnar data format for fast analytics on big data platforms, e.g. Apache Hadoop, Apache Spark, etc. The release notes are available a

Re: [RESULT][VOTE] Apache CarbonData 1.0.0-incubating release (RC2)

2017-01-24 Thread Jacky Li
The PPMC vote has passed; the result is as below: +1 (binding): 7 (Liang Chen, Jean-Baptiste Onofré, Ravindra, Jihong Ma, Vimal, Jarray, Venkata Gollamudi) +1 (non-binding): 4 Thanks all for your votes. Regards, Jacky Li -- View this message in context: http://apache-carbondata-mailing-list-archive

Re: [VOTE] Apache CarbonData 1.0.0-incubating release (RC2)

2017-01-20 Thread Jacky Li
tor-carbondata/tree/master/build> > On 21 January 2017, at 9:36 AM, Jacky Li wrote: > > Hi all, > > Please vote on releasing the following candidate as Apache > CarbonData (incubating) > version 1.0.0. > > Release Notes: > https://issues.apache.org/jira/secure/Release

[VOTE] Apache CarbonData 1.0.0-incubating release (RC2)

2017-01-20 Thread Jacky Li
Hi all, Please vote on releasing the following candidate as Apache CarbonData(incubating) version 1.0.0. Release Notes: https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12320220&version=12338020

[jira] [Created] (CARBONDATA-638) Move package in carbon-core module

2017-01-14 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-638: --- Summary: Move package in carbon-core module Key: CARBONDATA-638 URL: https://issues.apache.org/jira/browse/CARBONDATA-638 Project: CarbonData Issue Type

[jira] [Created] (CARBONDATA-637) Remove table_status file

2017-01-14 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-637: --- Summary: Remove table_status file Key: CARBONDATA-637 URL: https://issues.apache.org/jira/browse/CARBONDATA-637 Project: CarbonData Issue Type: Improvement

[jira] [Created] (CARBONDATA-606) Add a Flink example to read CarbonData files

2017-01-07 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-606: --- Summary: Add a Flink example to read CarbonData files Key: CARBONDATA-606 URL: https://issues.apache.org/jira/browse/CARBONDATA-606 Project: CarbonData Issue

Re: carbon shell is not working with spark 2.0 version

2017-01-04 Thread Jacky Li
Hi, IMHO, users should be free to use the command-line tools provided by compute engines like Spark, Hive, Flink and others. As a file format, I think it is not Carbon's focus to provide this shell tool. Regards, Jacky > On 4 January 2017, at 3:10 PM, anubhavtarar wrote: > > carbon shell is not working with s

Re: Unable to run Carbon Shell on Spark 2.0

2016-12-29 Thread Jacky Li
Hi Harmeet, The ThriftServer that uses CarbonSession has now been merged into the master branch; please try the latest master. Thanks. Regards, Jacky -- View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Unable-to-run-Carbon-Shell-on-Spark-2-0-tp5198p526

[jira] [Created] (CARBONDATA-571) clean up code for carbon-spark module

2016-12-27 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-571: --- Summary: clean up code for carbon-spark module Key: CARBONDATA-571 URL: https://issues.apache.org/jira/browse/CARBONDATA-571 Project: CarbonData Issue Type

[jira] [Created] (CARBONDATA-570) clean up code for carbon-hadoop module

2016-12-27 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-570: --- Summary: clean up code for carbon-hadoop module Key: CARBONDATA-570 URL: https://issues.apache.org/jira/browse/CARBONDATA-570 Project: CarbonData Issue Type

[jira] [Created] (CARBONDATA-572) clean up code for carbon-spark-common module

2016-12-27 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-572: --- Summary: clean up code for carbon-spark-common module Key: CARBONDATA-572 URL: https://issues.apache.org/jira/browse/CARBONDATA-572 Project: CarbonData Issue

[jira] [Created] (CARBONDATA-569) clean up code for carbon-core module

2016-12-27 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-569: --- Summary: clean up code for carbon-core module Key: CARBONDATA-569 URL: https://issues.apache.org/jira/browse/CARBONDATA-569 Project: CarbonData Issue Type

[jira] [Created] (CARBONDATA-568) clean up code for carbon-core module

2016-12-27 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-568: --- Summary: clean up code for carbon-core module Key: CARBONDATA-568 URL: https://issues.apache.org/jira/browse/CARBONDATA-568 Project: CarbonData Issue Type

[jira] [Created] (CARBONDATA-566) clean up code for carbon-spark2 module

2016-12-26 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-566: --- Summary: clean up code for carbon-spark2 module Key: CARBONDATA-566 URL: https://issues.apache.org/jira/browse/CARBONDATA-566 Project: CarbonData Issue Type

[jira] [Created] (CARBONDATA-565) Clean up code

2016-12-26 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-565: --- Summary: Clean up code Key: CARBONDATA-565 URL: https://issues.apache.org/jira/browse/CARBONDATA-565 Project: CarbonData Issue Type: Improvement

[jira] [Created] (CARBONDATA-546) Extract data management command to carbon-spark-common module

2016-12-20 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-546: --- Summary: Extract data management command to carbon-spark-common module Key: CARBONDATA-546 URL: https://issues.apache.org/jira/browse/CARBONDATA-546 Project

[jira] [Created] (CARBONDATA-539) Return empty row in map reduce application

2016-12-18 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-539: --- Summary: Return empty row in map reduce application Key: CARBONDATA-539 URL: https://issues.apache.org/jira/browse/CARBONDATA-539 Project: CarbonData Issue

[jira] [Created] (CARBONDATA-538) Add test case to spark2 integration

2016-12-15 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-538: --- Summary: Add test case to spark2 integration Key: CARBONDATA-538 URL: https://issues.apache.org/jira/browse/CARBONDATA-538 Project: CarbonData Issue Type

[jira] [Created] (CARBONDATA-537) Bug fix for DICTIONARY_EXCLUDE option in spark2 integration

2016-12-15 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-537: --- Summary: Bug fix for DICTIONARY_EXCLUDE option in spark2 integration Key: CARBONDATA-537 URL: https://issues.apache.org/jira/browse/CARBONDATA-537 Project: CarbonData

Re: Some questions about compiling carbondata

2016-12-15 Thread Jacky Li
Hi, You do not need to specify the spark.version variable; you can try these: mvn clean package -DskipTests -Pspark-2.0 (to build carbon with spark-2.0.2) mvn clean package -DskipTests (to build carbon with spark-1.5.2, which is the default profile) Regards, Jacky -- View this message in context:

Re: [DISCUSSION] CarbonData loading solution discussion

2016-12-15 Thread Jacky Li
Hi community, Sorry for the incorrect formatting of the previous post; I corrected it in this post. Since CarbonData has the global dictionary feature, loading data into CarbonData currently requires scanning the input data twice. The first scan generates the dictionary, and the second scan does the actual data encoding and writes to carbon files.

[DISCUSSION] CarbonData loading solution discussion

2016-12-15 Thread Jacky Li
Hi community, Since CarbonData has the global dictionary feature, loading data into CarbonData currently requires scanning the input data twice. The first scan generates the dictionary, and the second scan does the actual data encoding and writes to carbon files. Obviously, this approach is simple,

[jira] [Created] (CARBONDATA-531) Remove spark dependency in carbon core

2016-12-13 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-531: --- Summary: Remove spark dependency in carbon core Key: CARBONDATA-531 URL: https://issues.apache.org/jira/browse/CARBONDATA-531 Project: CarbonData Issue Type

Re: Hi dev,Apache CarbonData CI now is working for auto-checking all PRs

2016-12-07 Thread Jacky Li
Hi, It is really great; now both the default profile and -Pspark-2.0 get verified for every PR automatically. Thanks for your effort! Regards, Jacky > On 7 December 2016, at 9:45 AM, Liang Chen wrote: > > > Hi > > Share the full picture with all of you about Apache CarbonData CI. > ---

[jira] [Created] (CARBONDATA-513) Reduce number of BigDecimal objects for scan

2016-12-06 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-513: --- Summary: Reduce number of BigDecimal objects for scan Key: CARBONDATA-513 URL: https://issues.apache.org/jira/browse/CARBONDATA-513 Project: CarbonData Issue

[jira] [Created] (CARBONDATA-512) Reduce number of Timestamp formatter

2016-12-06 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-512: --- Summary: Reduce number of Timestamp formatter Key: CARBONDATA-512 URL: https://issues.apache.org/jira/browse/CARBONDATA-512 Project: CarbonData Issue Type

[jira] [Created] (CARBONDATA-511) Integrate with Spark's TaskMemoryManager

2016-12-06 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-511: --- Summary: Integrate with Spark's TaskMemoryManager Key: CARBONDATA-511 URL: https://issues.apache.org/jira/browse/CARBONDATA-511 Project: CarbonData Issue

[jira] [Created] (CARBONDATA-498) Refactor compression model

2016-12-06 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-498: --- Summary: Refactor compression model Key: CARBONDATA-498 URL: https://issues.apache.org/jira/browse/CARBONDATA-498 Project: CarbonData Issue Type: Improvement

[jira] [Created] (CARBONDATA-495) Unify compressor interface

2016-12-05 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-495: --- Summary: Unify compressor interface Key: CARBONDATA-495 URL: https://issues.apache.org/jira/browse/CARBONDATA-495 Project: CarbonData Issue Type: Bug

Re: Why INT type is stored like BIGINT?

2016-12-05 Thread Jacky Li
Hi, As Ravindra explained, when writing to file, INT is stored adaptively using the smallest datatype according to the actual data in the column chunk; it could be byte, short, or int. But during decoding and encoding, it does use long (bigint) as a temporary structure. I am working on a patch to op
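A toy illustration of the adaptive storage idea described above (not Carbon's actual codec): pick the narrowest integral type that can hold every value in the column chunk, while a long is still used as the temporary in-memory form during encoding and decoding.

```scala
// Illustrative only: choose the smallest type that fits the chunk's value range.
def narrowestType(values: Array[Long]): String = {
  val (min, max) = (values.min, values.max)
  if (min >= Byte.MinValue && max <= Byte.MaxValue) "byte"
  else if (min >= Short.MinValue && max <= Short.MaxValue) "short"
  else if (min >= Int.MinValue && max <= Int.MaxValue) "int"
  else "long"
}

println(narrowestType(Array(1L, 100L, 127L)))   // byte
println(narrowestType(Array(1L, 40000L)))       // int
```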

[jira] [Created] (CARBONDATA-490) Unify all RDD in carbon-spark and carbon-spark2 module

2016-12-02 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-490: --- Summary: Unify all RDD in carbon-spark and carbon-spark2 module Key: CARBONDATA-490 URL: https://issues.apache.org/jira/browse/CARBONDATA-490 Project: CarbonData

[jira] [Created] (CARBONDATA-487) spark2 integration is not compiling

2016-12-02 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-487: --- Summary: spark2 integration is not compiling Key: CARBONDATA-487 URL: https://issues.apache.org/jira/browse/CARBONDATA-487 Project: CarbonData Issue Type: Bug

[jira] [Created] (CARBONDATA-480) Add file format version enum

2016-12-01 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-480: --- Summary: Add file format version enum Key: CARBONDATA-480 URL: https://issues.apache.org/jira/browse/CARBONDATA-480 Project: CarbonData Issue Type

[jira] [Created] (CARBONDATA-463) Extract spark-common module

2016-11-28 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-463: --- Summary: Extract spark-common module Key: CARBONDATA-463 URL: https://issues.apache.org/jira/browse/CARBONDATA-463 Project: CarbonData Issue Type: Sub-task

[jira] [Created] (CARBONDATA-462) Clean up code before moving to spark-common package

2016-11-28 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-462: --- Summary: Clean up code before moving to spark-common package Key: CARBONDATA-462 URL: https://issues.apache.org/jira/browse/CARBONDATA-462 Project: CarbonData

[jira] [Created] (CARBONDATA-461) Clean partitioner in RDD package

2016-11-28 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-461: --- Summary: Clean partitioner in RDD package Key: CARBONDATA-461 URL: https://issues.apache.org/jira/browse/CARBONDATA-461 Project: CarbonData Issue Type: Sub

Re: [Feature Proposal] Spark 2 integration with CarbonData

2016-11-28 Thread Jacky Li
Hi Ramana, Sure, I can work out a subtasks list and put it under CARBONDATA-322 Regards, Jacky -- View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Feature-Proposal-Spark-2-integration-with-CarbonData-tp3236p3278.html Sent from the Apache Carbon

Re: [Feature ]Design Document for Update/Delete support in CarbonData

2016-11-26 Thread Jacky Li
Hi Aniket, Yes, a background monitor process is preferred in the future. And there are other places that need this process already, like refreshing the caches in the driver and executors. Currently, dictionary caches and index caches are refreshed by checking the timestamp on every query, which introduces unnece

[Feature Proposal] Spark 2 integration with CarbonData

2016-11-26 Thread Jacky Li
working in next CarbonData release. What do you think about this idea? All kinds of contribution and suggestions are welcomed. Regards, Jacky Li -- View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Feature-Proposal-Spark-2-integration-with-

[jira] [Created] (CARBONDATA-449) Remove unnecessary log property

2016-11-24 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-449: --- Summary: Remove unnecessary log property Key: CARBONDATA-449 URL: https://issues.apache.org/jira/browse/CARBONDATA-449 Project: CarbonData Issue Type

Re: CarbonData propose major version number increment for next version (to 1.0.0)

2016-11-24 Thread Jacky Li
+1, and comments inline > On 24 November 2016, at 12:09 AM, Venkata Gollamudi wrote: > > Hi All, > > CarbonData 0.2.0 has been a good and stable release, with a lot of > defects fixed and a number of performance improvements. > https://issues.apache.org/jira/browse/CARBONDATA-320?jql=project%20%3D%20CARB

[jira] [Created] (CARBONDATA-448) Solve compilation error in core for spark2

2016-11-24 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-448: --- Summary: Solve compilation error in core for spark2 Key: CARBONDATA-448 URL: https://issues.apache.org/jira/browse/CARBONDATA-448 Project: CarbonData Issue

[jira] [Created] (CARBONDATA-447) Use Carbon log service instead of spark Logging

2016-11-24 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-447: --- Summary: Use Carbon log service instead of spark Logging Key: CARBONDATA-447 URL: https://issues.apache.org/jira/browse/CARBONDATA-447 Project: CarbonData

[jira] [Created] (CARBONDATA-441) Add module for spark2

2016-11-23 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-441: --- Summary: Add module for spark2 Key: CARBONDATA-441 URL: https://issues.apache.org/jira/browse/CARBONDATA-441 Project: CarbonData Issue Type: Improvement

[jira] [Created] (CARBONDATA-429) Remove unnecessary file name check in dictionary cache

2016-11-21 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-429: --- Summary: Remove unnecessary file name check in dictionary cache Key: CARBONDATA-429 URL: https://issues.apache.org/jira/browse/CARBONDATA-429 Project: CarbonData

Re: [Feature] proposal for update and delete support in Carbon data

2016-11-15 Thread Jacky Li
Hi Vinod, It is great to have this feature, as many people were asking for data update during the CarbonData meetup earlier. I believe it will be useful for many big data applications. For the solution you proposed, I have the following doubts: 1. Data update is complex as if transaction is

Re: Single Pass Data Load Design

2016-11-13 Thread Jacky Li
Hi Ravindra, Thanks for proposing this design. It is really exciting if CarbonData can do a 1-pass solution for loading. I have given some comments in the design document. Regards, Jacky -- View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Single-P

Re: [VOTE] Apache CarbonData 0.2.0-incubating release

2016-11-10 Thread Jacky Li
+1 binding Regards, Jacky ---Original--- From: "Aniket Adnaik" Date: 2016/11/10 14:43:49 To: "dev";"chenliang613"; Subject: Re: [VOTE] Apache CarbonData 0.2.0-incubating release +1 Regards, Aniket On 9 Nov 2016 3:17 p.m., "Liang Chen" wrote: > Hi all, > > I submit the CarbonData 0.2.0-incu

[jira] [Created] (CARBONDATA-403) add example for data load without using kettle

2016-11-10 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-403: --- Summary: add example for data load without using kettle Key: CARBONDATA-403 URL: https://issues.apache.org/jira/browse/CARBONDATA-403 Project: CarbonData

Re: Use of ANTLR instead of CarbonSqlParser

2016-11-08 Thread Jacky Li
in near future, we can switch > to ANTLR parser at that time as well. > > On Mon, Nov 7, 2016 at 6:59 AM, Jacky Li wrote: > >> Hi, >> >> It is because CarbonData currently is integrated with Spark 1.5/1.6 and >> CarbonContext is based on HiveContext, so i

Re: As planned, we are ready to make Apache CarbonData 0.2.0 release:

2016-11-08 Thread Jacky Li
+1 Regards, Jacky > On 9 November 2016, at 9:05 AM, Jay <2550062...@qq.com> wrote: > > +1 > regards > Jay > > > > > -- Original Message -- > From: "向志强";; > Date: 9 November 2016 (Wednesday) 8:59 AM > To: "dev"; > > Subject: Re: As planned, we are ready to make Apache CarbonData 0.2.0 release: > > >

Re: Use of ANTLR instead of CarbonSqlParser

2016-11-06 Thread Jacky Li
Hi, It is because CarbonData is currently integrated with Spark 1.5/1.6 and CarbonContext is based on HiveContext, so it uses the Hive parser in HiveContext. But you are right, there is no design limitation here; Carbon can switch to using ANTLR. I see that in Spark 2.0, Spark is using AN

Re: [Discussion] Please vote and comment for carbon data file format change

2016-11-03 Thread Jacky Li
The proposed change is reasonable, +1. But is there a plan to make the reader backward compatible with the old format, so that the impact on current deployments is minimal? Regards, Jacky > On 2 November 2016, at 12:38 AM, Kumar Vishal wrote: > > Hi Xiaoqiao He, > > Please find the attachment. > > -Re

Re: please vote and comment: remove thrift solution

2016-10-24 Thread Jacky Li
I agree with Ravindra that this is not the best approach. But since this issue blocks integration with Apache CI, I think it makes sense to solve it quickly and do the better approach later. Without CI automation, it is really a pain to manually trigger the CI for every PR. So, +1 Regards, Jacky >

[jira] [Created] (CARBONDATA-331) Support no compression option while loading

2016-10-20 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-331: --- Summary: Support no compression option while loading Key: CARBONDATA-331 URL: https://issues.apache.org/jira/browse/CARBONDATA-331 Project: CarbonData Issue

[jira] [Created] (CARBONDATA-318) Implement an ExternalSorter that makes maximum usage of memory while sorting

2016-10-14 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-318: --- Summary: Implement an ExternalSorter that makes maximum usage of memory while sorting Key: CARBONDATA-318 URL: https://issues.apache.org/jira/browse/CARBONDATA-318

Re: Discussion(New feature) regarding single pass data loading solution.

2016-10-14 Thread Jacky Li
Hi, I can offer one more approach for this discussion: since new dictionary values are rare in the case of incremental loads (ensure the first load has as many dictionary values as possible), synchronization should be rare. So how about using Zookeeper + an HDFS file to provide this service? This is w

[jira] [Created] (CARBONDATA-314) Make CarbonContext to use standard Datasource strategy

2016-10-12 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-314: --- Summary: Make CarbonContext to use standard Datasource strategy Key: CARBONDATA-314 URL: https://issues.apache.org/jira/browse/CARBONDATA-314 Project: CarbonData

[jira] [Created] (CARBONDATA-313) Update CarbonSource to use CarbonDatasourceHadoopRelation

2016-10-12 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-313: --- Summary: Update CarbonSource to use CarbonDatasourceHadoopRelation Key: CARBONDATA-313 URL: https://issues.apache.org/jira/browse/CARBONDATA-313 Project: CarbonData

[jira] [Created] (CARBONDATA-312) Unify two datasource: CarbonDatasourceHadoopRelation and CarbonDatasourceRelation

2016-10-12 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-312: --- Summary: Unify two datasource: CarbonDatasourceHadoopRelation and CarbonDatasourceRelation Key: CARBONDATA-312 URL: https://issues.apache.org/jira/browse/CARBONDATA-312

[jira] [Created] (CARBONDATA-309) Support two types of ReadSupport in CarbonRecordReader

2016-10-12 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-309: --- Summary: Support two types of ReadSupport in CarbonRecordReader Key: CARBONDATA-309 URL: https://issues.apache.org/jira/browse/CARBONDATA-309 Project: CarbonData

[jira] [Created] (CARBONDATA-308) Support multiple segment in CarbonHadoopFSRDD

2016-10-12 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-308: --- Summary: Support multiple segment in CarbonHadoopFSRDD Key: CARBONDATA-308 URL: https://issues.apache.org/jira/browse/CARBONDATA-308 Project: CarbonData Issue

[jira] [Created] (CARBONDATA-307) Support full functionality in CarbonInputFormat

2016-10-12 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-307: --- Summary: Support full functionality in CarbonInputFormat Key: CARBONDATA-307 URL: https://issues.apache.org/jira/browse/CARBONDATA-307 Project: CarbonData

Re: [Discussion] Code generation in carbon result preparation

2016-10-12 Thread Jacky Li
Hi Vishal, Which part of the preparation are you considering? The column stitching on the executor side? Regards, Jacky > On 12 October 2016, at 9:24 PM, Kumar Vishal wrote: > > Hi All, > Currently we are preparing the final result row-wise, as the number of columns > present in the project list (80 columns) is hig

Re: Discussion regrading design of data load after kettle removal.

2016-10-11 Thread Jacky Li
Hi Ravindra, Regarding the design (https://drive.google.com/file/d/0B4TWTVbFSTnqTF85anlDOUQ5S1BqYzFpLWcwZnBLSVVqSWpj/view), I have following question: 1. In SortProcessorStep, I think it is better to include MergeSort in this step also, so it includes all logic for sorting. In this case, develope

Re: Discussion regrading design of data load after kettle removal.

2016-10-10 Thread Jacky Li
Hi Ravindra, I have the following questions: 1. How does the DataLoadProcessorStep interface work? For each step, does it call its child step to execute and apply its logic to the returned iterator of the child? And how does it map to OutputFormat in the Hadoop interface? 2. This step interface relies on ite

Re: Discussion regrading design of data load after kettle removal.

2016-10-10 Thread Jacky Li
Hi Ravindra, It seems the picture is missing, can you post it in a URL and share the link? Regards, Jacky -- View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Discussion-regrading-design-of-data-load-after-kettle-removal-tp1672p1725.html Sent fr

Re: Discussion about using multiple local directories to improve data loading performance

2016-10-08 Thread Jacky Li
Yes, I think it is a good feature to have. Please feel free to create a JIRA issue and a Pull Request. Regards, Jacky > On 9 October 2016, at 12:04 AM, caiqiang wrote: > > Hi All, > For each data loading, we write the sorted temp files into only one > local directory. I think this is a bottleneck o

[jira] [Created] (CARBONDATA-286) Support Append mode when writing Dataframe to CarbonData

2016-10-05 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-286: --- Summary: Support Append mode when writing Dataframe to CarbonData Key: CARBONDATA-286 URL: https://issues.apache.org/jira/browse/CARBONDATA-286 Project: CarbonData

[jira] [Created] (CARBONDATA-285) Use path parameter in Spark datasource API

2016-10-04 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-285: --- Summary: Use path parameter in Spark datasource API Key: CARBONDATA-285 URL: https://issues.apache.org/jira/browse/CARBONDATA-285 Project: CarbonData Issue
