Re: [DISCUSS] Data loading improvement

2017-04-22 Thread Liang Chen
Jacky, thank you for listing these constructive improvements to data loading. I agree we should consider all of these improvement points; I have concerns about only the one below. Before considering open interfaces for data loading, we need to define more clearly what different roles block/blocklet/page play, t

Use dev@carbondata.apache.org to test new mailing list without incubator

2017-04-25 Thread Liang Chen

Re: introduce complex data-type for query "Alter table tableName change columnName copyColumnName dataType"

2017-05-05 Thread Liang Chen
+1. Regards Liang 2017-05-03 13:46 GMT+08:00 rahulcarbondata : > Hi all, currently the "Alter table tableName change columnName copyColumnName > dataType" query supports only primitive types. I propose it should also > support complex data types, e.g. CREATE TABLE > changecomplexdatatype(arrayField *

Re: [jira] [Created] (CARBONDATA-1030) Support reading specified segment or carbondata file

2017-05-07 Thread Liang Chen
Hi, +1 for this feature. How about the DDL script below: carbon.sql("select * from carbontable in segmentid(0,3,5,7) where filter conditions").show() Regards Liang 2017-05-05 22:33 GMT+08:00 Jin Zhou (JIRA) : > Jin Zhou created CARBONDATA-1030: > > >

Re: [jira] [Created] (CARBONDATA-1030) Support reading specified segment or carbondata file

2017-05-07 Thread Liang Chen
Hi, +1 for this feature. How about the DDL script below: carbon.sql("select * from carbontable in segmentid(0,3,5,7) where filter conditions").show() Regards Liang

Please send usage's (questions and discussions) to u...@carbondata.apache.org

2017-05-10 Thread Liang Chen
Hi all, As you know, the CarbonData community is growing well and there are more and more users. From now on, let us separate the dev mailing list and the user mailing list. Please send all usage questions and discussions to u...@carbondata.apache.org; send only dev questions and discussions to dev@carbonda

Re: test email

2017-05-10 Thread Liang Chen
ACK 2017-05-11 2:00 GMT+08:00 Aniket Adnaik : > > -- Regards Liang

Re: [VOTE] Apache CarbonData 1.1.0 (RC3) release

2017-05-12 Thread Liang Chen
+1(binding) LICENSE,NOTICE are ok no binary file compile is ok with spark 1.6 and 2.1 *mvn clean -Pspark-1.6 package* [INFO] Apache CarbonData :: Parent SUCCESS [ 1.520 s] [INFO] Apache CarbonData :: Common SUCCESS [ 2.546 s] [INFO] Apache Carbo

Re: [jira] [Created] (CARBONDATA-1051) why sort_columns?

2017-05-13 Thread Liang Chen
Hi Sehriff, Good question. First, please check this doc: http://carbondata.apache.org/useful-tips-on-carbondata.html and see if it helps you understand CarbonData's index usage. As you mentioned, 1.2 will introduce the sort columns feature to help users more easily specify which columns need

Re: why sort_columns?

2017-05-14 Thread Liang Chen
I have replied to this question in another thread, as below: First, please check this doc: http://carbondata.apache.org/useful-tips-on-carbondata.html and see if it helps you understand CarbonData's index usage. As you mentioned, 1.2 will introduce the sort columns feature to help users to

Re: Compilation error on presto Branch

2017-05-15 Thread Liang Chen
Hi Pallavi, Let me take a look. You are right, this is a jar dependency issue; we need to use the new version of the jars (without incubating). Regards Liang 2017-05-15 1:39 GMT-07:00 Pallavi Singh : > We are getting the following error > Error:(611, 11) java: constructor BTreeDataRefNodeFinder in class > o

Re: Request for subscription to Carbondata Community

2017-05-15 Thread Liang Chen
Hi, First, please send a mail to dev-subscr...@carbondata.apache.org to join the mailing list. After you have joined, please send your question again to dev@carbondata.apache.org 2017-05-15 0:45 GMT-07:00 Ramandeep Kaur : > Kindly subscribe me to Carbondata Community. > > Th

Re: [DISCUSSION] Encoding override and extensibility

2017-05-16 Thread Liang Chen
Hi, This is a great discussion on making the "encoding functions" easier to use. Exposing all these options to users for different business cases is good. But to be frank, it is difficult for general users to understand all the options and configure them exactly. So we need to consider more a

Re: [DISCUSSION] CarbonData storage service

2017-05-16 Thread Liang Chen
Hi Jacky, One question: can you explain what information the proposed CarbonData Storage Service would store? And how should users pre-configure memory resources for the service? As much memory as possible? --

Fwd: MODERATE for dev@carbondata.apache.org

2017-05-16 Thread Liang Chen
First, please send a mail to dev-subscr...@carbondata.apache.org to join the mailing list. After you have joined, please send your question again to dev@carbondata.apache.org Regards Liang -- Forwarded message -- From: Date: 2017-05-16 8:39 GMT-04:00 Subjec

Re: [DISCUSSION] Encoding override and extensibility

2017-05-16 Thread Liang Chen
options. For example, if user does not set encoding option for > high cardinality dimension column, carbon will use default encoding which > is LV_BYTES_ENCODE for this column. > > Regards, > Jacky > > > On May 16, 2017, at 5:54 PM, Liang Chen wrote: > > > > Hi > > >

[ANNOUNCE] Cai Qiang as new Apache CarbonData committer

2017-05-17 Thread Liang Chen
Hi all We are pleased to announce that the PMC has invited Cai Qiang as new Apache CarbonData committer, and the invite has been accepted ! Congrats to Cai Qiang and welcome aboard. Regards Liang

Re: [Discussion] Minimize the Btree size and unify the driver and executor Btrees.

2017-05-17 Thread Liang Chen
Hi Ravi, Thank you for bringing this improvement discussion to the mailing list. One question: how does point 1 solve the issues below? Are there still two parts of index info, on the driver side and the executor side?

[ANNOUNCE] Ravindra as new Apache CarbonData PMC

2017-05-19 Thread Liang Chen
Hi all We are pleased to announce that the PMC has invited Ravindra as new Apache CarbonData PMC member, and the invite has been accepted ! Congrats to Ravindra and welcome aboard. Thanks The Apache CarbonData team

Re: Comparative testing of CarbonData and Parquet

2017-05-21 Thread Liang Chen
Hi, Thank you for sharing the test result; we are very happy to hear that you have already started migrating your business to CarbonData. Two suggestions: 1. Can you test again with the latest release, 1.1.0? It introduced the V3 format to further improve scan performance (for example, query 6). 2.As

Re: Logging problem

2017-05-25 Thread Liang Chen
Hi Rana, Is your query running in spark-shell? Please try the script below: import org.apache.log4j.Logger import org.apache.log4j.Level Logger.getLogger("org").setLevel(Level.OFF) Logger.getLogger("akka").setLevel(Level.OFF) Regards Liang Rana Faisal Munir wrote > Hi, > > Today, I was running a

[INFORM]Hive and Presto branch have been merged into master ,please rebase your related PRs.

2017-05-26 Thread Liang Chen
Hi dev Hive and Presto branch have been merged into master ,please rebase your related PRs. Regards Liang

Re: when plan to implement merge operation

2017-05-26 Thread Liang Chen
Hi, 1. Can you give a specific example, so that we first understand your requirement exactly? For instance, provide some fact data like below: ID date name age 1 2017-05-1 carbon 21 2 2017-05-23 spark 30 .. 2. I would like to kindly invite your team to participate

Re: Logging problem

2017-05-26 Thread Liang Chen
Hi Rana, Please let us know whether your issue has been solved. Regards Liang 2017-05-25 20:38 GMT+08:00 Liang Chen : > Hi Rana > > Your this query is in Spark-shell ? > Please try the below script: > > import org.apache.log4j.Logger > import org.apache.log4j.Level > Logger.g

Re: Compilation error on presto Branch

2017-05-26 Thread Liang Chen
ue and build failure. Please check, below is the PR link: > > https://github.com/apache/carbondata/pull/941 > > On Mon, May 15, 2017 at 2:37 PM, Liang Chen > wrote: > > > Hi Pallavi > > > > Let me take a look. you are right, this is jar dependency issue, need to

Re: Delete ERROR

2017-05-26 Thread Liang Chen
Hi sunerhan, I tested on my local machine and could delete more than 1000 rows in one batch. We need to reproduce the error: ERROR deleteExecution$: main Delete data operation is failed due to failure in creating delete delta file for segment : null block : null Regards Liang 2017-05-23 11:52 GMT+08

Re: Implementing Streaming Ingestion Support in CarbonData

2017-05-26 Thread Liang Chen
Hi Aniket and Prabhat, Thanks for starting the streaming ingestion feature. I have created a branch named "streaming_ingest". I suggest creating an independent module under integration/streaming. Regards Liang 2017-05-22 15:54 GMT+08:00 prabhatkashyap : > Hi Aniket, > > We have started

Re: Compilation error on presto Branch

2017-05-27 Thread Liang Chen
Hi, This issue has been solved on master. Please check it again. Regards Liang

Re: when plan to implement merge operation

2017-05-29 Thread Liang Chen
Hi, For your case, can delete and append meet your requirements? Obviously, merge would impact the index, so we should find the best way to implement this feature. Others, please comment as well. Regards Liang 2017-05-27 9:45 GMT+08:00 Mic Sun : > merge example l

Re: Carbondata hive integration Plan

2017-06-01 Thread Liang Chen
Hi cenyuhai, Thanks for starting this discussion about Hive integration: 1. Make carbon schema compatible with hive(CARBONDATA-1008)(create table and alter table) Liang: As you mentioned, the first phase (1.2.0) supports reading carbondata files in Hive. So can I understand the flow sh

Re: [INFO] Jenkins is fixed and GitHub/Jenkins integration is back

2017-06-02 Thread Liang Chen
Yes, now it is working fine. Thanks for your help, JB. Regards Liang 2017-06-02 20:44 GMT+08:00 Jean-Baptiste Onofré : > Hi team, > > I fixed the issue we got on the Apache Jenkins CarbonData jobs. > > We used "Maven (latest)" for our build, and we just got the last Maven > 3.5.0 update, which s

Re: [DISCUSSION] Whether Carbondata should keep carbon-sql-shell script

2017-06-06 Thread Liang Chen
Hi, To correct the file names, they should be ./bin/carbon-spark-sql and ./bin/carbon-spark-shell. Are you suggesting removing both files or only carbon-spark-shell? Regards Liang 2017-06-07 0:24 GMT+08:00 Erlu Chen : > Hi community, > > Recently, I viewed the implementation of carbon-sql-shell and tr

Re: [DISCUSSION] Whether Carbondata should support Spark-2.2 in the next release version(1.2.0)

2017-06-09 Thread Liang Chen
Hi, My vote is for Apache CarbonData 1.2 to support Spark 2.1.1. Regards Liang 2017-06-09 17:24 GMT+08:00 Ravindra Pesala : > Hi, > > I think it would be better support 2.1.1 in Carbon 1.2 version. Since Spark > 2.2.0 is not yet released so we can better wait and support it in Carbon > 1.3 version. >

Fwd: JIRA upgrade / Downtime notification 12 June 0100 UTC

2017-06-10 Thread Liang Chen
Hi all Please be informed about JIRA upgrade this weekend. Regards Liang -- Forwarded message -- From: Chris Lambertus Date: 2017-06-10 12:28 GMT+08:00 Subject: JIRA upgrade / Downtime notification 12 June 0100 UTC To: committers Cc: ASF Operations , ASF Executive Assistant < e

Re: About ColumnGroup feature

2017-06-10 Thread Liang Chen
Hi, +1 for removing ColumnGroup. It would help simplify the current system code. Regards Liang

[VOTE] Presto integration version :Re: [DISCUSSION] Whether Carbondata should work with Presto in the next release version(1.2.0)

2017-06-13 Thread Liang Chen
Hi, +1 for supporting Presto integration. I propose supporting 0.166, to match community users (Ctrip) who already use it in production. Devs, please also vote on the Presto version. Regards Liang 2017-06-12 13:56 GMT+08:00 Bhavya Aggarwal : > Hi All, > > We can add the Presto integration as o

[DISCUSSION] In 1.2.0, use Spark 2.1 and Hadoop 2.7.2 as default compilation in pom.

2017-06-15 Thread Liang Chen
Hi Dev, In 1.2, many features are being developed based on Spark 2.1 and Hadoop 2.7.2, so I propose using Spark 2.1 and Hadoop 2.7.2 as the default compilation in the pom. Please discuss and vote. Regards Liang

Re: update bug with carbondata1.1.0 and spark1.6.0

2017-06-19 Thread Liang Chen
Hi, I just copied your code and tested it on my local machine; the result is below. Because there is no "india" value in the name column, no update is performed. Please check it again. BTW, I suggest you test on branch-1.1, because the community is testing branch-1.1 in preparation for the 1.1.1 patch

Re: update bug with carbondata1.1.0 and spark1.6.0

2017-06-19 Thread Liang Chen
Hi, To correct my earlier info: the update can be done as below, and it is successful. +---+-----+----+---+ | id| name|city|age| +---+-----+----+---+ | 10|india|city| 10| +---+-----+----+---+ Regards Liang

[DISCUSSION] Propose to move notification of "jira Created" to issues@mailing list from dev

2017-06-29 Thread Liang Chen
Hi dev, As you know, whenever an Apache JIRA issue is created, a notification mail is currently sent to dev@carbondata.apache.org, like "[jira] [Created] (CARBONDATA-1250) Change default partition id from Max to 0". To make it easier for community users to read the discussions on the dev@ mailing list, I propose to m

Re: [DISCUSSION] CarbonData Integration with Presto

2017-07-01 Thread Liang Chen
Hi Bhavya, Currently, 1.2.0 proposes to support Presto version 0.166. Is there any performance difference between 0.179 and 0.166? Regards Liang 2017-07-01 13:12 GMT+08:00 Bhavya Aggarwal : > Hi, > > Please find the configuration setting that we used attached with this > email , we are runn

Re: [Discussion] CarbonOutputFormat Implementation

2017-07-05 Thread Liang Chen
Hi +1 for supporting OutputFormat. Regards Liang Divya Gupta wrote > Thanks Jacky and Venkata for the suggestions. I am working on the design > part and will post on this discussion in case of any queries. I will share > the design soon. > > Regards > Divya Gupta > Project Lead > > > *Knoldu

Re: XOR encoding for floating point

2017-07-05 Thread Liang Chen
Hi Geetika, Very happy to see that you are interested in contributing this feature. Please have the design discussion before you start to code. Regards Liang Geetika Gupta wrote > Hi Community, > > I was looking into CARBONDATA-1128. Th

Re: Why is slower that build ChunkRowIterator object in presto plugin of carbondata?

2017-07-05 Thread Liang Chen
Hi In Spark-shell, you can use the below script : import org.apache.carbondata.core.util.CarbonProperties import org.apache.carbondata.core.constants.CarbonCommonConstants CarbonProperties.getInstance().addProperty(CarbonCommonConstants.ENABLE_VECTOR_READER, "true") CarbonProperties.getInstance

[VOTE] Apache CarbonData 1.1.1(RC1) release

2017-07-06 Thread Liang Chen
Hi I submit the Apache CarbonData 1.1.1 (RC1) to your vote. 1.Release Notes: https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12320220&version=12340313 Some key improvement in this patch release: 1) Data update and delete with Spark 2.1 2) Improve measure filter perfo

[DISCUSSION] Propose to remove "support spark 1.5" from CarbonData 1.2.0 onwards

2017-07-08 Thread Liang Chen
Hi Dev, As you know, some key features, for example update and delete, are supported only on Spark 1.6 and Spark 2.1. So I propose to remove "support spark 1.5" from CarbonData 1.2.0 onwards. Please discuss and vote. Regards Liang

Re: [VOTE] Apache CarbonData 1.1.1(RC1) release

2017-07-08 Thread Liang Chen
+1(binding) Regards Liang 2017-07-09 1:22 GMT+08:00 Jacky Li : > +1 (binding) > > > On July 8, 2017, at 7:46 AM, bill.zhou wrote: > > > > +1 > > > > > > Liang Chen-2 wrote > >> Hi > >> > >> I submit the Apache CarbonData 1.1.1 (RC1) to

Re: [DISCUSSION] Propose to remove "support spark 1.5" from CarbonData 1.2.0 onwards

2017-07-09 Thread Liang Chen
with spark 1.5 and spark 1.6 almost share > the same code. Is there any overhead or difficulties to maintain spark 1.5 > integration code onward? > > Regards, > Jacky > > > On July 8, 2017, at 11:44 PM, Liang Chen wrote: > > > > Hi Dev > > > > As you know, some

[RESULT]Re: [VOTE] Apache CarbonData 1.1.1(RC1) release

2017-07-10 Thread Liang Chen
Hi all, The PMC vote has passed for the Apache CarbonData 1.1.1 release. The result is as below: +1(binding) : 4(Liang,Jacky,Ravindra, Jarray) +1(non-binding) : 6 Thanks all for your votes. Regards Liang

Re: [branch-1.1] delete problem

2017-07-14 Thread Liang Chen
Hi, Ashwini K added a comment 2 days ago: delete is working fine for me. Could you please share your table schema and the data file you are using?

Re: [1.2.0-SNAPSHOT]-delete problem

2017-07-14 Thread Liang Chen
Hi, Please provide your test script; this will help us reproduce your issue. Otherwise, we cannot know what your exact problem is. Regards Liang

Re: FileNotFoundExceptions while running CarbonData

2017-07-18 Thread Liang Chen
Hi Swapnil, Looking forward to seeing your PR. Please let me know your Apache JIRA email id, and I will add the contributor right for you. Regards Liang 2017-07-18 6:49 GMT+08:00 Swapnil Shinde : > Thanks. I think I fixed it support maprFS. I will do some more testing and > then add a jira ticket

Re: [Discussion] Using Lazy Dictionary Decode for Presto Integration

2017-07-18 Thread Liang Chen
+1, using lazy decode to utilize carbondata's dictionary would improve aggregation performance. Please consider adding this code to the presto integration module; don't directly reuse the spark module code. Regards Liang 2017-07-18 23:46 GMT+08:00 Bhavya Aggarwal : > We were trying the Presto wit

Presto+CarbonData optimization work discussion

2017-07-19 Thread Liang Chen
Hi Below are some proposed items for Presto optimization: 1) Remove the extra loops for data conversion in Presto Format to increase the performance. 2) Modularize and optimize the filters . 3) Optimize the Carbondata Metadata reading. 4) Lazy decoding of the dictionary. 5) Batch reading of the

Re: Presto+CarbonData optimization work discussion

2017-07-19 Thread Liang Chen
Hi, For 4) Lazy decoding of the dictionary: I just tested 180 million rows of data with the script "select province,sum(age),count(*) from presto_carbondata group by province order by province". The Spark integration module has "dictionary lazy decode"; Presto doesn't have "dictionary lazy decode",

Re: Presto+CarbonData optimization work discussion

2017-07-19 Thread Liang Chen
ether it is really a lazy decoding issue or > not. > > Regards, > Ravindra > > On 20 July 2017 at 08:04, Liang Chen wrote: > > > Hi > > > > For -- 4) Lazy decoding of the dictionary, just i tested 180 millions > rows > > data with the script: >

Re: [question] about new table property "sort_column"

2017-07-20 Thread Liang Chen
Hi Jin zhou Yes, your understanding is correct. The MDK(multi-dimension index) will be created as per your specified sort_columns order. Regards Liang 2017-07-21 10:51 GMT+08:00 Jin Zhou : > > Hi,all > > I notice there is a new table property: sort_column and want to confirm: > > 1) when a NON-

[ANNOUNCE] Apache CarbonData 1.1.1 release

2017-07-20 Thread Liang Chen
Hi All, The Apache CarbonData PMC team is happy to announce the release of Apache CarbonData version 1.1.1. This release(1.1.1) is a patch, some key improvements and bug fix as below : - Data update and delete with Spark 2.1. - Improve measure filter performance by ~2-4 times. - Some

Re: carbon data performance doubts

2017-07-21 Thread Liang Chen
Hi Swapnil, Actually, the current system's behavior is: index and dictionary encoding are decoupled and have no relationship. 1. If you want good filter performance on some columns, just add these columns to sort_columns (like tblproperties('sort_columns'='empno')) to build a good MDK index for these columns
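
The sort_columns advice in this reply can be sketched as a small Scala helper that builds the CREATE TABLE statement. This is only a sketch under stated assumptions, not CarbonData's own API: the object SortColumnsDdl, the function createTableSql, and the table and column names (emp, empno, ename, salary) are hypothetical, and the STORED BY 'carbondata' clause assumes the 1.x DDL syntax. Only the tblproperties('sort_columns'='empno') fragment comes from the message itself.

```scala
// Hypothetical helper (not part of CarbonData): builds a CREATE TABLE statement
// whose TBLPROPERTIES carry the sort_columns setting mentioned in the reply.
object SortColumnsDdl {
  def createTableSql(
      table: String,
      columns: Seq[(String, String)], // (column name, data type)
      sortColumns: Seq[String]): String = {
    // Every sort column must be one of the declared columns.
    val declared = columns.map(_._1).toSet
    require(sortColumns.forall(declared.contains),
      "sort_columns must reference declared columns")

    val columnList = columns.map { case (name, dataType) => name + " " + dataType }.mkString(", ")
    val sortList = sortColumns.mkString(",")

    // Assumption: CarbonData 1.x DDL uses STORED BY 'carbondata'.
    s"CREATE TABLE IF NOT EXISTS $table ($columnList) " +
      "STORED BY 'carbondata' " +
      s"TBLPROPERTIES ('sort_columns'='$sortList')"
  }

  def main(args: Array[String]): Unit = {
    val sql = createTableSql(
      "emp",
      Seq("empno" -> "INT", "ename" -> "STRING", "salary" -> "DOUBLE"),
      Seq("empno"))
    println(sql)
  }
}
```

In spark-shell, assuming a CarbonSession named carbon as in the earlier threads, the resulting string could be passed to carbon.sql(sql).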

Re: carbon data performance doubts

2017-07-21 Thread Liang Chen
Hi, Some more info: release 1.1.1 included a good improvement, "measure filter optimization": the system will use the min/max index to filter on measure columns. So for INT Regards Liang 2017-07-22 9:22 GMT+08:00 Liang Chen : > Hi Swapnil > > Actually, current s

Re: carbon data performance doubts

2017-07-21 Thread Liang Chen
Hi, Some more info: release 1.1.1 included a good improvement, "measure filter optimization": the system will use the min/max index to filter on measure columns. So, to get good filtering on an INT column: one way is to add the INT column to sort_columns; another way, system will automati

Re: carbon data performance doubts

2017-07-23 Thread Liang Chen
Hi simafengyun, Can you write an example to introduce how to use sort_columns, and also update the documents? Thanks. Regards Liang

Re: Can I set a larger HDFS block size, like 4 or 8 GB in production environment? What is the problem with large blocks?

2017-08-08 Thread Liang Chen
Hi, In theory, it should be supported. But practically: 1. It may take a long time to re-replicate if any replica is lost or moved due to balancer/mover/replication. 2. In case of pipeline recoveries during write/append, if a new node replaces the failed node, then existing data will be copie

Re: [DISCUSSION] Interfaces for index frame work

2017-08-14 Thread Liang Chen
Hi, Nice feature, +1.

Re: [DISCUSSION] About partition table query performance

2017-08-17 Thread Liang Chen
Hi +1.Very nice feature, Thanks for your good contribution. Look forward to seeing the test report. Regards Liang lionel061201 wrote > Hi dev, > Partition feature is now available on master and I just created a guidance > doc in > https://github.com/apache/carbondata/pull/1258 > > I added some

Re: ClassNotFound error when insert carbontable from hive table

2017-08-22 Thread Liang Chen
Hi lionel, Can you share with us how you fixed this issue? Regards Liang lionel061201 wrote > This issue had been fixed. > > On Mon, Aug 21, 2017 at 4:04 PM, Lu Cao < > whucaolu@ > > wrote: > >> Hi dev, >> >> I'm trying to insert data from a hive table to carbon table: >> >> cc.sql("insert

Apache CarbonData 6th meetup in Shanghai on 2nd Sep,2017 at : https://jinshuju.net/f/X8x5S9?from=timeline

2017-08-23 Thread Liang Chen

Re: Apache CarbonData 6th meetup in Shanghai on 2nd Sep,2017 at : https://jinshuju.net/f/X8x5S9?from=timeline

2017-08-23 Thread Liang Chen

[ANNOUNCE] Manish Gupta as new Apache CarbonData

2017-08-25 Thread Liang Chen
Hi all We are pleased to announce that the PMC has invited Manish Gupta as new Apache CarbonData committer, and the invite has been accepted ! Congrats to Manish Gupta and welcome aboard. Regards The Apache CarbonData PMC

Re: [ANNOUNCE] Manish Gupta as new Apache CarbonData committer

2017-08-25 Thread Liang Chen
Correct the title , to add "committer" info. 2017-08-25 23:56 GMT+08:00 Liang Chen : > Hi all > > We are pleased to announce that the PMC has invited Manish Gupta as new > Apache CarbonData committer, and the invite has been accepted ! > > Congrats to Manish Gupta and

Re: Apache CarbonData 6th meetup in Shanghai on 2nd Sep,2017 at : https://jinshuju.net/f/X8x5S9?from=timeline

2017-08-30 Thread Liang Chen
Hi Ohh , Really? a big big welcome! Regards Liang Jean-Baptiste Onofré wrote > Awesome. > > I would love to be there. Let me check if I can. > > Regards > JB > > On Aug 23, 2017, 08:48, at 08:48, Liang Chen < > chenliang6136@ > > wrote: >>

Re: Presto+CarbonData optimization work discussion

2017-09-01 Thread Liang Chen
QC | 57467886 | 1385076 SK | 57385152 | 1382364 YT | 57377556 | 1383900 (13 rows) Query 20170902_033821_6_h6g24, FINISHED, 1 node Splits: 50 total, 50 done (100.00%) 0:03 [18M rows, 0B] [6.62M rows/s, 0B/s] Regards Liang Liang Chen wrote > Hi > > For -- 4) Lazy dec

Re: Block B-tree loading failed

2017-09-13 Thread Liang Chen
Hi, It looks like the path is invalid. Can you provide the full script showing how you created the CarbonSession? - Caused by: org.apache.carbondata.core.datastore.exception.IndexBuilderException: Invalid carbon data file: hdfs://ns1/user/e_carbon/public/carbon.store/e_carbon/prod_inst_c

[ANNOUNCE] Lu Cao as new Apache CarbonData committer

2017-09-13 Thread Liang Chen
Hi all We are pleased to announce that the PMC has invited Lu Cao as new Apache CarbonData committer, and the invite has been accepted ! Congrats to Lu Cao and welcome aboard. Regards The Apache CarbonData PMC

Re: MODERATE for dev@carbondata.apache.org

2017-09-16 Thread Liang Chen
Hi, First, please send a mail to dev-subscr...@carbondata.apache.org to join the mailing list. After you have joined, please send your question again to dev@carbondata.apache.org Please provide the error log message and your create table script. Regards Liang 2017-09-15 1

Re: [VOTE] Apache CarbonData 1.2.0(RC2) release

2017-09-18 Thread Liang Chen
Hi I think you may input the wrong description "apache-carbondata-1.2.0-rc1"? 2. The tag to be voted upon : apache-carbondata-1.2.0-rc1(commit: ede03f5c963b13cc640feba799a22466246951c6) *https://github.com/apache/carbondata/relea

Re: Question about loading data with carbondata

2017-09-18 Thread Liang Chen
Hi, I have the same comments as cenyuhai: please provide more detailed info. Which version did you use? Please refer to https://github.com/apache/carbondata/blob/master/docs/useful-tips-on-carbondata.md; for high cardinality columns, you can use a script like TBLPROPERTIES ('DICTIONARY_EXCLUDE'='MSISD

Re: [VOTE] Apache CarbonData 1.2.0(RC2) release

2017-09-18 Thread Liang Chen
Hi 1.Source code can be compiled successfully with script "mvn clean -DskipTests -Pspark-2.1 -Pbuild-with-format package" 2.Can query carbondata file properly in Spark-shell. 3.License file looks good. 4.Signature file looks good 5.Hash checksum files look good 6.NOTICE file looks good My vote :

Re: [VOTE] Apache CarbonData 1.2.0(RC3) release

2017-09-23 Thread Liang Chen
1.Source code can be compiled successfully with script "mvn clean -DskipTests -Pspark-2.1 -Pbuild-with-format package" 2.Can query carbondata file properly in Spark-shell. 3.License file looks good. 4.Signature file looks good 5.Hash checksum files look good 6.NOTICE file looks good My vote :

Re: [DISCUSSION] optimization of OrderBy sorted columns + Limit Query

2017-09-29 Thread Liang Chen
Hi Jarck, This solution uses the dictionary to do the limit, right? This solution can't ensure data correctness. --- Use orderby +limit optimized carbondata1.2 master code + spark1.6.3 @Ravindra @Jarck : let us discus

[DISCUSSION] Apache CarbonData 1.3.0 scope

2017-09-29 Thread Liang Chen
Hi all, First, on behalf of the Apache CarbonData community, thanks to all contributors, who come from 20+ different organizations. This mail is for discussing the 1.3.0 scope (around 3-4 months). I propose that the following features be considered. 1)Spark 2.2.0 integration (propose committer Ravindra to

Re: [DISCUSSION] Apache CarbonData 1.3.0 scope

2017-10-10 Thread Liang Chen
Hi yuhai, I have the same comment as Jacky; please provide more info about this requirement. It would be better if you could create a new topic to discuss this requirement in detail. Regards Liang Jacky Li wrote > Hi Cenyuhai, > > Can you further describe your requirement? Currently carbon supports

Re: [DISCUSSION] support user specified segment reading for query

2017-10-11 Thread Liang Chen
Hi Rahul, I suggest only doing "Query HINT". Please finalize the query script: select * from t1 [in SEGMENTS(1,3,5)] or SELECT /*+SEGMENTS(1,3,5) */ from t1 Regards Liang

Re: Does index be used when doing "join" operation between a big table and a small table?

2017-10-11 Thread Liang Chen
Hi, If the index is used for filtering data, the number of tasks would be much lower. Can you share the scripts (create table and query), so we can check whether an effective index was created for the filter columns? Regards Liang Mic Sun wrote > hello, > > I have 2 tables need to do "join" operation by their

Re: Does index be used when doing "join" operation between a big table and a small table?

2017-10-11 Thread Liang Chen
If the index is used, the number of tasks would be lower. Can you share your scripts (create table script and query script), so we can check whether you created an effective index for the filter columns? Regards Liang

[DISCUSSION] Optimize the default value for some parameters

2017-10-11 Thread Liang Chen
Hi All, As you know, the default values of some parameters need adjusting to suit most cases. This discussion is for collecting which parameters' default values need to be optimized: 1. TABLE_BLOCKSIZE: current default is 1G, propose to adjust to 512M 2. Please append here if you propose to adjust
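
As a rough illustration of the TABLE_BLOCKSIZE item above, the sketch below renders a per-table block size as a TBLPROPERTIES clause, which is how a single table could opt into a value other than the default. The object BlockSizeDdl and the function blockSizeProperty are hypothetical names, and the sketch assumes the property value is expressed in megabytes; neither detail is confirmed by the thread.

```scala
// Hypothetical helper (not part of CarbonData): renders the table-level
// block size as a TBLPROPERTIES clause for use in a CREATE TABLE statement.
object BlockSizeDdl {
  // Assumption: the table_blocksize property value is given in megabytes.
  def blockSizeProperty(sizeMb: Int): String = {
    require(sizeMb > 0, "block size must be positive")
    s"TBLPROPERTIES ('table_blocksize'='$sizeMb')"
  }

  def main(args: Array[String]): Unit = {
    println(blockSizeProperty(1024)) // the current default named in the thread (1G)
    println(blockSizeProperty(512))  // the proposed default (512M)
  }
}
```

Setting the property per table leaves the global default untouched, so the effect of a smaller block size can be compared on one table before the default itself is changed.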

Re: [DISCUSSION] Support only spark 2 in carbon 1.3.0

2017-10-14 Thread Liang Chen
Hi lionel, As per the mailing list discussion, there is no objection. So can you create an umbrella JIRA to remove the Spark 1.5 & 1.6 code in 1.3.0? Regards Liang lionel061201 wrote > Hi community, > Currently we have three spark related module in carbondata(spark 1.5, 1.6, > 2.1), the project has becom

Re: [Discussion] Support pre-aggregate table to improve OLAP performance

2017-10-14 Thread Liang Chen
Hi Jacky, Thanks for starting this discussion; this is a great feature for carbondata. One question: for the sub-JIRA "Handle alter table scenarios for aggregation table", please give more detailed info. I just viewed the pdf attachment, as below; it looks like there is no need to do any handling for the agg table if users d

Re: [Discussion] Support pre-aggregate table to improve OLAP performance

2017-10-16 Thread Liang Chen
uild pre-aggregate table as >>> update scenario” >>> User need to drop the associated aggregate table and perform alter >>> table, >>> or data update/delete, or delete segment operation, then he can create >>> the >>> pre-agg table using CT

Re: Encountered some problems when querying data

2017-10-16 Thread Liang Chen
Hi, Can you raise an Apache JIRA at https://issues.apache.org/jira/projects/CARBONDATA and provide the test data and script? We need to reproduce this issue. Regards Liang 刘feng wrote > Hello,dev > >1,When using the ‘like’query in sql, I found a bug. > > E.g: select ake005,count(1) from ca_

Re: Query failed after "update" statement interrupted

2017-10-16 Thread Liang Chen
Hi, Can you provide the full script? What is your update script, and how can we reproduce this? Regards Liang yixu2001 wrote > dev > > On the process of "update" statement execution, interruption happened. > After that, the "select" statement failed. > Sometimes the "select" statement will recover to s

Re: Re: Update statement failed with "Multiple input rows matched for same row" in version 1.2.0,

2017-10-19 Thread Liang Chen
Hi, Does the query below return one row or multiple rows? - select a.remark from c_indextest1 a where a.id=b.id Regards Liang yixu2001 wrote > dev > You can follow the steps below to reproduce the problem. > tables c_indextest2 has 1700w rec

Re: [Discussion] Carbon Store abstraction

2017-10-20 Thread Liang Chen
Hi, Thank you for starting this discussion. I agree: to expose a clear interface to users, some optimization work is needed. Can you give more detail about your proposal? For example: which classes do you propose to move to carbon store, and which APIs do you propose to create and expose to users? I suggest

Re: [Discussion] Merging carbonindex files for each segments and across segments

2017-10-20 Thread Liang Chen
+1 for this proposal and solution, thanks, Ravi Regards Liang 2017-10-20 19:13 GMT+05:30 Ravindra Pesala : > Hi, > > Problem : > The first-time query of carbon becomes very slow. It is because of reading > many small carbonindex files and cache to the driver at the first time. > Many carbonind

Re: [Discussion] Support Streaming Ingest

2017-10-21 Thread Liang Chen
Hi, One question: why not support structured streaming instead of spark streaming? --- In first phase implementation, it should support kafka and spark streaming integration. More streaming framework support is preferable in the future. Regards Liang

Re: [DISCUSSION] Optimize the default value for some parameters

2017-10-26 Thread Liang Chen
> property for blocklet size to configure while creating a table. > > Regards, > Ravindra. > > On 11 October 2017 at 13:36, Liang Chen < > chenliang6136@ > > wrote: > >> Hi All >> >> As you know, some default value of parameters need to adju

Re: [Discussion]support user specified segments in major compaction

2017-10-26 Thread Liang Chen
Hi Jin Zhou, Thanks for starting this discussion. 1. For your first proposal: currently, the segment is a system-internal concept and is not exposed externally. Can you describe the exact problems you encountered? We can find an alternative solution for your problems. --

Re: [Discussion] Merging carbonindex files for each segments and across segments

2017-10-26 Thread Liang Chen
Yes, Jin Zhou. Merging all index files in a segment into one would be a useful feature; it would significantly improve query performance. Regards Liang Jin Zhou wrote > Hi, ravipesala > > Thank you for your proposal, merging index file is a very useful feature > as > we have already met serious perfo

Re: [PROPOSAL] Tag Pull Request with feature tag

2017-10-28 Thread Liang Chen
+1, agree with this proposal. Regards Liang

Re: After MAJOR index lost

2017-11-01 Thread Liang Chen
Hi, Yes, I checked the log message, and it looks like there are some issues. Can you share the reproduction steps: how many machines did you use for data load, and how many times did you load? Regards Liang yixu2001 wrote > dev > environment spark.2.1.1 carbondata 1.1.1 hadoop 2.7.2 > > run ALTER table e_carbon.prod
