Re: [DISCUSS] Data loading improvement

2017-05-21 Thread Jacky Li
Yes, Ravindra, unsafe sort will be better. In my last mail, I mentioned an 8-byte encoded format for RowID + SORT_COLUMNS. If SORT_COLUMNS are dictionary encoded, I think it is effectively like unsafe, which is only of type byte[8], right? So we can do this ourselves instead of depending on 3r
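A minimal sketch of the 8-byte key idea mentioned above, under stated assumptions: the layout (dictionary surrogate key in the high 32 bits, row id in the low 32 bits), the class name PackedSortKey, and all method names are hypothetical and not taken from CarbonData. It also assumes a single sort column with non-negative dictionary keys, so that signed long ordering matches key ordering.

```java
import java.util.Arrays;

public class PackedSortKey {
    // Hypothetical layout: high 32 bits = dictionary key of the sort column,
    // low 32 bits = row id within the page. Assumes dictKey >= 0.
    static long pack(int dictKey, int rowId) {
        return ((long) dictKey << 32) | (rowId & 0xFFFFFFFFL);
    }

    static int rowId(long packed) {
        return (int) packed; // the low 32 bits
    }

    // Returns row ids ordered by dictionary key, ties broken by row id.
    static int[] sortedRowIds(int[] dictKeys) {
        long[] packed = new long[dictKeys.length];
        for (int i = 0; i < dictKeys.length; i++) {
            packed[i] = pack(dictKeys[i], i);
        }
        Arrays.sort(packed); // primitive sort: no per-row objects are allocated
        int[] rowIds = new int[packed.length];
        for (int i = 0; i < packed.length; i++) {
            rowIds[i] = rowId(packed[i]);
        }
        return rowIds;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sortedRowIds(new int[] {5, 2, 9, 2})));
    }
}
```

Sorting a long[] keeps the sort key and the row id together in one primitive array, so the remaining columns can be reordered afterwards by following the returned row ids.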

Re: [DISCUSS] Data loading improvement

2017-05-21 Thread Jacky Li
For sorting, I think there is more optimization we can do. I am currently thinking of these: 1. Do not sort the whole TablePage; only the KeyPage is required as the sort key. 2. We should find a more memory-efficient sorting approach than one that relies on System.arraycopy, which requires doubling the space. 3. Should try to hold the Ke

Re: [DISCUSS] Data loading improvement

2017-05-21 Thread Ravindra Pesala
Hi, Using Object[] as a row while loading is not efficient in terms of memory usage. It would be more efficient to keep the rows in unsafe memory, as it can store the data in a more compact form according to data type. Regarding sorting, it would be good to concentrate on a single sorting solution. Since we already

Re: [DISCUSS] Data loading improvement

2017-05-21 Thread David CaiQiang
As far as I know, System.arraycopy on an object array is a shallow copy, so I think KeyPage and TablePage may both have the same performance with Arrays.sort. - Best Regards David Cai
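The shallow-copy point can be checked with a small, self-contained sketch. The class and method names here are illustrative only; the rows stand in for Object[]-based rows and are not CarbonData types.

```java
public class ShallowCopyDemo {
    // Copies an array of rows with System.arraycopy and reports whether
    // the copy still points at the very same row objects.
    static boolean copySharesRows() {
        Object[][] rows = { {1, "a"}, {2, "b"} };
        Object[][] copy = new Object[rows.length][];
        System.arraycopy(rows, 0, copy, 0, rows.length);

        // Only the references were copied: the outer arrays differ,
        // but each element is the same row object.
        boolean sameRows = copy != rows && copy[0] == rows[0] && copy[1] == rows[1];

        // A write through the copy is therefore visible through the original.
        copy[0][1] = "changed";
        return sameRows && "changed".equals(rows[0][1]);
    }

    public static void main(String[] args) {
        System.out.println(copySharesRows());
    }
}
```

Because only references move, the cost of the copy depends on the number of rows and not on how wide each row is.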

Re: [DISCUSS] Data loading improvement

2017-05-21 Thread Jacky Li
Hi, While working on the data load improvement and the encoding override feature, I found that it is not efficient to use the CarbonRow with Object[]. I think a better way is to use fixed-length primitive types instead of Object. Since SORT_COLUMNS is now merged, I think it is possible to: 1.

[jira] [Created] (CARBONDATA-1075) Close Dictionary Server when application ends

2017-05-21 Thread Kunal Kapoor (JIRA)
Kunal Kapoor created CARBONDATA-1075: Summary: Close Dictionary Server when application ends Key: CARBONDATA-1075 URL: https://issues.apache.org/jira/browse/CARBONDATA-1075 Project: CarbonData

[jira] [Created] (CARBONDATA-1074) Add TablePage for data load process

2017-05-21 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-1074: Summary: Add TablePage for data load process Key: CARBONDATA-1074 URL: https://issues.apache.org/jira/browse/CARBONDATA-1074 Project: CarbonData Issue Type:

Re: Questions about Dictionnary Server

2017-05-21 Thread Ravindra Pesala
Hi, To generate the global dictionary, CarbonData first scans all input data, finds the unique values for each column, and assigns a dictionary key to each value. So it is a two-step process. Irrespective of whether any new unique dictionary values are added or not, it always needs to scan all the data to get the dictionary. To
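The two-step process described above can be sketched as follows. This is an illustration under assumptions, not CarbonData's implementation: the class name, method names, and the choice to assign keys in first-seen order starting from 1 are all hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GlobalDictionarySketch {
    // Step 1: scan every value of one column and give each distinct value a key.
    // Keys are assigned in first-seen order starting at 1 (an assumption).
    static Map<String, Integer> buildDictionary(List<String> columnValues) {
        Map<String, Integer> dictionary = new LinkedHashMap<>();
        for (String value : columnValues) {
            if (!dictionary.containsKey(value)) {
                dictionary.put(value, dictionary.size() + 1);
            }
        }
        return dictionary;
    }

    // Step 2: scan the data again and replace each value with its key.
    static int[] encode(List<String> columnValues, Map<String, Integer> dictionary) {
        int[] encoded = new int[columnValues.size()];
        for (int i = 0; i < encoded.length; i++) {
            encoded[i] = dictionary.get(columnValues.get(i));
        }
        return encoded;
    }

    public static void main(String[] args) {
        List<String> country = Arrays.asList("cn", "us", "cn", "in");
        Map<String, Integer> dictionary = buildDictionary(country);
        System.out.println(dictionary + " " + Arrays.toString(encode(country, dictionary)));
    }
}
```

The first scan has to see all input before the second can start, which is why the data is read in full even when no new distinct values appear.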

Re: Comparative testing of CarbonData and Parquet

2017-05-21 Thread Liang Chen
Hi, Thank you for sharing the test results. I am very happy to hear that you have already started to migrate your business to CarbonData. Two suggestions: 1. Can you test again with the latest release, 1.1.0? It introduced the V3 format to further improve scan performance (for example, query 6). 2.As

Comparative testing of CarbonData and Parquet

2017-05-21 Thread Jin Zhou
Hi, community, Recently our team did some comparative testing of CarbonData and Parquet. We chose Spark 2.1 as the upper-layer processing engine, and the total number of rows in the test data is 200 billion. *Environment* Below is the testing environment information: > Number of nodes: 10, CPU: 48 cores, memor

Questions about Dictionnary Server

2017-05-21 Thread Sea
Hi, all: I have a question: when should we use DictionaryServer?

Re: [ANNOUNCE] Ravindra as new Apache CarbonData PMC

2017-05-21 Thread Mohammad Shahid Khan
Congratulations Ravindra 😊 On Sun, May 21, 2017 at 2:59 PM, manish gupta wrote: > Congratulations Ravindra..:) > > Regards > Manish Gupta > > On Sat, May 20, 2017 at 2:28 PM, Naresh P R > wrote: > > > Congrats Ravindra. > > --- > > Regards, > > Naresh P R > > > > On May 19, 2017 4:56 PM, "Liang

Re: [ANNOUNCE] Ravindra as new Apache CarbonData PMC

2017-05-21 Thread manish gupta
Congratulations Ravindra..:) Regards Manish Gupta On Sat, May 20, 2017 at 2:28 PM, Naresh P R wrote: > Congrats Ravindra. > --- > Regards, > Naresh P R > > On May 19, 2017 4:56 PM, "Liang Chen" wrote: > > > Hi all > > > > We are pleased to announce that the PMC has invited Ravindra as new > Ap