Yes, Ravindra, unsafe sort will be better.
In my last mail, I mentioned an 8-byte encoded format for RowID + SORT_COLUMNS.
If SORT_COLUMNS are dictionary encoded, I think this is effectively the same as
the unsafe approach restricted to a single byte[8] value, right? So we can do
this by ourselves instead of depending on 3r
For sorting, I think there are more optimizations we can do. I am currently thinking of these:
1. Do not sort the whole TablePage; only the KeyPage is required as the sort key.
2. Find a more memory-efficient sorting approach than one that relies on
System.arraycopy, which requires doubling the space (see the sketch after this list).
3. Try to hold the Ke
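A minimal sketch of the single-long sort key idea above, assuming the dictionary-encoded SORT_COLUMNS fit in the high bits and the RowID in the low bits; the bit widths and class name here are hypothetical, not CarbonData code:

```java
import java.util.Arrays;

// Sketch: pack the dictionary-encoded sort key and the row id into one long,
// then sort the long[] in place. Arrays.sort(long[]) uses dual-pivot quicksort,
// so it avoids the doubled buffer that a merge-based copy would need.
public class LongKeySortSketch {
  // Hypothetical layout: high 40 bits = encoded sort key, low 24 bits = row id.
  static long pack(long encodedSortKey, int rowId) {
    return (encodedSortKey << 24) | (rowId & 0xFFFFFFL);
  }

  static int unpackRowId(long key) {
    return (int) (key & 0xFFFFFFL);
  }

  public static void main(String[] args) {
    long[] keys = { pack(300, 0), pack(100, 1), pack(200, 2) };
    Arrays.sort(keys); // in-place; no row objects are moved
    for (long k : keys) {
      System.out.println("row " + unpackRowId(k));
    }
  }
}
```

After the sort, the rest of the TablePage can be gathered by the row ids in key order, which matches point 1 above.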
Hi,
Using Object[] as a row while loading is not efficient in terms of memory
usage. It would be more efficient to keep rows in unsafe memory, as it can
store the data in a more compact way according to each data type.
And regarding sorting, it would be good to concentrate on a single sorting
solution. Since we already
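For illustration, here is a rough sketch of the compaction that unsafe (off-heap) storage gives; a direct ByteBuffer stands in for raw unsafe memory, and the three-column schema is made up:

```java
import java.nio.ByteBuffer;

// Sketch: an Object[] row of {Integer, Long, Short} pays for three boxed
// objects plus headers and references; a type-aware binary layout needs
// exactly 4 + 8 + 2 = 14 bytes per row.
public class CompactRowSketch {
  public static void main(String[] args) {
    ByteBuffer page = ByteBuffer.allocateDirect(14 * 1000); // 1000 rows, off-heap

    // Write one row (int, long, short) at the current position.
    page.putInt(42).putLong(123456789L).putShort((short) 7);

    // Read it back by absolute offset.
    System.out.println(page.getInt(0) + ", " + page.getLong(4) + ", " + page.getShort(12));
  }
}
```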
As I know, System.arraycopy on an object array is a shallow copy, so I think
KeyPage and TablePage may have the same performance with Arrays.sort.
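A tiny example of the shallow-copy point (the row payloads here are arbitrary): System.arraycopy on an Object[] moves only the references, never the rows themselves, so the copy cost does not depend on row size.

```java
// Sketch: shallow copy of an object array.
public class ShallowCopySketch {
  public static void main(String[] args) {
    Object[] rows = { new long[100], new long[100] }; // "wide" rows
    Object[] copy = new Object[rows.length];
    System.arraycopy(rows, 0, copy, 0, rows.length);

    // Both arrays reference the very same row objects:
    System.out.println(copy[0] == rows[0]); // prints true
  }
}
```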
-
Best Regards
David Cai
Hi,
While working on the data load improvement and the encoding override feature, I
found that it is not efficient to use CarbonRow with Object[]. I think a
better way is to use fixed-length primitive types instead of Object (a sketch
follows below).
Since SORT_COLUMNS is now merged, I think it is possible to:
1.
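To illustrate the fixed-length idea (class and field names here are hypothetical, not CarbonData APIs): storing each column in a primitive array costs exactly its width in bytes per row, with no per-field boxing.

```java
// Sketch: column-wise storage with fixed-length primitive types.
// An Object[] row of n fields allocates n boxed objects plus the array itself;
// primitive column arrays need only (bytes-per-value * rowCount) per column.
public class PrimitiveColumnPageSketch {
  final int[] dictKeys;     // dictionary-encoded sort column: 4 bytes/row
  final long[] timestamps;  // 8 bytes/row
  final double[] measures;  // 8 bytes/row

  PrimitiveColumnPageSketch(int rowCount) {
    dictKeys = new int[rowCount];
    timestamps = new long[rowCount];
    measures = new double[rowCount];
  }

  void putRow(int rowId, int dictKey, long ts, double m) {
    dictKeys[rowId] = dictKey;
    timestamps[rowId] = ts;
    measures[rowId] = m;
  }
}
```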
Kunal Kapoor created CARBONDATA-1075:
Summary: Close Dictionary Server when application ends
Key: CARBONDATA-1075
URL: https://issues.apache.org/jira/browse/CARBONDATA-1075
Project: CarbonData
Jacky Li created CARBONDATA-1074:
Summary: Add TablePage for data load process
Key: CARBONDATA-1074
URL: https://issues.apache.org/jira/browse/CARBONDATA-1074
Project: CarbonData
Issue Type:
Hi,
To generate the global dictionary, CarbonData first scans all input data,
finds the unique values for each column, and assigns a dictionary key to each
value. So it is a two-step process. Irrespective of whether any new unique
dictionary values are added or not, it always needs to scan all the data to
build the dictionary.
To
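A minimal sketch of that two-step flow with made-up names (not the actual CarbonData implementation): the first step scans every value to collect the distinct set, and the second assigns a surrogate key to each distinct value.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Sketch: global dictionary generation as described above.
// Step 1 scans all input values to find the distinct set; step 2 assigns a
// dense surrogate key per value. Note that the full scan in step 1 happens
// even when the data contains no new unique values.
public class GlobalDictSketch {
  static Map<String, Integer> buildDictionary(List<String> columnValues) {
    TreeSet<String> distinct = new TreeSet<>(columnValues); // step 1: full scan
    Map<String, Integer> dict = new LinkedHashMap<>();
    int surrogate = 1;
    for (String value : distinct) {                         // step 2: assign keys
      dict.put(value, surrogate++);
    }
    return dict;
  }

  public static void main(String[] args) {
    System.out.println(buildDictionary(List.of("cn", "us", "cn", "in")));
    // prints {cn=1, in=2, us=3}
  }
}
```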
Hi
Thank you for sharing the test results, and I am very happy to hear that you
have already started migrating business workloads to CarbonData.
Two suggestions:
1. Can you test again with the latest release, 1.1.0? It introduced the V3
format to further improve scan performance (for example, query 6).
2. As
Hi, community,
Recently our team did some comparative tests of CarbonData and Parquet. We
chose Spark 2.1 as the upper-layer processing engine, and the total row count
of the test data is 200 billion.
*Environment*
Below is the testing environment information:
> Number of nodes: 10, CPU: 48 cores, memor
Hi, all:
I have a question: when should we use DictionaryServer?
Congratulations Ravindra 😊
On Sun, May 21, 2017 at 2:59 PM, manish gupta
wrote:
> Congratulations Ravindra..:)
>
> Regards
> Manish Gupta
>
> On Sat, May 20, 2017 at 2:28 PM, Naresh P R
> wrote:
>
> > Congrats Ravindra.
> > ---
> > Regards,
> > Naresh P R
> >
> > On May 19, 2017 4:56 PM, "Liang
Congratulations Ravindra..:)
Regards
Manish Gupta
On Sat, May 20, 2017 at 2:28 PM, Naresh P R
wrote:
> Congrats Ravindra.
> ---
> Regards,
> Naresh P R
>
> On May 19, 2017 4:56 PM, "Liang Chen" wrote:
>
> > Hi all
> >
> > We are pleased to announce that the PMC has invited Ravindra as new
> Ap