How does carbondata handle greater-than filters on a global dict column?

2018-11-06 Thread carbondata-newuser
For example, the version column is a dict column:

explain select A from test_carbondata.table where date='2018-09-05' and version >= "1.8.5" ;
== Physical Plan ==
*(1) CarbonDictionaryDecoder [test_carbondata_m_device_distinct_for_bdindex], ExcludeProfile(ArrayBuffer()), …
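
A minimal sketch of the scenario, assuming a hypothetical table name and the CarbonData 1.x 'DICTIONARY_INCLUDE' table property; it just reproduces the question (a range filter on a dictionary-encoded string column) so the plan can be inspected:

  -- hypothetical table; 'version' is forced into the global dictionary
  CREATE TABLE test_carbondata.dict_demo (
    date string,
    version string,
    A string
  ) STORED BY 'carbondata'
  TBLPROPERTIES ('DICTIONARY_INCLUDE'='version');

  -- same shape of query as above: an equality filter plus a >= on the dict column
  EXPLAIN SELECT A FROM test_carbondata.dict_demo
  WHERE date = '2018-09-05' AND version >= '1.8.5';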

Does carbondata replace global dictionary values on the driver side?

2018-10-26 Thread carbondata-newuser
For example, assume column A has global dictionary encoding and the dictionary is { "A": 1, "B": 2, "C": 3 }. Executor 1 returns the result [1,2,3]. Will the driver finally replace the 1 with 'A' and the 2 with "B"? Does the replacement occur in the driver and not the executor? If so, I …
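
One way to see where the decode step sits is to print the physical plan; a sketch, assuming a hypothetical table whose column A is dictionary encoded — the position of CarbonDictionaryDecoder relative to the scan and the exchange indicates whether the surrogate keys 1/2/3 are turned back into 'A'/'B'/'C' before or after the results reach the driver:

  -- hypothetical table; column A is dictionary encoded
  EXPLAIN SELECT DISTINCT A FROM test_carbondata.dict_demo;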

Why is global sort not supported on a partitioned table?

2018-07-19 Thread carbondata-newuser
Such a table can be created, but if you insert data into it, it will throw an error like: org.apache.carbondata.spark.exception.MalformedCarbonCommandException: Don't support use global sort on partitioned table.
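
A sketch that reproduces the behaviour described above, with hypothetical table and column names; the CREATE succeeds, the load is what fails:

  CREATE TABLE sales_by_day (
    id bigint,
    amount double
  )
  PARTITIONED BY (date string)
  STORED BY 'carbondata'
  TBLPROPERTIES ('SORT_SCOPE'='GLOBAL_SORT', 'SORT_COLUMNS'='id');

  -- fails with MalformedCarbonCommandException: Don't support use global sort on partitioned table
  INSERT INTO TABLE sales_by_day PARTITION (date='20180719')
  SELECT id, amount FROM some_hive_table;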

How to collect carbondata profile info?

2018-07-19 Thread carbondata-newuser
I have noticed that carbondata already provides a profiler in version 1.4. It can collect lots of information such as partitions.length, startTime, endTime, getSplitsStartTime, getSplitsEndTime, numSegments, numStreamSegments, numBlocks, distributeStartTime, distributeEndTime. How can I get this information?
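
As far as I know these counters come from carbondata's query statistics recorder and are printed to the driver/executor logs once statistics logging is switched on; treat the exact property name below as an assumption to verify against the configuration docs for your version:

  # carbon.properties (assumption: this switch enables query statistics logging)
  enable.query.statistics=true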

How to look up how many dates are in carbon without partitioning.

2018-07-19 Thread carbondata-newuser
It seems carbondata does not recommend using partition by, and partition by is not supported with global sort scope. In hive (saved as parquet) it is very convenient to look up how many date partitions already exist, along with each day's partition size. In carbondata I add the date column as the first sort …
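
Without partitions, a plain aggregate over the date column is one way to list the loaded days and their row counts; a sketch with a hypothetical table name (per-day storage size is not visible this way, only row counts):

  SELECT date, count(*) AS row_cnt
  FROM carbonTable
  GROUP BY date
  ORDER BY date;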

Index file cache will not work when the table has an invalid segment.

2018-07-10 Thread carbondata-newuser
Carbon version is 1.4 rc2.

create table carbonTest ( col1 string, col2 int, col3 string, date string )

*First step:* insert into table carbonTest select col1,col2,col3,"20180707" from hiveTable2 where date="20180707";

The col3 column in hiveTable2 is a hive map type, so this insert will fail, and it will create an invalid …
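
To confirm and clear the leftover segment, the segment-management commands can be used; a sketch — as far as I know CLEAN FILES removes stale/invalid load folders, but verify the exact behaviour on 1.4:

  SHOW SEGMENTS FOR TABLE carbonTest;   -- the failed load shows up as a non-success segment
  CLEAN FILES FOR TABLE carbonTest;     -- assumption: this clears the invalid segment left by the failed insert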

Re: Carbon file size is so much bigger than parquet.

2018-07-03 Thread carbondata-newuser
Finally I found the reason: for parquet we used the gzip compressor, while carbondata used snappy. Gzip has a better compression ratio than snappy.
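
For a like-for-like comparison the parquet side can be written with snappy as well; the Spark SQL setting below is the standard parquet codec knob (shown as a sketch, not something stated in the thread):

  SET spark.sql.parquet.compression.codec=snappy;   -- the original comparison used gzip for parquet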

Re: Carbon file size is so much bigger than parquet.

2018-07-02 Thread carbondata-newuser
'SORT_COLUMNS'='app_name,app_id,is_AAA,os,platform,activation_channel,app_version,channel,ut,language,is_F,is_BBB,version_code,os_version,timezone,display_density'

Carbon file size is so much bigger than parquet.

2018-07-02 Thread carbondata-newuser
I have disabled the inverted index on all columns, but it is still 50% larger than parquet: 31G (parquet) vs 48G (carbondata) with 424,000,000 records. Carbondata version is 1.3. CREATE TABLE growth.carbondata_m_device_distinct ( A_id bigint, app_name string, app_id int, platform string, is_F …
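
For reference, disabling the inverted index per column in the 1.x DDL is done through the NO_INVERTED_INDEX table property; a sketch with the truncated column list abbreviated (only the columns visible above are shown):

  CREATE TABLE growth.carbondata_m_device_distinct (
    A_id bigint,
    app_name string,
    app_id int,
    platform string
    -- remaining columns omitted; the original DDL above is truncated
  ) STORED BY 'carbondata'
  TBLPROPERTIES ('NO_INVERTED_INDEX'='app_name,platform');   -- list every column to disable the inverted index on all of them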