Thank you, Manish.
Is dictionary exclude supported for datatypes other than String?
https://github.com/apache/carbondata/blob/6488bc018a2ec715b31407d12290680d388a43b3/integration/spark-common/src/main/scala/org/apache/spark/sql/catalyst/CarbonDDLSqlParser.scala#L706
-
Swapnil
On Wed, Jul 19, 2017
Hi Ravi
Thanks for your comment.
I tested again with province excluded from the dictionary. In Spark, the
query time is around 3 seconds; in Presto, the same query takes 9 seconds.
So for this query case (short string), dictionary lazy decode might not be
the key factor.
Regards
Liang
2017-07-20 10:56 GMT+08:00
Hi Liang,
I see that the province column data is not big, so I guess lazy decoding
hardly makes any impact in this scenario. Can you do one more test by
excluding province from the dictionary in both the Presto and Spark
integrations? It will tell whether it is really a lazy decoding issue or
not.
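
To make the lazy-decoding question concrete, here is a toy Python sketch of the idea. It is not CarbonData, Spark, or Presto code; the function names, the dictionary, and the sample rows are all hypothetical. It contrasts decoding every row before a group-by with aggregating on the integer dictionary codes and decoding only the distinct group keys at the end.

```python
from collections import defaultdict

# Hypothetical toy data: a dictionary-encoded "province" column and an "age" column.
dictionary = ["anhui", "beijing", "guangdong"]  # position = dictionary code
province_codes = [1, 0, 2, 1, 0, 1]
ages = [30, 25, 41, 19, 33, 52]


def group_eager(codes, ages, dictionary):
    """Decode every row to its string value first, then aggregate on strings."""
    acc = defaultdict(lambda: [0, 0])  # province -> [sum(age), count]
    decodes = 0
    for code, age in zip(codes, ages):
        province = dictionary[code]  # one dictionary lookup per row
        decodes += 1
        acc[province][0] += age
        acc[province][1] += 1
    rows = sorted((p, s, c) for p, (s, c) in acc.items())
    return rows, decodes


def group_lazy(codes, ages, dictionary):
    """Aggregate on integer codes; decode only the distinct group keys at the end."""
    acc = defaultdict(lambda: [0, 0])  # code -> [sum(age), count]
    for code, age in zip(codes, ages):
        acc[code][0] += age
        acc[code][1] += 1
    decodes = len(acc)  # one dictionary lookup per distinct group
    rows = sorted((dictionary[k], s, c) for k, (s, c) in acc.items())
    return rows, decodes


eager_rows, eager_decodes = group_eager(province_codes, ages, dictionary)
lazy_rows, lazy_decodes = group_lazy(province_codes, ages, dictionary)
```

Both functions return the same grouped rows; only the number of dictionary lookups differs (one per row versus one per distinct province). When the decoded values are short strings, each lookup is cheap, so the saving may be small relative to the rest of the query, which is the effect the test above is meant to isolate.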
Hi Swapnil
Please find my answers inline.
1. What is the use of the *carbon.number.of.cores* property and how is it
different from Spark's executor cores?
- carbon.number.of.cores is used for reading the footer and header of the
carbondata file during query execution. Spark executor cores is a proper
Hi
For 4) Lazy decoding of the dictionary: I just tested 180 million rows of
data with the query:
"select province,sum(age),count(*) from presto_carbondata group by province
order by province"
The Spark integration module has "dictionary lazy decode"; Presto doesn't
have "dictionary lazy decode",
Hi
Below are some proposed items for Presto optimization:
1) Remove the extra loops for data conversion in Presto format to improve
performance.
2) Modularize and optimize the filters.
3) Optimize the CarbonData metadata reading.
4) Lazy decoding of the dictionary.
5) Batch reading of the
Hello All
I am trying CarbonData for the first time and have a few questions about
improving performance:
1. What is the use of the *carbon.number.of.cores* property and how is it
different from Spark's executor cores?
2. Documentation says, by default, all non-numeric columns (except complex
type