Hello everyone, hope you are all well. I am having an encoding issue when converting a table from Parquet into CSV files, and I wonder if someone could shed some light on it?
One of my data sets contains French text with lots of accented characters, and it is persisted in HDFS as Parquet. When I query the Parquet table with:

select `city` from dfs.parquets.`file`

it returns the data properly encoded:

city
Montréal

Then I convert the table into a CSV file with the following queries:

alter session set `store.format` = 'csv';
create table dfs.csvs.`converted` as select * from dfs.parquets.`file`;

But when I run a select on the result, the data comes back garbled:

select columns[0] from dfs.csvs.`converted`

returns:

Montr?al

My storage plugin is pretty standard:

"csv" : {
  "type" : "text",
  "extensions" : [ "csv" ],
  "delimiter" : ",",
  "skipFirstLine" : true
},

Should I explicitly add a charset option somewhere? I couldn't find anything helpful in the docs. I tried adding

export DRILL_JAVA_OPTS="$DRILL_JAVA_OPTS -Dsaffron.default.charset=UTF-8"

to the drill-env.sh file, but no luck. Has anyone run into similar issues? Thank you!
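In case it helps with diagnosing: this is how I have been inspecting the raw bytes of the generated file straight out of HDFS, to tell whether the "?" is actually written into the file or only mis-rendered by the client. The path below is just from my setup (Drill's CTAS output naming may differ on yours):

# hypothetical path; adjust to wherever your dfs.csvs workspace points
hdfs dfs -cat /data/csvs/converted/0_0_0.csv | xxd | head -n 20
# a UTF-8 encoded "é" should show up as the byte pair c3 a9;
# a literal 3f byte would mean the "?" was written by the CSV writer itself,
# not just mis-displayed by sqlline or the terminal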
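I am also wondering whether the text writer simply falls back to the JVM default charset on the drillbit hosts. If so, checking the host locale and forcing UTF-8 as the JVM default might be worth a try. This is a guess on my part, not something I found in the docs:

# check which default charset the drillbit JVM would pick up
locale
java -XshowSettings:properties -version 2>&1 | grep -i encoding

# in drill-env.sh: force UTF-8 as the JVM default encoding (untested guess)
export DRILL_JAVA_OPTS="$DRILL_JAVA_OPTS -Dfile.encoding=UTF-8"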