Hi Carlos

It looks similar to an issue reported previously:
https://lists.apache.org/thread.html/1f3d4c427690c06f1992bc5070f355689ccc5b1ed8cc3678ad8e9106@<user.drill.apache.org>
 

Could you try setting the JVM's file encoding to UTF-8 and retry? If that does
not work, please file a JIRA at https://issues.apache.org
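
A minimal sketch of that setting, assuming you pass JVM options through
DRILL_JAVA_OPTS in drill-env.sh as in your earlier attempt. I have not
verified that it resolves the CSV case, and the drillbits need a restart to
pick it up:

```shell
# drill-env.sh (sketch): append the standard JVM system property that sets
# the default charset. Assumption: JVM options are passed via DRILL_JAVA_OPTS,
# as in the original message.
export DRILL_JAVA_OPTS="${DRILL_JAVA_OPTS:-} -Dfile.encoding=UTF-8"
```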

Thanks
Kunal

On 7/16/2018 1:25:45 PM, Carlos Derich <carlosder...@gmail.com> wrote:
It seems to be an issue only with CSV/TSV files.

Tried writing the output as JSON and it handles the encoding properly.

alter session set `store.format`='json';
create table dfs.tmp.test3 as select `city` from dfs.parquets.`file`;

Returns:

{"city": "Montréal"}


additional info:

parquet-tools schema:

message root {
optional binary city (UTF8);
}


On Mon, Jul 16, 2018 at 2:49 PM, Carlos Derich
wrote:

> Hello guys, hope everyone is well.
>
> I am having an encoding issue when converting a table from Parquet into
> CSV files. I wonder if someone could shed some light on it?
>
> One of my data sets has data in French with lots of accented characters,
> and it is persisted in HDFS as Parquet.
>
>
> When I query the Parquet table with *select `city` from
> dfs.parquets.`file`*, it returns the data properly encoded:
>
>
> *city*
>
> *Montréal*
>
>
> Then I convert this table into a CSV file with the following query:
>
> *alter session set `store.format`='csv'*
> *create table dfs.csvs.`converted` as select * from dfs.parquets.`file`*
>
>
> Then when I run a select query on it, it returns data not properly encoded:
>
> *select columns[0] from dfs.csvs.`converted`*
>
> Returns:
>
> *Montr?al*
>
>
> My storage plugin is pretty standard:
>
> "csv" : {
> "type" : "text",
> "extensions" : [ "csv" ],
> "delimiter" : ",",
> "skipFirstLine": true
> },
>
> Should I explicitly add a charset option somewhere? I couldn't find
> anything helpful in the docs.
>
> I tried adding *export DRILL_JAVA_OPTS="$DRILL_JAVA_OPTS
> -Dsaffron.default.charset=UTF-8"* to the drill-env.sh file, but no luck.
>
> Has anyone run into similar issues?
>
> Thank you!
>
