Re: CTAS and save as parquet last column values are shown as null

Divya Gehlot Sun, 23 Jul 2017 23:54:04 -0700

Thank you so much it worked

Can you please provide me the pointer to the documentation where updation
for different format type are mentioned .


As I am facing another issue with date type as the data which I receive in
csv format has the format of 15/1/2016 when I try to cast or convert
to_date it throws me error




Thanks ,
Divya

On 24 July 2017 at 14:17, Abhishek Girish <agir...@apache.org> wrote:

> Can you update your csv format plugin as shown below and retry your query?
>
>     "csv": {
>       "type": "text",
>       "extensions": [
>         "csv"
>       ],
>       "lineDelimiter": "\r\n",
>       "extractHeader": true,
>       "delimiter": ","
>     }
>
> On Sun, Jul 23, 2017 at 10:37 PM, Divya Gehlot <divya.htco...@gmail.com>
> wrote:
>
> > 0: jdbc:drill:zk=local> select * FROM
> >  dfs.`installedsoftwares/ApacheDrill/apache-drill-1.10.
> > 0.tar/apache-drill-1.10.0/sample-data/jll/data/mapping/
> > PublicHoliday/PublicHoliday.csv`
> > limit 10 ;
> > +-----------------------------------------+
> > |                 columns                 |
> > +-----------------------------------------+
> > | ["Day","Date","Area\r"]                 |
> > | ["Friday","15/1/2016","Karnataka\r"]    |
> > | ["Tuesday","26/1/2016","Karnataka\r"]   |
> > | ["Monday","7/3/2016","Karnataka\r"]     |
> > | ["Friday","25/3/2016","Karnataka\r"]    |
> > | ["Friday","1/4/2016","Karnataka\r"]     |
> > | ["Friday","8/4/2016","Karnataka\r"]     |
> > | ["Thursday","14/4/2016","Karnataka\r"]  |
> > | ["Tuesday","19/4/2016","Karnataka\r"]   |
> > | ["Sunday","1/5/2016","Karnataka\r"]     |
> > +-----------------------------------------+
> > 10 rows selected (0.122 seconds)
> > 0: jdbc:drill:zk=local> select * from
> > `dfs`.`tmp`.`installedsoftwares/ApacheDrill/apache-drill-1.10.
> > 0.tar/apache-drill-1.10.0/sample-data/jll/publicholiday.parquet`
> > limit 10 ;
> > +-----------+------------+-------+
> > |    Day    |    Date    | Area  |
> > +-----------+------------+-------+
> > | Friday    | 15/1/2016  | null  |
> > | Tuesday   | 26/1/2016  | null  |
> > | Monday    | 7/3/2016   | null  |
> > | Friday    | 25/3/2016  | null  |
> > | Friday    | 1/4/2016   | null  |
> > | Friday    | 8/4/2016   | null  |
> > | Thursday  | 14/4/2016  | null  |
> > | Tuesday   | 19/4/2016  | null  |
> > | Sunday    | 1/5/2016   | null  |
> > | Monday    | 9/5/2016   | null  |
> > +-----------+------------+-------+
> > 10 rows selected (0.1 seconds)
> > 0: jdbc:drill:zk=local>
> >
> >
> > *Drill set up* : Aapche drill is set up on Windows machine in embedded
> mode
> > .
> >
> > On 24 July 2017 at 13:30, Divya Gehlot <divya.htco...@gmail.com> wrote:
> >
> > >
> > > Pasting the result set in text format
> > >
> > > *Reading parquet file format :*
> > >
> > >> Day       Date         Area
> > >> Friday 15/1/2016 null
> > >> Tuesday 26/1/2016 null
> > >> Monday 7/3/2016 null
> > >> Friday 25/3/2016 null
> > >> Friday 1/4/2016 null
> > >> Friday 8/4/2016 null
> > >
> > >
> > >
> > > *Reading csv file format *
> > >
> > >> columns
> > >> ["Day","Date","Area\r"]
> > >> ["Friday","1/4/2016","Karnataka\r"]
> > >> ["Friday","15/1/2016","Karnataka\r"]
> > >> ["Friday","25/3/2016","Karnataka\r"]
> > >> ["Friday","8/4/2016","Karnataka\r"]
> > >> ["Monday","7/3/2016","Karnataka\r"]
> > >
> > >
> > >
> > >
> > >
> > > *CTAS query csv to parquet :*
> > >
> > > Create table `dfs`.`tmp`.`publicholiday.parquet` AS
> > >> SELECT
> > >> CASE WHEN `Day` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE CAST(`Day`
> AS
> > >> VARCHAR(100)) END AS `Day`,
> > >> CASE WHEN `Date` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE CAST(`Date`
> > AS
> > >> VARCHAR(100)) END AS `Date`,
> > >> CASE WHEN `Area` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE CAST(`Area`
> > AS
> > >> VARCHAR(100)) END AS `Area`
> > >> FROM TABLE (dfs.`PublicHoliday.csv`(type => 'text',fieldDelimiter =>
> > ',',
> > >> extractHeader => true))
> > >
> > >
> > >
> > > Thanks,
> > > Divya
> > >
> > > On 24 July 2017 at 13:20, Abhishek Girish <agir...@apache.org> wrote:
> > >
> > >> Unfortunately, the attachments / pictures haven't come through.
> Mailing
> > >> lists sometimes do not support these. Can you paste as text or share
> > links
> > >> to it instead?
> > >>
> > >> On Sun, Jul 23, 2017 at 9:14 PM, Divya Gehlot <
> divya.htco...@gmail.com>
> > >> wrote:
> > >>
> > >> > yes it shows the proper values when I query the csv file.
> > >> > CTAS query csv to parquet :
> > >> > Create table `dfs`.`tmp`.`publicholiday.parquet` AS
> > >> > SELECT
> > >> > CASE WHEN `Day` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE CAST(`Day`
> > AS
> > >> > VARCHAR(100)) END AS `Day`,
> > >> > CASE WHEN `Date` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE
> CAST(`Date`
> > >> AS
> > >> > VARCHAR(100)) END AS `Date`,
> > >> > CASE WHEN `Area` = '' THEN CAST(NULL AS VARCHAR(100)) ELSE
> CAST(`Area`
> > >> AS
> > >> > VARCHAR(100)) END AS `Area`
> > >> > FROM TABLE (dfs.`PublicHoliday.csv`(type => 'text',fieldDelimiter =>
> > >> ',',
> > >> > extractHeader => true))
> > >> >
> > >> > CSV File
> > >> >
> > >> > Parquet File
> > >> >
> > >> >
> > >> >
> > >> > Appreciate the help !
> > >> >
> > >> > Thanks,
> > >> > Divya 
> > >> >
> > >> > On 24 July 2017 at 11:52, Abhishek Girish <agir...@apache.org>
> wrote:
> > >> >
> > >> >> Can you share a sample row from the CSV and the CTAS query? Also
> test
> > >> if a
> > >> >> select columns[n] query on the CSV file works as expected [1] ?
> > >> >>
> > >> >> It could be an issue with delimiters.
> > >> >>
> > >> >> [1]
> > >> >> https://drill.apache.org/docs/querying-plain-text-files/#col
> > >> >> umns[n]-syntax
> > >> >> On Sun, Jul 23, 2017 at 8:44 PM Divya Gehlot <
> > divya.htco...@gmail.com>
> > >> >> wrote:
> > >> >>
> > >> >> > Hi ,
> > >> >> > I am facing as weird issue when I CTAS and save the csv file as
> > >> parquet
> > >> >> it
> > >> >> > displays the last column values as null .
> > >> >> > This is not the case with one file .
> > >> >> > If I take any csv file with even with any data type and do a
> > >> >> > select column1,column2,column3 from table.parquet
> > >> >> > it shows the column3 values as null.
> > >> >> >
> > >> >> > Appreciate the help.
> > >> >> >
> > >> >> > Thanks,
> > >> >> > Divya
> > >> >> >
> > >> >>
> > >> >
> > >> >
> > >>
> > >
> > >
> >
>

Re: CTAS and save as parquet last column values are shown as null

Reply via email to