Hi Divya,

Drill follows the commonly-accepted practice for CSV files. The general rule is:

1. Column headers all on one line, comma separated. (Drill 1.11 has fixes in 
this area, so you’ll want to use that if you have any problems.
2. Each record on its own line, comma-separated, no leading or trailing spaces.
3. No need for quotes unless your value contains commas.

You can customize behavior using the storage plugin config:

* Choose delimiter (tab for TSV, | for PSV, etc.)
* Choose to read or skip the header.

You’ll want to make sure to use the “,” delimiter, read and use the header. The 
docs have an example of the required setup.

Values are always read as text, so even your numbers will start as VarChar. You 
can convert to a numeric type in the query.

Example using your data:

Column1,Column2,Column3,Column4,Column5
colonedata1,coltwodata1,-35.924476,138.5987123,
colonedata2,coltwodata2,-27.4372536,153.0304583,137

Note that if columns are empty (like your first row), you still should include 
the comma separators. (Another bug fix in 1.11 fixes this case; 1.10 and 
earlier have problems if trailing columns are missing.)

Thanks,

- Paul


On Aug 1, 2017, at 11:51 PM, Divya Gehlot 
<divya.htco...@gmail.com<mailto:divya.htco...@gmail.com>> wrote:

Hi,
My column headers are in single line only i.e.
Column1,Column2,Column3,Column4,Column5
"colonedata1","coltwodata1","-35.924476","138.5987123",""
"colonedata2","coltwodata2","-27.4372536","153.0304583","137"
colonedata3","coltwodata3","-35.2793885","149.1233503","134"
"colonedata4","coltwodata4","-33.8724176","151.2067579",""

As you advised to put quotes as string delimeter for each column data and ran 
the select query.
attaching the data file too .

Appreciate the help !

Thanks,
Divya

On 2 August 2017 at 12:37, Kunal Khatua 
<kkha...@mapr.com<mailto:kkha...@mapr.com>> wrote:
So, the way you’ve shown your data is basically in this format:

<List of column headers, one per line>
<actual column data, one row per line>

Unfortunately, I don't believe the text reader in Drill is that advanced as to 
interpret  the list of column headers across multiple lines, while the actual 
data is in a single line per row.

Typically text data is in CSV (or other delimiters similar to the comma) and 
can have the first line representing a header.

Also, I'm not sure if there was ever an option introduced to allow skipping of 
the initial set of lines within a text file being read.


-----Original Message-----
From: Divya Gehlot 
[mailto:divya.htco...@gmail.com<mailto:divya.htco...@gmail.com>]
Sent: Tuesday, August 01, 2017 7:06 PM
To: user@drill.apache.org<mailto:user@drill.apache.org>
Subject: Re: delimiter in column values

For my sample dataset as you advised I surrounded with single columns also with 
quotes and the results are as below :
col_Column1
Column2
Column3
Column4
Column5
"Chifley" "coltwodata5" "" "" ""
"colonedata1" "coltwodata1" "-35.924476" "138.5987123" ""
"colonedata2" "coltwodata2" "-27.4372536" "153.0304583" "137"
"colonedata4" "coltwodata4" "-33.8724176" "151.2067579" ""
"colonedata5" "coltwodata5" "" "" ""
"This col6 data" "coltwodata6" "-33.869732" "151.2055553"
"This col7 data yes." "coltwodata7" "1.2845045" "103.8482739"
colonedata3" "coltwodata3" "-35.2793885" "149.1233503" "134"

Thanks,
Divya

On 1 August 2017 at 22:39, Kunal Khatua 
<kkha...@mapr.com<mailto:kkha...@mapr.com>> wrote:

> I think you need quotes around the single word datasets as well,
> because the quotes act as String delimiters and help in indicating the
> start and end of a String.
>
> Is there a reason why the single word strings cannot be in quotes as well?
>
> -----Original Message-----
> From: Divya Gehlot 
> [mailto:divya.htco...@gmail.com<mailto:divya.htco...@gmail.com>]
> Sent: Tuesday, August 01, 2017 3:04 AM
> To: user@drill.apache.org<mailto:user@drill.apache.org>
> Subject: delimiter in column values
>
> Hi,
> I have data set which has delimeter in first column value when I read
> the data set It provides the output below :
>
> col_Column1
> Column2
> Column3
> Column4
> Column5
>
> "This col6 data" coltwodata6 -33.869732 151.2055553 "This col7 data yes."
> coltwodata7 1.2845045 103.8482739 Chifley coltwodata5
> colonedata1 coltwodata1 -35.924476 138.5987123
> colonedata2 coltwodata2 -27.4372536 153.0304583 137
> colonedata3 coltwodata3 -35.2793885 149.1233503 134
> colonedata4 coltwodata4 -33.8724176 151.2067579
> colonedata5 coltwodata5
>
>
>
> How can I read the column1 values as is without getting split into two
> columns for instance the Column values should be
> Column1
> colonedata1,
> colonedata2,
> colonedata3,
> colonedata4,
> colonedata5,
> "This, col6 data"
> "This, col7 data"
> Chifley,
>
> Appreciate the help !
>
> Thanks ,
> Divya
>

<sample_data.csv>

Reply via email to