Hi Divya,

I presume that “sample_data.csv” is your file? The default CSV configuration 
reads files without headers and puts all columns into a single array called 
“columns”. Do a SELECT * and you’ll see it. You’ll see an array that contains 
your data:

[“Fred”, “Flintstone”]

So, the correct query would be:

SELECT REGEXP_REPLACE(CAST(columns[0] AS VARCHAR(100)), '[,".]', '') FROM
dfs.`installedsoftwares/ApacheDrill/apache-drill-1.10.0.tar/apache-drill-1.10.0/sample-data/sample_data.csv`

I notice the regex is messing with brackets. Are you trying to convert the 
array form shown above to a cleaner list? Won’t work: those brackets are not in 
the data; they are the textual sugar added to show the array when printing.

Maybe what you want is:

SELECT columns[0] as `a`, columns[1] as `b` …

Or, if your file actually contains headers, use a table function (or storage 
plugin config) to specify to use the headings to create individual columns. See 
the example at [1] under “Using the Formats Attributes as Table Function 
Parameters”.

- Paul

[1] https://drill.apache.org/docs/plugin-configuration-basics/

> On Jul 26, 2017, at 8:22 PM, Divya Gehlot <divya.htco...@gmail.com> wrote:
> 
> The another thing which I observed is when I  run below query
> SELECT  REGEXP_REPLACE('"This, col7 data yes."', '[,".]', '') FROM
> (VALUES(1))
> EXPR$0
> This col7 data yes
> 
> 
> Same when I run the csv file it gives me empty result set :
> SELECT REGEXP_REPLACE(CAST(`Column1` AS VARCHAR(100)), '[,".]', '') FROM
> dfs.`installedsoftwares/ApacheDrill/apache-drill-1.10.0.tar/apache-drill-1.10.0/sample-data/sample_data.csv`
> 
> EXPR$0
> 
> 
> P.S. As advised all the above queries I ran on Drill 1.11
> 
> Appreciate the help .
> 
> Thanks,
> Divya
> 
> On 27 July 2017 at 09:54, Divya Gehlot <divya.htco...@gmail.com> wrote:
> 
>> Hi,
>> Please find attached the sample_data.csv file
>> Pasting the content of the csv file  below , in case attachment doesn't
>> reach
>> 
>>> Column1,Column2,Column3,Column4,Column5
>>> colonedata1,coltwodata1,-35.924476,138.5987123,
>>> colonedata2,coltwodata2,-27.4372536,153.0304583,137
>>> colonedata3,coltwodata3,-35.2793885,149.1233503,134
>>> colonedata4,coltwodata4,-33.8724176,151.2067579,
>>> colonedata5,coltwodata5,,,
>>> "This, col6 data",coltwodata6,-33.869732,151.2055553,351
>>> "This, col7 data yes.",coltwodata7,1.2845045,103.8482739,80
>>> Chifley,coltwodata5,,,
>> 
>> 
>> Error :
>> 
>>> Query Failed: An Error Occurred
>>> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:
>>> IllegalArgumentException: reallocation size must be non-negative Fragment
>>> 0:0
>> 
>> 
>> 
>> Thanks all for the help.
>> 
>> Thanks ,
>> Divya
>> 
>> 
>> On 26 July 2017 at 23:31, Paul Rogers <prog...@mapr.com> wrote:
>> 
>>> Hi Divya,
>>> 
>>> We found a couple of issues in CSV files that would lead to the kind of
>>> errors you encountered. These issues will be fixed in the upcoming Drill
>>> 1.11 release.
>>> 
>>> Sharing a sample CSV file will let us check the issue. Even better,
>>> voting is open for the 1.11 release. Please go ahead and download it and
>>> try your file with that release. Let us know if you still have a problem.
>>> 
>>> Thanks,
>>> 
>>> - Paul
>>> 
>>>> On Jul 26, 2017, at 6:14 AM, Khurram Faraaz <kfar...@mapr.com> wrote:
>>>> 
>>>> Can you please share your CSV file, the SQL query and the version of
>>> Drill that you are on. So someone can take a look and try to reproduce the
>>> error that you are seeing.
>>>> 
>>>> 
>>>> Thanks,
>>>> 
>>>> Khurram
>>>> 
>>>> ________________________________
>>>> From: Divya Gehlot <divya.htco...@gmail.com>
>>>> Sent: Wednesday, July 26, 2017 3:18:08 PM
>>>> To: user@drill.apache.org
>>>> Subject: regex replace in string
>>>> 
>>>> Hi,
>>>> I have a CSV file where  column values are
>>>> "This is the column,one "
>>>> "This is column , two"
>>>> column3
>>>> column4
>>>> 
>>>> When I try to regex_replace it throws error
>>>> 
>>>> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:
>>>>> IllegalArgumentException: reallocation size must be non-negative
>>> Fragment
>>>>> 0:0
>>>> 
>>>> 
>>>> How can I read the above columns as one string  like  This is the column
>>>> one
>>>> Appreciate the help
>>>> 
>>>> Thanks,
>>>> Divyab
>>> 
>>> 
>> 

Reply via email to