DRILL 1.4 - newline in strings not supported

2016-01-31 Thread Nicolas Paris
Hello, I am trying to import a csv containing large texts. They contain newline characters ("\n"). Apache Drill complains about that. There is a JIRA issue open at https://www.google.fr/url?sa=t&rct=j&q=&esrc=s&source=web&cd=2&cad=rja&uact=8&ved=0ahUKEwjUscyr7tTKAhXBVhoKHf0CAjYQFggpMAE&url=http%3A

Re: DRILL 1.4 - newline in strings not supported

2016-02-01 Thread Abdel Hakim Deneche
Hey Nicolas, what kind of queries are you running on your csv file?

On Sun, Jan 31, 2016 at 12:14 PM, Nicolas Paris wrote:
> Hello,
>
> I am trying to import a csv containing large texts. They contain newline
> characters ("\n").
> Apache Drill complains about that. There is a JIRA issue opened

Re: DRILL 1.4 - newline in strings not supported

2016-02-01 Thread Nicolas Paris
Hello Abdel, I am creating Parquet files from those CSV files (CREATE TABLE syntax). Basically, I have a text column, with a maximum of 50k characters, containing newlines (the texts are extracted from PDFs). I have several million tuples of texts. I am subsetting texts containing some patterns (LIKE

Re: DRILL 1.4 - newline in strings not supported

2016-02-01 Thread Abdel Hakim Deneche
Another user already reported some problems querying csv files with newline characters: http://comments.gmane.org/gmane.comp.apache.incubator.drill.user/2350 His particular problem was related to a bug in the LIKE function. Unfortunately he never got around to filing a JIRA for his issue. Is your

Re: DRILL 1.4 - newline in strings not supported

2016-02-01 Thread Ted Dunning
If you have newlines in your files then the files become unsuitable for splitting. This means that the only parallelism available in a CTAS statement is multiple files. Do you have a fair number of files?

> On Feb 1, 2016, at 7:26, Nicolas Paris wrote:
>
> Hello Abd

Re: DRILL 1.4 - newline in strings not supported

2016-02-01 Thread Nicolas Paris
@Abdel, Yes, the problem is similar. By the way, the JIRA issue already exists, doesn't it? https://www.google.fr/url?sa=t&rct=j&q=&esrc=s&source=web&cd=2&cad=rja&uact=8&ved=0ahUKEwjUscyr7tTKAhXBVhoKHf0CAjYQFggpMAE&url=http%3A%2F%2Fmail-archives.apache.org%2Fmod_mbox%2Fdrill-dev%2F201505.mbox%2F%253CJIRA

Re: DRILL 1.4 - newline in strings not supported

2016-02-01 Thread Abdel Hakim Deneche
When you run a select * on your csv file, does it succeed or fail?

On Mon, Feb 1, 2016 at 7:53 AM, Nicolas Paris wrote:
> @Abdel,
>
> Yes, the problem is similar. By the way, the JIRA issue already exists, doesn't it?
>
> https://www.google.fr/url?sa=t&rct=j&q=&esrc=s&source=web&cd=2&cad=rja&uact=8&

Re: DRILL 1.4 - newline in strings not supported

2016-02-01 Thread Nicolas Paris
Abdel, select * on my csv file fails as well.

Thanks

2016-02-01 17:16 GMT+01:00 Abdel Hakim Deneche:
> When you run a select * on your csv file, does it succeed or fail?
>
> On Mon, Feb 1, 2016 at 7:53 AM, Nicolas Paris wrote:
>
> > @Abdel,
> >
> > Yes, the problem is similar. By the way, the JIRA

Re: DRILL 1.4 - newline in strings not supported

2016-02-01 Thread Abdel Hakim Deneche
Then it's similar to DRILL-3178 indeed. Unfortunately there is no way I can think of to read csv files in Drill without replacing the newline characters. As Ted mentioned, Drill expects one data row per line to allow easy splitting of csv files.

On Mon, Feb 1, 2016 at 8:24 AM, Nicolas Paris wro
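
The workaround Abdel describes (replacing the newline characters before Drill reads the file) could be done in a preprocessing pass outside Drill. The following is only a sketch under assumptions: it uses Python's standard csv module, assumes the embedded newlines sit inside properly quoted fields, and the function name flatten_newlines and the single-space replacement are illustrative choices, not anything from Drill.

```python
import csv
import io


def flatten_newlines(src, dst, replacement=" "):
    """Copy CSV rows from src to dst, replacing newlines embedded in fields.

    Hypothetical helper: assumes embedded newlines are inside quoted fields,
    so csv.reader can reassemble each logical row.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    for row in reader:
        cleaned = [
            field.replace("\r\n", replacement)
                 .replace("\n", replacement)
                 .replace("\r", replacement)
            for field in row
        ]
        writer.writerow(cleaned)


# Small in-memory demonstration; real files should be opened with newline="".
sample = 'id,text\n1,"first line\nsecond line"\n2,plain\n'
out = io.StringIO()
flatten_newlines(io.StringIO(sample, newline=""), out)
print(out.getvalue())
```

After this pass every data row occupies exactly one physical line, which is the layout Drill's text reader expects. The original line breaks are lost, so a distinctive placeholder may be a better replacement than a space if they need to be restored later.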

Re: DRILL 1.4 - newline in strings not supported

2016-02-01 Thread Jacques Nadeau
We should enhance Drill's text reader so that you can disable splitting. Once done, an appropriately escaped newline character could be consumed. This is future work and I'm not aware of any way to solve this without this fix.

--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Mon, Feb 1, 2016 at 8:

Re: DRILL 1.4 - newline in strings not supported

2016-02-01 Thread Nicolas Paris
Splitting the csv on newlines that are not enclosed in quotes would be a solution, no? (I mean with a regex.) A valid csv containing newlines in texts must have quoted strings, I guess. It could then be a kind of csv config parameter, allowNewlineInTexts=true (like extractHeader, for example).

2016-02-01 17
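
The idea of splitting only on newlines outside quotes can be illustrated with a short sketch. It is not Drill code and does not implement the proposed allowNewlineInTexts parameter; it uses a simple character scan rather than a regex, and it assumes double quotes as the quote character with quotes escaped by doubling. The function name split_records is hypothetical.

```python
def split_records(text, quote='"'):
    """Split CSV text into records at newlines that fall outside quotes."""
    records = []
    current = []
    in_quotes = False
    for ch in text:
        if ch == quote:
            # A doubled quote ("") toggles twice, so the state is unchanged.
            in_quotes = not in_quotes
        if ch == "\n" and not in_quotes:
            # Record boundary: the newline is not inside a quoted field.
            records.append("".join(current))
            current = []
        else:
            current.append(ch)
    if current:
        # Keep a final record that has no trailing newline.
        records.append("".join(current))
    return records


sample = 'id,text\n1,"line one\nline two"\n2,"say ""hi"""\n'
records = split_records(sample)
for record in records:
    print(repr(record))
```

The scan has to start at the beginning of the file to know whether a given newline is inside quotes. A reader that starts at an arbitrary offset in the middle of the file cannot tell, which is the reason such files are hard to split for parallel reading.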

Re: DRILL 1.4 - newline in strings not supported

2016-02-01 Thread Ted Dunning
See inline.

On Mon, Feb 1, 2016 at 7:53 AM, Nicolas Paris wrote:
> ...
> @Ted,
>
> > If you have newlines in your files then the files become unsuitable for
> > splitting. This means that the only parallelism available in a CTAS
> > statement is multiple files
>
> Does it mean newlines ar