Re: RE: Error: DATA_READ ERROR: Error parsing JSON - Cannot read from the middle of a record

2018-08-27 Thread Paul Rogers
Hi Scott, I created a file, "test.json", using the data from your e-mail: [ { "var1": "foo", "var2":"bar"},{"var1": "fo", "var2": "baz"}] The oldest build I have readily available is Drill 1.13. I ran that as a server, then connected with sqlline as a client. I ran a query: select * from `test

Re: RE: Error: DATA_READ ERROR: Error parsing JSON - Cannot read from the middle of a record

2018-08-27 Thread scott
Paul, I'm using version 1.12. Can you tell me what version you think that was fixed in? The ticket I referenced is still open, with no comments. Scott On Mon, Aug 27, 2018 at 5:47 PM Paul Rogers wrote: > Hi David, > > JSON files are never splittable: there is no single-character way to find > t

Re: RE: Error: DATA_READ ERROR: Error parsing JSON - Cannot read from the middle of a record

2018-08-27 Thread Paul Rogers
Hi David, JSON files are never splittable: there is no single-character way to find the start of a JSON record within a file. Drill is supposed to support two JSON formats: the array format from the earlier post, and the non-JSON (but very common) list of objects format in this example. Thank

Re: Error: DATA_READ ERROR: Error parsing JSON - Cannot read from the middle of a record

2018-08-27 Thread Paul Rogers
Hi Scott, The code to handle top-level arrays is supposed to be in Drill already. I tested it for a not-yet-committed version of the JSON parser. I thought it worked in the current version as well... Just checked the unit tests. We have one, TestJsonRecordReader.testContainingArray that reads

RE: Error: DATA_READ ERROR: Error parsing JSON - Cannot read from the middle of a record

2018-08-27 Thread Lee, David
Get rid of the opening and closing brackets and see if you can turn the commas into newlines.. The file needs to be splittable I think to reduce memory overhead vs parsing a giant string... {"var1": "foo", "var2":"bar"} {"var1": "fo", "var2": "baz"} {"var1": "f2o", "var2": "baz2"} {"var1": "f3o
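The workaround described above (dropping the enclosing brackets and putting one object per line) can be sketched in Python. This is only an illustration using the two-record sample from the thread; the function name `array_to_json_lines` is hypothetical, and the sketch assumes the whole file fits in memory, which may not hold for large inputs.

```python
import json


def array_to_json_lines(text: str) -> str:
    """Convert a top-level JSON array of objects into one object per line.

    Hypothetical helper for illustration; it loads the whole array at once.
    """
    records = json.loads(text)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    # One serialized object per line, with a trailing newline.
    return "\n".join(json.dumps(record) for record in records) + "\n"


# Sample data taken from the thread.
sample = '[ { "var1": "foo", "var2":"bar"},{"var1": "fo", "var2": "baz"}]'
converted = array_to_json_lines(sample)
print(converted, end="")
```

The output has no brackets and no separating commas, which matches the list-of-objects layout suggested in the message.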

Error: DATA_READ ERROR: Error parsing JSON - Cannot read from the middle of a record

2018-08-27 Thread scott
Hi All, I'm getting an error querying some of my json files. The error I'm getting is: Error: DATA_READ ERROR: Error parsing JSON - Cannot read from the middle of a record. Current token was START_ARRAY The json files are in array format, like [ { "var1": "foo", "var2": "bar"},{"var1": "fo", "var2

Re: Love Drill - Hate Key Has String Token

2018-08-27 Thread Ted Dunning
Can you post a sample file with, say, 5-10 lines? Is it the file names? Or the data values that are giving you fits? On Mon, Aug 27, 2018, 12:51 John Folkers wrote: > Hello, I downloaded Drill over the weekend, and I love it. > > > Problem: $ string token in a key. > > > Question: How can I ge

Re: Love Drill - Hate Key Has String Token

2018-08-27 Thread Charles Givre
Hi John, Have you tried enclosing your field names in back ticks? IE SELECT `$field1`, `$field2` FROM… — C > On Aug 27, 2018, at 15:47, John Folkers wrote: > > Hello, I downloaded Drill over the weekend, and I love it. > > > Problem: $ string token in a key. > > > Question: How can I get
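When queries are generated from a script, the backtick suggestion above could be applied with a small helper. This is a rough sketch, not part of Drill: `quote_identifier` and `select_list` are hypothetical names, and rejecting names that already contain a backtick is a conservative assumption rather than a documented Drill rule.

```python
def quote_identifier(name: str) -> str:
    """Wrap a field name in backticks so characters such as $ are taken literally.

    Assumption: names containing a backtick are rejected instead of escaped.
    """
    if "`" in name:
        raise ValueError(f"field name contains a backtick: {name!r}")
    return f"`{name}`"


def select_list(fields: list[str]) -> str:
    """Build the comma-separated column list for a SELECT clause."""
    return ", ".join(quote_identifier(field) for field in fields)


# Field names taken from the example in the message.
columns = select_list(["$field1", "$field2"])
print("SELECT " + columns)
```

The FROM clause is left out on purpose, since the table path in the original message is not shown.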

Re: query performance with unequal drillbits

2018-08-27 Thread Ted Dunning
Paul, Thanks for the reality side of this. Configuring a system to handle unusual setups can definitely be a challenge. Btw, the general term for running several sub-scale workers on each node to allow more flexibility is "micro-sharding". On Mon, Aug 27, 2018 at 3:24 PM Paul Rogers wrote: >

Re: query performance with unequal drillbits

2018-08-27 Thread Paul Rogers
Hi All, For those following along who have not tried Ted's idea (running multiple Drillbits per host), note that when running two or more Drillbits per node, the admin is responsible for choosing non-conflicting port numbers. The port numbers are configured in drill-override.conf. See drill-ov

Love Drill - Hate Key Has String Token

2018-08-27 Thread John Folkers
Hello, I downloaded Drill over the weekend, and I love it. Problem: $ string token in a key. Question: How can I get Drill to not trip on the $ string token when it sees it inside the keyname? Error Message Error: DATA_READ ERROR: Failure while reading ExtendedJSON typed value. Expected a V

Re: [DISCUSS] Deprecation policy in Drill

2018-08-27 Thread salim achouche
Drill is a SQL engine, which means the SQL syntax and associated options (runtime configuration and session properties) constitute its user facing APIs (if I may say). When we talk about deprecating and then removing documented session / configuration properties within the same release, then what d

Re: [DISCUSSION] current project state

2018-08-27 Thread Paul Rogers
Hi Derich, From the shameless self promotion dept., Charles and I are wrapping up the O’Reilly book “Learning Apache Drill” that gives an in-depth discussion of format plugins and UDFs. We still have a need for docs on storage plugins. - Paul Sent from my iPhone > On Aug 27, 2018, at 9:04 AM,

Failure while reading messages from kafka

2018-08-27 Thread Matt
I have a Kafka topic with some non-JSON test messages in it, resulting in errors "Error: DATA_READ ERROR: Failure while reading messages from kafka. Recordreader was at record: 1" I don't seem to be able to bypass these topic messages with "store.json.reader.skip_invalid_records" or even an OFFSET

Re: [DISCUSSION] current project state

2018-08-27 Thread Carlos Derich
Hello guys, Thanks for bringing up this discussion. I may be a little bit late, but I would like to add a use case I've been through recently. I think Drill should be able to use ZK for storing session data. In a multiple Drillbit scenario, if a second Drillbit receives a request with a session

Re: Apache Drill High Availability using HAproxy

2018-08-27 Thread John Omernik
This is a great topic, that I have run into running Drill on Apache Mesos due to each of my bits having essentially a DNS load balancer. (One DNS Name, multiple Drill bits IPs assigned to them). That said, I've run into a few issues and have a few workarounds. Note, I am talking about the REST AP

Re: query performance with unequal drillbits

2018-08-27 Thread John Omernik
I will +1 Ted's idea. By doing small drillbits, it does take a bit more overhead, but you also have an ability to scale your Drill cluster size (especially using the Drillbit shutdown features added recently). On Wed, Aug 22, 2018 at 8:23 PM, Ted Dunning wrote: > Cool > > On Wed, Aug 22, 2018,

Python Module for using Apache Drill with Jupyter Notebooks

2018-08-27 Thread John Omernik
Hey all, I know this is shameless self promotion, however, I am proud of this little module I wrote that allows you to access Apache Drill via Magic Functions in Jupyter Notebooks and Jupyter Lab. It's a neat tool that gives quite a bit of flexibility in its approach. I wrote a blog on why/how

Re: [DISCUSS] Deprecation policy in Drill

2018-08-27 Thread Charles Givre
FWIW, since we seem to have a habit of letting deprecated items hang around in Drill for some time, I would be in favor of having Drill throw warnings in the next version for use of deprecated items (not just options) and then removing them in version n+2. — C > On Aug 27, 2018, at 07:01, weij

Re: [DISCUSS] Deprecation policy in Drill

2018-08-27 Thread weijie tong
I think we should retain these deprecated options to let users upgrade more easily. Another solution: if we remove these deprecated ones, we should add a startup check to let users know these options have been removed. On Mon, Aug 27, 2018 at 3:54 PM Arina Ielchiieva wrote: > Hi all, > > when it sho

[DISCUSS] Deprecation policy in Drill

2018-08-27 Thread Arina Ielchiieva
Hi all, when should it be considered OK to remove deprecated options / tables in Drill? Some projects mark some notion as deprecated in one release, and then remove it in the next. Will this policy be OK in Drill? Here are the two latest examples: 1. store.hive.optimize_scan_with_native_readers was in