Rafael, I'd like to echo what Paul said. It's super-helpful to know how people are actually using this and what features are needed. Thanks again! -- C
> On Apr 2, 2020, at 1:18 PM, Paul Rogers <[email protected]> wrote:
>
> Hi Rafael,
>
> Thanks for the update! We were thinking to try to finish up the current PR so
> we can get it merged into Drill. Then, we can add a simpler way to handle the
> extra message fields, and add the filter push-down code. We look forward to
> your continued advice as we add those additional features.
>
> Thanks,
> - Paul
>
>
> On Thursday, April 2, 2020, 7:40:08 AM PDT, Jaimes, Rafael - 0993 - MITLL
> <[email protected]> wrote:
>
> Hi all,
>
> Just an update after testing the HTTP REST plugin some more. It's working well.
> I'm not sure how common or standardized these operators are, but in case it is
> useful to you, I've been using the following:
>
>   =   =~   %3C   %3E   ..   ,
>   equals, not equals, less than, greater than, between, in
>
> Let me know if you have any questions or if additional testing would help.
>
> Thanks,
> Rafael
>
> -----Original Message-----
> From: Jaimes, Rafael - 0993 - MITLL <[email protected]>
> Sent: Wednesday, April 1, 2020 12:43 PM
> To: [email protected]
> Subject: RE: REST data source?
>
> Yes, that's correct. I saw the work you started with the env vars, but for now
> I set the proxy in the plugin.
>
> - Rafael
>
> -----Original Message-----
> From: Charles Givre <[email protected]>
> Sent: Wednesday, April 1, 2020 12:42 PM
> To: [email protected]
> Subject: Re: REST data source?
>
> Hey Rafael,
> Thanks for the feedback. My original idea was to pull the proxy from the
> environment vars HTTP_PROXY/HTTPS_PROXY and ALL_PROXY, but that part isn't
> quite done yet. Did you set the proxy info via the plugin config?
> -- C
>
>
>> On Apr 1, 2020, at 10:22 AM, Jaimes, Rafael - 0993 - MITLL
>> <[email protected]> wrote:
>>
>> Hi all,
>>
>> I built Charles' latest branch including the proxy setup. It appears to be
>> working quite well going through the proxy.
>>
>> I'll continue to test and report back if I find any issues.
>>
>> Note: Beyond Paul's repo recommendations, I had to skip checkstyle to get the
>> Maven build to complete. You're probably already aware of that; I think it's
>> just specific to this branch.
>>
>> Thanks!
>> Rafael
>>
>> -----Original Message-----
>> From: Paul Rogers <[email protected]>
>> Sent: Wednesday, April 1, 2020 1:29 AM
>> To: user <[email protected]>
>> Subject: Re: REST data source?
>>
>> Thanks, Charles.
>>
>> As Charles suggested, I pushed a commit that replaces the "old" JSON reader
>> with the new EVF-based one. Eventually this will allow us to use a "provided
>> schema" to handle any JSON ambiguities.
>>
>> As we've been discussing, I'll try to add the ability to specify a path to
>> data: "response/payload/records" or whatever. With the present commit, that
>> path can be parsed in code, but I think a simple path spec would be easier.
>>
>> Thanks,
>> - Paul
>>
>>
>> On Tuesday, March 31, 2020, 10:00:52 PM PDT, Charles Givre
>> <[email protected]> wrote:
>>
>> Hello all,
>> I pushed some updates to the REST PR to include initial work on proxy
>> configuration. I haven't updated the docs yet (until this is finalized). It
>> adds new config variables as shown below:
>>
>> {
>>   "type": "http",
>>   "cacheResults": true,
>>   "connections": {},
>>   "timeout": 0,
>>   "proxyHost": null,
>>   "proxyPort": 0,
>>   "proxyType": null,
>>   "proxyUsername": null,
>>   "proxyPassword": null,
>>   "enabled": true
>> }
>>
>> I started on getting Drill to recognize the proxy info from the environment,
>> but haven't quite finished that. The plan is for the plugin config to
>> override environment vars.
>> Feedback is welcome.
>>
>> @paul-rogers, I think you can push to my branch (or submit a PR?) and that
>> will be included in the main PR.
>> -- C
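
For reference, the six REST operators Rafael lists at the top of the thread line
up with standard SQL predicates, so the planned filter push-down would
essentially be translating WHERE clauses like the ones below into that URL
syntax. This is only a sketch; the http.`myservice` table reference and all
column names are hypothetical placeholders.

    -- Predicate shapes matching Rafael's operator list (= =~ %3C %3E .. ,).
    SELECT *
    FROM http.`myservice` AS t
    WHERE t.messageState = 'NEW'                            -- =    equals
      AND t.category <> 'archived'                          -- =~   not equals
      AND t.size < 500                                      -- %3C  less than
      AND t.confidence > 90                                 -- %3E  greater than
      AND t.created BETWEEN '2020-03-01' AND '2020-03-31'   -- ..   between
      AND t.color IN ('Red', 'Blue')                        -- ,    in
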
>>
>>
>>> On Mar 31, 2020, at 10:40 PM, Rafael Jaimes III <[email protected]> wrote:
>>>
>>> Yes, your initial assessment was correct: there is extra material other than
>>> the data field.
>>> The returned JSON has some top-level fields that don't go any deeper, akin
>>> to your "status": ok field. In the example I'm running now, one is called
>>> MessageState, which is set to "NEW". There's another field called
>>> MessageData, which, obviously, holds most of the data. There are some other
>>> top-level fields, and one is called MessageHeader, which is nested. There's
>>> a lot of stuff here, and this is just one "table" I'm querying against now.
>>> Not sure how it will differ with the other services.
>>>
>>> The service is definitely returning multiple records - I believe it's a
>>> JSON array, and Drill plus the HTTP plugin appears to handle it quite well.
>>>
>>> You're right, Drill is handling most of the structure by modifying my
>>> SELECT statement as you suggested.
>>>
>>> For filter pushdown, expressions of that form would be great. That's what
>>> I had in mind too.
>>>
>>> Thanks,
>>> Rafael
>>>
>>> On Tue, Mar 31, 2020 at 10:14 PM Paul Rogers <[email protected]> wrote:
>>>
>>>> Hi Rafael,
>>>>
>>>> Thanks much for the info. We had already implemented filter push-down for
>>>> other plugins, and for a few custom REST APIs, so it should be possible to
>>>> port it over to the HTTP plugin. If you can supply code, then you can
>>>> convert filters to anything you want, a specialized JSON request body, etc.
>>>>
>>>> To do this generically, we have to make some assumptions, such as either
>>>> 1) all fields can be pushed as query parameters, or 2) only those in some
>>>> config list. Either way, we know how to create name=value pairs in either
>>>> a GET or POST format.
>>>>
>>>> You mentioned that your "payload" objects are structured. Drill can
>>>> already handle this; your query can map them to the top level:
>>>>
>>>> SELECT t.characteristic.color.name AS color_name,
>>>>        t.characteristic.color.confidence AS color_confidence, ...
>>>> FROM yourTable AS t
>>>>
>>>> You'll get that "out of the box." Drill does assume that data is in
>>>> "record format": a single list of objects which represent records. Code
>>>> would be needed to handle, say, two separate lists of objects or other,
>>>> more-general, JSON structures.
>>>>
>>>> My specific question was more around the response from your web service.
>>>> Does that have extra material besides just the data records? Something
>>>> like:
>>>>
>>>> { "status": "ok", "data": [ {characteristic: ... }, {...} ] }
>>>>
>>>> Or is the response directly an array of objects:
>>>>
>>>> [ {characteristic: ... }, {...} ]
>>>>
>>>> If it is just an array, then the "out of the box" plugin will work. If
>>>> there is other stuff, then you'll need the new feature to tell Drill how
>>>> to find the path to your data. The present version needs code, but I'm
>>>> thinking we can just use an array of names in the plugin config:
>>>>
>>>> dataPath: [ "data" ],
>>>>
>>>> Or, in your case, do you get a single record per HTTP request? If a single
>>>> record, then either your queries will be super-simple, or performance will
>>>> be horrible when requesting multiple records.
>>>> (The HTTP plugin only does one request and assumes it will get back a set
>>>> of records as a JSON array or as whitespace-separated JSON objects as in a
>>>> JSON file.)
>>>>
>>>> Can you clarify a bit which of these cases your data follows?
>>>>
>>>> I like your idea of optionally supplying a parser class for the "hard"
>>>> cases:
>>>>
>>>> messageParserClass: "com.mycompany.drill.MyMessageParser",
>>>>
>>>> As long as the class is on the classpath, Java will find it.
>>>>
>>>> Finally, on the filter push-down, the existing code we're thinking of using
>>>> can handle expressions of the form:
>>>>
>>>> column op constant
>>>>
>>>> where "op" is one of the relational operators: =, !=, <, etc. It also
>>>> handles the obvious variations (constant op column, column BETWEEN const1
>>>> AND const2, column IN (const1, const2, ...)).
>>>>
>>>> The code cannot handle expressions (due to a limitation in Drill itself).
>>>> That is, this won't work as a filter push-down: col = 10 + 2 or col + 2 = 10.
>>>> Nor can it handle multi-column expressions: column1 = column2, etc.
>>>>
>>>> I'll write up something more specific so you can see exactly what we
>>>> propose.
>>>>
>>>> Thanks,
>>>> - Paul
>>>>
>>>>
>>>> On Tuesday, March 31, 2020, 6:39:57 PM PDT, Rafael Jaimes III
>>>> <[email protected]> wrote:
>>>>
>>>> Either a text description of the parse path or specifying the class with
>>>> the message parser could work.
>>>> I think the latter would be better, if it were as simple as dropping the
>>>> JAR in 3rdparty after Drill is already built.
>>>> That way we can just continually add parsers ad hoc.
>>>>
>>>> An example JSON response includes about 4 top-level fields, then 2 of those
>>>> fields have many sub-fields.
>>>> For example, a field could be nested 3 levels deep and say:
>>>>
>>>> Characteristic:
>>>>   Color:
>>>>     Color name: "Red"
>>>>     Confidence: 100
>>>>   Physical:
>>>>     Size: 405
>>>>     Confidence: 95
>>>>
>>>> As you can imagine, it would be difficult to flatten this because of
>>>> repeated sub-field names like "Confidence".
>>>>
>>>> I don't think it would be easily exportable into a CSV.
>>>> At least for me, a pandas DataFrame is the ultimate destination for all of
>>>> this, and DataFrames don't handle nested fields well either.
>>>> I'll have to handle some parsing on my end.
>>>>
>>>> Filter pushdown would be huge and much desired.
>>>> Our other end users are accustomed to using SQL in that manner, and the
>>>> REST API we use fully supports AND, OR, BETWEEN, =, <, >, etc. (I can get a
>>>> full list if you're interested).
>>>> For example, I think "between" is a ",". Converting the SQL statement into
>>>> the URL format would be awesome and would help streamline querying across
>>>> data sources.
>>>> This is one of the main reasons why we're so interested in Drill.
>>>>
>>>> Thanks,
>>>>
>>>> Rafael
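
Applying the dotted-path mapping Paul described to the example structure Rafael
gives above, the repeated Confidence sub-fields can be kept apart with aliases.
This is only a sketch under assumptions: the table reference is made up, and
the exact JSON key spellings (for instance whether "Color name" really contains
a space) are guesses.

    SELECT t.Characteristic.Color.`Color name`   AS color_name,
           t.Characteristic.Color.Confidence     AS color_confidence,
           t.Characteristic.Physical.Size        AS physical_size,
           t.Characteristic.Physical.Confidence  AS physical_confidence
    FROM http.`myservice` AS t

Aliasing each leaf gives a flat result that would drop cleanly into a CSV or a
pandas DataFrame without the name collisions Rafael mentions.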

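To make Paul's push-down constraints concrete, here is a rough sketch of
predicates that would qualify for push-down versus ones Drill would have to
evaluate itself after fetching the data (again, the table and column names are
hypothetical):

    -- Eligible: column op constant, plus the BETWEEN and IN variations.
    SELECT * FROM http.`myservice` WHERE size > 400;
    SELECT * FROM http.`myservice` WHERE size BETWEEN 100 AND 500;
    SELECT * FROM http.`myservice` WHERE color IN ('Red', 'Blue');

    -- Not eligible for push-down (per the limitations Paul describes):
    SELECT * FROM http.`myservice` WHERE size = 400 + 5;     -- expression on the constant side
    SELECT * FROM http.`myservice` WHERE size + 5 = 405;     -- expression on the column side
    SELECT * FROM http.`myservice` WHERE size = confidence;  -- multi-column comparison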