Hi Rafael,

Thanks much for the info. We had already implemented filter push-down for other 
plugins, and for a few custom REST APIs, so should be possible to port it over 
to the HTTP plugin. If you can supply code, then you can convert filters to 
anything you want, a specialized JSON request body, etc. To do this 
generically, we have to make some assumptions, such as either 1) all fields can 
be pushed as query parameters, or 2) only those in some config list. Either 
way, we know how to create name=value pairs in either a GET or POST format.

You mentioned that your "payload" objects are structured. Drill can already 
handle this; your query can map them to the top level:

SELECT t.characteristic.color.name AS color_name,   
t.characteristic.color.confidence AS color_confidence, ...   FROM yourTable AS t

You'll get that "out of box." Drill does assume that data is in "record 
format": a single list of objects which represent records. Code would be needed 
to handle, say, two separate lists of objects or other, more-general, JSON 
structures.


My specific question was more around the response from your web service. Does 
that have extra material besides just the data records? Something like:


{ "status": "ok", "data": [ {characteristic: ... }, {...}] }

Or, is the response directly an array of objects:

 [ {characteristic: ... }, {...}]


If it is just an array, then the "out of the box" plugin will work. If there is 
other stuff, then you'll need the new feature to tell Drill how to find the 
field to your data. The present version needs code, but I'm thinking we can 
just use an array of names in the plugin config:

dataPath: [ "data" ],

Or, in your case, do you get a single record per HTTP request? If a single 
record, then either your queries will be super-simple, or performance will be 
horrible when requesting multiple records. (The HTTP plugin only does one 
request and assumes it will get back a set of records as a JSON array or as 
whitespace-separated JSON objects as in a JSON file.)

Can you clarify a bit which of these cases your data follows?

I like your idea of optionally supplying a parser class for the "hard" cases:

messageParserClass: "com.mycompany.drill.MyMessageParser",

As long as the class is on the classpath, Java will find it.

Finally, on the filter push-down, the existing code we're thinking of using can 
handle expressions of the form:

column op constant

Where "op" is one of the relational operators: =, !=, < etc. Also handles the 
obvious variations (const op constant, column BETWEEN const1 AND const2, column 
IN (const1, const2, ...)).

The code cannot handle expressions (due to a limitation in Drill itself). That 
is, this won't work as a filter push-down: col = 10 + 2 or col + 2 = 10. Nor 
can it handle multi-column expressions: column1 = column2, etc.


I'll write up something more specific so you can see exactly what we propose.


Thanks,
- Paul

 

    On Tuesday, March 31, 2020, 6:39:57 PM PDT, Rafael Jaimes III 
<[email protected]> wrote:  
 
 Either a text description of the parse path or specifying the class
with the message parser could work.
I think the latter would be better, if it were simple as dropping the
JAR in 3rdparty after Drill is already built.
That way we can just continually add parsers ad-hoc.

An example JSON response includes about 4 top-level fields,
then 2 of those fields have many sub-fields.
For example a field could be nested 3 levels deep and say:

Characteristic:

  Color:

      Color name: "Red"

      Confidence: 100

  Physical:

      Size: 405

      Confidence:  95

As you can imagine, it would be difficult to flatten this because of
repeated sub-field names like "Confidence".

I don't think it would be easily exportable into a CSV.
At least for me pandas dataframe is the ultimate destination for all
of this, which also don't handle nested fields well either.
I'll have to handle some parsing on my end.

Filter pushdown would be huge and much desired.
Our other end-users are accustomed to using SQL in that manner and the
REST API we use fully support AND, OR, BETWEEN, =, <, >, etc (I can
get a full list if you're interested).
For example I think "between" is a ",". Converting the SQL statement
into the URL format would be awesome and help streamline querying
across data sources.
This is one of the main reasons why we're so interested in Drill.


Thanks,

Rafael
  

Reply via email to