Re: REST data source?

Paul Rogers Tue, 31 Mar 2020 14:02:13 -0700

Hi Rafael,

You mention that your JSON response is nested. As it turns out, I just used 
something similar to Charles's HTTP plugin for a recent project. We had to deal 
with a bit of message overhead to get to the data:


{"status": "ok", "data": [ your data here ]}

A PR was just submitted for a change to the "new" JSON parser to handle this 
case. However, the "message parser" does require code to parse its way down 
through the JSON.

The next step is to upgrade Charles's PR with the new JSON reader and support 
for the message parser. (The new JSON reader also allows you to specify a 
schema to handle messy JSON, if we can figure out where to store the schema.)

Can you perhaps share the JSON response structure you need? I'm trying to 
figure out if it is better to work out some kind of text description of the 
parse path, or just let you specify the name of a class that implements the 
message parser. Which would work better for you?
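To make the "text description of the parse path" option concrete, here is a minimal Python sketch (not the plugin's code): a dotted path such as `data` names the fields to descend through, and whatever sits at the end of the path is returned as the list of records. The function name `extract_records` and the dotted-path syntax are hypothetical.

```python
import json
from typing import Any, List


def extract_records(response_text: str, parse_path: str) -> List[Any]:
    """Descend a dotted parse path (hypothetical syntax, e.g. "data" or
    "result.items") through a JSON message and return the records found there."""
    node: Any = json.loads(response_text)
    for key in filter(None, parse_path.split(".")):
        if not isinstance(node, dict) or key not in node:
            raise KeyError(f"parse path {parse_path!r} fails at {key!r}")
        node = node[key]
    # Treat a single object at the end of the path as a one-record result.
    return node if isinstance(node, list) else [node]


if __name__ == "__main__":
    message = '{"status": "ok", "data": [{"id": 1}, {"id": 2}]}'
    print(extract_records(message, "data"))  # [{'id': 1}, {'id': 2}]
```

A class-based message parser would replace the path walk with arbitrary code (for example, checking `status` before returning `data`); a path string is simpler to configure but can only express this kind of fixed descent.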

We are also trying to update an earlier ill-fated PR that adds filter 
push-down: the ability to convert a SQL WHERE expression into an HTTP 
parameter. That is, WHERE foo = 'bar' becomes &foo=bar in the URL. It is easy to 
implement the "naive" approach that handles only equality and does a direct 
mapping to HTTP query params. Would this be useful in your case? Do you need to 
parameterize your HTTP request?
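A minimal Python sketch of the naive, equality-only mapping, under stated assumptions: the `(column, operator, value)` predicate tuples, the function name `push_down_equalities`, and the example URL are hypothetical and do not reflect how Drill's planner represents filters. Values are URL-encoded with `urllib.parse.urlencode`.

```python
from typing import List, Tuple
from urllib.parse import urlencode

# Hypothetical predicate form: (column, operator, value), e.g. ("foo", "=", "bar").
Predicate = Tuple[str, str, str]


def push_down_equalities(base_url: str,
                         predicates: List[Predicate]) -> Tuple[str, List[Predicate]]:
    """Naive filter push-down: each equality predicate becomes a query
    parameter; every other predicate is returned so the engine can still
    apply it to the rows the API sends back."""
    pushed = [(col, val) for col, op, val in predicates if op == "="]
    remaining = [p for p in predicates if p[1] != "="]
    if not pushed:
        return base_url, remaining
    separator = "&" if "?" in base_url else "?"
    return base_url + separator + urlencode(pushed), remaining


if __name__ == "__main__":
    url, rest = push_down_equalities(
        "https://api.example.com/items",
        [("foo", "=", "bar"), ("n", ">", "5")],
    )
    print(url)   # https://api.example.com/items?foo=bar
    print(rest)  # [('n', '>', '5')]
```

Returning the non-equality predicates instead of dropping them keeps a design like this correct: only the filters the API can evaluate are pushed down, and the rest are still applied after the fetch. It also assumes the API's matching semantics (case, type coercion) agree with SQL equality, which would need checking per service.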


Any real-world insight would be helpful.

Thanks,
- Paul

 

    On Tuesday, March 31, 2020, 1:40:17 PM PDT, Jaimes, Rafael - 0993 - MITLL 
<[email protected]> wrote:  
 
 Ok, I commented in that thread.
I think the proxy is the only missing piece. I tried connecting to a different 
service that is inside the proxy and it worked as expected. This looks like it 
will work well for our application.

FYI, although it has basic auth, I am not using the authType field in the 
storage config.
Rather, our service authenticates from the header in this format: 
{"Authentication": "Basic <token>"}.
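For reference, standard HTTP Basic authentication (RFC 7617) sends `Basic` followed by the Base64 encoding of `user:password`. Assuming this service builds its token the same way (the message does not say), a minimal Python sketch of the header follows; the function name and credentials are placeholders. The header name defaults to the one quoted above, while the standard header is named `Authorization`.

```python
import base64
from typing import Dict


def basic_auth_header(user: str, password: str,
                      header_name: str = "Authentication") -> Dict[str, str]:
    """Build a Basic credentials header (hypothetical helper).

    The default header name matches the message above; RFC 7617 Basic auth
    normally uses "Authorization"."""
    raw = f"{user}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {header_name: f"Basic {token}"}


if __name__ == "__main__":
    print(basic_auth_header("alice", "secret"))
    # {'Authentication': 'Basic YWxpY2U6c2VjcmV0'}
```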

The response JSON is nested quite a bit but I think it can be fixed by 
modifying the SELECT as you have done in your examples.

Thanks,
Rafael

-----Original Message-----
From: Charles Givre <[email protected]> 
Sent: Tuesday, March 31, 2020 3:27 PM
To: [email protected]
Subject: Re: REST data source?

Rafael,
At the moment the plugin does not support proxy servers. However, this is 
pretty easy to implement using the current libraries. Could you please add a 
comment to the PR for the plugin (https://github.com/apache/drill/pull/1892) 
with some explanation of what you need?
Thanks,
-- C

> On Mar 31, 2020, at 3:21 PM, Jaimes, Rafael - 0993 - MITLL 
> <[email protected]> wrote:
> 
> Hi Paul,
> 
> I tried that (I had even tried a vanilla build on its own before) and I ran 
> into the same dependency problem. There is something in apache-21.pom that I 
> cannot resolve. If it works for you, I am certain it is a configuration issue 
> on our end, due to the way our proxies and mirrors are set up; we have to go 
> through these internal channels when building, and it sometimes causes issues.
> 
> Charles,
> 
> I am almost up and running with your pre-built instance. I have narrowed the 
> problem down to what may be another proxy issue. The GET requests don't 
> seem to be honoring the proxy settings in my system environment variables. 
> Do you think there's any way to force Drill or the plug-in to use a proxy? 
> I'm unable to get the examples you posted working: I get a "Connection reset" 
> error with HTTPS and a connect timeout with HTTP. The URLs work fine if I test 
> them outside of Drill.
> 
> Thanks,
> Rafael
> 
> -----Original Message-----
> From: Paul Rogers <[email protected]>
> Sent: Tuesday, March 31, 2020 2:36 PM
> To: [email protected]
> Subject: Re: REST data source?
> 
> Hi Rafael,
> 
> The easiest way to build the plugin will be to build all of Drill 1.18 
> Snapshot with the plugin included.
> 
> 1. Grab master from GitHub.
> 
> 2. Merge in Charles's PR branch.
> 
> 3. mvn clean install -DskipTests
> 
> The above usually works for me. This process ensures that all the snapshot 
> versions come from your own build.
> 
> Not sure how we started storing snapshot versions in a Maven repo. 
> This causes issues. If you rebuild part of Drill, and have not built 
> the other parts in more than a day, Maven helpfully downloads the 
> snapshots from the repo, causing all kinds of chaos. (We should fix 
> this.)
> 
> Once you do the build, you'll have a full Drill distribution, just like you'd 
> download. You can use that distribution to run Drill with the plugin included.
> 
> There are other ways that also work; the above may be the simplest.
> 
> 
> Thanks,
> - Paul
> 
> 
> 
>    On Tuesday, March 31, 2020, 10:51:18 AM PDT, Jaimes, Rafael - 0993 - MITLL 
><[email protected]> wrote:  
> 
> Hi Charles,
> 
> (1./2.)
> I have not been able to build Drill from either a full clone of your tagged 
> http-storage branch or the standard Drill 1.17 release. 
> I've narrowed it down to some dependency problems from the POM. In 
> particular, I run into issues here:
> 
> Downloading: https://repo.maven.apache.org/maven2/org/apache/apache/21/apache-21.pom
> [ERROR] The build could not read 1 project -> [Help 1]
> [ERROR]
> [ERROR]   The project org.apache.drill:drill-root:1.18.0-SNAPSHOT (/home/ra29435/drill-official/drill/pom.xml) has 1 error
> [ERROR]     Non-resolvable parent POM: Could not transfer artifact org.apache:apache:pom:21 from/to conjars (http://conjars.org/repo): Connection to http://conjars.org refurelativePath' points at no local POM @ line 24, column 11: Connection timed out (Connection timed out) -> [Help 2]
> [ERROR]
> [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR]
> [ERROR] For more information about the errors and possible solutions, please read the following articles:
> [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/ProjectBuildingException
> [ERROR] [Help 2] http://cwiki.apache.org/confluence/display/MAVEN/UnresolvableModelException
> 
> I think it has something to do with the fact that I normally resolve 
> dependencies from our local Maven repo mirrors. We have no problems getting 
> stuff from Maven Central and common places, but I am unfamiliar with 
> conjars.org. I wonder if it is related to that?
> 
> (3./4.)
> I tried putting the JAR into either jars/ or jars/3rdparty with the same 
> error. I haven't gone down the dependency tree, so I have not made any JARs of 
> them; that could be a major thing I'm missing.
> 
> Yes, this is still in a testing environment. I'm going to use your pre-built 
> images for testing the REST endpoint; this is extremely helpful. If it works 
> out, I'll go back to trying to build it. I'm also hoping that this will make 
> its way into the next (1.18) release.
> 
> Best,
> Rafael
> 
> -----Original Message-----
> From: Charles Givre <[email protected]>
> Sent: Tuesday, March 31, 2020 1:34 PM
> To: user <[email protected]>
> Subject: Re: REST data source?
> 
> Hi Rafael,
> Glad you're getting some value from Drill.  Repackaging that directory as a 
> truly pluggable jar is tricky.  A few questions:
> 1.  Did you copy the contrib/storage-http into its own folder and then do a 
> build from that?
> 2.  Did it build successfully?
> 3.  Did you copy the JARs into your Drill jars/3rdparty folder?
> 4.  You'll also have to get JARs of any dependencies and copy them to 
> jars/3rdparty.  Have you done that?
> 
> I actually have a pre-built version of Drill with the storage-http plugin 
> available here (https://github.com/cgivre/drill/releases). Please do not use 
> that in any kind of production setup. If you just want to try this out, it 
> might be easier to download that and use it.
> -- C
> 
> 
> 
>> On Mar 31, 2020, at 12:57 PM, Jaimes, Rafael - 0993 - MITLL 
>> <[email protected]> wrote:
>> 
>> Hi Charles,
>> 
>> I am trying to use the http-storage plugin from your branch. I put the 
>> storage plug-in files in a jar and tried to keep the jar directory structure 
>> the same as other plug-ins. Upon starting drill-embedded, I’m getting the 
>> error below. I am using your drill-module.conf and 
>> bootstrap-storage-plugins.json from your branch. Is there another step I 
>> need to perform to get Drill to recognize the plug-in? I am using the 1.17 
>> release.
>> 
>> Error: Failure in starting embedded Drillbit: 
>> java.lang.IllegalStateException: 
>> com.fasterxml.jackson.databind.exc.InvalidTypeIdException: Could not 
>> resolve type id 'http' as a subtype of [simple type, class
>> org.apache.drill.common.logical.StoragePluginConfig]: known type ids 
>> = [InfoSchemaConfig, SystemTablePluginConfig, file, hbase, hive, 
>> jdbc, kafka, kudu, mock, mongo, named, openTSDB] (for POJO property
>> 'storage') at [Source: (String)"{
>> "storage":{
>>  "http" : {
>>    "type":"http",
>>    "connections": {},
>>    "enabled": false
>>  }
>> }
>> }
>> "; line: 4, column: 14] (through reference chain: 
>> org.apache.drill.exec.planner.logical.StoragePlugins["storage"]->java.
>> util.LinkedHashMap["http"]) (state=,code=0)
>> 
>> Paul,
>> 
>> I don’t know much about this REST service quite yet (it is internal). We 
>> use REST APIs in many places, and all responses are returned as 
>> JSON-formatted strings; I don’t think the service is very sophisticated. I am 
>> not sure how it will handle projection and filter issues. My current pipeline 
>> involves using Python's requests.get() and then unpacking the response 
>> string. It does have an authentication layer, so I am mildly concerned that 
>> the HTTP storage plugin will have a hiccup – although it looks like it can 
>> use “Basic”. If I can get Drill to query the endpoint, I will report back if 
>> I find anything else that might be useful to you.
>> 
>> Thanks both for your great work with Drill!
>> 
>> - Rafael
>
