Re: Drill + Accumulo?

Paul Rogers Wed, 19 Feb 2020 15:42:23 -0800

Hi Ron,

Given that I know next to nothing about Accumulo other than what I just 
learned from a Google search, the answer appears to be no. The best approach 
would be to write a connector that exploits Accumulo's new 2.0 AccumuloClient 
API, perhaps along with the new Scan Executors. See [1].


I thought I had read in the past that Accumulo grew out of some other project, 
but the project history in chapter 1 of the O'Reilly Accumulo book [2] suggests 
that Accumulo was built independently. That said, to the degree that Accumulo 
has similarities with HBase, another path is to fork/extend Drill's HBase 
connector.

Yet another solution would be to use a REST API, leveraging the REST connector 
that is being built. However, a quick review of the "Accumulo Clients" page of 
the docs [3] does not suggest that Accumulo ships with a REST API. Perhaps some 
other project has added one? Of course, this just trades the work of creating 
an Accumulo connector for the work of getting a generic REST connector to issue 
the requests, and read the responses, that the REST proxy might use. Not sure 
that is a huge win.

Accumulo appears to have Hive integration [4], as does Drill. I wonder if that 
is a possible path? I'm not very familiar with how Drill reads data from Hive, 
but if we use Hive's record format, and Accumulo can produce that format, there 
might be a path. Not sure how things like filter push-down would be handled.
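
To make the filter push-down concern concrete, here is a toy sketch in plain 
Java. It uses neither Drill's nor Accumulo's APIs: a TreeMap stands in for a 
sorted key-value table, and the class and method names (PushDownSketch, 
scanThenFilter, rangeScan) are made up for illustration. The assumption is only 
that the store keeps rows sorted by key, so a row-key predicate can be turned 
into a range.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class PushDownSketch {

    /** Matching row keys, plus how many entries the "store" had to hand over. */
    static final class Result {
        final List<String> rows = new ArrayList<>();
        int entriesRead = 0;
    }

    /** Hypothetical stand-in for a sorted table: row00 .. row09. */
    static NavigableMap<String, String> sampleTable() {
        NavigableMap<String, String> table = new TreeMap<>();
        for (int i = 0; i < 10; i++) {
            table.put(String.format("row%02d", i), "value" + i);
        }
        return table;
    }

    /** No push-down: the engine reads every entry and filters afterwards. */
    static Result scanThenFilter(NavigableMap<String, String> table,
                                 String lo, String hi) {
        Result r = new Result();
        for (Map.Entry<String, String> e : table.entrySet()) {
            r.entriesRead++;
            String key = e.getKey();
            if (key.compareTo(lo) >= 0 && key.compareTo(hi) <= 0) {
                r.rows.add(key);
            }
        }
        return r;
    }

    /** Push-down: the key range goes to the store, which returns only that slice. */
    static Result rangeScan(NavigableMap<String, String> table,
                            String lo, String hi) {
        Result r = new Result();
        for (Map.Entry<String, String> e
                : table.subMap(lo, true, hi, true).entrySet()) {
            r.entriesRead++;
            r.rows.add(e.getKey());
        }
        return r;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = sampleTable();
        Result full = scanThenFilter(table, "row03", "row05");
        Result pushed = rangeScan(table, "row03", "row05");
        System.out.println("scan+filter read " + full.entriesRead + " entries");
        System.out.println("range scan read " + pushed.entriesRead + " entries");
    }
}
```

Both methods return the same rows; the difference is how much data crosses the 
boundary between the store and the query engine. Whether a Hive-based path 
would let Drill hand such a range to Accumulo is exactly the open question.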

All this said, Drill is designed to allow connectors. The API is not as simple 
as we'd like (we're working on it), but if you need SQL access to Accumulo, 
writing a connector is a possible path.

Finally, if you need a SQL solution today, there is Presto, which already has 
an Accumulo connector [5].


Thanks,
- Paul

[1] https://accumulo.apache.org/release/accumulo-2.0.0/

[2] https://learning.oreilly.com/library/view/accumulo/9781491947098/

[3] https://accumulo.apache.org/docs/2.x/getting-started/clients

[4] https://cwiki.apache.org/confluence/display/Hive/AccumuloIntegration

[5] https://prestosql.io/docs/current/connector/accumulo.html

    On Wednesday, February 19, 2020, 2:40:17 PM PST, Ron Cecchini 
<[email protected]> wrote:  
 
 (Keeping in mind that I know next to nothing about Accumulo or Hive, etc...)

Is there currently (i.e. without writing a connector) any way whatsoever to get 
Drill to query an Accumulo db? 

Thanks.
