I wonder what would happen if you used the Simba JDBC connector with Drill.  Is 
Drill's JDBC storage plugin better at pushing down filters and joins than its 
Mongo connector?


This might help me solve some of my own problems, although I'm not sure whether 
pricing issues might come up with an auto-scaling environment.


On 3/4/2020 11:31:29 PM, Paul Rogers wrote:
Hi Ron,

Sounds like the good news is that Drill is about as good as Presto when 
querying Mongo. Sounds like the bad news is that both are equally deficient. On 
the other hand, the other good news is that better performance is just a matter 
of adding additional planning rules (with perhaps some Mongo metadata).


The Wikipedia page for Mongo [1] suggests several features that Mongo (Simba) 
is probably using in their own JDBC driver, but which Drill probably does not 
use:

* Primary and secondary indices
* Field, range query, and regular-expression searches
* User-defined JavaScript functions
* Three ways to perform aggregation: the aggregation pipeline, the map-reduce 
function, and single-purpose aggregation methods.

My guess is that the Mongo JDBC driver does thorough planning to exploit each 
of the above functions, while Drill may use only a few. We already noted other 
weaknesses in the filter push-down code for the Drill Mongo plugin. Seems 
fixable if we can put in the effort.
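
As a rough illustration of what "filter push-down" means here, the sketch 
below turns simple WHERE predicates into a Mongo filter document, so the 
server (and its indices) can do the filtering instead of the query engine 
scanning every document. This is only a sketch under stated assumptions: the 
function name push_down_filter and the (field, op, value) predicate shape are 
hypothetical, and this is not how Drill's Mongo plugin or the Simba driver is 
actually implemented. The $eq/$ne/$gt/$gte/$lt/$lte/$and operators are 
standard Mongo query operators.

```python
# Hypothetical sketch: not Drill's or Simba's actual planning code.
# Maps SQL comparison operators to standard MongoDB query operators.
SQL_TO_MONGO_OP = {
    "=": "$eq",
    "!=": "$ne",
    ">": "$gt",
    ">=": "$gte",
    "<": "$lt",
    "<=": "$lte",
}


def push_down_filter(predicates):
    """Translate (field, op, value) predicates, implicitly ANDed together,
    into a MongoDB filter document.

    Returns (mongo_filter, residual). The residual list holds predicates
    that could not be translated and would still have to be evaluated by
    the query engine after the scan.
    """
    clauses = []
    residual = []
    for field, op, value in predicates:
        mongo_op = SQL_TO_MONGO_OP.get(op)
        if mongo_op is None:
            # Unsupported operator: leave it for the engine to apply.
            residual.append((field, op, value))
        else:
            clauses.append({field: {mongo_op: value}})

    if not clauses:
        mongo_filter = {}  # empty filter matches every document
    elif len(clauses) == 1:
        mongo_filter = clauses[0]
    else:
        mongo_filter = {"$and": clauses}
    return mongo_filter, residual
```

For example, calling push_down_filter with the predicates 
("status", "=", "open") and ("name", "LIKE", "A%") would push the first one 
down to Mongo and hand the second back as residual. The more predicates end 
up in the residual list, the more documents have to be shipped out of Mongo 
and filtered afterwards, which is the kind of gap extra planning rules would 
close.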


Seems Mongo provides a Simba JDBC driver, which is proprietary, so there is no 
source code available that we could use as a "cheat sheet" to see what's what.


Just out of curiosity, what is the query that works well with the Mongo JDBC 
driver, but poorly with Drill?

Anybody know more about how Mongo works and what Drill might be missing?


Thanks,
- Paul

[1] https://en.wikipedia.org/wiki/MongoDB





On Wednesday, March 4, 2020, 9:28:44 PM PST, Ron Cecchini wrote:

Hi, guys.

This is actually more of a Mongo question than a Drill-specific question as it 
also applies to Presto + Mongo, and the vanilla Mongo shell as well.

I'm asking here, though, because, well, I'm curious, and because you're the 
database geniuses...

So, I essentially get why a NoSQL database, in general, wouldn't be as 
performant as a SQL one at "relational" things. From what I gather, there are 
denormalization and optimization techniques and tricks you can use to speed up 
a Mongo query and so forth, but my question is:

Why is it that any Drill/Presto + Mongo CLI or JDBC query against a large 
collection (100-200 million documents) that includes even a single WHERE 
clause, or the Mongo equivalent query made via Mongo shell, basically never 
returns and has to be killed, whereas the same (Mongo equivalent) query against 
the same collection made via *Mongo's* JDBC driver takes only a second or two?

Is the Mongo JDBC using some indexing that the others aren't? (But how would 
that explain Mongo shell's non-performance... Why doesn't Mongo shell just make 
a JDBC call to the db...)

Thank you in advance for educating me.

Ron
