Hi Paul,

A simple filter I tried was:

WHERE createdAt > TIMESTAMP "2020-02-25"

This wasn't pushed down. I think I recall running another query where Drill did send a filter to MongoDB, so I was curious what I could expect to be applied at the MongoDB level and what would not.

Would Drill be able to do joins between queries where it pushes down filters for the elements that were found? By the sounds of it, this may be quite far off, which does reduce Drill's appeal vs. competitors to some degree.

I had hoped that Drill could intelligently merge historical data saved as Parquet with the latest data in MongoDB, giving a kind of hybrid reporting approach that provides current data without overloading MongoDB by pulling millions of historical records. However, it sounds like this is not supported yet, and likely won't be for some time.

On 2/25/2020 8:19:19 PM, Paul Rogers <[email protected]> wrote:

Hi Dobes,

Your use case is exactly the one we hope Drill can serve: integrate data from multiple sources. We may have to work on Drill a bit to get it there, however.

A quick check of the Mongo plugin shows that it does implement filter push-down. Check out the class MongoPushDownFilterForScan. The details appear to be in MongoFilterBuilder. This particular implementation appears to be rather limited: it seems to push either ALL filters or none. A more advanced implementation would push those it can handle, leaving the rest to Drill. There may be other limitations; it depends on what the plugin author implemented.

What kind of query did you run where you saw no push-down? And how did you check the plan? Using an EXPLAIN PLAN FOR ... command? If filters are, in fact, pushed down, there has to be some trace in the JSON plan (in some Mongo-specific format). Given the all-or-nothing limitation of the Mongo plugin implementation, maybe try the simplest possible query, with a single filter such as classID = 10.

Filter push-down is a common operation, but most implementations are currently (incomplete) copy/pastes of other (incomplete) implementations. We're working to fix that.
We had a PR for the standard (col RELOP const) cases, but reviewers asked that it be made more complete. The PR does handle partial filter push-down. Perhaps, as we move forward, we can apply the same ideas to Mongo.

Thanks,
- Paul

On Tuesday, February 25, 2020, 5:27:53 PM PST, Dobes Vandermeer wrote:

Hi,

I am trying to understand Drill's performance and how we can best use it for our project. We use Mongo as our primary "live" database, and I am looking at syncing data to Amazon S3 and using Drill to run reports off of that. I was hoping that I could have Drill connect directly to Mongo for some things.

For example: our software is used to collect responses from school classrooms. I thought that if I was running a report for students in a given class, I could build the list of students at a school using a query to MongoDB. I wanted to verify that Drill would push down filters when doing a join, maybe first collecting a list of ids it is interested in and using that as a filter when it scans the next Mongo collection. However, when I look at the physical plan I don't see any evidence that it would do this; it shows the filter as null in this case.

I also tried a query where I filtered on createdAt > date_sub(current_timestamp, interval "1" day), and it didn't apply that as a push-down filter (according to the physical plan tab), whereas I had hoped it would calculate the resulting timestamp and apply that as a filter when scanning the collection.

Is there some rule I can use to predict when a filter will be propagated to the Mongo query?
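
To make the partial push-down idea from this thread concrete, here is a minimal Java sketch. It is not Drill's or the Mongo plugin's actual code: the class PartialPushDownSketch, its Predicate and Split types, and the list of supported operators are all hypothetical names chosen for illustration. It assumes the WHERE clause has already been split into AND-ed conjuncts, pushes only those of the (col RELOP const) form, and leaves everything else for the engine to evaluate after the scan. A right-hand side that is not a literal constant is modeled here as null, which is a simplifying assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative only: a hypothetical model, not Drill's or the Mongo plugin's code. */
final class PartialPushDownSketch {

    /** One AND-ed conjunct of a WHERE clause (hypothetical type). */
    static final class Predicate {
        final String column;
        final String op;
        final Object constant; // null models a right-hand side that is not a literal

        Predicate(String column, String op, Object constant) {
            this.column = column;
            this.op = op;
            this.constant = constant;
        }

        @Override
        public String toString() {
            return column + " " + op + " " + (constant == null ? "<expression>" : constant);
        }
    }

    /** Result of the split: what goes to the source, what stays in the engine. */
    static final class Split {
        final List<Predicate> pushed = new ArrayList<>();
        final List<Predicate> residual = new ArrayList<>();
    }

    // Assumed set of relational operators the source can evaluate.
    private static final Set<String> SUPPORTED_OPS =
            new HashSet<>(Arrays.asList("=", "<>", "<", "<=", ">", ">="));

    /** A conjunct is pushable only in the (col RELOP const) form. */
    static boolean canPush(Predicate p) {
        return p.constant != null && SUPPORTED_OPS.contains(p.op);
    }

    /** Push what the source can handle; keep the rest as a residual filter. */
    static Split split(List<Predicate> conjuncts) {
        Split result = new Split();
        for (Predicate p : conjuncts) {
            if (canPush(p)) {
                result.pushed.add(p);
            } else {
                result.residual.add(p);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Predicate> where = Arrays.asList(
                new Predicate("classID", "=", 10),
                new Predicate("createdAt", ">", null)); // unevaluated expression
        Split s = split(where);
        System.out.println("pushed to the source: " + s.pushed);
        System.out.println("left for the engine:  " + s.residual);
    }
}
```

An all-or-nothing implementation, by contrast, would push nothing as soon as a single conjunct failed the canPush check. Whether a particular filter is actually pushed in a real deployment still has to be confirmed from the query plan.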
