Querying nested data is very easy in MarkLogic; it was built for that. I used to work there.
The founder is a former search engine guy from Infoseek and Ultraseek, so
it has a lot of familiar behavior, like merging segments automatically.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/

On Jul 23, 2014, at 7:25 AM, Jay Vyas <jayunit100.apa...@gmail.com> wrote:

> Querying nested data is very difficult in any modern db that I have seen.
>
> If it works as you suggest, then it would be cool if the feature were
> eventually maintained inside Solr.
>
>> On Jul 23, 2014, at 7:13 AM, Renaud Delbru <renaud@siren.solutions> wrote:
>>
>> One of the coolest features of Lucene/Solr is its ability to index nested
>> documents using a Blockjoin approach.
>>
>> While this works well for small documents and document collections, it
>> becomes unsustainable for larger ones: Blockjoin works by splitting the
>> original document into many documents, one per nested record.
>>
>> For example, a single USPTO patent (XML format converted to JSON) ends up
>> as over 1500 documents in the index. This has massive implications for
>> performance and scalability.
>>
>> Introducing SIREn
>>
>> SIREn is an open source plugin for Solr for indexing and searching rich
>> nested JSON data.
>>
>> SIREn uses a sophisticated "tree indexing" design which ensures that the
>> index is not artificially inflated. As a result, many types of nested
>> queries can be up to 3x faster. Further, depending on the data, Blockjoin's
>> memory requirements for faceting can be up to 10x higher than SIREn's. As
>> such, SIREn allows you to use Solr for larger and more complex datasets,
>> especially for sophisticated analytics. (You can read our whitepaper to
>> find out more [1].)
>>
>> SIREn is also truly schemaless: it even allows you to change the type of a
>> property between documents without being restricted by a defined mapping.
>> This can be very useful for data integration scenarios where data is
>> described in different ways in different sources.
>>
>> You only need a few minutes to download and try SIREn [2]. It comes with
>> a detailed manual [3], and the code is available on GitHub [4].
>>
>> We look forward to hearing your feedback.
>>
>> [1] http://siren.solutions/siren/resources/whitepapers/comparing-siren-1-2-and-lucenes-blockjoin-performance-a-uspto-patent-search-scenario/
>> [2] http://siren.solutions/siren/downloads/
>> [3] http://siren.solutions/manual/preface.html
>> [4] https://github.com/sindicetech/siren
>> --
>> Renaud Delbru
>> CTO
>> SIREn Solutions
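The index inflation attributed to Blockjoin in the quoted announcement (one document per nested record) can be illustrated with a small sketch. This is not Lucene, Solr, or SIREn code: `flatten_for_block_join` and the `_path` field are hypothetical names chosen for illustration. The sketch assumes only that every nested JSON object becomes its own document and that children are written before their parent, with the parent last, which is the contiguous block layout Lucene's block join relies on. The sample record is made up and far smaller than a real USPTO patent.

```python
from typing import Any


def flatten_for_block_join(doc: dict[str, Any], path: str = "root") -> list[dict[str, Any]]:
    """Flatten one nested JSON object into a block of flat documents.

    Hypothetical helper: every nested object becomes its own document,
    children are emitted before their parent, and the parent comes last.
    """
    block: list[dict[str, Any]] = []
    parent: dict[str, Any] = {"_path": path}  # "_path" is a made-up field for illustration
    for key, value in doc.items():
        items = value if isinstance(value, list) else [value]
        scalars = []
        for i, item in enumerate(items):
            if isinstance(item, dict):
                # Each nested record becomes a separate document (plus its own children).
                block.extend(flatten_for_block_join(item, f"{path}.{key}[{i}]"))
            else:
                scalars.append(item)
        if scalars:
            # Plain values stay on the parent document.
            parent[key] = scalars if isinstance(value, list) else scalars[0]
    block.append(parent)  # parent last, after all of its children
    return block


# A made-up, heavily simplified patent-like record (not real USPTO data).
patent = {
    "id": "example-1",
    "title": "Example invention",
    "inventors": [{"name": "A"}, {"name": "B"}],
    "claims": [
        {"number": 1, "citations": [{"ref": "X"}]},
        {"number": 2},
    ],
}

block = flatten_for_block_join(patent)
print(len(block))  # 6: one root, two inventors, two claims, one citation
for d in block:
    print(d["_path"])
```

Each nested object adds one document, so the block grows with the number of nested records rather than with the number of top-level records; a full patent with many claims, citations, and classifications grows accordingly.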