Re: [ANN] SIREn, a Lucene/Solr plugin for rich JSON data search

Walter Underwood Wed, 23 Jul 2014 09:31:20 -0700

Querying nested data is very easy in MarkLogic; it was built for that. I used 
to work there.


The founder is a former search engine guy from Infoseek and Ultraseek, so it 
has a lot of familiar behavior, like merging segments automatically.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/


On Jul 23, 2014, at 7:25 AM, Jay Vyas <jayunit100.apa...@gmail.com> wrote:

> Querying nested data is very difficult in any modern db that I have seen.
> 
> If it works as you suggest, then it would be cool if the feature were 
> eventually maintained inside Solr.
> 
>> On Jul 23, 2014, at 7:13 AM, Renaud Delbru <renaud@siren.solutions> wrote:
>> 
>> One of the coolest features of Lucene/Solr is its ability to index nested 
>> documents using a Blockjoin approach.
>> 
>> While this works well for small documents and document collections, it 
>> becomes unsustainable for larger ones: Blockjoin works by splitting the 
>> original document into many documents, one per nested record.
>> 
>> For example, a single USPTO patent (XML format converted to JSON) will end 
>> up being over 1500 documents in the index. This has massive implications for 
>> performance and scalability.
>> 
>> Introducing SIREn
>> 
>> SIREn is an open source plugin for Solr for indexing and searching rich 
>> nested JSON data.
>> 
>> SIREn uses a sophisticated "tree indexing" design which ensures that the 
>> index is not artificially inflated. As a result, many types of nested 
>> queries can be up to 3x faster. Further, depending on the data, memory 
>> requirements for faceting can be up to 10x lower. As such, SIREn 
>> allows you to use Solr for larger and more complex datasets, especially so 
>> for sophisticated analytics. (You can read our whitepaper to find out more 
>> [1])
>> 
>> SIREn is also truly schemaless - it even allows you to change the type of a 
>> property between documents without being restricted by a defined mapping. 
>> This can be very useful for data integration scenarios where data is 
>> described in different ways in different sources.
>> 
>> You only need a few minutes to download and try SIREn [2]. It comes with a 
>> detailed manual [3] and you have access to the code on GitHub [4].
>> 
>> We look forward to hearing your feedback.
>> 
>> [1] 
>> http://siren.solutions/siren/resources/whitepapers/comparing-siren-1-2-and-lucenes-blockjoin-performance-a-uspto-patent-search-scenario/
>> [2] http://siren.solutions/siren/downloads/
>> [3] http://siren.solutions/manual/preface.html
>> [4] https://github.com/sindicetech/siren
>> -- 
>> Renaud Delbru
>> CTO
>> SIREn Solutions
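
For readers unfamiliar with the Blockjoin behavior the announcement describes (one index document per nested record), here is a rough Python sketch of the splitting step. It is not Lucene or SIREn code: the function name `flatten_to_block`, the `_path` field, and the sample record are all hypothetical and only illustrate why the document count grows with nesting. The one Lucene detail it mirrors is that children precede their parent in a block.

```python
from typing import Any


def flatten_to_block(record: dict[str, Any], path: str = "root") -> list[dict[str, Any]]:
    """Split one nested record into flat documents, children first, parent last.

    Hypothetical helper for illustration only; it does not call Lucene or Solr.
    Lists nested directly inside lists are kept as plain values, not split.
    """
    block: list[dict[str, Any]] = []
    flat: dict[str, Any] = {"_path": path}  # "_path" is an assumed bookkeeping field
    for key, value in record.items():
        if isinstance(value, dict):
            # A nested object becomes its own document (plus its descendants).
            block.extend(flatten_to_block(value, f"{path}.{key}"))
        elif isinstance(value, list):
            scalars = []
            for i, item in enumerate(value):
                if isinstance(item, dict):
                    # Each object in an array becomes its own document as well.
                    block.extend(flatten_to_block(item, f"{path}.{key}[{i}]"))
                else:
                    scalars.append(item)
            if scalars:
                flat[key] = scalars
        else:
            flat[key] = value
    block.append(flat)  # the parent document closes the block
    return block


# Made-up sample record, loosely shaped like a patent converted to JSON.
patent = {
    "title": "Example patent",
    "inventors": [
        {"name": "A. Inventor", "address": {"city": "Galway"}},
        {"name": "B. Inventor", "address": {"city": "Dublin"}},
    ],
    "claims": [{"text": "claim one"}, {"text": "claim two"}],
}

block = flatten_to_block(patent)
print(len(block))  # one flat document per nested record, plus the root
```

Even this small record produces one flat document for every nested object, which suggests how a deeply nested patent could expand into well over a thousand index documents.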
