I second Osma's congrats!

Do we want to take this into account:

https://lists.apache.org/thread.html/dce0d502b11891c28e57bbcbb0cdef27d8374d58d9634076b8ef4cd7@1431107516@%3Cdev.jena.apache.org%3E

? In other words, might it be better to factor out the code shared between 
-text and -spatial and _then_ try to upgrade the Lucene version?

I don't use the Solr component now, but I could easily see doing so... that's 
pretty vague, I know, and I'm not in a position to do any work to maintain it, 
so consider that just a very small and blurry data point. :)


---
A. Soroka
The University of Virginia Library

> On Feb 28, 2017, at 3:20 AM, Osma Suominen <osma.suomi...@helsinki.fi> wrote:
> 
> Hi Anuj!
> 
> Congratulations on getting the PoC working!
> 
> I'm not sure I like the idea of having a separate jena-text-es module.
> 
> Am I right that your main concern with creating a separate module is that the 
> Elasticsearch client library requires a newer Lucene version than what 
> jena-text currently uses? In that case, I think the solution should be 
> upgrading the Lucene version everywhere, i.e. the current jena-text and 
> jena-spatial modules. This work has already started (see JENA-1250) but it 
> has recently stalled and has not yet been merged.
> 
> I don't think it should be a problem to have multiple implementations 
> (Lucene, Solr, ES) within the same module. Ideally a lot of the 
> infrastructure could be shared (which is of course possible also with 
> separate modules, as you have done), and I would hope that also the unit 
> tests could be reused for the different implementations, although that is 
> currently not the case (the unit tests only target Lucene).
> 
> The Solr side of jena-text has unfortunately bitrotted even more than the 
> Lucene support. I've previously suggested that it should be removed entirely 
> [1], but there were no responses to my suggestion at the time.
> 
> -Osma
> 
> [1] https://www.mail-archive.com/dev@jena.apache.org/msg16380.html
> 
> 27.02.2017, 14:08, anuj kumar wrote:
>> Hi All,
>> 
>> *Apologies for the long email.*
>> 
>> As some of you know, I have been working on extending Jena to support
>> ElasticSearch for text indexing (in addition to Lucene and Solr).
>> 
>> I have come to a point where I have basic (read: non-prod) code that can
>> index rdfs:label text data into ElasticSearch 5.2.1.
>> The code is working and testable. You simply have to download Elasticsearch
>> 5.2.1 and run it locally to execute the test within the ES
>> implementation.
>> The code is NOT production-ready, just PoC code. You can find the
>> first cut of the code here: https://github.com/EaseTech/jena (look inside
>> the module jena-text-es)
>> 
>> I need feedback from the Jena maintainers and community on the
>> structure of the code, as it is important for me to finalize this before I
>> move on to implementing the full-blown, production-ready code for the
>> Jena Text ElasticSearch integration.
>> 
>> Here is a short description of what I did and the reasoning behind it:
>> 
>> 1. Created a separate module, *jena-text-es*, that extends *jena-text*
>> AND excludes all the Lucene-related and Solr-related dependencies. The
>> reason I had to do this was that the *jena-text* module depends on Lucene
>> 4.9.1 whereas ElasticSearch 5.2.1 depends on Lucene 6.4.1. This
>> resulted in Lucene version conflicts when I created the code for
>> ElasticSearch support within the *jena-text* module. Thus the need to
>> create a separate module.
>> 2. A side effect of creating a separate module was that I had to extend the
>> TextDataSetFactory.java class present in the *jena-text* module to include
>> methods for creating ElasticSearch index objects. I named it
>> ESTextDataSetFactory. At this point in time I do not know if this is the
>> right approach or if Jena ALWAYS instantiates Index objects using the
>> TextDataSetFactory.java class. My initial investigation showed it is fine,
>> but I would like the Jena experts to please confirm.
>> 3. I have tested a simple integration with ElasticSearch by defining a test
>> class under
>> src/test/java/org/apache/jena/query/text/TestBuildTextDataSet.java. You can
>> run this test by first starting an instance of Elasticsearch 5.2.1 locally.
>> 
>> *My Queries*
>> 1. Is it acceptable to the Jena community that I create a separate module
>> for ElasticSearch support and call it *jena-text-es*?
>> 2. Is it fine if I extend the TextDataSetFactory.java class within the
>> *jena-text-es* module?
>> 
>> *Food for Thought*
>> 
>> While implementing the ElasticSearch integration, I could not help but
>> notice that the module *jena-text* not only contains the core classes for
>> performing text queries, but also contains technology-specific (e.g.
>> Lucene and Solr) classes.
>> IMO, these should be separate and defined in their own modules to enable
>> separation of concerns.
>> This would also make maintenance easier and allow extensions to be added
>> later on.
>> 
>> I think we should have the following modules:
>> 
>> jena-text - Containing core Jena text specific classes that are technology
>> agnostic.
>> jena-text-lucene - Lucene specific implementation of Jena-Text
>> jena-text-solr - Solr specific implementation of Jena-Text
>> jena-text-es - ElasticSearch specific implementation of Jena-Text
>> 
>> What does everyone think?
>> 
>> Thanks,
>> Anuj Kumar
>> 
>> 
>> On Tue, Feb 14, 2017 at 2:27 PM, anuj kumar <anuj.gandh...@gmail.com> wrote:
>> 
>>> My saviour Osma. It worked :)
>>> Thanks for pointing that out. Really appreciate it.
>>> I am now on to my next task: implementing the actual code for
>>> ElasticSearch integration with Jena.
>>> 
>>> Thanks once again.
>>> 
>>> Anuj Kumar
>>> 
>>> On Tue, Feb 14, 2017 at 2:22 PM, Osma Suominen <osma.suomi...@helsinki.fi>
>>> wrote:
>>> 
>>>> 14.02.2017, 15:15, anuj kumar wrote:
>>>> 
>>>>> I will do it. But I need to first get the simple test working in order to
>>>>> move forward. I hope someone here can help me.
>>>>> 
>>>> 
>>>> Maybe you need to add an implementWith declaration to TextAssembler.java?
>>>> 
>>>> 
>>>> -Osma
>>>> 
>>>> --
>>>> Osma Suominen
>>>> D.Sc. (Tech), Information Systems Specialist
>>>> National Library of Finland
>>>> P.O. Box 26 (Kaikukatu 4)
>>>> 00014 HELSINGIN YLIOPISTO
>>>> Tel. +358 50 3199529
>>>> osma.suomi...@helsinki.fi
>>>> http://www.nationallibrary.fi
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> *Anuj Kumar*
>>> 
>> 
>> 
>> 
> 
> 
> -- 
> Osma Suominen
> D.Sc. (Tech), Information Systems Specialist
> National Library of Finland
> P.O. Box 26 (Kaikukatu 4)
> 00014 HELSINGIN YLIOPISTO
> Tel. +358 50 3199529
> osma.suomi...@helsinki.fi
> http://www.nationallibrary.fi
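
For illustration only, the module layout proposed in the quoted message (a
technology-agnostic jena-text core with jena-text-lucene, jena-text-solr and
jena-text-es alongside it) could be sketched roughly as below. This is a
minimal sketch under stated assumptions, not the actual jena-text API: the
names SketchTextIndex, InMemorySketchIndex and ModuleSplitSketch are
hypothetical, and the in-memory map merely stands in for a real Lucene, Solr
or Elasticsearch client. The point is only that core code compiled against an
interface would not pull in any particular Lucene version.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical core contract; in the proposed layout it would live in a
// backend-agnostic jena-text module. This is NOT the real jena-text interface.
interface SketchTextIndex {
    void addLabel(String entityUri, String label);
    List<String> search(String term);
}

// Hypothetical stand-in for one backend module (jena-text-lucene, jena-text-es, ...).
// A real backend would delegate to its search engine's client library instead of a map.
final class InMemorySketchIndex implements SketchTextIndex {
    private final Map<String, String> labels = new LinkedHashMap<>();

    @Override
    public void addLabel(String entityUri, String label) {
        // Store labels lower-cased so that matching is case-insensitive.
        labels.put(entityUri, label.toLowerCase(Locale.ROOT));
    }

    @Override
    public List<String> search(String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> entry : labels.entrySet()) {
            if (entry.getValue().contains(needle)) {
                hits.add(entry.getKey());
            }
        }
        return hits;
    }
}

// Core code depends only on the interface, so no backend library version leaks into it.
final class ModuleSplitSketch {
    static List<String> indexAndSearch(SketchTextIndex index, String term) {
        index.addLabel("http://example.org/a", "Apache Jena");
        index.addLabel("http://example.org/b", "Elasticsearch");
        return index.search(term);
    }

    public static void main(String[] args) {
        // Swapping the backend would mean passing a different SketchTextIndex here.
        System.out.println(indexAndSearch(new InMemorySketchIndex(), "jena"));
    }
}
```

Whether the backends end up as separate modules or as separate packages inside
one module, as suggested in the reply, the same interface boundary would apply;
the sketch takes no position on that choice.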
