Hi All,

*Apologies for the long email.*

As some of you know, I have been working on extending Jena to support
ElasticSearch for text indexing (in addition to Lucene and Solr).

I have reached the point where I have basic (read: non-production) code
that can index rdfs:label text data into ElasticSearch 5.2.1.
The code works and is testable: download ElasticSearch 5.2.1 and run it
locally, then execute the test in the ES implementation.
This is proof-of-concept code, NOT production ready. You can find the
first cut here: https://github.com/EaseTech/jena (look inside the module
jena-text-es).

I need feedback from the Jena maintainers and community on how the code
is structured. It is important for me to settle this before I move on to
implementing the full production-ready code for the Jena Text
ElasticSearch integration.

Here is the short description of what I did and the reasoning behind it:

1. Created a separate module, *jena-text-es*, that builds on *jena-text*
but excludes all the Lucene- and Solr-related dependencies. I had to do
this because the *jena-text* module depends on Lucene 4.9.1, whereas
ElasticSearch 5.2.1 depends on Lucene 6.4.1. Putting the ElasticSearch
support inside the *jena-text* module itself resulted in Lucene version
conflicts, hence the separate module.
2. A side effect of creating a separate module is that I had to extend the
TextDataSetFactory.java class from the *jena-text* module to add
methods for creating ElasticSearch index objects. I named it
ESTextDataSetFactory. I do not yet know whether this is the right
approach, or whether Jena ALWAYS instantiates index objects through the
TextDataSetFactory.java class. My initial investigation suggests it is
fine, but I would like the Jena experts to confirm.
3. I have tested a simple integration with ElasticSearch by defining a test
class under
src/test/java/org/apache/jena/query/text/TestBuildTextDataSet.java. You can
run this test by first starting an instance of Elasticsearch 5.2.1 locally.

*My Queries*
1. Is it acceptable to the Jena community that I create a separate module
for ElasticSearch support and call it *jena-text-es*?
2. Is it fine if I extend the TextDataSetFactory.java class within the
*jena-text-es* module?

*Food for Thought*

While implementing the ElasticSearch integration, I could not help but
notice that the *jena-text* module contains not only the core classes for
performing text queries but also technology-specific classes (e.g. for
Lucene and Solr).
IMO, these should be split into their own modules for a cleaner
separation of concerns.
This would also make maintenance easier and simplify adding extensions
later on.

I think we should have the following modules:

jena-text - Containing core Jena text specific classes that are technology
agnostic.
jena-text-lucene - Lucene specific implementation of Jena-Text
jena-text-solr - Solr specific implementation of Jena-Text
jena-text-es - ElasticSearch specific implementation of Jena-Text
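
To make the proposed split concrete, here is a rough sketch in plain Java.
It does not use any real Jena classes: LabelIndex, IndexRegistry and
InMemoryLabelIndex are hypothetical names invented for illustration, and
the in-memory backend merely stands in for what a module such as
jena-text-es would provide.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class TextModuleSketch {

    /** Hypothetical core contract (would live in jena-text): no backend imports. */
    interface LabelIndex {
        void add(String entityUri, String label);
        List<String> search(String term);
    }

    /** Hypothetical core registry: looks up a backend by name. */
    static final class IndexRegistry {
        private final Map<String, LabelIndex> backends = new LinkedHashMap<>();

        void register(String name, LabelIndex index) {
            backends.put(name, index);
        }

        LabelIndex get(String name) {
            LabelIndex index = backends.get(name);
            if (index == null) {
                throw new IllegalArgumentException("No text index backend registered: " + name);
            }
            return index;
        }
    }

    /** Stand-in for a backend module (e.g. jena-text-es); keeps labels in memory. */
    static final class InMemoryLabelIndex implements LabelIndex {
        private final Map<String, String> labels = new LinkedHashMap<>();

        @Override
        public void add(String entityUri, String label) {
            labels.put(entityUri, label);
        }

        @Override
        public List<String> search(String term) {
            // Case-insensitive substring match over the stored labels.
            String needle = term.toLowerCase(Locale.ROOT);
            List<String> hits = new ArrayList<>();
            for (Map.Entry<String, String> entry : labels.entrySet()) {
                if (entry.getValue().toLowerCase(Locale.ROOT).contains(needle)) {
                    hits.add(entry.getKey());
                }
            }
            return hits;
        }
    }

    public static void main(String[] args) {
        IndexRegistry registry = new IndexRegistry();
        registry.register("es", new InMemoryLabelIndex());

        // Core code only ever sees the LabelIndex interface.
        LabelIndex index = registry.get("es");
        index.add("http://example.org/a", "Apache Jena");
        index.add("http://example.org/b", "ElasticSearch");
        System.out.println(index.search("jena"));
    }
}
```

The point of the sketch is only that the core module compiles without any
Lucene, Solr or ElasticSearch classes on its classpath, so each backend
module can pull in whichever Lucene version it needs.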

What does everyone think?

Thanks,
Anuj Kumar


On Tue, Feb 14, 2017 at 2:27 PM, anuj kumar <anuj.gandh...@gmail.com> wrote:

> My saviour Osma. It worked :)
> Thanks for pointing that out. Really appreciate it.
> I am now on to my next task: implementing the actual code for ElasticSearch
> integration with Jena.
>
> Thanks once again.
>
> Anuj Kumar
>
> On Tue, Feb 14, 2017 at 2:22 PM, Osma Suominen <osma.suomi...@helsinki.fi>
> wrote:
>
>> On 14.02.2017 at 15:15, anuj kumar wrote:
>>
>>> I will do it. But I need to first get the simple test working in order to
>>> move forward. I hope someone here can help me.
>>>
>>
>> Maybe you need to add an implementWith declaration to TextAssembler.java?
>>
>>
>> -Osma
>>
>> --
>> Osma Suominen
>> D.Sc. (Tech), Information Systems Specialist
>> National Library of Finland
>> P.O. Box 26 (Kaikukatu 4)
>> 00014 HELSINGIN YLIOPISTO
>> Tel. +358 50 3199529
>> osma.suomi...@helsinki.fi
>> http://www.nationallibrary.fi
>>
>
>
>
> --
> *Anuj Kumar*
>



-- 
*Anuj Kumar*
