I personally have no strong preference as to how the code in Jena should be structured, as long as I am able to use it :). I do prefer doing it in a specific way because, IMO, it is modular, which makes it much easier to maintain in the long run. But again, it may not be the quickest one.
I have already been given a deadline by the company to have the ES extension implemented within the next 15 days :). This means that I will be maintaining the ES extension to Jena Text, at least locally, for some time to come. I would be more than happy to contribute to the Jena community whatever is required to have a proper Elasticsearch implementation in place, whether within the jena-text module or as a separate module. Until Lucene and Solr are upgraded to the latest version, I will have to maintain a separate jena-text-es module.

Cheers!
Anuj Kumar

On Wed, Mar 1, 2017 at 3:36 PM, A. Soroka <aj...@virginia.edu> wrote:
> Osma--
>
> The short answer is that yes, given the right tools you _can_ have different versions of code accessible in different ways. The longer answer is that it's probably not a viable alternative for Jena for this problem, at least not without a lot of other change.
>
> You are right to point to the classloader mechanism as being at the heart of this question, but I must alter your remark just slightly. From "the Java classloader only sees a single, flat package/class namespace and a set of compiled classes" to "ANY GIVEN Java classloader only sees a single, flat package/class namespace and a set of compiled classes".
>
> This is the fact that OSGi uses to make it possible to maintain strict module boundaries (and even dynamic module relationships at run-time). Each OSGi bundle sees its own classloader, and the framework is responsible for connecting bundles up to ensure that every bundle has what it needs in the way of types to function, based on metadata that the bundles provide to the framework. It's an incredibly powerful system (I use it every day and enjoy it enormously) but it's also very "heavy" and requires a good deal of investment to use. In particular, it's probably too large to put _inside_ Jena. (I frequently put Jena inside an OSGi instance, on the other hand.)
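To make the "ANY GIVEN classloader" distinction concrete, here is a toy Java model. It performs no real class loading: each `ToyLoader` simply owns one flat map from class names to the JAR that provides them, roughly the way each OSGi bundle gets its own classloader. The `ToyLoader` class, its methods and the JAR file names are hypothetical, chosen only for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: no real class loading happens here. Each ToyLoader stands in
// for one classloader and owns a single, flat namespace of class names.
final class IsolatedLoaders {

    static final class ToyLoader {
        // Fully-qualified class name -> artifact (JAR) that provides it.
        private final Map<String, String> namespace = new HashMap<>();

        // Makes the given classes visible; within one loader a name binds only once.
        ToyLoader add(String artifact, List<String> classNames) {
            for (String name : classNames) {
                namespace.putIfAbsent(name, artifact);
            }
            return this;
        }

        // Returns the artifact this loader would take the class from.
        String resolve(String className) {
            String artifact = namespace.get(className);
            if (artifact == null) {
                throw new IllegalStateException("class not found: " + className);
            }
            return artifact;
        }
    }

    public static void main(String[] args) {
        String writer = "org.apache.lucene.index.IndexWriter";
        // Hypothetical setup: one loader per module, as with OSGi bundles.
        ToyLoader textModule = new ToyLoader().add("lucene-core-4.9.1.jar", List.of(writer));
        ToyLoader esModule = new ToyLoader().add("lucene-core-6.4.1.jar", List.of(writer));
        // Same class name, a different answer per loader, and no conflict.
        System.out.println(textModule.resolve(writer)); // lucene-core-4.9.1.jar
        System.out.println(esModule.resolve(writer));   // lucene-core-6.4.1.jar
    }
}
```

Within a single `ToyLoader`, a second `add` of the same name is ignored, which is the single flat namespace Osma describes; the isolation only comes from having more than one loader.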
>
> Java 9 Jigsaw [1] offers some possibility for strong modularization of this kind, but it's really meant for the JDK itself, not application libraries. In theory, we could "roll our own" classloader management for this problem. That sounds like more than a bit of a rabbit hole to me. There might be another, more lightweight, toolkit out there for this purpose, but I'm not aware of any myself.
>
> Otherwise, yes, you get into shading and the like. We have to do that for Guava for now because of HADOOP-10101 (grumble grumble) but it's hardly a thing we want to do any more of than needed, I don't think.
>
> ---
> A. Soroka
> The University of Virginia Library
>
> [1] http://openjdk.java.net/projects/jigsaw/
>
> > On Mar 1, 2017, at 9:03 AM, Osma Suominen <osma.suomi...@helsinki.fi> wrote:
> >
> > Hi Anuj!
> >
> > Thanks for the clarification.
> >
> > However, I'm still not sure I understand the situation completely. I know Maven can perform a lot of tricks, but Maven modules are just convenient ways to structure a Java project. Maven cannot change the fact that at runtime, module divisions don't really matter (except that they usually correspond to package sub-namespaces) and the Java classloader only sees a single, flat package/class namespace and a set of compiled classes (usually within JARs) in the classpath that it needs to check to find the right classes, and if there are two versions of the same library (e.g. Lucene) with overlapping class names, that's going to cause trouble. The only way around that is to shade some of the libraries, i.e. rename them so that they end up in another, non-conflicting namespace. Apparently Elasticsearch also did some of that in the past [1] but nowadays tries to avoid it.
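Osma's point about one flat classpath, and why shading gets around it, can be sketched with a second toy model. JARs on one classpath are searched in order and the first JAR containing a class name wins (the documented search order of `java.net.URLClassLoader`, ignoring parent delegation). Shading is modeled as rewriting a package prefix so the two copies no longer share names; `relocate` mirrors only the renaming idea behind the `maven-shade-plugin` `<relocation>` setting (`<pattern>`/`<shadedPattern>`), not the bytecode rewriting the real plugin performs. The `shaded.es` prefix, the class lists and all names in the code are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Toy model: one flat classpath, searched JAR by JAR in order.
final class FlatClasspath {

    static final class Jar {
        final String name;
        final Set<String> classes; // fully-qualified class names in this JAR

        Jar(String name, Set<String> classes) {
            this.name = name;
            this.classes = classes;
        }
    }

    private final List<Jar> jars = new ArrayList<>();

    FlatClasspath add(Jar jar) {
        jars.add(jar);
        return this;
    }

    // The first JAR on the path that contains the name wins.
    String resolve(String className) {
        for (Jar jar : jars) {
            if (jar.classes.contains(className)) {
                return jar.name;
            }
        }
        throw new IllegalStateException("class not found: " + className);
    }

    // Shading-style rename: names under `pattern` move under `shadedPattern`.
    static String relocate(String className, String pattern, String shadedPattern) {
        if (className.equals(pattern) || className.startsWith(pattern + ".")) {
            return shadedPattern + className.substring(pattern.length());
        }
        return className;
    }

    // Returns a copy of the JAR with every class name relocated.
    static Jar shade(Jar jar, String pattern, String shadedPattern) {
        Set<String> renamed = jar.classes.stream()
                .map(c -> relocate(c, pattern, shadedPattern))
                .collect(Collectors.toSet());
        return new Jar(jar.name + " (shaded)", renamed);
    }

    public static void main(String[] args) {
        String writer = "org.apache.lucene.index.IndexWriter";
        Jar lucene4 = new Jar("lucene-core-4.9.1.jar", Set.of(writer));
        Jar lucene6 = new Jar("lucene-core-6.4.1.jar", Set.of(writer));

        // Unshaded: both JARs define the same name, so 4.9.1 shadows 6.4.1.
        FlatClasspath plain = new FlatClasspath().add(lucene4).add(lucene6);
        System.out.println(plain.resolve(writer)); // lucene-core-4.9.1.jar

        // Shaded: the 6.4.1 copy lives under a different (hypothetical) prefix.
        String pattern = "org.apache.lucene";
        String shadedPattern = "shaded.es.org.apache.lucene";
        FlatClasspath shaded = new FlatClasspath()
                .add(lucene4)
                .add(shade(lucene6, pattern, shadedPattern));
        String shadedWriter = relocate(writer, pattern, shadedPattern);
        System.out.println(shaded.resolve(writer));       // lucene-core-4.9.1.jar
        System.out.println(shaded.resolve(shadedWriter)); // lucene-core-6.4.1.jar (shaded)
    }
}
```

The prefix check in `relocate` requires a `.` boundary, so a package such as `org.apache.lucenex` is left alone.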
> >
> > Does your assumption 1 ("At a given point in time, only a single Indexing Technology is used") imply that in the assembler configuration, you cannot have ja:loadClass declarations for both Lucene and ES backends? Or how do you run something like Fuseki that contains (in a single big JAR) both the jena-text and jena-text-es modules with all their dependencies, one of which requires the Lucene 4.x classes and the other one the Lucene 6.4.1 classes? How do you ensure that only one of them is used at a time, and that the Java classloader, even though it has access to both versions of Lucene, only loads classes from the single, correct one and not the other? Or do you need to have separate "Fuseki-Lucene" and "Fuseki-ES" packages, so that you don't end up with two Lucene versions within the same Fuseki JAR?
> >
> > -Osma
> >
> > [1] https://www.elastic.co/blog/to-shade-or-not-to-shade
> >
> > 01.03.2017, 11:03, anuj kumar wrote:
> >> Hi Osma,
> >>
> >> I understand what you are saying. There are ways to mitigate risks and balance the refactoring without affecting the existing modules, but I will not delve into those now. I am not enough of a Jena expert to say convincingly that it is possible without any hiccups, but I can take a guess and say that it is indeed possible :)
> >>
> >> For the question: "is it even possible to mix modules that depend on different versions of the Lucene libraries within the same project?"
> >>
> >> I actually do not understand what you mean by mixing modules. I assume you mean having jena-text and jena-text-es as dependencies in a build without causing the build to conflict. If that is what you mean, then the answer is yes, it is possible and quite simple as well. Let me explain how. But first, some assumptions which I want to call out explicitly.
> >>
> >> *Assumptions:*
> >> 1. At a given point in time, only a single indexing technology is used for text-based indexing and searching via Jena. This means that we will use either the Lucene implementation OR the Solr implementation OR the ES implementation at any given point in time.
> >> 2. The Fuseki build does not depend on any Lucene 4.9.1-specific classes, but only on jena-text classes, if at all.
> >>
> >> Based on these assumptions, it is possible to create a build that contains the common jena-text classes + ES-specific classes without any compatibility issues. And it is in fact quite simple. I did it in the current jena-text-es module and ran the entire build, which succeeded. The key is to include the latest Lucene dependencies at the very beginning of the pom and then include the jena-text dependency. Maven will then automatically resolve the dependency issues by including the Lucene libraries that we included in our ES-specific pom. Have a look at the pom of the jena-text-es module to see how it can be done: https://github.com/EaseTech/jena/blob/master/jena-text-es/pom.xml
> >>
> >> Thanks,
> >> Anuj Kumar
> >>
> >> On Wed, Mar 1, 2017 at 7:27 AM, Osma Suominen <osma.suomi...@helsinki.fi> wrote:
> >>
> >>> Hi Anuj,
> >>>
> >>> I understand your concerns. However, we also need to balance between the needs of individual modules/features and the whole codebase. I'm willing to put in the effort to keep the other modules up to date with newer Lucene versions. Lucene upgrade requirements are well documented; the only hitches seen in JENA-1250 were related to how jena-text (ab)used some Lucene features that were dropped from newer versions.
> >>>
> >>> A perhaps stupid question to more experienced Java developers: is it even possible to mix modules that depend on different versions of the Lucene libraries within the same project?
> >>> In my (quite limited) understanding of Java projects and libraries, this requires special arrangements (e.g. shading) as the Java package/class namespace is shared by all the code running within the same JVM.
> >>>
> >>> So can you create, say, a Fuseki build that contains the current jena-text module (depending on Lucene 4.x) and the new jena-text-es module (depending on Lucene 6.4.1) without any compatibility issues?
> >>>
> >>> -Osma
> >>>
> >>> 01.03.2017, 00:47, anuj kumar wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> My 2 cents:
> >>>>
> >>>> The reason I proposed to have separate modules for Lucene, Solr and ES is exactly to avoid the "All or Nothing" approach we would need to take if we club them all together. If they stay together and, in the near future, I want to upgrade ES to another version, I also need to upgrade Lucene and Solr again, and possibly another implementation that may have been added in the meantime. As we all know, this means weeks of work, if not months, to get the changes released. This will personally de-motivate me from doing anything, and I will probably start maintaining my own version of Jena-Text, as that would be much simpler than upgrading, testing and, in the process, owning (read: fixing bugs in) the upgrade for each and every technology.
> >>>>
> >>>> If they are developed as separate modules, they can evolve independently of each other, and we can avoid situations where we can't upgrade to the latest version of Lucene because we do not know what effect it will have on the Solr implementation.
> >>>>
> >>>> We can start with having a separate module for Jena Text ES and see how things go. If they go well, we could extract Solr and Lucene out of Jena Text.
> >>>>
> >>>> Again, this is just a suggestion based on my limited industry experience.
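The Maven behaviour Anuj relies on in his jena-text-es pom is dependency mediation. Maven's documentation calls the rule "nearest definition": the version closest to the project in the dependency tree wins, and only when two versions sit at the same depth does the first declaration win. The toy Java model below is not Maven's code; the `Dep` type, the `mediate` method and the example occurrences are hypothetical. Under this rule, a Lucene version declared directly in the pom (depth 1) beats the one jena-text brings in transitively (depth 2), in either declaration order.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of Maven's "nearest definition" dependency mediation.
final class Mediation {

    // One occurrence of an artifact somewhere in the dependency tree.
    static final class Dep {
        final String artifact; // groupId:artifactId
        final String version;
        final int depth;       // 1 = declared directly in the project's pom

        Dep(String artifact, String version, int depth) {
            this.artifact = artifact;
            this.version = version;
            this.depth = depth;
        }
    }

    // Occurrences are given in declaration order. The shallowest occurrence
    // wins; on equal depth the earlier one is kept (strict '<' below).
    static Map<String, String> mediate(List<Dep> occurrences) {
        Map<String, Dep> chosen = new LinkedHashMap<>();
        for (Dep d : occurrences) {
            Dep current = chosen.get(d.artifact);
            if (current == null || d.depth < current.depth) {
                chosen.put(d.artifact, d);
            }
        }
        Map<String, String> versions = new LinkedHashMap<>();
        chosen.forEach((artifact, dep) -> versions.put(artifact, dep.version));
        return versions;
    }

    public static void main(String[] args) {
        String lucene = "org.apache.lucene:lucene-core";
        // 6.4.1 declared directly; 4.9.1 arrives transitively via jena-text.
        List<Dep> luceneFirst = List.of(new Dep(lucene, "6.4.1", 1), new Dep(lucene, "4.9.1", 2));
        List<Dep> jenaTextFirst = List.of(new Dep(lucene, "4.9.1", 2), new Dep(lucene, "6.4.1", 1));
        System.out.println(mediate(luceneFirst).get(lucene));   // 6.4.1
        System.out.println(mediate(jenaTextFirst).get(lucene)); // 6.4.1
        // Equal depth: the first declaration wins.
        List<Dep> tie = List.of(new Dep(lucene, "4.9.1", 2), new Dep(lucene, "6.4.1", 2));
        System.out.println(mediate(tie).get(lucene));           // 4.9.1
    }
}
```

Declaration order only decides the equal-depth case. Running `mvn dependency:tree` on the real module shows which Lucene version Maven actually selected.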
> >>>>
> >>>> Thanks,
> >>>> Anuj Kumar
> >>>>
> >>>> On Tue, Feb 28, 2017 at 5:23 PM, Osma Suominen <osma.suomi...@helsinki.fi> wrote:
> >>>>
> >>>> 28.02.2017, 17:12, A. Soroka wrote:
> >>>>>
> >>>>>> https://lists.apache.org/thread.html/dce0d502b11891c28e57bbcbb0cdef27d8374d58d9634076b8ef4cd7@1431107516@%3Cdev.jena.apache.org%3E
> >>>>>> ? In other words, might it be better to factor out between -text and -spatial and _then_ try to upgrade the Lucene version?
> >>>>>
> >>>>> I certainly wouldn't object to that, but somebody has to volunteer to do the actual work!
> >>>>>
> >>>>> I don't use the Solr component now, but I could easily see so doing...
> >>>>>
> >>>>>> that's pretty vague, I know, and I'm not in a position to do any work to maintain it, so consider that just a very small and blurry data point. :)
> >>>>>
> >>>>> Last time I tried it (it was a while ago) I couldn't figure out how to get it running... If you could just try that with some toy data, then your data point would be a lot less blurry :) I haven't used Solr for anything, so I'm not very familiar with how to set it up, and the jena-text instructions are pretty vague, unfortunately.
> >>>>>
> >>>>> -Osma
> >>>>>
> >>>>> --
> >>>>> Osma Suominen
> >>>>> D.Sc. (Tech), Information Systems Specialist
> >>>>> National Library of Finland
> >>>>> P.O. Box 26 (Kaikukatu 4)
> >>>>> 00014 HELSINGIN YLIOPISTO
> >>>>> Tel. +358 50 3199529
> >>>>> osma.suomi...@helsinki.fi
> >>>>> http://www.nationallibrary.fi

--
*Anuj Kumar*