Hey,

I just saw https://issues.apache.org/jira/browse/JENA-1301. Should we not first officially deprecate it and give any users of Solr a chance to move to a different indexing technology?
BTW, I don't know yet how to log in to Apache JIRA.

Thanks,
Anuj Kumar

On Fri, Mar 3, 2017 at 1:23 PM, anuj kumar <anuj.gandh...@gmail.com> wrote:

> Hi Osma,
> I briefly looked at the pull request. I believe we need to upgrade Lucene and Solr in one go, don't we? The reason is that Solr 4.9.1 depends on Lucene 4.9.1.
>
> Also, how do I log into issues.apache.org, and where do I file this bug?
>
> Thanks,
> Anuj Kumar
>
> On Fri, Mar 3, 2017 at 11:22 AM, Osma Suominen <osma.suomi...@helsinki.fi> wrote:
>
>> Hi Anuj,
>>
>> It's great that we found agreement over this!
>>
>> I've restarted the Lucene upgrade effort (JENA-1250) that had stalled and made a PR [1] that implements the upgrade up to version 6.4.1 (with 5.5.4 as an intermediate step). I'll wait for comments on the PR and if people think it's OK I will merge it soon to Jena master. Meanwhile, you can already base your ES implementation on that branch [2] if you like.
>>
>> Could you please open a JIRA issue on issues.apache.org explaining the Elasticsearch support feature, so that we have a place for tracking this work, requesting comments etc.?
>>
>> Also I suggest we move the discussion around this to the developers' list (d...@jena.apache.org) where it's more appropriate.
>>
>> -Osma
>>
>> [1] https://github.com/apache/jena/pull/219
>>
>> [2] https://github.com/osma/jena/tree/jena-1250-lucene6
>>
>> 03.03.2017, 02:45, anuj kumar wrote:
>>
>>> I second that. I am now finalising the integration of ES and should have a good production-quality implementation ready in a week's time. At that time I would want you guys to have a look at the implementation and provide feedback. Once you guys have upgraded Lucene to 6.4.1, I can merge the code into the jena-text module and do a round of testing.
>>>
>>> Thanks,
>>> Anuj Kumar
>>>
>>> On 2 Mar 2017 22:28, "A. Soroka" <aj...@virginia.edu> wrote:
>>>
>>>> I do agree that trying to juggle different versions of Lucene libraries is probably not a realistic option right now. Luckily (if I understand the conversation thus far correctly) we have a solid alternative; getting our current Lucene dependency upgraded should allow us to (eventually) merge Anuj's work into the mainstream of development. Someone please tell me if I have that wrong! :grin:
>>>>
>>>> Let me reiterate that this seems like very good work and speaking for myself, I certainly want to get it included into Jena. It's just a question of fitting it in correctly, which might take a bit of time.
>>>>
>>>> ---
>>>> A. Soroka
>>>> The University of Virginia Library
>>>>
>>>> On Mar 1, 2017, at 1:27 PM, Osma Suominen <osma.suomi...@helsinki.fi> wrote:
>>>>
>>>>> Hi Anuj!
>>>>>
>>>>> I have nothing against modularity in general. However, I cannot see how your proposal could work in practice for the Fuseki build, due to the reasons I mentioned in my previous message (and Adam seemed to concur).
>>>>>
>>>>> In any case, I'll see what I can do to get the Lucene upgrade moving again. If all current Jena modules (i.e. jena-text and jena-spatial) were upgraded to Lucene 6.4.1, then you could just add your ES classes to jena-text, right? I think that would be better for everyone than having to maintain your own separate module.
>>>>>
>>>>> -Osma
>>>>>
>>>>> 01.03.2017, 16:59, anuj kumar wrote:
>>>>>
>>>>>> I personally have no preference as to how the code in Jena should be structured, as long as I am able to use it :).
>>>>>> I have a personal preference for doing it in a specific way because, IMO, it is modular, which makes it much easier to maintain in the long run. But again it may not be the quickest one.
>>>>>>
>>>>>> I have already been given a deadline by the company to have the ES extension implemented in the next 15 days :). What this means is that I will be maintaining the ES code extension to Jena Text at least locally for the coming period. I would be more than happy to contribute to the Jena community whatever is required to have a proper Elasticsearch implementation in place, whether within the jena-text module or as a separate module. Until Lucene and Solr are upgraded to the latest version, I will have to maintain a separate module for jena-text-es.
>>>>>>
>>>>>> Cheers!
>>>>>> Anuj Kumar
>>>>>>
>>>>>> On Wed, Mar 1, 2017 at 3:36 PM, A. Soroka <aj...@virginia.edu> wrote:
>>>>>>
>>>>>>> Osma--
>>>>>>>
>>>>>>> The short answer is that yes, given the right tools you _can_ have different versions of code accessible in different ways. The longer answer is that it's probably not a viable alternative for Jena for this problem, at least not without a lot of other change.
>>>>>>>
>>>>>>> You are right to point to the classloader mechanism as being at the heart of this question, but I must alter your remark just slightly. From "the Java classloader only sees a single, flat package/class namespace and a set of compiled classes" to "ANY GIVEN Java classloader only sees a single, flat package/class namespace and a set of compiled classes".
>>>>>>>
>>>>>>> This is the fact that OSGi uses to make it possible to maintain strict module boundaries (and even dynamic module relationships at run-time). Each OSGi bundle sees its own classloader, and the framework is responsible for connecting bundles up to ensure that every bundle has what it needs in the way of types to function, based on metadata that the bundles provide to the framework. It's an incredibly powerful system (I use it every day and enjoy it enormously) but it's also very "heavy" and requires a good deal of investment to use. In particular, it's probably too large to put _inside_ Jena. (I frequently put Jena inside an OSGi instance, on the other hand.)
>>>>>>>
>>>>>>> Java 9 Jigsaw [1] offers some possibility for strong modularization of this kind, but it's really meant for the JDK itself, not application libraries. In theory, we could "roll our own" classloader management for this problem. That sounds like more than a bit of a rabbit hole to me. There might be another, more lightweight, toolkit out there for this purpose, but I'm not aware of any myself.
>>>>>>>
>>>>>>> Otherwise, yes, you get into shading and the like. We have to do that for Guava for now because of HADOOP-10101 (grumble grumble) but it's hardly a thing we want to do any more of than needed, I don't think.
>>>>>>>
>>>>>>> ---
>>>>>>> A. Soroka
>>>>>>> The University of Virginia Library
>>>>>>>
>>>>>>> [1] http://openjdk.java.net/projects/jigsaw/
>>>>>>>
>>>>>>> On Mar 1, 2017, at 9:03 AM, Osma Suominen <osma.suomi...@helsinki.fi> wrote:
>>>>>>>
>>>>>>>> Hi Anuj!
>>>>>>>>
>>>>>>>> Thanks for the clarification.
>>>>>>>>
>>>>>>>> However, I'm still not sure I understand the situation completely.
>>>>>>>> I know Maven can perform a lot of tricks, but Maven modules are just convenient ways to structure a Java project. Maven cannot change the fact that at runtime, module divisions don't really matter (except that they usually correspond to package sub-namespaces) and the Java classloader only sees a single, flat package/class namespace and a set of compiled classes (usually within JARs) in the classpath that it needs to check to find the right classes. If there are two versions of the same library (e.g. Lucene) with overlapping class names, that's going to cause trouble. The only way around that is to shade some of the libraries, i.e. rename them so that they end up in another, non-conflicting namespace. Apparently Elasticsearch also did some of that in the past [1] but nowadays tries to avoid it.
>>>>>>>>
>>>>>>>> Does your assumption 1 ("At a given point in time, only a single Indexing Technology is used") imply that in the assembler configuration, you cannot have ja:loadClass declarations for both Lucene and ES backends? Or how do you run something like Fuseki that contains (in a single big JAR) both the jena-text and jena-text-es modules with all their dependencies, one of which requires the Lucene 4.x classes and the other one the Lucene 6.4.1 classes? How do you ensure that only one of them is used at a time, and that the Java classloader, even though it has access to both versions of Lucene, only loads classes from the single, correct one and not the other?
>>>>>>>> Or do you need to have separate "Fuseki-Lucene" and "Fuseki-ES" packages, so that you don't end up with two Lucene versions within the same Fuseki JAR?
>>>>>>>>
>>>>>>>> -Osma
>>>>>>>>
>>>>>>>> [1] https://www.elastic.co/blog/to-shade-or-not-to-shade
>>>>>>>>
>>>>>>>> 01.03.2017, 11:03, anuj kumar wrote:
>>>>>>>>
>>>>>>>>> Hi Osma,
>>>>>>>>>
>>>>>>>>> I understand what you are saying. There are ways to mitigate risks and balance the refactoring without affecting the existing modules. But I will not delve into those now. I am not enough of an expert in Jena to say convincingly that it is possible without any hiccups. But I can take a guess and say that it is indeed possible :)
>>>>>>>>>
>>>>>>>>> For the question: "is it even possible to mix modules that depend on different versions of the Lucene libraries within the same project?"
>>>>>>>>>
>>>>>>>>> I actually do not understand what you mean by mixing modules. I assume you mean having jena-text and jena-text-es as dependencies in a build without causing the build to conflict. If that is what you mean, then the answer is yes, it is possible and quite simple as well. Let me explain how it is possible. But before that, some assumptions which I want to call out explicitly.
>>>>>>>>>
>>>>>>>>> *Assumptions:*
>>>>>>>>> 1. At a given point in time, only a single Indexing Technology is used for text-based indexing and searching via Jena.
>>>>>>>>>    What this means is that we will use either the Lucene implementation OR the Solr implementation OR the ES implementation at any given point in time.
>>>>>>>>> 2. The Fuseki build does not depend on any Lucene 4.9.1-specific classes but only on jena-text classes, if at all.
>>>>>>>>>
>>>>>>>>> Based on these assumptions it is possible to create a build that contains the jena-text common classes + ES-specific classes without any compatibility issues. And it is in fact quite simple. I did it in the current jena-text-es module and ran the entire build, which succeeded.
>>>>>>>>> The key is to include the latest Lucene dependencies at the very beginning of the pom and then include the jena-text dependency. Maven will then automatically resolve the dependency issues by including the Lucene libraries that we included in our ES-specific pom. Have a look at the pom of the jena-text-es module here to see how it can be done:
>>>>>>>>> https://github.com/EaseTech/jena/blob/master/jena-text-es/pom.xml
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Anuj Kumar
>>>>>>>>>
>>>>>>>>> On Wed, Mar 1, 2017 at 7:27 AM, Osma Suominen <osma.suomi...@helsinki.fi> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Anuj,
>>>>>>>>>>
>>>>>>>>>> I understand your concerns. However, we also need to balance between the needs of individual modules/features and the whole codebase. I'm willing to put in the effort to keep the other modules up to date with newer Lucene versions.
>>>>>>>>>> Lucene upgrade requirements are well documented; the only hitches seen in JENA-1250 were related to how jena-text (ab)used some Lucene features that were dropped from newer versions.
>>>>>>>>>>
>>>>>>>>>> A perhaps stupid question to more experienced Java developers: is it even possible to mix modules that depend on different versions of the Lucene libraries within the same project? In my (quite limited) understanding of Java projects and libraries, this requires special arrangements (e.g. shading) as the Java package/class namespace is shared by all the code running within the same JVM.
>>>>>>>>>>
>>>>>>>>>> So can you create, say, a Fuseki build that contains the current jena-text module (depending on Lucene 4.x) and the new jena-text-es module (depending on Lucene 6.4.1) without any compatibility issues?
>>>>>>>>>>
>>>>>>>>>> -Osma
>>>>>>>>>>
>>>>>>>>>> 01.03.2017, 00:47, anuj kumar wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> My 2 cents:
>>>>>>>>>>>
>>>>>>>>>>> The reason I proposed to have separate modules for Lucene, Solr and ES is exactly to avoid the "All or Nothing" approach we need to take if we club them all together. If they stay together and in the near future I want to upgrade ES to another version, I also need to upgrade Lucene and Solr again, and possibly another implementation that may have been added in the meantime.
>>>>>>>>>>> As we all know, this means weeks of work, if not months, to get the changes released. This will personally demotivate me from doing anything, and I will probably start maintaining my own version of Jena-Text, as that would be much simpler than upgrading and testing and, in the process, owning (read: fixing bugs in) the upgrade for each and every technology.
>>>>>>>>>>>
>>>>>>>>>>> If they are developed as separate modules, they can evolve independently of each other and we can avoid situations where we can't upgrade to the latest version of Lucene because we do not know what effect it will have on the Solr implementation.
>>>>>>>>>>>
>>>>>>>>>>> We can start with a separate module for Jena Text ES and see how things go. If they go well, we could extract Solr and Lucene out of Jena Text.
>>>>>>>>>>>
>>>>>>>>>>> Again, this is just a suggestion based on my limited industry experience.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Anuj Kumar
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Feb 28, 2017 at 5:23 PM, Osma Suominen <osma.suomi...@helsinki.fi> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> 28.02.2017, 17:12, A. Soroka wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> https://lists.apache.org/thread.html/dce0d502b11891c28e57bbcbb0cdef27d8374d58d9634076b8ef4cd7@1431107516@%3Cdev.jena.apache.org%3E ?
>>>>>>>>>>>>> In other words, might it be better to factor out between -text and -spatial and _then_ try to upgrade the Lucene version?
>>>>>>>>>>>>
>>>>>>>>>>>> I certainly wouldn't object to that, but somebody has to volunteer to do the actual work!
>>>>>>>>>>>>
>>>>>>>>>>>>> I don't use the Solr component now, but I could easily see so doing... that's pretty vague, I know, and I'm not in a position to do any work to maintain it, so consider that just a very small and blurry data point. :)
>>>>>>>>>>>>
>>>>>>>>>>>> Last time I tried it (it was a while ago) I couldn't figure out how to get it running... If you could just try that with some toy data, then your data point would be a lot less blurry :) I haven't used Solr for anything, so I'm not very familiar with how to set it up, and the jena-text instructions are pretty vague unfortunately.
>>>>>>>>>>>>
>>>>>>>>>>>> -Osma
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Osma Suominen
>>>>>>>>>>>> D.Sc. (Tech), Information Systems Specialist
>>>>>>>>>>>> National Library of Finland
>>>>>>>>>>>> P.O. Box 26 (Kaikukatu 4)
>>>>>>>>>>>> 00014 HELSINGIN YLIOPISTO
>>>>>>>>>>>> Tel. +358 50 3199529
>>>>>>>>>>>> osma.suomi...@helsinki.fi
>>>>>>>>>>>> http://www.nationallibrary.fi

--
*Anuj Kumar*
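
A note on the Maven behaviour discussed in the thread (including Lucene directly in the jena-text-es pom so that it replaces the version pulled in by jena-text): the sketch below is a simplified toy model of Maven's documented "nearest definition wins" mediation, not Maven's actual resolver. It ignores scopes, exclusions, dependencyManagement and version ranges. The class and method names (`NearestWinsDemo`, `Dep`, `resolve`, `sampleProject`), the `example:jena-text-es` coordinate and the `placeholder` version strings are made up for illustration; only the two Lucene versions come from the thread.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of Maven's "nearest definition wins" dependency mediation. */
public class NearestWinsDemo {

    /** One node of a dependency tree; children are kept in declaration order. */
    public static final class Dep {
        final String artifact;   // groupId:artifactId
        final String version;
        final List<Dep> children;

        public Dep(String artifact, String version, Dep... children) {
            this.artifact = artifact;
            this.version = version;
            this.children = Arrays.asList(children);
        }
    }

    /**
     * Breadth-first walk from the project's direct dependencies. The first
     * version seen for an artifact is the nearest one (ties go to the earlier
     * declaration), so it wins and later duplicates are skipped.
     */
    public static Map<String, String> resolve(Dep project) {
        Map<String, String> chosen = new LinkedHashMap<>();
        Deque<Dep> queue = new ArrayDeque<>(project.children);
        while (!queue.isEmpty()) {
            Dep dep = queue.removeFirst();
            if (chosen.putIfAbsent(dep.artifact, dep.version) == null) {
                // Only a winning node contributes its own dependencies.
                queue.addAll(dep.children);
            }
        }
        return chosen;
    }

    /** Hypothetical layout: jena-text pulls in Lucene 4.9.1, the project declares 6.4.1 itself. */
    public static Dep sampleProject() {
        Dep jenaText = new Dep("org.apache.jena:jena-text", "placeholder",
                new Dep("org.apache.lucene:lucene-core", "4.9.1"));
        Dep directLucene = new Dep("org.apache.lucene:lucene-core", "6.4.1");
        // jena-text is listed first on purpose: depth, not order, decides here.
        return new Dep("example:jena-text-es", "placeholder", jenaText, directLucene);
    }

    public static void main(String[] args) {
        Map<String, String> resolved = resolve(sampleProject());
        System.out.println("lucene-core -> " + resolved.get("org.apache.lucene:lucene-core"));
    }
}
```

In this model the directly declared Lucene 6.4.1 sits at depth 1 and the copy coming through jena-text at depth 2, so 6.4.1 is selected even though jena-text is declared first. Note that this only models which Lucene JAR ends up on the classpath. Whether jena-text classes compiled against Lucene 4.9.1 then work against 6.4.1 at runtime is the separate concern raised earlier in the thread.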