Hi Sorin,

If the issue is resolved to your satisfaction, please go ahead and close it.

Thanks,
Chris

> On Mar 25, 2019, at 11:56 AM, Sorin Gheorghiu <sorin.gheorg...@uni-konstanz.de> wrote:
>
> Hi Chris,
>
> After doing more tests I have good news: the textindexer of Jena 3.10 is
> working fine. When a large RDF dataset is indexed, the textindexer starts with
> one field (per record), and the other fields are indexed later on.
> This behaviour confused me; I expected to see all fields indexed
> immediately. So I have learnt that I have to wait until the textindexer
> finishes its task and only then check the results.
>
> Thank you for your support so far! Shall I close the ticket?
>
> Best regards,
> Sorin
>
> On 12.03.2019 at 15:39, Chris Tomlinson wrote:
>> Hi Sorin,
>>
>> I have focussed on the jena text integration w/ Lucene local to jena/fuseki.
>> The Solr integration was dropped over a year ago due to lack of support/interest,
>> and w/ your information about ES 7.x it’s likely going to take someone who is a
>> user of ES to help keep the integration up-to-date.
>>
>> Anuj Kumar <akum...@isightpartners.com> did the ES integration about a year
>> ago for jena 3.9.0 and, as I mentioned, I made obvious changes to the ES
>> integration to update to Lucene 7.4.0 for jena 3.10.0.
>>
>> The upgrade to Lucene 7.4.0
>> <https://issues.apache.org/jira/browse/JENA-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16673657#comment-16673657>
>> was prompted by a user, jeanmarc.va...@gmail.com, who was interested in
>> Lucene 7.5, but the released version of ES was built against 7.4 so we
>> upgraded to that version.
>>
>> I’ve opened JENA-1681 <https://issues.apache.org/jira/browse/JENA-1681> for
>> the issue you’ve reported. You can report your findings there and hopefully
>> we can get to the bottom of the problem.
>>
>> Regards,
>> Chris
>>
>>> On Mar 12, 2019, at 6:40 AM, Sorin Gheorghiu <sorin.gheorg...@uni-konstanz.de> wrote:
>>>
>>> Hi Chris,
>>>
>>> Thank you for your detailed answer. I will still try to find the root cause
>>> of this issue.
>>> But I have a question for you: do you know if Jena will support
>>> Elasticsearch in future versions?
>>>
>>> I am asking because Elasticsearch 7.0 has breaking changes which will
>>> affect the transport client [1]:
>>> The TransportClient is deprecated in favour of the Java High Level REST
>>> Client and will be removed in Elasticsearch 8.0.
>>> This requires changes in the client’s initialization code; the Migration
>>> Guide [2] explains how to do it.
>>>
>>> [1] https://www.elastic.co/guide/en/elasticsearch/client/java-api/master/transport-client.html
>>> [2] https://www.elastic.co/guide/en/elasticsearch/client/java-rest/master/java-rest-high-level-migration.html
>>>
>>> Best regards,
>>> Sorin
>>>
>>> On 11.03.2019 at 18:38, Chris Tomlinson wrote:
>>>> Hi Sorin,
>>>>
>>>> I haven’t had the time to try and delve further into your issue. Your pcap
>>>> seems to clearly indicate that there is no data populating any
>>>> field/property other than the first one in the entity map.
>>>>
>>>> I’ve included the configuration file that we use. It has many, many fields
>>>> defined that are all populated. We load jena/fuseki from a collection of
>>>> git repos via a git-to-dbs tool <https://github.com/buda-base/git-to-dbs>
>>>> and we don’t see the sort of issue you’re reporting where there is a
>>>> single field out of all the defined fields that is populated in the
>>>> dataset and Lucene index - we don’t use ElasticSearch.
>>>>
>>>> The point being that whatever is going wrong is apparently not in the
>>>> parsing of the configuration and setting up of the internal tables that
>>>> record information about which predicates are indexed via Lucene (or
>>>> Elasticsearch) into what fields.
>>>>
>>>> So it appears to me that the issue is something that is happening in the
>>>> connection between the standalone textindexer.java and Elasticsearch
>>>> via TextIndexES.java. The textindexer.java doesn’t have any post-3.8.0
>>>> changes that I can see, and the only change in TextIndexES.java is a
>>>> change in the name of
>>>> org.elasticsearch.common.transport.InetSocketTransportAddress to
>>>> org.elasticsearch.common.transport.TransportAddress as part of the upgrade.
>>>>
>>>> I’m really not able to go further at this time.
>>>>
>>>> I’m sorry,
>>>> Chris
>>>>
>>>>> # Fuseki configuration for BDRC, configures two endpoints:
>>>>> #   - /bdrc is read-only
>>>>> #   - /bdrcrw is read-write
>>>>> #
>>>>> # This was painful to come up with but the web interface basically allows no option
>>>>> # and there is no subclass inference by default so such a configuration file is necessary.
>>>>> #
>>>>> # The main doc sources are:
>>>>> #  - https://jena.apache.org/documentation/fuseki2/fuseki-configuration.html
>>>>> #  - https://jena.apache.org/documentation/assembler/assembler-howto.html
>>>>> #  - https://jena.apache.org/documentation/assembler/assembler.ttl
>>>>> #
>>>>> # See https://jena.apache.org/documentation/fuseki2/fuseki-layout.html for the destination of this file.
>>>>>
>>>>> @prefix fuseki:  <http://jena.apache.org/fuseki#> .
>>>>> @prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
>>>>> @prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
>>>>> @prefix tdb:     <http://jena.hpl.hp.com/2008/tdb#> .
>>>>> @prefix tdb2:    <http://jena.apache.org/2016/tdb#> .
>>>>> @prefix ja:      <http://jena.hpl.hp.com/2005/11/Assembler#> .
>>>>> @prefix :        <http://base/#> .
>>>>> @prefix text:    <http://jena.apache.org/text#> .
>>>>> @prefix skos:    <http://www.w3.org/2004/02/skos/core#> .
>>>>> @prefix adm:     <http://purl.bdrc.io/ontology/admin/> .
>>>>> @prefix bdd:     <http://purl.bdrc.io/data/> .
>>>>> @prefix bdo:     <http://purl.bdrc.io/ontology/core/> .
>>>>> @prefix bdr:     <http://purl.bdrc.io/resource/> .
>>>>> @prefix f:       <java:io.bdrc.ldspdi.sparql.functions.> .
>>>>>
>>>>> # [] ja:loadClass "org.seaborne.tdb2.TDB2" .
>>>>> # tdb2:DatasetTDB2 rdfs:subClassOf ja:RDFDataset .
>>>>> # tdb2:GraphTDB2 rdfs:subClassOf ja:Model .
>>>>>
>>>>> [] rdf:type fuseki:Server ;
>>>>>   fuseki:services (
>>>>>     :bdrcrw
>>>>>   ) .
>>>>>
>>>>> :bdrcrw rdf:type fuseki:Service ;
>>>>>   fuseki:name "bdrcrw" ;                      # name of the dataset in the url
>>>>>   fuseki:serviceQuery "query" ;               # SPARQL query service
>>>>>   fuseki:serviceUpdate "update" ;             # SPARQL update service
>>>>>   fuseki:serviceUpload "upload" ;             # Non-SPARQL upload service
>>>>>   fuseki:serviceReadWriteGraphStore "data" ;  # SPARQL Graph store protocol (read and write)
>>>>>   fuseki:dataset :bdrc_text_dataset ;
>>>>>   .
>>>>>
>>>>> # using TDB
>>>>> :dataset_bdrc rdf:type tdb:DatasetTDB ;
>>>>>   tdb:location "/usr/local/fuseki/base/databases/bdrc" ;
>>>>>   tdb:unionDefaultGraph true ;
>>>>>   .
>>>>>
>>>>> # using TDB2
>>>>> # :dataset_bdrc rdf:type tdb2:DatasetTDB2 ;
>>>>> #   tdb2:location "/usr/local/fuseki/base/databases/bdrc" ;
>>>>> #   tdb2:unionDefaultGraph true ;
>>>>> #   .
>>>>>
>>>>> :bdrc_text_dataset rdf:type text:TextDataset ;
>>>>>   text:dataset :dataset_bdrc ;
>>>>>   text:index :bdrc_lucene_index ;
>>>>>   .
>>>>>
>>>>> # Text index description
>>>>> :bdrc_lucene_index a text:TextIndexLucene ;
>>>>>   text:directory <file:/usr/local/fuseki/base/lucene-bdrc> ;
>>>>>   text:storeValues true ;
>>>>>   text:multilingualSupport true ;
>>>>>   text:entityMap :bdrc_entmap ;
>>>>>   text:defineAnalyzers (
>>>>>     [ text:defineAnalyzer :romanWordAnalyzer ;
>>>>>       text:analyzer [
>>>>>         a text:GenericAnalyzer ;
>>>>>         text:class "io.bdrc.lucene.sa.SanskritAnalyzer" ;
>>>>>         text:params (
>>>>>           [ text:paramName "mode" ;
>>>>>             text:paramValue "word" ]
>>>>>           [ text:paramName "inputEncoding" ;
>>>>>             text:paramValue "roman" ]
>>>>>           [ text:paramName "mergePrepositions" ;
>>>>>             text:paramValue true ]
>>>>>           [ text:paramName "filterGeminates" ;
>>>>>             text:paramValue true ]
>>>>>         )
>>>>>       ] ;
>>>>>     ]
>>>>>     [ text:defineAnalyzer :devaWordAnalyzer ;
>>>>>       text:analyzer [
>>>>>         a text:GenericAnalyzer ;
>>>>>         text:class "io.bdrc.lucene.sa.SanskritAnalyzer" ;
>>>>>         text:params (
>>>>>           [ text:paramName "mode" ;
>>>>>             text:paramValue "word" ]
>>>>>           [ text:paramName "inputEncoding" ;
>>>>>             text:paramValue "deva" ]
>>>>>           [ text:paramName "mergePrepositions" ;
>>>>>             text:paramValue true ]
>>>>>           [ text:paramName "filterGeminates" ;
>>>>>             text:paramValue true ]
>>>>>         )
>>>>>       ] ;
>>>>>     ]
>>>>>     [ text:defineAnalyzer :slpWordAnalyzer ;
>>>>>       text:analyzer [
>>>>>         a text:GenericAnalyzer ;
>>>>>         text:class "io.bdrc.lucene.sa.SanskritAnalyzer" ;
>>>>>         text:params (
>>>>>           [ text:paramName "mode" ;
>>>>>             text:paramValue "word" ]
>>>>>           [ text:paramName "inputEncoding" ;
>>>>>             text:paramValue "SLP" ]
>>>>>           [ text:paramName "mergePrepositions" ;
>>>>>             text:paramValue true ]
>>>>>           [ text:paramName "filterGeminates" ;
>>>>>             text:paramValue true ]
>>>>>         )
>>>>>       ] ;
>>>>>     ]
>>>>>     [ text:defineAnalyzer :romanLenientIndexAnalyzer ;
>>>>>       text:analyzer [
>>>>>         a text:GenericAnalyzer ;
>>>>>         text:class "io.bdrc.lucene.sa.SanskritAnalyzer" ;
>>>>>         text:params (
>>>>>           [ text:paramName "mode" ;
>>>>>             text:paramValue "syl" ]
>>>>>           [ text:paramName "inputEncoding" ;
>>>>>             text:paramValue "roman" ]
>>>>>           [ text:paramName "mergePrepositions" ;
>>>>>             text:paramValue false ]
>>>>>           [ text:paramName "filterGeminates" ;
>>>>>             text:paramValue true ]
>>>>>           [ text:paramName "lenient" ;
>>>>>             text:paramValue "index" ]
>>>>>         )
>>>>>       ] ;
>>>>>     ]
>>>>>     [ text:defineAnalyzer :devaLenientIndexAnalyzer ;
>>>>>       text:analyzer [
>>>>>         a text:GenericAnalyzer ;
>>>>>         text:class "io.bdrc.lucene.sa.SanskritAnalyzer" ;
>>>>>         text:params (
>>>>>           [ text:paramName "mode" ;
>>>>>             text:paramValue "syl" ]
>>>>>           [ text:paramName "inputEncoding" ;
>>>>>             text:paramValue "deva" ]
>>>>>           [ text:paramName "mergePrepositions" ;
>>>>>             text:paramValue false ]
>>>>>           [ text:paramName "filterGeminates" ;
>>>>>             text:paramValue true ]
>>>>>           [ text:paramName "lenient" ;
>>>>>             text:paramValue "index" ]
>>>>>         )
>>>>>       ] ;
>>>>>     ]
>>>>>     [ text:defineAnalyzer :slpLenientIndexAnalyzer ;
>>>>>       text:analyzer [
>>>>>         a text:GenericAnalyzer ;
>>>>>         text:class "io.bdrc.lucene.sa.SanskritAnalyzer" ;
>>>>>         text:params (
>>>>>           [ text:paramName "mode" ;
>>>>>             text:paramValue "syl" ]
>>>>>           [ text:paramName "inputEncoding" ;
>>>>>             text:paramValue "SLP" ]
>>>>>           [ text:paramName "mergePrepositions" ;
>>>>>             text:paramValue false ]
>>>>>           [ text:paramName "filterGeminates" ;
>>>>>             text:paramValue true ]
>>>>>           [ text:paramName "lenient" ;
>>>>>             text:paramValue "index" ]
>>>>>         )
>>>>>       ] ;
>>>>>     ]
>>>>>     [ text:defineAnalyzer :romanLenientQueryAnalyzer ;
>>>>>       text:analyzer [
>>>>>         a text:GenericAnalyzer ;
>>>>>         text:class "io.bdrc.lucene.sa.SanskritAnalyzer" ;
>>>>>         text:params (
>>>>>           [ text:paramName "mode" ;
>>>>>             text:paramValue "syl" ]
>>>>>           [ text:paramName "inputEncoding" ;
>>>>>             text:paramValue "roman" ]
>>>>>           [ text:paramName "mergePrepositions" ;
>>>>>             text:paramValue false ]
>>>>>           [ text:paramName "filterGeminates" ;
>>>>>             text:paramValue false ]
>>>>>           [ text:paramName "lenient" ;
>>>>>             text:paramValue "query" ]
>>>>>         )
>>>>>       ] ;
>>>>>     ]
>>>>>     [ text:defineAnalyzer :hanzAnalyzer ;
>>>>>       text:analyzer [
>>>>>         a text:GenericAnalyzer ;
>>>>>         text:class "io.bdrc.lucene.zh.ChineseAnalyzer" ;
>>>>>         text:params (
>>>>>           [ text:paramName "profile" ;
>>>>>             text:paramValue "TC2SC" ]
>>>>>           [ text:paramName "stopwords" ;
>>>>>             text:paramValue false ]
>>>>>           [ text:paramName "filterChars" ;
>>>>>             text:paramValue 0 ]
>>>>>         )
>>>>>       ] ;
>>>>>     ]
>>>>>     [ text:defineAnalyzer :han2pinyin ;
>>>>>       text:analyzer [
>>>>>         a text:GenericAnalyzer ;
>>>>>         text:class "io.bdrc.lucene.zh.ChineseAnalyzer" ;
>>>>>         text:params (
>>>>>           [ text:paramName "profile" ;
>>>>>             text:paramValue "TC2PYstrict" ]
>>>>>           [ text:paramName "stopwords" ;
>>>>>             text:paramValue false ]
>>>>>           [ text:paramName "filterChars" ;
>>>>>             text:paramValue 0 ]
>>>>>         )
>>>>>       ] ;
>>>>>     ]
>>>>>     [ text:defineAnalyzer :pinyin ;
>>>>>       text:analyzer [
>>>>>         a text:GenericAnalyzer ;
>>>>>         text:class "io.bdrc.lucene.zh.ChineseAnalyzer" ;
>>>>>         text:params (
>>>>>           [ text:paramName "profile" ;
>>>>>             text:paramValue "PYstrict" ]
>>>>>         )
>>>>>       ] ;
>>>>>     ]
>>>>>     [ text:addLang "bo" ;
>>>>>       text:searchFor ( "bo" "bo-x-ewts" "bo-alalc97" ) ;
>>>>>       text:analyzer [
>>>>>         a text:GenericAnalyzer ;
>>>>>         text:class "io.bdrc.lucene.bo.TibetanAnalyzer" ;
>>>>>         text:params (
>>>>>           [ text:paramName "segmentInWords" ;
>>>>>             text:paramValue false ]
>>>>>           [ text:paramName "lemmatize" ;
>>>>>             text:paramValue true ]
>>>>>           [ text:paramName "filterChars" ;
>>>>>             text:paramValue false ]
>>>>>           [ text:paramName "inputMode" ;
>>>>>             text:paramValue "unicode" ]
>>>>>           [ text:paramName "stopFilename" ;
>>>>>             text:paramValue "" ]
>>>>>         )
>>>>>       ] ;
>>>>>     ]
>>>>>     [ text:addLang "bo-x-ewts" ;
>>>>>       text:searchFor ( "bo" "bo-x-ewts" "bo-alalc97" ) ;
>>>>>       text:analyzer [
>>>>>         a text:GenericAnalyzer ;
>>>>>         text:class "io.bdrc.lucene.bo.TibetanAnalyzer" ;
>>>>>         text:params (
>>>>>           [ text:paramName "segmentInWords" ;
>>>>>             text:paramValue false ]
>>>>>           [ text:paramName "lemmatize" ;
>>>>>             text:paramValue true ]
>>>>>           [ text:paramName "filterChars" ;
>>>>>             text:paramValue false ]
>>>>>           [ text:paramName "inputMode" ;
>>>>>             text:paramValue "ewts" ]
>>>>>           [ text:paramName "stopFilename" ;
>>>>>             text:paramValue "" ]
>>>>>         )
>>>>>       ] ;
>>>>>     ]
>>>>>     [ text:addLang "bo-alalc97" ;
>>>>>       text:searchFor ( "bo" "bo-x-ewts" "bo-alalc97" ) ;
>>>>>       text:analyzer [
>>>>>         a text:GenericAnalyzer ;
>>>>>         text:class "io.bdrc.lucene.bo.TibetanAnalyzer" ;
>>>>>         text:params (
>>>>>           [ text:paramName "segmentInWords" ;
>>>>>             text:paramValue false ]
>>>>>           [ text:paramName "lemmatize" ;
>>>>>             text:paramValue true ]
>>>>>           [ text:paramName "filterChars" ;
>>>>>             text:paramValue false ]
>>>>>           [ text:paramName "inputMode" ;
>>>>>             text:paramValue "alalc" ]
>>>>>           [ text:paramName "stopFilename" ;
>>>>>             text:paramValue "" ]
>>>>>         )
>>>>>       ] ;
>>>>>     ]
>>>>>     [ text:addLang "zh-hans" ;
>>>>>       text:searchFor ( "zh-hans" "zh-hant" ) ;
>>>>>       text:auxIndex ( "zh-aux-han2pinyin" ) ;
>>>>>       text:analyzer [
>>>>>         a text:DefinedAnalyzer ;
>>>>>         text:useAnalyzer :hanzAnalyzer ] ;
>>>>>     ]
>>>>>     [ text:addLang "zh-hant" ;
>>>>>       text:searchFor ( "zh-hans" "zh-hant" ) ;
>>>>>       text:auxIndex ( "zh-aux-han2pinyin" ) ;
>>>>>       text:analyzer [
>>>>>         a text:DefinedAnalyzer ;
>>>>>         text:useAnalyzer :hanzAnalyzer
>>>>>       ] ;
>>>>>     ]
>>>>>     [ text:addLang "zh-latn-pinyin" ;
>>>>>       text:searchFor ( "zh-latn-pinyin" "zh-aux-han2pinyin" ) ;
>>>>>       text:analyzer [
>>>>>         a text:DefinedAnalyzer ;
>>>>>         text:useAnalyzer :pinyin
>>>>>       ] ;
>>>>>     ]
>>>>>     [ text:addLang "zh-aux-han2pinyin" ;
>>>>>       text:searchFor ( "zh-latn-pinyin" "zh-aux-han2pinyin" ) ;
>>>>>       text:analyzer [
>>>>>         a text:DefinedAnalyzer ;
>>>>>         text:useAnalyzer :pinyin
>>>>>       ] ;
>>>>>       text:indexAnalyzer :han2pinyin ;
>>>>>     ]
>>>>>     [ text:addLang "sa-x-ndia" ;
>>>>>       text:searchFor ( "sa-x-ndia" "sa-aux-deva2Ndia" "sa-aux-roman2Ndia" "sa-aux-slp2Ndia" ) ;
>>>>>       text:analyzer [
>>>>>         a text:DefinedAnalyzer ;
>>>>>         text:useAnalyzer :romanLenientQueryAnalyzer
>>>>>       ] ;
>>>>>     ]
>>>>>     [ text:addLang "sa-aux-deva2Ndia" ;
>>>>>       text:searchFor ( "sa-x-ndia" "sa-aux-roman2Ndia" "sa-aux-slp2Ndia" ) ;
>>>>>       text:analyzer [
>>>>>         a text:DefinedAnalyzer ;
>>>>>         text:useAnalyzer :romanLenientQueryAnalyzer
>>>>>       ] ;
>>>>>       text:indexAnalyzer :devaLenientIndexAnalyzer ;
>>>>>     ]
>>>>>     [ text:addLang "sa-aux-roman2Ndia" ;
>>>>>       text:searchFor ( "sa-x-ndia" "sa-aux-deva2Ndia" "sa-aux-slp2Ndia" ) ;
>>>>>       text:analyzer [
>>>>>         a text:DefinedAnalyzer ;
>>>>>         text:useAnalyzer :romanLenientQueryAnalyzer
>>>>>       ] ;
>>>>>       text:indexAnalyzer :romanLenientIndexAnalyzer ;
>>>>>     ]
>>>>>     [ text:addLang "sa-aux-slp2Ndia" ;
>>>>>       text:searchFor ( "sa-x-ndia" "sa-aux-deva2Ndia" "sa-aux-roman2Ndia" ) ;
>>>>>       text:analyzer [
>>>>>         a text:DefinedAnalyzer ;
>>>>>         text:useAnalyzer :romanLenientQueryAnalyzer
>>>>>       ] ;
>>>>>       text:indexAnalyzer :slpLenientIndexAnalyzer ;
>>>>>     ]
>>>>>     [ text:addLang "sa-deva" ;
>>>>>       text:searchFor ( "sa-deva" "sa-x-iast" "sa-x-slp1" "sa-x-iso" "sa-alalc97" ) ;
>>>>>       text:auxIndex ( "sa-aux-deva2Ndia" ) ;
>>>>>       text:analyzer [
>>>>>         a text:DefinedAnalyzer ;
>>>>>         text:useAnalyzer :devaWordAnalyzer ] ;
>>>>>     ]
>>>>>     [ text:addLang "sa-x-iso" ;
>>>>>       text:searchFor ( "sa-x-iso" "sa-x-iast" "sa-x-slp1" "sa-deva" "sa-alalc97" ) ;
>>>>>       text:auxIndex ( "sa-aux-roman2Ndia" ) ;
>>>>>       text:analyzer [
>>>>>         a text:DefinedAnalyzer ;
>>>>>         text:useAnalyzer :romanWordAnalyzer ] ;
>>>>>     ]
>>>>>     [ text:addLang "sa-x-slp1" ;
>>>>>       text:searchFor ( "sa-x-slp1" "sa-x-iast" "sa-x-iso" "sa-deva" "sa-alalc97" ) ;
>>>>>       text:auxIndex ( "sa-aux-slp2Ndia" ) ;
>>>>>       text:analyzer [
>>>>>         a text:DefinedAnalyzer ;
>>>>>         text:useAnalyzer :slpWordAnalyzer ] ;
>>>>>     ]
>>>>>     [ text:addLang "sa-x-iast" ;
>>>>>       text:searchFor ( "sa-x-iast" "sa-x-slp1" "sa-x-iso" "sa-deva" "sa-alalc97" ) ;
>>>>>       text:auxIndex ( "sa-aux-roman2Ndia" ) ;
>>>>>       text:analyzer [
>>>>>         a text:DefinedAnalyzer ;
>>>>>         text:useAnalyzer :romanWordAnalyzer ] ;
>>>>>     ]
>>>>>     [ text:addLang "sa-alalc97" ;
>>>>>       text:searchFor ( "sa-alalc97" "sa-x-slp1" "sa-x-iso" "sa-deva" "sa-iast" ) ;
>>>>>       text:auxIndex ( "sa-aux-roman2Ndia" ) ;
>>>>>       text:analyzer [
>>>>>         a text:DefinedAnalyzer ;
>>>>>         text:useAnalyzer :romanWordAnalyzer ] ;
>>>>>     ]
>>>>>   ) ;
>>>>>   .
>>>>>
>>>>> # Index mappings
>>>>> :bdrc_entmap a text:EntityMap ;
>>>>>   text:entityField "uri" ;
>>>>>   text:uidField "uid" ;
>>>>>   text:defaultField "label" ;
>>>>>   text:langField "lang" ;
>>>>>   text:graphField "graph" ;  ## enable graph-specific indexing
>>>>>   text:map (
>>>>>     [ text:field "label" ;
>>>>>       text:predicate skos:prefLabel ]
>>>>>     [ text:field "altLabel" ;
>>>>>       text:predicate skos:altLabel ; ]
>>>>>     [ text:field "rdfsLabel" ;
>>>>>       text:predicate rdfs:label ; ]
>>>>>     [ text:field "chunkContents" ;
>>>>>       text:predicate bdo:chunkContents ; ]
>>>>>     [ text:field "eTextTitle" ;
>>>>>       text:predicate bdo:eTextTitle ; ]
>>>>>     [ text:field "logMessage" ;
>>>>>       text:predicate adm:logMessage ; ]
>>>>>     [ text:field "noteText" ;
>>>>>       text:predicate bdo:noteText ; ]
>>>>>     [ text:field "workAuthorshipStatement" ;
>>>>>       text:predicate bdo:workAuthorshipStatement ; ]
>>>>>     [ text:field "workColophon" ;
>>>>>       text:predicate bdo:workColophon ; ]
>>>>>     [ text:field "workEditionStatement" ;
>>>>>       text:predicate bdo:workEditionStatement ; ]
>>>>>     [ text:field "workPublisherLocation" ;
>>>>>       text:predicate bdo:workPublisherLocation ; ]
>>>>>     [ text:field "workPublisherName" ;
>>>>>       text:predicate bdo:workPublisherName ; ]
>>>>>     [ text:field "workSeriesName" ;
>>>>>       text:predicate bdo:workSeriesName ; ]
>>>>>   ) ;
>>>>>   .
>>>>
>>>>> On Mar 11, 2019, at 11:42 AM, Sorin Gheorghiu <sorin.gheorg...@uni-konstanz.de> wrote:
>>>>>
>>>>> Hi Chris,
>>>>>
>>>>> Have you had time to look at my results, by chance? Would this help to
>>>>> isolate the issue?
>>>>> Please let me know if you need me to collect any other data.
>>>>>
>>>>> Best regards,
>>>>> Sorin
>>>>>
>>>>> -------- Forwarded Message --------
>>>>> Subject: Re: Text Index build with empty fields
>>>>> Date: Mon, 4 Mar 2019 17:35:56 +0100
>>>>> From: Sorin Gheorghiu <sorin.gheorg...@uni-konstanz.de>
>>>>> To: users@jena.apache.org
>>>>> Cc: Chris Tomlinson <chris.j.tomlin...@gmail.com>
>>>>>
>>>>> Hi Chris,
>>>>>
>>>>> When I reduce the entity map to 3 fields:
>>>>>
>>>>>     [ text:field "oldgndid";
>>>>>       text:predicate gndo:oldAuthorityNumber
>>>>>     ]
>>>>>     [ text:field "prefName";
>>>>>       text:predicate gndo:preferredNameForThePerson
>>>>>     ]
>>>>>     [ text:field "varName";
>>>>>       text:predicate gndo:variantNameForThePerson
>>>>>     ]
>>>>>
>>>>> then only the oldgndid field contains data (see
>>>>> textindexer_3params_040319.pcap attached):
>>>>>
>>>>> ES...|..........\*.......gnd_fts_es_131018_index.Y6BxYm-hT6qL0_NX10HrZQ..GndSubjectheadings.http://d-nb.info/gnd/4000002-3........
>>>>> ES...B..........\*.....transport_client.indices:data/write/update..gnd_fts_es_131018_index.........GndSubjectheadings.http://d-nb.info/gnd/4000023-0......painless..if((ctx._source == null) || (ctx._source.oldgndid == null) || (ctx._source.oldgndid.empty == true)) {ctx._source.oldgndid=[params.fieldValue] } else {ctx._source.oldgndid.add(params.fieldValue)}..fieldValue..(DE-588c)4000023-0...............gnd_fts_es_131018_index....GndSubjectheadings..http://d-nb.info/gnd/4000023-0..>{"varName":[],"prefName":[],"oldgndid":["(DE-588c)4000023-0"]}.............
>>>>>
>>>>> Moreover, with 2 fields:
>>>>>
>>>>>     [ text:field "prefName";
>>>>>       text:predicate gndo:preferredNameForThePerson
>>>>>     ]
>>>>>     [ text:field "varName";
>>>>>       text:predicate gndo:variantNameForThePerson
>>>>>     ]
>>>>>
>>>>> then only the prefName field contains data (see
>>>>> textindexer_2params_040319.pcap attached):
>>>>>
>>>>> ES...|..........\*.......gnd_fts_es_131018_index.Y6BxYm-hT6qL0_NX10HrZQ..GndSubjectheadings.http://d-nb.info/gnd/134316541........
>>>>> ES...$..........\*.....transport_client.indices:data/write/update..gnd_fts_es_131018_index.........GndSubjectheadings.http://d-nb.info/gnd/1153446294......painless..if((ctx._source == null) || (ctx._source.prefName == null) || (ctx._source.prefName.empty == true)) {ctx._source.prefName=[params.fieldValue] } else {ctx._source.prefName.add(params.fieldValue)}..fieldValue. Pharmakon...............gnd_fts_es_131018_index....GndSubjectheadings..http://d-nb.info/gnd/1153446294..'{"varName":[],"prefName":["Pharmakon"]}.................
>>>>>
>>>>> Regards,
>>>>> Sorin
>>>>>
>>>>> On 01.03.2019 at 18:06, Chris Tomlinson wrote:
>>>>>> Hi Sorin,
>>>>>>
>>>>>> tcpdump -A -r works fine to view the pcap file; however, I don’t have
>>>>>> the time to delve into the data. I’ll take your word for it that the
>>>>>> whole setup worked in 3.8.0, and I encourage you to try simplifying the
>>>>>> entity map, perhaps by having a unique field per property, to see if the
>>>>>> problem appears related to the prefName and varName fields mapping to
>>>>>> multiple properties.
>>>>>>
>>>>>> I do notice that the field oldgndid only maps to a single property, but
>>>>>> not knowing the data I have no idea whether there’s any of that data in
>>>>>> your tests.
>>>>>>
>>>>>> Since you indicate that only the field gndtype has data (per the pcap
>>>>>> file), then if there is oldgndid data (i.e., occurrences of
>>>>>> gndo:oldAuthorityNumber), that suggests that there is some rather
>>>>>> generic issue w/ textindexer; however, if there is no oldgndid data then
>>>>>> there may be a problem that has crept in since 3.8.0 that is leading to
>>>>>> a problem with data for multiple properties assigned to a single field,
>>>>>> which I would guess might be related to the
>>>>>> com.google.common.collect.Multimap that holds the results of parsing the
>>>>>> entity map.
>>>>>>
>>>>>> I have no idea how to enable debug logging when running the standalone
>>>>>> textindexer; perhaps someone else can answer that.
>>>>>>
>>>>>> Regards,
>>>>>> Chris
>>>>>>
>>>>>>> On Mar 1, 2019, at 2:57 AM, Sorin Gheorghiu <sorin.gheorg...@uni-konstanz.de> wrote:
>>>>>>>
>>>>>>> Hi Chris,
>>>>>>>
>>>>>>> 1) As I said before, this entity map worked in 3.8.0.
>>>>>>> The pcap file I sent you is the proof that Jena delivers inconsistent
>>>>>>> data. You may open it with Wireshark
>>>>>>>
>>>>>>> <jndbgnifbhkopbdd.png>
>>>>>>>
>>>>>>> or read it with tcpick:
>>>>>>>
>>>>>>> # tcpick -C -yP -r textindexer_280219.pcap | more
>>>>>>>
>>>>>>> ES...}..........\*.......gnd_fts_es_131018_index.cp-dFuCVTg-dUwvfyREG2w..GndSubjectheadings.http://d-nb.info/gnd/102968225.........
>>>>>>> ES..............\*.....transport_client.indices:data/write/update..gnd_fts_es_131018_index.........GndSubjectheadings.http://d-nb.info/gnd/102968438......painless..if((ctx._source == null) || (ctx._source.gndtype == null) || (ctx._source.gndtype.empty == true)) {ctx._source.gndtype=[params.fieldValue] } else {ctx._source.gndtype.add(params.fieldValue)}
>>>>>>> ..fieldValue..Person...............gnd_fts_es_131018_index....GndSubjectheadings..http://d-nb.info/gnd/102968438....{"varName":[],"varName":[],"varName":[],"varName":[],"varName":[],"varName":[],"varName":[],"prefName":[],"prefName":[],"prefName":[],"prefName":[],"prefName":[],"prefName":[],"prefName":[],"oldgndid":[],"gndtype":["Person"]}..................................
>>>>>>>
>>>>>>> As a remark, Jena sends the whole text index data for one
>>>>>>> Elasticsearch document within one TCP packet.
>>>>>>>
>>>>>>> 3) fuseki.log collects logs when the Fuseki server is running, but for
>>>>>>> the text indexer we have to run a java command line, i.e.
>>>>>>>
>>>>>>> java -cp ./fuseki-server.jar:<other_jars> jena.textindexer --desc=run/config.ttl
>>>>>>>
>>>>>>> The question is how to activate the debug logs for the text indexer.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Sorin
>>>>>>>
>>>>>>> On 28.02.2019 at 21:41, Chris Tomlinson wrote:
>>>>>>>> Hi Sorin,
>>>>>>>>
>>>>>>>> 1) I suggest trying to simplify the entity map.
I assume there’s data >>>>>>>> for each of the properties other than skos:altLabel in the entity map: >>>>>>>> >>>>>>>>> [ text:field "gndtype"; >>>>>>>>> text:predicate skos:altLabel >>>>>>>>> ] >>>>>>>>> [ text:field "oldgndid"; >>>>>>>>> text:predicate gndo:oldAuthorityNumber >>>>>>>>> ] >>>>>>>>> [ text:field "prefName"; >>>>>>>>> text:predicate gndo:preferredNameForTheSubjectHeading >>>>>>>>> ] >>>>>>>>> [ text:field "varName"; >>>>>>>>> text:predicate gndo:variantNameForTheSubjectHeading >>>>>>>>> ] >>>>>>>>> [ text:field "prefName"; >>>>>>>>> text:predicate >>>>>>>>> gndo:preferredNameForThePlaceOrGeographicName >>>>>>>>> ] >>>>>>>>> [ text:field "varName"; >>>>>>>>> text:predicate gndo:variantNameForThePlaceOrGeographicName >>>>>>>>> ] >>>>>>>>> [ text:field "prefName"; >>>>>>>>> text:predicate gndo:preferredNameForTheWork >>>>>>>>> ] >>>>>>>>> [ text:field "varName"; >>>>>>>>> text:predicate gndo:variantNameForTheWork >>>>>>>>> ] >>>>>>>>> [ text:field "prefName"; >>>>>>>>> text:predicate gndo:preferredNameForTheConferenceOrEvent >>>>>>>>> ] >>>>>>>>> [ text:field "varName"; >>>>>>>>> text:predicate gndo:variantNameForTheConferenceOrEvent >>>>>>>>> ] >>>>>>>>> [ text:field "prefName"; >>>>>>>>> text:predicate gndo:preferredNameForTheCorporateBody >>>>>>>>> ] >>>>>>>>> [ text:field "varName"; >>>>>>>>> text:predicate gndo:variantNameForTheCorporateBody >>>>>>>>> ] >>>>>>>>> [ text:field "prefName"; >>>>>>>>> text:predicate gndo:preferredNameForThePerson >>>>>>>>> ] >>>>>>>>> [ text:field "varName"; >>>>>>>>> text:predicate gndo:variantNameForThePerson >>>>>>>>> ] >>>>>>>>> [ text:field "prefName"; >>>>>>>>> text:predicate gndo:preferredNameForTheFamily >>>>>>>>> ] >>>>>>>>> [ text:field "varName"; >>>>>>>>> text:predicate gndo:variantNameForTheFamily >>>>>>>>> ] >>>>>>>> 2) You might try a TextIndexLucene >>>>>>>> >>>>>>>> 3) Adding the line log4j.logger.org.apache.jena.query.text.es=DEBUG >>>>>>>> should work. I see no problem with it. 
>>>>>>>> >>>>>>>> Sorry to be of little help, >>>>>>>> Chris >>>>>>>> >>>>>>>> >>>>>>>>> On Feb 28, 2019, at 8:53 AM, Sorin Gheorghiu >>>>>>>>> <sorin.gheorg...@uni-konstanz.de> >>>>>>>>> <mailto:sorin.gheorg...@uni-konstanz.de> >>>>>>>>> <mailto:sorin.gheorg...@uni-konstanz.de> >>>>>>>>> <mailto:sorin.gheorg...@uni-konstanz.de> wrote: >>>>>>>>> >>>>>>>>> Hi Chris, >>>>>>>>> Thank you for answering, I reply you directly because users@jena >>>>>>>>> doesn't accept messages larger than 1Mb. >>>>>>>>> >>>>>>>>> The previous text index successful attempt we did was with 3.8.0, not >>>>>>>>> 3.9.0, sorry for the misinformation. >>>>>>>>> Attached is the assembler file for 3.10.0 as requested, as well as >>>>>>>>> the packet capture file to see that only the 'gndtype' field has data. >>>>>>>>> I tried to enable the debug logs in log4j.properties with >>>>>>>>> log4j.logger.org.apache.jena.query.text.es=DEBUG but no output in the >>>>>>>>> log file. >>>>>>>>> >>>>>>>>> Regards, >>>>>>>>> Sorin >>>>>>>>> >>>>>>>>> Am 27.02.2019 um 20:01 schrieb Chris Tomlinson: >>>>>>>>>> Hi Sorin, >>>>>>>>>> >>>>>>>>>> Please provide the assembler file for Elasticsearch that has the >>>>>>>>>> problematic entity map definitions. >>>>>>>>>> >>>>>>>>>> There haven’t been any changes in over a year to textindexer since >>>>>>>>>> well before 3.9. I don’t see any relevant changes to the handling of >>>>>>>>>> entity maps either so I can’t begin to pursue the issue further w/o >>>>>>>>>> perhaps seeing your current assembler file. >>>>>>>>>> >>>>>>>>>> I don't have any experience with Elasticsearch or with using >>>>>>>>>> jena-text-es beyond a simple change to TextIndexES.java to change >>>>>>>>>> org.elasticsearch.common.transport.InetSocketTransportAddress to >>>>>>>>>> org.elasticsearch.common.transport.TransportAddress as part of the >>>>>>>>>> upgrade to Lucene 7.4.0 and Elasticsearch 6.4.2. 
>>>>>>>>>> >>>>>>>>>> Regards, >>>>>>>>>> Chris >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> On Feb 25, 2019, at 2:37 AM, Sorin Gheorghiu >>>>>>>>>>> <sorin.gheorg...@uni-konstanz.de> >>>>>>>>>>> <mailto:sorin.gheorg...@uni-konstanz.de> >>>>>>>>>>> <mailto:sorin.gheorg...@uni-konstanz.de> >>>>>>>>>>> <mailto:sorin.gheorg...@uni-konstanz.de> >>>>>>>>>>> <mailto:sorin.gheorg...@uni-konstanz.de> >>>>>>>>>>> <mailto:sorin.gheorg...@uni-konstanz.de> >>>>>>>>>>> <mailto:sorin.gheorg...@uni-konstanz.de> >>>>>>>>>>> <mailto:sorin.gheorg...@uni-konstanz.de> wrote: >>>>>>>>>>> >>>>>>>>>>> Correction: only the *latest field *from the /text:map/ list >>>>>>>>>>> contains a value. >>>>>>>>>>> >>>>>>>>>>> To reformulate: >>>>>>>>>>> >>>>>>>>>>> * if there are 3 fields in /text:map/, then during indexing the >>>>>>>>>>> first >>>>>>>>>>> two are empty (let's name them 'text1' and 'text2') and the latest >>>>>>>>>>> field contains data (let's name it 'text3') >>>>>>>>>>> * if on the next attempt the field 'text3' is commented out, then >>>>>>>>>>> 'text1' is empty and 'text2' contains data >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Am 22.02.2019 um 15:01 schrieb Sorin Gheorghiu: >>>>>>>>>>>> In addition: >>>>>>>>>>>> >>>>>>>>>>>> * if there are 3 fields in /text:map/, then during indexing one >>>>>>>>>>>> contains data (let's name it 'text1'), the others are empty >>>>>>>>>>>> (let's >>>>>>>>>>>> name them 'text2' and 'text3'), >>>>>>>>>>>> * if on the next attempt the field 'text1' is commented out, then >>>>>>>>>>>> 'text2' contains data and 'text3' is empty >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> -------- Weitergeleitete Nachricht -------- >>>>>>>>>>>> Betreff: Text Index build with empty fields >>>>>>>>>>>> Datum: Fri, 22 Feb 2019 14:01:18 +0100 >>>>>>>>>>>> Von: Sorin Gheorghiu <sorin.gheorg...@uni-konstanz.de> >>>>>>>>>>>> <mailto:sorin.gheorg...@uni-konstanz.de> >>>>>>>>>>>> <mailto:sorin.gheorg...@uni-konstanz.de> >>>>>>>>>>>> <mailto:sorin.gheorg...@uni-konstanz.de> 
>>>>>>>>>>>> Reply-To: users@jena.apache.org
>>>>>>>>>>>> To: users@jena.apache.org
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> When building the text index with the /jena.textindexer/ tool in
>>>>>>>>>>>> Jena 3.10 for an external full-text search engine (Elasticsearch,
>>>>>>>>>>>> of course) and having multiple fields with different names in
>>>>>>>>>>>> /text:map/, just *one field is indexed* (more precisely, one field
>>>>>>>>>>>> contains data, the others are empty). It doesn't look like an
>>>>>>>>>>>> issue with Elasticsearch: in the logs generated during
>>>>>>>>>>>> indexing, all fields but one are already missing their values. The
>>>>>>>>>>>> same setup worked in Jena 3.9. Changing the Java version from 8 to
>>>>>>>>>>>> 9 or 11 didn't change anything.
>>>>>>>>>>>>
>>>>>>>>>>>> Could it be that changes in the new release have affected this
>>>>>>>>>>>> tool and we are dealing with a bug?
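One way to check the symptom described in this report is to tally, per field, how many indexed documents actually carry a value. The following is only a minimal sketch: the function name `field_coverage` and the sample documents are hypothetical, and it assumes the indexed documents have already been exported as plain dictionaries. It does not talk to Elasticsearch or Jena.

```python
def field_coverage(docs, fields):
    """Count, for each field, how many documents hold a non-empty value."""
    counts = {field: 0 for field in fields}
    for doc in docs:
        for field in fields:
            value = doc.get(field)
            # Treat missing, None, empty-string and empty-list values as "empty".
            if value is not None and value != "" and value != []:
                counts[field] += 1
    return counts


# Hypothetical documents mirroring the report: only 'text3' has data.
docs = [
    {"text1": "", "text2": None, "text3": "Konstanz"},
    {"text1": "", "text3": "Jena"},
]
coverage = field_coverage(docs, ["text1", "text2", "text3"])
print(coverage)
```

A field whose count stays at zero after the textindexer run has completed would match the behaviour reported here; counts taken while indexing is still in progress may be misleading.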
>>>>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Sorin Gheorghiu          Tel: +49 7531 88-3198
>>>>>>>>> Universität Konstanz     Raum: B705
>>>>>>>>> 78464 Konstanz           sorin.gheorg...@uni-konstanz.de
>>>>>>>>>
>>>>>>>>> - KIM: Abteilung Contentdienste -
>>>>> <textindexer_2params_040319.pcap><textindexer_3params_040319.pcap>
>>>>
>>