Welcome to Java ClassLoader hell!
: Caused by: java.lang.IllegalArgumentException: resource tokenization/sentence-boundary-model.bin not found.
:   at com.google.common.base.Preconditions.checkArgument(Preconditions.java:220) ~[?:?]
:   at com.google.common.io.Resources.getResource(Resources.java:196) ~[?:?]
:   at zemberek.tokenization.TurkishSentenceExtractor.fromDefaultModel(TurkishSentenceExtractor.java:51)
...
: However, I have tokenization/sentence-boundary-model.bin inside
: zemberek-tokenization-0.17.1.jar file, which I also copied into lib
: dir.
You have to be specific: which "lib" directory are you talking about here?
If you mean this...
https://solr.apache.org/guide/solr/latest/configuration-guide/libs.html
...i would advise against this approach in general, and instead suggest
that you put your custom code in a custom module directory...
https://solr.apache.org/guide/solr/latest/configuration-guide/solr-modules.html
...or use the "package manager" (but I have very little experience with
this)...
https://solr.apache.org/guide/solr/latest/configuration-guide/package-manager.html
Doing one of these (instead of <lib .../> directives in your
solrconfig.xml) *may* fix your problem.
As to what exactly your problem is...
By the looks of it, based on the first Google search result I
found, I'm guessing this is the underlying code you are
using...
https://github.com/ahmetaa/zemberek-nlp/blob/a9c0f88210dd6a4a1b6152de88d117054a105879/tokenization/src/main/java/zemberek/tokenization/TurkishSentenceExtractor.java#L49
...which uses the single argument version of
com.google.common.io.Resources.getResource(...), which is documented to use
the context classloader -- if you can change that code to use the two
argument getResource(...) and pass in TurkishSentenceExtractor.class,
then the lookup *should* always go through the correct classloader, the one
Solr has created for your plugin/module/SolrCore (regardless of how exactly
you've pointed Solr at your jar files).
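To make the classloader point concrete, here is a minimal JDK-only sketch
(no Guava or Solr involved; the class name ClassLoaderLookupDemo, the helper
names, and the temp directory standing in for your plugin jar are all made up
for illustration). It asks two different loaders for the same resource name:
the thread's context classloader, which is what the single argument
Resources.getResource(...) consults, and a URLClassLoader playing the role of
the loader Solr builds for a plugin...

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class; nothing here is Solr or zemberek API.
class ClassLoaderLookupDemo {

    /** Creates a temp directory (a stand-in for a plugin jar) holding one resource file. */
    public static Path makeResourceDir(String resourceName) throws IOException {
        Path dir = Files.createTempDirectory("plugin-jar-standin");
        Path file = dir.resolve(resourceName);
        Files.createDirectories(file.getParent());
        Files.write(file, new byte[] {1, 2, 3});
        return dir;
    }

    /** Returns true if the given loader (or one of its parents) can find the resource. */
    public static boolean visibleTo(ClassLoader loader, String resourceName) {
        return loader != null && loader.getResource(resourceName) != null;
    }

    public static void main(String[] args) throws IOException {
        String name = "tokenization/sentence-boundary-model.bin";
        Path dir = makeResourceDir(name);

        // Child loader that can see the directory, like a per-plugin loader would see its jar.
        try (URLClassLoader pluginLoader = new URLClassLoader(
                new URL[] {dir.toUri().toURL()},
                ClassLoaderLookupDemo.class.getClassLoader())) {

            // The context loader was never told about the directory.
            ClassLoader context = Thread.currentThread().getContextClassLoader();

            System.out.println("context loader sees it: " + visibleTo(context, name));
            System.out.println("plugin loader sees it:  " + visibleTo(pluginLoader, name));
        }
    }
}
```

One caveat if you do switch to the two argument Guava call: as far as I can
tell it delegates to Class.getResource(...), which resolves names relative to
the class's package unless they start with "/", so the resource name would
likely need a leading slash ("/tokenization/sentence-boundary-model.bin").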
If you can't modify the TurkishSentenceExtractor class directly, you could
maybe change your Factory so that instead of using
TurkishSentenceExtractor.DEFAULT you create your own instance of
TurkishSentenceExtractor passing in the loaded weights yourself.
(Hmmmm.. Except I think your problem is actually in the static
initializers during the classloading of TurkishSentenceExtractor? ... so
that would probably still be a problem)
Alternatively: You can do an end run around all of this and shove your jars
into the WEB-INF/lib of Solr itself -- which should make all the classes
findable no matter what classloader is used. There is some (bad)
precedent set for this in one of the spatial plugins....
https://solr.apache.org/guide/solr/latest/query-guide/spatial-search.html#jts-and-polygons-flat
-Hoss
http://www.lucidworks.com/