I recommend normalizing all characters with a compatibility transformation,
whether they are Arabic or not.
We use this charFilter as the first step in every query and indexing analysis
chain.
You’ll also need to include the ICU library, which should be included by
default.
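To make the idea of a compatibility transformation concrete, here is a minimal sketch. It is an assumption on my part, not the charFilter referred to above (the message does not name it): it uses the JDK's `java.text.Normalizer` with form NFKC as a stand-in, and the class name `NfkcSketch` is hypothetical. It shows that an Arabic presentation-form ligature and a few non-Arabic compatibility characters fold to their plain equivalents.

```java
import java.text.Normalizer;

public class NfkcSketch {

    /** Applies Unicode compatibility normalization (NFKC) to the whole string. */
    static String nfkc(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        // U+FEFB ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM
        // decomposes to U+0644 (LAM) followed by U+0627 (ALEF).
        System.out.println(nfkc("\uFEFB").equals("\u0644\u0627")); // prints true

        // Non-Arabic characters are folded as well:
        // U+FB01 LATIN SMALL LIGATURE FI becomes the two letters "fi".
        System.out.println(nfkc("\uFB01").equals("fi")); // prints true

        // U+FF21 FULLWIDTH LATIN CAPITAL LETTER A becomes a plain "A".
        System.out.println(nfkc("\uFF21").equals("A")); // prints true
    }
}
```

In a real analysis chain the same normalization would have to run on both the indexing side and the query side, otherwise the folded index terms and the unfolded query terms would not match.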
This is only for the Arabic language.
If you don't know the language and just want to assist people searching with
different scripts (searching with Latin letters for Arabic text), see my other
answer.
Uwe
On May 20, 2021 2:38:26 PM UTC, Mete Kural wrote:
>Hello Michael,
>
>Thank you very much
Hi,
In answer to your question about character substitutions: the ICU library
does this with ICU transformers. It can also convert all Cyrillic text to
Latin during indexing and search, which greatly helps people find things.
A great example of a transformer is here as part of
Hello Michael,
Thank you very much for this information.
I will try at java-u...@lucene.apache.org also.
By the way, is the Arabic analyzer referenced here
(https://github.com/apache/lucene/tree/main/lucene/analysis/common/src/java/org/apache/lucene/analysis/ar)
just for the Arabic language
Hi Mete
You might also want to try the java-u...@lucene.apache.org mailing list
https://lucene.apache.org/core/discussion.html#java-user-list-java-userluceneapacheorg
Re languages other than English you might find more information at
The default suffix in this system property is "SNAPSHOT", and the timestamp
then comes from Maven's internal logic; this cannot be changed.
By overriding the suffix explicitly (as said before, and as seen with Jenkins)
you convert it to an official "release" in Maven's sense, and it is no longer a
snapshot.
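As a rough illustration of the distinction, here is a small sketch. It only mirrors the common Maven naming convention (a version ending in `-SNAPSHOT`, or a deployed snapshot of the form `<base>-<yyyyMMdd.HHmmss>-<buildNumber>`); it is not Maven's or Lucene's own code. The class name `SnapshotVersionSketch` and the suffix `jenkins242` are hypothetical and used purely for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SnapshotVersionSketch {

    // Timestamped snapshot: <base>-<yyyyMMdd.HHmmss>-<buildNumber>
    private static final Pattern TIMESTAMPED =
        Pattern.compile("^(.+)-(\\d{8}\\.\\d{6})-(\\d+)$");

    private static final String SNAPSHOT_SUFFIX = "-SNAPSHOT";

    /** Describes a version string as a snapshot or a release. */
    static String describe(String version) {
        Matcher m = TIMESTAMPED.matcher(version);
        if (m.matches()) {
            return "snapshot base=" + m.group(1)
                + " timestamp=" + m.group(2)
                + " build=" + m.group(3);
        }
        if (version.endsWith(SNAPSHOT_SUFFIX)) {
            String base = version.substring(0, version.length() - SNAPSHOT_SUFFIX.length());
            return "snapshot base=" + base;
        }
        // Any other suffix is treated as a release version.
        return "release " + version;
    }

    public static void main(String[] args) {
        System.out.println(describe("9.0.0-SNAPSHOT"));
        System.out.println(describe("9.0.0-20210520.111833-1"));
        System.out.println(describe("9.0.0-jenkins242")); // hypothetical overridden suffix
    }
}
```

The point of the sketch is only that a custom suffix falls through to the release branch, which is why overriding it removes the timestamp.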
Jenkins does this already:
https://ci-builds.apache.org/job/Lucene/job/Lucene-Artifacts-main/242/
It uses the build number!
The system property "version suffix" is responsible and is set by Jenkins. See
in command line: [Lucene-Artifacts-main] $
Hello Lucene Community,
I hope this finds you all well. I want to ask you if this would be the right
medium to discuss some matters surrounding text search in relation to variant
Unicode encodings of words in Arabic and Arabic-script languages. This is not a
great example but the said matters
In principle it makes sense, but is there any chance the build artifact
could vary for the same SHA? We hope not, I think, but stranger things have
happened. Probably an edge case not worth worrying about though, and
relying on the build server's clock doesn't seem great, so +1 from me,
although I
Hi all,
I’m preparing a local Lucene 9.0 snapshot build and I notice that the jar files
generated by `./gradlew mavenToLocalFolder` are called something like
`lucene-suggest-9.0.0-20210520.111833-1-javadoc.jar` - in other words, they are
including a timestamp. For my setup I’d like to replace