Re: [sword-devel] Lucene++

2017-05-20 Thread Jaak Ristioja
On 20.05.2017 21:28, David Haslam wrote: > *Lucene++* is an up to date C++ port of the popular Java Lucene library, a > high-performance, full-featured text search engine. According to GitHub the latest commit was 6aec070 on 25 Mar 2016, and their issue tracker seems to be abandoned by the

[sword-devel] Lucene++

2017-05-20 Thread David Haslam
Further to the recent threads that touched on issues with Lucene search in SWORD apps, in one of which DM mentioned that the Java Lucene library latest version *3.0.7* did not suffer these issues, here's a question prompted by that observation. If SWORD currently uses *CLucene*, what would it

Re: [sword-devel] Lucene search index and Coptic ?

2017-05-16 Thread David Haslam
Late last night I got round to creating an issue in our tracker for this. http://tracker.crosswire.org/browse/API-203 Even if most of this is outwith our control, at least it's now on record for all to see. Best regards, David -- View this message in context:

Re: [sword-devel] Lucene search index and Coptic ?

2017-04-30 Thread David Haslam
Just to confirm. The crazy results from Lucene search type affects ALL five Coptic language Bible modules. [CopNT] - The Coptic New Testament [CopSahHorner] - Sahidic Coptic New Testament, ed. by G. W. Horner [CopSahidicMSS] - The Sahidica Manuscripts [CopSahidica]- Sahidica - A New

Re: [sword-devel] Lucene search index and Coptic ?

2017-04-28 Thread David Haslam
As it happens, I've rarely used BD on my Win7 x64 PC, largely because there's no quick fix for tweaking the way I launch BD whenever there's been an update to Oracle Java. I'm still waiting for DM to fix the way BD installs for 64-bit hardware. Best regards, David

Re: [sword-devel] Lucene search index and Coptic ?

2017-04-28 Thread Greg Hellings
I did not analyze the . It was multiple screens of text. Have you tried this in BD? BD uses Lucene directly instead of CLucene. That might have better support for Coptic. --Greg On Fri, Apr 28, 2017 at 1:19 PM, David Haslam wrote: > Thanks, Greg. > > I guess this shows

Re: [sword-devel] Lucene search index and Coptic ?

2017-04-28 Thread David Haslam
Thanks, Greg. I guess this shows the limitations of using diatheke in Windows when you have a non-ANSI module. "What's impossible with Windows is possible in Linux." The test also confirms that there's a serious issue with Lucene for Coptic texts. The number of matches (12460) is the same as I

Re: [sword-devel] Lucene search index and Coptic ?

2017-04-28 Thread Greg Hellings
$ diatheke -b SahidicBible -s lucene -k ⲉⲩϩⲩⲡⲟⲙⲟⲛⲏ
12460 matches total (SahidicBible)
$ diatheke -b SahidicBible -s regex -k ⲉⲩϩⲩⲡⲟⲙⲟⲛⲏ
Verses containing "ⲉⲩϩⲩⲡⲟⲙⲟⲛⲏ"-- Romans 5:3 ; James 1:3 -- 2 matches total (SahidicBible)
--Greg
On Fri, Apr 28, 2017 at 9:55 AM, David Haslam
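The two runs above differ only in the -s search type, so the discrepancy can be checked mechanically. The following Python sketch is an illustration and not part of the original message. It assumes diatheke is on the PATH and that the summary always has the form "N matches total (Module)" as in the output quoted above; the helper names match_count and run_diatheke are hypothetical.

```python
import re
import subprocess

# Summary format assumed from the diatheke output quoted in this message.
SUMMARY = re.compile(r"(\d+) matches total \((\w+)\)")


def match_count(output):
    """Return (count, module) parsed from a diatheke search summary."""
    found = SUMMARY.search(output)
    if found is None:
        raise ValueError("no 'matches total' summary found")
    return int(found.group(1)), found.group(2)


def run_diatheke(module, search_type, key):
    """Run diatheke with the -b/-s/-k options shown above.

    Assumes diatheke is installed; it is not called in this sketch.
    """
    result = subprocess.run(
        ["diatheke", "-b", module, "-s", search_type, "-k", key],
        capture_output=True, text=True, encoding="utf-8", check=True,
    )
    return result.stdout


# Sample summaries copied from the two runs quoted above.
lucene_out = "12460 matches total (SahidicBible)"
regex_out = "Romans 5:3 ; James 1:3 -- 2 matches total (SahidicBible)"

lucene_hits, _ = match_count(lucene_out)
regex_hits, _ = match_count(regex_out)
print(lucene_hits, regex_hits)  # prints "12460 2"
```

Feeding the output of run_diatheke into match_count for both search types would let the same comparison be repeated for each of the Coptic modules.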

Re: [sword-devel] Lucene search index and Coptic ?

2017-04-28 Thread David Haslam
Thanks Troy, When you wrote, "They typically search with regex". Please can you explain exactly how I could do this in a Windows CMD file (or command line) in order to find (e.g.) the two verses containing the word ⲉⲩϩⲩⲡⲟⲙⲟⲛⲏ What exactly does *diatheke -s regex* expect for Unicode character

Re: [sword-devel] Lucene search index and Coptic ?

2017-04-28 Thread David Haslam
Greg wrote, "Have you tried using one of the command line utilities or examples directly?" Well, yes, but now I have hit a brick wall. Assuming that *mkfastmod.exe* exactly mimics Xiphos in how it constructs the Lucene index, that's not the problem. The problem is that in Windows, how do you

Re: [sword-devel] Lucene search index and Coptic ?

2017-04-27 Thread David Haslam
FIO. Zip contains the .conf file that I updated yesterday. sahidicbible.zip btw. I just missed out the "upload file" step before. David

Re: [sword-devel] Lucene search index and Coptic ?

2017-04-27 Thread David Haslam
Yes - of course! Encoding=UTF-8 It's not primarily a font issue, even though that might be a further annoyance in Xiphos. Having a font without coverage for the Coptic block in question cannot by any stretch of logic account for a search that finds 622,900% of the only 2 true matches. A font

Re: [sword-devel] Lucene search index and Coptic ?

2017-04-27 Thread Troy A. Griffitts
Has anyone checked the encoding entry in the conf file? On April 27, 2017 6:57:48 AM MST, Greg Hellings wrote: >On Thu, Apr 27, 2017 at 3:59 AM, David Haslam >wrote: > >> Even if Troy's good friends don't use the Lucene index for their work >on >>

Re: [sword-devel] Lucene search index and Coptic ?

2017-04-27 Thread Greg Hellings
On Thu, Apr 27, 2017 at 3:59 AM, David Haslam wrote: > Even if Troy's good friends don't use the Lucene index for their work on > Coptic manuscripts, that's no reason not to pursue this issue in more > detail. > > The *Coptic* block *2C80..2CFF* was added with *Unicode

Re: [sword-devel] Lucene search index and Coptic ?

2017-04-27 Thread David Haslam
No - I've not tried any command line utilities related to Lucene search. Were you thinking of *diatheke*? If it is a font issue, it's not as if I hadn't already installed the recommended *Antinoou* font for the SahidicBible module and selected this font in Xiphos.

Re: [sword-devel] Lucene search index and Coptic ?

2017-04-27 Thread David Haslam
Hi DM, I wouldn't know where to begin with *Luke*; there's no documentation for it in the code archive you cited. Is it even something a Windows user like me could do anything with? Best regards, David

Re: [sword-devel] Lucene search index and Coptic ?

2017-04-27 Thread David Haslam
Even if Troy's good friends don't use the Lucene index for their work on Coptic manuscripts, that's no reason not to pursue this issue in more detail. The *Coptic* block *2C80..2CFF* was added with *Unicode 4.1* which was released in March 2005. Are we concluding that the Lucene indexing
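As an illustration of the block boundaries mentioned here, the following Python sketch (not from the thread) lists each character of the search word used elsewhere in this thread with its code point, whether it falls inside U+2C80..U+2CFF, and its UTF-8 length. It also shows that Coptic text need not be confined to that block: letters such as hori (U+03E9) were encoded earlier, in the Greek and Coptic block. The helper name describe is hypothetical, and nothing here says how CLucene actually tokenizes these characters.

```python
# Coptic block as cited in the message above: U+2C80..U+2CFF.
COPTIC_BLOCK = range(0x2C80, 0x2D00)


def describe(word):
    """Return (code point label, in Coptic block?, UTF-8 byte length) per character."""
    rows = []
    for ch in word:
        cp = ord(ch)
        rows.append((f"U+{cp:04X}", cp in COPTIC_BLOCK, len(ch.encode("utf-8"))))
    return rows


# Search key taken from this thread.
word = "ⲉⲩϩⲩⲡⲟⲙⲟⲛⲏ"
for label, in_block, nbytes in describe(word):
    where = "Coptic block" if in_block else "outside Coptic block"
    print(label, where, f"{nbytes} UTF-8 bytes")
```

Any analyzer that only recognizes characters known to an older Unicode version could plausibly treat the two groups differently, which may be worth checking when inspecting the index.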

Re: [sword-devel] Lucene search index and Coptic ?

2017-04-26 Thread Troy A. Griffitts
So, as a side note to this thread, The Sahidic Bible is maintained at coptot.manuscriptroom.com: http://coptot.manuscriptroom.com/transcribing?docID=1620025=PUBLISHED and we regularly export from there and import into swordweb, which is used for their browser plugin (first link on Christian

Re: [sword-devel] Lucene search index and Coptic ?

2017-04-26 Thread DM Smith
Consider using Luke to analyze the constructed Lucene index. See: https://code.google.com/archive/p/luke/ I think you’ll need one that matches Lucene 1.9.1. Maybe 1.4.x. DM > On Apr 26, 2017, at 3:48 PM, David Haslam wrote: >

Re: [sword-devel] Lucene search index and Coptic ?

2017-04-26 Thread Greg Hellings
Unicode replacement characters typically indicate a font issue, and would not normally be represented as such within the internals of a program. Have you tried using one of the command line utilities or examples directly? --Greg On Wed, Apr 26, 2017 at 2:48 PM, David Haslam

Re: [sword-devel] Lucene search index and Coptic ?

2017-04-26 Thread David Haslam
If you examine the result preview pane in the Xiphos Advanced Search dialog, the problem becomes apparent. Most Coptic Unicode characters are not displayed correctly. The remainder seem to have been converted to U+FFFD REPLACEMENT CHARACTER. i.e. All these Coptic letters are basically not

Re: [sword-devel] Lucene search index and Coptic ?

2017-04-26 Thread David Haslam
Comparing the results total 12460 to the number of module verses that contain any text (14212), a search that finds the 10 letter search key in 87.67% of the total is clearly a serious matter, one so egregious that it almost defies a rational explanation. Here's a possible clue. Taking the
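The percentages quoted in this thread can be reproduced from the three counts given in the messages (12460 Lucene hits, 14212 verses with text, 2 true matches). This is only a worked-arithmetic sketch in Python; the function names are hypothetical. Note that the 622,900% figure quoted in a later reply corresponds to the excess over the true matches, (12460 - 2) / 2, rather than to the plain ratio 12460 / 2.

```python
def percent(part, whole):
    """Share of `whole` represented by `part`, in percent."""
    return 100 * part / whole


def excess_percent(found, true_matches):
    """How far `found` exceeds `true_matches`, in percent of the true count."""
    return 100 * (found - true_matches) / true_matches


lucene_hits = 12460       # Lucene search result total
verses_with_text = 14212  # module verses that contain any text
true_hits = 2             # Romans 5:3 and James 1:3

print(f"{percent(lucene_hits, verses_with_text):.2f}%")  # prints "87.67%"
print(f"{excess_percent(lucene_hits, true_hits):.0f}%")  # prints "622900%"
```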

[sword-devel] Lucene search index and Coptic ?

2017-04-26 Thread David Haslam
If you search the module SahidicBible using either PocketSword or using Xiphos with the Lucene method selected, the results list is enormous and erroneous. Example: Search for the word "ⲉⲩϩⲩⲡⲟⲙⲟⲛⲏ" This actually occurs on only two verses: Romans 5:3 and James 1:3 The Lucene method lists 12460

Re: [sword-devel] Lucene Complaint

2017-02-21 Thread Greg Hellings
There is another port - lucene++ - that I've read is the reason CLucene was abandoned. It targets compatibility with Lucene 3 vs CLucene's targeting of Lucene 2. It's on github, and its last commit was ~9 months ago. At least it's better than 2013! There's also Apache Lucy, which is a "loose C"

Re: [sword-devel] Lucene Complaint

2017-02-21 Thread DM Smith
When I contributed to Lucene (Java version) there were folks there who lurked on the mailing lists that were part of the C port. Anyway, I mention it as searching those lists or signing up and asking questions might give appropriate insight. DM > On Feb 21, 2017, at 12:25 PM, Greg Hellings

Re: [sword-devel] Lucene Complaint

2017-02-21 Thread Karl Kleinpaste
On 02/21/2017 03:10 PM, Greg Hellings wrote: > The version currently packaged in Fedora is 1.2.24. Scratch other response -- got confused between mentions of clucene and xapian. Duh. ___ sword-devel mailing list: sword-devel@crosswire.org

Re: [sword-devel] Lucene Complaint

2017-02-21 Thread Karl Kleinpaste
On 02/21/2017 03:10 PM, Greg Hellings wrote: > The version currently packaged in Fedora is 1.2.24.
Something is confused. 2.3.3.4 here, along with retro 0.9.21b.
$ egrep '^(|mingw.*)clucene' /var/log/rpmpkgs
clucene09-core-0.9.21b-16.fc24.i686.rpm
clucene09-core-0.9.21b-16.fc24.x86_64.rpm

Re: [sword-devel] Lucene Complaint

2017-02-21 Thread Greg Hellings
If memory serves, that was back in pre-1.0 days. The version currently packaged in Fedora is 1.2.24. --Greg On Tue, Feb 21, 2017 at 2:06 PM, Karl Kleinpaste wrote: > On 02/21/2017 02:54 PM, Peter von Kaehne wrote: > > Xapian is the new default at svn head > > I

Re: [sword-devel] Lucene Complaint

2017-02-21 Thread Karl Kleinpaste
On 02/21/2017 02:54 PM, Peter von Kaehne wrote: > Xapian is the new default at svn head I experimented with Xapian in Xiphos a couple years ago. The indices it creates are of horrifyingly monstrous size.

Re: [sword-devel] Lucene Complaint

2017-02-21 Thread Karl Kleinpaste
On 02/21/2017 12:25 PM, Greg Hellings wrote: > This is going to necessitate dropping the package from the MinGW > builds of Sword that I maintain for Fedora which will make future > releases of Xiphos for Windows incapable of offering Lucene based > searching. I will keep using "outdated" MinGW

Re: [sword-devel] Lucene Complaint

2017-02-21 Thread Greg Hellings
Really? I know there had been some conversations around Xapian and a brief start on a proof of concept, but I was unaware that it had made it into HEAD or even into living code at all. --Greg On Tue, Feb 21, 2017 at 1:54 PM, Peter von Kaehne wrote: > On Tue, 2017-02-21 at 11:25

Re: [sword-devel] Lucene Complaint

2017-02-21 Thread Peter von Kaehne
On Tue, 2017-02-21 at 11:25 -0600, Greg Hellings wrote: > > Is there any whiff of hope that we might be willing to move off of > depending on CLucene for advanced search support and onto a project > that has any amount of vitality? > I thought we had? Xapian is the new default at svn head

[sword-devel] Lucene Complaint

2017-02-21 Thread Greg Hellings
I know it's been mentioned and hinted at in the past, but I wanted to - again - lodge a complaint regarding the inertia of CLucene use in the engine. CLucene's last release, and last git commit on SourceForge was in 2013. It has had none of the language-specific updates that Lucene has generated

Re: [sword-devel] lucene indexing failing on some modules

2008-05-10 Thread Peter von Kaehne
A method for ignoring the minimum version should be added to the BibleTime module installer if it's not there already. You can always download and unzip a module by hand + open its config file and remove any offending bits. A bit haphazard, but in this situation maybe the simplest.

Re: [sword-devel] lucene indexing failing on some modules

2008-05-10 Thread Frank
Steven P. Ulrick wrote: > On Fri, 09 May 2008 15:23:40 -0400 Karl Kleinpaste wrote: > > Peter von Kaehne writes: > > > Tried a few: > > Please pick up BosworthToller (a dictionary module) from the beta repo and try it as well. > Hello, Karl > I just

[sword-devel] lucene indexing failing on some modules

2008-05-09 Thread Karl Kleinpaste
I have found that indexing is not working on some modules, including some of my home-grown ones, but also a few Crosswire-distributed ones. Case in point, BosworthToller. This was first noticed from integrated support in GS, but use of mkfastmod on its own shows similar problems. In 15min wall

Re: [sword-devel] lucene indexing failing on some modules

2008-05-09 Thread Peter von Kaehne
Karl Kleinpaste wrote: > I have found that indexing is not working on some modules, including some of my home-grown ones, but also a few Crosswire-distributed ones. Tried a few: WLC - flakes out with "Error reading ulBuffNum"; WEB - fine; Vulgate - fine; Turkish2 - error message as above but

Re: [sword-devel] lucene indexing failing on some modules

2008-05-09 Thread Karl Kleinpaste
Peter von Kaehne writes: > Tried a few: Please pick up BosworthToller (a dictionary module) from the beta repo and try it as well.

Re: [sword-devel] lucene indexing failing on some modules

2008-05-09 Thread Frank
Karl Kleinpaste wrote: > Peter von Kaehne writes: > > Tried a few: > Please pick up BosworthToller (a dictionary module) from the beta repo and try it as well. When I connect to the beta repo, I just see Bibles. How can I get B-T and C-V? -- Blessings Frank

Re: [sword-devel] lucene indexing failing on some modules

2008-05-09 Thread Karl Kleinpaste
Frank writes: > When I connect to the beta repo, I just see Bibles. How can I get B-T and C-V? If you connect to it using any Sword app's module manager, you can see everything the repo has available.

Re: [sword-devel] lucene indexing failing on some modules

2008-05-09 Thread Peter von Kaehne
Karl Kleinpaste wrote: > Peter von Kaehne writes: > > Tried a few: > Please pick up BosworthToller (a dictionary module) from the beta repo and try it as well. 99% processor load on top, running for 3 mins - then I stopped it, had created a directory lucene, but no contents

Re: [sword-devel] lucene indexing failing on some modules

2008-05-09 Thread Manfred Bergmann
Will try to index some modules this weekend. Manfred On 09.05.2008 at 20:12, Karl Kleinpaste wrote: > I have found that indexing is not working on some modules, including some of my home-grown ones, but also a few Crosswire-distributed ones. Case in point, BosworthToller. This was first

Re: [sword-devel] lucene indexing failing on some modules

2008-05-09 Thread Peter von Kaehne
As an aside, Lucene indices are stored by sword in the module file, but by jsword in a separate place inside .jsword. Is there a reason behind this?

Re: [sword-devel] lucene indexing failing on some modules

2008-05-09 Thread DM Smith
Peter von Kaehne wrote: > As an aside, Lucene indices are stored by sword in the module file, but by jsword in a separate place inside .jsword. Is there a reason behind this? Several reasons: JSword used Lucene first. Our search engine left a lot to be desired. We had to put it somewhere. Joe

Re: [sword-devel] lucene indexing failing on some modules

2008-05-09 Thread Chris Little
BosworthToller has an error, either due to encoding or to an import error, which causes it to loop on the entry for 1. Any frontend or utility that iterates through all keys should hang at the same point. --Chris Karl Kleinpaste wrote: > I have found that indexing is not working on some

Re: [sword-devel] lucene indexing failing on some modules

2008-05-09 Thread DM Smith
Karl Kleinpaste wrote: > I have found that indexing is not working on some modules, including some of my home-grown ones, but also a few Crosswire-distributed ones. Case in point, BosworthToller. This was first noticed from integrated support in GS, but use of mkfastmod on its own shows similar

Re: [sword-devel] lucene indexing failing on some modules

2008-05-09 Thread DM Smith
Chris Little wrote: > BosworthToller has an error, either due to encoding or to an import error, which causes it to loop on the entry for 1. Any frontend or utility that iterates through all keys should hang at the same point. BibleDesktop has no problems with it, but has the first entry with

Re: [sword-devel] lucene indexing failing on some modules

2008-05-09 Thread Frank
Karl Kleinpaste wrote: > Frank writes: > > When I connect to the beta repo, I just see Bibles. How can I get B-T and C-V? > If you connect to it using any Sword app's module manager, you can see everything the repo has available. I connected with BibleTime 1.6.4. I see a

Re: [sword-devel] lucene indexing failing on some modules

2008-05-09 Thread Karl Kleinpaste
Frank writes: > I connected with BibleTime 1.6.4. I see a number of Bibles in various languages. That's all the repo says it has. BibleTime (I have 1.6.5) shows only those modules which aren't already installed. Because I already have all the beta modules installed, when I

Re: [sword-devel] lucene indexing failing on some modules

2008-05-09 Thread Frank
Karl Kleinpaste wrote: > Frank writes: > > I connected with BibleTime 1.6.4. I see a number of Bibles in various languages. That's all the repo says it has. > BibleTime (I have 1.6.5) shows only those modules which aren't already installed. Because I already have all the

Re: [sword-devel] lucene indexing failing on some modules

2008-05-09 Thread Steven P. Ulrick
On Fri, 09 May 2008 15:23:40 -0400 Karl Kleinpaste wrote: > Peter von Kaehne writes: > > Tried a few: > Please pick up BosworthToller (a dictionary module) from the beta repo and try it as well. Hello, Karl. I just tried to do just that, and I have a slight

Re: [sword-devel] lucene indexing failing on some modules

2008-05-09 Thread Chris Little
Steven P. Ulrick wrote: > I will be more than glad to test the BosworthToller module if you can help me with the following: 1. If I really do need Sword 1.5.11 to use this module, how do I get it if the most recent version in SVN is 1.5.10? BosworthToller does require 1.5.11 in that it

[sword-devel] Lucene, new version soon

2008-01-13 Thread DM Smith
FYI, Lucene (Java) will be releasing version 2.3 soon. This release will mark yet again a significant performance increase. The JSword index code running under Lucene 1.4.3 took 5 minutes to index the ESV. Currently, it takes ~1 minute using 2.2. With 2.3, it will take ~25 seconds. The

Re: [sword-devel] Lucene, new version soon

2008-01-13 Thread Eeli Kaikkonen
On Sun, 13 Jan 2008, DM Smith wrote: > With regard to cLucene, this is important because cLucene is still at 1.4.3 compatibility, even after Lucene 2.0 was released 19 months ago (May 2006). There are only a few active developers on cLucene and while much has been done to port 2.0, there still

[sword-devel] Lucene 2.0

2006-02-15 Thread DM Smith
The lucene folks are planning a 1.9rc on Monday, 20 Feb with 1.9, a week later on 27 Feb. As with any volunteer effort, those are not set in stone. At this time there is not a date for 2.0. Version 1.9 maintains backward compatibility with 1.4.3 but deprecates methods not going forward in

[sword-devel] Lucene phonetic search

2005-05-02 Thread DM Smith
Lucene will be adding the ability to do phonetic searches. I think that this will be of interest for those that cannot spell. For more info you can go here: http://issues.apache.org/bugzilla/show_bug.cgi?id=10340

Re: [sword-devel] Lucene phonetic search

2005-05-02 Thread Martin Gruner
Will CLucene do that too? mg On Monday, 2 May 2005 22:05, DM Smith wrote: > Lucene will be adding the ability to do phonetic searches. I think that this will be of interest for those that cannot spell. For more info you can go here: http://issues.apache.org/bugzilla/show_bug.cgi?id=10340

Re: [sword-devel] Lucene phonetic search

2005-05-02 Thread DM Smith
I've been lurking on the lucene list serve and reading a bunch on lucene and while I don't know the answer to your question, I am under the impression that CLucene is actively working on coming out with a port of Lucene 2.0 when or soon after it comes out. The link I gave contained the actual