Re: [sword-devel] diatheke search type regex and the dot ?

2017-05-21 Thread Troy A. Griffitts
So, I did a little experimenting this weekend and found that the ICU RegEx engine is actually really capable.
o It's fast.
o Its {n,m} quantifier counts characters instead of bytes
o It even works (though a little slow) with lookaheads and lookbehinds, e.g., for words in any order:
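The example that followed in the original message is cut off in this archive listing. As a generic illustration only, here is a minimal sketch of the "words in any order" lookahead technique using Python's `re` module (not ICU, whose behaviour and speed may differ). The function name `all_words_pattern` and the sample sentence are hypothetical and were chosen just for this sketch.

```python
import re

def all_words_pattern(words):
    """Build a pattern that matches only if every word occurs, in any order.

    Each word becomes its own lookahead, so the order in the text is irrelevant.
    """
    lookaheads = "".join(rf"(?=.*\b{re.escape(w)}\b)" for w in words)
    return re.compile(lookaheads, re.DOTALL)

# Hypothetical sample sentence, not taken from any module text.
sample = "Then the king spake unto Daniel"

both_present = all_words_pattern(["Daniel", "king"]).search(sample) is not None
one_missing = all_words_pattern(["Daniel", "queen"]).search(sample) is not None

print(both_present, one_missing)
```

Every lookahead rescans the text from the current position, which is one plausible reason such patterns run slower than a plain literal search.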

Re: [sword-devel] diatheke search type regex and the dot ?

2017-03-24 Thread Jaak Ristioja
Another possibility is to use Boost.Xpressive [1], which I think supports Perl regular expressions at runtime, and also static regular expressions using C++ syntax:

    using namespace boost::xpressive;
    // sregex rex = sregex::compile( "(\\w+) (\\w+)!" );
    sregex rex = (s1= +_w) >> ' '

Re: [sword-devel] diatheke search type regex and the dot ?

2017-03-07 Thread David Haslam
Thanks, Karl,

Xiphos 4.0.4 in Windows 7 x64 gave this:

S:\>xiphos\diatheke -b KJV -s regex -k Abed...nego
Verses containing "Abed...nego"-- Daniel 1:7 ; Daniel 2:49 ; Daniel 3:12 ; Daniel 3:13 ; Daniel 3:14 ; Daniel 3:16 ; Daniel 3:19 ; Daniel 3:20 ; Daniel 3:22 ; Daniel 3:23 ; Daniel 3:26 ;
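One reading of this result, consistent with the en dash being three UTF-8 bytes as noted elsewhere in the thread, is that the engine matched the dot against bytes, not characters. The following is a minimal sketch of that distinction using Python's `re` module as a stand-in; it does not exercise SWORD or diatheke, and the sample string is an assumption, not text read from the KJV module.

```python
import re

# Assumed sample: the name written with an en dash (U+2013), not module data.
text = "Abed\u2013nego"
raw = text.encode("utf-8")  # the en dash becomes the three bytes E2 80 93

def matches(pattern, subject):
    """Return True if pattern is found anywhere in subject."""
    return re.search(pattern, subject) is not None

# Character-level matching: one dot covers the whole en dash.
char_one_dot = matches(r"Abed.nego", text)

# Byte-level matching: one dot covers only one of the three bytes.
byte_one_dot = matches(rb"Abed.nego", raw)
byte_three_dots = matches(rb"Abed...nego", raw)

print(char_one_dot, byte_one_dot, byte_three_dots)
```

Under this reading, a pattern with three dots succeeds only because each dot consumes one byte of the encoded en dash.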

Re: [sword-devel] diatheke search type regex and the dot ?

2017-03-06 Thread Karl Kleinpaste
On 03/06/2017 09:06 PM, DM Smith wrote:
> Does setting CLANG (or whatever it is) in the env help? In unix you
> have to tell the program what charset you are using.

They already come along for the ride for free as a result of logging in, per default specification when system was installed.

$

Re: [sword-devel] diatheke search type regex and the dot ?

2017-03-06 Thread DM Smith
Does setting CLANG (or whatever it is) in the env help? In unix you have to tell the program what charset you are using. Cent from my fone so theer mite be tipos. ;) > On Mar 6, 2017, at 7:52 PM, Karl Kleinpaste wrote: > >> On 03/06/2017 05:25 PM, Greg Hellings wrote: >>

Re: [sword-devel] diatheke search type regex and the dot ?

2017-03-06 Thread Troy A. Griffitts
Yeah, so this page shows that C++11 regex is still mostly unsupported in gcc: http://gcc.gnu.org/onlinedocs/libstdc++/manual/status.html#status.iso.tr1 (see section 7). And the old-school GNU regex we use otherwise, I don't think, knows anything about wide chars. It simply compares bytes and

Re: [sword-devel] diatheke search type regex and the dot ?

2017-03-06 Thread Karl Kleinpaste
On 03/06/2017 05:25 PM, Greg Hellings wrote:
> being off by 2 would seem strange to me

I don't understand this question at all.

0xE2 = 226 = 0342
0x80 = 128 = 0200
0x93 = 147 = 0223

There's no off-by error at all. "od" is the "octal dump" tool; given -c, it tries to dump characters, but outside

Re: [sword-devel] diatheke search type regex and the dot ?

2017-03-06 Thread Greg Hellings
On Mon, Mar 6, 2017 at 4:15 PM, David Haslam wrote:
> Are we sure it's an "off by 2" error and not just an email typo?
>

I'm not sure of that at all. It was my first guess, but being off by 2 would seem strange to me, as I would expect a "fat finger" error to produce an

Re: [sword-devel] diatheke search type regex and the dot ?

2017-03-06 Thread David Haslam
Are we sure it's an "off by 2" error and not just an email typo? I wasn't expecting decimal, I just didn't parse it as octal. David -- View this message in context: http://sword-dev.350566.n4.nabble.com/diatheke-search-type-regex-and-the-dot-tp4656879p4656914.html Sent from the SWORD Dev

Re: [sword-devel] diatheke search type regex and the dot ?

2017-03-06 Thread Greg Hellings
147 = 0223 (octal)
128 = 0200 (octal)
226 = 0340 (octal)

So it's off by 2 in the top order byte. Not sure why, but it seems you're expecting decimal but the tool is obviously giving out octal.

--Greg

On Mon, Mar 6, 2017 at 3:02 PM, David Haslam wrote:
> Thanks Karl,
>
>

Re: [sword-devel] diatheke search type regex and the dot ?

2017-03-06 Thread David Haslam
Thanks Karl, All the "hyphenated" names in the KJV OT use the *en dash* character U+2013 which has 3 UTF-8 bytes E2 80 93. In decimal, these are 226 128 147 so we might well wonder how your tool gave 342 200 223 ? Best regards, David
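The three sets of numbers in question can be checked mechanically. This is a small sketch, assuming only Python's built-in string encoding and number formatting; the helper name `byte_table` is made up for the illustration.

```python
EN_DASH = "\u2013"  # the en dash discussed above

def byte_table(ch):
    """Return (hex, decimal, octal) strings for each UTF-8 byte of ch."""
    return [(f"{b:02X}", str(b), f"{b:03o}") for b in ch.encode("utf-8")]

for hx, dec, octal in byte_table(EN_DASH):
    print(f"0x{hx} = {dec} = 0{octal}")
```

Read this way, 342 200 223 is the same byte sequence as 226 128 147, written in octal.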

Re: [sword-devel] diatheke search type regex and the dot ?

2017-03-06 Thread Karl Kleinpaste
On 03/03/2017 09:16 PM, Troy A. Griffitts wrote:
> SWORD supports compiling with a variety of regex engines

I have an interesting result. My previous build of sword used --with-cxx11regex, and that failed to find Abednego in any circumstance. Reconfiguring without that option and rebuilding, I

Re: [sword-devel] diatheke search type regex and the dot ?

2017-03-04 Thread David Haslam
Corrigendum: "everything outside ASCII"

Re: [sword-devel] diatheke search type regex and the dot ?

2017-03-04 Thread David Haslam
Thanks Troy, The precise /flavour/ of *regex* supported by diatheke search really needs to be properly documented. Expecting the *dot* to be a byte when we're handling Unicode is just not on at all. I'm struggling more because I'm on Windows, where the UTF-16 versus UTF-8 disparity affects

Re: [sword-devel] diatheke search type regex and the dot ?

2017-03-03 Thread Troy A. Griffitts
SWORD supports compiling with a variety of regex engines, typically GNU regex on most Linux systems. We include an 'internal regex' copy of this, as well. We will also compile against the C++ standard regex engine included in the language spec. Each handles Unicode characters differently. . is

Re: [sword-devel] diatheke search type regex and the dot ?

2017-03-03 Thread David Haslam
Created http://tracker.crosswire.org/browse/MODTOOLS-101 David

Re: [sword-devel] diatheke search type regex and the dot ?

2017-03-03 Thread David Haslam
So what flavour of regex does diatheke actually use under Linux? Why is it that the *dot metacharacter* is not recognized? David

Re: [sword-devel] diatheke search type regex and the dot ?

2017-03-02 Thread Karl Kleinpaste
On 03/02/2017 02:14 PM, Greg Hellings wrote:
> I also get no results.

On the other hand...

$ mod2imp KJV | grep -B1 -i abed.nego | fgrep '$$'
$$$Daniel 1:7
$$$Daniel 2:49
$$$Daniel 3:12
$$$Daniel 3:13
$$$Daniel 3:14
$$$Daniel 3:16
$$$Daniel 3:19
$$$Daniel 3:20
$$$Daniel 3:22
$$$Daniel 3:23

Re: [sword-devel] diatheke search type regex and the dot ?

2017-03-02 Thread David Haslam
Typo was only in the message, sorry! The actual test in Windows shell with the -k there didn't give any matches. David

Re: [sword-devel] diatheke search type regex and the dot ?

2017-03-02 Thread Greg Hellings
$ diatheke -b KJV -s regex -k Abed.nego
Verses containing "Abed.nego"-- none (KJV)

Once I correct the command to include the -k parameter, I also get no results.

--Greg

On Thu, Mar 2, 2017 at 12:58 PM, David Haslam wrote:
> I was under the impression that the

Re: [sword-devel] diatheke search type regex and the dot ?

2017-03-02 Thread David Haslam
I suspect this may be a further symptom of what Greg suggested as the explanation in my other thread, i.e. that SWORD expects to search in UTF-8 encoded text, whereas Windows uses UTF-16 internally. Still can't quite make out why the dot isn't treated the way regular expressions normally use it. David

[sword-devel] diatheke search type regex and the dot ?

2017-03-02 Thread David Haslam
I was under the impression that the metacharacter *dot* in a regex means "any single character". It would seem that for diatheke with *-s regex* this is not the case at all.

Example:

diatheke -b KJV -s regex Abed.nego

In Windows command shell, that command line does not find the 15 instances