On Thu, 1 Feb 2018 19:20:04 +0000 Richard Wordingham via Unicode <[email protected]> wrote:
> A regular trace expression of the form > > [:ccc=1:][:ccc=2:]…[:ccc=n:] > > seems to require 2^n states in your scheme. As I effectively only > apply the regex to NFD input strings, I use fewer states. However, > the efficiency of my scheme depends on the order of the commuting > factors - reverse order would require the 2^n states. I've overstated the compactness of my scheme. Firstly, I split the state for an optionally final matched character into two states according to whether it is to be the final character or not. Secondly, the DFA for a Unicode character is quite large. I've kept it simple and identify most states by the matched Unicode character, which means I have nearly a million states, whereas I could probably whittle it down to more like a thousand or so, at a vast increase in complexity. Richard.

