With [Char] and (Seq Char) the text is full Unicode.

With ByteString and ByteString.Lazy you are really using
ByteString.Char8 and ByteString.Lazy.Char8, so each byte is treated as
one Char.
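
To make the Char8 point concrete, here is a minimal sketch, assuming only
the bytestring package. The helper name lowBytes is made up for this
illustration. It shows that packing a Char above 255 through the Char8
interface keeps only its low 8 bits, so the original code point is lost:

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import Data.Word (Word8)

-- Hypothetical helper: pack a String via the Char8 interface and
-- return the raw bytes that actually end up in the ByteString.
lowBytes :: String -> [Word8]
lowBytes = B.unpack . BC.pack

main :: IO ()
main = do
  -- '☢' is code point 9762 (0x2622); Char8 keeps only the low byte 0x22.
  print (lowBytes "☢")
  -- ASCII characters survive unchanged, one byte each.
  print (lowBytes "abc")
```

A ByteString built this way holds one byte per character, so a regex
containing ☢ cannot match the text it came from.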

Here is a test (I saved the source file in utf8):

import Text.Regex.TDFA
text = "☮☯♲☢☣☠☃"
regex = "(☢|☣)"
search :: [[String]]
search = text =~ regex
main = do
  print text
  print regex
  print search

In ghci this prints:

*Main> main
"\9774\9775\9842\9762\9763\9760\9731"
"(\9762|\9763)"
[["\9762","\9762"],["\9763","\9763"]]

So this works.  Are you using bytestrings to hold Unicode as UTF-8 or
UTF-16?
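
If the text does live in a ByteString as UTF-8, one possible workaround
(a sketch only, not something regex-tdfa does for you) is to decode it to
a String before matching. This assumes the text package is available;
the name matchUtf8 is hypothetical:

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import Text.Regex.TDFA ((=~))

-- Hypothetical helper: decode UTF-8 bytes to a String so the regex
-- engine sees whole code points rather than individual bytes.
-- Note: decodeUtf8 throws an exception on invalid UTF-8 input.
matchUtf8 :: B.ByteString -> String -> [[String]]
matchUtf8 bytes regex = T.unpack (TE.decodeUtf8 bytes) =~ regex

main :: IO ()
main = do
  -- Build a UTF-8 encoded ByteString from the same test text.
  let bytes = TE.encodeUtf8 (T.pack "☮☯♲☢☣☠☃")
  print (matchUtf8 bytes "(☢|☣)")
```

The decode step costs an extra copy of the text, which gives up the
benefit of searching the ByteString directly.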



On Mar 20, 10:17 am, Jean-Philippe Bernardy
<[email protected]> wrote:
> Am I right that this library does not support unicode in regexes?
> Searching for unicode strings in Yi does not work, but by cursory
> browsing of the code, I cannot find the reason why.
>
> Thanks,
> JP.
>
> On Wed, Mar 18, 2009 at 1:23 PM, ChrisK <[email protected]> wrote:
> > I have just uploaded the new regex-tdfa-1.1.0 to hackage.  This version is a
> > small performance update to the old regex-tdfa-1.0.0 version.
>
> > Previously all text (e.g. ByteString) being searched was converted to String
> > and sent through a single engine.
>
> > The new version uses a type class and SPECIALIZE pragmas to avoid converting
> > to String.  This should make adding support for searching other Char
> > containers easy to do.
>
> > The new version includes six specialized engine loops to take advantage of
> > obvious optimizations of the traversal.  The previous version had only a
> > couple of such engines.  The new code paths have been tested for correctness
> > and no performance degradations have shown up.
>
> > --
> > Chris
> > _______________________________________________
> > Libraries mailing list
> > [email protected]
> > http://www.haskell.org/mailman/listinfo/libraries
--~--~---------~--~----~------------~-------~--~----~
Yi development mailing list
[email protected]
http://groups.google.com/group/yi-devel
-~----------~----~----~----~------~----~------~--~---
