Re: [basex-talk] how to pass raw bytes intact?

2013-01-02 Thread Christian Grün
As Liam indicated (thanks!), XQuery may not be the best choice for processing data at the byte level: XQuery was built to work with Unicode characters as its basic unit, which means that it will never be possible to create illegal UTF-8 sequences with pure XQuery. This also means that the language provides no

Re: [basex-talk] how to pass raw bytes intact?

2012-12-31 Thread Liam R E Quin
On Tue, 2013-01-01 at 10:52 +0800, jida...@jidanni.org wrote:
> I'm just trying to find a way to remove the <wbr/> injected here,
> $ echo '<A>你好</A>'|perl -pwle 's![^[:ascii:]]!$&<wbr/>!'|qprint -e
> <A>=E4<wbr/>=BD=A0=E5=A5=BD</A>

I don't have a qprint command on my system, so I'm not sure what's going on for
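To make the quoted pipeline easier to follow, here is a minimal Python sketch of what the perl one-liner appears to do. It assumes the tag stripped from the archived text was `<wbr/>` and that perl ran in its default byte mode, so the first non-ASCII *byte* (not character) gets the tag appended. The variable names are illustrative only.

```python
import re

# The sample input from the thread, encoded as UTF-8 bytes.
raw = "<A>你好</A>".encode("utf-8")

# Byte-level substitution: append the tag after the first non-ASCII byte only,
# mirroring a non-global s/// applied to undecoded input.
shattered = re.sub(rb"[^\x00-\x7f]", lambda m: m.group(0) + b"<wbr/>", raw, count=1)

print(shattered)

# The result is no longer valid UTF-8: the lead byte 0xE4 is followed by "<".
try:
    shattered.decode("utf-8")
    is_valid = True
except UnicodeDecodeError:
    is_valid = False

print("valid UTF-8:", is_valid)
```

The tag lands after `0xE4`, the first of the three bytes that encode 你, which matches the `=E4<wbr/>=BD=A0` pattern in the quoted-printable output.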

Re: [basex-talk] how to pass raw bytes intact?

2012-12-31 Thread jidanni
LREQ> Your perl substitution is putting <wbr/> after the first non-ascii
LREQ> character on the line, and 你 is for sure not an ascii character,
LREQ> so you get <wbr/> after it.

Not exactly after it. 1/3 of the way through it. I.e., shattered UTF-8. I was just curious if there was a way in basex if I
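As an illustration of the "shattered UTF-8" problem, the following Python sketch removes the injected tag at the byte level, before any decoding takes place. It is not a BaseX solution; it only assumes that the injected tag is the literal byte string `<wbr/>` and that the tag does not otherwise occur in the data. The function name `strip_tag_bytes` is hypothetical.

```python
TAG = b"<wbr/>"  # assumed form of the injected tag


def strip_tag_bytes(data: bytes, tag: bytes = TAG) -> bytes:
    """Remove every occurrence of the tag without decoding the input first."""
    return data.replace(tag, b"")


# 你 is E4 BD A0 in UTF-8; here the tag sits after the first of its three bytes.
shattered = b"<A>\xe4<wbr/>\xbd\xa0\xe5\xa5\xbd</A>"

repaired = strip_tag_bytes(shattered)
print(repaired.decode("utf-8"))
```

Because the removal happens on raw bytes, the three bytes of 你 are rejoined before a decoder ever sees them, which is the step a character-oriented tool cannot perform.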

Re: [basex-talk] how to pass raw bytes intact?

2012-12-30 Thread jidanni
Our mission today is to use BaseX to remove tags injected right between the bytes of multibyte UTF-8 characters. http://www.couchsurfing.org/group_read.html?gid=430post=13986932

>>>>> "CG" == Christian Grün <christian.gr...@gmail.com> writes:
CG> Have you tried method=raw, as mentioned in our