As Liam indicated (thanks!), XQuery may not be the best choice for
processing data at the byte level: XQuery was built to work with Unicode
characters as its basic unit, which means that pure XQuery can never
create illegal UTF-8 sequences. This also means
that the language provides no s
> "LREQ" == Liam R E Quin writes:
LREQ> Treating the individual UTF-8 octets individually?
Yes.
LREQ> Not in standard XQuery, but that doesn't preclude a BaseX extension...
Well no big deal, I was just curious.
>> I was just curious if there was a way in basex if I could do s!!!g
>> like I can
On Tue, 2013-01-01 at 11:47 +0800, jida...@jidanni.org wrote:
> Not exactly after it. 1/3 of the way through it. I.e., shattered UTF-8.
Treating the individual UTF-8 octets individually?
Not in standard XQuery, but that doesn't preclude a BaseX extension...
> I was just curious if there was a w
LREQ> Your perl substitution is putting the wbr tag after the first non-ascii
LREQ> character on the line, and 你 is for sure not an ascii character,
LREQ> so you get the tag after it.
Not exactly after it. 1/3 of the way through it. I.e., shattered UTF-8.
I was just curious if there was a way in basex if I could do
On Tue, 2013-01-01 at 10:52 +0800, jida...@jidanni.org wrote:
> I'm just trying to find a way to remove the wbr tag injected here,
> $ echo '你好'|perl -pwle 's![^[:ascii:]]!$&!'|qprint -e
> =E4=BD=A0=E5=A5=BD
I don't have a qprint command on my system, so I'm not sure what's going
on for you here. Your p
> "CG" == Christian Grün writes:
CG> Jidanni,
>> echo '你好'|perl -pwle 's![^[:ascii:]]!$&!'|basex -q '
>> declare option db:parser "html";
>> declare option output:method "raw";
>> doc("/dev/stdin")//*:wbr/..'
CG> If you want help, please try to help, too. Your example is not what I
CG> would
Jidanni,
> echo '你好'|perl -pwle 's![^[:ascii:]]!$&!'|basex -q '
> declare option db:parser "html";
> declare option output:method "raw";
> doc("/dev/stdin")//*:wbr/..'
If you want help, please try to help, too. Your example is not what I
would call very helpful; give us at least
Our mission today is to use BaseX to remove tags injected right between
the bytes of multibyte UTF-8 characters.
http://www.couchsurfing.org/group_read.html?gid=430&post=13986932
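
For readers following along, here is a rough sketch (in Python, not BaseX) of what "shattered UTF-8" looks like at the byte level and why removing the tag has to happen on bytes rather than on characters. The literal tag `<wbr/>` and the helper names `shatter` and `repair` are assumptions made for illustration; the thread only confirms that the element is called wbr.

```python
# Assumption: the injected markup is the literal byte string b"<wbr/>".
# The thread only confirms the element name "wbr", not its exact spelling.
TAG = b"<wbr/>"


def shatter(data: bytes, tag: bytes = TAG) -> bytes:
    """Insert `tag` right after the first non-ASCII byte.

    This mimics a substitution that works on single bytes: the tag lands
    inside a multibyte UTF-8 sequence instead of after a whole character.
    """
    for i, byte in enumerate(data):
        if byte > 0x7F:
            return data[: i + 1] + tag + data[i + 1 :]
    return data


def repair(data: bytes, tag: bytes = TAG) -> bytes:
    """Remove every occurrence of `tag`, working on raw bytes.

    This is safe in UTF-8 because ASCII bytes never occur inside a
    multibyte sequence, so the match cannot be a false positive there.
    """
    return data.replace(tag, b"")


original = "你好".encode("utf-8")  # bytes E4 BD A0 E5 A5 BD
broken = shatter(original)

try:
    broken.decode("utf-8")
    decodable = True
except UnicodeDecodeError:
    # The lead byte E4 is now followed by "<", so the sequence is illegal.
    decodable = False

fixed = repair(broken)
print(decodable, fixed.decode("utf-8"))
```

Note that `repair` also removes any legitimate occurrence of the same tag, so on real input it would need to be restricted to tags that sit inside a broken sequence.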
> "CG" == Christian Grün writes:
CG> Have you tried method=raw, as mentioned in our documentation
CG> (http://doc