Hi all,
I just committed a little workaround for a problem (in the client
libraries) with doing a PROPFIND on any property that has '-' in its
name (current-user-privilege-set, specifically).
The library was doing this: to split an element name into namespace
abbreviation and name, it went backwards from the end of the string,
checking each character to see if it was a valid part of an XML name.
If it reached the start, there was no namespace. If it stopped earlier,
it split the string into namespace and name at that point. This is
fine, but the check for 'valid part of XML name' appeared broken.
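
To make the approach concrete, here is a minimal sketch of that
backwards scan. It is not the library's actual code: the class and
method names are made up, the validity check is passed in as a
parameter, the "DAV:" input format is an assumption, and I'm assuming
the separator character stays with the prefix.

```java
import java.util.function.IntPredicate;

// Hypothetical illustration; not the client library's real code.
final class NameSplitSketch {

    /**
     * Walks backwards from the end of s while characters satisfy
     * isNamePart. Returns {prefix, name}; prefix is "" if the scan
     * reaches the start of the string.
     */
    static String[] split(String s, IntPredicate isNamePart) {
        int i = s.length();
        while (i > 0 && isNamePart.test(s.charAt(i - 1))) {
            i--;
        }
        return new String[] { s.substring(0, i), s.substring(i) };
    }

    public static void main(String[] args) {
        String qname = "DAV:current-user-privilege-set";

        // A check that treats '-' as invalid stops at the last '-'.
        String[] bad = split(qname, Character::isLetterOrDigit);
        System.out.println(bad[0] + " | " + bad[1]);
        // prints "DAV:current-user-privilege- | set"

        // A check that accepts '-' carries on to the ':' instead.
        String[] good = split(qname,
                c -> Character.isLetterOrDigit(c) || c == '-');
        System.out.println(good[0] + " | " + good[1]);
        // prints "DAV: | current-user-privilege-set"
    }
}
```

The first call shows the failure mode: any character the check wrongly
rejects becomes the split point.
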
It was calling Character.isUnicodeIdentifierPart(chr). The javadoc is
somewhat evasive on what _exactly_ this does (I assume it's fairly well
defined by the Unicode standard, but I couldn't find the right
information in a look around unicode.org, and I don't have a copy of the
Unicode book). The XML spec is explicit about exactly which code points
are allowed in a name. Specifically, as well as several standard Unicode
character classes (letters, digits, etc.), it allows '-', '.', and '_'.
isUnicodeIdentifierPart() rejects '-'. The javadoc says that '_' is
allowed; going by the categories it lists, '.' should be rejected too
(though I didn't test this explicitly).
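
A quick way to check those characters is a throwaway class like the one
below (the class name is made up). As far as I know, it reports '_' as
accepted and '-', '.', and ':' as rejected.

```java
// Illustrative check only; the class name is hypothetical.
final class IdentifierPartCheck {

    // Thin wrapper around the JDK call the library was using.
    static boolean accepts(char c) {
        return Character.isUnicodeIdentifierPart(c);
    }

    public static void main(String[] args) {
        char[] samples = { 'a', '7', '_', '-', '.', ':' };
        for (char c : samples) {
            System.out.println("'" + c + "' -> " + accepts(c));
        }
    }
}
```
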
For now, I've changed this check to isUnicodeIdentifierPart(chr) ||
chr == '-'. This works, but seems terribly inelegant. I also suspect
that isUnicodeIdentifierPart() lets through some things that it
shouldn't, though I haven't checked into this properly.
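
One possible direction, as a rough sketch rather than a finished fix:
test characters directly against the NameStartChar and NameChar
productions of XML 1.0 (Fifth Edition). The class and method names below
are hypothetical, and earlier editions of the spec list the allowed
characters differently. The main method also prints what
isUnicodeIdentifierPart() says for U+0000, which it accepts as an
ignorable character even though XML does not allow it anywhere.

```java
// Hypothetical helper, not part of any library. Ranges follow the
// NameStartChar and NameChar productions of XML 1.0 (Fifth Edition).
final class XmlNameChars {

    static boolean isNameStartChar(int c) {
        return c == ':' || c == '_'
            || (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
            || (c >= 0xC0 && c <= 0xD6) || (c >= 0xD8 && c <= 0xF6)
            || (c >= 0xF8 && c <= 0x2FF) || (c >= 0x370 && c <= 0x37D)
            || (c >= 0x37F && c <= 0x1FFF) || (c >= 0x200C && c <= 0x200D)
            || (c >= 0x2070 && c <= 0x218F) || (c >= 0x2C00 && c <= 0x2FEF)
            || (c >= 0x3001 && c <= 0xD7FF) || (c >= 0xF900 && c <= 0xFDCF)
            || (c >= 0xFDF0 && c <= 0xFFFD)
            || (c >= 0x10000 && c <= 0xEFFFF);
    }

    static boolean isNameChar(int c) {
        return isNameStartChar(c)
            || c == '-' || c == '.' || (c >= '0' && c <= '9')
            || c == 0xB7
            || (c >= 0x300 && c <= 0x36F)
            || (c >= 0x203F && c <= 0x2040);
    }

    // Name characters minus ':', for the part after a ':' separator.
    static boolean isNCNameChar(int c) {
        return c != ':' && isNameChar(c);
    }

    public static void main(String[] args) {
        for (int c : new int[] { '-', '.', '_', ':', 0 }) {
            System.out.printf(
                "U+%04X name=%b ncname=%b identifierPart=%b%n",
                c, isNameChar(c), isNCNameChar(c),
                Character.isUnicodeIdentifierPart(c));
        }
    }
}
```

Whether to scan with isNameChar or isNCNameChar depends on how the
separator is meant to be treated, which I'm only guessing at here.
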
Are there any Unicode and/or XML experts on the list who could weigh in
with an opinion on what the correct check is here? It's a minor detail,
really, but it's the sort of minor detail that has a tendency to bite
you later if you don't get it right the first time.
Michael