In URLs, the character '#' is used to indicate a fragment-id.

However, when specifying a filename (rather than a URL), '#' is a valid 
file character, although it may need to be quoted to protect it from the 
shell.

Notwithstanding the conventions of URLs, it should be possible to 
specify such a filename on the command line for XXE or the validation 
tools dtdvalid, xsdvalid, etc.

Whether or not a file /tmp/#test.xml exists, I get:

$ dtdvalid /tmp/\#test.xml
cannot load '/tmp/#test.xml': file:/tmp/#test.xml:1:0: syntax error

Running xxe /tmp/\#test.xml gives a similar error in a popup.

It is possible to %-encode the '#' for dtdvalid, e.g. 
dtdvalid /tmp/%23test.xml, but that means that if there were a file 
actually named /tmp/%23test.xml, I would have to use the command 
dtdvalid /tmp/%2523test.xml to validate it.  This is awkward in shell 
scripts, and in any case it doesn't work with XXE, which gives a 
popup with the error:

"/tmp/%23test.xml" is not an URL or a file name.

It is possible to get XXE to open file:/tmp/%23test.xml, but it will 
only open it read-only (just as well, since I have no idea what 
filename it would use if I saved it).
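
For what it's worth, mapping a file: URL back to a filename is 
unambiguous in principle: split the URL first, then %-decode the path. 
Here is a minimal Python sketch of that idea. The helper name 
url_to_filename is hypothetical and POSIX paths are assumed; this is 
not a claim about what XXE does internally.

```python
from urllib.parse import urlsplit, unquote

def url_to_filename(url):
    """Decode the path of a file: URL back into a plain filename.

    Hypothetical helper for illustration; assumes POSIX-style paths.
    """
    parts = urlsplit(url)
    if parts.scheme != "file":
        raise ValueError("not a file: URL: " + url)
    # The URL is split before decoding, so an encoded %23 is never
    # mistaken for the '#' that starts a fragment-id.
    return unquote(parts.path)

print(url_to_filename("file:/tmp/%23test.xml"))    # /tmp/#test.xml
print(url_to_filename("file:/tmp/%2523test.xml"))  # /tmp/%23test.xml
```

Decoding happens exactly once, so a file really named 
/tmp/%23test.xml stays distinct from /tmp/#test.xml.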

It seems to me that in any context where you accept a URL or a file 
name, if there is no leading file: (or other URL scheme) you should 
treat the name as a file name, and convert it to a URL by %-escaping any 
reserved characters.
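
To make the suggestion concrete, here is a rough Python sketch of the 
rule. The function name name_to_url and the "scheme must be at least 
two characters" test are my own assumptions for illustration, and 
POSIX paths are assumed; the real tools may well want a different 
check.

```python
import os
import re
from urllib.parse import quote

# Something that looks like a URL scheme followed by ':'. Requiring at
# least two characters is an assumption made here so that a Windows
# drive letter such as "C:" is not mistaken for a scheme.
_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]+:")

def name_to_url(name):
    """Return name unchanged if it has a URL scheme, else a file: URL."""
    if _SCHEME.match(name):
        return name
    # No scheme: treat it as a file name and %-escape everything except
    # '/' and unreserved characters ('#' and '%' are both escaped).
    return "file://" + quote(os.path.abspath(name))

print(name_to_url("/tmp/#test.xml"))         # file:///tmp/%23test.xml
print(name_to_url("/tmp/%23test.xml"))       # file:///tmp/%2523test.xml
print(name_to_url("file:/tmp/%23test.xml"))  # file:/tmp/%23test.xml
```

With this rule the caller never has to %-encode by hand, and the two 
files /tmp/#test.xml and /tmp/%23test.xml map to different URLs.  One 
caveat: a relative file name containing a colon (e.g. foo:bar.xml) 
would be taken for a URL and would need a leading ./ to disambiguate.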

I realize this is a minor quibble, but this worked correctly until 
recently (XXE 2.4?), and when it changed, it broke my CVS commit 
validation scripts that ran validation on temporary files with # in the 
pathname.  I've worked around this by eliminating the '#', but it may 
cause confusion or problems for others in the future.

@alex
-- 
mailto:dupuy at sysd.com

