---- you "Stephane Bailliez" <[EMAIL PROTECTED]> wrote ----
> Now the final question: what is the systemid syntax for files ? :-)

Ah-ha! I was just looking into that yesterday.  I think I've come to a
'good enough' conclusion although I'm still not quite happy with it.  (Of
course the difference between what the specs say and what common programs
actually implement is also a point to remember...)
Comments or corrections (especially with references to well-known specs)
are definitely appreciated!

---- Some sample URLs that I think are correct
Absolute URL on UNIX:     file:///usr/dir/file.txt
Corresponding path:       /usr/dir/file.txt

Absolute URL on Windows:  file:///c:/bin/file.txt
Corresponding path:       c:\bin\file.txt
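
A rough sketch of the two mappings above, in Python only because it is
short to write.  The helper name file_url_to_path and its windows flag are
made up for illustration, and it assumes an empty authority (no host name,
no UNC paths):

```python
from pathlib import PurePosixPath, PureWindowsPath
from urllib.parse import unquote, urlsplit


def file_url_to_path(url, windows=False):
    """Map an absolute file: URL to a local path string (hypothetical helper)."""
    parts = urlsplit(url)
    if parts.scheme != "file":
        raise ValueError("not a file: URL: " + url)
    path = unquote(parts.path)  # undo %xx escapes, e.g. %20 -> space
    if windows:
        # "/c:/bin/file.txt" -> "c:/bin/file.txt": drop the slash before the drive
        if len(path) >= 3 and path[0] == "/" and path[2] == ":":
            path = path[1:]
        return str(PureWindowsPath(path))  # rendered with backslashes
    return str(PurePosixPath(path))


print(file_url_to_path("file:///usr/dir/file.txt"))
print(file_url_to_path("file:///c:/bin/file.txt", windows=True))
```

The two calls print the "Corresponding path" values listed above.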

Relative URLs:
Note that relative URLs never start with scheme: names; you find the scheme
from the Base URI of whatever enclosing document you have.

/usr/dir/file.txt   -> is a relative URL to an abs_path, which presuming
your Base URI is file:something, would be just /usr/dir/file.txt on UNIX
and \usr\dir\file.txt on Windows.

dir/file.txt        -> is a relative URL to a rel_path, which is just what
you'd expect: relative hierarchical path related to the Base URI, in most
cases for file:, to the current directory or to the current file being
processed.
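
To make the two relative cases concrete, here is a small sketch using
Python's urllib.parse.urljoin as the resolver.  The Base URI is a made-up
example, and this shows what one implementation does, not necessarily what
every reading of the RFC requires:

```python
from urllib.parse import urljoin

# Hypothetical Base URI: the document currently being processed.
base = "file:///home/me/project/doc.xml"

# abs_path reference: replaces the whole path of the base.
print(urljoin(base, "/usr/dir/file.txt"))  # file:///usr/dir/file.txt

# rel_path reference: replaces only the last segment of the base.
print(urljoin(base, "dir/file.txt"))  # file:///home/me/project/dir/file.txt
```

In both cases the scheme comes from the Base URI, never from the relative
reference itself.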

For a while I was wanting to put 4 slashes in the UNIX case, but I think
there should be three.  Other obvious notes are that only forward slashes
are ever allowed in URLs as separators, regardless of what system you're
on.  And relative URLs probably look different than you expect, since most
of them don't have a scheme: on the front, and on UNIX at least will
look almost identical to local file paths (even though they aren't
actually identical).
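
One way to see the three-versus-four slash difference is to ask a parser
how it splits each form.  This sketch uses Python's urllib.parse.urlsplit,
which is only one implementation's reading and not the spec itself:

```python
from urllib.parse import urlsplit

for url in ("file:///usr/dir/file.txt", "file:////usr/dir/file.txt"):
    parts = urlsplit(url)
    # netloc is the authority; it is empty in both cases here.
    print(repr(parts.netloc), parts.path)
```

With three slashes the path comes back as /usr/dir/file.txt; with four it
comes back as //usr/dir/file.txt, i.e. with an extra empty leading segment.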

Digging through RFC 2396:'3. URI Syntactic Components' it's not quite clear
from the discussion about the separator between authority and abs_path if
the separator should always be there; or rather if any leading forward
slash character on a path_segment should be maintained alongside the
separator.

I guess the confusion is likely due to the fact that the hierarchical part
of URLs does *not* map directly to any filesystem, even in the UNIX case when
it sure looks like it.  The point is the hier_part of a URL specifies some
conceptual number of levels of organization (directories, usually) that
some object (a file, usually) lives in.  You then conceptually map each of
these levels of organization to your local system's conventions by taking
each level and re-mapping it - for file:, this is to directory levels.  But
the separators are completely conceptually separate between the URL form
and the local filename form.
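
A minimal sketch of that idea: pull the levels out of the URL first, then
render them with whatever separator the local system uses.  The helper name
levels is hypothetical, and dropping empty segments is a simplification:

```python
from urllib.parse import unquote, urlsplit


def levels(url):
    """Return the path of a URL as a list of levels (hypothetical helper)."""
    return [unquote(seg) for seg in urlsplit(url).path.split("/") if seg]


parts = levels("file:///usr/dir/file.txt")
print(parts)
print("/" + "/".join(parts))    # UNIX-style rendering
print("\\" + "\\".join(parts))  # Windows-style rendering
```

The list of levels is the same in both renderings; only the separator
chosen at the end differs.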

----- Recommended reading:
RFC 2396:'3.3. Path Component'
RFC 2396:'5. Relative URI References'
RFC 2396:'A. Collected BNF for URI'
RFC 2396:'G.2. Modifications from both RFC 1738 and RFC 1808'

RFC 1738:'5. BNF for specific URL schemes' for the original definition of
file: scheme; note that 2396 has superseded or updated this RFC, so I'm not
sure which one to believe completely although most of 2396 carefully avoids
giving examples about file:.

---- References:
<http://www.ietf.org/rfc.html> or <http://www.rfc-editor.org/rfc.html>
RFC 2396, 1808, 1738 (any others directly relevant?)

---- Xalan note:
I'm sure there are a couple of places where Xalan code does not handle
relative URLs correctly.  Help us find them and I'll definitely work on
fixing them.  I have a basic SystemIDTest already although I both need to
update it a bit and also create another SpecialSystemIdTest that covers
other non-file: URLs (which require special network setup obviously, so I
may exclude it from the smoketest, etc.)

Oh: side note: the java.net.URL class is notoriously bad at manipulating
URL formats, so don't count on it to tell you what the spec says.

- Shane
