> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Kevin Futter
> Sent: 13 December 2004 01:28
> To: [EMAIL PROTECTED]
> Subject: Re: [WSG] Validating unicode files
>
> On 13/12/04 8:23 AM, "Matthew Cruickshank"
> <[EMAIL PROTECTED]> wrote:
>
> > Hi chaps,
> >
> > When it comes to text encoding the character range from 127-255 is, as
> > I understand it, disputed territory. In that all kinds of regional
> > hacks were used over the years and with Unicode they're no longer
> > necessary so I should avoid this range. I was just copying some text
> > together and my xml parser didn't like it because of some characters
> > in this range.
See W3C's FAQ "HTML, XHTML, XML and Control Codes"
http://www.w3.org/International/questions/qa-controls
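
As a rough illustration of what that FAQ is about, here is a minimal Python 3
sketch that lists control characters in a piece of already-decoded text. The
function name find_control_chars is made up for this example. Tab, line feed
and carriage return are skipped because XML 1.0 allows them. One common way
such characters appear: Windows-1252 bytes in the 0x80-0x9F range (curly
quotes, for instance) turn into C1 control characters when the text is read
as ISO-8859-1.

```python
def find_control_chars(text):
    """Return (index, code point) pairs for control characters in text.

    Tab, line feed and carriage return are ignored, since XML 1.0 permits them.
    """
    allowed = {0x09, 0x0A, 0x0D}
    hits = []
    for i, ch in enumerate(text):
        cp = ord(ch)
        is_c0 = cp < 0x20 and cp not in allowed   # C0 controls
        is_del_or_c1 = 0x7F <= cp <= 0x9F          # DEL and C1 controls
        if is_c0 or is_del_or_c1:
            hits.append((i, cp))
    return hits


if __name__ == "__main__":
    # A Windows-1252 left curly quote (byte 0x93) misread as ISO-8859-1.
    sample = b"\x93hello".decode("iso-8859-1")
    for index, cp in find_control_chars(sample):
        print("control character U+%04X at index %d" % (cp, index))
```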
> > It seems that even when you tell notepad.exe to save as utf-8 it
> > sometimes doesn't.
I've never experienced that. Notepad only saves as something else if I forget
to use Save As, or if I remove the byte order mark. Also, you should make sure
that your server is not overriding the encoding of your file by serving an
incorrect charset in the HTTP Content-Type header.
> >
> > So is there a bit of software to validate UTF-8 encoded files?
The W3C Validator works fine on UTF-8 encoded files. It can also be useful
for determining the encoding of your file.
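
If you want a quick local check before uploading anything, something like the
following Python 3 sketch may do. It only tests whether the bytes decode as
UTF-8, which is a much narrower check than what the Validator performs. The
function name check_utf8 is made up for this example.

```python
def check_utf8(data):
    """Return None if data (bytes) is well-formed UTF-8.

    Otherwise return a (byte offset, reason) tuple for the first problem found.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return (err.start, err.reason)
    return None


if __name__ == "__main__":
    import sys

    # Usage: python check_utf8.py somefile.xml
    with open(sys.argv[1], "rb") as handle:
        problem = check_utf8(handle.read())
    if problem is None:
        print("decodes cleanly as UTF-8")
    else:
        print("not valid UTF-8 at byte %d: %s" % problem)
```

A file saved in a legacy 8-bit encoding will usually fail this check as soon
as it contains a character above 127, which matches the symptom described.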
> >
> >
> > .Matthew Cruickshank
> > http://holloway.co.nz/
>
> My understanding is that it's a known 'feature' of Notepad to
> add some internal proprietary identifier to UTF-8 encoded
> files that actually renders them invalid, so to speak. I'm
> sure someone else can explain it better than I just did!
See W3C's FAQ "Unexpected characters or blank lines"
http://www.w3.org/International/questions/qa-utf8-bom (especially the
background section)
The UTF-8 BOM or signature doesn't render the file invalid, but may produce
some unexpected effects in certain browsers.
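
To see whether a file starts with the signature, a short Python 3 sketch along
these lines could help. The UTF-8 BOM is the three bytes EF BB BF at the very
start of the file. The function names has_utf8_bom and strip_utf8_bom are made
up for this example.

```python
import codecs


def has_utf8_bom(data):
    """Return True if data (bytes) starts with the UTF-8 signature EF BB BF."""
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data):
    """Return data without a leading UTF-8 signature, if one is present."""
    if has_utf8_bom(data):
        return data[len(codecs.BOM_UTF8):]
    return data


if __name__ == "__main__":
    with_bom = codecs.BOM_UTF8 + "<p>hello</p>".encode("utf-8")
    print(has_utf8_bom(with_bom))
    print(strip_utf8_bom(with_bom))
```

When reading text rather than raw bytes, Python's "utf-8-sig" codec decodes
UTF-8 and drops a leading signature if there is one.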
>
> I've found this article quite useful, though it may not
> necessarily directly address your problem:
>
> http://www.joelonsoftware.com/articles/Unicode.html
>
> --
> Kevin Futter
> Webmaster, St. Bernard's College
> http://www.sbc.melb.catholic.edu.au/
>
Hope that helps. (Please let me know if there's a way to improve our
articles, or add useful new ones.)
Richard Ishida
W3C
contact info:
http://www.w3.org/People/Ishida/
W3C Internationalization:
http://www.w3.org/International/
Publication blog:
http://people.w3.org/rishida/blog/