Re: Garbage in comments

Dirk Wed, 12 Apr 2006 12:53:45 -0700

Later I recognized that the encoding setting is not enough in XML to
force the correct encoding since it is still discouraged to output
specific characters.
I suspect that this is an input versus output issue. Doesn't Subversion do everything with UTF-8 internally? So if the XML is ultimately intended to migrate to svn input, I'd suggest converting to UTF-8 early in the process, and make all the XML UTF-8.

Yes, but I hoped that Perl already comes with a decent XML parser, so I could leave this task to Perl.

This is more a policy question than a technical question. Anything below 0x20 that's not white space can probably be dropped as noise. Stuff between 0x7f and 0x9f has multi-byte UTF-8 equivalents.

Currently I try to put everything printable into the XML. The only problem: Linux doesn't know CP1252, so it can't decide what is printable.
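
A minimal sketch of the "drop the noise" policy suggested above, in Python rather than in ssphys itself. It assumes the comment text is available as raw single-byte (CP1252) data; the function name is made up for illustration. XML 1.0 accepts only tab, line feed and carriage return below 0x20, so those three are kept and every other byte in that range is dropped.

```python
# Bytes below 0x20 that XML 1.0 accepts: tab, line feed, carriage return.
XML_WHITESPACE = {0x09, 0x0A, 0x0D}


def strip_control_noise(data: bytes) -> bytes:
    """Drop bytes below 0x20 unless they are XML-legal white space.

    Hypothetical helper: it works on raw single-byte text, so it has to run
    before any conversion to UTF-8. Bytes from 0x20 upward pass through
    untouched.
    """
    return bytes(b for b in data if b >= 0x20 or b in XML_WHITESPACE)
```

Filtering before the encoding conversion keeps the two concerns separate: this step only decides what counts as noise, and the bytes from 0x80 upward are left for the converter to handle.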

That's what I did. But it didn't work. According to
http://www.w3.org/TR/REC-xml/#charsets some characters that are still
allowed in the windows-1252 codepage are discouraged in XML, esp.
most of the characters in the band [x80-x9f].
They're not discouraged. They just have a different encoding. That's where you need the suggested table lookup to generate the multi-byte equivalent.
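
The table lookup mentioned here could look roughly like the sketch below. It is written in Python for brevity and is not the ssphys implementation; the names `CP1252_HIGH` and `cp1252_to_utf8` and the `?` replacement are assumptions. Only the band 0x80-0x9f needs a table, because every other CP1252 byte has the same number as its Unicode code point.

```python
# Windows-1252 bytes 0x80-0x9f mapped to their Unicode code points.
# 0x81, 0x8d, 0x8f, 0x90 and 0x9d are unassigned in CP1252 and are left out.
CP1252_HIGH = {
    0x80: 0x20AC, 0x82: 0x201A, 0x83: 0x0192, 0x84: 0x201E,
    0x85: 0x2026, 0x86: 0x2020, 0x87: 0x2021, 0x88: 0x02C6,
    0x89: 0x2030, 0x8A: 0x0160, 0x8B: 0x2039, 0x8C: 0x0152,
    0x8E: 0x017D, 0x91: 0x2018, 0x92: 0x2019, 0x93: 0x201C,
    0x94: 0x201D, 0x95: 0x2022, 0x96: 0x2013, 0x97: 0x2014,
    0x98: 0x02DC, 0x99: 0x2122, 0x9A: 0x0161, 0x9B: 0x203A,
    0x9C: 0x0153, 0x9E: 0x017E, 0x9F: 0x0178,
}


def cp1252_to_utf8(data: bytes, replacement: str = "?") -> bytes:
    """Convert CP1252 bytes to UTF-8 (hypothetical helper).

    Unassigned bytes in 0x80-0x9f become `replacement`; that choice is an
    assumption of this sketch, not something the thread settles.
    """
    chars = []
    for b in data:
        if 0x80 <= b <= 0x9F:
            code_point = CP1252_HIGH.get(b)
            chars.append(replacement if code_point is None else chr(code_point))
        else:
            # 0x00-0x7f and 0xa0-0xff equal their Unicode code points.
            chars.append(chr(b))
    return "".join(chars).encode("utf-8")
```

For example, `cp1252_to_utf8(b"\x80")` yields the three bytes `e2 82 ac` (the euro sign in UTF-8), so after conversion nothing from the C1 band [x80-x9f] is left in the XML output.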

They explicitly state on the mentioned page: "The characters defined in the following ranges are also discouraged"

To cut a long story short: Sooner or later I will add a windows-1252 to UTF-8 converter into ssphys, so that the output XML is in UTF-8. But I can't do it in the near future (roughly 3 weeks). I have no objections if someone else does the job. I just want to keep ssphys lean and mean.


Dirk

_______________________________________________
vss2svn-users mailing list
Project homepage:
http://www.pumacode.org/projects/vss2svn/
Subscribe/Unsubscribe/Admin:
http://lists.pumacode.org/mailman/listinfo/vss2svn-users-lists.pumacode.org
