Package: graphviz
Version: 2.2.1-1sarge1
Severity: normal

Steps to reproduce:

1) cat > hello.dot << EOF
digraph g {
  a -> b;
  b [label="testiä"];
}
EOF
2) dot -Tsvg hello.dot > hello.svg
3) inkscape hello.svg

Expected results:

3) If step 2 completed successfully, "hello.svg" should be a valid SVG file
   and Inkscape should open it.

Actual results:

3) Inkscape fails to open the file and shows the following error:

hello.svg:17: parser error : Input is not proper UTF-8, indicate encoding !
Bytes: 0xE4 0x3C 0x2F 0x74
<text text-anchor="middle" x="33" y="99">testiä</text>

The first line of hello.svg is

<?xml version="1.0" encoding="UTF-8" standalone="no"?>

If I change this to

<?xml version="1.0" encoding="iso-8859-1" standalone="no"?>

then Inkscape is able to open the file correctly.

I suggest that either

a) dot should only accept UTF-8 input and refuse to continue if it reads
   something else,
b) dot should support specifying the charset with a command line option, or
c) dot should support specifying both input and output charset and do
   conversions between these (this might be overkill)

At least b) should be very easy to do with something like

--- ./orig/graphviz-2.2.1/dotneato/common/svggen.c 2004-12-11 21:26:05.000000000 +0200
+++ ./graphviz-2.2.1/dotneato/common/svggen.c 2005-10-26 12:25:41.000000000 +0300
@@ -475,8 +475,12 @@
     /* Pages = pages; */
     N_pages = pages.x * pages.y;
 
-    svg_fputs
-        ("<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n");
+    svg_fputs("<?xml version=\"1.0\" encoding=\"");
+    if ((s = agget(g, "encoding")) && s[0])
+        svg_fputs(s);
+    else
+        svg_fputs("UTF-8");
+    svg_fputs("\" standalone=\"no\"?>\n");
     if ((s = agget(g, "stylesheet")) && s[0]) {
         svg_fputs("<?xml-stylesheet href=\"");
         svg_fputs(s);

and then use

dot -Gencoding=iso-8859-1 -Tsvg hello.dot > hello.svg

-- System Information:
Debian Release: 3.1
Architecture: i386 (i686)
Kernel: Linux 2.4.27-2-k7
Locale: LANG=C, LC_CTYPE=C (charmap=ANSI_X3.4-1968)

Versions of packages graphviz depends on:
ii  libc6          2.3.2.ds1-22           GNU C Library: Shared libraries an
ii  libexpat1      1.95.8-3               XML parsing C library - runtime li
ii  libfontconfig1 2.3.1-2                generic font configuration library
ii  libfreetype6   2.1.7-2.4              FreeType 2 font engine, shared lib
ii  libice6        4.3.0.dfsg.1-14sarge1  Inter-Client Exchange library
ii  libjpeg62      6b-10                  The Independent JPEG Group's JPEG
ii  libpng12-0     1.2.8rel-1             PNG library - runtime
ii  libsm6         4.3.0.dfsg.1-14sarge1  X Window System Session Management
ii  libx11-6       4.3.0.dfsg.1-14sarge1  X Window System protocol client li
ii  libxaw7        4.3.0.dfsg.1-14sarge1  X Athena widget set library
ii  libxext6       4.3.0.dfsg.1-14sarge1  X Window System miscellaneous exte
ii  libxmu6        4.3.0.dfsg.1-14sarge1  X Window System miscellaneous util
ii  libxpm4        4.3.0.dfsg.1-14sarge1  X pixmap library
ii  libxt6         4.3.0.dfsg.1-14sarge1  X Toolkit Intrinsics
ii  tcl8.4         8.4.9-1                Tcl (the Tool Command Language) v8
ii  tk8.4          8.4.9-1                Tk toolkit for Tcl and X11, v8.4 -
ii  xlibs          4.3.0.dfsg.1-14sarge1  X Keyboard Extension (XKB) configu
ii  zlib1g         1:1.2.2-4.sarge.2      compression library - runtime

-- no debconf information
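
Possible workaround until dot handles this itself: convert the input to UTF-8 before rendering, so that the bytes in the file match the encoding="UTF-8" declaration dot writes into the SVG. The following is only a sketch, assuming a POSIX shell and an iconv that knows ISO-8859-1 and UTF-8. The file name hello-utf8.dot is made up for illustration, and the first command merely recreates the Latin-1 input from the steps above.

```shell
# Recreate the Latin-1 input from the report.
# \344 is the single byte 0xE4, which is "ä" in ISO-8859-1.
printf 'digraph g {\n  a -> b;\n  b [label="testi\344"];\n}\n' > hello.dot

# Convert the input to UTF-8 (0xE4 becomes the two bytes 0xC3 0xA4).
iconv -f ISO-8859-1 -t UTF-8 hello.dot > hello-utf8.dot

# Sanity check: iconv exits non-zero when the file is not well-formed UTF-8.
iconv -f UTF-8 -t UTF-8 hello-utf8.dot > /dev/null \
  && echo "hello-utf8.dot is valid UTF-8"

# Then render as in step 2 (needs graphviz installed):
# dot -Tsvg hello-utf8.dot > hello.svg
```

This only sidesteps the problem for input whose encoding is already known. It does not replace any of the suggestions a) to c).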