Hi Tony,

* A.J.Mechelynck on Saturday, September 23, 2006 at 17:35:25 +0200:
> Christian Ebert wrote:
>> * A.J.Mechelynck on Saturday, September 23, 2006 at 09:57:40 +0200:
>>> #1.
>>> cat file1.utf8.txt file2.latin1.txt file3.utf8.txt > file99.utf8.txt
>>> 
>>> will produce invalid output unless the Latin1 input file is actually 
>>> 7-bit US-ASCII. This is not a limitation of the "cat" program (which 
>>> inherently never translates anything) but a false manoeuvre on the part
>>> of the user.
>> 
>> Hm, I want illegal stuff, hehe.
> 
> Then don't use UTF-8 files.

Yup. Basically I can't edit files with mixed encodings. What
fooled me was that if I do the following in a UTF-8 environment:

$ echo 'Vögel' >file-utf8.txt

and then "illegally":

$ echo 'Vögel' | iconv -f utf-8 -t iso-8859-1 >>file-utf8.txt
$ vim file-utf8.txt

Vim then automatically falls back to latin1 for the whole file
and converts it for display:

#v+
VÃ¶gel
Vögel
#v-

Makes sense, as Vim considers 'Ã¶' (the two UTF-8 bytes of 'ö')
legal latin1 chars. And apparently there is no way to force Vim
into less sensible behaviour ;) such as representing the illegal
chars with a placeholder.
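To at least see which lines of such a mixed file are not valid
UTF-8 (the ones that make Vim's utf-8 attempt fail, so it falls
back to latin1 for the whole file), a small shell sketch. It
assumes an iconv that exits non-zero on an illegal input sequence,
as GNU iconv does; check_utf8_lines is a made-up helper name, not
a standard tool.

```shell
#!/bin/sh
# check_utf8_lines FILE: print the number of every line in FILE that
# is not valid UTF-8 (hypothetical helper, written for this sketch).
check_utf8_lines() {
  n=0
  while IFS= read -r line || [ -n "$line" ]; do
    n=$((n + 1))
    # Assumption: iconv exits non-zero on an illegal input sequence.
    if ! printf '%s\n' "$line" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
      echo "line $n: not valid UTF-8"
    fi
  done < "$1"
}

# Recreate the mixed file from the example with octal escapes, so the
# script does not depend on the terminal's encoding:
# line 1 is UTF-8 (o-umlaut = 0xC3 0xB6), line 2 is Latin-1 (0xF6).
printf 'V\303\266gel\n' > file-utf8.txt
printf 'V\366gel\n' >> file-utf8.txt

check_utf8_lines file-utf8.txt
```

Run as is, it reports only line 2, the single Latin-1 byte 0xF6.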

Blinded by my (dirty-workaround) purpose, I hoped for a way to
force Vim /not/ to convert.

>>> #2.
>>> gvim
>>> :if &tenc == "" | let &tenc = &enc | endif
>>> :set enc=utf-8 fencs=utf-bom,utf-8,latin1
>> (correction: "utf-bom" above should be "ucs-bom")
>>> :e ++enc=utf-8 file1.utf8.txt
>>> :$r ++enc=latin1 file2.latin1.txt
>>> :$r ++enc=utf-8 file3.utf8.txt
>>> :saveas file99.utf8.txt
>> 
>> Then file99.utf8.txt is the same as the one produced with the
>> cat command, which is actually what I want.
> 
> No. It is what the one produced with the cat command should have been, with 
> the Latin1 accented characters properly converted to UTF-8.

You are right, of course.
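For reference, a shell counterpart of Tony's recipe #2 above: a
minimal sketch, assuming an iconv that knows ISO-8859-1 and the
file names from his example (the sample contents are made up). Only
the Latin1 file goes through iconv; cat copies the UTF-8 parts byte
for byte, as it always does.

```shell
#!/bin/sh
# Made-up sample inputs: "Vögel" in UTF-8, Latin-1, UTF-8.
printf 'V\303\266gel\n' > file1.utf8.txt
printf 'V\366gel\n'     > file2.latin1.txt
printf 'V\303\266gel\n' > file3.utf8.txt

# Convert only the Latin-1 part; the result is valid UTF-8 throughout.
{
  cat file1.utf8.txt
  iconv -f ISO-8859-1 -t UTF-8 file2.latin1.txt
  cat file3.utf8.txt
} > file99.utf8.txt
```

Without the iconv line this is exactly the original cat command, and
the middle line would keep its lone 0xF6 byte, which is not UTF-8.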

To summarize:

I tried to work around a shortcoming in a LaTeX package (it can't
parse UTF-8 input).

For my purposes the easiest workaround would have been the
dirtiest:

[LaTeX pseudo-code]
#v+
\usepackage[utf8]{inputenc}
\usepackage{soul}% <- the package in question
....
Loads of legal utf-8 text ...

\begingroup\inputencoding{latin1}
\caps{short text in illegal iso-8859-1}
\endgroup

Loads of legal utf-8 text ...
#v-

This does not work in one file if I want to continue to edit the
"loads of legal utf-8 text" in Vim.

In the above simple case I could do:

$ voeg=`echo 'Vögel' | iconv -f utf-8 -t iso-8859-1`; \
sed -i~ -e 's/\\caps{.*}/\\caps{'"$voeg"'}/' file-utf8.tex

to get the result (LaTeX output) I wanted.

Or I could write the group around \caps in a latin1 file and
\input it, or decide to switch to a latin1 environment ...

... or rewrite the LaTeX package to accept UTF-8 input -- which
would be the cleanest solution, but unfortunately over my head
ATM.
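The \input variant could look roughly like this: a sketch, untested
with soul, that reuses the \inputencoding group from the pseudo-code
and assumes two hypothetical file names, caps.utf8.tex (edited in
Vim as UTF-8 like everything else) and caps-latin1.tex (generated,
pure Latin-1). No file ever mixes encodings.

```shell
#!/bin/sh
# Write the fragment as UTF-8 (octal escapes = o-umlaut in UTF-8);
# in practice this file would simply be edited in Vim.
printf '%s\n' '\begingroup\inputencoding{latin1}' > caps.utf8.tex
printf '\\caps{V\303\266gel}\n' >> caps.utf8.tex
printf '%s\n' '\endgroup' >> caps.utf8.tex

# Regenerate the pure Latin-1 copy before each LaTeX run.
iconv -f UTF-8 -t ISO-8859-1 caps.utf8.tex > caps-latin1.tex
```

The UTF-8 main document would then contain just \input{caps-latin1}
where the group used to be.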

So, what I had in mind was too dirty (for Vim).

Thanks for taking the time, Tony.

c
-- 
Read _B A U S T E L L E N_! --->> <http://www.blacktrash.org/baustellen.html>
