On 30/10/2013 16:13, "Jörg Knappen" wrote:
Thanks again!
My updated sed pattern generator now looks like:
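r = range(0xa0, 0x170)
file = open("fixu8.sed", "w")
for i in r:
  # UTF-8 bytes misread as latin-1, then encoded as UTF-8 a second time
  pat1 = "s/"+unichr(i).encode("utf-8").decode("latin-1").encode("utf-8") + "/" + unichr(i).encode("utf-8") +"/g"
  print >>file, pat1
  try:
    # same, but with the bytes misread as windows-1252
    pat2 = "s/"+unichr(i).encode("utf-8").decode("windows-1252").encode("utf-8") + "/" + unichr(i).encode("utf-8") +"/g"
  except UnicodeDecodeError:
    # a byte is undefined in windows-1252: only the latin-1 rule applies
    pat2 = pat1
  if (pat1 != pat2):
    print >>file, pat2
file.close()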
This handles double-encoded UTF-8 that was mangled through either latin-1 or windows-1252. That is probably enough for now: the error rate is low enough for practical purposes (i.e., lower than the natural error rate introduced by typing errors).
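
To make the mangling concrete, here is a minimal sketch in Python 3 (not the Python 2 script above). The helper name `mangle` and the sample word are made up for illustration; the replacement at the end only mimics what one generated sed rule would do.

```python
# Illustrative sketch (Python 3); mangle() is a hypothetical helper.
def mangle(ch, codec):
    """Return ch as it looks after its UTF-8 bytes are misread as `codec`."""
    return ch.encode("utf-8").decode(codec)

original = "\u00e9"                    # e with acute accent, UTF-8 bytes C3 A9
mangled = mangle(original, "latin-1")  # two characters: U+00C3, U+00A9

# A generated rule of the form s/<mangled>/<original>/g amounts to:
text = "caf" + mangled
fixed = text.replace(mangled, original)
print(repr(text), "->", repr(fixed))
```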

Why do you do both latin1 and windows-1252? Windows-1252 is supposed to be a superset of latin1, so it should be enough. Or is there a problem with the few undefined bytes of windows-1252 (81, 8D, 8F, 90, 9D)?
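
One way to check this empirically is sketched below in Python 3. It assumes Python's own "latin-1" and "windows-1252" codecs are the relevant reference; `mangle`, `classify` and the group labels are made-up names for illustration, not part of the original script.

```python
# Sketch (Python 3): compare how each code point is mangled by the two codecs.
def mangle(ch, codec):
    """Misread the UTF-8 bytes of ch as `codec`; None if a byte is undefined."""
    try:
        return ch.encode("utf-8").decode(codec)
    except UnicodeDecodeError:
        return None

def classify(ch):
    latin1 = mangle(ch, "latin-1")        # latin-1 defines all 256 byte values
    cp1252 = mangle(ch, "windows-1252")   # fails on bytes the codec leaves undefined
    if cp1252 is None:
        return "latin-1 only"
    if cp1252 == latin1:
        return "identical"
    return "both differ"

groups = {}
for i in range(0xA0, 0x170):              # same range as the generator
    groups.setdefault(classify(chr(i)), []).append(chr(i))

for label, chars in sorted(groups.items()):
    print(label, len(chars))
```

With Python's codec, the undefined bytes raise UnicodeDecodeError, so code points whose UTF-8 form contains one of them fall in the "latin-1 only" group and get no windows-1252 rule. Code points in the "both differ" group have a trailing byte in the 80-9F range, which the two codecs map to different characters.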


    Frédéric
