Re: What does one do if the encoding is unknown and all you have is a sequence of bytes?

2013-07-19 Thread Mark Davis ☕
Popping up a level.

ICU (and some other libraries) have heuristic encoding detection that will take a sequence of bytes and come up with a likely encoding ID.

Mark
— Il meglio è l’inimico del bene —

On Fri, Jul 19, 2013 at 8:40 PM, Whis

Re: What does one do if the encoding is unknown and all you have is a sequence of bytes?

2013-07-19 Thread Peter Edberg
On Jul 19, 2013, at 12:42 PM, Mark Davis ☕ wrote:

> Popping up a level.
>
> ICU (and some other libraries) have heuristic encoding detection, that will
> take a sequence of bytes and come up with a likely encoding id.

However, the ICU encoding detection typically requires more than 4 bytes (
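To illustrate why a very short input gives a detector little to work with, here is a minimal Python sketch. It is not ICU's algorithm: it only checks which encodings from a small, arbitrarily chosen candidate list can decode the bytes without error. The helper name `plausible_encodings` and the candidate list are assumptions made for this illustration.

```python
# Hypothetical helper, not ICU's detector: keep every candidate encoding
# under which the bytes decode without error.
CANDIDATES = ("ascii", "utf-8", "iso-8859-1", "utf-16-le", "utf-16-be")

def plausible_encodings(data, candidates=CANDIDATES):
    """Return the candidate encodings that decode `data` strictly."""
    survivors = []
    for name in candidates:
        try:
            data.decode(name, errors="strict")
        except UnicodeDecodeError:
            continue  # these bytes are not well-formed in this encoding
        survivors.append(name)
    return survivors

sample = bytes.fromhex("C3 83 C2 B1")
print(plausible_encodings(sample))
# ['utf-8', 'iso-8859-1', 'utf-16-le', 'utf-16-be']
```

Only ASCII is ruled out. Validity alone cannot separate the remaining candidates, so a real detector has to fall back on statistics about the decoded text, and four bytes carry very little of that.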

Re: What does one do if the encoding is unknown and all you have is a sequence of bytes?

2013-07-19 Thread Karl Williamson
On 07/19/2013 11:51 AM, Costello, Roger L. wrote:

Hi Folks,

Suppose that these hex bytes:

C3 83 C2 B1

show up in a message and the message contains no hint what its encoding is.

Perhaps it is 8859-1, in which case the message consists of four 1-byte characters:

C3 = Ã
83 = the “no b

RE: What does one do if the encoding is unknown and all you have is a sequence of bytes?

2013-07-19 Thread Whistler, Ken
> Suppose that these hex bytes:
>
> C3 83 C2 B1
>
> show up in a message and the message contains no hint what its encoding is.
>
> Perhaps it is 8859-1, in which case the message consists of four 1-byte
> characters:
>
> C3 = Ã
> 83 = the “no break here” character
> C2 = Â
> B1 = ±
>

What does one do if the encoding is unknown and all you have is a sequence of bytes?

2013-07-19 Thread Costello, Roger L.
Hi Folks,

Suppose that these hex bytes:

C3 83 C2 B1

show up in a message and the message contains no hint what its encoding is.

Perhaps it is 8859-1, in which case the message consists of four 1-byte characters:

C3 = Ã
83 = the “no break here” character
C2 = Â
B1 = ±

Perhaps it is
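
The competing readings of these four bytes can be checked directly. The following Python sketch decodes them as ISO 8859-1 and as UTF-8 and prints the resulting code points. The helper `code_points` is a name introduced here for illustration. The final step tests one further hypothesis, that the text was UTF-8 encoded twice; that is a guess about how such bytes could arise, not something the message states.

```python
raw = bytes.fromhex("C3 83 C2 B1")

def code_points(text):
    """Format each character of `text` as U+XXXX."""
    return [f"U+{ord(ch):04X}" for ch in text]

# Reading 1: ISO 8859-1, one character per byte.
as_latin1 = raw.decode("iso-8859-1")
print(code_points(as_latin1))  # ['U+00C3', 'U+0083', 'U+00C2', 'U+00B1']

# Reading 2: UTF-8, two 2-byte sequences.
as_utf8 = raw.decode("utf-8")
print(code_points(as_utf8))    # ['U+00C3', 'U+00B1']

# Hypothesis: the text was UTF-8 encoded twice. Undo one layer by mapping
# the decoded characters back to single bytes, then decode as UTF-8 again.
undone = as_utf8.encode("iso-8859-1").decode("utf-8")
print(code_points(undone))     # ['U+00F1']
```

All three readings are well-formed, so the bytes alone do not settle the question; knowledge of where the message came from has to do that.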