>
> It's UTF-8 with a UTF-16 BOM prepended, which is not uncommon when you
> have a file that's been converted to UTF-8 from UTF-16 or has been
> produced by shitty Microsoft software. You can tell instantly at a
> glance that it's not UTF-16 because the ASCII dump would l.o.o.k.
> .l.i.k.e. .t.h.i
>
> Try setting encoding to: "utf-8-sig".
>
> 'ef bb bf' is the byte order mark for UTF-8 (most systems do not include
> this in UTF-8 encoded files)
>
> Python will correctly read UTF-8 BOMs if you use the 'utf-8-sig' encoding
> when reading files
>
Excellent, thanks. That worked like a charm. Know
On 2021-02-09, Skip Montanaro wrote:
> I downloaded US hospital ICU capacity data this morning from this page:
>
> https://healthdata.gov/dataset/covid-19-reported-patient-impact-and-hospital-capacity-facility
>
> (The download link is about halfway down the page.)
>
> Trying to read it using my p
Try setting encoding to: "utf-8-sig".
'ef bb bf' is the byte order mark for UTF-8 (most systems do not include
this in UTF-8 encoded files)
Python will correctly read UTF-8 BOMs if you use the 'utf-8-sig' encoding
when reading files
Steve
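
For anyone finding this thread later, here is a minimal sketch of the
difference between 'utf-8' and 'utf-8-sig'. The sample bytes and the column
names (hospital_name, icu_beds) are made up for illustration and are not
taken from the healthdata.gov file.

```python
import codecs
import csv
import io

# Hypothetical sample standing in for the downloaded CSV: a UTF-8 BOM
# (bytes ef bb bf) followed by ordinary UTF-8 text.
raw = codecs.BOM_UTF8 + "hospital_name,icu_beds\nExample General,12\n".encode("utf-8")

# Decoding as plain 'utf-8' keeps the BOM as U+FEFF, stuck to the front
# of the first header name.
plain = list(csv.reader(io.StringIO(raw.decode("utf-8"))))

# Decoding as 'utf-8-sig' drops a leading BOM if one is present.
clean = list(csv.reader(io.StringIO(raw.decode("utf-8-sig"))))

print(repr(plain[0][0]))
print(repr(clean[0][0]))
```

When reading from disk, the same effect should come from passing
encoding="utf-8-sig" (and newline="" for the csv module) to open().
'utf-8-sig' also reads files that have no BOM, so it is a reasonable default
for CSV files of unknown origin.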
On Tue, Feb 9, 2021 at 2:56 PM Skip Montanaro
wrote:
> I downloaded US hospital ICU capacity data this morning from this page:
>
> https://healthdata.gov/dataset/covid-19-reported-patient-impact-and-hospital-capacity-facility
>
> (The download link is about halfway down the page.)
>
> Trying to read it using my personal CSV tools without specifying an
> encoding,