Re: UTF-16 or something else?

2021-02-09 Thread Skip Montanaro
> > It's UTF-8 with a UTF-16 BOM prepended, which is not uncommon when you > have a file that's been converted to UTF-8 from UTF-16 or has been > produced by shitty Microsoft software. You can tell instantly at a > glance that it's not UTF-16 because the ascii dump would l.o.o.k. > .l.i.k.e. .t.h.i
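
The "at a glance" check described above can also be done on the leading bytes. The following is only a rough sketch: `sniff_bom` is a hypothetical helper name, not something from the thread, and it recognises only the standard BOM constants that ship with Python's `codecs` module.

```python
import codecs

def sniff_bom(data: bytes):
    """Return a codec name suggested by a leading BOM, or None if there is none.

    Hypothetical helper for illustration. UTF-32 is tested first because
    the UTF-32-LE BOM begins with the same two bytes as the UTF-16-LE BOM.
    """
    if data.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return "utf-32"
    if data.startswith(codecs.BOM_UTF8):          # bytes ef bb bf
        return "utf-8-sig"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    return None

# ASCII text encoded as UTF-16 carries a NUL byte next to every character,
# which is what produces the dotted look in a hex/ASCII dump.
utf16_sample = "look".encode("utf-16-le")
utf8_bom_sample = codecs.BOM_UTF8 + b"look"

print(utf16_sample)
print(sniff_bom(utf8_bom_sample))
```

A BOM is only a hint; a file with no BOM can still be in any of these encodings.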

Re: UTF-16 or something else?

2021-02-09 Thread Skip Montanaro
> > Try setting encoding to: "utf-8-sig". > > 'ef bb bf' is the byte order mark for UTF8 (most systems do not include > this in UTF-8 encoded files) > > Python will correctly read UTF8 BOMs if you use the 'utf-8-sig' encoding > when reading files > Excellent, thanks. That worked like a charm. Know

Re: UTF-16 or something else?

2021-02-09 Thread Jon Ribbens via Python-list
On 2021-02-09, Skip Montanaro wrote: > I downloaded US hospital ICU capacity data this morning from this page: > > https://healthdata.gov/dataset/covid-19-reported-patient-impact-and-hospital-capacity-facility > > (The download link is about halfway down the page.) > > Trying to read it using my p

Re: UTF-16 or something else?

2021-02-09 Thread Stestagg
Try setting encoding to: "utf-8-sig". 'ef bb bf' is the byte order mark for UTF8 (most systems do not include this in UTF-8 encoded files) Python will correctly read UTF8 BOMs if you use the 'utf-8-sig' encoding when reading files Steve On Tue, Feb 9, 2021 at 2:56 PM Skip Montanaro wrote: >
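
For reference, a minimal sketch of the suggestion above. The CSV content and column names here are made up for illustration and are not taken from the healthdata.gov file; the only point is the difference between decoding with "utf-8" and with "utf-8-sig".

```python
import codecs
import csv
import io

# Hypothetical sample data: a tiny CSV with a UTF-8 BOM prepended,
# standing in for the downloaded file.
raw = codecs.BOM_UTF8 + "hospital_name,icu_beds\nExample General,12\n".encode("utf-8")

# Plain "utf-8" keeps the BOM as a U+FEFF character at the start of the
# text, so it ends up glued to the first column name.
plain = raw.decode("utf-8")
plain_fields = csv.DictReader(io.StringIO(plain)).fieldnames

# "utf-8-sig" strips a leading BOM if one is present and otherwise
# behaves like "utf-8".
clean = raw.decode("utf-8-sig")
clean_fields = csv.DictReader(io.StringIO(clean)).fieldnames

print(plain_fields)
print(clean_fields)
```

When reading from disk, the same codec name can be passed as `open(path, encoding="utf-8-sig", newline="")`, where `path` is whatever file is being read.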

UTF-16 or something else?

2021-02-09 Thread Skip Montanaro
I downloaded US hospital ICU capacity data this morning from this page: https://healthdata.gov/dataset/covid-19-reported-patient-impact-and-hospital-capacity-facility (The download link is about halfway down the page.) Trying to read it using my personal CSV tools without specifying an encoding,