Hi Brian,
Thanks for your response.
> I'd suggest you first make sure your XML is really UTF-8
I believe it is. I used a hex editor to look at the XML source file and the
character in question (the "Registered Sign") is encoded as hex "c2 ae" which
is the proper UTF-8 encoding for that charac
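
For anyone who wants to repeat the hex-editor check without a hex editor, here is a minimal Python sketch. The helper name utf8_hex is made up for illustration; it only shows which bytes UTF-8 produces for a given character.

```python
import unicodedata


def utf8_hex(char: str) -> str:
    """Return the UTF-8 bytes of `char` as space-separated hex pairs."""
    return " ".join(f"{b:02x}" for b in char.encode("utf-8"))


registered = "\u00ae"  # U+00AE, the Registered Sign
print(unicodedata.name(registered), "->", utf8_hex(registered))
```

Comparing that output with what the hex editor shows is a quick way to confirm the bytes in the file are the UTF-8 form and not a single-byte Latin-1 "ae".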
Hi Jackie,
On Tue, Feb 19, 2008 at 10:49 AM, Shieh, Jackie <[EMAIL PROTECTED]> wrote:
> What I have is an Excel spreadsheet for dissertations which I have saved as
> a tab-delimited file (examining the file in TextPad, the diacritics appear
> to be fine), then read in and output the file as a utf
I'd suggest you first make sure your XML is really UTF-8, using JHOVE:
/path/to/jhove/jhove -c /path/to/jhove/conf/jhove.conf -m utf8-hul myFile.xml
If it fails you could convert it to UTF-8, on the (perhaps unwarranted)
assumption that it's Windows Latin-1 (windows-1252):
iconv -c -f windows-1252 -t UTF-8 m
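
The same check-then-convert idea can be sketched in Python, in case JHOVE or iconv is not at hand. This is only an approximation under stated assumptions: the function name to_utf8 is hypothetical, the fallback encoding is the same windows-1252 guess as above, and errors="ignore" is used to roughly mirror what iconv's -c flag does (dropping bytes that cannot be converted).

```python
def to_utf8(raw: bytes, fallback: str = "windows-1252") -> bytes:
    """Return `raw` unchanged if it is valid UTF-8, else re-encode it.

    Assumption: anything that is not valid UTF-8 is in `fallback`.
    """
    try:
        raw.decode("utf-8")  # validation only; result is discarded
        return raw
    except UnicodeDecodeError:
        # Bytes undefined in the fallback code page are silently dropped.
        return raw.decode(fallback, errors="ignore").encode("utf-8")


# A lone 0xae byte is not valid UTF-8, so it is treated as windows-1252.
print(to_utf8(b"\xae").hex(" "))
```

Note that the validity test cannot prove the fallback guess is right; it only tells you the input was not UTF-8 to begin with.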
Hi Jackie,
I'm working on a very similar problem... converting theses/dissertations
records (in XML) to MARC records. I'm still in the testing stage, but have had
similar problems with records with diacritics in the 100 or 245 fields (however
diacritics in a 520a field don't seem to cause any
I have MARC::Record 2.0 installed [1]. According to the Changes file, marcdump
now has a "--hex" switch [2]:
[ENHANCEMENTS]
- Added --hex switch to marcdump, which dumps the record in
hexadecimal. The offsets are in decimal so that you can match
them up to values in the leader. The
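
To illustrate the general idea of a hex dump with decimal offsets, here is a rough Python sketch. It is not marcdump's implementation and does not reproduce its output format; the function name hex_dump, the 16-byte row width, and the five-digit offset column are all assumptions chosen for the example.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Dump `data` as hex, one row per `width` bytes, with decimal offsets."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Decimal (not hex) offset, zero-padded to five digits.
        lines.append(f"{offset:05d}: {hex_part}")
    return "\n".join(lines)


print(hex_dump("Registered \u00ae sign".encode("utf-8")))
```

Decimal offsets are convenient here because the lengths and base address stored in a MARC leader are decimal digits, so positions can be compared directly.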