RE: Help for utf-8 output

2008-02-21 Thread Doran, Michael D
Hi Brian, Thanks for your response.
> I'd suggest you first make sure your XML is really UTF-8
I believe it is. I used a hex editor to look at the XML source file and the character in question (the "Registered Sign") is encoded as hex "c2 ae" which is the proper UTF-8 encoding for that charac
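The hex-editor check above can be reproduced in a few lines. This is a minimal sketch in Python (swapped in for illustration; the thread itself concerns Perl tools). The helper name `utf8_hex` is hypothetical. It prints the UTF-8 bytes of U+00AE REGISTERED SIGN and, for contrast, the single byte the same character takes in Windows-1252.

```python
import unicodedata

def utf8_hex(ch: str) -> str:
    """Return the UTF-8 encoding of a character as space-separated hex bytes."""
    return " ".join(f"{b:02x}" for b in ch.encode("utf-8"))

sign = "\u00ae"
print(unicodedata.name(sign))       # REGISTERED SIGN
print(utf8_hex(sign))               # c2 ae  (two bytes in UTF-8)
print(sign.encode("cp1252").hex())  # ae     (one byte in Windows-1252)
```

Seeing "c2 ae" in the file is therefore consistent with UTF-8; a lone "ae" byte would suggest Windows-1252 or Latin-1 instead.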

Re: Help for utf-8 output

2008-02-21 Thread Galen Charlton
Hi Jackie, On Tue, Feb 19, 2008 at 10:49 AM, Shieh, Jackie <[EMAIL PROTECTED]> wrote:
> What I have is an Excel spreadsheet for dissertations which I have saved as a tab delimited file (examining the file in TextPad, the diacritics appears to be fine), then read in and output the file as a utf
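A rough sketch of the read-then-write-as-UTF-8 step described above, in Python rather than the Perl used on the list. It assumes (not stated in the message) that Excel's tab-delimited export is in the Windows-1252 code page; if the file was actually saved as "Unicode Text" it would be UTF-16 and this decode would be wrong. The function name `tsv_cp1252_to_utf8` is hypothetical. The `csv` module is used so that any quoted fields Excel writes are parsed as single fields.

```python
import csv
import io

def tsv_cp1252_to_utf8(raw: bytes) -> bytes:
    """Decode a tab-delimited export ASSUMED to be Windows-1252 and re-encode it as UTF-8."""
    text = raw.decode("cp1252")  # strict: raises if a byte is undefined in cp1252
    rows = list(csv.reader(io.StringIO(text, newline=""), delimiter="\t"))
    out = io.StringIO(newline="")
    csv.writer(out, delimiter="\t", lineterminator="\n").writerows(rows)
    return out.getvalue().encode("utf-8")

sample = b"Author\tTitle\r\nM\xfcller\tCaf\xe9 study\r\n"  # cp1252 bytes for u-umlaut, e-acute
print(tsv_cp1252_to_utf8(sample).decode("utf-8"))
```

A file that "looks fine" in an editor only shows that the editor guessed the right encoding; the decode step is where the actual assumption about the byte encoding is made.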

Re: Help for utf-8 output

2008-02-21 Thread Brian Sheppard
I'd suggest you first make sure your XML is really UTF-8, using JHOVE:
  /path/to/jhove/jhove -c /path/to/jhove/conf/jhove.conf -m utf8-hul myFile.xml
If it fails you could convert to utf8, on the (perhaps unwarranted) assumption it's windows latin1:
  iconv -c -f windows-1252 -t UTF-8 m
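The validate-then-convert idea above can be sketched in Python. This is only a stand-in: a strict UTF-8 decode is a much cruder check than JHOVE's utf8-hul module, and the fallback makes the same (perhaps unwarranted) Windows-1252 assumption as the iconv suggestion. `errors="ignore"` roughly mirrors `iconv -c`, which drops characters it cannot convert. The function name `ensure_utf8` is hypothetical.

```python
def ensure_utf8(data: bytes) -> bytes:
    """Return data unchanged if it is valid UTF-8; otherwise ASSUME Windows-1252 and convert."""
    try:
        data.decode("utf-8")  # strict: raises on any malformed sequence
        return data
    except UnicodeDecodeError:
        # Like `iconv -c`, silently drop bytes that cannot be converted
        # (0x81, 0x8D, 0x8F, 0x90, 0x9D are undefined in Python's cp1252 codec).
        return data.decode("cp1252", errors="ignore").encode("utf-8")

print(ensure_utf8(b"\xc2\xae"))  # already UTF-8: returned as-is
print(ensure_utf8(b"\xae"))      # lone cp1252 byte: converted to b'\xc2\xae'
```

Note that pure ASCII passes the UTF-8 test trivially, and a cp1252 file can occasionally happen to be valid UTF-8, so passing the check is evidence rather than proof.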

RE: Help for utf-8 output

2008-02-21 Thread Doran, Michael D
Hi Jackie, I'm working on a very similar problem... converting theses/dissertations records (in XML) to MARC records. I'm still in the testing stage, but have had similar problems with records with diacritics in the 100 or 245 fields (however diacritics in a 520a field don't seem to cause any
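One general point worth ruling out when diacritics cause trouble in MARC output (offered as a sketch, not a diagnosis of the problem in this message): in a MARC 21 record with UTF-8 data, the field lengths in the directory and the record length in the leader are counts of bytes, not characters, and each accented character adds an extra byte. The Python below is illustrative only; `directory_length` is a hypothetical helper, and the sample 100 field is made up.

```python
FIELD_TERMINATOR = "\x1e"
SUBFIELD_DELIMITER = "\x1f"

def directory_length(field_data: str) -> int:
    """Length of a variable field as a MARC directory records it: UTF-8 bytes,
    including the field terminator."""
    return len((field_data + FIELD_TERMINATOR).encode("utf-8"))

# Hypothetical 100 field: indicators "1 ", then subfield $a.
field_100 = "1 " + SUBFIELD_DELIMITER + "aCh\u00e1vez, Ana."
print(len(field_100) + 1)           # 17 characters (with terminator)
print(directory_length(field_100))  # 18 bytes: the a-acute takes two
```

A tool that counted characters instead of bytes would produce a directory that is off by one for each non-ASCII character, which would affect only fields containing diacritics.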

marcdump hex switch

2008-02-21 Thread Doran, Michael D
I have MARC::Record 2.0 installed [1]. According to the Changes file marcdump now has a "--hex" switch [2]: [ENHANCEMENTS] - Added --hex switch to marcdump, which dumps the record in hexadecimal. The offsets are in decimal so that you can match them up to values in the leader. The
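To illustrate why decimal offsets help, here is a small Python sketch of a hex dump with decimal offsets, plus a reader for the two leader values that are byte positions: record length (Leader/00-04) and base address of data (Leader/12-16). This mimics the idea described in the Changes entry, not marcdump's actual output format; `hex_dump` and `leader_offsets` are hypothetical names.

```python
def hex_dump(record: bytes, width: int = 16) -> list[str]:
    """Hex dump with DECIMAL offsets, so they can be compared with leader values."""
    lines = []
    for start in range(0, len(record), width):
        chunk = record[start:start + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        lines.append(f"{start:>6}  {hex_part}")
    return lines

def leader_offsets(record: bytes) -> tuple[int, int]:
    """Record length (Leader/00-04) and base address of data (Leader/12-16)."""
    return int(record[0:5]), int(record[12:17])

leader = b"00042nam a2200025   4500"  # made-up leader; position 09 'a' = Unicode
print(leader_offsets(leader))        # (42, 25)
for line in hex_dump(leader):
    print(line)
```

With decimal offsets, the base address from the leader can be looked up directly in the dump to see where the field data should begin, and a mismatch there points to a byte-count problem.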