RE: printing UTF-8 encoded MARC records with as_usmarc

2012-08-15 Thread Doran, Michael D
Hi Devon, > I just recently came across this presentation which lays out pretty much > all the issues with Unicode in perl, and makes some recommendations for > best practices. While Nick Patch's presentation is excellent, I'm not sure that it "lays out pretty much all the issues with Unicode in

Re: File open head scratcher UPDATE

2012-03-17 Thread Doran, Michael D
"Brad Baxter" wrote: > On Sat, Mar 17, 2012 at 5:25 PM, Doran, Michael D wrote: >> It looks like the read pointer was going to the beginning of the file on >> Solaris, but the end of the file on Linux. I've edited the script to do >> separate opens for when I need t
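A minimal sketch of the "separate opens" approach mentioned in this message. The original open call is not shown in the snippet, so the file, its contents, and the variable names below are hypothetical; the sketch only illustrates that opening once for appending and once for reading avoids depending on where a combined read/append handle places its read pointer.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical file, created only for this sketch.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
print {$tmp_fh} "first line\n";
close $tmp_fh or die "close failed: $!";

# Open once for appending...
open(my $out, '>>', $path) or die "Cannot append to $path: $!";
print {$out} "second line\n";
close $out or die "close failed: $!";

# ...and separately for reading, so reading starts at the top of the file
# regardless of platform.
open(my $in, '<', $path) or die "Cannot read $path: $!";
my @lines = <$in>;
close $in or die "close failed: $!";

print scalar(@lines), " lines read\n";
```

The three-argument form of open with a lexical filehandle is used here because it behaves the same on both platforms named in the thread; whether it matches the original script is an assumption.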

RE: File open head scratcher

2012-03-17 Thread Doran, Michael D
eni...@gmail.com [mailto:deni...@gmail.com] On Behalf Of Dan Scott > Sent: Saturday, March 17, 2012 4:19 PM > To: Doran, Michael D > Cc: perl4lib > Subject: Re: File open head scratcher > > On Sat, Mar 17, 2012 at 3:09 PM, Doran, Michael D wrote: > > I am migrating  a perl scrip

RE: File open head scratcher UPDATE

2012-03-17 Thread Doran, Michael D
l > -Original Message- > From: Doran, Michael D [mailto:do...@uta.edu] > Sent: Saturday, March 17, 2012 2:09 PM > To: perl4lib > Subject: File open head scratcher > > I am migrating a perl script from a server running perl v5.8.5 on Solaris > 9 to a server

File open head scratcher

2012-03-17 Thread Doran, Michael D
I am migrating a perl script from a server running perl v5.8.5 on Solaris 9 to a server running perl v5.12.2 on Redhat Linux 5.5. The new environment doesn't seem to like the syntax I'm using to open a file, and I'm scratching my head over why that is the case. The part that is not working a

RE: Anyone create MFHD records using MARC/Perl

2011-09-12 Thread Doran, Michael D
Hi Mark, Over the years, I've done a few projects that involved manipulation of, and/or creating MARC holdings (MFHD) records using the Perl MARC::Record module. No problems that I know of. -- Michael # Michael Doran, Systems Librarian # University of Texas at Arlington # 817-272-5326 office

RE: marcdump hex switch

2011-05-18 Thread Doran, Michael D
I never got an answer to this back in 2008 and thought I might have better luck now... -- Michael > -Original Message- > From: Doran, Michael D > Sent: Thursday, February 21, 2008 11:03 AM > To: perl4lib@perl.org > Subject: marcdump hex switch > > I have MARC::Re

RE: Invalid UTF-8 characters causing MARC::Record crash.

2011-05-18 Thread Doran, Michael D
Hi Al, > For me I've found the best solution is to leave Encode.pm alone > and redefine the offending subroutine within my processing script. This was timely help for me, too, due to problems with fatal errors when processing a large file of bibs with MARC::Record. Thanks! (Although, when I ch

RE: MARC blob to MARC::Record object

2011-01-10 Thread Doran, Michael D
Doran, Systems Librarian # University of Texas at Arlington # 817-272-5326 office # 817-688-1926 mobile # do...@uta.edu # http://rocky.uta.edu/doran/ > -Original Message- > From: Leif Andersson [mailto:leif.anders...@sub.su.se] > Sent: Monday, January 10, 2011 8:35 AM > To

RE: MARC blob to MARC::Record object

2011-01-07 Thread Doran, Michael D
# do...@uta.edu # http://rocky.uta.edu/doran/ > -Original Message- > From: Leif Andersson [mailto:leif.anders...@sub.su.se] > Sent: Friday, January 07, 2011 7:50 AM > To: Doran, Michael D; perl4lib > Subject: Re: MARC blob to MARC::Record object > > Hi Michael, >

RE: MARC blob to MARC::Record object

2011-01-06 Thread Doran, Michael D
> -Original Message- > From: Gorman, Jon [mailto:jtgor...@illinois.edu] > Sent: Thursday, January 06, 2011 6:19 PM > To: Doran, Michael D; perl4lib > Subject: RE: MARC blob to MARC::Record object > > > > > How do I make the MARC blob into a MARC::Record obj

MARC blob to MARC::Record object

2011-01-06 Thread Doran, Michael D
I am working on a Perl script that retrieves data from our Voyager ILS via an SQL query. Among other data, I have MARC records in blob form, and the script processes one MARC record at a time. I want to be able to parse and modify/convert the MARC record (using MARC::Record) before writing/pri

RE: Regular Expression for non-Roman characters

2008-09-25 Thread Doran, Michael D
Hi Jane, In a MARC-8 character set environment, I would assume that the key to detecting non-Latin characters would be the presence of an escape sequence to indicate a switch to an alternate character set (e.g. Arabic, Greek, Cyrillic, etc) [1]. Everything from that point on would be non-Latin

RE: Biblio::Isis and character encoding

2008-07-14 Thread Doran, Michael D
Hi Emmanuel, > I'm trying to convert an ISIS database to MARC21 What is the character set encoding of the data in the ISIS database? What is the desired character set encoding for the MARC21 records? I.e. MARC-8 or MARC Unicode(UTF-8)? If they are dissimilar character encodings, is the data un

RE: Problem installing MARC::Record 2.0.0 under perl 5.8.0

2008-07-08 Thread Doran, Michael D
Hi Chris, > I'll try that version. I sure hope you meant upgrading to Perl 5.8.2 (or higher) rather than downgrading to MARC::Record 1.39_02. ;-) This is just my un-asked for 2 cents, but I wouldn't stint on anything that will make the processing of Unicode-encoded text easier. Last December

RE: Stripping out Unicode combining characters (diacritics) -

2008-05-07 Thread Doran, Michael D
l the help. :-) -- Michael # Michael Doran, Systems Librarian # University of Texas at Arlington # 817-272-5326 office # 817-688-1926 mobile # [EMAIL PROTECTED] # http://rocky.uta.edu/doran/ > -Original Message- > From: Doran, Michael D [mailto:[EMAIL PROTECTED] > Sent: Monda

RE: Stripping out Unicode combining characters (diacritics)

2008-05-06 Thread Doran, Michael D
TECTED] # http://rocky.uta.edu/doran/ > -Original Message- > From: Leif Andersson [mailto:[EMAIL PROTECTED] > Sent: Tuesday, May 06, 2008 3:33 AM > To: Doran, Michael D > Subject: Re: Stripping out Unicode combining characters (diacritics) > > Oh, now I see your REAL

RE: Importing Perl package variables into a Perl script with "require"

2008-05-05 Thread Doran, Michael D
# [EMAIL PROTECTED] # http://rocky.uta.edu/doran/ -Original Message- From: Mike Rylander [mailto:[EMAIL PROTECTED] Sent: Mon 5/5/2008 8:57 PM To: Doran, Michael D Cc: Perl4lib Subject: Re: Importing Perl package variables into a Perl script with "require" On Fri, Apr 25, 2008

RE: Stripping out Unicode combining characters (diacritics)

2008-05-05 Thread Doran, Michael D
Mon 5/5/2008 8:52 PM To: Doran, Michael D Cc: [EMAIL PROTECTED]; Perl4lib Subject: Re: Stripping out Unicode combining characters (diacritics) On Mon, May 5, 2008 at 8:26 PM, Doran, Michael D <[EMAIL PROTECTED]> wrote: [snip] > > I'm pulling my hair out on this... so an

Stripping out Unicode combining characters (diacritics)

2008-05-05 Thread Doran, Michael D
I'm trying to strip out combining diacritics from some form input using this code: #!/usr/local/bin/perl use CGI; $query = CGI::new(); $search_term = $query->param('text'); $sans_diacritics = $search_term; $sans_diacritics =~ s/\p{M}*//g; #$sans_diacritics =~ s/o//g;
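A hedged sketch of one way to strip combining diacritics from UTF-8 form input. It assumes the input arrives as raw UTF-8 bytes (one common reason `\p{M}` fails to match, though not necessarily the cause in this thread); the helper name `strip_diacritics` and the sample string are hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode encode);
use Unicode::Normalize qw(NFD NFC);

# Hypothetical helper: takes UTF-8 encoded bytes and returns UTF-8 encoded
# bytes with all combining marks removed.
sub strip_diacritics {
    my ($bytes) = @_;
    my $chars = decode('UTF-8', $bytes);   # bytes -> Perl character string
    $chars = NFD($chars);                  # split precomposed letters into base + marks
    $chars =~ s/\p{M}+//g;                 # drop the combining marks
    return encode('UTF-8', NFC($chars));   # recompose what is left and re-encode
}

my $input = "Biblioth\xC3\xA8que";         # "Bibliothèque" as raw UTF-8 bytes
print strip_diacritics($input), "\n";      # prints "Bibliotheque"
```

Decomposing with NFD first means precomposed letters such as U+00E8 are handled as well as input that already uses separate combining characters.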

RE: Importing Perl package variables into a Perl script with "require"

2008-04-27 Thread Doran, Michael D
as at Arlington # 817-272-5326 office # 817-688-1926 mobile # [EMAIL PROTECTED] # http://rocky.uta.edu/doran/ > -Original Message- > From: Leif Andersson [mailto:[EMAIL PROTECTED] > Sent: Sunday, April 27, 2008 3:20 PM > To: Doran, Michael D; Perl4lib > Subject: Re: Im

Importing Perl package variables into a Perl script with "require"

2008-04-25 Thread Doran, Michael D
Back-story: I have a Perl CGI program. The CGI program needs to utilize variables in one of several separate configuration files (packages). The different packages all contain the same variables, but with different values for those variables. Each package represents a different language for

RE: Help for utf-8 output - followup on Record Length

2008-03-03 Thread Doran, Michael D
on MARC::Record. http://search.cpan.org/src/MIKERY/MARC-Record-2.0.0/Changes # Michael Doran, Systems Librarian # University of Texas at Arlington # 817-272-5326 office # 817-688-1926 mobile # [EMAIL PROTECTED] # http://rocky.uta.edu/doran/ > -Original Message- > From: Doran, Mi

RE: Help for utf-8 output

2008-03-03 Thread Doran, Michael D
17-688-1926 mobile # [EMAIL PROTECTED] # http://rocky.uta.edu/doran/ > -Original Message- > From: Leif Andersson [mailto:[EMAIL PROTECTED] > Sent: Saturday, March 01, 2008 2:51 PM > To: Doran, Michael D; perl4lib@perl.org; [EMAIL PROTECTED] > Subject: Re: Help for utf-8 out

RE: Help for utf-8 output

2008-02-21 Thread Doran, Michael D
ington # 817-272-5326 office # 817-688-1926 mobile # [EMAIL PROTECTED] # http://rocky.uta.edu/doran/ > -Original Message- > From: Brian Sheppard [mailto:[EMAIL PROTECTED] > Sent: Thursday, February 21, 2008 1:00 PM > To: Doran, Michael D > Cc: perl4lib@perl.org > Subjec

RE: Help for utf-8 output

2008-02-21 Thread Doran, Michael D
Hi Jackie, I'm working on a very similar problem... converting theses/dissertations records (in XML) to MARC records. I'm still in the testing stage, but have had similar problems with records with diacritics in the 100 or 245 fields (however diacritics in a 520a field don't seem to cause any

marcdump hex switch

2008-02-21 Thread Doran, Michael D
I have MARC::Record 2.0 installed [1]. According to the Changes file marcdump now has a "--hex" switch [2]: [ENHANCEMENTS] - Added --hex switch to marcdump, which dumps the record in hexadecimal. The offsets are in decimal so that you can match them up to values in the leader. The

RE: MARC::File::XML and parsing.

2007-09-27 Thread Doran, Michael D
Hi Henri, > Is there a reason why MARC::File::XML considers only a very > strict subset of utf-8 as valid ? I would guess that it has to do with adhering to the MARC-21 repertoire of characters, so as to facilitate the round-trip conversion between the MARC-8 and Unicode character sets [1,2].

RE: MARC::Charset 'utf8_to_marc8'

2007-09-18 Thread Doran, Michael D
Hi Laurence, > I'm trying to create MARC records from serials data exported > from SFX, using MARC::Charset version 0.98 to convert UTF-8 > strings to MARC-8. It seems to be failing on extended latin > characters like U+00C5 CAPITAL LETTER A WITH RING ABOVE The encoding, U+00C5 (CAPITAL LETTE
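The reply is cut off here, so the following is only an illustration of the general mechanism, not a restatement of the answer: MARC-8 has no precomposed A WITH RING ABOVE and represents it as a base letter plus a combining character, so a precomposed Unicode character is typically decomposed before conversion. A minimal sketch using the core Unicode::Normalize module:

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

my $precomposed = "\x{00C5}";          # LATIN CAPITAL LETTER A WITH RING ABOVE
my $decomposed  = NFD($precomposed);   # "A" followed by U+030A COMBINING RING ABOVE

# Print the code points of the decomposed form.
print join(' ', map { sprintf 'U+%04X', ord } split //, $decomposed), "\n";
# prints "U+0041 U+030A"
```

Whether MARC::Charset 0.98 performs this decomposition itself is not something the snippet confirms.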

RE: MARC::Charset question

2007-05-18 Thread Doran, Michael D
817-688-1926 mobile # [EMAIL PROTECTED] # http://rocky.uta.edu/doran/ > -Original Message- > From: Doran, Michael D > Sent: Friday, May 18, 2007 1:17 PM > To: perl4lib@perl.org > Subject: RE: MARC::Charset question > > Hi Michael, > > > An example is t

RE: MARC::Charset question

2007-05-18 Thread Doran, Michael D
Hi Michael, > An example is the author (personal name) of the book that can > be found at http://catalog.loc.gov/ by searching for ISBN > 5040039875 (I'm guessing the fact that the website appears to > be displaying a corrupted name may be part of the problem here). The Library of Congress cat

RE: Working around a UTF8/Unicode encoding problem

2007-05-15 Thread Doran, Michael D
> > I can also see that this record is broken because the XML entity > > &apos; is in a MARC communications format file. > > The character entity &apos; *is valid* in a MARC-XML file. > It is one of the few standard character entities allowed in > an XML file, e.g., &amp;, &lt;, &gt;, and &apos;. A recent MARC Proposa

Character set tests [was MARC::Charset]

2007-03-14 Thread Doran, Michael D
exas at Arlington # 817-272-5326 office # 817-688-1926 mobile # [EMAIL PROTECTED] # http://rocky.uta.edu/doran/ > -Original Message- > From: Ashley Sanders [mailto:[EMAIL PROTECTED] > Sent: Wednesday, March 14, 2007 10:52 AM > To: Doran, Michael D > Cc: perl4lib > Subject

RE: MARC::Charset

2007-03-14 Thread Doran, Michael D
Hi Ashley, > I think &#12345; is now legal in MARC-8 now to indicate a > Unicode character that isn't in the MARC-8 repertoire. Yes, that's also my understanding [1,2], though I've not personally come across any records yet that use that method. (Although not being a cataloger, I don't routinely exa

RE: MARC::Charset

2007-03-14 Thread Doran, Michael D
du/doran/ > -Original Message- > From: Henri-Damien LAURENT [mailto:[EMAIL PROTECTED] > Sent: Wednesday, March 14, 2007 4:18 AM > To: Doran, Michael D; perl4lib > Subject: Re: MARC::Charset > > Doran, Michael D a écrit : > > Hi Henri, > > > > Althou

RE: MARC::Charset

2007-03-13 Thread Doran, Michael D
Hi Henri, > MARC::Charset ... fails on each µ character. > ad Scripturµ sensum Although in my email client, the character in question appears as a MICRO SIGN ("µ"), I am assuming that it is actually meant to be a LOWERCASE DIGRAPH AE ("æ") since that is consistent with the Latin vernacular tex

RE: MARC Records, XML, and encoding

2006-05-18 Thread Doran, Michael D
> So I took a look at that position in the marc record and > found a 0x9C character at that position, as the error > message indicates. I can't find a 0x9C in either of the > mapping tables that this record purports to use: 0x9C is a C1 control character that is generally assigned the function of

RE: Z39.50 Module

2005-12-09 Thread Doran, Michael D
Hi Jane, If you don't get an answer on this list, you might consider a posting to the Net-z3950 list [1]. -- Michael [1] Net-z3950 mailing list [EMAIL PROTECTED] http://www.indexdata.dk/mailman/listinfo/net-z3950 # Michael Doran, Systems Librarian # University of Texas at Arlington # 81

RE: MARC-8 to UTF-8 conversion

2005-12-05 Thread Doran, Michael D
ECTED] On > Behalf Of Ed Summers > Sent: Monday, December 05, 2005 12:14 PM > To: perl4lib@perl.org > Subject: Re: MARC-8 to UTF-8 conversion > > On 12/5/05, Doran, Michael D <[EMAIL PROTECTED]> wrote: > > So... this is all very interesting (and I've definitely lea

RE: MARC-8 to UTF-8 conversion

2005-12-05 Thread Doran, Michael D
Ed, ED > I don't really understand why Perl 5.8.7 lacked DB_File since ED > Module::CoreList [...] reports it being standard since 5.00307. Perhaps ED > this is some sort of emasculated version that ships with Solaris :-) Nope, I wasn't using a "Perl lite" version. ;-) Although Solaris now comes

RE: MARC-8 to UTF-8 conversion

2005-12-05 Thread Doran, Michael D
Hi Ed, > -Original Message- > From: Edward Summers [mailto:[EMAIL PROTECTED] > Sent: Monday, December 05, 2005 6:14 AM > To: perl4lib > Subject: Re: MARC-8 to UTF-8 conversion > > On Dec 2, 2005, at 9:01 AM, Doran, Michael D wrote: > > > Installing the MAR

RE: MARC-8 to UTF-8 conversion

2005-12-02 Thread Doran, Michael D
Hi Stefano, Installing the MARC::Charset module can be a bit problematic for the casual Perl user, due to the prerequisites. However if you need to do a MARC-8 to UTF-8 conversion, that's probably the best tool available. The issue with MARC-8 conversions is that MARC-8 is only really used for e

RE: yet another character encoding question

2005-09-29 Thread Doran, Michael D
Hi Jason, I believe that MARC::Charset only does MARC-8 to UTF-8 conversion and vice versa, so won't be a solution for automating your Latin-1 to MARC-8 conversion, unless you were planning to do Latin-1=>UTF-8=>MARC-8. A few years ago, I wrote an imperfect MARC-8 to Latin-1 character set co

RE: who help me process BIG5?

2005-08-18 Thread Doran, Michael D
> I have some Excel files by big5 charset. > I fetch some column from it and save to usmarc file by using > MARC::Record, but I get nothing. What is "nothing"? Does that mean no MARC records were created? Or that they are empty? Or couldn't be imported and/or read in an integrated library sy

RE: new books list

2005-05-27 Thread Doran, Michael D
Hi Kindra, > I'm attaching a new books script that my predecessor created. I didn't see an attached script, but I'll take a shot anyway... > I'm trying to get this to run, but I'm coming up with errors. > I'm almost sure it is because of the upgrade to Unicode (we're > with Endeavor) Typicall

RE: installing perl 5.8.6

2005-05-19 Thread Doran, Michael D
red/perl/5.8.5-09/.cpan/build/DBD-Oracle-1.15/blib/lib/ > DBD/Oracle.pm > > Searching for Oracle versions on this system... > /oracle/app/oracle/product has these versions... > 9.2.0 9.2.0.3 > > > Does this mean that I have to chan

RE: installing perl 5.8.6

2005-05-19 Thread Doran, Michael D
Hi Kindra, Although you are only aware of the one Perl installation on your server, there are probably at least two others. A new Sun server from Endeavor will have Solaris 9, which comes with Perl 5.005 and Perl 5.6. In addition, as part of the Voyager software installation, you will have Perl

RE: LC call number sorting utilities

2005-03-29 Thread Doran, Michael D
.edu/doran/ > -Original Message- > From: Bryan Baldus [mailto:[EMAIL PROTECTED] > Sent: Tuesday, March 29, 2005 5:22 PM > To: Doran, Michael D; perl4lib@perl.org > Subject: RE: LC call number sorting utilities > > On Sunday, March 27, 2005 7:09 PM, Michael Doran wrote: > >I rec

LC call number sorting utilities

2005-03-27 Thread Doran, Michael D
I recently converted a Library of Congress (LC) call number normalization routine (that I had written for a shelf list application) into a couple of Perl LC call number sorting utilities. sortLC.pl is a standalone application. Usage is: sortLC.pl < call_number_file - or

listserv vs. Google Group

2005-03-23 Thread Doran, Michael D
I'm not sure that everybody who subscribes to this listserv is aware that perl4lib listserv postings end up in the perl.perl4lib Google Group. I know that I was a bit surprised to find that out. Although serving a similar purpose, I make a distinction between listservs and news groups. The main

RE: MARC::Record and UTF-8 & related threads

2005-03-07 Thread Doran, Michael D
Hi Ed, > How would people feel about the next version of MARC-Record (perhaps > a v2.0) which handled utf8 properly and required a modern perl? Definitely a *good* thing. Worth upgrading Perl version for, if necessary. > Perhaps if people could respond to the list (or me if you prefer) with >

dope.sh - a shell script for discovery of Oracle-Perl environment

2005-01-12 Thread Doran, Michael D
dope.sh is a shell script that facilitates discovery of the Oracle-Perl environment on a Unix (Solaris) system [1]. I distribute an open-source Perl application that incorporates a DBI/DBD::Oracle connection. The users that implement the application generally (but not always) have the requisite D

RE: Ignoring Diacritics accessing Fixed Field Data

2005-01-11 Thread Doran, Michael D
$ME =~ s/[\xE0-\xFE]//g; $TITLE =~ s/[\xE0-\xFE]//g; Sorry, -- Michael # Michael Doran, Systems Librarian # University of Texas at Arlington # 817-272-5326 office # 817-688-1926 cell # [EMAIL PROTECTED] # http://rocky.uta.edu/doran/ > -----Original Message- > From: Dor
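A small sketch of what the substitution above does, assuming the data is MARC-8 with the default basic and extended Latin sets, where combining diacritics occupy bytes 0xE0-0xFE and precede the base letter. The helper name and the sample byte strings are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: removes MARC-8 combining diacritic bytes (0xE0-0xFE)
# from a byte string, leaving the base letters in place.
sub strip_marc8_combining {
    my ($bytes) = @_;
    $bytes =~ s/[\xE0-\xFE]//g;
    return $bytes;
}

# Hypothetical sample values: 0xF2 and 0xE2 are combining bytes in MARC-8.
for my $name ("Pa\xF2t,Ka\xE2s", "Dave,Ayod") {
    print strip_marc8_combining($name), "\n";
}
# prints "Pat,Kas" then "Dave,Ayod"
```

This only makes sense on raw MARC-8 bytes; applied to Latin-1 or decoded Unicode text, the same range would delete ordinary accented letters instead.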

RE: Ignoring Diacritics accessing Fixed Field Data

2005-01-11 Thread Doran, Michael D
Hi Jane, These answers assume that the data you are processing: 1) is encoded in the MARC-8 character set, and 2) consists of the MARC-8 default basic and extended Latin characters. > Dave,Ayod\2003 > Paòt,Kaâs\2002 > Baks,Dasa\2003 > ,Viâs\2002 > > Problem 1: As you can see, I don't really want

RE: Documentation_about_'Unix_for_librarians'

2005-01-09 Thread Doran, Michael D
Hi Carlos, > I am writing you for the following: the next month I'll be giving a > training course called "UNIX for librarians". ... Sadly there's no > material available in Spanish about this topic. I'm guessing that you won't find much material (in any language) on the topic of "UNIX for libra

RE: MARC::Record and UTF-8

2005-01-07 Thread Doran, Michael D
> ...the ILS can be upgraded to a new version and > people can start using Unicode, not only for Western > European languages, but also for languages like Thai. This is not really apropos to the discussion at hand, but since Thai was mentioned I thought I would contribute my two cents on an i

RE: inserting diacrtics

2005-01-05 Thread Doran, Michael D
at.net/ABM.html > > > -Original Message- > > From: Ed Summers [mailto:[EMAIL PROTECTED] > > Sent: Wednesday, January 05, 2005 2:42 PM > > To: perl4lib@perl.org > > Subject: Re: inserting diacrtics > > > > On Wed, Jan 05, 2005 at 01:22:54

RE: inserting diacrtics

2005-01-05 Thread Doran, Michael D
>new( '710', '2', '', a => 'Biblioth'.$acute.'eque nationale de france.' ); -- Michael # Michael Doran, Systems Librarian # University of Texas at Arlington # 817-272-5326 office # 817-688-1926 cell # [EMAIL PROTECTED] # http://rocky.u

RE: inserting diacrtics

2005-01-05 Thread Doran, Michael D
nded Latin as either G0 or G1. -- Michael # Michael Doran, Systems Librarian # University of Texas at Arlington # 817-272-5326 office # 817-688-1926 cell # [EMAIL PROTECTED] # http://rocky.uta.edu/doran/ > -Original Message- > From: Doran, Michael D > Sent: Wednesday, Janua

RE: inserting diacrtics

2005-01-05 Thread Doran, Michael D
> You need to escape to ExtendedLatin, add the combining acute, escape back to > BasicLatin, and then put the 'e'. Or in code: Extended Latin (as G1) is part of the MARC-8 default character set and shouldn't require any escape sequences [1]. I think all Jackie needs to do is add the combining gr

RE: Character sets - kind of solved?

2004-12-06 Thread Doran, Michael D
tarball contains a patched XML.pm and SAX.pm. Replace > your current MARC/File/XML.pm and MARC/File/SAX.pm with those and you > should be good to go. I've also included the scripts I used to test > and one of my old MARC8 encoded records. http://redlightgreen.com > confirm

RE: Character sets - kind of solved?

2004-12-03 Thread Doran, Michael D
First off, Ashley's suggestion that the original encoding was likely MARC-8 is correct. The author's Arabic name, transliterated into the Latin alphabet, should be "Bis{latin small letter a with macron}{latin small letter t with dot below}{latin small letter i with macron}, Mu{latin small letter h