Hi Devon,
> I just recently came across this presentation which lays out pretty much
> all the issues with Unicode in perl, and makes some recommendations for
> best practices.
While Nick Patch's presentation is excellent, I'm not sure that it "lays out
pretty much all the issues with Unicode in
"Brad Baxter" wrote:
> On Sat, Mar 17, 2012 at 5:25 PM, Doran, Michael D wrote:
>> It looks like the read pointer was going to the beginning of the file on
>> Solaris, but the end of the file on Linux. I've edited the script to do
>> separate opens for when I need t
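If the script opened the file in read/append mode (`+>>`), which is an assumption since the snippet doesn't show the open call, the position of the first *read* is not something to rely on across platforms, while writes always go to end-of-file. A minimal sketch of the two portable options: seek explicitly before reading, or use separate opens as Michael did. The temporary file stands in for the real data file.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET SEEK_END);
use File::Temp qw(tempfile);

# Stand-in for the real data file (hypothetical content)
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "first line\n";
close $tmp or die "close: $!";

# Option 1: read/append mode, but rewind explicitly before reading
open(my $fh, '+>>', $path) or die "Cannot open $path: $!";
seek($fh, 0, SEEK_SET) or die "Cannot seek: $!";
my @lines = <$fh>;
# Reposition before switching from reading to writing
seek($fh, 0, SEEK_END) or die "Cannot seek: $!";
print {$fh} "appended line\n";
close $fh or die "close: $!";

# Option 2: separate opens, one for reading and one for appending
open(my $in, '<', $path) or die "Cannot open $path: $!";
my @all = <$in>;
close $in;
```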
eni...@gmail.com [mailto:deni...@gmail.com] On Behalf Of Dan Scott
> Sent: Saturday, March 17, 2012 4:19 PM
> To: Doran, Michael D
> Cc: perl4lib
> Subject: Re: File open head scratcher
>
> On Sat, Mar 17, 2012 at 3:09 PM, Doran, Michael D wrote:
> > I am migrating a perl scrip
> -Original Message-
> From: Doran, Michael D [mailto:do...@uta.edu]
> Sent: Saturday, March 17, 2012 2:09 PM
> To: perl4lib
> Subject: File open head scratcher
>
> I am migrating a perl script from a server running perl v5.8.5 on Solaris
> 9 to a server
I am migrating a perl script from a server running perl v5.8.5 on Solaris 9 to
a server running perl v5.12.2 on Redhat Linux 5.5. The new environment doesn't
seem to like the syntax I'm using to open a file, and I'm scratching my head
over why that is the case.
The part that is not working a
Hi Mark,
Over the years, I've done a few projects that involved manipulation of, and/or
creating MARC holdings (MFHD) records using the Perl MARC::Record module. No
problems that I know of.
-- Michael
# Michael Doran, Systems Librarian
# University of Texas at Arlington
# 817-272-5326 office
I never got an answer to this back in 2008 and thought I might have better luck
now...
-- Michael
> -Original Message-
> From: Doran, Michael D
> Sent: Thursday, February 21, 2008 11:03 AM
> To: perl4lib@perl.org
> Subject: marcdump hex switch
>
> I have MARC::Re
Hi Al,
> For me I've found the best solution is to leave Encode.pm alone
> and redefine the offending subroutine within my processing script.
This was timely help for me, too, due to problems with fatal errors when
processing a large file of bibs with MARC::Record. Thanks!
(Although, when I ch
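Redefining the offending subroutine from the calling script, rather than editing the installed module, can be sketched as follows. The thread doesn't name the subroutine, so `Some::Module::risky` is a hypothetical stand-in (not Encode itself), and trapping the fatal error with `eval` is just one possible replacement body.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the module whose subroutine dies on bad data
package Some::Module;
sub risky {
    my ($text) = @_;
    die "Cannot handle NUL byte\n" if $text =~ /\x00/;
    return uc $text;
}

package main;

# Redefine the subroutine at run time; the module file stays untouched.
{
    no warnings 'redefine';   # silence "Subroutine redefined" in this block only
    my $original = \&Some::Module::risky;
    *Some::Module::risky = sub {
        my @args   = @_;
        # Call the original, but trap its fatal error so a long batch run continues
        my $result = eval { $original->(@args) };
        return defined $result ? $result : '';
    };
}
```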
Doran, Systems Librarian
# University of Texas at Arlington
# 817-272-5326 office
# 817-688-1926 mobile
# do...@uta.edu
# http://rocky.uta.edu/doran/
> -Original Message-
> From: Leif Andersson [mailto:leif.anders...@sub.su.se]
> Sent: Monday, January 10, 2011 8:35 AM
> To
> -Original Message-
> From: Leif Andersson [mailto:leif.anders...@sub.su.se]
> Sent: Friday, January 07, 2011 7:50 AM
> To: Doran, Michael D; perl4lib
> Subject: Re: MARC blob to MARC::Record object
>
> Hi Michael,
>
> -Original Message-
> From: Gorman, Jon [mailto:jtgor...@illinois.edu]
> Sent: Thursday, January 06, 2011 6:19 PM
> To: Doran, Michael D; perl4lib
> Subject: RE: MARC blob to MARC::Record object
>
>
>
> > How do I make the MARC blob into a MARC::Record obj
I am working on a Perl script that retrieves data from our Voyager ILS via an
SQL query. Among other data, I have MARC records in blob form, and the script
processes one MARC record at a time. I want to be able to parse and
modify/convert the MARC record (using MARC::Record) before writing/pri
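MARC::Record's constructor for a transmission-format blob is `MARC::Record->new_from_usmarc($blob)`. Before calling it, a cheap sanity check is to compare leader positions 00-04 (the declared record length) with the number of bytes actually retrieved. A minimal sketch; `blob_length_ok` is a hypothetical helper, and it assumes the blob is raw bytes rather than a decoded character string.

```perl
use strict;
use warnings;

# Compare leader/00-04 (record length in bytes) with the blob's real length.
sub blob_length_ok {
    my ($blob) = @_;
    return 0 unless defined $blob && length($blob) >= 24;   # the leader alone is 24 bytes
    my $declared = substr($blob, 0, 5);
    return 0 unless $declared =~ /^\d{5}$/;
    return $declared == length($blob) ? 1 : 0;
}

# With MARC::Record installed, a blob that passes can then be parsed:
#   my $record = MARC::Record->new_from_usmarc($blob);
```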
Hi Jane,
In a MARC-8 character set environment, I would assume that the key to detecting
non-Latin characters would be the presence of an escape sequence to indicate a
switch to an alternate character set (e.g. Arabic, Greek, Cyrillic, etc) [1].
Everything from that point on would be non-Latin
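Following that reasoning, a first-pass filter only needs to look for the ESC byte (0x1B). This is a heuristic sketch with hypothetical function names: MARC-8 also uses escape sequences for Greek symbols, subscripts and superscripts, so an ESC is a hint that a non-Latin script may follow, not proof.

```perl
use strict;
use warnings;

# Byte offsets of every ESC (0x1B) in a MARC-8 string
sub escape_offsets {
    my ($marc8) = @_;
    my @offsets;
    while ($marc8 =~ /\x1B/g) {
        push @offsets, $-[0];   # start offset of the current match
    }
    return @offsets;
}

# True if the string switches character sets at least once
sub has_alternate_charset { return escape_offsets($_[0]) ? 1 : 0 }
```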
Hi Emmanuel,
> I'm trying to convert an ISIS database to MARC21
What is the character set encoding of the data in the ISIS database?
What is the desired character set encoding for the MARC21 records, i.e., MARC-8
or MARC Unicode (UTF-8)?
If they are dissimilar character encodings, is the data un
Hi Chris,
> I'll try that version.
I sure hope you meant upgrading to Perl 5.8.2 (or higher) rather than
downgrading to MARC::Record 1.39_02. ;-)
This is just my unasked-for two cents, but I wouldn't stint on anything that
will make the processing of Unicode-encoded text easier. Last December
l the help. :-)
-- Michael
> -Original Message-
> From: Doran, Michael D [mailto:[EMAIL PROTECTED]
> Sent: Monda
> -Original Message-
> From: Leif Andersson [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, May 06, 2008 3:33 AM
> To: Doran, Michael D
> Subject: Re: Stripping out Unicode combining characters (diacritics)
>
> Oh, now I see your REAL
-Original Message-
From: Mike Rylander [mailto:[EMAIL PROTECTED]
Sent: Mon 5/5/2008 8:57 PM
To: Doran, Michael D
Cc: Perl4lib
Subject: Re: Importing Perl package variables into a Perl script with "require"
On Fri, Apr 25, 2008
Mon 5/5/2008 8:52 PM
To: Doran, Michael D
Cc: [EMAIL PROTECTED]; Perl4lib
Subject: Re: Stripping out Unicode combining characters (diacritics)
On Mon, May 5, 2008 at 8:26 PM, Doran, Michael D <[EMAIL PROTECTED]> wrote:
[snip]
>
> I'm pulling my hair out on this... so an
I'm trying to strip out combining diacritics from some form input using this
code:
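#!/usr/local/bin/perl
use strict;
use warnings;
use CGI;
use Encode qw(decode);
use Unicode::Normalize qw(NFD);
my $query = CGI->new;
# Assuming the form is served as UTF-8, decode the bytes so \p{M} sees characters
my $search_term = decode('UTF-8', scalar($query->param('text')) // '');
# Decompose precomposed letters (e.g. U+00E9) into base letter + combining mark
my $sans_diacritics = NFD($search_term);
# Remove the marks; "+" avoids a zero-length match at every position
$sans_diacritics =~ s/\p{M}+//g;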
> -Original Message-
> From: Leif Andersson [mailto:[EMAIL PROTECTED]
> Sent: Sunday, April 27, 2008 3:20 PM
> To: Doran, Michael D; Perl4lib
> Subject: Re: Im
Back-story:
I have a Perl CGI program. The CGI program needs to utilize variables in one
of several separate configuration files (packages). The different packages all
contain the same variables, but with different values for those variables.
Each package represents a different language for
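Picking one of several same-shaped packages at run time and reading its package variables can be sketched like this. The package names `Lang::en`/`Lang::es` and the variable `$greeting` are hypothetical; in the real CGI each package would live in its own file and be loaded with `require`. The whitelist matters because the package name ultimately comes from user input.

```perl
use strict;
use warnings;

# Hypothetical language packages (normally Lang/en.pm, Lang/es.pm, ...)
{ package Lang::en; our $greeting = 'Welcome'; }
{ package Lang::es; our $greeting = 'Bienvenido'; }

my %SUPPORTED = map { $_ => 1 } qw(en es);

sub lang_var {
    my ($lang, $name) = @_;
    # Never build a package name from unchecked form input
    die "Unsupported language: $lang\n" unless $SUPPORTED{$lang};
    my $pkg = "Lang::$lang";
    # For file-based packages:
    #   (my $file = "$pkg.pm") =~ s{::}{/}g;  require $file;
    no strict 'refs';            # symbolic lookup of $Lang::xx::name
    return ${"${pkg}::$name"};
}
```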
on MARC::Record.
http://search.cpan.org/src/MIKERY/MARC-Record-2.0.0/Changes
> -Original Message-
> From: Doran, Mi
> -Original Message-
> From: Leif Andersson [mailto:[EMAIL PROTECTED]
> Sent: Saturday, March 01, 2008 2:51 PM
> To: Doran, Michael D; perl4lib@perl.org; [EMAIL PROTECTED]
> Subject: Re: Help for utf-8 out
> -Original Message-
> From: Brian Sheppard [mailto:[EMAIL PROTECTED]
> Sent: Thursday, February 21, 2008 1:00 PM
> To: Doran, Michael D
> Cc: perl4lib@perl.org
> Subjec
Hi Jackie,
I'm working on a very similar problem... converting theses/dissertations
records (in XML) to MARC records. I'm still in the testing stage, but have had
similar problems with records with diacritics in the 100 or 245 fields (however
diacritics in a 520a field don't seem to cause any
I have MARC::Record 2.0 installed [1]. According to the Changes file marcdump
now has a "--hex" switch [2]:
[ENHANCEMENTS]
- Added --hex switch to marcdump, which dumps the record in
hexadecimal. The offsets are in decimal so that you can match
them up to values in the leader. The
Hi Henri,
> Is there a reason why MARC::File::XML considers only a very
> strict subset of utf-8 as valid ?
I would guess that it has to do with adhering to the MARC-21 repertoire of
characters, so as to facilitate the round-trip conversion between the MARC-8
and Unicode character sets [1,2].
Hi Laurence,
> I'm trying to create MARC records from serials data exported
> from SFX, using MARC::Charset version 0.98 to convert UTF-8
> strings to MARC-8. It seems to be failing on extended latin
> characters like U+00C5 CAPITAL LETTER A WITH RING ABOVE
The encoding, U+00C5 (CAPITAL LETTE
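MARC-8 has no precomposed A-with-ring; it is written as the combining ring above followed by the base letter, so a UTF-8 to MARC-8 converter must first decompose U+00C5. A minimal sketch of that step; `to_marc8_latin` and its two-entry table are hypothetical (not MARC::Charset's API) and cover only ASCII base letters plus acute (0xE2) and ring above (0xEA).

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

# Hypothetical excerpt of the Unicode -> MARC-8 (ANSEL) diacritic table
my %MARC8_DIACRITIC = (
    "\x{0301}" => "\xE2",   # combining acute accent
    "\x{030A}" => "\xEA",   # combining ring above
);

sub to_marc8_latin {
    my ($text) = @_;
    my $out = '';
    # NFD splits U+00C5 into "A" + U+030A; \X yields one base+marks cluster at a time
    for my $cluster (NFD($text) =~ /(\X)/g) {
        my ($base, @marks) = split //, $cluster;
        die sprintf("U+%04X is outside this sketch\n", ord $base) if ord($base) > 0x7F;
        for my $mark (@marks) {
            my $byte = $MARC8_DIACRITIC{$mark};
            die sprintf("No MARC-8 mapping for U+%04X\n", ord $mark) unless defined $byte;
            $out .= $byte;   # MARC-8 stores the diacritic BEFORE its base letter
        }
        $out .= $base;
    }
    return $out;
}
```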
> -Original Message-
> From: Doran, Michael D
> Sent: Friday, May 18, 2007 1:17 PM
> To: perl4lib@perl.org
> Subject: RE: MARC::Charset question
>
> Hi Michael,
>
> > An example is t
Hi Michael,
> An example is the author (personal name) of the book that can
> be found at http://catalog.loc.gov/ by searching for ISBN
> 5040039875 (I'm guessing the fact that the website appears to
> be displaying a corrupted name may be part of the problem here).
The Library of Congress cat
> > I can also see that this record is broken because the XML entity
> > &apos; is in a MARC communications format file.
>
> The character entity &apos; *is valid* in a MARC-XML file.
> It is one of the few standard character entities allowed in
> an XML file, e.g., &amp;, &lt;, &gt;, and &apos;.
A recent MARC Proposa
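An ISO 2709 (communications format) record is not XML, so nothing there interprets `&apos;`; escaping belongs at the boundary when data moves into or out of MARC-XML. A minimal sketch using only the five entities XML predefines; the function names are hypothetical.

```perl
use strict;
use warnings;

# The five entities XML predefines
my %ESCAPE   = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;',
                "'" => '&apos;', '"' => '&quot;');
my %UNESCAPE = reverse %ESCAPE;

sub xml_escape { my ($s) = @_; $s =~ s/([&<>'"])/$ESCAPE{$1}/g; return $s }

# Single pass, so "&amp;lt;" correctly becomes "&lt;" rather than "<"
sub xml_unescape {
    my ($s) = @_;
    $s =~ s/(&(?:amp|lt|gt|apos|quot);)/$UNESCAPE{$1}/g;
    return $s;
}
```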
> -Original Message-
> From: Ashley Sanders [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, March 14, 2007 10:52 AM
> To: Doran, Michael D
> Cc: perl4lib
> Subject
Hi Ashley,
> I think &#x3039; is now legal in MARC-8 now to indicate a
> Unicode character that isn't in the MARC-8 repertoire.
Yes, that's also my understanding [1,2], though I've not personally come across
any records yet that use that method. (Although not being a cataloger, I don't
routinely exa
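As I understand the MARC 21 lossless conversion technique, a character with no MARC-8 equivalent is written as a hexadecimal numeric character reference of the form `&#xXXXX;`. A sketch that treats everything outside ASCII as unmappable, a deliberately simplified stand-in for a real MARC-8 repertoire check; `to_ncr` is a hypothetical name.

```perl
use strict;
use warnings;

# Replace each "unmappable" character (here: anything non-ASCII) with &#xXXXX;
sub to_ncr {
    my ($text) = @_;
    $text =~ s/([^\x00-\x7F])/sprintf('&#x%04X;', ord $1)/ge;
    return $text;
}
```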
> -Original Message-
> From: Henri-Damien LAURENT [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, March 14, 2007 4:18 AM
> To: Doran, Michael D; perl4lib
> Subject: Re: MARC::Charset
>
> Doran, Michael D a écrit :
> > Hi Henri,
> >
> > Althou
Hi Henri,
> MARC::Charset ... fails on each µ character.
> ad Scripturµ sensum
Although in my email client, the character in question appears as a MICRO SIGN
("µ"), I am assuming that it is actually meant to be a LOWERCASE DIGRAPH AE
("æ") since that is consistent with the Latin vernacular tex
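One plausible explanation for that assumption: MARC-8 encodes the lowercase digraph ae as byte 0xB5, the same byte ISO 8859-1 uses for MICRO SIGN, so MARC-8 data displayed as Latin-1 shows "µ" where "æ" was meant. A small sketch showing both readings of the same bytes; the two-entry table is a hypothetical excerpt of the MARC-8 extended Latin set.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical two-entry excerpt of the MARC-8 extended Latin (ANSEL) table
my %ANSEL = ("\xA5" => "\x{00C6}",   # uppercase digraph AE
             "\xB5" => "\x{00E6}");  # lowercase digraph ae

my $marc8 = "ad Scriptur\xB5 sensum";   # raw MARC-8 bytes

# Misread as Latin-1, byte 0xB5 becomes U+00B5 MICRO SIGN ...
my $as_latin1 = decode('iso-8859-1', $marc8);
# ... but read as MARC-8 the same byte is the lowercase digraph ae
(my $as_marc8 = $marc8) =~ s/([\xA5\xB5])/$ANSEL{$1}/g;
```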
> So I took a look at that position in the marc record and
> found a 0x9C character at that position, as the error
> message indicates. I can't find a 0x9C in either of the
> mapping tables that this record purports to use:
0x9C is a C1 control character that is generally assigned the function
of
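To find stray bytes like that one in context, a quick scan for the C1 control range (0x80-0x9F) helps. A minimal sketch; `c1_offsets` is a hypothetical helper, and since MARC-8 itself defines a few codes in this range, hits are candidates for inspection rather than automatic errors.

```perl
use strict;
use warnings;

# List "offset:0xNN" for every byte in 0x80-0x9F (0-based offsets)
sub c1_offsets {
    my ($bytes) = @_;
    my @hits;
    while ($bytes =~ /([\x80-\x9F])/g) {
        push @hits, sprintf('%d:0x%02X', $-[1], ord $1);
    }
    return @hits;
}
```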
Hi Jane,
If you don't get an answer on this list, you might consider a posting to
the Net-z3950 list [1].
-- Michael
[1] Net-z3950 mailing list
[EMAIL PROTECTED]
http://www.indexdata.dk/mailman/listinfo/net-z3950
ECTED] On
> Behalf Of Ed Summers
> Sent: Monday, December 05, 2005 12:14 PM
> To: perl4lib@perl.org
> Subject: Re: MARC-8 to UTF-8 conversion
>
> On 12/5/05, Doran, Michael D <[EMAIL PROTECTED]> wrote:
> > So... this is all very interesting (and I've definitely lea
Ed,
ED > I don't really understand why Perl 5.8.7 lacked DB_File since
ED > Module::CoreList [...] reports it being standard since 5.00307. Perhaps
ED > this is some sort of emasculated version that ships with Solaris :-)
Nope, I wasn't using a "Perl lite" version. ;-)
Although Solaris now comes
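"Standard" in Module::CoreList means the module ships with the perl source, but my understanding is that DB_File is only built when the Berkeley DB library is available at the time perl itself is compiled. A sketch that checks both facts on a given machine:

```perl
use strict;
use warnings;
use Module::CoreList;

# Which perl release first bundled DB_File, according to Module::CoreList?
my $first = Module::CoreList->first_release('DB_File');

# Bundled is not the same as built: test whether it actually loads here
my $loads = eval { require DB_File; 1 } ? 1 : 0;

printf "DB_File first released with perl %s; loadable here: %s\n",
    (defined $first ? $first : 'unknown'), ($loads ? 'yes' : 'no');
```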
Hi Ed,
> -Original Message-
> From: Edward Summers [mailto:[EMAIL PROTECTED]
> Sent: Monday, December 05, 2005 6:14 AM
> To: perl4lib
> Subject: Re: MARC-8 to UTF-8 conversion
>
> On Dec 2, 2005, at 9:01 AM, Doran, Michael D wrote:
>
> > Installing the MAR
Hi Stefano,
Installing the MARC::Charset module can be a bit problematic for the
casual Perl user, due to the prerequisites. However, if you need to do a
MARC-8 to UTF-8 conversion, that's probably the best tool available.
The issue with MARC-8 conversions is that MARC-8 is only really used for
e
Hi Jason,
I believe that MARC::Charset only does MARC-8 to UTF-8 conversion and vice
versa, so it won't be a solution for automating your Latin-1 to MARC-8 conversion,
unless you were planning to do Latin-1=>UTF-8=>MARC-8.
A few years ago, I wrote an imperfect MARC-8 to Latin-1 character set
co
> I have some Excel files by big5 charset.
> I fetch some column from it and save to usmarc file by using
> MARC::Record, but I get nothing.
What is "nothing"? Does that mean no MARC records were created? Or
that they are empty? Or couldn't be imported and/or read in an
integrated library sy
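A common cause of "nothing" with Big5 source data is passing the raw Big5 bytes straight into a MARC record. One way to normalize them is to decode from Big5 and re-encode as UTF-8, then mark the record as Unicode (leader position 09 = 'a' in MARC 21). A minimal sketch, assuming the cell values have already been read out of the Excel file as raw Big5 bytes; `big5_to_utf8` is a hypothetical helper.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Convert one Big5 cell value (raw bytes) to UTF-8 bytes for a MARC field
sub big5_to_utf8 {
    my ($bytes) = @_;
    # FB_CROAK: die on malformed Big5 instead of silently substituting
    # LEAVE_SRC: don't let decode() consume the caller's string
    my $chars = decode('big5', $bytes, Encode::FB_CROAK() | Encode::LEAVE_SRC());
    return encode('UTF-8', $chars);
}
```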
Hi Kindra,
> I'm attaching a new books script that my predecessor created.
I didn't see an attached script, but I'll take a shot anyway...
> I'm trying to get this to run, but I'm coming up with errors.
> I'm almost sure it is because of the upgrade to Unicode (we're
> with Endeavor)
Typicall
red/perl/5.8.5-09/.cpan/build/DBD-Oracle-1.15/blib/lib/
> DBD/Oracle.pm
>
> Searching for Oracle versions on this system...
> /oracle/app/oracle/product has these versions...
> 9.2.0 9.2.0.3
>
>
> Does this mean that I have to chan
Hi Kindra,
Although you are only aware of the one Perl installation on your server,
there are probably at least two others. A new Sun server from Endeavor
will have Solaris 9, which comes with Perl 5.005 and Perl 5.6. In
addition, as part of the Voyager software installation, you will have
Perl
> -Original Message-
> From: Bryan Baldus [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, March 29, 2005 5:22 PM
> To: Doran, Michael D; perl4lib@perl.org
> Subject: RE: LC call number sorting utilities
>
> On Sunday, March 27, 2005 7:09 PM, Michael Doran wrote:
> >I rec
I recently converted a Library of Congress (LC) call number
normalization routine (that I had written for a shelf list application)
into a couple of Perl LC call number sorting utilities.
sortLC.pl is a standalone application. Usage is:
sortLC.pl < call_number_file
- or
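The core idea of LC call number normalization for sorting is to build a key that plain string comparison puts in shelf order. The sketch below is a hypothetical, simplified illustration of that idea, not the routine inside sortLC.pl. It pads class letters so "Q" sorts before "QA", zero-pads the class number so 9 sorts before 76, and compares the decimal and cutter as decimal fractions.

```perl
use strict;
use warnings;

sub lc_sort_key {
    my ($call_number) = @_;
    my ($class, $number, $decimal, $rest) =
        uc($call_number) =~ /^\s*([A-Z]{1,3})\s*(\d+)(?:\.(\d+))?\s*(.*)$/
        or return uc $call_number;           # not parseable: fall back
    $decimal = '' unless defined $decimal;
    $rest =~ s/^\.//;                        # drop the period before the cutter
    # e.g. QA76.73 -> "QA 0007673     "
    return sprintf('%-3s%05d%-6s %s', $class, $number, $decimal, $rest);
}

sub sort_lc { return sort { lc_sort_key($a) cmp lc_sort_key($b) } @_ }
```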
I'm not sure that everybody who subscribes to this listserv is aware
that perl4lib listserv postings end up in the perl.perl4lib Google
Group. I know that I was a bit surprised to find that out.
Although serving a similar purpose, I make a distinction between
listservs and news groups. The main
Hi Ed,
> How would people feel about the next version of MARC-Record (perhaps
> a v2.0) which handled utf8 properly and required a modern perl?
Definitely a *good* thing. Worth upgrading Perl version for, if
necessary.
> Perhaps if people could respond to the list (or me if you prefer) with
>
dope.sh is a shell script that facilitates discovery of the Oracle-Perl
environment on a Unix (Solaris) system [1]. I distribute an open-source
Perl application that incorporates a DBI/DBD::Oracle connection. The
users that implement the application generally (but not always) have the
requisite D
$ME =~ s/[\xE0-\xFE]//g;
$TITLE =~ s/[\xE0-\xFE]//g;
Sorry,
-- Michael
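Those substitutions work because MARC-8 combining diacritics occupy bytes 0xE0-0xFE and are stored before the letter they modify, so deleting them leaves the unaccented base letters. A reusable sketch (the function name is hypothetical); note that it is only safe on MARC-8 data, since in UTF-8 bytes from 0xE0 up are lead bytes of multi-byte sequences.

```perl
use strict;
use warnings;

# Delete MARC-8 combining diacritics (0xE0-0xFE); MARC-8 input only!
sub strip_marc8_diacritics {
    my ($marc8) = @_;
    (my $plain = $marc8) =~ tr/\xE0-\xFE//d;
    return $plain;
}
```

For example, "Biblioth\xE1eque" (0xE1 being the MARC-8 combining grave) comes back as "Bibliotheque".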
> -----Original Message-
> From: Dor
Hi Jane,
These answers assume that the data you are processing:
1) is encoded in the MARC-8 character set, and
2) consists of the MARC-8 default basic and extended Latin characters.
> Dave,Ayod\2003
> Paòt,Kaâs\2002
> Baks,Dasa\2003
> ,Viâs\2002
>
> Problem 1: As you can see, I don't really want
Hi Carlos,
> I am writing you for the following: the next month I'll be giving a
> training course called "UNIX for librarians". ... Sadly there's no
> material available in Spanish about this topic.
I'm guessing that you won't find much material (in any language) on the topic
of "UNIX for libra
> ...the ILS can be upgraded to a new version and
> people can start using Unicode, not only for Western
> European languages, but also for languages like Thai.
This is not really apropos to the discussion at hand, but since Thai was
mentioned I thought I would contribute my two cents on an i
at.net/ABM.html
>
> > -Original Message-
> > From: Ed Summers [mailto:[EMAIL PROTECTED]
> > Sent: Wednesday, January 05, 2005 2:42 PM
> > To: perl4lib@perl.org
> > Subject: Re: inserting diacrtics
> >
> > On Wed, Jan 05, 2005 at 01:22:54
t;new( '710', '2', '',
a => 'Biblioth'.$acute.'eque nationale de france.' );
-- Michael
nded Latin as either G0 or G1.
-- Michael
> -Original Message-
> From: Doran, Michael D
> Sent: Wednesday, Janua
> You need to escape to ExtendedLatin, add the combining acute, escape back to
> BasicLatin, and then put the 'e'. Or in code:
Extended Latin (as G1) is part of the MARC-8 default character set and
shouldn't require any escape sequences [1]. I think all Jackie needs to
do is add the combining gr
tarball contains a patched XML.pm and SAX.pm. Replace
> your current MARC/File/XML.pm and MARC/File/SAX.pm with those and you
> should be good to go. I've also included the scripts I used to test
> and one of my old MARC8 encoded records. http://redlightgreen.com
> confirm
First off, Ashley's suggestion that the original encoding was likely
MARC-8 is correct. The author's Arabic name, transliterated into the
Latin alphabet, should be "Bis{latin small letter a with macron}{latin
small letter t with dot below}{latin small letter i with macron},
Mu{latin small letter h