Re: [CODE4LIB] free source for issn-periodical-type data?

2012-04-17 Thread Michael Hopwood
Just a quick note: The correct URL for ONIX for Serials is http://www.editeur.org/17/ONIX-for-Serials/ - note that this is a family of standards, so it covers a very wide range of data types and content. The code lists Tom mentioned are available there in human-readable form. Also: it sounded

[CODE4LIB] Job: Archivist, Institute of Jazz Studies at Rutgers-Newark

2012-04-17 Thread jobs
RESPONSIBILITIES: The Rutgers University Libraries seek an experienced, innovative, and service-oriented librarian to fill the position of Archivist in the Institute of Jazz Studies, John Cotton Dana Library on the Newark Campus of Rutgers, The State University of New Jersey. Reporting to the

[CODE4LIB] Job: Senior Web Development and User Experience Technician, Discovery Systems at Queen's University

2012-04-17 Thread jobs
**Description and Duties:** Within the framework of established policies, regulations and procedures, in consultation with the Systems Coordinator, the Division Head of Discovery Systems and other Discovery Systems staff, the incumbent provides technical expertise and support for the

[CODE4LIB] MarcXML and char encodings

2012-04-17 Thread Jonathan Rochkind
I know how char encodings work in MARC ISO binary -- the encoding can legally be either Marc8 or UTF8 (nothing else). The encoding of a record is specified in its header. In the wild, specified encodings are frequently wrong, or data includes weird mixed encodings. Okay! But what's going on

Re: [CODE4LIB] MarcXML and char encodings

2012-04-17 Thread LeVan,Ralph
There are probably a couple of answers to that. XML rules define what character set is used. The encoding attribute on the <?xml?> header is where you find out what character set is being used. I've always gone under the assumption that if an encoding wasn't specified, then UTF-8 is in effect and
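The rule described in this message (read the encoding from the XML declaration, fall back to UTF-8 when none is given) can be sketched roughly as follows. This is only an illustration: the function name `declared_xml_encoding` is hypothetical, and the sketch assumes the document starts in an ASCII-compatible encoding, so it does not handle UTF-16 input.

```python
import re

# Matches the encoding pseudo-attribute inside an XML declaration.
_DECL = re.compile(
    rb'^<\?xml\s[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)

def declared_xml_encoding(data: bytes, default: str = "UTF-8") -> str:
    """Return the encoding named in the XML declaration, or the default.

    Hypothetical helper: only inspects the first bytes of the document
    and assumes they are ASCII-compatible.
    """
    # Skip a UTF-8 byte order mark if one is present.
    if data.startswith(b"\xef\xbb\xbf"):
        data = data[3:]
    match = _DECL.match(data)
    return match.group(1).decode("ascii") if match else default
```

As the rest of the thread points out, the declared value is only a claim made by whoever produced the file, so a caller may still want to check the bytes against it.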

Re: [CODE4LIB] MarcXML and char encodings

2012-04-17 Thread Kyle Banerjee
What's the legal thing to do? What's actually found 'in the wild' with MarcXML? In some cases, invalid XML. In an ideal world, the encoding should be included in the declaration. But I wouldn't trust it. kyle -- Kyle Banerjee

Re: [CODE4LIB] MarcXML and char encodings

2012-04-17 Thread Jonathan Rochkind
So what if the <?xml?> declaration says one charset encoding, but the MARC header included in the MarcXML says a different encoding... which one is the 'legal' one to believe? Is it legal to have MarcXML that is not UTF-8 _or_ Marc8, that is an entirely different charset that is legal in XML?

Re: [CODE4LIB] MarcXML and char encodings

2012-04-17 Thread Jonathan Rochkind
On 4/17/2012 1:57 PM, Kyle Banerjee wrote: In some cases, invalid XML. In an ideal world, the encoding should be included in the declaration. But I wouldn't trust it. kyle So would you use the Marc header payload instead? Or you're just saying you wouldn't trust _any_ encoding declarations

Re: [CODE4LIB] MarcXML and char encodings

2012-04-17 Thread Jonathan Rochkind
Okay, maybe here's another way to approach the question. If I want to have a MarcXML document encoded in Marc8 -- what should it look like? What should be in the XML declaration? What should be in the MARC header embedded in the XML? Or is it not in fact legal at all? If I want to have a

[CODE4LIB] Job: Director of Library Information Technology Production Services at University of Illinois at Urbana-Champaign

2012-04-17 Thread jobs
**Director of Library Information Technology Production Services** Academic Professional Position University of Illinois at Urbana-Champaign **Position Available**: This position is available July, 2012. This is a 100%-time, twelve-month appointment Academic Professional position.

Re: [CODE4LIB] MarcXML and char encodings

2012-04-17 Thread LeVan,Ralph
If I want to have a MarcXML document encoded in Marc8 -- what should it look like? What should be in the XML declaration? What should be in the MARC header embedded in the XML? Or is it not in fact legal at all? I'm going out on a limb here, but I don't think it is legal. There is no

[CODE4LIB] Code4Lib West Registration Form: July 30, 2012

2012-04-17 Thread Reese, Terry
The University of Oregon Libraries and Oregon State University Libraries invite you to code4lib west, Monday, July 30, 2012, at the UO Knight Library. There is no registration fee for this conference. Registration is limited to 50 participants. All participants are expected to deliver a

Re: [CODE4LIB] MarcXML and char encodings

2012-04-17 Thread Doran, Michael D
Hi Ralph, But, ignoring the encoding, the original MarcXML rules were the same as the MARC-21 rules for character repertoire and you were supposed to restrict yourself to characters that could be mapped back into MARC-8. I don't know if that rule is still in force, but everyone ignores it.

Re: [CODE4LIB] MarcXML and char encodings

2012-04-17 Thread Jonathan Rochkind
Thanks, this is helpful feedback at least. I think it's completely irrelevant, when determining what is legal under standards, to talk about what certain Java tools happen to do though, I don't care too much what some tool you happen to use does. In this case, I'm _writing_ the tools. I want

Re: [CODE4LIB] MarcXML and char encodings

2012-04-17 Thread Sheila M. Morrissey
Re: But do others agree that there is in fact no legal way to have Marc8 in MarcXML? No -- it is perfectly legal -- but you MUST declare the encoding to BE Marc8 in the XML prolog, and you will want to be aware that XML processors are only REQUIRED to process UTF-8 and UTF-16 -- in practice

Re: [CODE4LIB] MarcXML and char encodings

2012-04-17 Thread Houghton,Andrew
Jonathan Rochkind Sent: Tuesday, April 17, 2012 14:18 Subject: Re: [CODE4LIB] MarcXML and char encodings Okay, maybe here's another way to approach the question. If I want to have a MarcXML document encoded in Marc8 -- what should it look like? What should be in the XML declaration?

[CODE4LIB] Job: Web Developer at Michigan Technological University

2012-04-17 Thread jobs
Michigan Technological University's Van Pelt and Opie Library seeks an energetic, user-focused and collegial Web developer that enjoys working on a variety of projects with library and IT staff, faculty, and students that support library services, instruction and research. Michigan

Re: [CODE4LIB] MarcXML and char encodings

2012-04-17 Thread Kyle Banerjee
So would you use the Marc header payload instead? Or you're just saying you wouldn't trust _any_ encoding declarations you find anywhere? This. The short version is that too many vendors and systems just supply some value without making sure that's what they're spitting out. I haven't had
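One way to act on this advice is to test the bytes themselves instead of believing any label. The sketch below is an illustration, not something proposed in the thread, and the name `looks_like_utf8` is hypothetical. It checks only whether the data decodes cleanly as UTF-8.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8.

    Hypothetical helper: a True result is weak evidence only, because
    pure ASCII data passes the check whatever encoding was intended.
    """
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True
```

A False result is the more useful signal: data labelled as UTF-8 that fails this check is mislabelled or damaged.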

Re: [CODE4LIB] MarcXML and char encodings

2012-04-17 Thread Karen Coyle
The discussions at the MARC standards group relating to Unicode all had to do with using Unicode *within* ISO2709. I can't find any evidence that MARCXML ever went through the standards process. (This may not be a bad thing.) So none of what we know about the MARBI discussions and resulting

Re: [CODE4LIB] MarcXML and char encodings

2012-04-17 Thread Houghton,Andrew
Karen Coyle Sent: Tuesday, April 17, 2012 15:41 Subject: Re: [CODE4LIB] MarcXML and char encodings The discussions at the MARC standards group relating to Unicode all had to do with using Unicode *within* ISO2709. I can't find any evidence that MARCXML ever went through the standards

Re: [CODE4LIB] MarcXML and char encodings

2012-04-17 Thread Decasm
Let me make some recommendations. These are what I would consider best practices for interoperability. 1) Never put marc8 in xml. Just don't do it. No one expects it. Few will be willing to bother with it. 2) Always prefer utf8 for marcxml. You can use any standard charset if you need to,

Re: [CODE4LIB] MarcXML and char encodings

2012-04-17 Thread Jonathan Rochkind
On 4/17/2012 3:01 PM, Sheila M. Morrissey wrote: No -- it is perfectly legal -- but you MUST declare the encoding to BE Marc8 in the XML prolog, Wait, how can you declare a Marc8 encoding in an XML declaration/prolog/whatever it's called? The things that appear there need to be from a

Re: [CODE4LIB] MarcXML and char encodings

2012-04-17 Thread LeVan,Ralph
No -- it is perfectly legal -- but you MUST declare the encoding to BE Marc8 in the XML prolog, Wait, how can you declare a Marc8 encoding in an XML declaration/prolog/whatever it's called? Nope, you can't do that. There is no approved name for the MARC-8 encoding. As Andy said, the closest

Re: [CODE4LIB] MarcXML and char encodings

2012-04-17 Thread Sheila M. Morrissey
In XML standard: It is RECOMMENDED that character encodings registered (as charsets) with the Internet Assigned Numbers Authority [IANA-CHARSETS], other than those just listed, be referred to using their registered names; other encodings SHOULD use names starting with an x- prefix.

Re: [CODE4LIB] MarcXML and char encodings

2012-04-17 Thread Eric Lease Morgan
MARC-8. Cool in its time. Dumb now. Typical. --ELM

Re: [CODE4LIB] MarcXML and char encodings

2012-04-17 Thread Sheila M. Morrissey
I think this is a case of being in violent agreement -- see some earlier replies in this thread -- Pragmatically, if you are going to hew to marc-8 encoding transported in XML -- you are losing the usefulness of standard tools for xml -- smm -Original Message- From: Code for Libraries

[CODE4LIB] more on MARC char encoding: Now we're about ISO_2709 and MARC21

2012-04-17 Thread Jonathan Rochkind
Okay, forget XML for a moment, let's just look at marc 'binary'. First, for Anglophone-centric MARC21. The LC docs don't actually say quite what I thought about leader byte 09, used to advertise encoding: a - UCS/Unicode Character coding in the record makes use of characters from the
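For reference, reading the leader byte discussed in this message might look like the sketch below. The function name `leader_encoding` is hypothetical. The mapping assumes the MARC 21 convention that leader position 09 is `a` for UCS/Unicode and blank for MARC-8, and the sample leader used in the usage note is made up for illustration.

```python
def leader_encoding(record: bytes) -> str:
    """Map MARC 21 leader position 09 to an encoding label.

    Hypothetical helper: reports what the record advertises, which may
    not match the bytes actually found in its fields.
    """
    if len(record) < 24:
        raise ValueError("a MARC leader is 24 bytes long")
    flag = record[9:10]  # leader/09, character coding scheme
    if flag == b"a":
        return "UCS/Unicode"
    if flag == b" ":
        return "MARC-8"
    return "unknown"
```

Called on a record whose leader is `00714cam a2200205 a 4500`, this returns `UCS/Unicode`; with a blank in position 09 it returns `MARC-8`.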

Re: [CODE4LIB] more on MARC char encoding: Now we're about ISO_2709 and MARC21

2012-04-17 Thread Simon Spero
On Tue, Apr 17, 2012 at 7:55 PM, Jonathan Rochkind rochk...@jhu.edu wrote: Okay, forget XML for a moment, let's just look at marc 'binary'. First, for Anglophone-centric MARC21. Actually Anglo and Francophone centric. And the USMARC style 245 was a poor replacement for the UKMARC approach

Re: [CODE4LIB] more on MARC char encoding: Now we're about ISO_2709 and MARC21

2012-04-17 Thread Bill Dueber
On Tue, Apr 17, 2012 at 8:46 PM, Simon Spero sesunc...@gmail.com wrote: Actually Anglo and Francophone centric. And the USMARC style 245 was a poor replacement for the UKMARC approach (someone at the British Library hosted Linked Data meeting wondered why there were punctuation characters