Re: [whatwg] on bibtex-in-html5

2009-05-21 Thread Henri Sivonen

On May 20, 2009, at 19:24, Bruce D'Arcus wrote:


Re: the recent microdata work and the subsequent effort to include
BibTeX in the spec, I summarized my argument against this on my blog:

http://community.muohio.edu/blogs/darcusb/archives/2009/05/20/on-the-inclusion-of-bibtex-in-html5 


Quoting from the blog post:
On the last use case, he has chosen BibTeX, on the basis that it is  
widely used and simple to author and process.



Those are good criteria.
	• BibTeX is designed for the sciences, that typically only cite  
secondary academic literature. It is thus inadequate for, nor widely  
used, in many fields outside of the sciences: the humanities and law  
being quite obvious examples. For this reason, BibTeX cannot by  
default adequately represent even the use cases Ian has identified.  
For example, there are many citations on Wikipedia that can only be  
represented using effectively useless types such as “misc” and which  
require new properties to be invented.


This doesn't mean that BibTeX is a bad basis. The set of types and  
fields is limited, though.


Since renderings of bibliography don't show the type of the reference  
usually, having to use 'misc' for almost everything isn't a practical  
problem although it is aesthetically displeasing.


The set of fields is more of an issue, but it can be fixed by  
inventing more fields--it doesn't mean the whole base solution needs  
to be discarded. Fortunately, having custom fields in .bib doesn't  
break existing pre-Web, pre-ISBN bibliography styles. I've used at  
least these custom fields:


key: Show this citation pseudo-id in rendering instead of the actual  
id used for matching.

url: The absolute URL of a resource that is on the Web.
refdate: The date when the author made the reference to an ephemeral  
source such as a Web page.

isbn: The ISBN of a publication.
stdnumber: RFC or ISO number. e.g. RFC 2397 or ISO/IEC 10646:2003(E)

Particularly the 'url' and 'isbn' field names should be obvious and  
uncontroversial additions.
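
A minimal sketch of what such an entry could look like when generated programmatically. The helper name `to_bibtex` and the sample values are illustrative assumptions, not part of any BibTeX tool; the point is only that the extra fields ride along as ordinary name/value pairs, which styles that don't know them are expected to skip.

```python
# Hypothetical helper: serialize a dict as a BibTeX entry, custom fields included.
def to_bibtex(entry_type, cite_key, fields):
    lines = [f"@{entry_type}{{{cite_key},"]
    for name, value in fields.items():
        lines.append(f"  {name} = {{{value}}},")
    lines.append("}")
    return "\n".join(lines)

# Sample values are for illustration only.
entry = to_bibtex("misc", "rfc2397", {
    "author": "Larry Masinter",
    "title": "The data URL scheme",
    "year": "1998",
    "key": "RFC 2397",          # pseudo-id shown in the rendering
    "stdnumber": "RFC 2397",    # custom field
    "url": "http://www.ietf.org/rfc/rfc2397.txt",  # custom field
    "refdate": "2009-05-21",    # custom field: when the reference was made
})
print(entry)
```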


	• Related, BibTeX cannot represent much of the data in widely used  
bibliographic applications such as Endnote, RefWorks and Zotero  
except in very general ways.


Do you have an example? (I've never used the other formats.)

	• The BibTeX extensibility model puts a rather large burden on  
inventing new properties to accommodate data not in the core model.  
For example, the core model has no way to represent a DOI identifier  
(this is no surprise, as BibTeX was created before DOIs existed). As  
a  consequence, people have gradually added this to their BibTeX  
records and styles in a more ad hoc way. This ad hoc approach to  
extensibility has one of two consequences: either the vocabulary  
terms are understood as completely uncontrolled strings, or one  
needs to standardize them. If we assume the first case, we introduce  
potential interoperability problems.


In practice, those problems have already been introduced. For some  
reason I don't understand, there's an existing pattern of calling a  
field 'doi' but putting an absolute URI in the value. (As opposed to  
using a field name 'url' or a value that contains only the DOI- 
significant part.)
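
A consumer that wants to cope with both habits could normalize the value before using it. This is a rough sketch; the function name `bare_doi` and the list of prefixes it strips are assumptions made for the example, not an exhaustive or standardized set.

```python
import re

# Assumed prefixes seen in 'doi' field values; illustrative, not exhaustive.
_DOI_PREFIX = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:)", re.IGNORECASE)

def bare_doi(value):
    """Return only the DOI-significant part of a 'doi' field value."""
    return _DOI_PREFIX.sub("", value.strip())

print(bare_doi("http://dx.doi.org/10.1080/1356257042000309652"))
print(bare_doi("10.1080/1356257042000309652"))
```

Both calls print the same bare DOI, which is what makes the two spellings comparable again.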


If we assume the second, we have an organizational and process  
problem: that the WHATWG and/or the W3C—neither of which have  
expertise in this domain—become the gate-keepers for such  
extensions. In either case, we have a rather brittle and  
anachronistic approach to extension.


Problems of this nature haven't stopped the WHATWG in the past. :-)

	• The BibTeX model conflicts with Dublin Core and with vCard, both  
of which are quite sensibly used elsewhere in the microdata spec to  
encode information related to the document proper. There seems  
little justification in having two different ways to represent a  
document depending on whether it is THIS document or THAT
document.


When you are referring to THAT document, you generally want the names
of the authors--not their full business cards. Therefore, vCard is
overkill, and conversion to .bib is more useful than conversion to
vCard for this use case.



My suggestion instead?
	• reuse Dublin Core and vCard for the generic data: titles,  
creators/contributors, publisher, dates, part/version relations,  
etc.,  and only add those properties (volume, issue, pages, editors,  
etc.) that they omit


This would make conversion to and from the dominant bibliography  
format (.bib) more complex. Furthermore, there's a risk of a GIGO  
effect where the conversion can't be done algorithmically. (IIRC, you  
can't algorithmically map a .bib author name to the vCard name  
structure without a huge dictionary of names.)
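
To make the concern concrete, here is a naive heuristic of the kind a converter might be tempted to use. The function name `naive_vcard_n` is made up for this sketch, and it only guesses the family and given parts of the vCard name structure.

```python
def naive_vcard_n(bib_name):
    """Guess (family, given) from one .bib-style author name.

    Heuristic only: 'Family, Given' is split at the comma; otherwise the
    last whitespace-separated word is assumed to be the family name.
    """
    if "," in bib_name:
        family, given = (part.strip() for part in bib_name.split(",", 1))
    else:
        given, _, family = bib_name.strip().rpartition(" ")
    return family, given

print(naive_vcard_n("Sivonen, Henri"))   # comma form: unambiguous
print(naive_vcard_n("Henri Sivonen"))    # the guess happens to be right
print(naive_vcard_n("Mao Zedong"))       # the guess is wrong: family name comes first
```

The last call returns "Zedong" as the family name, which is the garbage-in, garbage-out case: nothing in the string itself says which word is which.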


	• typing should NOT be handled by a bibtex-type property, but the same
way everything else is typed in the microdata proposal: a global  
identifier


Why is typing even needed except for separating articles from  
compilations?


	• make it possible for people to 

Re: [whatwg] Exposing known data types in a reusable way

2009-05-21 Thread Eduard Pascual
Interesting.
Despite my PoV against the microdata proposal, I've taken a look at it
and found a minor typo:

Within 5.4.1 vCard, at the end of the n property description, the
spec reads:
"The value of the fn property a name in one of the following forms:"
Shouldn't it read:
"The value of the fn property is a name in one of the following forms:"?

Maybe this will grant me a seat for posterity on the acknowledgements
section =P.

On Wed, May 20, 2009 at 1:07 AM, Ian Hickson i...@hixie.ch wrote:

 Some of the use cases I collected from the e-mails sent in over the past
 few months were the following:

   USE CASE: Exposing contact details so that users can add people to their
   address books or social networking sites.

   SCENARIOS:
     * Instead of giving a colleague a business card, someone gives their
       colleague a URL, and that colleague's user agent extracts basic
       profile information such as the person's name along with references to
       other people that person knows and adds the information into an
       address book.
     * A scholar and teacher wants other scholars (and potentially students)
       to be able to easily extract information about who he is to add it to
       their contact databases.
     * Fred copies the names of one of his Facebook friends and pastes it
       into his OS address book; the contact information is imported
       automatically.
     * Fred copies the names of one of his Facebook friends and pastes it
       into his Webmail's address book feature; the contact information is
       imported automatically.
     * David can use the data in a web page to generate a custom browser UI
       for including a person in our address book without using brittle
       screen-scraping.

   REQUIREMENTS:
     * A user joining a new social network should be able to identify himself
       to the new social network in a way that enables the new social network
       to bootstrap his account from existing published data (e.g. from
       another social network) rather than having to re-enter it, without the
       new site having to coordinate (or know about) the pre-existing site,
       without the user having to give either site's credentials to the other,
       and without the new site finding out about relationships that the user
       has intentionally kept secret.
       (http://w2spconf.com/2008/papers/s3p2.pdf)
     * Data should not need to be duplicated between machine-readable and
       human-readable forms (i.e. the human-readable form should be
       machine-readable).
     * Shouldn't require the consumer to write XSLT or server-side code to
       read the contact information.
     * Machine-readable contact information shouldn't be on a separate page
       than human-readable contact information.
     * The information should be convertible into a dedicated form (RDF,
       JSON, XML, vCard) in a consistent manner, so that tools that use this
       information separate from the pages on which it is found have a
       standard way of conveying the information.
     * Should be possible for different parts of a contact to be given in
       different parts of the page. For example, a page with contact details
       for people in columns (with each row giving the name, telephone
       number, etc) should still have unambiguous grouped contact details
       parseable from it.
     * Parsing rules should be unambiguous.
     * Should not require changes to HTML5 parsing rules.


   USE CASE: Exposing calendar events so that users can add those events to
   their calendaring systems.

   SCENARIOS:
     * A user visits the Avenue Q site and wants to make a note of when
       tickets go on sale for the tour's stop in his home town. The site says
       October 3rd, so the user clicks this and selects "add to calendar",
       which causes an entry to be added to his calendar.
     * A student is making a timeline of important events in Apple's history.
       As he reads Wikipedia entries on the topic, he clicks on dates and
       selects "add to timeline", which causes an entry to be added to his
       timeline.
     * TV guide listings - browsers should be able to expose to the user's
       tools (e.g. calendar, DVR, TV tuner) the times that a TV show is on.
     * Paul sometimes gives talks on various topics, and announces them on
       his blog. He would like to mark up these announcements with proper
       scheduling information, so that his readers' software can
       automatically obtain the scheduling information and add it to their
       calendar. Importantly, some of the rendered data might be more
       informal than the machine-readable data required to produce a calendar
       event.
     * David can use the data in a web page to generate a custom browser UI
       for adding an event to our calendaring software without using brittle
       screen-scraping.
     * http://livebrum.co.uk/: the author would like people to be able to
    

Re: [whatwg] on bibtex-in-html5

2009-05-21 Thread Bruce D'Arcus
Hi Henri,

On Thu, May 21, 2009 at 4:00 AM, Henri Sivonen hsivo...@iki.fi wrote:
 On May 20, 2009, at 19:24, Bruce D'Arcus wrote:

 Re: the recent microdata work and the subsequent effort to include
 BibTeX in the spec, I summarized my argument against this on my blog:


 http://community.muohio.edu/blogs/darcusb/archives/2009/05/20/on-the-inclusion-of-bibtex-in-html5

 Quoting from the blog post:

 On the last use case, he has chosen BibTeX, on the basis that it is widely
 used and simple to author and process.

 Those are good criteria.

Except the assumption that BibTeX is widely used is overdrawn once you
get out of the technology and sciences sectors.

        • BibTeX is designed for the sciences, that typically only cite
 secondary academic literature. It is thus inadequate for, nor widely used,
 in many fields outside of the sciences: the humanities and law being quite
 obvious examples. For this reason, BibTeX cannot by default adequately
 represent even the use cases Ian has identified. For example, there are many
 citations on Wikipedia that can only be represented using effectively
 useless types such as “misc” and which require new properties to be
 invented.

 This doesn't mean that BibTeX is a bad basis. The set of types and fields is
 limited, though.

It's limited, and it's flat.

 Since renderings of bibliography don't show the type of the reference
 usually, having to use 'misc' for almost everything isn't a practical
 problem although it is aesthetically displeasing.

But this is not the point of adding structured data to HTML; it's to
allow it to be extracted, and subsequently processed, as data.

Citation and bibliographic formatting conventions do include
information that suggests type; it's not that it requires a human
reader to decipher. Surely that should not limit how we address this
going forward?

 The set of fields is more of an issue, but it can be fixed by inventing more
 fields--it doesn't mean the whole base solution needs to be discarded.
 Fortunately, having custom fields in .bib doesn't break existing pre-Web,
 pre-ISBN bibliography styles. I've used at least these custom fields:

 key: Show this citation pseudo-id in rendering instead of the actual id used
 for matching.
 url: The absolute URL of a resource that is on the Web.
 refdate: The date when the author made the reference to an ephemeral source
 such as a Web page.
 isbn: The ISBN of a publication.
 stdnumber: RFC or ISO number. e.g. RFC 2397 or ISO/IEC 10646:2003(E)

 Particularly the 'url' and 'isbn' field names should be obvious and
 uncontroversial additions.

Trust me: this is not nearly as simple as you think. More below ...

        • Related, BibTeX cannot represent much of the data in widely used
 bibliographic applications such as Endnote, RefWorks and Zotero except in
 very general ways.

 Do you have an example? (I've never used the other formats.)

Here's the in-progress mapping of Zotero's types to RDF (BIBO, and a
few others; PO from the BBC, and SIOC):

https://www.zotero.org/trac/wiki/BiboMapping

Here's some info on Microsoft's bib format for OOXML, that will give
you some info:

http://community.muohio.edu/blogs/darcusb/archives/2006/09/05/open-xml-draft-14

Here's the type schema for CSL (though it needs work, and we
de-emphasize this for formatting in any case; CSL is oriented towards
output formatting only really):

http://xbiblio.svn.sourceforge.net/viewvc/xbiblio/csl/schema/branches/split/csl-types.rnc?view=markup

Here's the variable list:

http://xbiblio.svn.sourceforge.net/viewvc/xbiblio/csl/schema/branches/split/csl-variables.rnc?revision=941&view=markup

        • The BibTeX extensibility model puts a rather large burden on
 inventing new properties to accommodate data not in the core model. For
 example, the core model has no way to represent a DOI identifier (this is no
 surprise, as BibTeX was created before DOIs existed). As a  consequence,
 people have gradually added this to their BibTeX records and styles in a
 more ad hoc way. This ad hoc approach to extensibility has one of two
 consequences: either the vocabulary terms are understood as completely
 uncontrolled strings, or one needs to standardize them. If we assume the
 first case, we introduce potential interoperability problems.

 In practice, those problems have already been introduced. For some reason I
 don't understand, there's an existing pattern of calling a field 'doi' but
 putting an absolute URI in the value. (As opposed to using a field name
 'url' or a value that contains only the DOI-significant part.)

The point is, when you get beyond dealing with secondary literature
(the domain of BibTeX and the sciences), the range of possible data
expands significantly. Things can get really complicated.

Consider what's actually pretty simple comparatively:

An English translation of a classic work. You often need original
publication information such as title (in the original language),
publisher and issued date, etc.
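
A rough sketch of why this case strains a flat model. The nested record layout, the `flatten` helper and the sample values below are assumptions made for illustration (the translation's date in particular is a placeholder); the sketch only shows that every piece of original-publication data ends up needing its own newly minted field name once the structure is flattened.

```python
# Hypothetical nested record for a translated work; values are illustrative.
translation = {
    "title": "The Prince",
    "issued": "1961",  # placeholder date for the translation
    "original": {"title": "Il Principe", "issued": "1532"},
}

def flatten(record, prefix=""):
    """Flatten nested dicts by inventing prefixed field names."""
    flat = {}
    for name, value in record.items():
        if isinstance(value, dict):
            flat.update(flatten(value, prefix + name + "-"))
        else:
            flat[prefix + name] = value
    return flat

for field, value in flatten(translation).items():
    print(field, "=", value)
```

The flat result needs `original-title` and `original-issued`, and would need another invented name for each further detail (original publisher, original language, and so on).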


Re: [whatwg] on bibtex-in-html5

2009-05-21 Thread Bruce D'Arcus
Oops; two quick things ...

On Thu, May 21, 2009 at 8:02 AM, Bruce D'Arcus bdar...@gmail.com wrote:


 Citation and bibliographic formatting conventions do include
 information that suggests type; it's not that it requires a human
 reader to decipher.

I meant it's JUST that ...



 Here's the in-progress mapping of Zotero's types to RDF (BIBO, and a
 few others; PO from the BBC, and SIOC):

 https://www.zotero.org/trac/wiki/BiboMapping

FWIW, the Zotero types here refer to what's in their UI ATM. They
will, however, be moving to a more flexible and relational UI model
here that more closely reflects the BIBO model. Reason? Users were
asking for things not easily accommodated in the current, flat,
approach (example: a review might be published in a newspaper or a
journal, or broadcast on the radio or on a podcast).

Bruce


Re: [whatwg] on bibtex-in-html5

2009-05-21 Thread Henri Sivonen

On May 21, 2009, at 15:02, Bruce D'Arcus wrote:


Except the assumption that BibTeX is widely used is overdrawn once you
get out of the technology and sciences sectors.


OK.

This doesn't mean that BibTeX is a bad basis. The set of types and
fields is limited, though.


It's limited, and it's flat.


In order to not get completely ignored in the technology and sciences  
sectors, a bibliography microdata format needs to be able to plug into  
the network effects of BibTeX. Having a non-flat microdata format  
while BibTeX remains flat would seriously hinder conversions from  
microdata to BibTeX.


How are non-flat bibliographies (beyond an article being in a book /  
journal / Web site) presented?



Since renderings of bibliography don't show the type of the reference
usually, having to use 'misc' for almost everything isn't a practical
problem although it is aesthetically displeasing.


But this is not the point of adding structured data to HTML; it's to
allow it to be extracted, and subsequently processed, as data.


More to the point, allow it to be extracted and used as bibliography
source data for another publication to avoid repetitive data entry.



Citation and bibliographic formatting conventions do include
information that suggests type; it's not that it requires a human
reader to decipher.


OK. Where the styles I've observed make a difference that isn't
traceable to which fields an item has, the distinction has mainly been
between atomic publications and compilations.


   • Related, BibTeX cannot represent much of the data in widely used
bibliographic applications such as Endnote, RefWorks and Zotero except in
very general ways.


Do you have an example? (I've never used the other formats.)


Here's the in-progress mapping of Zotero's types to RDF (BIBO, and a
few others; PO from the BBC, and SIOC):

https://www.zotero.org/trac/wiki/BiboMapping


On the surface, it seems that it would be possible to mint more field
types and publications for BibTeX to support those cases, but what is
the publication type information used for? Are there as many different  
entry presentations as there are entry types? Or are the type tokens  
supposed to be mapped to localized human-readable label strings?


Also, the non-flatness I see is an item being part of a compilation  
which is already supported by BibTeX without allowing the whole model  
to generalize into a graph.



Here's some info on Microsoft's bib format for OOXML, that will give
you some info:

http://community.muohio.edu/blogs/darcusb/archives/2006/09/05/open-xml-draft-14 



It seems relatively straightforward technically to extend BibTeX with
the field types from OOXML that BibTeX doesn't cover. The main issue  
seems to be the bikeshed of what names to use.



Here's the type schema for CSL (though it needs work, and we
de-emphasize this for formatting in any case; CSL is oriented towards
output formatting only really):

http://xbiblio.svn.sourceforge.net/viewvc/xbiblio/csl/schema/branches/split/csl-types.rnc?view=markup 



Here's the variable list:

http://xbiblio.svn.sourceforge.net/viewvc/xbiblio/csl/schema/branches/split/csl-variables.rnc?revision=941&view=markup



I don't see a fundamental reason why the BibTeX vocabulary couldn't be  
extended with stuff from there.


   • The BibTeX extensibility model puts a rather large burden on
inventing new properties to accommodate data not in the core model. For
example, the core model has no way to represent a DOI identifier (this is no
surprise, as BibTeX was created before DOIs existed). As a consequence,
people have gradually added this to their BibTeX records and styles in a
more ad hoc way. This ad hoc approach to extensibility has one of two
consequences: either the vocabulary terms are understood as completely
uncontrolled strings, or one needs to standardize them. If we assume the
first case, we introduce potential interoperability problems.


In practice, those problems have already been introduced. For some reason I
don't understand, there's an existing pattern of calling a field 'doi' but
putting an absolute URI in the value. (As opposed to using a field name
'url' or a value that contains only the DOI-significant part.)


The point is, when you get beyond dealing with secondary literature
(the domain of BibTeX and the sciences), the range of possible data
expands significantly. Things can get really complicated.

Consider what's actually pretty simple comparatively:

An English translation of a classic work. You often need original
publication information such as title (in the original language),
publisher and issued date, etc.

With a flat model, you have to invent new properties to accommodate
every little exception like this.


What formats/software do people use for cases like that in practice?

If we assume the second, we have an organizational and process problem:
that the WHATWG and/or the W3C—neither of which

Re: [whatwg] A Selector-based metadata proposal (was: Annotating structured data that HTML has no semantics for)

2009-05-21 Thread Toby Inkster
On Thu, 2009-05-21 at 13:26 +0200, Eduard Pascual wrote:

 [... lots ...]

Eduard, thanks for your long and informative reply. I won't go into
every point mentioned in detail, but in summary I'd like to say that
your message reassured me on a few points and perhaps CRDF is not as bad
as I initially thought.

That said, I do think that externalising of the semantics of a document
is a mistake. As the author of RDF-EASE, I don't say this without having
thought the matter through. 

CSS was invented as a way to separate out content from styling. Or to
put it another way, to separate out data and presentation, which allows
the same data to be re-presented (or indeed represented) in many
different ways. The unobtrusive scripting movement (for want of a
better word) aims to separate out behaviour from data, which I think is
also a worthy ideal. But I consider the information which RDFa carries
to be very strongly part of the document's *data*, so not especially
suitable for separating out.

(This consideration very much affected the design of RDF-EASE. You'll
note that the -rdf-about and -rdf-content properties which it defines do
not allow the author to hard code data into the RDF-EASE file -- they
only allow the author to specify an attribute from the (X)HTML file
where the data can be found.)

That's very much an ideological argument, and I appreciate that not
everyone shares my ideology. But for those who don't, there is also the
more practical argument that separating out an aspect of the document's
meaning from the bulk of the markup increases the fragility of its
meaning. If the external file is lost, then part of the document's
meaning is lost.

Some people might argue that RDF already does this by relying on
external vocabularies, but this is only partly so.

By simply using <span about="#me"
xmlns:foaf="http://xmlns.com/foaf/0.1/" property="foaf:name">...</span>
then I am, to a certain extent, relying on the FOAF project's definition
of "name" to be stable.

(Bear with me here, as this is about to start to seem very abstract,
but I'll bring it back to the more practical eventually.)

Even without RDFa though, I am relying on the usual English definition
of "name" being stable. It might seem unlikely that the standard English
definition of words is going to change especially much, but remember
that some of HTML5's proponents have lofty ambitions that HTML5
documents should still be readable in 1000 years. 

Think not of 1000 years, but consider how, just in our own lifetimes,
the words 'Web', 'surf' and 'browser' have picked up new meanings which
probably surpass their original meanings in terms of day-to-day usage.

Look back at how English was spoken 1000 years ago and you'll appreciate
how much it's changed. Many people have difficulty reading Shakespeare,
who wrote his work a mere ~400 years ago. Chaucer's "The Canterbury
Tales", which was written only 200 years earlier, is virtually
indecipherable these days. Go back any further and you are effectively
looking at another language.

Some believe that the future will bring an even faster rate of change to
the English language, with new technologies giving us new concepts to
think about and label, and the ever wider spread of English as a second
language leading to an increase in loan words.

A great help in clarifying your usage of terms is the inclusion of a
glossary. For example, I could write:

<dl>
  <dt>name</dt>
  <dd>
    A name is a label for a noun, (human or animal,
    thing, place, product [as in a brand name] and even an
    idea or concept), normally used to distinguish one from
    another.
    (<a href="http://en.wikipedia.org/wiki/Name">source</a>)
  </dd>
</dl>

With RDFa, the idea of a glossary can be used to reduce our reliance on
external vocabularies:

<dl xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <dt about="[foaf:name]" property="rdfs:label">name</dt>
  <dd about="[foaf:name]" property="rdfs:comment" datatype="">
    A name is a label for a noun, (human or animal,
    thing, place, product [as in a brand name] and even an
    idea or concept), normally used to distinguish one from
    another.
    (<a rel="rdfs:seeAlso"
        href="http://en.wikipedia.org/wiki/Name">source</a>)
  </dd>
</dl>

This doesn't completely eliminate the risk, but goes a long way to
mitigating it.

Anyway, that's enough on internal/external data. A few more specific
points...

 The reduced number of attributes in CRDF is not aimed to deal with
 complexity; but with a separate issue: it is easier for a host
 language to add a rel value for links and an extra attribute with no
 predefined name, than the bunch of attributes RDFa defines.

Not just an extra rel value for <link>, but in some languages it would
involve introducing the <link> element to begin with. The cost of
introducing a new element is significantly higher than new attributes,
given that in most implementations of XML-like languages, unknown
attributes are generally ignored.

 Actually,
 there have 

Re: [whatwg] on bibtex-in-html5

2009-05-21 Thread Bruce D'Arcus
On Thu, May 21, 2009 at 9:51 AM, Henri Sivonen hsivo...@iki.fi wrote:
 On May 21, 2009, at 15:02, Bruce D'Arcus wrote:

 Except the assumption that BibTeX is widely used is overdrawn once you
 get out of the technology and sciences sectors.

 OK.

 This doesn't mean that BibTeX is a bad basis. The set of types and fields
 is
 limited, though.

 It's limited, and it's flat.

 In order to not get completely ignored in the technology and sciences
 sectors, a bibliography microdata format needs to be able to plug into the
 network effects of BibTeX. Having a non-flat microdata format while BibTeX
 remains flat would seriously hinder conversions from microdata to BibTeX.

All that matters from a BibTeX perspective is that the data is a clean
superset. E.g. so long as a book, chapter, article, etc. can be
reliably converted to and from BibTeX, there's no problem.

The same is true of all the other bib formats out there: RIS, NLM,
MODS, PRISM, OOXML, etc.

 How are non-flat bibliographies (beyond an article being in a book / journal
 / Web site) presented?

A journal article is always a good example. If you like, take a look
at the RDFa embedded in this example:

http://bruce.darcus.name/publications/articles/outside-agitator

Now, let's consider the most basic and important distinction: how you
represent the journal title.

In BibTeX, it's (typically) a flat journal key.

In the DC/BIBO representation here, you use a dc:isPartOf relation, so
that the triples look like:

<http://bruce.darcus.name/publications/articles/outside-agitator> a
bibo:AcademicArticle ;
    dc:title "Dissent, Public Space and the Politics of Citizenship:
Riots and the Outside Agitator"@en ;
    bibo:doi "10.1080/1356257042000309652" ;
    bibo:issue "3" ;
    bibo:pageEnd "370" ;
    bibo:pageStart "355" ;
    bibo:volume "8" ;
    dc:creator <http://bruce.darcus.name/about#me> ;
    dc:isPartOf [ dc:title "Space & Polity" ] .

So that same mechanism can be used to represent related titles of all
sorts: weblogs, magazines and newspapers, court reporters (which are
really just periodicals that publish legal decisions), etc.

The alternative in a totally flat model is having to invent new title
properties every time you come across new data (or using a more
generic key than journal to represent the containing title).
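
The contrast can be sketched in a few lines. The record layout, the `CONTAINER_FIELD` table and the `to_flat` helper below are assumptions made for this example: the relational form keeps the container as a nested item under one generic key, while the flat form needs a per-type field name for the container's title.

```python
# Assumed mapping from item type to the flat field holding the container title.
CONTAINER_FIELD = {
    "article": "journal",
    "incollection": "booktitle",
    "inproceedings": "booktitle",
}

def to_flat(item_type, record):
    """Convert a record with a nested 'isPartOf' item to flat fields."""
    flat = {k: v for k, v in record.items() if k != "isPartOf"}
    container = record.get("isPartOf")
    if container:
        field = CONTAINER_FIELD.get(item_type)
        if field is None:
            # No agreed flat field exists: a new property would have to be invented.
            raise KeyError(f"no flat field for the container of {item_type!r}")
        flat[field] = container["title"]
    return flat

article = {
    "title": "Dissent, Public Space and the Politics of Citizenship",
    "volume": "8",
    "isPartOf": {"title": "Space & Polity"},
}
print(to_flat("article", article))
```

Calling `to_flat("misc", ...)` on, say, a weblog post with a containing weblog raises `KeyError`, which is the situation described above: the nested form needs no new vocabulary, the flat one does.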

I explain the basic thinking behind this using some actual examples
from citation styles here:

http://www.users.muohio.edu/darcusb/misc/citations-spec.html

They're really just design notes, but I think they communicate the point.

 Since renderings of bibliography don't show the type of the reference
 usually, having to use 'misc' for almost everything isn't a practical
 problem although it is aesthetically displeasing.

 But this is not the point of adding structured data to HTML; it's to
 allow it to be extracted, and subsequently processed, as data.

 More to the point, allow it to be extracted and used as bibliography source
 data for another publication to avoid repetitive data entry.

Yes.

 Citation and bibliographic formatting conventions do include
 information that suggests type; it's not that it requires a human
 reader to decipher.

 OK. Where the styles I've observed make a difference that isn't traceable
 to which fields an item has, the distinction has mainly been between
 atomic publications and compilations.

Yes. But you also have styles that have conventions like "if you have
a book, format title in italics, else ...". So there are little hints
like that which give a (human) reader information they can use to find
the source in question.

As the creator of CSL, I've always said my intention is to contribute
toward helping us move beyond some of these eccentric traditions,
though!

       • Related, BibTeX cannot represent much of the data in widely used
 bibliographic applications such as Endnote, RefWorks and Zotero except
 in
 very general ways.

 Do you have an example? (I've never used the other formats.)

 Here's the in-progress mapping of Zotero's types to RDF (BIBO, and a
 few others; PO from the BBC, and SIOC):

 https://www.zotero.org/trac/wiki/BiboMapping

 On the surface, it seems that it would be possible to mint more field types
 and publications for BibTeX to support those cases, but what is the publication
 type information used for? Are there as many different entry presentations
 as there are entry types? Or are the type tokens supposed to be mapped to
 localized human-readable label strings?

It depends. For Zotero, a lot of it is about mapping to particular UI
configurations for data entry and editing.

But they can also be used for mapping to output styling as defined in
CSL (which is loosely inspired by BibTeX's BST language, but is XML).

 Also, the non-flatness I see is an item being part of a compilation which is
 already supported by BibTeX without allowing the whole model to generalize
 into a graph.

Where is the generic BibTeX key to denote a containing item? There's
no publication-title or 

[whatwg] Naming of Self-closing start tag state

2009-05-21 Thread Geoffrey Sneddon
I think this is a bit of a misnomer, as the current token can be an
end tag token (although it will throw a parse error whatever happens
once it reaches this state). I suggest renaming it to "self-closing
tag state".


--
Geoffrey Sneddon
http://gsnedders.com/



Re: [whatwg] on bibtex-in-html5

2009-05-21 Thread Edward O'Connor
 Both FOAF and vCard have unstructured personal name properties
 (foaf:name and v:fn) that address this.

 But vCard required both N and FN, so if you only have FN, you can't get an N
 without a lot of dictionary-based domain knowledge and special rules. (Or
 you can make a GIGO N...)

 Hmm ... that's not how it's implemented in hcard.

It is, actually. hCard requires both FN and N, but allows N to be
implied by FN in some cases.

http://microformats.org/wiki/hcard#Implied_.22n.22_Optimization
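
A minimal sketch of that optimization as I read the linked wiki section, so treat the details as an assumption rather than a restatement of the spec: when FN is exactly two words, N is implied, with the first word taken as the given name unless the first word ends in a comma or the second word is a single initial. The function name `implied_n` is made up for the example, and the wiki's additional conditions (such as FN not being an organization name) are left out.

```python
def implied_n(fn):
    """Return (family, given) implied by a two-word FN, or None."""
    words = fn.split()
    if len(words) != 2:
        return None  # no implied N; an explicit N is needed
    first, second = words
    if first.endswith(","):
        return first[:-1], second      # "Family, Given"
    if len(second.rstrip(".")) == 1:
        return first, second           # "Family G." (initial only)
    return second, first               # "Given Family"

print(implied_n("Edward O'Connor"))
print(implied_n("O'Connor, Edward"))
print(implied_n("John Q. Public"))
```

Anything other than two words yields no implied N, which is where the dictionary-based guessing mentioned above would otherwise start.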


Ted


Re: [whatwg] DOMParser / XMLSerializer

2009-05-21 Thread Boris Zbarsky

Anne van Kesteren wrote:

2)  DOMParser can parse from a byte array instead of a string; this
 makes it a little easier to work with XML in encodings other than
 UTF-8 or UTF-16.


ECMAScript doesn't have byte arrays though. (Though it would be nice if it did.)


Sure, but it has arrays that you can put integers in the 0-255 range into.
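
As a loose analogue only (this is Python's standard library, not the DOMParser API under discussion), the sketch below shows why byte input helps with other encodings: the parser sees the raw octets plus the encoding declaration and decodes them itself. The sample document is made up for the example.

```python
import xml.etree.ElementTree as ET

# Integers in the 0-255 range standing in for a byte array.
octets = list(
    '<?xml version="1.0" encoding="ISO-8859-1"?><name>Ren\xe9</name>'.encode("latin-1")
)

# The parser reads the encoding declaration and decodes the octets accordingly.
root = ET.fromstring(bytes(octets))
print(root.text)
```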


2)  XMLSerializer can serialize a subtree rooted at a given node without
 removing the node from its current location in the DOM.


Isn't this true for innerHTML too?


No, you'd need outerHTML for that.  At least if you want to get the same 
behavior as XMLSerializer has.


-Boris