Re: [whatwg] on bibtex-in-html5
On May 20, 2009, at 19:24, Bruce D'Arcus wrote:

> Re: the recent microdata work and the subsequent effort to include BibTeX in the spec, I summarized my argument against this on my blog:
>
> http://community.muohio.edu/blogs/darcusb/archives/2009/05/20/on-the-inclusion-of-bibtex-in-html5
>
> Quoting from the blog post:
>
> On the last use case, he has chosen BibTeX, on the basis that it is widely used and simple to author and process.

Those are good criteria.

> • BibTeX is designed for the sciences, which typically cite only secondary academic literature. It is thus inadequate for, and not widely used in, many fields outside of the sciences: the humanities and law being quite obvious examples. For this reason, BibTeX cannot by default adequately represent even the use cases Ian has identified. For example, there are many citations on Wikipedia that can only be represented using effectively useless types such as “misc” and which require new properties to be invented.

This doesn't mean that BibTeX is a bad basis. The set of types and fields is limited, though.

Since renderings of a bibliography don't usually show the type of the reference, having to use 'misc' for almost everything isn't a practical problem, although it is aesthetically displeasing. The set of fields is more of an issue, but it can be fixed by inventing more fields--it doesn't mean the whole base solution needs to be discarded. Fortunately, having custom fields in .bib doesn't break existing pre-Web, pre-ISBN bibliography styles. I've used at least these custom fields:

key: Show this citation pseudo-id in the rendering instead of the actual id used for matching.
url: The absolute URL of a resource that is on the Web.
refdate: The date when the author made the reference to an ephemeral source such as a Web page.
isbn: The ISBN of a publication.
stdnumber: RFC or ISO number, e.g. RFC 2397 or ISO/IEC 10646:2003(E).

Particularly the 'url' and 'isbn' field names should be obvious and uncontroversial additions.
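As a sketch, an entry using several of those custom fields might look like the following (the entry key and field values are illustrative; only 'url', 'refdate', 'stdnumber' and 'key' are the non-standard fields described above):

```bibtex
@misc{rfc2397,
  author    = {Masinter, L.},
  title     = {The ``data'' {URL} scheme},
  year      = {1998},
  url       = {http://www.ietf.org/rfc/rfc2397.txt},
  refdate   = {2009-05-21},
  stdnumber = {RFC 2397},
  key       = {RFC2397}
}
```

Standard .bst styles simply ignore fields they don't know about, which is why such custom fields don't break existing bibliography styles.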
> • Related, BibTeX cannot represent much of the data in widely used bibliographic applications such as Endnote, RefWorks and Zotero except in very general ways.

Do you have an example? (I've never used the other formats.)

> • The BibTeX extensibility model puts a rather large burden on inventing new properties to accommodate data not in the core model. For example, the core model has no way to represent a DOI identifier (this is no surprise, as BibTeX was created before DOIs existed). As a consequence, people have gradually added this to their BibTeX records and styles in a more ad hoc way. This ad hoc approach to extensibility has one of two consequences: either the vocabulary terms are understood as completely uncontrolled strings, or one needs to standardize them. If we assume the first case, we introduce potential interoperability problems.

In practice, those problems have already been introduced. For some reason I don't understand, there's an existing pattern of calling a field 'doi' but putting an absolute URI in the value. (As opposed to using a field name 'url', or a value that contains only the DOI-significant part.)

> If we assume the second, we have an organizational and process problem: that the WHATWG and/or the W3C—neither of which have expertise in this domain—become the gate-keepers for such extensions. In either case, we have a rather brittle and anachronistic approach to extension.

Problems of this nature haven't stopped the WHATWG in the past. :-)

> • The BibTeX model conflicts with Dublin Core and with vCard, both of which are quite sensibly used elsewhere in the microdata spec to encode information related to the document proper. There seems little justification in having two different ways to represent a document depending on whether it is THIS document or THAT document.

When you are referring to THAT document, you generally want the names of the authors--not their full business cards.
Therefore, vCard is overkill, and conversion to .bib is more useful than conversion to vCard for this use case.

> My suggestion instead?
>
> • reuse Dublin Core and vCard for the generic data: titles, creators/contributors, publisher, dates, part/version relations, etc., and only add those properties (volume, issue, pages, editors, etc.) that they omit

This would make conversion to and from the dominant bibliography format (.bib) more complex. Furthermore, there's a risk of a GIGO effect where the conversion can't be done algorithmically. (IIRC, you can't algorithmically map a .bib author name to the vCard name structure without a huge dictionary of names.)

> • typing should NOT be handled by a bibtex-type property, but the same way everything else is typed in the microdata proposal: a global identifier

Why is typing even needed, except for separating articles from compilations?

> • make it possible for people to
Re: [whatwg] Exposing known data types in a reusable way
Interesting. Despite my PoV against the microdata proposal, I've taken a look at it and found a minor typo. Within 5.4.1 vCard, by the end of the n property description, the spec reads:

  The value of the fn property a name in one of the following forms:

Shouldn't it read:

  The value of the fn property is a name in one of the following forms:

? Maybe this will grant me a seat for posterity in the acknowledgements section =P.

On Wed, May 20, 2009 at 1:07 AM, Ian Hickson i...@hixie.ch wrote:

> Some of the use cases I collected from the e-mails sent in over the past few months were the following:
>
> USE CASE: Exposing contact details so that users can add people to their address books or social networking sites.
>
> SCENARIOS:
> * Instead of giving a colleague a business card, someone gives their colleague a URL, and that colleague's user agent extracts basic profile information such as the person's name along with references to other people that person knows and adds the information into an address book.
> * A scholar and teacher wants other scholars (and potentially students) to be able to easily extract information about who he is to add it to their contact databases.
> * Fred copies the name of one of his Facebook friends and pastes it into his OS address book; the contact information is imported automatically.
> * Fred copies the name of one of his Facebook friends and pastes it into his Webmail's address book feature; the contact information is imported automatically.
> * David can use the data in a web page to generate a custom browser UI for including a person in our address book without using brittle screen-scraping.
>
> REQUIREMENTS:
> * A user joining a new social network should be able to identify himself to the new social network in a way that enables the new social network to bootstrap his account from existing published data (e.g.
> from another social network) rather than having to re-enter it, without the new site having to coordinate with (or know about) the pre-existing site, without the user having to give either site's credentials to the other, and without the new site finding out about relationships that the user has intentionally kept secret. (http://w2spconf.com/2008/papers/s3p2.pdf)
> * Data should not need to be duplicated between machine-readable and human-readable forms (i.e. the human-readable form should be machine-readable).
> * Shouldn't require the consumer to write XSLT or server-side code to read the contact information.
> * Machine-readable contact information shouldn't be on a separate page from human-readable contact information.
> * The information should be convertible into a dedicated form (RDF, JSON, XML, vCard) in a consistent manner, so that tools that use this information separately from the pages on which it is found have a standard way of conveying the information.
> * Should be possible for different parts of a contact to be given in different parts of the page. For example, a page with contact details for people in columns (with each row giving the name, telephone number, etc.) should still have unambiguous grouped contact details parseable from it.
> * Parsing rules should be unambiguous.
> * Should not require changes to HTML5 parsing rules.
>
> USE CASE: Exposing calendar events so that users can add those events to their calendaring systems.
>
> SCENARIOS:
> * A user visits the Avenue Q site and wants to make a note of when tickets go on sale for the tour's stop in his home town. The site says October 3rd, so the user clicks this and selects add to calendar, which causes an entry to be added to his calendar.
> * A student is making a timeline of important events in Apple's history. As he reads Wikipedia entries on the topic, he clicks on dates and selects add to timeline, which causes an entry to be added to his timeline.
> * TV guide listings - browsers should be able to expose to the user's tools (e.g. calendar, DVR, TV tuner) the times that a TV show is on.
> * Paul sometimes gives talks on various topics, and announces them on his blog. He would like to mark up these announcements with proper scheduling information, so that his readers' software can automatically obtain the scheduling information and add it to their calendars. Importantly, some of the rendered data might be more informal than the machine-readable data required to produce a calendar event.
> * David can use the data in a web page to generate a custom browser UI for adding an event to our calendaring software without using brittle screen-scraping.
> * http://livebrum.co.uk/: the author would like people to be able to
Re: [whatwg] on bibtex-in-html5
Hi Henri,

On Thu, May 21, 2009 at 4:00 AM, Henri Sivonen hsivo...@iki.fi wrote:

> On May 20, 2009, at 19:24, Bruce D'Arcus wrote:
>
>> Re: the recent microdata work and the subsequent effort to include BibTeX in the spec, I summarized my argument against this on my blog:
>>
>> http://community.muohio.edu/blogs/darcusb/archives/2009/05/20/on-the-inclusion-of-bibtex-in-html5
>>
>> Quoting from the blog post:
>>
>> On the last use case, he has chosen BibTeX, on the basis that it is widely used and simple to author and process.
>
> Those are good criteria.

Except the assumption that BibTeX is widely used is overdrawn once you get out of the technology and sciences sectors.

>> • BibTeX is designed for the sciences, which typically cite only secondary academic literature. It is thus inadequate for, and not widely used in, many fields outside of the sciences: the humanities and law being quite obvious examples. For this reason, BibTeX cannot by default adequately represent even the use cases Ian has identified. For example, there are many citations on Wikipedia that can only be represented using effectively useless types such as “misc” and which require new properties to be invented.
>
> This doesn't mean that BibTeX is a bad basis. The set of types and fields is limited, though.

It's limited, and it's flat.

> Since renderings of a bibliography don't usually show the type of the reference, having to use 'misc' for almost everything isn't a practical problem, although it is aesthetically displeasing.

But this is not the point of adding structured data to HTML; it's to allow it to be extracted, and subsequently processed, as data. Citation and bibliographic formatting conventions do include information that suggests type; it's not that it requires a human reader to decipher. Surely that should not limit how we address this going forward?

> The set of fields is more of an issue, but it can be fixed by inventing more fields--it doesn't mean the whole base solution needs to be discarded.
> Fortunately, having custom fields in .bib doesn't break existing pre-Web, pre-ISBN bibliography styles. I've used at least these custom fields:
>
> key: Show this citation pseudo-id in the rendering instead of the actual id used for matching.
> url: The absolute URL of a resource that is on the Web.
> refdate: The date when the author made the reference to an ephemeral source such as a Web page.
> isbn: The ISBN of a publication.
> stdnumber: RFC or ISO number, e.g. RFC 2397 or ISO/IEC 10646:2003(E).
>
> Particularly the 'url' and 'isbn' field names should be obvious and uncontroversial additions.

Trust me: this is not nearly as simple as you think. More below ...

>> • Related, BibTeX cannot represent much of the data in widely used bibliographic applications such as Endnote, RefWorks and Zotero except in very general ways.
>
> Do you have an example? (I've never used the other formats.)

Here's the in-progress mapping of Zotero's types to RDF (BIBO, and a few others; PO from the BBC, and SIOC):

https://www.zotero.org/trac/wiki/BiboMapping

Here's some info on Microsoft's bib format for OOXML:

http://community.muohio.edu/blogs/darcusb/archives/2006/09/05/open-xml-draft-14

Here's the type schema for CSL (though it needs work, and we de-emphasize this for formatting in any case; CSL is really oriented towards output formatting only):

http://xbiblio.svn.sourceforge.net/viewvc/xbiblio/csl/schema/branches/split/csl-types.rnc?view=markup

Here's the variable list:

http://xbiblio.svn.sourceforge.net/viewvc/xbiblio/csl/schema/branches/split/csl-variables.rnc?revision=941&view=markup

>> • The BibTeX extensibility model puts a rather large burden on inventing new properties to accommodate data not in the core model. For example, the core model has no way to represent a DOI identifier (this is no surprise, as BibTeX was created before DOIs existed). As a consequence, people have gradually added this to their BibTeX records and styles in a more ad hoc way.
>> This ad hoc approach to extensibility has one of two consequences: either the vocabulary terms are understood as completely uncontrolled strings, or one needs to standardize them. If we assume the first case, we introduce potential interoperability problems.
>
> In practice, those problems have already been introduced. For some reason I don't understand, there's an existing pattern of calling a field 'doi' but putting an absolute URI in the value. (As opposed to using a field name 'url', or a value that contains only the DOI-significant part.)

The point is, when you get beyond dealing with secondary literature (the domain of BibTeX and the sciences), the range of possible data expands significantly. Things can get really complicated.

Consider what's actually comparatively simple: an English translation of a classic work. You often need the original publication information, such as the title (in the original language), publisher, issued date, etc.
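A rough sketch of what such a record forces you into with flat BibTeX (the entry key and the 'translator', 'origtitle' and 'origdate' field names are invented here, and that invention is exactly the ad hoc extension problem under discussion; the bibliographic details are approximate):

```bibtex
@book{tocqueville-democracy,
  author     = {Tocqueville, Alexis de},
  title      = {Democracy in America},
  publisher  = {Harper},
  year       = {1966},
  translator = {Lawrence, George},
  origtitle  = {De la d{\'e}mocratie en Am{\'e}rique},
  origdate   = {1835}
}
```

None of the last three fields exist in core BibTeX, so every style that should render the original publication details has to be patched to know about them.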
Re: [whatwg] on bibtex-in-html5
Oops; two quick things ...

On Thu, May 21, 2009 at 8:02 AM, Bruce D'Arcus bdar...@gmail.com wrote:

> Citation and bibliographic formatting conventions do include information that suggests type; it's not that it requires a human reader to decipher.

I meant it's JUST that ...

> Here's the in-progress mapping of Zotero's types to RDF (BIBO, and a few others; PO from the BBC, and SIOC):
>
> https://www.zotero.org/trac/wiki/BiboMapping

FWIW, the Zotero types here refer to what's in their UI ATM. They will, however, be moving to a more flexible and relational UI model that more closely reflects the BIBO model. Reason? Users were asking for things not easily accommodated in the current, flat, approach (example: a review might be published in a newspaper or a journal, or broadcast on the radio or on a podcast).

Bruce
Re: [whatwg] on bibtex-in-html5
On May 21, 2009, at 15:02, Bruce D'Arcus wrote:

> Except the assumption that BibTeX is widely used is overdrawn once you get out of the technology and sciences sectors.

OK.

>> This doesn't mean that BibTeX is a bad basis. The set of types and fields is limited, though.
>
> It's limited, and it's flat.

In order not to get completely ignored in the technology and sciences sectors, a bibliography microdata format needs to be able to plug into the network effects of BibTeX. Having a non-flat microdata format while BibTeX remains flat would seriously hinder conversions from microdata to BibTeX. How are non-flat bibliographies (beyond an article being in a book / journal / Web site) presented?

>> Since renderings of a bibliography don't usually show the type of the reference, having to use 'misc' for almost everything isn't a practical problem, although it is aesthetically displeasing.
>
> But this is not the point of adding structured data to HTML; it's to allow it to be extracted, and subsequently processed, as data.

More to the point, to allow it to be extracted and used as bibliography source data for another publication, to avoid repetitive data entry.

> Citation and bibliographic formatting conventions do include information that suggests type; it's not that it requires a human reader to decipher.

OK. The styles I've observed that make a distinction not traceable to the availability of fields on an item have mainly distinguished between atomic publications and compilations.

>>> • Related, BibTeX cannot represent much of the data in widely used bibliographic applications such as Endnote, RefWorks and Zotero except in very general ways.
>
>> Do you have an example? (I've never used the other formats.)
> Here's the in-progress mapping of Zotero's types to RDF (BIBO, and a few others; PO from the BBC, and SIOC):
>
> https://www.zotero.org/trac/wiki/BiboMapping

On the surface, it seems that it would be possible to mint more field and publication types for BibTeX to support those cases, but what is the publication type information used for? Are there as many different entry presentations as there are entry types? Or are the type tokens supposed to be mapped to localized human-readable label strings? Also, the non-flatness I see is an item being part of a compilation, which is already supported by BibTeX without allowing the whole model to generalize into a graph.

> Here's some info on Microsoft's bib format for OOXML:
>
> http://community.muohio.edu/blogs/darcusb/archives/2006/09/05/open-xml-draft-14

It seems relatively straightforward, technically, to extend BibTeX with the field types from OOXML that BibTeX doesn't cover. The main issue seems to be the bikeshed of what names to use.

> Here's the type schema for CSL (though it needs work, and we de-emphasize this for formatting in any case; CSL is really oriented towards output formatting only):
>
> http://xbiblio.svn.sourceforge.net/viewvc/xbiblio/csl/schema/branches/split/csl-types.rnc?view=markup
>
> Here's the variable list:
>
> http://xbiblio.svn.sourceforge.net/viewvc/xbiblio/csl/schema/branches/split/csl-variables.rnc?revision=941&view=markup

I don't see a fundamental reason why the BibTeX vocabulary couldn't be extended with stuff from there.

>>> • The BibTeX extensibility model puts a rather large burden on inventing new properties to accommodate data not in the core model. For example, the core model has no way to represent a DOI identifier (this is no surprise, as BibTeX was created before DOIs existed). As a consequence, people have gradually added this to their BibTeX records and styles in a more ad hoc way.
>>> This ad hoc approach to extensibility has one of two consequences: either the vocabulary terms are understood as completely uncontrolled strings, or one needs to standardize them. If we assume the first case, we introduce potential interoperability problems.
>
>> In practice, those problems have already been introduced. For some reason I don't understand, there's an existing pattern of calling a field 'doi' but putting an absolute URI in the value. (As opposed to using a field name 'url', or a value that contains only the DOI-significant part.)
>
> The point is, when you get beyond dealing with secondary literature (the domain of BibTeX and the sciences), the range of possible data expands significantly. Things can get really complicated. Consider what's actually comparatively simple: an English translation of a classic work. You often need the original publication information, such as the title (in the original language), publisher, issued date, etc. With a flat model, you have to invent new properties to accommodate every little exception like this.

What formats/software do people use for cases like that in practice?

>>> If we assume the second, we have an organizational and process problem: that the WHATWG and/or the W3C—neither of which
Re: [whatwg] A Selector-based metadata proposal (was: Annotating structured data that HTML has no semantics for)
On Thu, 2009-05-21 at 13:26 +0200, Eduard Pascual wrote:

> [... lots ...]

Eduard, thanks for your long and informative reply. I won't go into every point in detail, but in summary I'd like to say that your message reassured me on a few points, and perhaps CRDF is not as bad as I initially thought.

That said, I do think that externalising the semantics of a document is a mistake. As the author of RDF-EASE, I don't say this without having thought the matter through.

CSS was invented as a way to separate content from styling; or, to put it another way, to separate data from presentation, which allows the same data to be re-presented (or indeed represented) in many different ways. The unobtrusive scripting movement (for want of a better word) aims to separate behaviour from data, which I think is also a worthy ideal. But I consider the information which RDFa carries to be very strongly part of the document's *data*, and so not especially suitable for separating out. (This consideration very much affected the design of RDF-EASE. You'll note that the -rdf-about and -rdf-content properties which it defines do not allow the author to hard-code data into the RDF-EASE file -- they only allow the author to specify an attribute of the (X)HTML file where the data can be found.)

That's very much an ideological argument, and I appreciate that not everyone shares my ideology. But for those who don't, there is also the more practical argument that separating an aspect of the document's meaning from the bulk of the markup increases the fragility of that meaning: if the external file is lost, then part of the document's meaning is lost. Some people might argue that RDF already does this by relying on external vocabularies, but this is only partly so. By simply using

  <span about="#me" xmlns:foaf="http://xmlns.com/foaf/0.1/" property="foaf:name">...</span>

then I am, to a certain extent, relying on the FOAF project's definition of "name" to be stable.
(Bear with me here, as this is about to start to seem very abstract, but I'll bring it back to the more practical eventually.)

Even without RDFa, though, I am relying on the usual English definition of "name" being stable. It might seem unlikely that the standard English definition of words is going to change especially much, but remember that some of HTML5's proponents have lofty ambitions that HTML5 documents should still be readable in 1000 years. Think not of 1000 years, but consider how, just in our own lifetimes, the words 'Web', 'surf' and 'browser' have picked up new meanings which probably surpass their original meanings in terms of day-to-day usage. Look back at how English was spoken 1000 years ago and you'll appreciate how much it's changed. Many people have difficulty reading Shakespeare, who wrote his work a mere ~400 years ago. Chaucer's The Canterbury Tales, written only 200 years earlier, is virtually indecipherable these days. Go back any further and you are effectively looking at another language. Some believe that the future will bring an even faster rate of change to the English language, with new technologies giving us new concepts to think about and label, and the ever wider spread of English as a second language leading to an increase in loan words.

A great help in clarifying your usage of terms is the inclusion of a glossary. For example, I could write:

  <dl>
    <dt>name</dt>
    <dd>
      A name is a label for a noun (human or animal, thing, place,
      product [as in a brand name] and even an idea or concept),
      normally used to distinguish one from another.
      (<a href="http://en.wikipedia.org/wiki/Name">source</a>)
    </dd>
  </dl>

With RDFa, the idea of a glossary can be used to reduce our reliance on external vocabularies:

  <dl xmlns:foaf="http://xmlns.com/foaf/0.1/"
      xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
    <dt about="[foaf:name]" property="rdfs:label">name</dt>
    <dd about="[foaf:name]" property="rdfs:comment" datatype="">
      A name is a label for a noun (human or animal, thing, place,
      product [as in a brand name] and even an idea or concept),
      normally used to distinguish one from another.
      (<a rel="rdfs:seeAlso" href="http://en.wikipedia.org/wiki/Name">source</a>)
    </dd>
  </dl>

This doesn't completely eliminate the risk, but it goes a long way towards mitigating it.

Anyway, that's enough on internal/external data. A few more specific points...

> The reduced number of attributes in CRDF is not aimed to deal with complexity, but with a separate issue: it is easier for a host language to add a rel value for links and an extra attribute with no predefined name than the bunch of attributes RDFa defines.

Not just an extra rel value for link, but in some languages it would involve introducing the link element to begin with. The cost of introducing a new element is significantly higher than that of new attributes, given that in most implementations of XML-like languages unknown attributes are generally ignored. Actually, there have
Re: [whatwg] on bibtex-in-html5
On Thu, May 21, 2009 at 9:51 AM, Henri Sivonen hsivo...@iki.fi wrote:

> On May 21, 2009, at 15:02, Bruce D'Arcus wrote:
>
>> Except the assumption that BibTeX is widely used is overdrawn once you get out of the technology and sciences sectors.
>
> OK.
>
>>> This doesn't mean that BibTeX is a bad basis. The set of types and fields is limited, though.
>>
>> It's limited, and it's flat.
>
> In order not to get completely ignored in the technology and sciences sectors, a bibliography microdata format needs to be able to plug into the network effects of BibTeX. Having a non-flat microdata format while BibTeX remains flat would seriously hinder conversions from microdata to BibTeX.

All that matters from a BibTeX perspective is that the data is a clean superset. E.g. so long as a book, chapter, article, etc. can be reliably converted to and from BibTeX, there's no problem. The same is true of all the other bib formats out there: RIS, NLM, MODS, PRISM, OOXML, etc.

> How are non-flat bibliographies (beyond an article being in a book / journal / Web site) presented?

A journal article is always a good example. If you like, take a look at the RDFa embedded in this example:

http://bruce.darcus.name/publications/articles/outside-agitator

Now, let's consider the most basic and important distinction: how you represent the journal title. In BibTeX, it's (typically) a flat journal key. In the DC/BIBO representation here, you use a dc:isPartOf relation, so that the triples look like:

  <http://bruce.darcus.name/publications/articles/outside-agitator>
      a bibo:AcademicArticle ;
      dc:title "Dissent, Public Space and the Politics of Citizenship: Riots and the Outside Agitator"@en ;
      bibo:doi "10.1080/1356257042000309652" ;
      bibo:issue "3" ;
      bibo:pageEnd "370" ;
      bibo:pageStart "355" ;
      bibo:volume "8" ;
      dc:creator <http://bruce.darcus.name/about#me> ;
      dc:isPartOf [ dc:title "Space & Polity" ] .
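For contrast, a rough sketch of the same article as a flat BibTeX record (the entry key is invented; note also that 'doi' is itself a non-standard extension field, as discussed earlier in the thread):

```bibtex
@article{darcus-outside-agitator,
  author  = {D'Arcus, Bruce},
  title   = {Dissent, Public Space and the Politics of Citizenship:
             Riots and the ``Outside Agitator''},
  journal = {Space \& Polity},
  volume  = {8},
  number  = {3},
  pages   = {355--370},
  doi     = {10.1080/1356257042000309652}
}
```

Here the containing title is just an opaque string in the `journal` field, with no way to say anything further about the journal itself, whereas the dc:isPartOf relation above points to a resource that can carry its own properties.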
So that same mechanism can be used to represent related titles of all sorts: weblogs, magazines and newspapers, court reporters (which are really just periodicals that publish legal decisions), etc. The alternative in a totally flat model is having to invent new title properties every time you come across new data (or using a more generic key than journal to represent the containing title).

I explain the basic thinking behind this, using some actual examples from citation styles, here:

http://www.users.muohio.edu/darcusb/misc/citations-spec.html

They're really just design notes, but I think they communicate the point.

>>> Since renderings of a bibliography don't usually show the type of the reference, having to use 'misc' for almost everything isn't a practical problem, although it is aesthetically displeasing.
>>
>> But this is not the point of adding structured data to HTML; it's to allow it to be extracted, and subsequently processed, as data.
>
> More to the point, to allow it to be extracted and used as bibliography source data for another publication, to avoid repetitive data entry.

Yes.

>> Citation and bibliographic formatting conventions do include information that suggests type; it's not that it requires a human reader to decipher.
>
> OK. The styles I've observed that make a distinction not traceable to the availability of fields on an item have mainly distinguished between atomic publications and compilations.

Yes. But you also have styles with conventions like "if you have a book, format the title in italics, else ...". So there are little hints like that which give a (human) reader information they can use to find the source in question. As the creator of CSL, I've always said my intention is to contribute toward helping us move beyond some of these eccentric traditions, though!

>>> • Related, BibTeX cannot represent much of the data in widely used bibliographic applications such as Endnote, RefWorks and Zotero except in very general ways.
>
>> Do you have an example?
>> (I've never used the other formats.)
>
>> Here's the in-progress mapping of Zotero's types to RDF (BIBO, and a few others; PO from the BBC, and SIOC):
>>
>> https://www.zotero.org/trac/wiki/BiboMapping
>
> On the surface, it seems that it would be possible to mint more field and publication types for BibTeX to support those cases, but what is the publication type information used for? Are there as many different entry presentations as there are entry types? Or are the type tokens supposed to be mapped to localized human-readable label strings?

It depends. For Zotero, a lot of it is about mapping to particular UI configurations for data entry and editing. But they can also be used for mapping to output styling as defined in CSL (which is loosely inspired by BibTeX's BST language, but is XML).

> Also, the non-flatness I see is an item being part of a compilation, which is already supported by BibTeX without allowing the whole model to generalize into a graph.

Where is the generic BibTeX key to denote a containing item? There's no publication-title or
[whatwg] Naming of Self-closing start tag state
I think this is a bit of a misnomer, as the current token can be an end tag token (although it will throw a parse error whatever happens once it reaches this state). I suggest renaming it to "self-closing tag state".

-- 
Geoffrey Sneddon
http://gsnedders.com/
Re: [whatwg] on bibtex-in-html5
>>> Both FOAF and vCard have unstructured personal name properties (foaf:name and v:fn) that address this.
>>
>> But vCard required both N and FN, so if you only have FN, you can't get an N without a lot of dictionary-based domain knowledge and special rules. (Or you can make a GIGO N...)
>
> Hmm ... that's not how it's implemented in hCard.

It is, actually. hCard requires both FN and N, but allows N to be implied by FN in some cases.

http://microformats.org/wiki/hcard#Implied_.22n.22_Optimization

Ted
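As a minimal sketch of that optimization (the name here is invented for illustration): when the content of the fn property is exactly two words, an hCard parser may imply the n property from it without any explicit n markup.

```html
<!-- No explicit "n" sub-properties: because the fn value is exactly
     two space-separated words, hCard's implied-n optimization lets
     parsers infer given-name "Jane" and family-name "Doe". -->
<div class="vcard">
  <span class="fn">Jane Doe</span>
</div>
```

For anything more complex than the simple two-word case (middle names, suffixes, reversed order), the author still has to mark up n explicitly, which is the dictionary-free escape hatch for the FN-to-N problem described above.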
Re: [whatwg] DOMParser / XMLSerializer
Anne van Kesteren wrote:

>> 2) DOMParser can parse from a byte array instead of a string; this makes it a little easier to work with XML in encodings other than UTF-8 or UTF-16.
>
> ECMAScript doesn't have byte arrays though. (Though it would be nice if it did.)

Sure, but it has arrays that you can put integers in the 0-255 range into.

>> 2) XMLSerializer can serialize a subtree rooted at a given node without removing the node from its current location in the DOM.
>
> Isn't this true for innerHTML too?

No, you'd need outerHTML for that. At least if you want to get the same behavior as XMLSerializer has.

-Boris