Re: [service-orientated-architecture] Bray Prefers it Web-Style

Stuart Charlton Tue, 26 Sep 2006 08:52:31 -0700

This is an interesting discussion --

Firstly, we need to be careful about terminology here. The initial discussion was about principles, not practice. It's pretty clear that for over 10 years both TBL and the W3C viewed the web as a general purpose networked information system, one that assumed many potential intermediaries and/or consumers, both human and automated.

Secondly, there absolutely is a contract being exposed by an HTML website like CNN. It's just a contract that is oriented towards a specific class of consumer -- user agents and/or crawlers. The architecture supports and encourages different kinds of consumers, and there is evidence everywhere that the trend is towards new and different classes of consumer: RSS aggregators, Atom publishers, etc.

For another example, there are websites that don't adopt a consumer-driven contract; they create their own and expect you to transform it. Tribune Media, the owner of Zap2it.com, for example, has for several years now offered an XML feed of all of its TV listings on http://labs.zap2it.com/ free for personal, non-commercial use. Packages like XMLTV handle these feeds and transform/mediate them into whatever other format you need, whether HTML, a TiVo slice, etc.

The difference here is arguably in the scalability of the two approaches. Consumer-driven contracts have a similar intent to REST's uniform interface: they deliver content in a more consumer-relevant format, making it more open to interoperability and network effects if you target a broad class of consumers. These are things like RSS, HTML, etc. (In googling about this idea, I noticed a decent essay on the concept here: http://martinfowler.com/articles/consumerDrivenContracts.html)

Producer-driven contracts are not as scalable at the outset, but are still needed in cases where a source of data requires several forms of representation, but also needs referential integrity, consistency, etc. Thus it will likely have a "base" representation like a producer-specific XML feed, as in the Zap2it case. A relational schema, an XML document, or an object-oriented domain model seems to fit this approach, with XML being the most interoperable format of interchange.

One likely needs an intermediary of sorts to translate producer-oriented contracts into a contract with more relevance to a wider number of people.   In old school terms, this was the intent, I believe, behind the Common Gateway Interface... translating between god-knows-what database and the HTML or other MIME types.
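To make the intermediary idea concrete, here is a minimal sketch in Python of a mediator that reads a producer-oriented XML feed and emits an HTML list for a broader class of consumer. The feed layout (`listings`, `programme`, `channel`, `start`, `title`) and the function name `listings_to_html` are invented for illustration; this is not the real Zap2it schema or how XMLTV actually works.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical producer-specific feed. The element and attribute names are
# assumptions made up for this sketch, not a real schema.
PRODUCER_FEED = """
<listings>
  <programme channel="5" start="2006-09-26T20:00">
    <title>Evening News</title>
  </programme>
  <programme channel="7" start="2006-09-26T21:00">
    <title>Film &amp; Review</title>
  </programme>
</listings>
"""


def listings_to_html(feed_xml: str) -> str:
    """Translate the producer's XML into an HTML list for user agents."""
    root = ET.fromstring(feed_xml)
    rows = []
    for prog in root.iter("programme"):
        # Fall back to placeholders rather than failing on missing data.
        title = prog.findtext("title", default="(untitled)")
        channel = prog.get("channel", "?")
        start = prog.get("start", "?")
        rows.append(
            f"<li>{escape(start)} ch. {escape(channel)}: {escape(title)}</li>"
        )
    return "<ul>\n" + "\n".join(rows) + "\n</ul>"


html_listing = listings_to_html(PRODUCER_FEED)
print(html_listing)
```

The producer keeps its one "base" representation, and each consumer-facing format (HTML here, but it could as easily be RSS) is just another small translation function sitting in the middle.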

A couple of observations:

- If CNN isn't providing a "CNN proprietary XML schema feed", it's because providing such a feed likely isn't in the economic interests of CNN. It's a proprietary format for one website, so it would always require some intermediary to work with it. Secondly, it implies that people can ignore advertisements.

- If CNN isn't versioning its website to preserve HTML layouts, it's not because they're breaking a contract, it's because the contract itself - the HTML specification - insists that you MUST be tolerant of changes. Any intermediary or user agent should tolerate HTML that evolves.

This is a major issue that is hurting all XML services, especially WS-* services that typically mandate XSD - we design our XML documents without appropriate extensibility points and parse our XML with such fragility that we don't follow Postel's law (the robustness principle) -- and anger & confusion ensue.
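As a rough illustration of what following Postel's law looks like on the consuming side, here is a small Python sketch of a tolerant reader. It picks out only the fields it needs, wherever they sit in the document, and ignores anything it doesn't recognise. The `story`, `title` and `link` names and both sample documents are hypothetical, chosen only to show a document evolving between versions.

```python
import xml.etree.ElementTree as ET


def read_headline(doc_xml: str) -> dict:
    """Extract only what we need; ignore unknown elements and attributes."""
    root = ET.fromstring(doc_xml)
    # ".//name" matches the element at any depth, so the reader does not
    # depend on the exact nesting the producer happens to use today.
    title = (root.findtext(".//title") or "").strip()
    link = (root.findtext(".//link") or "").strip() or None
    return {"title": title, "link": link}


# Version 1 of a hypothetical producer document.
V1 = "<story><title>Launch</title><link>http://example.org/1</link></story>"

# Version 2 adds an attribute and new elements, nests the title, and drops
# the link. A reader validating against a rigid schema would reject it.
V2 = (
    "<story version='2'><meta><byline>Staff</byline></meta>"
    "<body><title> Launch </title></body></story>"
)

old = read_headline(V1)
new = read_headline(V2)
print(old)
print(new)
```

The same reader handles both versions: the missing link comes back as `None` instead of raising, and the extra `meta` block is simply skipped. The trade-off is that a reader this liberal will also accept documents that are genuinely wrong, so it suits cases where the producer is expected to evolve.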

Anyway, in summary:
- There apparently hasn't been enough incentive to provide a CNN-specific XML format.
- If enough people really want to extract a website's data in a more precise way, a microformat of some kind will likely emerge. OPML, XOXO, hCard and hCalendar seem to be growing in popularity.
- If people want more precise information out of HTML, they typically write crappy, non-robust consumers because they don't have time to follow all the complexity + flexibility offered by that contractual format.
- There are some providers of very robust and flexible HTML scrapers out there (Kapow RoboSuite comes to mind) that do follow most of the rules quite well.

Cheers
Stu

----- Original Message ----
From: Gregg Wonderly <[EMAIL PROTECTED]>
To: [email protected]
Sent: Monday, September 25, 2006 2:48:17 PM
Subject: Re: [service-orientated-architecture] Bray Prefers it Web-Style

Nick Gall wrote:
> So, while we can debate WHEN the assertion that "one of the principles of
> WWW is that a person is at the end of the line" became clearly FALSE (from
> the beginning or sometime later). It is clearly false today. Just ask TBL or
> anyone at the W3C. At best, one can say that a design principle of the WWW
> is that the architecture and artifacts of the WWW be as easy as possible
> for both people and machines to "process".

I don't think it's a question of what the design and concepts are/were when. I
think it's a question of practice. There is content targeted at humans which is
being scraped and recast by computers, because there is only one version of that
data (the HTML version). This means that the system is not meeting the needs of
the many. Instead, it is serving the needs through fragile reengineering of the
content into forms more useful for the masses. That reengineering is costly and
recurring because the systems are not contract based, and thus the consumers are
not in control of the content they consume, in any form.

The public are not in control of what CNN, Microsoft or anyone else does to the
HTML. That content and its rendering allow or prohibit particular uses based on
the whim of the blind designer, who has no idea who the users are, and what
elements of the document are useful.

This extends into the enterprise operations for a lot of REST applications as
well. It's trivial to serve content. What is less trivial, is to manage the
content so that it is actually useful to have access to.

Gregg Wonderly
