Jonathan O'Donnell wrote:

Hi WSG'ers

In general, data (including metadata) should be stored in one place only. This prevents drift: if it is only stored in one place, it can only be updated in that place.

Often, the information that we want to store as metadata already appears in the Web page. Examples include the title, description (especially as opening paragraph) and the author's name. In footers, we often find rights information, the URL, and date information.

If this information already exists in the data, and we replicate it in the metadata, there is the danger of drift. Perhaps pointing to the data from the metadata fields is a way of preventing drift, and ensuring that the metadata is as up-to-date as the data.

** Method **


Hi Jonathan,

Given what you have said here, and what I would expect from serious authoring tools and CMSs, I think this area was generally neglected by most publishing tools, at least the last time I looked.

Quite a few CMSs say that they are DC (Dublin Core) compliant but, as you mentioned, do they actually store the data in one place, and not in the web pages? Is it part of the workflow and version control of the documents? I don't think so. I'd be glad if anyone can point me to a product that does address this need.

For a CMS to address this properly, it needs to have a normalised schema based on DC built into its database. That way, all the pages published from the system can draw on the various metadata, as well as the "alt" and "longdesc" text for images.
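To make that a bit more concrete, here is a rough sketch of the idea in Python with SQLite. The table layout and the names `SCHEMA`, `document`, `contributor` and `dc_meta_tags` are my own assumptions for illustration, not taken from any actual CMS. The point is only that the DC values live in one store and the meta tags in each page are generated from it rather than typed in twice.

```python
import sqlite3
from html import escape

# Hypothetical, heavily simplified DC-style schema: one row per document,
# and a separate table for the people attached to it.
SCHEMA = """
CREATE TABLE document (
    id          INTEGER PRIMARY KEY,
    title       TEXT NOT NULL,
    description TEXT,
    rights      TEXT,
    modified    TEXT
);
CREATE TABLE contributor (
    document_id INTEGER NOT NULL REFERENCES document(id),
    role        TEXT NOT NULL,   -- 'creator' or 'contributor'
    name        TEXT NOT NULL
);
"""


def dc_meta_tags(conn, doc_id):
    """Build DC <meta> tags for one document from the single store."""
    row = conn.execute(
        "SELECT title, description, rights, modified "
        "FROM document WHERE id = ?",
        (doc_id,),
    ).fetchone()
    if row is None:
        raise KeyError(doc_id)

    # Pair each stored column with the DC element name it is published under.
    fields = list(zip(("title", "description", "rights", "date"), row))
    # Each contributor row becomes its own DC.creator / DC.contributor tag.
    fields.extend(
        conn.execute(
            "SELECT role, name FROM contributor "
            "WHERE document_id = ? ORDER BY rowid",
            (doc_id,),
        ).fetchall()
    )

    # Empty values are skipped; everything else is HTML-escaped.
    return [
        '<meta name="DC.%s" content="%s">' % (name, escape(value, quote=True))
        for name, value in fields
        if value
    ]
```

A page template would call `dc_meta_tags()` at publish time, so an edit to the title or author in the database shows up in the metadata without anyone touching the page.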

Many organisations are legally required to keep snapshots of their published data as it stood at any given time. A publishing system based on DC not only allows this feature, but also allows a complete analysis of all the subcomponents of a document and the various contributors.

That also leads to problems with document management systems that manage their metadata from properties within the documents and network environment variables.

Last time I tried to extract metadata from MS Word, using Perl and Python, I could only get the standard set of properties; any data in custom properties was unretrievable (at least by me). I don't know what OO (OpenOffice) or the latest MS Office offers.
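For what it's worth, here is a minimal sketch of how custom properties might be read where the file is in the zip-based .docx format. That format is an assumption on my part, and this does nothing for the older binary .doc files I was struggling with. In a .docx package the custom properties sit in the part `docProps/custom.xml`; the function name `read_custom_properties` is just an illustrative one, and it uses only the Python standard library.

```python
import zipfile
import xml.etree.ElementTree as ET

# Part name and namespace used for custom properties in a .docx package.
CUSTOM_PART = "docProps/custom.xml"
CUSTOM_NS = (
    "{http://schemas.openxmlformats.org/officeDocument/2006/custom-properties}"
)


def read_custom_properties(path):
    """Return {name: text value} for the custom properties of a .docx.

    `path` may be a filename or a binary file-like object. Values are
    returned as raw strings; typed values (dates, numbers) are not converted.
    """
    with zipfile.ZipFile(path) as package:
        # Documents without custom properties simply lack this part.
        if CUSTOM_PART not in package.namelist():
            return {}
        root = ET.fromstring(package.read(CUSTOM_PART))

    props = {}
    for prop in root.iter(CUSTOM_NS + "property"):
        # Each <property> wraps one typed child element, e.g. <vt:lpwstr>.
        value = prop[0].text if len(prop) else None
        props[prop.get("name")] = value
    return props
```

Something like `read_custom_properties("report.docx")` (a made-up filename) would then give a plain dictionary that a document management system could load into its own metadata store.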

But I don't think asking users to maintain this data will work, unless they are librarians. I think that it has to be as automated and as transparent to the user as possible, because most users are just not interested in this level of site QA unless it is an important component of their job.

Regards
Geoff Deering
******************************************************
The discussion list for  http://webstandardsgroup.org/

See http://webstandardsgroup.org/mail/guidelines.cfm
for some hints on posting to the list & getting help
******************************************************
