Re: [WSG] Wild metadata

Steven C. Perkins Mon, 14 Nov 2005 05:44:12 -0800

You might be interested in MKSearch; it searches for DC metadata in the head section of web pages.


http://www.mksearch.mkdoc.org/


Regards,

Steven C. Perkins

At 03:16 AM 11/14/2005, you wrote:

Hi DC-General and the Web Standards Group

Here's another half-baked idea that I am trying to straighten out.  I
would appreciate your feedback and suggestions.  This will be my last
one for a while, I promise.

** The problem **
On the Web, DC.description and DC.subject are not very effective
finding aids when the full text is indexed.

** The solution **
Wild metadata, such as anchor text, blog descriptions and folksonomies
may provide better description and subject (or keyword) metadata.

** Example **
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
        <head>
                <link rel="schema.dc" href="http://purl.org/dc/elements/1.1/" />
                <link rel="schema.terms" href="http://purl.org/dc/terms/" />
                <link rel="DC.subject"
href="http://api.search.yahoo.com/WebSearchService/rss/webSearch.xml?appid=yahoosearchwebrss&amp;query=link:http://jod.id.au/tutorial/naked-metadata.html" />
                <link rel="DC.subject"
href="http://del.icio.us/rss/url/e43f0f84e421ed5de166b285eca30468" />
                <link rel="DC.description"
href="http://www.blogdigger.com/rssLinkSearch.jsp?link=http://jod.id.au/tutorial/wild-metadata.html" />
        </head>
        <body>
        </body>
</html>
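
In XHTML, an ampersand inside an attribute value has to be written as
&amp;, which matters for the Yahoo! query URL in the example.  A minimal
Python sketch of generating such link elements follows.  The function
name dc_link is hypothetical and not part of any DC tool; it only shows
one way to let the standard library do the escaping.

```python
from xml.sax.saxutils import quoteattr


def dc_link(rel: str, href: str) -> str:
    """Return one XHTML <link> element as a string.

    quoteattr() wraps the value in quotes and escapes '&', '<' and '>',
    so a query string such as '?a=1&b=2' becomes '?a=1&amp;b=2'.
    """
    return f"<link rel={quoteattr(rel)} href={quoteattr(href)} />"


# Feed URL copied from the example above.
print(dc_link("DC.subject",
              "http://del.icio.us/rss/url/e43f0f84e421ed5de166b285eca30468"))
```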


** Background **

At DC-ANZ 2005, David Hawking (Panoptic, CSIRO) convinced me that
DC.description and DC.subject metadata aren't very useful finding aids
when the full text of a Web page is indexed.  He showed a comparison of
searches based on subject and description metadata versus searches
based on anchor text alone, and the anchor text search was just as
effective. [1, 2]

Aside from Web page authors, lots of people spend time indexing and
categorising Web pages. They build links, write blog entries and tag
pages in folksonomies. This metadata is wild - it is not crafted or
controlled by the agency who created the page.  It hasn't been
commissioned and it represents a variety of world views.  Individually,
these pieces of metadata may not be very useful.  In numbers, however,
the irregularities begin to smooth out and the information may be as
good as or better than metadata written by a Web page author.

The quality will not be as good as that of metadata applied by trained
librarians using a standardised system and controlled vocabularies.  It
will, however, be as good as or better than that of metadata applied by
untrained people to their own pages.  It will also be better than no
metadata at all.

** Method **
A rough and ready method consists of finding pages that display anchor
text, weblog summaries and folksonomy tags for a given page.
Preference is given to pages that provide results in a well-formed XML
format, as these assist the harvesting process.

* Anchor text *
Yahoo! provides a good listing of anchor text terms via their ability
to find pages that link to a specified URL.
The syntax for Yahoo! is:

http://api.search.yahoo.com/WebSearchService/rss/webSearch.xml?appid=yahoosearchwebrss&query=link:http://jod.id.au/tutorial/naked-metadata.html


* Weblogs *
Weblog search engines like Blogdigger will show blog entries for a
given URL. These can be used as descriptions of the page.
The format for Blogdigger is:

http://www.blogdigger.com/rssLinkSearch.jsp?link=http://jod.id.au/tutorial/wild-metadata.html
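
The two URL patterns above (Yahoo! and Blogdigger) can be built from a
page URL with a few lines of Python.  This is only a sketch: the
endpoints and the appid value are copied from this post, the function
names are made up for illustration, and nothing here checks whether the
services still answer.  Keeping ':' and '/' unescaped reproduces the
URLs exactly as written above, while characters such as '&' or '?' in
the page URL are percent-encoded so they cannot break the query string.

```python
from urllib.parse import urlencode

# Endpoints as given in this post (availability not verified).
YAHOO_ENDPOINT = "http://api.search.yahoo.com/WebSearchService/rss/webSearch.xml"
BLOGDIGGER_ENDPOINT = "http://www.blogdigger.com/rssLinkSearch.jsp"


def yahoo_anchor_feed(page_url: str, appid: str = "yahoosearchwebrss") -> str:
    """Feed of pages linking to page_url (source of anchor text)."""
    params = {"appid": appid, "query": "link:" + page_url}
    return YAHOO_ENDPOINT + "?" + urlencode(params, safe=":/")


def blogdigger_feed(page_url: str) -> str:
    """Feed of weblog entries that link to page_url."""
    return BLOGDIGGER_ENDPOINT + "?" + urlencode({"link": page_url}, safe=":/")


print(yahoo_anchor_feed("http://jod.id.au/tutorial/naked-metadata.html"))
print(blogdigger_feed("http://jod.id.au/tutorial/wild-metadata.html"))
```

When one of these URLs is placed in an XHTML href attribute, the '&'
between the parameters still has to be written as &amp;.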


* Folksonomies *
I could only find one folksonomy (del.icio.us) that had a syntax for
searching by URL.  Unfortunately, I could not find a simple way to use
this syntax.  Del.icio.us allocates a unique number to each URL.
Therefore, before you can construct the feed URL, you need to discover
what that number is.

+       For example, I created the page:
        http://jod.id.au/tutorial/wild-metadata.html
+       I then tagged it in Del.icio.us.
+       I then searched for it in Del.icio.us.
+       Del.icio.us told me that this URL could be referenced in RDF format
at:
        http://del.icio.us/rss/url/e43f0f84e421ed5de166b285eca30468
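
The identifier in the feed URL above is 32 hexadecimal characters long,
which is the shape of an MD5 digest.  Assuming (this is not stated in
the post and has not been checked against the example) that Del.icio.us
derives it by hashing the page URL with MD5, the feed URL could be
constructed without the manual search step.  The function name below is
hypothetical.

```python
import hashlib


def delicious_feed(page_url: str) -> str:
    """Build a Del.icio.us per-URL feed address.

    ASSUMPTION: the identifier is the MD5 hex digest of the page URL.
    If the service uses some other scheme, this will not match.
    """
    digest = hashlib.md5(page_url.encode("utf-8")).hexdigest()
    return "http://del.icio.us/rss/url/" + digest


print(delicious_feed("http://jod.id.au/tutorial/wild-metadata.html"))
```

If the printed address differs from the one Del.icio.us reported, the
assumption is wrong and the manual lookup described above is still
needed.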


** Harvesting **
It is all well and good to put metadata into a document. You have to be
able to get it out again for it to be any use.

Both Yahoo! and Del.icio.us provide their results in RSS or Atom
format.  While this makes the results machine-readable (and
machine-harvestable), it doesn't make them easy for a mere mortal to read.

I'm not sure if there are DC.metadata harvesters that can parse RSS or
Atom feeds as metadata.  The possibility exists - I just can't point to
an example.
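
As a rough illustration of what such a harvester might do, here is a
Python sketch using only the standard library.  It is not an existing
DC harvester: the function names and the sample page and feed are made
up, and a real tool would also have to fetch the feeds over the network
and cope with pages that are not well-formed XML.  The first function
pulls the DC links out of an XHTML page; the second reads item titles
from an RSS or Atom feed.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def dc_links(xhtml_text: str) -> list:
    """Return (rel, href) pairs for <link> elements whose rel starts with 'DC.'."""
    root = ET.fromstring(xhtml_text)
    return [(link.get("rel"), link.get("href"))
            for link in root.iter(XHTML_NS + "link")
            if link.get("rel", "").upper().startswith("DC.")]


def feed_titles(feed_text: str) -> list:
    """Return the title of every RSS <item> or Atom <entry> in a feed."""
    titles = []
    for element in ET.fromstring(feed_text).iter():
        if local_name(element.tag) in ("item", "entry"):
            for child in element:
                if local_name(child.tag) == "title" and child.text:
                    titles.append(child.text.strip())
    return titles


# Hypothetical page and feed, for illustration only.
SAMPLE_PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <link rel="schema.dc" href="http://purl.org/dc/elements/1.1/" />
    <link rel="DC.subject"
          href="http://del.icio.us/rss/url/e43f0f84e421ed5de166b285eca30468" />
  </head>
  <body></body>
</html>"""

SAMPLE_FEED = """<rss version="2.0"><channel><title>Hypothetical results</title>
  <item><title>wild metadata</title></item>
  <item><title>folksonomy</title></item>
</channel></rss>"""

print(dc_links(SAMPLE_PAGE))
print(feed_titles(SAMPLE_FEED))
```

The titles collected this way would be the candidate subject or
description terms; how to weight or merge them is left open.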

** Advantages **
+       Wild metadata adds multiple voices to a metadata record.  For
example, wild metadata might exist in different languages.
+       Wild metadata does not cost the Web page author anything, either in
terms of time or money.

** Disadvantages **

+       New pages will not have any wild metadata.  Wild metadata builds
over time.
+       Unpopular pages will not gather much wild metadata.  Wild metadata
clusters to popular pages.

** References **
[1]     David Hawking (Panoptic, CSIRO), 30 May 2005, "Poor search
facilities cost money - is metadata the answer?" DC-ANZ 2005,
http://es.csiro.au/Presentations/Hawking_DC-ANZ.pdf (accessed 13
November 2005).

[2]     David Hawking and Justin Zobel, (Panoptic, CSIRO), Forthcoming,
"Does Topic Metadata Help with Web Search?", Journal of the American
Society for Information Science and Technology (JASIST).
--
Jonathan O'Donnell
mailto:[EMAIL PROTECTED]
http://purl.nla.gov.au/net/jod
+61 4 2575 5829

******************************************************
The discussion list for  http://webstandardsgroup.org/

See http://webstandardsgroup.org/mail/guidelines.cfm
for some hints on posting to the list & getting help
******************************************************


________________________________________________
Steven C. Perkins, J.D., M.L.L
[EMAIL PROTECTED]
Coordinator of Reference Services
University of Houston Libraries

________________________________________________
