On 7/14/2011 4:48 AM, John Breslin wrote:
> Have a read of this conversation on Google+:
> https://plus.google.com/u/0/111657459034372773496/posts/ZvS2KrBx8iF
> And see what you think...
> John
I think first of all you've got to separate "SIOC as a data model"
vs "web sites self-publishing SIOC". SIOC, FOAF and all that are data
models that have value even if you use them internally or use them to
make assertions about third-party web sites. (I've always been one to
move the mountain to Mohamed rather than the other way around.)
Perhaps I've gone down the rabbit hole, but I've spent the last
few months with an NER system that threatens to get 200% recall; yes
it's got problems with precision but I'm convinced that those can be
helped by making the system more aggressive (trying to resolve more
possible interpretations) rather than less. I'm bullish about the next
10 years for IX (information extraction).
"Self-publishing X" where X is any "semantic" data format (SIOC,
schema.org, ...) competes with intelligent systems that can infer
similar information just by looking at the text. 10 years from now, al
Schema.org particularly puzzles me because I can't believe it's not
already obsolete from Google's perspective. I mean, look at this example:
<div itemscope itemtype="http://schema.org/LocalBusiness">
  <h1><span itemprop="name">Beachwalk Beachwear &amp; Giftware</span></h1>
  <span itemprop="description"> A superb collection of fine gifts and clothing
    to accent your stay in Mexico Beach.</span>
  <div itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">
    <span itemprop="streetAddress">3102 Highway 98</span>
    <span itemprop="addressLocality">Mexico Beach</span>,
    <span itemprop="addressRegion">FL</span>
  </div>
  Phone: <span itemprop="telephone">850-648-4200</span>
</div>
Maybe Bing actually expects to use this as something other than a
training set, but I'd be pretty shocked if Google didn't already have a
system that could pick out
3102 Highway 98
Mexico Beach, FL
out of free text, segment it into parts, geocode it, and check
it against a database of small businesses. (Remember that a database of
all the streets in the US fits on an SD card.) And really, how hard can
it be to parse a line like this?
Phone: 850-648-4200
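To make the point concrete, here is a rough Python sketch of pulling the same fields out of plain text with two regular expressions. The patterns, the function name extract_contact and the sample sentence are my own illustration, not anything a search engine is known to run; they only handle a street line followed by "City, ST" on the next line (or after a comma), and they skip the geocoding and business-database check entirely.

```python
import re

# Illustrative patterns only (assumptions, not a real address parser).
# Street line, then a newline or comma, then "City, ST".
ADDRESS_RE = re.compile(
    r"(?P<number>\d+)\s+"
    r"(?P<street>[A-Z][A-Za-z]*(?: [A-Za-z0-9]+)*)"
    r"\s*[\n,]\s*"
    r"(?P<city>[A-Z][A-Za-z]*(?: [A-Z][A-Za-z]*)*),\s*"
    r"(?P<region>[A-Z]{2})\b"
)

# US-style phone number written as three groups with separators.
PHONE_RE = re.compile(r"\b(\d{3})[-. ](\d{3})[-. ](\d{4})\b")


def extract_contact(text):
    """Return schema.org-style fields found in free text (hypothetical helper)."""
    result = {}
    address = ADDRESS_RE.search(text)
    if address:
        result["streetAddress"] = (
            address.group("number") + " " + address.group("street")
        )
        result["addressLocality"] = address.group("city")
        result["addressRegion"] = address.group("region")
    phone = PHONE_RE.search(text)
    if phone:
        result["telephone"] = "-".join(phone.groups())
    return result


sample = (
    "Stop by Beachwalk at 3102 Highway 98\n"
    "Mexico Beach, FL for gifts. Phone: 850-648-4200"
)
contact = extract_contact(sample)
```

A real system would lean on a street database and a statistical segmenter rather than hand-written patterns, but even this much recovers the four fields the markup spells out.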
Schema.org masquerades as a "semantic data format" but it's really
a way to get a big training set so ultimately an IX system can get all
that stuff without the training wheels.
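Here is a small Python sketch of what "markup as training set" could look like in practice: walk the microdata and emit (label, text) pairs that a learner could use as supervised examples. This is only my guess at the shape of such a pipeline; the class name ItempropCollector and the helper labeled_examples are made up for illustration, and the parser is the standard library's html.parser, not a full microdata implementation.

```python
from html.parser import HTMLParser

# Elements with no closing tag; skipped so the open-element stack stays balanced.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class ItempropCollector(HTMLParser):
    """Collect (itemprop, text) pairs from leaf microdata elements."""

    def __init__(self):
        super().__init__()
        self.stack = []  # one entry per open element: itemprop name or None
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        # An element with itemscope is a container for a nested item,
        # so its own text is not treated as a label value.
        label = None if "itemscope" in attrs else attrs.get("itemprop")
        self.stack.append(label)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse runs of whitespace
        if text and self.stack and self.stack[-1]:
            self.pairs.append((self.stack[-1], text))


def labeled_examples(html):
    """Return (label, text) pairs usable as supervised examples (hypothetical helper)."""
    collector = ItempropCollector()
    collector.feed(html)
    collector.close()
    return collector.pairs


SAMPLE = """
<div itemscope itemtype="http://schema.org/LocalBusiness">
  <h1><span itemprop="name">Beachwalk Beachwear &amp; Giftware</span></h1>
  <div itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">
    <span itemprop="streetAddress">3102 Highway 98</span>
    <span itemprop="addressLocality">Mexico Beach</span>,
    <span itemprop="addressRegion">FL</span>
  </div>
  Phone: <span itemprop="telephone">850-648-4200</span>
</div>
"""
pairs = labeled_examples(SAMPLE)
```

Every page that carries the markup hands over pairs like these for free, which is why it looks more like labeled data than like a publishing format.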
Don't compete fairly in two-sided markets. The formula for success
is to do something insanely unfair to knock out one side of the market
and get control of the other side before anyone knows what's going on.
--
You received this message because you are subscribed to the Google Groups
"SIOC-Dev" group.