On 7/14/2011 4:48 AM, John Breslin wrote:
Have a read of this conversation on Google+:

https://plus.google.com/u/0/111657459034372773496/posts/ZvS2KrBx8iF

And see what you think...

John

I think first of all you've got to separate "SIOC as a data model" vs "web sites self-publishing SIOC". SIOC, FOAF and all that are data models that have value even if you use them internally or use them to make assertions about third-party web sites. (I've always been one to move the mountain to Mohamed rather than the other way around.)

Perhaps I've gone down the rabbit hole, but I've spent the last few months with an NER system that threatens to get 200% recall. Yes, it's got problems with precision, but I'm convinced that those can be helped by making the system more aggressive (trying to resolve more possible interpretations) rather than less. I'm bullish about the next 10 years for IX (information extraction).

"Self-publishing X", where X is any "semantic" data format (SIOC, schema.org, ...), competes with intelligent systems that can infer similar information just by looking at the text. 10 years from now, al... Schema.org particularly puzzles me because I can't believe it's not already obsolete from Google's perspective. I mean, look at this example:

<div itemscope itemtype="http://schema.org/LocalBusiness">
<h1><span itemprop="name">Beachwalk Beachwear &amp; Giftware</span></h1>
<span itemprop="description"> A superb collection of fine gifts and clothing
  to accent your stay in Mexico Beach.</span>
<div itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">
<span itemprop="streetAddress">3102 Highway 98</span>
<span itemprop="addressLocality">Mexico Beach</span>,
<span itemprop="addressRegion">FL</span>
</div>
  Phone: <span itemprop="telephone">850-648-4200</span>
</div>

Maybe Bing actually expects to use this as something other than a training set, but I'd be pretty shocked if Google didn't already have a system that could pick out

3102 Highway 98
Mexico Beach, FL

from free text, segment it into parts, geocode it, and check it against a database of small businesses. (Remember that a database of all the streets in the US fits on an SD card.) And really, it's so hard to parse

Phone: 850-648-4200
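
To make that concrete, here's a rough sketch of pulling the same fields out of unmarked text. The two regular expressions and the function name extract_contact are my own illustrative assumptions, tuned to this one example; they are nowhere near what a production system would use, and the geocoding and business-database checks are left out entirely.

```python
import re

# Assumed, deliberately naive patterns: a street line starting with a house
# number, followed on the next line by "City, ST". Real address parsing
# needs far more than this.
ADDRESS_RE = re.compile(
    r"(?P<streetAddress>\d+ [A-Za-z0-9 .]+)\s+"
    r"(?P<addressLocality>[A-Z][A-Za-z]+(?: [A-Z][A-Za-z]+)*),\s*"
    r"(?P<addressRegion>[A-Z]{2})\b"
)
# US-style NNN-NNN-NNNN phone numbers only.
PHONE_RE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")

SAMPLE = """Beachwalk Beachwear & Giftware
A superb collection of fine gifts and clothing to accent your stay in Mexico Beach.
3102 Highway 98
Mexico Beach, FL
Phone: 850-648-4200
"""


def extract_contact(text):
    """Return whatever address parts and phone number the patterns find."""
    result = {}
    address = ADDRESS_RE.search(text)
    if address:
        # Group names mirror the schema.org property names in the markup.
        result.update(address.groupdict())
    phone = PHONE_RE.search(text)
    if phone:
        result["telephone"] = phone.group(0)
    return result


if __name__ == "__main__":
    print(extract_contact(SAMPLE))
```

The group names are borrowed from the itemprop values in the markup, so the output lines up field for field with what the publisher was asked to annotate by hand.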

Schema.org masquerades as a "semantic data format" but it's really a way to get a big training set so ultimately an IX system can get all that stuff without the training wheels.
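
Here's a minimal sketch of what I mean by "training set", using only Python's standard html.parser module. The class name ItempropCollector and the helper labeled_pairs are hypothetical names of mine, and the code assumes well-nested markup with no void elements such as <br>. It just turns the annotated page into (label, text) pairs, which is the shape of data you'd feed to a supervised extractor. I'm not claiming this is how any search engine actually consumes the markup.

```python
from html.parser import HTMLParser

MARKUP = """
<div itemscope itemtype="http://schema.org/LocalBusiness">
<h1><span itemprop="name">Beachwalk Beachwear &amp; Giftware</span></h1>
<div itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">
<span itemprop="streetAddress">3102 Highway 98</span>
<span itemprop="addressLocality">Mexico Beach</span>,
<span itemprop="addressRegion">FL</span>
</div>
  Phone: <span itemprop="telephone">850-648-4200</span>
</div>
"""


class ItempropCollector(HTMLParser):
    """Collect (itemprop, text) pairs from microdata-annotated HTML."""

    def __init__(self):
        super().__init__()
        self._stack = []  # one entry per open element: its label or None
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # An element that opens a nested item (itemscope) holds structure,
        # not a text value, so it contributes no label of its own.
        if "itemscope" in attributes:
            self._stack.append(None)
        else:
            self._stack.append(attributes.get("itemprop"))

    def handle_endtag(self, tag):
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse runs of whitespace
        if text and self._stack and self._stack[-1]:
            self.pairs.append((self._stack[-1], text))


def labeled_pairs(html):
    parser = ItempropCollector()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return parser.pairs


if __name__ == "__main__":
    for label, text in labeled_pairs(MARKUP):
        print(label, "->", text)
```

Run that over a few million pages and you have labeled examples of names, street addresses and phone numbers sitting right next to the free text they came from.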

Don't compete fairly in two-sided markets. The formula for success is to do something insanely unfair to knock out one side of the market and get control of the other side before anyone knows what's going on.

--
You received this message because you are subscribed to the Google Groups 
"SIOC-Dev" group.
