For a while I've noticed that your messages don't show up properly in
Windows Live Mail.
Several things are pushing me towards Gmail anyway, but you should
fix this, because you can assume that you hear from maybe 1% of the
people who have a problem.
-Original Message
Just as a suggestion, you can turn these kinds of numbers into a probability
distribution using the beta distribution. If you use (1,1) as a prior you
get something like beta(251,1) for the distribution of the probability
that somebody named "Aaron" is male.
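A minimal sketch of that update, assuming the counts behind beta(251,1) were 250 people recorded as male and none as female (the counts and the function names are illustrative, not from the original message). With a Beta(1,1) prior, each observation adds one to the matching parameter. Because the second parameter is 1, the CDF is simply x**a, so a lower credible bound can be computed without any library.

```python
def beta_posterior(successes, failures, prior_a=1.0, prior_b=1.0):
    """Return (a, b) of the Beta posterior after binomial observations."""
    return prior_a + successes, prior_b + failures


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


def quantile_when_b_is_one(a, tail=0.05):
    """Quantile of Beta(a, 1): its CDF is x**a, so the quantile is tail**(1/a)."""
    return tail ** (1.0 / a)


# Assumed counts: 250 people named "Aaron" recorded as male, 0 as female.
a, b = beta_posterior(250, 0)
print(a, b)                       # parameters of the posterior
print(beta_mean(a, b))            # posterior mean of P(male)
print(quantile_when_b_is_one(a))  # value exceeded with 95% posterior probability
```

The point of the distribution over a single ratio is that a name seen 5 times and a name seen 250 times get visibly different uncertainty even when both are 100% male in the data.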
-Original Message-
Fro
I’d say that Wikidata is almost implied by the fundamental flaw of DBpedia;
since DBpedia is based on parsing inexact and varied markup, there is a lot
of complexity in getting the accuracy high, particularly in the problem that
it’s hard to interact with Wikipedia with an automated syste
I’d like to see assertions of the sort
“Picture B represents topic X”
in Commons. One can easily infer this for some pictures by noticing that
“Picture B is included in the encyclopedia entry for topic X”, but often there
are so many pictures of the topic that they aren’t all included in the
I feel strongly in favor of one-line-per-fact.
Large RDF data sets have validity problems, and the difficulty of
convincing publishers that this matters indicates that this situation will
continue.
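One practical payoff of one-line-per-fact is that a bad fact costs you one line, not the whole file. Here is a rough sketch, assuming a line-oriented format along the lines of N-Triples; the pattern is deliberately simplified (no blank nodes, language tags or datatypes) and the names are made up for illustration.

```python
import re

# Simplified pattern: IRI subject, IRI predicate, and an object that is either
# an IRI or a plain double-quoted literal. The real grammar allows more.
TRIPLE = re.compile(
    r'^<[^<>\s]+>\s+<[^<>\s]+>\s+(<[^<>\s]+>|"(?:[^"\\]|\\.)*")\s*\.\s*$'
)


def split_valid(lines):
    """Separate well-formed lines from rejects without aborting the whole input."""
    good, bad = [], []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        target = good if TRIPLE.match(line) else bad
        target.append((number, line.rstrip("\n")))
    return good, bad


sample = [
    '<http://example.org/a> <http://example.org/p> "ok" .\n',
    '<http://example.org/a> <http://example.org/p> "unterminated .\n',
    '<http://example.org/b> <http://example.org/p> <http://example.org/c> .\n',
]
good, bad = split_valid(sample)
print(len(good), "kept;", len(bad), "rejected at lines", [n for n, _ in bad])
```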
I’ve thought a bit about the problem of the “streaming converter from
Turtle to
Over time people have gotten the message that you shouldn't write XML
like
System.out.println(""+someString+"")
because it is something that usually ends in tears.
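To make the failure concrete, here is a small sketch in Python rather than Java. The element name "title" is hypothetical, since the tags in the snippet above did not survive. It compares naive concatenation with escaping the text first and with letting a library serialize the element.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

some_string = "Tom & Jerry <live>"

# Naive concatenation: the special characters go into the document unescaped.
naive = "<title>" + some_string + "</title>"

# Option 1: escape the text before concatenating.
escaped = "<title>" + escape(some_string) + "</title>"

# Option 2: let a library build and serialize the element.
element = ET.Element("title")
element.text = some_string
serialized = ET.tostring(element, encoding="unicode")


def is_well_formed(document):
    """Return True if the string parses as XML."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(naive))
print(is_well_formed(escaped))
print(is_well_formed(serialized))
```

The naive version only works as long as nobody ever passes in a string containing an ampersand or an angle bracket.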
Although (most) RDF toolkits are like XML toolkits in that they choke on
invalid data, people who write RDF seem to
Here are my 2 cents.
I have paid my dues writing CRUD apps for business. They all want the same
thing, something that keeps track of entities and controls how the
organization interacts with those entities.
In one year, for instance, I worked on systems for an academic department
and a lo
I would say that GND is a “good enough” answer.
Most named entities are persons, organizations, events, creative works and
places and these are all mutually exclusive. There ought to be a system
interlock to prevent confusion between them.
“Organism Classification” or whatever you
You’ll do better dealing with bad coordinates if your system can recognize
how bad particular cases are.
The worst error I see in Wikipedia is that sometimes people get east and
west confused, so there is this mirror image of Europe reflected across the
U.K. You find cute little Cze
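One way to recognize the east/west case specifically is to test whether flipping the sign of the longitude moves a point back into the region it is supposed to be in. This is only a sketch: the bounding box is a made-up rectangle over part of central Europe, and the class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Axis-aligned box in degrees; assumes it does not cross the antimeridian."""
    south: float
    west: float
    north: float
    east: float

    def contains(self, lat, lon):
        return self.south <= lat <= self.north and self.west <= lon <= self.east


def classify(lat, lon, expected):
    """Label a coordinate relative to the region it is supposed to be in."""
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        return "out of range"
    if expected.contains(lat, lon):
        return "plausible"
    if expected.contains(lat, -lon):
        return "east/west flipped"
    return "unexplained"


# Hypothetical region for illustration only.
region = BoundingBox(south=48.0, west=12.0, north=51.5, east=19.0)

print(classify(50.08, 14.43, region))
print(classify(50.08, -14.43, region))
print(classify(10.0, 100.0, region))
print(classify(95.0, 14.43, region))
```

Grading errors this way lets a system repair the flipped cases automatically and hold back only the unexplained ones for review.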
I think Poland may do better than average because Polish people, out of
national pride, have made a special effort to be well documented in English
Wikipedia and represent a Polish point-of-view on topics like the city of
Gdansk.
One fascinating thing about Wikidata is that it provides
I took a look at that and was concerned about the plan to release a future
data dump in the RDF/XML format.
Most people these days think of RDF/XML as obsolete and see the future in the
Turtle family of languages. RDF/XML has various problems: it can't express
all legal RDF statements and it
Statistical methods can deal with black swans, but you've got to get
away from normal distributions and also model the risk that your model is
wrong.
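To give a feel for how much the choice of distribution matters in the tails, the sketch below compares the probability of exceeding k under a standard normal with the same probability under a standard Cauchy. The Cauchy is my own pick as a deliberately extreme heavy-tailed example, not something the argument depends on.

```python
import math


def normal_tail(k):
    """P(X > k) for a standard normal variable."""
    return 0.5 * math.erfc(k / math.sqrt(2.0))


def cauchy_tail(k):
    """P(X > k) for a standard Cauchy variable, a very heavy-tailed case."""
    return 0.5 - math.atan(k) / math.pi


for k in (2, 5, 10):
    print(k, normal_tail(k), cauchy_tail(k))
```

Under the normal model an event at k = 5 is close to impossible; under the heavy-tailed model it is merely uncommon.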
Since training sets come from the same place sausage comes from,
training sets in machine learning rarely teach the algorithm the correct
From my viewpoint, biases are an issue of statistical sampling.
Wikipedia is an encyclopedia by humans for humans so of course it has an
anthropocentric background, in which the mass of all the concepts swirling
around the Earth like an atmosphere curves the graph, keeping the Sun in
o
On 6/20/2012 6:39 AM, Lydia Pintscher wrote:
> Heya folks :)
>
> We've published a version of the data model that is less technical and
> that should be easier to understand than the very detailed existing
> one. You can find it at
> http://meta.wikimedia.org/wiki/Wikidata/Notes/Data_model_primer
>
Hey guys,
I think for a long time semantic projects have focused on
getting data out but haven't incorporated the 'voice of the consumer';
yet, if you think of data quality as 'suitable for customer
requirements' instead of 'process conforms to specification,' this is
the first st
16 matches