I have tried a couple of times to rewrite this, but it grows out of
bounds anyhow. It seems to have a life of its own.

There is a book from 2000 by Ehud Reiter and Robert Dale: Building
Natural Language Generation Systems, ISBN 978-0-521-02451-8.

Wikibase items can be rebuilt as Plans from the type statement
(top-down) or as Constituents from the other statements (bottom-up).
The two models do not necessarily agree. This is, however, only the
overall document structure and the organization of the data; it leaves
out the really hard part, the language-specific realization.
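
Roughly, and only as a sketch of the two directions (the table layout
and the use of P31 as the type statement are just my assumptions about
a simplified item structure):

local function buildPlan( item )
    -- top-down: the type statement (here P31) selects an overall document plan
    local itemType = item.claims['P31'] and item.claims['P31'][1]
    return { type = itemType, sections = {} }
end

local function buildConstituents( item )
    -- bottom-up: every other statement becomes a constituent of its own
    local constituents = {}
    for property, claims in pairs( item.claims ) do
        if property ~= 'P31' then
            for _, claim in ipairs( claims ) do
                table.insert( constituents, { property = property, claim = claim } )
            end
        end
    end
    return constituents
end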

You could probably redefine Plans and Constituents as entities and put
them into Wikidata; I have toyed around with them as Lua classes. The
easiest way to reuse them locally would be a lookup structure for fully
or partly canned text, with rules for agreement and inflection defined
as part of those texts. Piecing together canned text is hard, but
easier than building full prose from the bottom up. It is possible to
define a very low-level realization for some languages, but that is a
lot harder.
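
As a sketch of what such a lookup structure could look like (the field
names and the agreement notation are just my own doodling, and the
property IDs are only for illustration):

local cannedTexts = {
    length = {
        covers = { 'P2043' },               -- length
        text = '$1 is $2 long.',
        agreement = { gender = 'neuter' },  -- exposed to texts that reference this one
    },
    lengthAndWidth = {
        covers = { 'P2043', 'P2049' },      -- length and width
        text = '$1 is $2 long and $3 wide.',
        priority = 2,
    },
}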

The idea behind the canned-text lookup is to use the text that covers
the most of the available statements, but still such that most of the
remaining statements can also be covered. That is, one kind of canned
text might not support a specific agreement rule, so some other canned
text cannot reference it, and less coverage is achieved. For example,
if the direction to the sea cannot be expressed in a canned text for
Finnish, then the distance cannot reference the direction.
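
A sketch of that dependency check, under the assumption that each
canned text lists the agreement features it exposes and the features it
requires from the text it references (all the field names are
hypothetical):

local function usableTexts( texts )
    local usable = {}
    for name, text in pairs( texts ) do
        local ok = true
        if text.references then
            local target = texts[text.references]
            for _, feature in ipairs( text.requires or {} ) do
                if not ( target and target.agreement and target.agreement[feature] ) then
                    ok = false  -- e.g. the distance text cannot reference the direction text
                end
            end
        end
        if ok then
            usable[name] = text
        end
    end
    return usable
end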

To get around this I prioritized Plans and Constituents, with those
having higher priority being put first. What a person is known for
should go in front of their other work. I also ordered the Plans and
Constituents chronologically to maintain causality; this can also be
called sorting. Priority tends to influence Plans, and order influences
Constituents. Then there is grouping, which keeps some statements
together. Length, width, and height are typically a group.
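
As a sketch, assuming each constituent carries hypothetical 'priority'
and 'order' fields, and that members of a group are simply given the
same priority so they end up next to each other:

local function sortConstituents( constituents )
    table.sort( constituents, function( a, b )
        if ( a.priority or 0 ) ~= ( b.priority or 0 ) then
            return ( a.priority or 0 ) > ( b.priority or 0 )  -- higher priority first
        end
        return ( a.order or 0 ) < ( b.order or 0 )            -- chronological within equal priority
    end )
    return constituents
end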

A lake can be described with individual canned texts for length, width,
and height, but those are given low priority. Then a canned text can be
made for length and height, with somewhat higher priority. An even
higher priority can be given to a canned text for all three. Given that
all three statements are available, the composite canned text for all
of them will be used. If only some of them exist, a lower-priority
canned text will be used instead.
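
A sketch of that selection, using the real properties P2043 (length),
P2049 (width) and P2048 (height) but otherwise hypothetical names and
texts:

local lakeTexts = {
    { covers = { 'P2043' },                   priority = 1, text = 'It is $length long.' },
    { covers = { 'P2043', 'P2048' },          priority = 2, text = 'It is $length long and $height high.' },
    { covers = { 'P2043', 'P2049', 'P2048' }, priority = 3, text = 'It is $length long, $width wide and $height high.' },
}

local function pickText( texts, available )
    local best
    for _, candidate in ipairs( texts ) do
        local covered = true
        for _, property in ipairs( candidate.covers ) do
            if not available[property] then covered = false end
        end
        if covered and ( not best or candidate.priority > best.priority ) then
            best = candidate
        end
    end
    return best
end

-- with all three statements available the composite text wins,
-- with only the length available the low-priority text is used
pickText( lakeTexts, { P2043 = true, P2049 = true, P2048 = true } )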

Note that the book uses "canned text" a little differently.

Also note that the canned texts can be translated as ordinary message
strings. They can also be defined as a kind of entity in Wikidata. As
ordinary message strings they would need additional data, but that data
comes naturally if they are defined as entities in Wikidata. My
doodling put them inside each Wikipedia, as that would make them easier
to reuse from Lua modules. (And yes, you can then override part of the
ArticlePlaceholder to show the text on the special page.)
