I tried a couple of times to rewrite this, but it grows out of bounds anyhow. It seems to have a life of its own.
There is a book from 2000 by Robert Dale and Ehud Reiter, Building Natural Language Generation Systems (ISBN 978-0-521-02451-8).

Wikibase items can be rebuilt as Plans from the type statement (top-down) or as Constituents from the other statements (bottom-up). The two models do not necessarily agree. This is, however, only the overall document structure and the organizing of the data, and it leaves out the really hard part: the language-specific realization.

You can probably redefine Plans and Constituents as entities and put them into Wikidata (I have toyed around with them as Lua classes). The easiest way to reuse them locally would be to use a lookup structure for fully or partly canned text, and to define rules for agreement and inflection as part of these texts. Piecing together canned text is hard, but easier than building full prose from the bottom. It is possible to define a very low-level realization for some languages, but that is a lot harder.

The idea for lookup of canned text is to use the text that covers most of the available statements, but still such that most of the remaining statements can also be covered. That is, some canned text might not support a specific agreement rule, so some other canned text cannot reference it and less coverage is achieved. For example, if the direction to the sea cannot be expressed in a canned text for Finnish, then the text for the distance cannot reference the direction.

To get around this I prioritized Plans and Constituents, with those having higher priority being put first. What a person is known for should go in front of their other work. I ordered the Plans and Constituents chronologically to maintain causality. This can also be called sorting. Priority tends to influence Plans, and order influences Constituents. Then there is grouping, which keeps some statements together. Length, width, and height are typically a group.
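
A minimal sketch of the priority and ordering idea, written in Python for brevity (on-wiki it would more likely be a Lua module). The class names, fields, and sample data are all made up for illustration and are not taken from the book or from any existing module: Plans are sorted by descending priority, and the Constituents inside each Plan are sorted chronologically.

```python
from dataclasses import dataclass, field


@dataclass
class Constituent:
    text: str   # hypothetical, already realized fragment
    year: int   # hypothetical point in time, used for chronological order


@dataclass
class Plan:
    name: str
    priority: int                       # higher priority is placed first
    constituents: list = field(default_factory=list)


def arrange(plans):
    """Order Plans by priority, and Constituents within a Plan by time."""
    ordered = sorted(plans, key=lambda p: -p.priority)
    return [
        (p.name, [c.text for c in sorted(p.constituents, key=lambda c: c.year)])
        for p in ordered
    ]


# Made-up example: what a person is known for goes before other work.
plans = [
    Plan("other work", 1, [Constituent("taught at a school", 1902),
                           Constituent("worked as a clerk", 1895)]),
    Plan("known for", 5, [Constituent("published the main work", 1905)]),
]
arranged = arrange(plans)
```

Grouping is left out here; it could be modelled as one more sort key that keeps related statements adjacent.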
A lake can be described with individual canned texts for length, width, and height, but those are given low priority. Then a canned text can be made for length and height, with somewhat higher priority. An even higher priority can be given to a canned text for all three. Given that all three statements are available, the composite canned text for all of them will be used. If only some of them exist, a lower-priority canned text will be used.

Note that the book uses "canned text" a little differently. Also note that the canned texts can be translated as ordinary message strings. They can also be defined as a kind of entity in Wikidata. As ordinary message strings they need additional data, but that comes naturally as entities in Wikidata. My doodling put them inside each Wikipedia, as they would be easier to reuse from Lua modules. (And yes, you can then override part of the ArticlePlaceholder to show the text at the special page.)
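
The lake example can be sketched roughly as follows, again in Python and with invented names and wording for the templates. This is a simple greedy pass by priority, which is an assumption on my part: it does not check that the remaining statements stay coverable, and it ignores agreement and inflection rules entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CannedText:
    needs: frozenset   # statements this text consumes
    priority: int      # composite texts get higher priority
    template: str      # hypothetical English wording


TEMPLATES = [
    CannedText(frozenset({"length"}), 1, "The lake is {length} long."),
    CannedText(frozenset({"width"}), 1, "The lake is {width} wide."),
    CannedText(frozenset({"height"}), 1, "The lake is {height} high."),
    CannedText(frozenset({"length", "height"}), 2,
               "The lake is {length} long and {height} high."),
    CannedText(frozenset({"length", "width", "height"}), 3,
               "The lake is {length} long, {width} wide and {height} high."),
]


def realize(statements, templates=TEMPLATES):
    """Pick canned texts by descending priority until nothing more fits."""
    remaining = dict(statements)
    sentences = []
    for text in sorted(templates, key=lambda t: -t.priority):
        # Use a text only if every statement it needs is still unused.
        if text.needs.issubset(remaining):
            sentences.append(text.template.format(**remaining))
            for key in text.needs:
                del remaining[key]
    return sentences


all_three = realize({"length": "12 km", "width": "3 km", "height": "40 m"})
only_two = realize({"length": "12 km", "width": "3 km"})
```

With all three statements the composite text wins and the single-statement texts are never reached; with only length and width, the two low-priority texts are used instead.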