Hi Jeroen,
We have had similar requests a couple of times already. I think, though,
that having
distribution=on
(which is really a count per group) is often not sufficient. For example,
instead of the raw count, users often want the percentage (relative to all
values), i.e. 42.9% instead of 3 (out of 7) occurrences. Other
aggregates naturally come to mind as well, but they are more complicated
since they aggregate the actual values rather than just the number of "rows".
Nevertheless, the simple solution you propose should, IMHO, be realized
in a way that /no custom coding is required for query printers/. If I am
not mistaken, the jqPlot result formats expect a label and a numeric
value for displaying the charts. If your Distributable code modified
the query results to contain only the labels and the numbers, then they
could also be rendered by all other result formats and no additional
code would be needed. The aggregation would be a kind of post-processing
of the query results before they are passed to the result printers,
turning a one-column query with n lines and m distinct values into a
two-column query with m lines.
Would this be something your code could support?
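To make the idea concrete, here is a rough Python sketch of such a post-processing step. It is not SMW code: the function name aggregate_rows, the as_percentage switch and the representation of a query result as a list of one-element tuples are all assumptions made for illustration.

```python
from collections import Counter


def aggregate_rows(rows, as_percentage=False):
    """Turn one-column rows into two-column (label, number) rows.

    Hypothetical helper: 'rows' is assumed to be a list of 1-tuples,
    one per result line. The output has one row per distinct value.
    """
    counts = Counter(value for (value,) in rows)
    total = sum(counts.values())
    result = []
    # most_common() orders by count, highest first.
    for label, count in counts.most_common():
        if as_percentage:
            number = round(100.0 * count / total, 1)
        else:
            number = count
        result.append((label, number))
    return result


rows = [("foo",), ("bar",), ("baz",), ("foo",), ("bar",), ("bar",), ("ohi",)]
for label, number in aggregate_rows(rows, as_percentage=True):
    print(label, number)
```

Since the output is again a plain table of labels and numbers, any result format that can print two columns could display it without knowing that an aggregation took place.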
thx,
michael
On 08.11.2011 15:08, Jeroen De Dauw wrote:
Hey all,
I have implemented general support for value distributions in result
formats in SMW. This email explains this feature and is meant to
gather feedback on it before SMW 1.7 is released.
== Goal ==
Allow visualizing how many times each value in a result occurs, i.e.
allow for creating value distributions.
For example, this result set: foo bar baz foo bar bar ohi
will be turned into:
* bar (3)
* foo (2)
* baz (1)
* ohi (1)
This can then be displayed in chart formats, with the value as label
and the occurrence count as value. Although the most obvious use for
this is charts, it can really be used with any format.
== Current implementation: how to use it ==
Each format needs to add support for this functionality before you'll
be able to use it to visualize value distributions. Right now only
jqplotbar and jqplotpie make use of it. All formats that support this
functionality accept 3 additional parameters:
* distribution (on/off) - whether a value distribution should be calculated
and shown instead of the regular results.
* distributionsort (asc/desc/none) - the sort order of the values, by
occurrence count.
* distributionlimit (positive whole number) - the maximum number of values
to visualize.
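The effect of these three parameters can be illustrated with a small Python sketch. This is not the SMW implementation: the function name distribute, its defaults, and the choice to sort before applying the limit are assumptions made for illustration only.

```python
from collections import Counter


def distribute(values, sort="none", limit=None):
    """Count occurrences, then optionally sort by count and truncate.

    Hypothetical stand-in for distribution=on combined with
    distributionsort (sort) and distributionlimit (limit).
    """
    pairs = list(Counter(values).items())  # first-seen order
    if sort == "asc":
        pairs.sort(key=lambda pair: pair[1])
    elif sort == "desc":
        pairs.sort(key=lambda pair: pair[1], reverse=True)
    elif sort != "none":
        raise ValueError("sort must be 'asc', 'desc' or 'none'")
    if limit is not None:
        if limit < 1:
            raise ValueError("limit must be a positive whole number")
        pairs = pairs[:limit]  # assumption: the limit is applied after sorting
    return pairs


values = "foo bar baz foo bar bar ohi".split()
print(distribute(values, sort="desc", limit=2))
```

With sort="desc" and limit=2 only the two most frequent values remain, which is the behaviour one would expect from distributionsort=desc together with distributionlimit=2.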
This example will get the countries the matching cities are located
in, count the occurrences of each, and display this as a pie chart. Note
the use of the mainlabel parameter. If mainlabel=- is omitted, the cities
themselves will also be put into the value distribution.
{{#ask: [[Category:Locations]] [[Has location type::City]]
| ?Located in
| format=jqplotpie
| distribution=on
| mainlabel=-
| limit=500
}}
This example will do the same query, but will only show the 10
countries with the most matching cities, in descending order.
{{#ask: [[Category:Locations]] [[Has location type::City]]
| ?Located in
| format=jqplotpie
| distribution=on
| distributionsort=desc
| distributionlimit=10
| mainlabel=-
| limit=500
}}
You can see these examples and 2 others working on the mapping
documentation wiki, making use of the example semantic data there:
http://mapping.referata.com/wiki/Value_distribution_examples
== Implementation details (technical) ==
After looking into several options I decided to implement this as a
result printer class deriving from SMWResultPrinter. This requires changes
to each format that wants to support this behaviour, but makes those changes
relatively easy. This approach seems like a good balance between
making this functionality available as easily as possible and staying sane.
This class is called SMWDistributablePrinter and can be found here:
http://svn.wikimedia.org/viewvc/mediawiki/trunk/extensions/SemanticMediaWiki/includes/queryprinters/SMW_QP_Distributable.php?view=markup
Example jqplotpie implementation:
http://svn.wikimedia.org/viewvc/mediawiki/trunk/extensions/SemanticResultFormats/jqPlot/SRF_jqPlotPie.php?view=markup
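The general shape of this architecture can be sketched in Python as follows. This is only an analogy and not a translation of the PHP classes linked above: the class names DistributablePrinter and BulletListPrinter and the method names render and format_output are invented for illustration.

```python
from collections import Counter


class DistributablePrinter:
    """Hypothetical base class: computes the distribution once and
    leaves the actual output to the concrete format."""

    def render(self, values):
        pairs = Counter(values).most_common()  # (value, count), highest first
        return self.format_output(pairs)

    def format_output(self, pairs):
        raise NotImplementedError("each format provides its own output")


class BulletListPrinter(DistributablePrinter):
    """Example format: prints the distribution as a bullet list."""

    def format_output(self, pairs):
        return "\n".join(f"* {label} ({count})" for label, count in pairs)


print(BulletListPrinter().render("foo bar baz foo bar bar ohi".split()))
```

In this sketch a format opts in by deriving from the shared base class and implementing a single method, which mirrors the trade-off described above: every format has to be touched, but the change per format stays small.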
== Request for comments ==
Feedback is welcome. The main question for users is what names the
parameters should use. Right now they all start with "distribution",
but there might be a better (and shorter) name. From developers I'd
like to know if you agree with this architecture.
Cheers
--
Jeroen De Dauw
http://www.bn2vs.com
Don't panic. Don't be evil.
--
--
Semantic Enterprise Wiki - SMW+ / Halo Extension
Want to get involved? http://smwforum.ontoprise.com/development
--
email: erdm...@ontoprise.de Dr. Michael Erdmann
tel: +49 / 163 / 509 8029 http://www.ontoprise.com
Managing Directors: Prof. Dr. Jürgen Angele, Hans-Peter Schnurr
Register court: Mannheim | Register number: HRB 109540 | Sales-Tax-ID:
DE-201-761-257