Re: [SMW-devel] RFC: Value distribution support in result formats

Michael Erdmann Tue, 29 Nov 2011 09:09:02 -0800

Hi Jeroen,

We have had similar requests a couple of times already. However, I think that having

    distribution=on

(which is actually a count per group) is often not sufficient: instead of the pure number, users often want the percentage (relative to all values), e.g. 42.9% instead of 3 (out of 7) occurrences. Other aggregates naturally come to mind as well, but they are more complicated since they aggregate actual values, not purely the number of "rows".
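
To illustrate the difference between a plain count and a percentage, here is a minimal sketch in Python (not SMW's PHP). The function name and the sample values are made up for illustration; nothing here is part of SMW.

```python
from collections import Counter


def distribution_with_percentages(values):
    """Return (label, count, percentage) tuples, most frequent first.

    Hypothetical helper: the percentage is relative to the total number
    of values, rounded to one decimal place.
    """
    counts = Counter(values)
    total = sum(counts.values())
    return [
        (label, count, round(100.0 * count / total, 1))
        for label, count in counts.most_common()
    ]


# Seven values, one of which occurs three times.
rows = distribution_with_percentages(["a", "b", "a", "c", "a", "b", "d"])
for label, count, percent in rows:
    print(f"{label}: {count} ({percent}%)")
```

The first row is the "3 out of 7" case from above, which comes out as 42.9%.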

Nevertheless, the simple solution you propose should, IMHO, be realized in a way that /no custom coding is required for query printers/. If I am not mistaken, the jqPlot result formats expect a label and a numeric value for displaying the charts. If your Distributable code modified the query results to contain only the labels and the numbers, then they could also be rendered by all other result formats, and no additional code would be needed. The aggregation would be a kind of post-processing of the query results before they are passed to the result printers, turning a one-column query with n lines and m distinct values into a two-column query with m lines.
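
A minimal sketch of this post-processing idea, in Python rather than SMW's PHP. All function names and the sample data are hypothetical; the point is only that the aggregation step is independent of the printers, which all receive the same (label, count) rows.

```python
from collections import Counter


def to_distribution_rows(column):
    """Collapse a one-column result (n lines) into (label, count) rows (m lines)."""
    return list(Counter(column).items())


def print_as_list(rows):
    """Hypothetical printer: a wiki-style bullet list."""
    return "\n".join(f"* {label} ({count})" for label, count in rows)


def print_as_csv(rows):
    """Hypothetical printer: comma-separated lines."""
    return "\n".join(f"{label},{count}" for label, count in rows)


column = ["Germany", "France", "Germany", "Belgium", "Germany"]  # n = 5 lines
rows = to_distribution_rows(column)                              # m = 3 lines

# Neither printer knows anything about distributions.
print(print_as_list(rows))
print(print_as_csv(rows))
```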


Would this be something your code could support?

thx,
  michael

On 08.11.2011 15:08, Jeroen De Dauw wrote:

Hey all,
I have implemented general support for value distributions in result formats in SMW. This email explains this feature and is meant to gather feedback on it before SMW 1.7 is released.
== Goal ==
Allow visualizing how many times each value in a result occurs, i.e. allow for creating value distributions.
For example, this result set: foo bar baz foo bar bar ohi
Will be turned into
* bar (3)
* foo (2)
* baz (1)
* ohi (1)
This can then be displayed in chart formats, with the value as label and the occurrence count as value. Although the most obvious use for this is charts, it can really be used with any format.
== Current implementation: how to use it ==
Each format needs to add support for this functionality before you'll be able to use it to visualize value distributions. Right now only jqplotbar and jqplotpie make use of it. All formats that support this functionality accept 3 additional parameters:
* distribution (on/off) - whether a value distribution should be calculated and shown instead of the regular results.
* distributionsort (asc/desc/none) - the sort order of the values, by occurrence count.
* distributionlimit (positive whole number) - the maximum number of values to visualize.
This example will get the countries the matching cities are located in, count the occurrences of each, and display this as a pie chart. Note the use of mainlabel=-; without it, the cities themselves will also be put into the value distribution.
{{#ask: [[Category:Locations]] [[Has location type::City]]
| ?Located in
| format=jqplotpie
| distribution=on
| mainlabel=-
| limit=500
}}
This example will do the same query, but will only show the 10 countries with the most matching cities, in descending order.
{{#ask: [[Category:Locations]] [[Has location type::City]]
| ?Located in
| format=jqplotpie
| distribution=on
| distributionsort=desc
| distributionlimit=10
| mainlabel=-
| limit=500
}}
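The effect of the three parameters can be sketched as follows, using the result set from the Goal section. This is a Python illustration, not the SMW code: the function name is hypothetical, and applying the sort before the limit is an assumption suggested by the "10 countries with the most matching cities" example.

```python
from collections import Counter


def value_distribution(values, sort="none", limit=None):
    """Count occurrences, then optionally sort by count and truncate.

    sort  -- "asc", "desc" or "none" (mirrors distributionsort)
    limit -- maximum number of entries to keep (mirrors distributionlimit)
    """
    counts = list(Counter(values).items())  # first-seen order
    if sort == "asc":
        counts.sort(key=lambda item: item[1])
    elif sort == "desc":
        counts.sort(key=lambda item: item[1], reverse=True)
    elif sort != "none":
        raise ValueError(f"unknown sort order: {sort!r}")
    # Assumption: the limit is applied after sorting.
    return counts if limit is None else counts[:limit]


values = "foo bar baz foo bar bar ohi".split()
for label, count in value_distribution(values, sort="desc"):
    print(f"* {label} ({count})")
```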
You can see these examples and 2 others working on the mapping documentation wiki, making use of the example semantic data there: http://mapping.referata.com/wiki/Value_distribution_examples
== Implementation details (technical) ==
After looking into several options I decided to implement this as a result printer class deriving from SMWResultPrinter, requiring changes to each format that wants to support this behaviour, but making this relatively easy. This approach seems like a good balance between making this functionality available as easily as possible and staying sane.
This class is called SMWDistributablePrinter and can be found here: http://svn.wikimedia.org/viewvc/mediawiki/trunk/extensions/SemanticMediaWiki/includes/queryprinters/SMW_QP_Distributable.php?view=markup
Example jqplotpie implementation: http://svn.wikimedia.org/viewvc/mediawiki/trunk/extensions/SemanticResultFormats/jqPlot/SRF_jqPlotPie.php?view=markup
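The shape of this architecture can be sketched roughly as below. This is a simplified Python illustration, not the PHP code linked above: the class and method names are hypothetical stand-ins, and only the distribution=on case is modelled. A shared base class does the counting, and each format only implements its own output step.

```python
from abc import ABC, abstractmethod
from collections import Counter


class DistributablePrinter(ABC):
    """Hypothetical stand-in for a shared base class with distribution logic."""

    def render(self, values):
        # Shared step: turn raw values into (label, count) pairs.
        data = list(Counter(values).items())
        # Format-specific step: implemented by each deriving format.
        return self.format_output(data)

    @abstractmethod
    def format_output(self, data):
        """Render (label, count) pairs in the format's own way."""


class PiePrinter(DistributablePrinter):
    """Hypothetical format that opts in by deriving from the base class."""

    def format_output(self, data):
        return "pie: " + ", ".join(f"{label}={count}" for label, count in data)


output = PiePrinter().render(["foo", "bar", "foo"])
print(output)
```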
== Request for comments ==
Feedback is welcome. The main question for users is what names the parameters should use. Right now they all start with "distribution", but there might be a better (and shorter) name. From developers I'd like to know if you agree with this architecture.
Cheers

--
Jeroen De Dauw
http://www.bn2vs.com
Don't panic. Don't be evil.
--



--
Semantic Enterprise Wiki - SMW+ / Halo Extension
Want to get involved? http://smwforum.ontoprise.com/development
--
 email: erdm...@ontoprise.de             Dr. Michael Erdmann
   tel: +49 / 163 / 509 8029             http://www.ontoprise.com
Managing Directors: Prof. Dr. Jürgen Angele, Hans-Peter Schnurr
Register court: Mannheim | Register number: HRB 109540 | Sales-Tax-ID: 
DE-201-761-257
