Re: [SMW-devel] [Semediawiki-user] RFC: Value distribution support in result formats

Jeroen De Dauw Tue, 08 Nov 2011 21:46:17 -0800

Hey,

> Can it also be used with regular formats like csv or table (resulting in
> the value for the distribution to be displayed)?


Each format needs to add support for this functionality, so at the moment
you cannot use value distribution with these formats. However, it's
relatively easy to add this support. I'm not sure that adding it to all
formats makes much sense though: for a lot of them the usefulness seems
limited, and it's a bunch of work to do. In that case, it might be worth
reworking the query result class a little so that query results can be
modified (in a sane manner) before they get passed to the actual result
printer, which would also allow for other kinds of post-query processing.
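To make the "modify results before the printer" idea concrete, here is a minimal sketch in Python rather than SMW's PHP. Everything in it is hypothetical: `Row`, `value_distribution`, `run_pipeline` and `to_csv` are illustrative names, not SMW classes or hooks. It assumes a result row maps printout labels to lists of values, and that post-query processors run in order before any printer sees the rows, so a plain csv printer can show a distribution without knowing about it.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List

# Hypothetical shapes, not SMW's actual classes: a result row maps printout
# labels to lists of values, and a processor rewrites a list of rows.
Row = Dict[str, List[str]]
Processor = Callable[[List[Row]], List[Row]]


def value_distribution(prop: str) -> Processor:
    """Collapse rows into one row per distinct value of `prop`, plus a count."""
    def process(rows: List[Row]) -> List[Row]:
        counts = Counter(v for row in rows for v in row.get(prop, []))
        # most_common() orders values by descending occurrence count.
        return [{prop: [value], "count": [str(n)]} for value, n in counts.most_common()]
    return process


def run_pipeline(rows: List[Row], processors: Iterable[Processor]) -> List[Row]:
    """Apply each post-query processor in order, before any printer sees the rows."""
    for processor in processors:
        rows = processor(rows)
    return rows


def to_csv(rows: List[Row], columns: List[str]) -> str:
    """A format-agnostic printer: it only sees the already-processed rows."""
    return "\n".join(
        ",".join(";".join(row.get(col, [])) for col in columns) for row in rows
    )


rows = [
    {"Located in": ["Belgium"]},
    {"Located in": ["Germany"]},
    {"Located in": ["Belgium"]},
]
distributed = run_pipeline(rows, [value_distribution("Located in")])
print(to_csv(distributed, ["Located in", "count"]))
```

The design point is that the printer stays untouched: support for value distribution (or any other post-query step) lives in one place instead of being added to every format.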

> One way to do this would be to automatically give each property some
> special properties, such that the property itself could be queried for its
> set of unique values, and the number of times each value has been used.

Interesting, I had not thought of this. Implementing this would be
completely different from what I did though, and, as you say, it would be
more powerful. If a system to handle this is created, it could probably
easily be made more generic and support all kinds of computations, not just
the occurrence count of values of a property. It might even go hand in hand
with query management functionality (allowing for automatic invalidation of
query caches when their source data is modified).

This will not be trivial to implement, and is out of scope of what I want
to do here. If such functionality is created, it might make the value
distribution feature a bit obsolete, but I don't see this happening soon
(unless someone throws money or devs at it). I'm curious about your ideas
on this though, and have some questions:

* Where/when would this property metadata be computed? Recomputing it on
every change to any occurrence of the property might be quite expensive.
* Where would you define how to compute this metadata? If possible, it'd be
neat to have control over this in the wiki itself.
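On the cost question, one possible answer (an assumption, not something SMW does) is to update the counts incrementally when a page is saved, using only the difference between that page's old and new annotations. The cost is then proportional to the annotations on the edited page rather than to all occurrences of the property. `PropertyValueStats` and `on_page_save` below are hypothetical names for a sketch of that idea.

```python
from collections import Counter
from typing import Dict, List


class PropertyValueStats:
    """Hypothetical per-property store of value -> occurrence count."""

    def __init__(self) -> None:
        self.counts: Dict[str, Counter] = {}

    def on_page_save(self, old: Dict[str, List[str]], new: Dict[str, List[str]]) -> None:
        """Update counts from the diff between one page's old and new annotations."""
        # Only properties that appear on this page (before or after) are touched.
        for prop in old.keys() | new.keys():
            stats = self.counts.setdefault(prop, Counter())
            stats.update(Counter(new.get(prop, [])))
            stats.subtract(Counter(old.get(prop, [])))
            # Drop values whose count fell to zero so they no longer show up.
            for value in [v for v, n in stats.items() if n <= 0]:
                del stats[value]

    def distribution(self, prop: str) -> Dict[str, int]:
        """Return the current value distribution for a property."""
        return dict(self.counts.get(prop, Counter()))


stats = PropertyValueStats()
stats.on_page_save({}, {"Located in": ["Belgium"]})  # new page
stats.on_page_save({}, {"Located in": ["Belgium"]})  # another new page
stats.on_page_save({"Located in": ["Belgium"]}, {"Located in": ["Germany"]})  # edit
print(stats.distribution("Located in"))
```

This only covers counting; more general computations (the "all kinds of computations" case) would not always decompose into cheap per-edit updates like this.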

> although I suppose the most general solution of all would be to implement
> aggregation queries.
> ..
> I guess GROUP BY and COUNT() functionality are the bits that would
> jeopardize sanity? :)

I actually discussed this at length with Yaron, and we concluded that
generic GROUP BY functionality would not be terribly useful, since it's
hard to imagine cases where you would want anything other than counting
the occurrences. My current implementation is pretty much equivalent to
doing a GROUP BY with a COUNT, I think (I'm not sure, as I'm not that
familiar with the SQL GROUP BY statement).
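For reference, here is the SQL side of that comparison, sketched with Python's built-in `sqlite3`. It assumes the value distribution feature counts how many times each distinct value occurs; the `located_in` table and its rows are made up for illustration. `GROUP BY value` collapses all rows sharing a value into one group, and `COUNT(*)` counts the rows in each group, which gives the same numbers as counting occurrences directly.

```python
import sqlite3
from collections import Counter

# Hypothetical flattened data: one row per (page, value) annotation of "Located in".
annotations = [
    ("Brussels", "Belgium"),
    ("Antwerp", "Belgium"),
    ("Berlin", "Germany"),
    ("Ghent", "Belgium"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE located_in (page TEXT, value TEXT)")
conn.executemany("INSERT INTO located_in VALUES (?, ?)", annotations)

# GROUP BY collapses rows sharing a value; COUNT(*) counts the rows in each group.
sql_counts = dict(
    conn.execute("SELECT value, COUNT(*) FROM located_in GROUP BY value").fetchall()
)

# The same result, computed by simply counting occurrences of each value.
python_counts = dict(Counter(value for _, value in annotations))

print(sql_counts)
conn.close()
```

Where GROUP BY goes beyond this is when other aggregates (SUM, AVG, MAX) are applied per group, which is exactly the "generic" part we judged not terribly useful.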

> For the discussion: What about something like this:
> {{#ask: [[Category:Locations]] [[Has location type::City]]
> | ?Located in
> | ?count(*)
> | group by=Located in
> | format=jqplotpie
> | mainlabel=-
> | limit=500
> }}

What would the advantage of this syntax be? I suspect it's less clear to
most users, and it's definitely harder to implement, since you'd need to
recognize ?count(*) as a special printout.
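To show what "recognizing ?count(*) as a special printout" involves, here is a minimal, hypothetical parser sketch. `parse_printout`, `Printout` and the regular expression are illustrative assumptions, not SMW's parameter parsing. It treats `?count(...)` as an aggregate and any other `?...` parameter as a plain property printout; printout labels (`?Prop=Label`) and other options are deliberately ignored.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical rule: "count(...)" after "?" is an aggregate, anything else a property.
AGGREGATE = re.compile(r"^(?P<func>count)\((?P<arg>[^()]*)\)$", re.IGNORECASE)


@dataclass
class Printout:
    kind: str  # "property" or "aggregate"
    name: str
    arg: str = ""


def parse_printout(param: str) -> Optional[Printout]:
    """Classify one #ask parameter; returns None for non-printout parameters."""
    param = param.strip()
    if not param.startswith("?"):
        return None  # e.g. "format=jqplotpie" or "group by=Located in"
    body = param[1:].strip()
    match = AGGREGATE.match(body)
    if match:
        return Printout("aggregate", match["func"].lower(), match["arg"].strip())
    return Printout("property", body)


for p in ["?Located in", "?count(*)", "group by=Located in"]:
    print(repr(p), "->", parse_printout(p))
```

Even this small sketch surfaces a design problem: a wiki property whose name happens to look like `count(...)` would be misread as an aggregate, so the new syntax claims part of the property namespace.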

Cheers

--
Jeroen De Dauw
http://www.bn2vs.com
Don't panic. Don't be evil.
--

_______________________________________________
Semediawiki-devel mailing list
Semediawiki-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/semediawiki-devel