On 23/02/2011 19:09, Jeroen De Dauw wrote:
> Hey,
>
> Since this discussion is about performance and changes to the SMW
> storage layer, I'd like to bring up an issue I've been having as
> Semantic Maps dev.
>
> To efficiently do spatial operations on geographical data stored by SMW,
> support for spatial extensions in MySQL and PostGIS in PostgreSQL is
> needed. Currently there is no way to make use of these database
> extensions, as the SMW storage layer does not allow for:
> * specifying what type of index to place on a field (it only allows
> specifying that a field should be indexed)
> * putting SQL functions in insert and select statements (which is needed
> to insert or select geographical entities)

If this is indeed so special, and we see a big need for changing this, 
then we may need to consider tying geo support more closely into the SMW 
architecture. I would be interested to find out if this type of distance 
query is an issue in some Wikia wiki.

We have a general architectural problem of implementation independence 
vs. performance here. We could probably make SMW run faster on MySQL if 
we would commit to supporting only (My)SQL backends. At the same time, 
MySQL is largely unsuitable for most other types of queries that we want 
to answer, leading to the slow queries that some wikis see (MySQL simply 
dies completely when queries reach a certain complexity -- AFAICT it 
loses most of its optimization capabilities for queries that involve a 
single table many times; in particular it seems to ignore query 
structure in favour of table- or column-based selectivity measures that 
don't help at all if the same table is used many times). But to be fair, 
our queries are quite unusual for classical RDBMS applications.

We have reasons to hope that RDF stores would provide much better 
performance on such queries, but these systems have a completely 
different data model and different capabilities. It should be 
appreciated that the current architecture allows such paradigm shifts in 
the backend to happen without code changes in most parts of SMW and in 
most of its extensions. Since it seems unavoidable to move to RDF stores 
for higher query performance, it might not be very useful to try and 
exploit additional MySQL optimizations now (this would only help sites 
which have mostly simple queries but many distance computations which 
are currently too slow).
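To illustrate the kind of distance computation in question, here is a minimal sketch of answering a radius query from plain floating point latitude and longitude values alone, with no spatial extension. It is not the Semantic Maps or SMW implementation (those are PHP); the function names, the `(name, lat, lon)` tuple layout and the spherical-Earth model are assumptions for illustration. A bounding box that ordinary column indexes can range-filter on comes first, followed by an exact haversine check on the remaining candidates.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to 1.0 to guard against rounding error for near-antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def bounding_box(lat, lon, radius_km):
    """(lat_min, lat_max, lon_min, lon_max) enclosing the search circle.

    Simple range conditions on ordinary indexed float columns can use this
    box; wrapping across the antimeridian is ignored for brevity.
    """
    delta = radius_km / EARTH_RADIUS_KM  # angular radius in radians
    dlat = math.degrees(delta)
    if lat - dlat <= -90.0 or lat + dlat >= 90.0:
        dlon = 180.0  # the circle contains a pole: every longitude qualifies
    else:
        # Widest longitude offset reached by a circle of angular radius delta.
        dlon = math.degrees(
            math.asin(math.sin(delta) / math.cos(math.radians(lat))))
    return max(lat - dlat, -90.0), min(lat + dlat, 90.0), lon - dlon, lon + dlon


def within_radius(points, lat, lon, radius_km):
    """Cheap box pre-filter first, then the exact great-circle check."""
    lat_lo, lat_hi, lon_lo, lon_hi = bounding_box(lat, lon, radius_km)
    candidates = [p for p in points
                  if lat_lo <= p[1] <= lat_hi and lon_lo <= p[2] <= lon_hi]
    return [p for p in candidates
            if haversine_km(lat, lon, p[1], p[2]) <= radius_km]
```

For example, with Ghent (51.05, 3.72), Brussels (50.85, 4.35) and Paris (48.8566, 2.3522) as input, a 100 km query around Ghent keeps Ghent and Brussels; Paris is already rejected by the box filter, so the trigonometry only runs on the few rows the index lets through.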

If one looks for a more general solution, the question then is how 
specific the MySQL coordinate format actually is. To keep the current 
flexibility of architecture, it might be necessary to make the SQLStore 
implementation aware of geo coordinates (this could also be done with 
hooks). But we must avoid making the higher levels of the API (e.g. 
datavalue implementations) specific to MySQL. I think there are 
solutions that would meet these requirements; they just need to be 
designed and implemented. The main point is that higher levels should 
exchange data in standard formats (e.g. floating point numbers for 
latitude and longitude), and MySQL-specific syntax (e.g. some kind of 
other syntactic formats for geo coords) should only be created in the 
storage layer.
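A minimal Python sketch of that separation follows. The class names and the `smw_coords` table with its `s_id`, `lat`, `lon` and `pt` columns are hypothetical, and none of this is the actual SMW SQLStore API (which is PHP). Higher levels only ever hand over a pair of floats, and each storage backend decides how to turn it into SQL. The MySQL variant uses the MySQL 5.x function `GeomFromText()`; newer MySQL versions use `ST_GeomFromText()` instead.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class GeoCoord:
    """Backend-neutral value exchanged by higher levels: two plain floats."""
    lat: float
    lon: float

    def __post_init__(self) -> None:
        if not (-90.0 <= self.lat <= 90.0 and -180.0 <= self.lon <= 180.0):
            raise ValueError(f"coordinate out of range: {self.lat}, {self.lon}")


class CoordStore(ABC):
    """Hypothetical storage-layer hook; only subclasses know backend syntax."""

    @abstractmethod
    def insert_sql(self, table: str, page_id: int,
                   coord: GeoCoord) -> tuple[str, tuple]:
        """Return (statement with %s placeholders, parameter tuple)."""


class PlainFloatStore(CoordStore):
    """Portable layout: two numeric columns, no spatial extension required."""

    def insert_sql(self, table, page_id, coord):
        # The table name comes from trusted schema code, never from user input.
        return (f"INSERT INTO {table} (s_id, lat, lon) VALUES (%s, %s, %s)",
                (page_id, coord.lat, coord.lon))


class MySQLSpatialStore(CoordStore):
    """MySQL spatial layout: one POINT column filled via a SQL function."""

    def insert_sql(self, table, page_id, coord):
        # WKT points are written "x y"; longitude is used as x by convention.
        wkt = f"POINT({coord.lon!r} {coord.lat!r})"
        return (f"INSERT INTO {table} (s_id, pt) "
                "VALUES (%s, GeomFromText(%s))",
                (page_id, wkt))
```

A datavalue would construct `GeoCoord(lat=51.05, lon=3.72)` and pass it to whichever store is configured; swapping the store changes the generated SQL without touching the caller. The statement is returned with `%s` placeholders plus a parameter tuple, the style used by common Python MySQL drivers, so coordinate values are never spliced into the SQL text.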

- Markus


>
> I'm not sure to what extent supporting this is possible, but it would
> make a huge difference for working with geographical data in SMW. So it'd
> be nice if this were taken into consideration when modifications to the
> storage layer are made.
>
> In any case, the current distance query in Semantic Maps already
> performs way better than the one in older versions of SMW (which was
> really really ... really bad). More incentive for Wikia to update :)
>
> Cheers
>
> --
> Jeroen De Dauw
> http://www.bn2vs.com
> Don't panic. Don't be evil.
> --
>
>
>
> ------------------------------------------------------------------------------
> Free Software Download: Index, Search&  Analyze Logs and other IT data in
> Real-Time with Splunk. Collect, index and harness all the fast moving IT data
> generated by your applications, servers and devices whether physical, virtual
> or in the cloud. Deliver compliance at lower cost and gain new business
> insights. http://p.sf.net/sfu/splunk-dev2dev
>
>
>
> _______________________________________________
> Semediawiki-devel mailing list
> Semediawiki-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/semediawiki-devel

