On 23/02/2011 19:09, Jeroen De Dauw wrote:
> Hey,
>
> Since this discussion is about performance and changes to the SMW
> storage layer, I'd like to bring up an issue I've been having as
> Semantic Maps dev.
>
> To efficiently do spatial operations on geographical data stored by SMW,
> support for spatial extensions in MySQL and PostGis in PostGres is
> needed. Currently there is no way to make use of these database
> extensions, as the SMW storage layer does not allow for:
> * specifying what type of index to place on a field (it only allows
> specifying a field should be indexed)
> * putting SQL functions in insert and select statements (which is needed
> to insert or select geographical entities)

If this is indeed so special, and we see a big need for changing this,
then we may need to consider tying geo support more closely into the SMW
architecture. I would be interested to find out if this type of distance
query is an issue in some Wikia wiki.

We have a general architectural problem of implementation independence
vs. performance here. We could probably make SMW run faster on MySQL if
we committed to supporting only (My)SQL backends. At the same time,
MySQL is largely unsuitable for most other types of queries that we want
to answer, leading to the slow queries that some wikis see (MySQL simply
dies completely when queries reach a certain complexity -- AFAICT it
loses most of its optimization capabilities for queries that involve a
single table many times; in particular it seems to ignore query
structure in favour of table- or column-based selectivity measures that
don't help at all if the same table is used many times). But to be fair,
our queries are quite unusual for classical RDBMS applications.

We have reasons to hope that RDF stores would provide much better
performance on such queries, but these systems have a completely
different data model and different capabilities. It should be
appreciated that the current architecture allows such paradigm shifts in
the backend to happen without code changes in most parts of SMW and in
most of its extensions. Since it seems unavoidable to move to RDF stores
for higher query performance, it might not be very useful to try to
exploit additional MySQL optimizations now (this would only help sites
which have mostly simple queries but many distance computations which
are currently too slow).

If one looks for a more general solution, the question then is how
specific the MySQL coordinate format actually is. To keep the current
flexibility of architecture, it might be necessary to make the SQLStore
implementation aware of geo coordinates (this could also be done with
hooks).

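To illustrate what a geo-aware storage layer could look like, here is a
minimal sketch in Python. It is not SMW code (SMW is written in PHP), and
every class, table and column name in it is hypothetical. It assumes the
higher levels pass coordinates as plain floats and that only the
backend-specific writer produces spatial SQL such as ST_GeomFromText.
Treating longitude as the x value of the WKT point is also an assumption
of the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeoCoordinate:
    """Backend-neutral value used by the higher levels: two plain floats."""
    latitude: float
    longitude: float


class GenericSqlGeoWriter:
    """Hypothetical writer that stores coordinates in two float columns."""

    def insert_statement(self, table, coord):
        # The table name is assumed to be a trusted constant, not user input.
        sql = f"INSERT INTO {table} (lat, lon) VALUES (%s, %s)"
        return sql, (coord.latitude, coord.longitude)


class MySqlSpatialGeoWriter:
    """Hypothetical writer that targets a MySQL spatial column instead."""

    def insert_statement(self, table, coord):
        # WKT points are written as "x y"; this sketch assumes that
        # longitude is x and latitude is y.
        wkt = f"POINT({coord.longitude} {coord.latitude})"
        # The SQL function appears only here, inside the storage layer.
        sql = f"INSERT INTO {table} (location) VALUES (ST_GeomFromText(%s))"
        return sql, (wkt,)


def store_coordinate(writer, table, coord):
    """Higher-level code: it never sees backend-specific syntax."""
    return writer.insert_statement(table, coord)


berlin = GeoCoordinate(latitude=52.5, longitude=13.4)
generic_sql, generic_params = store_coordinate(
    GenericSqlGeoWriter(), "geo_values", berlin)
spatial_sql, spatial_params = store_coordinate(
    MySqlSpatialGeoWriter(), "geo_values", berlin)
```

The calling code is the same for both writers. Only the writer decides
whether the value ends up in two float columns or in a spatial column,
so swapping the backend does not touch the code that produces the
coordinate.
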
But we must avoid making the higher levels of the API (e.g. datavalue
implementations) specific to MySQL. I think there are solutions that
would meet these requirements; they just need to be designed and
implemented. The main point is that higher levels should exchange data
in standard formats (e.g. floating point numbers for latitude and
longitude), and MySQL-specific syntax (e.g. some other syntactic format
for geo coords) should only be created in the storage layer.

- Markus

>
> I'm not sure to what extend supporting this is possible, but it would
> make a huge difference for working with geographical data in SMW. So I'd
> be nice if this was kept into consideration when modifications to the
> storage layer are made.
>
> In any case, the current distance query in Semantic Maps already
> performs way better then the one in older versions of SMW (which was
> really really ... really bad). More incentive for Wikia to update :)
>
> Cheers
>
> --
> Jeroen De Dauw
> http://www.bn2vs.com
> Don't panic. Don't be evil.
> --
>
> ------------------------------------------------------------------------------
> Free Software Download: Index, Search & Analyze Logs and other IT data in
> Real-Time with Splunk. Collect, index and harness all the fast moving IT data
> generated by your applications, servers and devices whether physical, virtual
> or in the cloud. Deliver compliance at lower cost and gain new business
> insights. http://p.sf.net/sfu/splunk-dev2dev
>
> _______________________________________________
> Semediawiki-devel mailing list
> Semediawiki-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/semediawiki-devel