On Sonntag, 17. August 2008, Michael Dale wrote: > I do sql queries that do selects with lots of results across several > custom tables. To pull in semantic data I would add the following to > mediaWikis database abstraction query variables: > (assuming I have a bunch of mv_index_page results as part of a complex > join query and wanted property like "spoken_by", or "other properties" > that I could add conditions on I would add the following (where > 'spoken_by' is the $prop variable > > $vars[] = $prop.object_title as $prop; > $from_tables.=' ' . > LEFT JOIN 'smw_relations' > as $prop > ON (mv_mvd_index.mv_page_id= $prop . subject_id > AND > $prop . relation_title = '$prop' > > > This is a LOT more difficult to do with the new database structure.... > Any advice how I would approach this with the new db structure?
Okay, let's see. It's surely not so obvious any more, since the new DB is not optimised for joins with MW page IDs at all. The most straightforward solution (with the most complex query as a result) is to add joins with further tables as needed. smw_rels2 has a triple structure s_id, p_id, o_id, and instead of using direct property string names or MW page ids, these ids now can only be resolved using the table smw_ids. The latter realises a mapping between the store's ids and MediaWiki page names, namespaces, and possibly interwiki names. No MediaWiki page id is tracked there, so if your mv_mvd_index table only holds ids, then you will need another join with page to get from names to ids. Probably it would be faster to hold titles+namespaces in mv_mvd_index right away. This is the canonical modification to use a DB query as before, and again it depends completely on the SMW internal DB layout. It has many more joins, but things might be not too bad, given that all joins with smw_ids join on a single numeric primary key. Another option should be considered as well: if the number of objects in either table is typically very small (e.g. I am not sure how few objects you have in mv_mvd_index), then one can realise the left join in PHP, iterating through the result. This is never as fast as a direct SQL query, but it is the solution that is most robust and requires least maintenance, since one could use SMW API calls to do the individual checks, yielding complete storage abstraction. None of these options are really convincing to me. I think a better way to go would be to move the relevant data (the data that is used to build mv_mvd_index) into SMW, storing it there with the help of (predefined) SMW properties. This must work somehow, since SMW is fairly universal in its data model. The work then would be in extending the storage methods to make sure SMW contains all relevant data. After this, the query part would no longer be direct SQL, but merely an SMW query to the (current) SMW store, whatever it may be. It depends on the details of the data needed for mv_mvd_index whether or not this would work well. I have to see more details here. Finally, it would also be possible to move the construction of the above left join into the SMW storage implementation. It is possible to subclass SMW stores to extend/overwrite any of their functions, and such a new store can easily be shipped and set by extensions. This would enable the tightest integration and it is certainly the preferred way when private SQL queries are really unavoidable. It still is easier to maintain because the extension can presume a certain SMW store to be used as a basis for its customisation, and it is clear that users cannot simply change the store without breaking something. Also, minor changes in SMW's storage implementation are simply inherited by the subclassed store. For instance, one could overwrite/extend the initialisation and update methods of SMW to ensure that all data for "special" properties (like spoken_by) is duplicated into another internal table that is easier to use in the above SQL statement. See the current Halo 1.2 release for an example of subclassing another store (including error messages when an admin tries to use another unsupported store). There is yet another solution that requires work on our side: we could extend the query API of SMW stores so that they are able to put query results into a temporary DB table (returning its name), which might then be used for further joins. Optionally this table could use MW page ids or title/namespace pairs. This solution would be good as a general path for SQL-level extensions of SMW, but it might force some inefficient behaviour for some SMW stores. It also precludes some join-order-optimisations that could be possible when everything is done by a single query. Best regards, Markus > > peace, > --michael > > ------------------------------------------------------------------------- > This SF.Net email is sponsored by the Moblin Your Move Developer's > challenge Build the coolest Linux based applications with Moblin SDK & win > great prizes Grand prize is a trip for two to an Open Source event anywhere > in the world http://moblin-contest.org/redirect.php?banner_id=100&url=/ > _______________________________________________ > Semediawiki-devel mailing list > Semediawiki-devel@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/semediawiki-devel -- Markus Krötzsch Semantic MediaWiki http://semantic-mediawiki.org http://korrekt.org [EMAIL PROTECTED]
signature.asc
Description: This is a digitally signed message part.
------------------------------------------------------------------------- This SF.Net email is sponsored by the Moblin Your Move Developer's challenge Build the coolest Linux based applications with Moblin SDK & win great prizes Grand prize is a trip for two to an Open Source event anywhere in the world http://moblin-contest.org/redirect.php?banner_id=100&url=/
_______________________________________________ Semediawiki-devel mailing list Semediawiki-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/semediawiki-devel