Re: [SMW-devel] Triple store integration - how can we contribute?

Markus Krötzsch Fri, 22 Apr 2011 03:45:54 -0700

On 31/03/11 22:00, Yury Katkov wrote:
> Hello Markus!
>
> We are ready to help you with issues you mentioned.


Hi Yury,

it took me some more time to finish some basic internal changes, which 
did not really allow any parallel work. I have now finished 90% of the 
required architectural changes in SMW, and implemented the fundamental 
SPARQL database binding mechanisms. Below are more specific answers to 
your questions -- I will send another mail to this list about the new 
SPARQL store support.

>
> Here are our questions:
> 1) When do you plan to finish this work? Here by ‘finish’ we mean
> beta-version, not totally stable but acceptable for research topics.

SMW now has complete data synchronisation with arbitrary SPARQL/SPARQL 
Update capable stores. Some features are still shaky or disabled due to 
the major changes I made to the architecture. In particular, Type:Record 
is currently disabled, and various SMW extensions may not be compatible 
with the new code.

Still missing is the use of SPARQL instead of SQL for executing #ask. I 
plan to provide this by the end of next week.

> 2) Where is it possible to see the current progress of triple store
> support? In which SVN branch is the development being done?

Everything is in our main SVN trunk.

>
> 3) What do you think about future support of bi-directional interaction
> between SMW and triple storage?

The new code already does this, but with SMW data. The architecture 
provides a lot of new components to support this. In particular, there 
is now a generic SMWSparqlDatabase class that provides a low-level 
communication layer with SPARQL services.
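
As a rough illustration of what such a low-level layer has to do, the 
following Python sketch builds the two kinds of HTTP requests described 
by the SPARQL 1.1 Protocol: a query sent as the form parameter "query" 
and an update sent as the form parameter "update". This is not the 
SMWSparqlDatabase API (which is PHP and not shown in this mail); the 
endpoint URLs and function names are assumptions made up for the example.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint URLs; every store documents its own paths.
QUERY_ENDPOINT = "http://localhost:8080/sparql/"
UPDATE_ENDPOINT = "http://localhost:8080/update/"

FORM_TYPE = "application/x-www-form-urlencoded"


def build_query_request(sparql: str, endpoint: str = QUERY_ENDPOINT) -> Request:
    """Build (but do not send) a POST request carrying a SPARQL query."""
    body = urlencode({"query": sparql}).encode("utf-8")
    headers = {
        "Content-Type": FORM_TYPE,
        # Ask for the standard XML result format for SELECT/ASK queries.
        "Accept": "application/sparql-results+xml",
    }
    return Request(endpoint, data=body, headers=headers, method="POST")


def build_update_request(sparql: str, endpoint: str = UPDATE_ENDPOINT) -> Request:
    """Build (but do not send) a POST request carrying a SPARQL Update."""
    body = urlencode({"update": sparql}).encode("utf-8")
    return Request(endpoint, data=body, headers={"Content-Type": FORM_TYPE}, method="POST")
```

Sending either request would then be a call to urllib.request.urlopen() 
on the returned object; the sketch stops before that so that it can be 
inspected without a running store.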

> Here we are primarily interested in third-party updates of SMW data.
>   For example, one can add a new wikipage with certain properties. The
> inference engine that runs on a triple store will classify it and add
> the classification result in a store. It would be great if wiki could
> act accordingly: for example the new category would be added on a wikipage.
> The question is: do you plan such things, at least in the future? Or is it
> up to extension developers to build the described connection?

The extension RDFIO attempts to enable part of this functionality, of 
course without referring to a SPARQL store. The challenge is to turn 
data into something that can be displayed on wiki pages. It should not 
be too hard to use graphs to manage additional data that is not coming 
from wiki pages. This could also be used in queries, I guess. But 
triggering automated page edits based on changes in the SPARQL store is 
not easy. To start with, SPARQL does not provide any protocol for 
monitoring changes, so the wiki would not even notice if some category 
is added there. A method similar to RDFIO (inserting data via the wiki) 
seems more suitable.
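
Because the store does not announce changes, a wiki-side component that 
wanted to react to them would have to poll and compare results itself. 
A minimal Python sketch of that comparison step, assuming the results of 
two successive polling queries are already available as sets of 
(page URI, category URI) pairs; the type and function names are 
hypothetical and the fetching itself is left out:

```python
from typing import Set, Tuple

# One query result row: (page URI, category URI). Hypothetical shape.
Pair = Tuple[str, str]


def diff_snapshots(previous: Set[Pair], current: Set[Pair]) -> Tuple[Set[Pair], Set[Pair]]:
    """Return the (added, removed) pairs between two polling rounds."""
    added = current - previous      # present now, absent before
    removed = previous - current    # present before, absent now
    return added, removed
```

Each polling round costs a full query against the store, and the 
resulting differences would still have to be turned into page edits.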

>
> And several question about the current work:
>
> ===1===
>  > "Could we use FILTER on URI strings to find URIs that begin with a
> certain prefix, or is this too slow?"
> Have I understood this correctly: you need to measure performance of the
> SPARQL query with regex clause in FILTER section [1]? Such queries will
> be in approximately the following form:
> SELECT ?a
> WHERE
> { ?a rdf:type <some-URI-for-wikipage>
> FILTER regex( str(?a), "foaf", "i" )
> }

Yes, this was my question, but I now think that this is not a good solution.
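
For reference, a prefix test differs from the query quoted above, which 
matches "foaf" anywhere in the URI. In SPARQL 1.0 a prefix test would 
need an anchored pattern such as regex(str(?a), "^..."); SPARQL 1.1 
defines STRSTARTS for this. The Python sketch below only builds such a 
query string. Whether a given store supports STRSTARTS, and how fast it 
evaluates the filter, are exactly the open points and are not answered 
by it. The function names are made up for the example.

```python
def sparql_string_literal(value: str) -> str:
    """Quote a string as a SPARQL double-quoted literal.

    Only backslashes and double quotes are escaped; this sketch assumes
    the value contains no line breaks, as is the case for URI prefixes.
    """
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def prefix_filter_query(prefix: str) -> str:
    """Build a query selecting all subjects whose URI starts with prefix."""
    return (
        "SELECT DISTINCT ?a WHERE {\n"
        "  ?a ?p ?o .\n"
        "  FILTER (STRSTARTS(STR(?a), " + sparql_string_literal(prefix) + "))\n"
        "}"
    )
```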

>
> ===2===
>  > Which parts of SPARQL UPDATE are currently (not) supported reliably?
> Do we need to find/invent a test suite for this part of the working
> draft [2] and report how many features of this draft are currently
> supported in 4Store?

All required operations work properly on 4Store, and indeed they are not 
so complex, so this does not seem to be a major concern. A major problem 
with 4store might be that it consumes a lot of CPU time when executing 
updates of any form -- I hope that this will be fixed soon.

>
> ===3===
>  > Is it generally best to do SPARQL DELETE INSERT for updates, or does
> the store require us to decompose this into multiple queries? Are there
> performance differences?
> Here if we understand correctly you ask us to compare DELETE/INSERT[3]
> query with first DELETE and then INSERT by measuring their performance
> on some set of test samples.

Yes, such types of optimizations could be more interesting now, since 
the code now generates real queries to look at. So the question is less 
hypothetical.

Still, I think that other changes can lead to more significant performance 
gains. For one thing, we could reduce the number of write queries by 
first checking whether anything changed at all. Moreover, the 4Store 
issue with update CPU load currently dominates overall update 
performance. This seems to occur for all queries (delete or insert) that 
change the stored data.
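
The idea of checking for changes before writing can be sketched as 
follows in Python. The sketch assumes that the old and new data of a 
page are available as sets of already serialised, ground triples (no 
blank nodes, since DELETE DATA does not allow them); it is not how SMW 
generates its updates, and all names are hypothetical. It returns no 
request at all when nothing changed, and otherwise a single SPARQL 1.1 
Update request with the operations separated by ";".

```python
from typing import Iterable, Optional, Set, Tuple

# Terms are assumed to be serialised already, e.g. "<http://...>" or '"text"'.
Triple = Tuple[str, str, str]


def _block(keyword: str, triples: Iterable[Triple]) -> str:
    """Render one DELETE DATA / INSERT DATA operation."""
    lines = "".join(f"  {s} {p} {o} .\n" for s, p, o in sorted(triples))
    return f"{keyword} {{\n{lines}}}"


def build_update(old: Set[Triple], new: Set[Triple]) -> Optional[str]:
    """Return an update request for the difference, or None if unchanged."""
    to_delete = old - new
    to_insert = new - old
    if not to_delete and not to_insert:
        return None  # nothing changed: skip the write entirely
    parts = []
    if to_delete:
        parts.append(_block("DELETE DATA", to_delete))
    if to_insert:
        parts.append(_block("INSERT DATA", to_insert))
    return " ;\n".join(parts)
```

Comparing this combined form against two separately sent requests on a 
real store would be one way to measure the difference asked about above.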

I will next write a separate email about the RDF store support status 
and possibilities to work with it.

Markus

>
>
>
> [1] http://www.w3.org/TR/rdf-sparql-query/#funcex-regex
> [2] http://www.w3.org/TR/sparql11-update/#t41
> [3] http://www.w3.org/TR/sparql11-update/#t413
>
> On Mon, Mar 28, 2011 at 10:05 AM, Markus Krötzsch
> <mar...@semantic-mediawiki.org> wrote:
>  >
>  > On 27/03/11 20:52, Laurent Alquier wrote:
>  >>
>  >> Hi Markus
>  >>
>  >> Does this mean it will be possible to consider declaring triples outside
>  >> of a subject page ?
>  >
>  > No, this is not part of the initial design. But the changes will make
>  > it easier (and more natural) to add this capability later on (maybe
>  > using named graphs).
>  >
>  > The initial goal of the RDF store project is to improve query
>  > performance by taking advantage of optimised SPARQL implementations of
>  > RDF stores.
>  >
>  > Markus
>  >
>  >
>  >> On Sun, Mar 27, 2011 at 2:22 PM, Markus Krötzsch
>  >> <mar...@semantic-mediawiki.org> wrote:
>  >>
>  >>    Hi Yuri,
>  >>
>  >>    thank you for your interest. Contributions are very welcome! We plan to
>  >>    use an RDF store as a backend for storing and querying data. This will
>  >>    not completely abolish the use of MySQL tables, but at least all queries
>  >>    should be answered by the triple store (no more SQL-based #ask). We will
>  >>    use SPARQL and SPARQL Update for all communication. SPARQL queries to
>  >>    external endpoints are not planned for the initial phase but adding them
>  >>    will be much easier after all the SPARQL communication methods are
>  >>    in SMW.
>  >>
>  >>    I am currently changing the SMW code in various places to make it
>  >>    compatible with such a setup. The first RDF store I will consider is
>  >>    4Store.
>  >>
>  >>
>  >>    How could you contribute to speed this up/make the features more
>  >>    complete?
>  >>
>  >>    Right now, I could mainly use support with figuring out the best way of
>  >>    formulating queries for achieving certain effects, both with 4Store and
>  >>    with Virtuoso (and any other store people want to use). Example
>  >>    questions of this type (to be answered for each store that we care
>  >>    about):
>  >>
>  >>    * "Could we use FILTER on URI strings to find URIs that begin with a
>  >>    certain prefix, or is this too slow?"
>  >>    * "Which parts of SPARQL UPDATE are currently (not) supported reliably?"
>  >>    * "Is it generally best to do SPARQL DELETE INSERT for updates, or does
>  >>    the store require us to decompose this into multiple queries? Are there
>  >>    performance differences?"
>  >>    * "What is the best way to implement counting queries on a given store?"
>  >>    (SPARQL 1.1 aggregates are still very preliminary and some stores have
>  >>    custom solutions)
>  >>
>  >>    I have more of these questions, and they generally need some testing on
>  >>    the real store, so this would be a place where contributors could help
>  >>    (also since there are many different stores people might care about).
>  >>
>  >>    There will be some more tasks that can be done as soon as I completed
>  >>    some more work on the SMW base architecture:
>  >>
>  >>    * We would like to support Virtuoso and maybe other stores as well.
>  >>    After 4store works, this should be a rather independent task to try and
>  >>    get it to run with other stores (each store will need some amount of
>  >>    special handling or optimisation, this can be prepared by answering the
>  >>    above SPARQL support questions).
>  >>
>  >>    * Testing. The more early testers we get, the better for the stability
>  >>    of the code.
>  >>
>  >>
>  >>    Regards,
>  >>
>  >>    Markus
>  >>
>  >>
>  >>    On 26/03/11 13:06, Yury Katkov wrote:
>  >> > Hello everyone!
>  >> >
>  >> > We have compared the current solutions for the triple store
>  >> > integration [1] and found that all those solutions are either
>  >> > incomplete or use very hard patches of the SMW core.
>  >> > Recently Markus mentioned that the next version of SMW will be better
>  >> > integrated with RDF store. Is it possible to get some details about
>  >> > these planned integration features? We want to contribute to  this
>  >> > work by solving as many related tasks as possible. Such features would
>  >> > be very useful for our installation of SMW and I believe for the whole
>  >> > SMW project.
>  >> >
>  >> > Sincerely yours,
>  >> > Yury Katkov
>  >> >
>  >> > [1] http://www.semantic-mediawiki.org/wiki/SPARQL_and_RDF_stores_for_SMW
>  >> >
>  >> >
>  >> --
>  >> - Laurent Alquier
>  >> http://www.linfa.net
>  >
>
>
>
> --
> Yury V. Katkov
> Laboratory of intelligent systems
> of the Saint-Petersburg National University of Information Technologies,
> Mechanics and Optics, Russia
> http://ailab.ifmo.ru


_______________________________________________
Semediawiki-devel mailing list
Semediawiki-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/semediawiki-devel