Re: [Virtuoso-users] full-text indexing after mass import (was: rdf_loader_run logs "missed delete of name id cache" messages)

2015-12-03 Thread Hugh Williams
Jörn,

> On 30 Nov 2015, at 15:47, Jörn Hees wrote:
> 
> Hi,
> 
> any ideas on this?
> 
>> On 23 Nov 2015, at 04:01, Jörn Hees wrote:
>> 
>>>> 
>>>> Reading the doc page, i have two remaining questions:
>>>> 
>>>> After a normal `rdf_loader_run()`, would a 
>>>> `DB.DBA.VT_INC_INDEX_DB_DBA_RDF_OBJ();` be sufficient to get a complete 
>>>> full-text index? Or do i have to run `DB.DBA.RDF_OBJ_FT_RECOVER();` in 
>>>> those cases and will otherwise never arrive at a complete free-text index 
>>>> (not even after the background tasks finished?)?
>>> 
>>> [Hugh] The scheduler will run `DB.DBA.VT_INC_INDEX_DB_DBA_RDF_OBJ();` so 
>>> you can wait for it to run or run it manually yourself.
>> 
>> 
>> Yes, that's what i understood, but in the 
>> http://docs.openlinksw.com/virtuoso/sparqlextensions.html#rdfsparqlrulefulltext
>>  there's this remark:
>> 
>>> One problem related to free-text indexing of DB.DBA.RDF_QUAD is that some 
>>> applications (e.g. those that import billions of triples) may set off 
>>> triggers. This will make free-text index data incomplete. Calling procedure 
>>> DB.DBA.RDF_OBJ_FT_RECOVER () will insert all missing free-text index items 
>>> by dropping and re-inserting every existing free-text index rule.
>> 
>> So i'm asking: is `rdf_loader_run();` one of those "applications" which 
>> deactivate some triggers leading to an incomplete full-text index and need 
>> `DB.DBA.RDF_OBJ_FT_RECOVER();` to be called?
> 
> Or is it enough to wait or call `DB.DBA.VT_INC_INDEX_DB_DBA_RDF_OBJ();`?

[Hugh] Calling `DB.DBA.VT_INC_INDEX_DB_DBA_RDF_OBJ();` is all that is required 
to get any new data indexed; the same procedure is called by the scheduler 
event, which fires every minute to keep the index up to date …

Regards
Hugh
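
A minimal sketch of the sequence discussed in this thread, as it might be typed 
into isql once the files to load have been registered. Only the three procedure 
names come from the messages; the explicit `checkpoint;` statements and the 
ordering are assumptions, not a confirmed recipe from OpenLink.

```sql
-- Bulk load (files are assumed to be registered with the loader already).
rdf_loader_run();
checkpoint;   -- assumption: persist the loaded data before indexing

-- Per the reply above, this is what the scheduler event calls; running it
-- by hand just avoids waiting for the next scheduled run.
DB.DBA.VT_INC_INDEX_DB_DBA_RDF_OBJ();
checkpoint;   -- assumption: persist the updated index, e.g. before a backup

-- Alternative named in the doc remark quoted in this thread; per the reply
-- above it is not required after rdf_loader_run():
-- DB.DBA.RDF_OBJ_FT_RECOVER();
```

On a large, freshly loaded database the indexing call can run for a long time, 
as reported in the earlier messages of this thread.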

> 
> 
> Cheers,
> 
> Jörn
> 
> 
> --
> Go from Idea to Many App Stores Faster with Intel(R) XDK
> Give your users amazing mobile app experiences with Intel(R) XDK.
> Use one codebase in this all-in-one HTML5 development environment.
> Design, debug & build mobile apps & 2D/3D high-impact games for multiple OSs.
> http://pubads.g.doubleclick.net/gampad/clk?id=254741911=/4140
> ___
> Virtuoso-users mailing list
> Virtuoso-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/virtuoso-users





Re: [Virtuoso-users] full-text indexing after mass import (was: rdf_loader_run logs "missed delete of name id cache" messages)

2015-11-30 Thread Jörn Hees
Hi,

any ideas on this?

> On 23 Nov 2015, at 04:01, Jörn Hees wrote:
> 
>>> 
>>> Reading the doc page, i have two remaining questions:
>>> 
>>> After a normal `rdf_loader_run()`, would a 
>>> `DB.DBA.VT_INC_INDEX_DB_DBA_RDF_OBJ();` be sufficient to get a complete 
>>> full-text index? Or do i have to run `DB.DBA.RDF_OBJ_FT_RECOVER();` in 
>>> those cases and will otherwise never arrive at a complete free-text index 
>>> (not even after the background tasks finished?)?
>> 
>> [Hugh] The scheduler will run `DB.DBA.VT_INC_INDEX_DB_DBA_RDF_OBJ();` so you 
>> can wait for it to run or run it manually yourself.
> 
> 
> Yes, that's what i understood, but in the 
> http://docs.openlinksw.com/virtuoso/sparqlextensions.html#rdfsparqlrulefulltext
>  there's this remark:
> 
>> One problem related to free-text indexing of DB.DBA.RDF_QUAD is that some 
>> applications (e.g. those that import billions of triples) may set off 
>> triggers. This will make free-text index data incomplete. Calling procedure 
>> DB.DBA.RDF_OBJ_FT_RECOVER () will insert all missing free-text index items 
>> by dropping and re-inserting every existing free-text index rule.
> 
> So i'm asking: is `rdf_loader_run();` one of those "applications" which 
> deactivate some triggers leading to an incomplete full-text index and need 
> `DB.DBA.RDF_OBJ_FT_RECOVER();` to be called?

Or is it enough to wait or call `DB.DBA.VT_INC_INDEX_DB_DBA_RDF_OBJ();`?


Cheers,

Jörn




Re: [Virtuoso-users] full-text indexing after mass import (was: rdf_loader_run logs "missed delete of name id cache" messages)

2015-11-22 Thread Hugh Williams
Jörn,

> On 23 Nov 2015, at 01:46, Jörn Hees wrote:
> 
> Hi Hugh,
> 
> thanks again for your replies...
> 
>> On 22 Nov 2015, at 01:46, Hugh Williams wrote:
>> 
>>> What puzzles me is that after import and several checkpoints and restarts, 
>>> just leaving the DB idle without any queries (see below) it seems to become 
>>> busy.
>>> I guess it does some kind of "re-organization" and i'd mostly like to find 
>>> out how i can tell it "do it now, take all resources you want, don't care 
>>> if anyone is waiting, admin override, full speed ;)".
>>> That would allow me to then have that static state of the DB which i can 
>>> back-up and replay if things go wrong or someone wants an old version, 
>>> leaving us with "ready to use" backups, and not such that first start some 
>>> lengthy "re-organization after mass import".
>>> 
>>> The mentioned "re-organization state" now seems to be over after leaving 
>>> the DB switched on and idle for the last couple of days.
>> 
>> [Hugh] Does your database have Full Text indexing enabled? That is a 
>> scheduled background task that would take time to complete on a newly loaded 
>> large database like yours, see:
>> 
>>  
>> http://docs.openlinksw.com/virtuoso/sparqlextensions.html#rdfsparqlrulefulltext
> 
> I really think that this could be it, as by default there seems to be an 
> "all" index.

[Hugh] If you installed the Virtuoso Faceted Browser then the FT index would be 
enabled and run as a scheduled job.
> 
> Reading the doc page, i have two remaining questions:
> 
> After a normal `rdf_loader_run()`, would a 
> `DB.DBA.VT_INC_INDEX_DB_DBA_RDF_OBJ();` be sufficient to get a complete 
> full-text index? Or do i have to run `DB.DBA.RDF_OBJ_FT_RECOVER();` in those 
> cases and will otherwise never arrive at a complete free-text index (not even 
> after the background tasks finished?)?

[Hugh] The scheduler will run `DB.DBA.VT_INC_INDEX_DB_DBA_RDF_OBJ();` so you 
can wait for it to run or run it manually yourself.

> If i have to, a mention of this around 
> http://docs.openlinksw.com/virtuoso/rdfperformancetuning.html#rdfperfloadinglod
>  would be nice.
> 
> I ran `DB.DBA.RDF_OBJ_FT_RECOVER();` on a small instance with just the 
> DBpedia core (~ 430 M triples) and it seems to only use 2 - 3 CPUs with very 
> little IO. The whole importing of that dataset only took 1:30 hours, but the 
> full-text indexing is still running after 3 hours now... Is there any way to 
> go full speed at the cost of locking the whole DB or something?

[Hugh] Will have to check with development as I am not aware of a param to 
control CPU usage, it should run with full platform utilisation I would have 
thought …

Regards
Hugh

> 
> Cheers,
> Jörn
> 
> 
> 




[Virtuoso-users] full-text indexing after mass import (was: rdf_loader_run logs "missed delete of name id cache" messages)

2015-11-22 Thread Jörn Hees
Hi Hugh,

thanks again for your replies...

> On 22 Nov 2015, at 01:46, Hugh Williams wrote:
> 
>> What puzzles me is that after import and several checkpoints and restarts, 
>> just leaving the DB idle without any queries (see below) it seems to become 
>> busy.
>> I guess it does some kind of "re-organization" and i'd mostly like to find 
>> out how i can tell it "do it now, take all resources you want, don't care if 
>> anyone is waiting, admin override, full speed ;)".
>> That would allow me to then have that static state of the DB which i can 
>> back-up and replay if things go wrong or someone wants an old version, 
>> leaving us with "ready to use" backups, and not such that first start some 
>> lengthy "re-organization after mass import".
>> 
>> The mentioned "re-organization state" now seems to be over after leaving the 
>> DB switched on and idle for the last couple of days.
> 
> [Hugh] Does your database have Full Text indexing enabled? That is a 
> scheduled background task that would take time to complete on a newly loaded 
> large database like yours, see:
> 
>   
> http://docs.openlinksw.com/virtuoso/sparqlextensions.html#rdfsparqlrulefulltext

I really think that this could be it, as by default there seems to be an "all" 
index.

Reading the doc page, i have two remaining questions:

After a normal `rdf_loader_run()`, would a 
`DB.DBA.VT_INC_INDEX_DB_DBA_RDF_OBJ();` be sufficient to get a complete 
full-text index? Or do i have to run `DB.DBA.RDF_OBJ_FT_RECOVER();` in those 
cases and will otherwise never arrive at a complete free-text index (not even 
after the background tasks finished?)?
If i have to, a mention of this around 
http://docs.openlinksw.com/virtuoso/rdfperformancetuning.html#rdfperfloadinglod 
would be nice.

I ran `DB.DBA.RDF_OBJ_FT_RECOVER();` on a small instance with just the DBpedia 
core (~ 430 M triples) and it seems to only use 2 - 3 CPUs with very little IO. 
The whole importing of that dataset only took 1:30 hours, but the full-text 
indexing is still running after 3 hours now... Is there any way to go full 
speed at the cost of locking the whole DB or something?

Cheers,
Jörn






Re: [Virtuoso-users] full-text indexing after mass import (was: rdf_loader_run logs "missed delete of name id cache" messages)

2015-11-22 Thread Jörn Hees
Hi,

> On 23 Nov 2015, at 03:36, Hugh Williams wrote:
> 
>>> 
>>> http://docs.openlinksw.com/virtuoso/sparqlextensions.html#rdfsparqlrulefulltext
>> 
>> I really think that this could be it, as by default there seems to be an 
>> "all" index.
> 
> [Hugh] If you installed the Virtuoso Faceted Browser then the FT index would 
> be enabled and run as a scheduled job.

It's a "vanilla" virtuoso-server as installed from the .deb packages of the 
7.2.1 sources, the only VAD package installed is the conductor.


>> 
>> Reading the doc page, i have two remaining questions:
>> 
>> After a normal `rdf_loader_run()`, would a 
>> `DB.DBA.VT_INC_INDEX_DB_DBA_RDF_OBJ();` be sufficient to get a complete 
>> full-text index? Or do i have to run `DB.DBA.RDF_OBJ_FT_RECOVER();` in those 
>> cases and will otherwise never arrive at a complete free-text index (not 
>> even after the background tasks finished?)?
> 
> [Hugh] The scheduler will run `DB.DBA.VT_INC_INDEX_DB_DBA_RDF_OBJ();` so you 
> can wait for it to run or run it manually yourself.


Yes, that's what i understood, but in the 
http://docs.openlinksw.com/virtuoso/sparqlextensions.html#rdfsparqlrulefulltext 
there's this remark:

> One problem related to free-text indexing of DB.DBA.RDF_QUAD is that some 
> applications (e.g. those that import billions of triples) may set off 
> triggers. This will make free-text index data incomplete. Calling procedure 
> DB.DBA.RDF_OBJ_FT_RECOVER () will insert all missing free-text index items by 
> dropping and re-inserting every existing free-text index rule.

So i'm asking: is `rdf_loader_run();` one of those "applications" which 
deactivate some triggers leading to an incomplete full-text index and need 
`DB.DBA.RDF_OBJ_FT_RECOVER();` to be called?

Or are the only two interesting alternatives for me waiting or calling 
`DB.DBA.VT_INC_INDEX_DB_DBA_RDF_OBJ();`?


>> If i have to, a mention of this around 
>> http://docs.openlinksw.com/virtuoso/rdfperformancetuning.html#rdfperfloadinglod
>>  would be nice.
>> 
>> I ran `DB.DBA.RDF_OBJ_FT_RECOVER();` on a small instance with just the 
>> DBpedia core (~ 430 M triples) and it seems to only use 2 - 3 CPUs with very 
>> little IO. The whole importing of that dataset only took 1:30 hours, but the 
>> full-text indexing is still running after 3 hours now... Is there any way to 
>> go full speed at the cost of locking the whole DB or something?
> 
> [Hugh] Will have to check with development as I am not aware of a param to 
> control CPU usage, it should run with full platform utilisation I would have 
> thought 

Thanks,

Jörn

