On 19/03/2017 at 16:15, Hugh Williams wrote:
Hi Thomas,
Hi,

Is the loading of the dataset now complete, or is it still in progress? Your 
opening statement is not clear.

You should not need 40GB RAM for inserting and hosting 240 million triples, 
which should require less than 10GB depending on how well they can be 
compressed for storage in the database.
Loading is complete; we finished at 243,188,427 triples. Hosting now requires 25 GB of RAM and 15 GB of disk. Details:

void:triples 243188427 ;
 void:classes 13 ;
 void:entities 58523487 ;
 void:distinctSubjects 58523514 ;
 void:properties 32 ;
 void:distinctObjects 73171603 .

Total pages     1925120
Free pages      607377
Buffers         2720000
Buffers used    244554
Dirty buffers   3
Wired down buffers      0

Table   Index name      Touches         Reads   Read %
DB.DBA.RDF_QUAD         RDF_QUAD        1562356553      36371   0
DB.DBA.RDF_QUAD         RDF_QUAD_POGS   609423455       16989   0
DB.DBA.RDF_QUAD         RDF_QUAD_SP     378769255       35822   0
DB.DBA.RDF_QUAD         RDF_QUAD_GS     340377017       1634    0
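
As a rough cross-check of the 25 GB RAM and 15 GB disk figures against the status output above, here is a minimal sketch that converts page and buffer counts into sizes. It assumes each buffer caches one 8 KB database page (the page size mentioned in the striping comments of virtuoso.ini) and ignores per-buffer overhead and every other allocation, so it gives a lower bound rather than an exact accounting. The function name is made up for illustration.

```python
PAGE_SIZE_BYTES = 8 * 1024  # assumption: one buffer caches one 8 KB page

def pages_to_gib(pages: int, page_size: int = PAGE_SIZE_BYTES) -> float:
    """Approximate size of `pages` database pages in GiB (no overhead counted)."""
    return pages * page_size / 2**30

configured = pages_to_gib(2_720_000)  # "Buffers" in the status output
in_use = pages_to_gib(244_554)        # "Buffers used"
db_file = pages_to_gib(1_925_120)     # "Total pages"

print(f"configured buffer pool: {configured:.1f} GiB")
print(f"buffers in use:         {in_use:.1f} GiB")
print(f"database file pages:    {db_file:.1f} GiB")
```

Under these assumptions the configured pool comes to about 20.8 GiB, which would account for most of the 25 GB, and the total pages come to about 14.7 GiB, in line with the 15 GB on disk.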


I assume you have set the swappiness as suggested previously?
Yes, done:
$ sysctl vm.swappiness
vm.swappiness = 10
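
For anyone who wants to check this from a script rather than by hand, the following is a small sketch. It parses either the `sysctl` output format shown above or the bare number found in `/proc/sys/vm/swappiness` on Linux. Both helper names are hypothetical.

```python
from pathlib import Path

def parse_swappiness(text: str) -> int:
    """Accept 'vm.swappiness = 10' (sysctl output) or a bare '10'."""
    return int(text.strip().split("=")[-1])

def read_swappiness(path: str = "/proc/sys/vm/swappiness") -> int:
    """Read the current value from procfs (Linux only)."""
    return parse_swappiness(Path(path).read_text())

print(parse_swappiness("vm.swappiness = 10"))
```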


When you recompiled your Virtuoso, was this done from the git stable/7 or 
develop/7 branch? The latter has a number of memory consumption fixes that 
would not be in stable/7, thus I would suggest building from develop/7.
We will investigate.

The two main problems we encountered while loading were:

- log messages indicating "Flushing at 5.7 MB/s while application is making dirty pages at 1.7 MB/s.", which we interpreted as insufficient write speed while receiving many JDBC INSERTs (a disk issue? a buffer issue?)

- high memory consumption (40 GB of RAM): the virtuoso process never released memory while loading, and free RAM kept going down.
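
To follow these flush messages over a long load, one option is to pull the two rates out of virtuoso.log and compare them. The sketch below is only that: the regular expression is modelled on the single message quoted above, and other variants of the message may exist. The function name is hypothetical.

```python
import re

# Pattern modelled on the one log line quoted in this thread (assumption).
FLUSH_RE = re.compile(
    r"Flushing at (?P<flush>[\d.]+) MB/s while application is making "
    r"dirty pages at (?P<dirty>[\d.]+) MB/s"
)

def parse_flush_rates(line: str):
    """Return (flush_mb_s, dirty_mb_s), or None if the line does not match."""
    m = FLUSH_RE.search(line)
    if m is None:
        return None
    return float(m.group("flush")), float(m.group("dirty"))

line = ("Flushing at 5.7 MB/s while application is making "
        "dirty pages at 1.7 MB/s.")
rates = parse_flush_rates(line)
if rates is not None:
    flush, dirty = rates
    print(f"flush={flush} MB/s dirty={dirty} MB/s flush>=dirty: {flush >= dirty}")
```

In the quoted line the reported flush rate (5.7 MB/s) is higher than the dirtying rate (1.7 MB/s). The sketch only extracts and compares the two numbers; it does not say what Virtuoso itself treats as a problem.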


Have you provided a copy of your INI file previously? If not, can you provide a 
copy?
See attached (FYI, QueryLog= was not active while loading).

Do ensure the following params are set to 1 in order to clean up unused 
threads/resources and reduce memory consumption of the Virtuoso server, which 
can otherwise be construed as memory leaks:

ThreadCleanupInterval    = 1
ResourcesCleanupInterval = 1
We have these settings in place.
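
A quick way to confirm these two values from a script is sketched below with Python's standard `configparser`. The embedded sample is a trimmed, synthetic excerpt, not the full virtuoso.ini, and the helper name is hypothetical. `strict=False` is used because a virtuoso.ini can repeat a key within a section, and `;` is treated as an inline comment marker.

```python
import configparser

# Synthetic excerpt for illustration; a real check would read virtuoso.ini.
SAMPLE_INI = """
[Parameters]
ThreadCleanupInterval    = 1
ThreadThreshold          = 10
ResourcesCleanupInterval = 1
"""

def cleanup_settings_ok(ini_text: str) -> bool:
    """True if both cleanup intervals in [Parameters] are set to 1."""
    parser = configparser.ConfigParser(
        strict=False,                    # tolerate repeated keys
        inline_comment_prefixes=(";",),  # strip trailing '; ...' comments
        interpolation=None,
    )
    parser.read_string(ini_text)
    params = parser["Parameters"]
    return all(
        params.get(key, "").strip() == "1"
        for key in ("ThreadCleanupInterval", "ResourcesCleanupInterval")
    )

print(cleanup_settings_ok(SAMPLE_INI))
```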

Thanks for your help,

Thomas

In case it is useful, we model the ORCID 2016 dataset using the following classes:

c1                                                                c2
http://xmlns.com/foaf/0.1/Person                                  28021451
http://purl.org/ontology/bibo/Document                            14283692
http://purl.org/ontology/bibo/Journal                             9104659
http://xmlns.com/foaf/0.1/PersonalProfileDocument                 2527333
http://xmlns.com/foaf/0.1/Article                                 974945
http://www.w3.org/ns/org#Membership                               807465
http://www.w3.org/2006/vcard/ns#Address                           807423
http://www.w3.org/ns/org#Organization                             807418
http://purl.org/ontology/bibo/Conference                          769451
http://www.w3.org/ns/org#OrganizationalUnit                       649291
http://www.w3.org/2004/02/skos/core#Concept                       371731
http://purl.org/ontology/bibo/Book                                205493
http://www.w3.org/ns/org#Role                                     168423
http://www.w3.org/1999/02/22-rdf-syntax-ns#Property               170
http://www.openlinksw.com/schemas/virtrdf#QuadMapFormat           130
http://www.openlinksw.com/schemas/virtrdf#array-of-QuadMapFormat  98
http://www.w3.org/2000/01/rdf-schema#Class                        56
http://www.openlinksw.com/schemas/virtrdf#QuadMapValue            8
http://www.openlinksw.com/schemas/virtrdf#array-of-QuadMapColumn  8
http://www.openlinksw.com/schemas/virtrdf#QuadMapColumn           8


Best Regards
Hugh Williams
Professional Services
OpenLink Software, Inc.      //              http://www.openlinksw.com/
Weblog   -- http://www.openlinksw.com/blogs/
LinkedIn -- http://www.linkedin.com/company/openlink-software/
Twitter  -- http://twitter.com/OpenLink
Google+  -- http://plus.google.com/100570109519069333827/
Facebook -- http://www.facebook.com/OpenLinkSoftware
Universal Data Access, Integration, and Management Technology Providers



On 15 Mar 2017, at 17:08, Thomas Michaux <mich...@abes.fr> wrote:

Hello,

FYI, Virtuoso is still loading, but we needed to increase memory resources;
the process now uses almost 40 GB of RAM:

[devel@tulipe-test2 ~]$ ./memcheck-virtuoso.sh
2017-03-15T17:54 VmSize: 41273424kB 5883
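
The memcheck-virtuoso.sh script itself is not included in this thread. As an illustration only, a comparable reading can be taken on Linux from `/proc/<pid>/status`, as in the sketch below. The function names and the sample text are hypothetical, and the output layout (timestamp, VmSize, PID) is a guess based on the line quoted above. Note that VmSize is the virtual address space size, so resident memory (VmRSS) can be lower.

```python
import re
from datetime import datetime
from pathlib import Path

def parse_vmsize_kb(status_text: str) -> int:
    """Extract the VmSize value (in kB) from /proc/<pid>/status content."""
    m = re.search(r"^VmSize:\s+(\d+)\s*kB", status_text, re.MULTILINE)
    if m is None:
        raise ValueError("no VmSize line found")
    return int(m.group(1))

def report(pid: int) -> str:
    """Format one line resembling the quoted output (Linux only)."""
    kb = parse_vmsize_kb(Path(f"/proc/{pid}/status").read_text())
    stamp = datetime.now().strftime("%Y-%m-%dT%H:%M")
    return f"{stamp} VmSize: {kb}kB {pid}"

# Synthetic sample reusing the VmSize value quoted above.
sample = "Name:\tvirtuoso-t\nVmSize:\t41273424 kB\n"
kb = parse_vmsize_kb(sample)
print(f"{kb} kB = {kb / 2**20:.1f} GiB")
```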

Stats for the graph <http://hub.abes.fr/referentiel/ORCID/2016> (I forgot
to mention that it is the only graph in the database):

239 451 028 triples


this:Dataset a void:Dataset ;
rdfs:seeAlso <http://hub.abes.fr/referentiel/ORCID/2016> ;
rdfs:label "" ;
void:sparqlEndpoint <http://idrefplus.v102.abes.fr:8890/sparql> ;
void:triples 239451028 ;
void:classes 13 ;
void:entities 57692917 ;
void:distinctSubjects 57650847 ;
void:properties 32 ;
void:distinctObjects 72219514 .

this:sameAsLinks a void:Linkset ;
void:inDataset this:Dataset ;
void:triples 997389 ;
void:linkPredicate owl:sameAs .


On 14/03/2017 at 10:05, Thomas Michaux wrote:

;
;  virtuoso.ini
;
;  Configuration file for the OpenLink Virtuoso VDBMS Server
;
;  To learn more about this product, or any other product in our
;  portfolio, please check out our web site at:
;
;      http://virtuoso.openlinksw.com/
;
;  or contact us at:
;
;      general.informat...@openlinksw.com
;
;  If you have any technical questions, please contact our support
;  staff at:
;
;      technical.supp...@openlinksw.com
;
;
;  Database setup
;
[Database]
DatabaseFile       = /usr/local/virtuoso-opensource/var/lib/virtuoso/db/virtuoso.db
ErrorLogFile       = /usr/local/virtuoso-opensource/var/lib/virtuoso/db/virtuoso.log
LockFile           = /usr/local/virtuoso-opensource/var/lib/virtuoso/db/virtuoso.lck
TransactionFile    = /usr/local/virtuoso-opensource/var/lib/virtuoso/db/virtuoso20170320104252.trx
;TransactionFile    = /LN_Hupe/virtuoso20151207171442.trx
xa_persistent_file = /usr/local/virtuoso-opensource/var/lib/virtuoso/db/virtuoso.pxa
ErrorLogLevel      = 7
FileExtend         = 200
MaxCheckpointRemap = 481280
;UnremapQuota       = 0
DefaultIsolation   = 2
Striping           = 0
TempStorage        = TempDatabase

[TempDatabase]
DatabaseFile       = /usr/local/virtuoso-opensource/var/lib/virtuoso/db/virtuoso-temp.db
TransactionFile    = /usr/local/virtuoso-opensource/var/lib/virtuoso/db/virtuoso-temp.trx
MaxCheckpointRemap = 2000
Striping           = 0

;
;  Server parameters
;
[Parameters]
ServerPort               = 1111
LiteMode                 = 0
DisableUnixSocket        = 1
DisableTcpSocket         = 0
;SSLServerPort                  = 2111
;SSLCertificate                 = cert.pem
;SSLPrivateKey                  = pk.pem
;X509ClientVerify               = 0
;X509ClientVerifyDepth          = 0
;X509ClientVerifyCAFile         = ca.pem
MaxClientConnections     = 10
CheckpointInterval       = 40
O_DIRECT                 = 1
CaseMode                 = 2
MaxStaticCursorRows      = 5000
CheckpointAuditTrail     = 1
AllowOSCalls             = 0
SchedulerInterval        = 10
;DirsAllowed              = ., /usr/local/virtuoso-opensource/share/virtuoso/vad, /home/devel, /LN_Hupe, /LN_Hupe/dumpviaf
;production
DirsAllowed              = ., /usr/local/virtuoso-opensource/share/virtuoso/vad, /home/devel/logs
ThreadCleanupInterval    = 1
ThreadThreshold          = 10
ResourcesCleanupInterval = 1
FreeTextBatchSize        = 100000
SingleCPU                = 0
VADInstallDir            = /usr/local/virtuoso-opensource/share/virtuoso/vad/
PrefixResultNames        = 0
RdfFreeTextRulesSize     = 100
IndexTreeMaps            = 256
MaxMemPoolSize           = 200000000
PrefixResultNames        = 0
MacSpotlight             = 0
IndexTreeMaps            = 64
MaxQueryMem              = 3G   ; memory allocated to query processor
VectorSize               = 1000 ; initial parallel query vector (array of query operations) size
MaxVectorSize            = 1000000      ; query vector size threshold.
AdjustVectorSize         = 0
ThreadsPerQuery          = 8
AsyncQueueMaxThreads     = 10
ColumnStore              = 1
;server side query logging
;At run time, this may be enabled or disabled with prof_enable (), overriding the specification of the ini file
QueryLog                 = virtuoso.qrl
;;
;; When running with large data sets, one should configure the Virtuoso
;; process to use between 2/3 to 3/5 of free system memory and to stripe
;; storage on all available disks.
;;
;; Uncomment next two lines if there is 2 GB system memory free
;NumberOfBuffers          = 170000
;MaxDirtyBuffers          = 130000
;; Uncomment next two lines if there is 4 GB system memory free
;NumberOfBuffers          = 340000
; MaxDirtyBuffers          = 250000
;; Uncomment next two lines if there is 8 GB system memory free
;NumberOfBuffers          = 680000
;MaxDirtyBuffers          = 500000
;; Uncomment next two lines if there is 16 GB system memory free
;NumberOfBuffers          = 1360000
;MaxDirtyBuffers          = 1000000
;; Uncomment next two lines if there is 32 GB system memory free
NumberOfBuffers          = 2720000
MaxDirtyBuffers          = 2000000

;; Uncomment next two lines if there is 48 GB system memory free
;NumberOfBuffers          = 4000000
;MaxDirtyBuffers          = 3000000
;; Uncomment next two lines if there is 64 GB system memory free
;NumberOfBuffers          = 5450000
;MaxDirtyBuffers          = 4000000
;;
;; Note the default settings will take very little memory
;; but will not result in very good performance
;;
;NumberOfBuffers          = 10000
;MaxDirtyBuffers          = 6000
[HTTPServer]
ServerPort                  = 8890
ServerRoot                  = /usr/local/virtuoso-opensource/var/lib/virtuoso/vsp
MaxClientConnections        = 10
DavRoot                     = DAV
EnabledDavVSP               = 0
HTTPProxyEnabled            = 0
TempASPXDir                 = 0
DefaultMailServer           = localhost:25
ServerThreads               = 10
MaxKeepAlives               = 10
KeepAliveTimeout            = 10
MaxCachedProxyConnections   = 10
ProxyConnectionCacheTimeout = 15
HTTPThreadSize              = 280000
HttpPrintWarningsInOutput   = 0
Charset                     = UTF-8
HTTPLogFile                 = /home/devel/logs/http20032017.log
MaintenancePage             = atomic.html
EnabledGzipContent          = 1

[AutoRepair]
BadParentLinks = 0

[Client]
SQL_PREFETCH_ROWS  = 100
SQL_PREFETCH_BYTES = 16000
SQL_QUERY_TIMEOUT  = 0
SQL_TXN_TIMEOUT    = 0
;SQL_NO_CHAR_C_ESCAPE           = 1
;SQL_UTF8_EXECS                 = 0
;SQL_NO_SYSTEM_TABLES           = 0
;SQL_BINARY_TIMESTAMP           = 1
;SQL_ENCRYPTION_ON_PASSWORD     = -1

[VDB]
ArrayOptimization           = 0
NumArrayParameters          = 10
VDBDisconnectTimeout        = 1000
KeepConnectionOnFixedThread = 0

[Replication]
ServerName   = db-TULIPEDEV
ServerEnable = 1
QueueMax     = 50000

;
;  Striping setup
;
;  These parameters have only effect when Striping is set to 1 in the
;  [Database] section, in which case the DatabaseFile parameter is ignored.
;
;  With striping, the database is spawned across multiple segments
;  where each segment can have multiple stripes.
;
;  Format of the lines below:
;    Segment<number> = <size>, <stripe file name> [, <stripe file name> .. ]
;
;  <number> must be ordered from 1 up.
;
;  The <size> is the total size of the segment which is equally divided
;  across all stripes forming  the segment. Its specification can be in
;  gigabytes (g), megabytes (m), kilobytes (k) or in database blocks
;  (b, the default)
;
;  Note that the segment size must be a multiple of the database page size
;  which is currently 8k. Also, the segment size must be divisible by the
;  number of stripe files forming  the segment.
;
;  The example below creates a 200 meg database striped on two segments
;  with two stripes of 50 meg and one of 100 meg.
;
;  You can always add more segments to the configuration, but once
;  added, do not change the setup.
;
[Striping]
Segment1 = 100M, db-seg1-1.db, db-seg1-2.db
Segment2 = 100M, db-seg2-1.db
;...
;[TempStriping]
;Segment1                       = 100M, db-seg1-1.db, db-seg1-2.db
;Segment2                       = 100M, db-seg2-1.db
;...
;[Ucms]
;UcmPath                        = <path>
;Ucm1                           = <file>
;Ucm2                           = <file>
;...

[Zero Config]
ServerName = virtuoso (TULIPEDEV)
;ServerDSN                      = ZDSN
;SSLServerName                  =
;SSLServerDSN                   =

[Mono]
;MONO_TRACE                     = Off
;MONO_PATH                      = <path_here>
;MONO_ROOT                      = <path_here>
;MONO_CFG_DIR                   = <path_here>
;virtclr.dll                    =

[URIQA]
DynamicLocal = 0
DefaultHost  = localhost:8890

[SPARQL]
ExternalQuerySource        = 1
ExternalXsltSource         = 1
;DefaultGraph                   = http://localhost:8890/dataspace
;ImmutableGraphs                = http://localhost:8890/dataspace
ResultSetMaxRows           = 100000
MaxQueryCostEstimationTime = 400        ; in seconds
MaxQueryExecutionTime      = 40 ; in seconds
DefaultQuery               = select ?p,?o from <http://hub.abes.fr/referentiel/ORCID/2016> where {<http://orcid.org/0000-0002-1275-0840/affiliation/2/organisation/universidaddechile/adresse/1> ?p ?o} limit 50
DeferInferenceRulesInit    = 1  ; controls inference rules loading
;PingService                    = http://rpc.pingthesemanticweb.com/

[Plugins]
;Load4                  = plain, im
;Load5          = plain, wbxml2
;Load6                  = plain, hslookup
;Load7                  = attach, libphp5.so
;Load8                  = Hosting, hosting_php.so
;Load9                  = Hosting,hosting_perl.so
;Load10         = Hosting,hosting_python.so
;Load11         = Hosting,hosting_ruby.so
;Load12                         = msdtc,msdtc_sample