Re: [Virtuoso-users] importing Freebase RDF dump: slows down, memleak?
On 2 Sep 2014, at 00:01, Hugh Williams wrote:

>>> Development indicate your suggestion is not without merit but
>>> implementation is not as simple as it may seem as the indexes are not all
>>> sequential, but something like that could possibly be implemented. It is
>>> suggested you could try dropping the indexes on RDF_QUAD table, load the
>>> Freebase datasets and then recreate indexes after loading, which would
>>> require a smaller working set that would better fit into the 32GB RAM
>>> available. The commands for dropping the necessary indexes are:
>>>
>>> drop index rdf_quad_pogs;
>>> drop index rdf_quad_sp;
>>> drop index rdf_quad_op;
>>> drop index rdf_quad_gs;
>>>
>>> and the respective indexes can then be recreated as detailed at:
>>>
>>> http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VirtRDFPerformanceTuning?#RDF%20Index%20Scheme
>>>
>>> Note that, this being v7, you need to recreate the column-wise indexes.
>>> Let us know how this works for you.
>>
>> Cool, will try.
>
> [Hugh] OK, let us know the outcome ...

After 5 days:

[Sun Sep 7 23:43:31 2014] virtuoso-t[11495]: segfault at ip 008c3a8e sp 7f4ec2f79d20 error 7 in virtuoso-t[40+b47000]

yay...

Performance also breaks down at some point with dropped indexes and 2 run_rdf_loaders as suggested.

I'll append the output of `select * from DB.DBA.LOAD_LIST;` but for now i give up...

Cheers,
Jörn

ll_file  ll_graph  ll_state  ll_started  ll_done  ll_host  ll_work_time  ll_error
VARCHAR NOT NULL  VARCHAR  INTEGER  TIMESTAMP  TIMESTAMP  INTEGER  INTEGER  VARCHAR
___

/usr/local/data/datasets/remote/freebase/2014-08-20/splitted/freebase-rdf-2014-08-17-00-00.aa.nt.gz  http://rdf.freebase.com  2  2014.9.2 17:20.13 927132000  2014.9.2 17:34.28 11121000  0  NULL  NULL
/usr/local/data/datasets/remote/freebase/2014-08-20/splitted/freebase-rdf-2014-08-17-00-00.ab.nt.gz  http://rdf.freebase.com  2  2014.9.2 17:20.59 818941000  2014.9.2 17:35.42 994563000  0  NULL  NULL
/usr/local/data/datasets/remote/freebase/2014-08-20/splitted/freebase-rdf-2014-08-17-00-00.ac.nt.gz  http://rdf.freebase.com  2  2014.9.2 17:34.28 46244000  2014.9.2 17:52.1 4205  0  NULL  NULL
/usr/local/data/datasets/remote/freebase/2014-08-20/splitted/freebase-rdf-2014-08-17-00-00.ad.nt.gz  http://rdf.freebase.com  2  2014.9.2 17:35.43 10491000  2014.9.2 17:53.58 266217000  0  NULL  NULL
/usr/local/data/datasets/remote/freebase/2014-08-20/splitted/freebase-rdf-2014-08-17-00-00.ae.nt.gz  http://rdf.freebase.com  2  2014.9.2 17:52.1 4551  2014.9.2 18:11.45 21522000  0  NULL  NULL
/usr/local/data/datasets/remote/freebase/2014-08-20/splitted/freebase-rdf-2014-08-17-00-00.af.nt.gz  http://rdf.freebase.com  2  2014.9.2 17:53.58 27075  2014.9.2 18:14.13 26648  0  NULL  NULL
/usr/local/data/datasets/remote/freebase/2014-08-20/splitted/freebase-rdf-2014-08-17-00-00.ag.nt.gz  http://rdf.freebase.com  2  2014.9.2 18:11.45 25765000  2014.9.2 18:29.32 312824000  0  NULL  NULL
/usr/local/data/datasets/remote/freebase/2014-08-20/splitted/freebase-rdf-2014-08-17-00-00.ah.nt.gz  http://rdf.freebase.com  2  2014.9.2 18:14.13 27152  2014.9.2 18:34.51 216078000  0  NULL  NULL
/usr/local/data/datasets/remote/freebase/2014-08-20/splitted/freebase-rdf-2014-08-17-00-00.ai.nt.gz  http://rdf.freebase.com  2  2014.9.2 18:29.32 321036000  2014.9.2 18:54.37 54526000  0  NULL  NULL
/usr/local/data/datasets/remote/freebase/2014-08-20/splitted/freebase-rdf-2014-08-17-00-00.aj.nt.gz  http://rdf.freebase.com  2  2014.9.2 18:34.51 220487000  2014.9.2 18:54.44 130952000  0  NULL  NULL
/usr/local/data/datasets/remote/freebase/2014-08-20/splitted/freebase-rdf-2014-08-17-00-00.ak.nt.gz  http
Re: [Virtuoso-users] importing Freebase RDF dump: slows down, memleak?
Jörn,

On 1 Sep 2014, at 16:44, Jörn Hees wrote:

> On 1 Sep 2014, at 17:15, Hugh Williams wrote:
>
>> [Hugh] Did you let the load continue or was it stopped ?
>
> yupp, i let it continue but it was killed for out of memory 2.5 days after my
> last mail... :-/
>
>> Development indicate your suggestion is not without merit but implementation
>> is not as simple as it may seem as the indexes are not all sequential, but
>> something like that could possibly be implemented. It is suggested you could
>> try dropping the indexes on RDF_QUAD table, load the Freebase datasets and
>> then recreate indexes after loading, which would require a smaller working
>> set that would better fit into the 32GB RAM available. The commands for
>> dropping the necessary indexes are:
>>
>> drop index rdf_quad_pogs;
>> drop index rdf_quad_sp;
>> drop index rdf_quad_op;
>> drop index rdf_quad_gs;
>>
>> and the respective indexes can then be recreated as detailed at:
>>
>> http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VirtRDFPerformanceTuning?#RDF%20Index%20Scheme
>>
>> Note that, this being v7, you need to recreate the column-wise indexes.
>> Let us know how this works for you.
>
> Cool, will try.

[Hugh] OK, let us know the outcome ...

>> Note you can also use the ld_meter scripts we provided for monitoring the
>> Virtuoso Bulk loader activity as detailed at:
>>
>> http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VirtTipsAndTricksGuideLDMeterUtility
>>
>> Also, how many "rdf_loader_run()" processes do you have running when
>> performing the load, as for v7 we recommend running Number of Cores * 0.4
>> for best performance typically ?
>
> Thanks, didn't know these. I'll probably not run multiple rdf_loaders at the
> same time as deactivating the indices, etc. (i assume it's meant for cases
> where you have enough RAM and aren't invalidating even more of the cache
> hierarchy by several processes competing?)

[Hugh] You should run multiple rdf_loader_run() processes, as there are many datasets to load and you want to achieve maximum platform utilisation (mainly optimum use of cores for parallel loading of triples) during the load.

Regards
Hugh

> Cheers,
> Jörn

--
Slashdot TV.
Video for Nerds. Stuff that matters.
http://tv.slashdot.org/
___
Virtuoso-users mailing list
Virtuoso-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/virtuoso-users
Re: [Virtuoso-users] importing Freebase RDF dump: slows down, memleak?
On 1 Sep 2014, at 17:15, Hugh Williams wrote:

> [Hugh] Did you let the load continue or was it stopped ?

yupp, i let it continue but it was killed for out of memory 2.5 days after my last mail... :-/

> Development indicate your suggestion is not without merit but implementation
> is not as simple as it may seem as the indexes are not all sequential, but
> something like that could possibly be implemented. It is suggested you could
> try dropping the indexes on RDF_QUAD table, load the Freebase datasets and
> then recreate indexes after loading, which would require a smaller working
> set that would better fit into the 32GB RAM available. The commands for
> dropping the necessary indexes are:
>
> drop index rdf_quad_pogs;
> drop index rdf_quad_sp;
> drop index rdf_quad_op;
> drop index rdf_quad_gs;
>
> and the respective indexes can then be recreated as detailed at:
>
> http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VirtRDFPerformanceTuning?#RDF%20Index%20Scheme
>
> Note that, this being v7, you need to recreate the column-wise indexes.
> Let us know how this works for you.

Cool, will try.

> Note you can also use the ld_meter scripts we provided for monitoring the
> Virtuoso Bulk loader activity as detailed at:
>
> http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VirtTipsAndTricksGuideLDMeterUtility
>
> Also, how many "rdf_loader_run()" processes do you have running when
> performing the load, as for v7 we recommend running Number of Cores * 0.4 for
> best performance typically ?

Thanks, didn't know these. I'll probably not run multiple rdf_loaders at the same time as deactivating the indices, etc. (i assume it's meant for cases where you have enough RAM and aren't invalidating even more of the cache hierarchy by several processes competing?)

Cheers,
Jörn
Re: [Virtuoso-users] importing Freebase RDF dump: slows down, memleak?
Best Regards
Hugh Williams
Professional Services
OpenLink Software, Inc. // http://www.openlinksw.com/
Weblog -- http://www.openlinksw.com/blogs/
LinkedIn -- http://www.linkedin.com/company/openlink-software/
Twitter -- http://twitter.com/OpenLink
Google+ -- http://plus.google.com/100570109519069333827/
Facebook -- http://www.facebook.com/OpenLinkSoftware
Universal Data Access, Integration, and Management Technology Providers

On 25 Aug 2014, at 14:33, Jörn Hees wrote:

> Hi again,
>
> On 22 Aug 2014, at 17:44, Jörn Hees wrote:
>
>> On 22 Aug 2014, at 17:51, Hugh Williams wrote:
>>
>>> What I would not expect though is for the memory consumption to continue to
>>> increase until the server is killed due to oom error which would imply a
>>> possible memory leak, which is why I recommend building with the develop/7
>>> build where there have been improvements in memory management.
>>
>> I currently just used the stable 7.1.0 release. I'll try with the dev build
>> again and report back...
>
> So i'm running the import on a fresh dev build since my last email and i'm
> now at a total memory consumption of 31218/32177 MB (buffers: 15 MB, cache:
> remaining ~700 MB).
>
> The Virtuoso process has allocated 31.5 GB (VIRT), 30.1 GB (RES) and 3.812 MB
> (SHR) Memory.
>
> I'm not sure if i really have to run the importer till it's killed for out of
> memory (as i said it becomes pretty slow after a while and is currently only
> seeking around with 200 KB/s) or if this is enough already. As
> NumberOfBuffers is set to 2720000 as recommended i guess that anything above
> 21 GB is suspicious... we're at > 31 GB now.
>
> I've also split up the input file into 100M line chunks so that i can track
> the progress a bit better...
> 14 of these are completely loaded now, so 1.4 G triples, the 15th is
> currently running.
> These are the start times as reported in DB.DBA.LOAD_LIST. I added a column
> for loaded triples (not necessarily unique):
>
> 2014.8.22 19:59   0
> 2014.8.22 20:09   100M
> 2014.8.22 20:22   200M
> 2014.8.22 20:39   300M
> 2014.8.22 20:53   400M
> 2014.8.22 21:11   500M
> 2014.8.22 21:31   600M
> 2014.8.22 22:03   700M
> 2014.8.22 22:39   800M
> 2014.8.22 23:32   900M
> 2014.8.23 00:17   1G
> 2014.8.23 02:47   1.1G
> 2014.8.23 08:51   1.2G
> 2014.8.23 18:02   1.3G
> 2014.8.24 16:16   1.4G
>
> The import times for 100M triples seem to be roughly about:
> - 10 minutes initially
> - 30 minutes after 600M loaded triples
> - 45 minutes after 900M triples
> - 2h:30 after 1G triples (I'm guessing that this is when the set Memory-Limit
>   is hit)
> - 6h after 1.1G triples
> - 10h after 1.2G triples
> - 22h after 1.3G triples
> - >22h after 1.4G triples
>
> The last 4 lines sadly don't give me the impression that this scales nearly
> linearly after virtuoso runs out of fast random access memory and has to rely
> on block storage :-/ Is there maybe a setting which allows virtuoso to fall
> back to a merge-sort like approach like creating sorted temp dbs and then
> merging them bottom up? Wouldn't this scale way beyond the available RAM
> sizes and not cause the seek&wait pattern i observe?!?
>
> Anything else i can do to help to debug this? Can i stop the import?

[Hugh] Did you let the load continue or was it stopped ?

Development indicate your suggestion is not without merit but implementation is not as simple as it may seem as the indexes are not all sequential, but something like that could possibly be implemented. It is suggested you could try dropping the indexes on RDF_QUAD table, load the Freebase datasets and then recreate indexes after loading, which would require a smaller working set that would better fit into the 32GB RAM available. The commands for dropping the necessary indexes are:

drop index rdf_quad_pogs;
drop index rdf_quad_sp;
drop index rdf_quad_op;
drop index rdf_quad_gs;

and the respective indexes can then be recreated as detailed at:

http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VirtRDFPerformanceTuning?#RDF%20Index%20Scheme

Note that, this being v7, you need to recreate the column-wise indexes. Let us know how this works for you.

Note you can also use the ld_meter scripts we provided for monitoring the Virtuoso Bulk loader activity as detailed at:

http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VirtTipsAndTricksGuideLDMeterUtility

Also, how many "rdf_loader_run()" processes do you have running when performing the load, as for v7 we recommend running Number of Cores * 0.4 for best performance typically ?

Regards
Hugh

> Cheers,
> Jörn
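The "Number of Cores * 0.4" rule of thumb above can be turned into a concrete loader count. The following is only a sketch: the 0.4 factor comes from the message, while rounding to the nearest whole session and never going below one are assumptions made here, and `recommended_loader_count` is a hypothetical helper name, not a Virtuoso function.

```python
def recommended_loader_count(cores: int, factor: float = 0.4) -> int:
    """Suggest how many parallel rdf_loader_run() sessions to start.

    factor=0.4 is the rule of thumb quoted in the thread. Rounding to the
    nearest integer with a floor of one session is an assumption of this
    sketch, not documented Virtuoso behaviour.
    """
    return max(1, round(cores * factor))


if __name__ == "__main__":
    for cores in (1, 4, 8):
        print(f"{cores} cores -> {recommended_loader_count(cores)} loader session(s)")
```

For the 4-core VM discussed in this thread the sketch yields two sessions, which matches the two loaders reported in the later follow-up.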
Re: [Virtuoso-users] importing Freebase RDF dump: slows down, memleak?
Hi again,

On 22 Aug 2014, at 17:44, Jörn Hees wrote:

> On 22 Aug 2014, at 17:51, Hugh Williams wrote:
>
>> What I would not expect though is for the memory consumption to continue to
>> increase until the server is killed due to oom error which would imply a
>> possible memory leak, which is why I recommend building with the develop/7
>> build where there have been improvements in memory management.
>
> I currently just used the stable 7.1.0 release. I'll try with the dev build
> again and report back...

So i'm running the import on a fresh dev build since my last email and i'm now at a total memory consumption of 31218/32177 MB (buffers: 15 MB, cache: remaining ~700 MB).

The Virtuoso process has allocated 31.5 GB (VIRT), 30.1 GB (RES) and 3.812 MB (SHR) Memory.

I'm not sure if i really have to run the importer till it's killed for out of memory (as i said it becomes pretty slow after a while and is currently only seeking around with 200 KB/s) or if this is enough already. As NumberOfBuffers is set to 2720000 as recommended i guess that anything above 21 GB is suspicious... we're at > 31 GB now.

I've also split up the input file into 100M line chunks so that i can track the progress a bit better...
14 of these are completely loaded now, so 1.4 G triples, the 15th is currently running.
These are the start times as reported in DB.DBA.LOAD_LIST. I added a column for loaded triples (not necessarily unique):

2014.8.22 19:59   0
2014.8.22 20:09   100M
2014.8.22 20:22   200M
2014.8.22 20:39   300M
2014.8.22 20:53   400M
2014.8.22 21:11   500M
2014.8.22 21:31   600M
2014.8.22 22:03   700M
2014.8.22 22:39   800M
2014.8.22 23:32   900M
2014.8.23 00:17   1G
2014.8.23 02:47   1.1G
2014.8.23 08:51   1.2G
2014.8.23 18:02   1.3G
2014.8.24 16:16   1.4G

The import times for 100M triples seem to be roughly about:
- 10 minutes initially
- 30 minutes after 600M loaded triples
- 45 minutes after 900M triples
- 2h:30 after 1G triples (I'm guessing that this is when the set Memory-Limit is hit)
- 6h after 1.1G triples
- 10h after 1.2G triples
- 22h after 1.3G triples
- >22h after 1.4G triples

The last 4 lines sadly don't give me the impression that this scales nearly linearly after virtuoso runs out of fast random access memory and has to rely on block storage :-/ Is there maybe a setting which allows virtuoso to fall back to a merge-sort like approach like creating sorted temp dbs and then merging them bottom up? Wouldn't this scale way beyond the available RAM sizes and not cause the seek&wait pattern i observe?!?

Anything else i can do to help to debug this? Can i stop the import?

Cheers,
Jörn
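The per-chunk import times listed above can be derived directly from the start times. The sketch below copies the timestamps from the message and computes the minutes between consecutive chunk starts; it assumes each chunk starts as soon as the previous one finishes, which is only approximately true for a single loader.

```python
from datetime import datetime

# Start times of consecutive 100M-line chunks, copied from the message above.
STARTS = [
    "2014.8.22 19:59", "2014.8.22 20:09", "2014.8.22 20:22", "2014.8.22 20:39",
    "2014.8.22 20:53", "2014.8.22 21:11", "2014.8.22 21:31", "2014.8.22 22:03",
    "2014.8.22 22:39", "2014.8.22 23:32", "2014.8.23 00:17", "2014.8.23 02:47",
    "2014.8.23 08:51", "2014.8.23 18:02", "2014.8.24 16:16",
]


def chunk_durations(starts):
    """Return the minutes elapsed between consecutive chunk start times."""
    times = [datetime.strptime(s, "%Y.%m.%d %H:%M") for s in starts]
    return [int((b - a).total_seconds() // 60) for a, b in zip(times, times[1:])]


if __name__ == "__main__":
    for i, minutes in enumerate(chunk_durations(STARTS), start=1):
        hours, rest = divmod(minutes, 60)
        print(f"chunk {i:2d}: {hours:2d}h {rest:02d}m")
```

The first chunk takes 10 minutes and the fourteenth a little over 22 hours, so the time per 100M lines grows by more than two orders of magnitude over the run.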
Re: [Virtuoso-users] importing Freebase RDF dump: slows down, memleak?
Hey guys, you will have a much easier time if you use :BaseKB

http://basekb.com/

The fact is that 50% of Freebase is junk, and you can't afford to load that. It pains me to see people having these problems when I load :BaseKB on a repeatable basis on relatively modest hardware, and in fact, you can get it pre-loaded and be running in minutes

https://aws.amazon.com/marketplace/pp/B00KRKRYW0

Take it from me, this is the difference between "just works" and days of suffering. Make it easy for yourself.

On Fri, Aug 22, 2014 at 11:44 AM, Jörn Hees wrote:

> Hi Hugh,
>
> thanks for the reply.
>
> I know 32 GB is probably not much considering the size of the dumps, but
> it's the size limit of our VMs :(
> So i'd be willing to live with a bit slower import and response times if i
> can still leave it on a VM.
>
> On 22 Aug 2014, at 17:51, Hugh Williams wrote:
>
> > What I would not expect though is for the memory consumption to continue
> > to increase until the server is killed due to oom error which would imply a
> > possible memory leak, which is why I recommend building with the develop/7
> > build where there have been improvements in memory management.
>
> I currently just used the stable 7.1.0 release. I'll try with the dev
> build again and report back...
>
> Cheers,
> Jörn

--
Paul Houle
Expert on Freebase, DBpedia, Hadoop and RDF
(607) 539 6254
paul.houle on Skype
ontolo...@gmail.com
Re: [Virtuoso-users] importing Freebase RDF dump: slows down, memleak?
Hi Hugh,

thanks for the reply.

I know 32 GB is probably not much considering the size of the dumps, but it's the size limit of our VMs :(
So i'd be willing to live with a bit slower import and response times if i can still leave it on a VM.

On 22 Aug 2014, at 17:51, Hugh Williams wrote:

> What I would not expect though is for the memory consumption to continue to
> increase until the server is killed due to oom error which would imply a
> possible memory leak, which is why I recommend building with the develop/7
> build where there have been improvements in memory management.

I currently just used the stable 7.1.0 release. I'll try with the dev build again and report back...

Cheers,
Jörn
Re: [Virtuoso-users] importing Freebase RDF dump: slows down, memleak?
Hi Jörn,

Are you running the Virtuoso open source git [1] stable or develop 7 branch ? I would recommend the load be performed with the develop/7 branch if this is not already being used.

From analysis development have performed in-house earlier this year, it was found the latest Freebase datasets need about 13,000,000 buffers, i.e. about 105GB RAM, to load in memory and not have to use disk, which significantly reduces the load rate. This is because the dataset contains many large literal values and thus does not compress very well and also a lot of duplicate data, so even though it is only about 1.9 billion triples as you have seen the actual as you have observed also.

What I would not expect though is for the memory consumption to continue to increase until the server is killed due to oom error which would imply a possible memory leak, which is why I recommend building with the develop/7 build where there have been improvements in memory management.

To speed the load you should consider using faster disk (SSDs) ideally as a trade off for insufficient memory when loading the dataset, and also database striping [2] for improved parallel I/O access to the database files if possible. Another option would be to load the dataset in 4 parts, which should give the leave enough

On our LOD Cache Cloud server [3], which is a 4 node cluster with 768GB RAM and 60 billion+ triples, the Freebase datasets loaded in about 1.7 hrs:

SQL> select min(ll_started) as start, max(ll_done) as finish, datediff('second', min(ll_started), max(ll_done)) as delta from load_list where ll_graph like 'http://commondatastorage.googleapis.com/freebase-public/rdf/freebase-rdf-2013-11-17-00-00.gz';
start                finish               delta
TIMESTAMP            TIMESTAMP            INTEGER
___
2013.12.2 22:34.9 0  2013.12.3 0:16.24 0  6135

1 Rows. -- 74 msec.
SQL>

On the single server database we were testing on with 105GB RAM it loaded in about 2hrs.

Best Regards
Hugh Williams
Professional Services
OpenLink Software, Inc. // http://www.openlinksw.com/
Weblog -- http://www.openlinksw.com/blogs/
LinkedIn -- http://www.linkedin.com/company/openlink-software/
Twitter -- http://twitter.com/OpenLink
Google+ -- http://plus.google.com/100570109519069333827/
Facebook -- http://www.facebook.com/OpenLinkSoftware
Universal Data Access, Integration, and Management Technology Providers

[1] http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VOSGitUsage
[2] http://docs.openlinksw.com/virtuoso/dbadm.html#ini_Striping
[3] http://lod.openlinksw.com

On 22 Aug 2014, at 14:41, Jörn Hees wrote:

> Hi,
>
> TLDR: When importing the Freebase RDF dump Virtuoso seems to consume way more
> RAM than configured.
>
> i'm trying to load the Freebase RDF dump (
> https://developers.google.com/freebase/data ) into a clean Virtuoso
> OpenSource 7.1.0 instance running on a VM with 4 cores and 32 GB of RAM, 300+
> GB HD free.
> The dump file contains 2,656,580,382 rows (even though the page claims 1.9
> billion triples, maybe outdated or dups).
> Before attempting to load the whole Freebase dump, i loaded the basekb.com
> dump which contained 1,205,456,739 triples into the store which was already
> filled with DBpedia without any noticeable problem.
>
> The Freebase dump rdf_loader_run() import starts with rapid IO rates (several
> 100MB/s read and write bursts) and quickly consumes ~ 25 GB of RAM as
> configured. It then continues to slowly consume more and more RAM ~ 1
> MB/minute. As this goes on, the IO rates slowly drop down to some KB/s read
> and no / very very rare writes. htop at this point shows that the process
> spends nearly all its time on IO wait. After a couple of days Virtuoso is
> finally killed by the kernel when it consumed all RAM of the machine and
> wants even more.
>
> I already tried adding 16 GB swap. This didn't help but made the machine
> completely unresponsive after 4 days (sshd seems to have been swapped out and
> never came back over a couple of hours of retrying to ssh into the VM).
>
> Ubuntu 12.04 LTS or 14.04.1 LTS doesn't seem to make a difference.
>
> A colleague is reporting that the import works fine on a 256 GB RAM, 8 core
> machine with settings for 64 GB... takes about 1 day to import, the final DB
> is ~ 130 GB. Mine never gets to > 100 GB before Virtuoso is killed.
>
> The instance is set up following my tutorial
> http://joernhees.de/blog/2014/04/23/setting-up-a-local-dbpedia-3-9-mirror-with-virtuoso-7/
> just substitute the DBpedia Datasets with the Freebase triple dump and
> Wikidata links.
>
> The virtuoso.ini values are set as suggested for 32 GB of RAM, there's
> nothing else running on the VM:
> [Database]
> MaxCheckpointRemap = 2000 // also tried with 62500, so ~1/4th
> of NumberOfBuffers as in the blogpost
> [Parameters]
> ;; Uncomment next
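The two figures in the reply above (13,000,000 buffers being roughly 105GB, and a delta of 6135 seconds being roughly 1.7 hours) can be checked with a little arithmetic. This is a sketch under one loud assumption: each buffer is taken to cost about 8700 bytes (an 8K page plus bookkeeping overhead), a per-buffer figure that is not stated anywhere in this thread.

```python
# Assumption (not from the thread): one Virtuoso buffer caches an 8K page and
# occupies roughly 8700 bytes of memory including overhead.
BYTES_PER_BUFFER = 8700


def buffers_to_gib(number_of_buffers: int) -> float:
    """Approximate RAM in GiB needed for the given number of buffers."""
    return number_of_buffers * BYTES_PER_BUFFER / 2**30


def seconds_to_hours(seconds: int) -> float:
    """Convert a datediff('second', ...) result into hours."""
    return seconds / 3600


if __name__ == "__main__":
    print(f"13,000,000 buffers ~ {buffers_to_gib(13_000_000):.1f} GiB")
    print(f"6135 seconds       ~ {seconds_to_hours(6135):.2f} h")
```

Under that assumption the buffer count lands close to the 105GB quoted, and the load duration works out to about 1.7 hours as stated.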
[Virtuoso-users] importing Freebase RDF dump: slows down, memleak?
Hi,

TLDR: When importing the Freebase RDF dump Virtuoso seems to consume way more RAM than configured.

i'm trying to load the Freebase RDF dump ( https://developers.google.com/freebase/data ) into a clean Virtuoso OpenSource 7.1.0 instance running on a VM with 4 cores and 32 GB of RAM, 300+ GB HD free.
The dump file contains 2,656,580,382 rows (even though the page claims 1.9 billion triples, maybe outdated or dups).
Before attempting to load the whole Freebase dump, i loaded the basekb.com dump which contained 1,205,456,739 triples into the store which was already filled with DBpedia without any noticeable problem.

The Freebase dump rdf_loader_run() import starts with rapid IO rates (several 100MB/s read and write bursts) and quickly consumes ~ 25 GB of RAM as configured. It then continues to slowly consume more and more RAM ~ 1 MB/minute. As this goes on, the IO rates slowly drop down to some KB/s read and no / very very rare writes. htop at this point shows that the process spends nearly all its time on IO wait. After a couple of days Virtuoso is finally killed by the kernel when it consumed all RAM of the machine and wants even more.

I already tried adding 16 GB swap. This didn't help but made the machine completely unresponsive after 4 days (sshd seems to have been swapped out and never came back over a couple of hours of retrying to ssh into the VM).

Ubuntu 12.04 LTS or 14.04.1 LTS doesn't seem to make a difference.

A colleague is reporting that the import works fine on a 256 GB RAM, 8 core machine with settings for 64 GB... takes about 1 day to import, the final DB is ~ 130 GB. Mine never gets to > 100 GB before Virtuoso is killed.

The instance is set up following my tutorial http://joernhees.de/blog/2014/04/23/setting-up-a-local-dbpedia-3-9-mirror-with-virtuoso-7/ just substitute the DBpedia Datasets with the Freebase triple dump and Wikidata links.

The virtuoso.ini values are set as suggested for 32 GB of RAM, there's nothing else running on the VM:

[Database]
MaxCheckpointRemap = 2000   // also tried with 62500, so ~1/4th of NumberOfBuffers as in the blogpost
[Parameters]
;; Uncomment next two lines if there is 32 GB system memory free
NumberOfBuffers = 2720000
MaxDirtyBuffers = 2000000

As I already tried a lot of things but can't get this to work, i'd be thankful for feedback or someone looking into why virtuoso is consuming all of the RAM.

Cheers,
Jörn
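As a rough plausibility check of the symptoms described above, one can estimate how long a steady growth of about 1 MB/minute takes to use up the memory left between the ~25 GB plateau and the 32 GB of the VM. This is only a back-of-the-envelope sketch: it assumes constant linear growth, ignores what the OS and page cache need, and `days_until_exhausted` is a hypothetical helper name.

```python
def days_until_exhausted(total_gb: float, used_gb: float,
                         growth_mb_per_min: float) -> float:
    """Days until memory runs out, assuming constant linear growth.

    Uses 1 GB = 1024 MB. Real growth is unlikely to be perfectly linear,
    so treat the result as an order-of-magnitude estimate only.
    """
    remaining_mb = (total_gb - used_gb) * 1024
    minutes = remaining_mb / growth_mb_per_min
    return minutes / (60 * 24)


if __name__ == "__main__":
    days = days_until_exhausted(total_gb=32, used_gb=25, growth_mb_per_min=1)
    print(f"about {days:.1f} days until the kernel has to kill the process")
```

The estimate comes out at a few days, which is the same order of magnitude as the time after which the process was reported killed.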