Very good thinking, Batman! On the wireless at the Panera, but going to abandon it soon, as they don't believe in AC here.

Da Stink

On Jun 24, 2012, at 9:37 AM, Stefan Scheffler wrote:
> On 24.06.2012 18:29, Paolo Castagna wrote:
>> Stefan Scheffler wrote:
>>> Hey Paolo,
>>> Thanks for your reply. I used tdbloader2 with my own Tokenizer /
>>> ErrorHandler (which just catches/skips errors and writes them into a
>>> file). The command was: ./tdbloader2 --loc=<store> <srcpath>/*
>>>
>>> Is there a way to do incremental loads with the script files, or do I
>>> have to write my own program?
>> Hi Stefan,
>> if you want to run an incremental load you should use tdbloader, not
>> tdbloader2. tdbloader supports incremental loads; tdbloader2 does not.
>>
>> If you are loading large datasets, make sure you have enough RAM (you
>> can load the data on a machine with a lot of RAM and move the indexes
>> elsewhere afterwards).
>>
>> Paolo
> Thank you, I will try it tomorrow.
> Stefan
>>
>>> Regards,
>>> Stefan
>>>
>>> On 24.06.2012 10:42, Paolo Castagna wrote:
>>>> Hi Stefan,
>>>> as Rob said, loading data into an empty TDB store is different from
>>>> loading data into an existing TDB store.
>>>>
>>>> I assume that for your second data load you used tdbloader, not
>>>> tdbloader2.
>>>>
>>>> tdbloader2 does not support incremental data loads at all (i.e. it
>>>> will overwrite your existing data). I suspect this is what is going
>>>> on.
>>>>
>>>> Can you share the exact commands you used, as well as links to the
>>>> RDF data? (That way others can replicate your experiments.)
>>>>
>>>> Regards,
>>>> Paolo
>>>>
>>>> Stefan Scheffler wrote:
>>>>> Hello,
>>>>> at the moment I am doing some performance checks on TDB. The first
>>>>> thing I checked was the import with tdbloader2, and I got some weird
>>>>> results. Maybe someone can help me out. Here are my test setup and
>>>>> the results.
>>>>>
>>>>> The first test was to load 12 GB of triples into an empty store (I
>>>>> used the German DBpedia).
>>>>>
>>>>> Load time: 16 minutes
>>>>> Average load rate: ca. 81,000 triples/second
>>>>> Index time: 40 minutes
>>>>> Store size: 9.3 GB
>>>>>
>>>>> The second test was to load the same data into an already filled
>>>>> store. Before I started the import, I created a store with
>>>>> 348,398,593 triples from DNB and HBZ (which are German libraries;
>>>>> store size: 33 GB). Then I started to load the German DBpedia into
>>>>> it.
>>>>>
>>>>> Load time: 3 hours and 4 minutes
>>>>> Average load rate: ca. 7,200 triples/second
>>>>> Index time: 38 minutes
>>>>> Store size: 19 GB!
>>>>>
>>>>> Why does the load time increase so immensely? My expectation was
>>>>> that the index time would increase, but it does not. There were no
>>>>> other big processes running at the time. And why does the store size
>>>>> shrink to 19 GB? I am totally confused about that point.
>>>>>
>>>>> With friendly regards,
>>>>> Stefan
>>>>>
>>>
>
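For readers following along: based on Paolo's advice above, an incremental load workflow would look roughly like the sketch below. The store location and data file paths are placeholders, and it assumes the Jena TDB command-line scripts are on the PATH.

```shell
# Initial bulk load into an empty (or not-yet-existing) store location.
# tdbloader creates the store if it does not exist.
tdbloader --loc=/data/tdb-store dnb/*.nt hbz/*.nt

# Incremental load: tdbloader adds the new data to the EXISTING store,
# keeping the triples that are already there.
tdbloader --loc=/data/tdb-store dbpedia-de/*.nt

# tdbloader2, by contrast, must only ever be pointed at an empty/new
# location -- it rebuilds the indexes from scratch, so running it against
# an existing store clobbers the data that was there before.
tdbloader2 --loc=/data/tdb-store-new dbpedia-de/*.nt
```

Note that an incremental load into a large existing store is expected to be slower per triple than a load into an empty store, since new entries must be inserted into already-populated indexes rather than written sequentially.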