Kunal,

msec_time() returns a timer with 1 millisecond resolution, so the following will work:
declare start, finish, time_spent integer;
start := msec_time();
do_something;
finish := msec_time();
time_spent := finish - start;

There's a function now() that returns a millisecond counter as well, but it returns the current transaction timestamp, not an accurate physical time.

Best Regards,
Ivan Mikhailov,
OpenLink Software.

On Thu, 2008-02-14 at 10:46 -0800, Kunal Patel wrote:
> Thanks Ivan,
>
> I also want to collect statistics on how much time is taken in
> loading each file and the overall time spent in loading the whole
> dataset. Is there an easy way to do that?
>
> Regards,
> Kunal
>
> Ivan Mikhailov <[email protected]> wrote:
> Kunal,
>
> No, LUBM_LOAD_LOG2 uses single-threaded parsers in parallel. It's OK for
> a big number of files and a big number of CPU cores because it can load
> all cores without much lock contention. For the UNIPROT case, it's
> probably enough to
>
> create function DB.DBA.UNIPROT_LOAD (in log_mode integer := 1)
> {
>   DB.DBA.RDF_LOAD_RDFXML_MT (file_to_string_output ('filename1'),
>     'http://base_uri_1', 'destination_graph_1', log_mode, 3);
>   DB.DBA.RDF_LOAD_RDFXML_MT (file_to_string_output ('filename2'),
>     'http://base_uri_2', 'destination_graph_2', log_mode, 3);
>   ...
>   DB.DBA.RDF_LOAD_RDFXML_MT (file_to_string_output ('filename9'),
>     'http://base_uri_9', 'destination_graph_9', log_mode, 3);
> }
>
> If you're starting from a blank database and can drop it and re-create
> it in case an error is signalled, use it this way:
>
> checkpoint;
> checkpoint_interval (6000);
> DB.DBA.UNIPROT_LOAD (0);
> checkpoint;
> checkpoint_interval (60);
>
> If the database already contains important data and there's no way to
> stop it and back it up before the load, then use
>
> checkpoint;
> checkpoint_interval (6000);
> DB.DBA.UNIPROT_LOAD ();
> checkpoint;
> checkpoint_interval (60);
>
> Best Regards,
> Ivan Mikhailov,
> OpenLink Software.
>
> On Wed, 2008-02-13 at 15:19 -0800, Kunal Patel wrote:
> > Hi Ivan,
> >
> > Thanks for the detailed response.
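Putting the two answers together, per-file statistics could be collected with a small wrapper around the loader. This is only a sketch in Virtuoso/PL: the LOAD_STAT table, the procedure name UNIPROT_LOAD_TIMED, and the file/graph arguments are made-up placeholders, not part of Virtuoso.

```sql
-- Hypothetical statistics table; name and columns are illustrative.
create table DB.DBA.LOAD_STAT (LS_FILE varchar, LS_MSEC integer);

-- Loads one RDF/XML file with 3 processing threads and records
-- the elapsed wall-clock time for that file, using msec_time().
create function DB.DBA.UNIPROT_LOAD_TIMED (in fname varchar,
  in base_uri varchar, in graph_iri varchar, in log_mode integer := 1)
{
  declare start, finish integer;
  start := msec_time ();
  DB.DBA.RDF_LOAD_RDFXML_MT (file_to_string_output (fname),
    base_uri, graph_iri, log_mode, 3);
  finish := msec_time ();
  insert into DB.DBA.LOAD_STAT (LS_FILE, LS_MSEC)
    values (fname, finish - start);
}
```

The overall load time would then be the sum over the recorded rows, e.g. select sum (LS_MSEC) from DB.DBA.LOAD_STAT; after the run.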
> > I downloaded the Uniprot KB from
> > ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/rdf/ (I am
> > using all the files except uniparc.rdf.gz and uniref.rdf.gz).
> > The relation between the various files is documented at
> > http://dev.isb-sib.ch/projects/uniprot-rdf/intro.html
> >
> > Again, to make sure that I understood you correctly, the best way to
> > load the uniprot data for me would be to create a procedure similar to
> > LUBM_LOAD_LOG2 (say UNIPROT_LOAD_LOG2) and call this procedure as
> > follows:
> >
> > UNIPROT_LOAD_LOG2 (vector ('data-dir'), 3);
> >
> > This will use 3 processing threads per parsing.
> >
> > Regards,
> > Kunal
> >
> > Ivan Mikhailov wrote:
> > Kunal,
> >
> > I've downloaded uniprot_sprot.xml.gz (442729K) and
> > uniprot_trembl.xml.gz (2858M). Both are from
> > ftp://ftp.ebi.ac.uk/pub/databases/uniprot/knowledgebase/, which is
> > unavailable for me ATM.
> > Where can I get the rest? Should those files reside in a single graph
> > and be queried as a single big set of triples, or do they have different
> > meanings and should be queried separately (i.e. the location of a triple
> > matters for what it means, e.g. reviewed data are separated from dirty
> > drafts)? I'm weak in proteins, but I'd like to be ready for more
> > UniProt-related queries because this data set is quite popular.
> >
> > With only 4 CPUs a single multithreaded parser can be the best choice.
> > Note that the 'number of threads' parameter of DB.DBA.RDF_LOAD_RDFXML_MT ()
> > counts the threads used to process data from the file; an extra thread
> > will read the text and parse it, so for 4 CPU cores there's no need for
> > a parameter value greater than 3. Three processing threads per one
> > parsing thread is usually a good ratio because parsing is usually three
> > times faster than the rest of loading, so CPU load is well balanced.
> > I'm using 2 x Quad Xeon, so I will choose between 8 single-threaded
> > parsers or 2 parsers with 3 processing threads each. With 4 cores you
> > may simply load file after file with 3 processing threads.
> >
> > The most important performance tuning step is to ensure that you have
> > set proper
> >
> > NumberOfBuffers = 1000000
> > MaxDirtyBuffers = 800000
> > MaxCheckpointRemap = 1000000
> >
> > in the [Parameters] section of the Virtuoso configuration file
> > (virtuoso.ini or the like).
> >
> > (Note for other readers: these numbers are reasonable for a 16 GB RAM
> > Linux box; please refer to the User's Guide before tweaking your
> > settings.)
> >
> > You may note that 1 million 8-kilobyte buffers is only 8 GB, leaving
> > almost 8 GB unused. This is done intentionally because some Linux
> > installations demonstrated running out of OS physical memory due to
> > fragmentation if almost all memory is allocated only once and never
> > re-allocated during the run. It seems to be a Linux-specific problem of
> > the memory allocator; at least during long data loading we've seen
> > cases of a stable size of the virtuoso process, zero activity of other
> > processes, and a decreasing amount of available memory. We have no
> > accurate explanation or workaround for this phenomenon ATM. When there
> > are no such massive operations as loading a huge database, I set up to
> >
> > NumberOfBuffers = 1500000
> > MaxDirtyBuffers = 1200000
> > MaxCheckpointRemap = 1500000
> >
> > and it's still OK. Thus after loading all data you may wish to shut
> > down, tweak, and start the server again.
> >
> > If you have an ext2fs or ext3fs filesystem then it's better to have
> > enough free disk space to keep it no more than 80% full. When it's
> > almost full it may allocate the database file badly, resulting in a
> > measurable loss of disk access speed. That is not a Virtuoso-specific
> > fact, but a common hint for all database-like applications with random
> > access to big files.
> >
> > Best Regards,
> > Ivan Mikhailov,
> > OpenLink Software.
> >
> > On Wed, 2008-02-13 at 10:47 -0800, Kunal Patel wrote:
> > > Hi Ivan,
> > >
> > > I am working with a 4 CPU machine with 16 GB RAM. The UniProt data
> > > is distributed in 9 RDF files and 1 OWL file.
> > >
> > > The OWL file will act as the rule set for the RDF data. Most of the
> > > RDF files are of reasonable size, except one which is 41 GB.
> > > Do you have any suggestion on what load method (multithreaded
> > > parsers OR an asynchronous queue of single-threaded parsers) would
> > > be best for this dataset?
> > >
> > > Thanks,
> > > Kunal
>
> _______________________________________________
> Virtuoso-devel mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/virtuoso-devel
