enhancerstresstest.mdtext

rwesten Tue, 03 Jul 2012 00:52:00 -0700

Author: rwesten
Date: Tue Jul  3 07:51:32 2012
New Revision: 1356597

URL: http://svn.apache.org/viewvc?rev=1356597&view=rev
Log:
Documentation for the Stanbol Enhancer Stress Test Utility


Added:
    incubator/stanbol/site/trunk/content/stanbol/docs/trunk/utils/
    
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/utils/enhancerstresstest.mdtext

Added: 
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/utils/enhancerstresstest.mdtext
URL: 
http://svn.apache.org/viewvc/incubator/stanbol/site/trunk/content/stanbol/docs/trunk/utils/enhancerstresstest.mdtext?rev=1356597&view=auto
==============================================================================
--- 
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/utils/enhancerstresstest.mdtext
 (added)
+++ 
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/utils/enhancerstresstest.mdtext
 Tue Jul  3 07:51:32 2012
@@ -0,0 +1,115 @@
+Title: Stanbol Enhancer Stress Test Utility
+
+As of [STANBOL-670](https://issues.apache.org/jira/browse/STANBOL-670), Apache Stanbol provides a utility that lets users stress test the Stanbol Enhancer with multiple concurrent requests. This is useful for two groups: (1) Stanbol users who want to check whether their installation can cope with such load and how different [Enhancement Chain](../enhancer/chains) configurations affect processing times, and (2) [Enhancement Engine](../enhancer/engines) developers who want to test their engines and possibly the services called by those engines.
+
+In addition, this utility provides statistics including:
+
+* __Round Trip Time__: The total request/response time, covering sending and transmitting the request, server-side parsing, processing, server-side serialization, response transmission and client-side parsing.
+* __Enhancement Chain processing__: The time needed by the [EnhancementJobManager](../engines/enhancementjobmanager.html) to process the [ContentItem](enhancer/contentitem.html). These data are provided by the [Execution Metadata](../enhancer/executionmetadata.html).
+* __EnhancementEngine processing__: The processing times of all [Enhancement Engines](../enhancer/engines) used in the tested [Enhancement Chain](../enhancer/chain) are tracked as well. These data are also provided by the [Execution Metadata](../enhancer/executionmetadata.html).
+
+## Usage
+
+This utility is part of the Apache Stanbol integration tests and is also run during normal builds against the default chain of the Stanbol Enhancer. Like any integration test, it can also be run standalone against a Stanbol server running at a configured URL.
+
+To use this tool you need to [checkout and build](introduction.html) Apache Stanbol and then change to the <code>{stanbol-source}/integration-tests</code> directory. Within this directory the utility can be called using
+
+    :::bash
+    mvn -o test -Dtest.server.url={stanbol-server} -Dtest=MultiThreadedTest
+
+This makes 500 requests with 5 concurrent threads against {stanbol-server}, using [DBpedia.org](http://dbpedia.org) abstracts as content. The integration test includes up to 10000 of those abstracts that can be used for testing.
+
+This utility can be configured using the following system properties:
+
+* __test.server.url__: The URL of the Apache Stanbol instance that will be 
used for testing (e.g. http://localhost:8080)
+* __test__: The simple class name of the integration test to run. To use this tool this MUST be set to '<code>MultiThreadedTest</code>'.
+* __stanbol.it.multithreadtest.chain__: The name of the enhancement chain to test. If not present, the default chain is tested.
+* __stanbol.it.multithreadtest.data__: Specifies the data used for the tests. Files, resources available via the class path and URLs are supported. Referenced data may be compressed using 'gz' or 'bz2'. 'zip' is also supported, but only the first entry of the ZIP file is processed. Supported data formats include plain text and the RDF serializations supported by Apache Clerezza. See the section about Test Data for details.
+* __stanbol.it.multithreadtest.media-type__: While the tool supports auto-detection of the 'Media-Type' for common file endings (e.g. *.txt, *.rdf, …), this property can be used to specify the media type manually. It can also be used to pass the charset of plain text files (e.g. "text/plain;charset=UTF-8").
+* __stanbol.it.multithreadtest.data-property__: If RDF is used as test data, this specifies the property whose triple values are used as test data. If '*' is passed, all triples with Literals as values are used. '<code>http://dbpedia.org/ontology/abstract</code>' is used as default if this property is missing.
+* __stanbol.it.multithreadtest.threads__: The number of concurrent threads 
used during stress testing. The default is 5.
+* __stanbol.it.multithreadtest.requests__: The maximum number of requests. This only applies if the configured data set would provide more data items. By default this is set to 500. The limit can be deactivated by setting it to a value less than or equal to 0.
+* __stanbol.it.multithreadtest.rdf-format__: The RDF serialization used for the '<code>Accept</code>' header in enhancement requests. Apache Stanbol will send Enhancement Results using this format. The default is '<code>application/rdf+json</code>'.
+
+Here is an example that makes extensive use of custom options:
+
+    :::bash
+    mvn -o test -Dtest=MultiThreadedTest \
+        -Dstanbol.it.multithreadtest.data=/stanbol/test/data/stanbol-test-data.txt.gz \
+        -Dstanbol.it.multithreadtest.requests=10000 \
+        -Dstanbol.it.multithreadtest.threads=20 \
+        -Dstanbol.it.multithreadtest.rdf-format=text/turtle \
+        -Dtest.server.url=http://www.example.org:8080/stanbol
+
+_NOTES:_
+
+* Java system properties are passed using '<code>-D{property}={value}</code>'.
+* If you get OutOfMemory errors you might need to increase the heap size via the '<code>-Xmx</code>' parameter in the '<code>MAVEN_OPTS</code>' environment variable. This is especially likely if you use RDF data for your test, as those data are loaded into memory.
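+
Following the note on OutOfMemory errors, here is a minimal sketch of raising the heap limit before starting the test. The value of 1 GB is only an assumption for illustration; the appropriate size depends on your test data.

```shell
# Assumption: 1 GB of heap is only an illustrative value; size it to your data set
export MAVEN_OPTS="-Xmx1g"

# Show the setting that later mvn calls started from this shell will inherit
echo "MAVEN_OPTS=$MAVEN_OPTS"
```

Any `mvn -o test -Dtest=MultiThreadedTest …` call started from the same shell afterwards picks up this setting.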
+
+## Test Data
+
+This tool supports two different test data formats:
+
+### Plain Text Files
+
+All test data are contained in a single text file. Individual texts are separated by two (or more) empty lines.
+
+The following example includes three content items:
+
+    :::text
+    Astronomers discover largest star on record
+    
+    European astronomers have discovered the largest star yet on record; it is 300 times the mass of our sun, beyond the previously accepted limit of 150 solar masses.
+
+    Paul Crowther, professor of astrophysics at […]
+    
+    
+    Australian election debate moved to avoid clash with cookery show
+
+    A televised debate between Australia's candidates for Prime Minister […]
+    
+    
+    The Only Joy In Town
+
+    by Joni Mitchell   
+
+    I want to paint a picture
+    Botticelli * style
+    Instead of Venus on a clam *
+    I'd paint this flower child
+
+Plain text test data are read sequentially from the provided source. This ensures that only ~100 content items are loaded into memory at any given time, which makes plain text the preferred option for large test data sets.
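+
As a rough sketch of the separator rule, the following shell snippet writes a small plain text data file and counts the content items it contains. The file name and the texts are hypothetical, and the `awk` count is only an approximation of the rule described above, not the parser used by the tool.

```shell
# Hypothetical file name for the generated test data
DATA_FILE="my-test-data.txt"

# A single empty line stays inside a content item;
# two (or more) empty lines separate content items
printf '%s\n' \
  'First headline' \
  '' \
  'Body of the first content item.' \
  '' \
  '' \
  'Second headline' \
  '' \
  'Body of the second content item.' \
  '' \
  '' \
  'Third content item without a headline.' > "$DATA_FILE"

# Count content items: a new item starts at the first non-empty line
# and at every non-empty line preceded by at least two empty lines
ITEM_COUNT=$(awk '
  /^$/ { blank++; next }
  { if (!seen || blank >= 2) items++; seen = 1; blank = 0 }
  END { print items + 0 }
' "$DATA_FILE")

echo "content items: $ITEM_COUNT"
```

Such a file can then be passed to the test with `-Dstanbol.it.multithreadtest.data=my-test-data.txt`.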
+
+Text files are recognized by the file ending "txt" of the passed resource. For resources with other endings the property '<code>stanbol.it.multithreadtest.media-type=text/plain</code>' must be specified. If the test data are not encoded using '<code>UTF-8</code>' the charset MUST be passed by using the '<code>charset</code>' parameter (e.g. '<code>stanbol.it.multithreadtest.media-type=text/plain;charset=iso-8859-7</code>').
+
+### RDF data
+
+The tool also accepts RDF graphs as test data. This is mainly because in many cases it is easiest to use RDF dumps of public datasets, such as DBpedia.org, for testing. Users need to be aware that RDF data are imported into an in-memory graph.
+
+Content Items are extracted by:
+
+* Filtering triples that use the value configured by '<code>stanbol.it.multithreadtest.data-property</code>' as property ('<code>{prefix}:{local-name}</code>' is supported for registered prefixes). By default '<code>http://dbpedia.org/ontology/abstract</code>' is used. If '<code>*</code>' is configured then all triples are taken into account.
+* Filtering triples that have a Literal value as Object
+
+Supported RDF formats and mapped file endings:
+
+* '<code>application/rdf+xml</code>' - file endings '<code>.rdf</code>' and 
'<code>.xml</code>'
+* '<code>text/turtle</code>' - file ending '<code>.ttl</code>'
+* '<code>application/x-turtle</code>' - no file ending
+* '<code>text/rdf+nt</code>' - file ending '<code>.nt</code>'
+* '<code>text/rdf+n3</code>' - file ending '<code>.n3</code>'
+* '<code>application/rdf+json</code>' - file ending '<code>.json</code>'
+
+If you want to use a different file ending you need to pass the Media-Type using the '<code>stanbol.it.multithreadtest.media-type</code>' property.
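+
The documented file endings can be mirrored in a small shell helper to check up front whether a data file will be auto-detected or needs an explicit media type. This is only a sketch: `media_type_for` and the file names are hypothetical and not part of Stanbol, and it assumes the format is taken from the ending that remains after a compression ending is removed.

```shell
# Hypothetical helper, not part of Stanbol: mirrors the documented ending list
media_type_for() {
  # Assumption: the data format is detected from the ending left after
  # removing a compression ending such as .gz, .bz2 or .zip
  name="${1%.gz}"
  name="${name%.bz2}"
  name="${name%.zip}"
  case "$name" in
    *.txt)        echo "text/plain" ;;
    *.rdf|*.xml)  echo "application/rdf+xml" ;;
    *.ttl)        echo "text/turtle" ;;
    *.nt)         echo "text/rdf+nt" ;;
    *.n3)         echo "text/rdf+n3" ;;
    *.json)       echo "application/rdf+json" ;;
    *)            echo "" ;;
  esac
}

# Hypothetical file names used for illustration only
for f in abstracts.ttl dump.nt.gz labels.dat; do
  mt=$(media_type_for "$f")
  if [ -n "$mt" ]; then
    echo "$f: auto-detected as $mt"
  else
    echo "$f: set -Dstanbol.it.multithreadtest.media-type explicitly"
  fi
done
```

For a file such as `labels.dat` the test would then be started with an additional option like `-Dstanbol.it.multithreadtest.media-type=text/turtle`, matching the actual content of the file.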
+
+### Support for compressed test data
+
+Both plain text and RDF data can be compressed efficiently. Because of that this utility also supports compressed files. The compression format is detected by the file ending.
+
+Supported compression formats are
+
+* '<code>.gz</code>'
+* '<code>.bz2</code>'
+* '<code>.zip</code>' - only the first entry in the ZIP archive is processed
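+
A compressed data file keeps the ending of its data format and appends the compression ending. A minimal sketch using `gzip` follows; the file name and its content are hypothetical.

```shell
# Hypothetical source file with two content items
SRC="test-data.txt"
printf 'First content item\n\n\nSecond content item\n' > "$SRC"

# gzip -c writes to stdout, so the uncompressed file is kept and the
# result gets the double ending test-data.txt.gz
gzip -c "$SRC" > "$SRC.gz"

# Optional: create a bz2 variant if bzip2 is installed
if command -v bzip2 >/dev/null 2>&1; then
  bzip2 -c "$SRC" > "$SRC.bz2"
fi

ls "$SRC".*
```

The compressed file can then be referenced with `-Dstanbol.it.multithreadtest.data=test-data.txt.gz`.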
+
+Compressed files need to use double endings (e.g. 
'<code>test-data.txt.gz</code>' or '<code>test-data.rdf.bz2</code>').
\ No newline at end of file
