Author: rwesten
Date: Tue Jul 3 07:51:32 2012
New Revision: 1356597
URL: http://svn.apache.org/viewvc?rev=1356597&view=rev
Log:
Documentation for the Stanbol Enhancer Stress Test Utility
Added:
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/utils/
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/utils/enhancerstresstest.mdtext
Added:
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/utils/enhancerstresstest.mdtext
URL: http://svn.apache.org/viewvc/incubator/stanbol/site/trunk/content/stanbol/docs/trunk/utils/enhancerstresstest.mdtext?rev=1356597&view=auto
==============================================================================
--- incubator/stanbol/site/trunk/content/stanbol/docs/trunk/utils/enhancerstresstest.mdtext (added)
+++ incubator/stanbol/site/trunk/content/stanbol/docs/trunk/utils/enhancerstresstest.mdtext Tue Jul 3 07:51:32 2012
@@ -0,0 +1,115 @@
+Title: Stanbol Enhancer Stress Test Utility
+
+As of [STANBOL-670](https://issues.apache.org/jira/browse/STANBOL-670), Apache
Stanbol provides a utility that allows users to stress test the Stanbol
Enhancer with multiple concurrent requests. This is useful for (1) Stanbol
users who want to check whether their Stanbol installation can cope with such
load and how different [Enhancement Chain](../enhancer/chains) configurations
affect processing times, and (2) [Enhancement Engine](../enhancer/engines)
developers who want to test their engines and possibly also the services
called by those engines.
+
+In addition, this utility provides the following statistics:
+
+* __Round Trip Time__: The whole request/response time, including sending the
request, request transmission, server-side parsing, processing, server-side
serialization, response transmission and client-side parsing.
+* __Enhancement Chain processing__: The time needed by the
[EnhancementJobManager](../enhancer/enhancementjobmanager.html) to process the
[ContentItem](../enhancer/contentitem.html). These data are provided by the
[Execution Metadata](../enhancer/executionmetadata.html).
+* __EnhancementEngine processing__: The processing times of all
[Enhancement Engines](../enhancer/engines) used in the tested [Enhancement
Chain](../enhancer/chains) are tracked as well. These data are also provided by
the [Execution Metadata](../enhancer/executionmetadata.html).
+
+## Usage
+
+This utility is part of the Apache Stanbol integration tests and is also run
during normal builds against the default chain of the Stanbol Enhancer. Like
any integration test, it can also be run standalone against a Stanbol server
running at a configured URL.
+
+To use this tool, [check out and build](introduction.html) Apache Stanbol,
then change to the <code>{stanbol-source}/integration-tests</code> directory.
From within this directory, run the utility with
+
+    :::bash
+    mvn -o test -Dtest.server.url={stanbol-server} -Dtest=MultiThreadedTest
+
+This sends 500 requests using 5 concurrent threads to {stanbol-server}, using
[DBpedia.org](http://dbpedia.org) abstracts as content. The integration tests
include up to 10000 of those abstracts that can be used for testing.
+
+This utility can be configured using the following system properties:
+
+* __test.server.url__: The URL of the Apache Stanbol instance that will be
used for testing (e.g. http://localhost:8080)
+* __test__: The simple class name of the integration test to run. To use this
tool, this MUST be set to '<code>MultiThreadedTest</code>'.
+* __stanbol.it.multithreadtest.chain__: The name of the Enhancement Chain to
test. If not present, the default chain is tested.
+* __stanbol.it.multithreadtest.data__: Specifies the data used for the tests.
Files, resources available via the class path, and URLs are supported.
Referenced data may be compressed using 'gz' or 'bz2'. 'zip' is also supported;
however, only the first entry of the ZIP file is processed. Supported data
formats are plain text and the RDF serializations supported by Apache Clerezza.
See the section about Test Data for details.
+* __stanbol.it.multithreadtest.media-type__: While the tool auto-detects the
media type for common file endings (e.g. '<code>.txt</code>',
'<code>.rdf</code>', …), this property can be used to specify the media type
manually. It also allows passing the charset used for plain text files (e.g.
'<code>text/plain;charset=UTF-8</code>').
+* __stanbol.it.multithreadtest.data-property__: If RDF is used as test data,
this property specifies the property of the triples whose values are used as
test data. If '<code>*</code>' is passed, all triples with literal values are
used. '<code>http://dbpedia.org/ontology/abstract</code>' is used as default if
this property is missing.
+* __stanbol.it.multithreadtest.threads__: The number of concurrent threads
used during stress testing. The default is 5.
+* __stanbol.it.multithreadtest.requests__: The maximum number of requests.
This only applies if the configured data set provides more data items. The
default is 500. Setting this to a value less than or equal to 0 deactivates
the limit.
+* __stanbol.it.multithreadtest.rdf-format__: The RDF serialization used for
the '<code>Accept</code>' header of enhancement requests. Apache Stanbol
returns the enhancement results in this format. The default is
'<code>application/rdf+json</code>'.
+
+Here is an example that makes extensive use of custom options:
+
+    :::bash
+    mvn -o test -Dtest=MultiThreadedTest \
+        -Dstanbol.it.multithreadtest.data=/stanbol/test/data/stanbol-test-data.txt.gz \
+        -Dstanbol.it.multithreadtest.requests=10000 \
+        -Dstanbol.it.multithreadtest.threads=20 \
+        -Dstanbol.it.multithreadtest.rdf-format=text/turtle \
+        -Dtest.server.url=http://www.example.org:8080/stanbol
+
+_NOTES:_
+
+* Java system properties are passed using '<code>-D{property}={value}</code>'.
+* If you get OutOfMemory errors, you may need to increase the maximum heap
size via the '<code>-Xmx</code>' option in the '<code>MAVEN_OPTS</code>'
environment variable. This is especially likely if you use RDF test data, as
those are loaded into memory.
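+
+The properties that the example above does not use can be combined in the
same way. The following is a minimal sketch of a shell wrapper, assuming it is
run from the '<code>integration-tests</code>' directory; the function name
'<code>run_stress_test</code>', the default heap size of 1 GB and the explicit
data-property value are illustrative assumptions, not settings required by the
tool.
+
```shell
#!/bin/sh
# Hypothetical helper: runs the MultiThreadedTest stress test against a given
# Stanbol server, Enhancement Chain and test data file.
# Must be called from {stanbol-source}/integration-tests.
#   $1  Stanbol server URL      (e.g. http://localhost:8080)
#   $2  Enhancement Chain name  (hypothetical, e.g. my-chain)
#   $3  test data file or URL   (hypothetical path)
run_stress_test() {
    server=$1
    chain=$2
    data=$3
    # RDF test data are loaded into memory: use a larger heap unless the
    # caller already configured MAVEN_OPTS (1 GB is an arbitrary example).
    MAVEN_OPTS=${MAVEN_OPTS:--Xmx1g}
    export MAVEN_OPTS
    mvn -o test -Dtest=MultiThreadedTest \
        "-Dtest.server.url=$server" \
        "-Dstanbol.it.multithreadtest.chain=$chain" \
        "-Dstanbol.it.multithreadtest.data=$data" \
        "-Dstanbol.it.multithreadtest.data-property=http://dbpedia.org/ontology/abstract" \
        -Dstanbol.it.multithreadtest.threads=10
}
```
+
+For example, '<code>run_stress_test http://localhost:8080 my-chain
/stanbol/test/data/abstracts.nt.bz2</code>' would test the hypothetical chain
'my-chain' with a compressed N-Triples file.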
+
+## Test Data
+
+This tool supports two different test data formats:
+
+### Plain Text Files
+
+All test data are contained in a single text file. Individual texts are
separated by two or more empty lines.
+
+The following example includes three content items:
+
+    :::text
+    Astronomers discover largest star on record
+
+    European astronomers have discovered the largest star yet on record; it is 300 times the mass of our sun, beyond the previously accepted limit of 150 solar masses.
+
+    Paul Crowther, professor of astrophysics at […]
+
+
+    Australian election debate moved to avoid clash with cookery show
+
+    A televised debate between Australia's candidates for Prime Minister […]
+
+
+    The Only Joy In Town
+
+    by Joni Mitchell
+
+    I want to paint a picture
+    Botticelli * style
+    Instead of Venus on a clam *
+    I'd paint this flower child
+
+Plain text test data are read sequentially from the provided source. This
ensures that only ~100 content items are loaded into memory at any given time,
which makes plain text the preferred option for large test data sets.
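+
+Before starting a long test run, it can be useful to check how many content
items a plain text file will yield. The following sketch reproduces the
splitting rule described above (texts separated by two or more empty lines)
with '<code>awk</code>'. It is not the tool's own parser, it assumes that
whitespace-only lines also count as empty, and the function name
'<code>count_items</code>' is hypothetical.
+
```shell
#!/bin/sh
# Hypothetical helper: prints the number of content items in a plain text
# test file, assuming items are separated by two or more empty lines and
# that whitespace-only lines (spaces, tabs, CR) count as empty.
count_items() {
    awk '
        /^[ \t\r]*$/ { blank++; next }   # count consecutive empty lines
        {
            # first text line of the file, or first line after a separator
            if (items == 0 || blank >= 2) items++
            blank = 0
        }
        END { print items + 0 }
    ' "$1"
}
```
+
+A compressed file can be piped in, e.g. '<code>gzip -dc
stanbol-test-data.txt.gz | count_items -</code>', since '<code>awk</code>'
reads standard input for the file name '<code>-</code>'.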
+
+Text files are recognized by the file ending '<code>.txt</code>' of the
referenced resource. For resources with other file endings, the property
'<code>stanbol.it.multithreadtest.media-type=text/plain</code>' must be
specified. If the test data are not encoded using '<code>UTF-8</code>', the
charset MUST be passed using the '<code>charset</code>' parameter (e.g.
'<code>stanbol.it.multithreadtest.media-type=text/plain;charset=iso-8859-7</code>').
+
+### RDF data
+
+The tool also supports RDF graphs as test data, mainly because in many cases
the easiest option is to use RDF dumps of public datasets, such as
DBpedia.org, for testing. Users need to be aware that RDF data are imported
into an in-memory graph.
+
+Content items are extracted by
+
+* Filtering triples that use the value configured by
'<code>stanbol.it.multithreadtest.data-property</code>' as property
('<code>{prefix}:{local-name}</code>' is supported for registered prefixes). By
default, '<code>http://dbpedia.org/ontology/abstract</code>' is used. If
'<code>*</code>' is configured, all triples are taken into account.
+* Filtering triples that have a literal value as object.
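+
+For N-Triples dumps, the number of content items these rules will select can
be estimated without loading the graph into memory. The sketch below is a
simplification under stated assumptions: it only understands the line-based
N-Triples syntax (one triple per line), it expects a full property URI or
'<code>*</code>' instead of the '<code>{prefix}:{local-name}</code>' form, and
the function name '<code>count_literal_triples</code>' is hypothetical.
+
```shell
#!/bin/sh
# Hypothetical helper: counts the triples of an N-Triples file whose object
# is a literal and whose property matches $2 (a full URI, or '*' for any
# property). Defaults to http://dbpedia.org/ontology/abstract.
count_literal_triples() {
    file=$1
    property=${2:-http://dbpedia.org/ontology/abstract}
    awk -v prop="$property" '
        /^[ \t]*#/ { next }                              # skip comment lines
        (prop == "*" || $2 == ("<" prop ">")) && $3 ~ /^"/ { n++ }
        END { print n + 0 }
    ' "$file"
}
```
+
+A compressed dump can be streamed in, e.g. '<code>bzip2 -dc dump.nt.bz2 |
count_literal_triples -</code>'.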
+
+Supported RDF formats and mapped file endings:
+
+* '<code>application/rdf+xml</code>' - file endings '<code>.rdf</code>' and
'<code>.xml</code>'
+* '<code>text/turtle</code>' - file ending '<code>.ttl</code>'
+* '<code>application/x-turtle</code>' - no file ending
+* '<code>text/rdf+nt</code>' - file ending '<code>.nt</code>'
+* '<code>text/rdf+n3</code>' - file ending '<code>.n3</code>'
+* '<code>application/rdf+json</code>' - file ending '<code>.json</code>'
+
+If you want to use a different file ending, you need to pass the media type
using the '<code>stanbol.it.multithreadtest.media-type</code>' property.
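+
+When scripting test runs over many files, the table above can be mirrored by
a small lookup to decide whether the media-type property has to be set
explicitly. This is a sketch only: the function name
'<code>media_type_for</code>' is hypothetical, '<code>.txt</code>' is mapped to
'<code>text/plain</code>' as described for plain text files, and matching is
case-sensitive here, which may differ from the tool.
+
```shell
#!/bin/sh
# Hypothetical helper mirroring the file-ending table: prints the media type
# derived from a file name, or returns 1 if the ending is not mapped (then
# stanbol.it.multithreadtest.media-type has to be set explicitly).
media_type_for() {
    # Strip a compression suffix first, so double endings such as
    # data.rdf.bz2 are mapped by their inner ending.
    name=${1%.gz}
    name=${name%.bz2}
    name=${name%.zip}
    case $name in
        *.txt)       echo 'text/plain' ;;
        *.rdf|*.xml) echo 'application/rdf+xml' ;;
        *.ttl)       echo 'text/turtle' ;;
        *.nt)        echo 'text/rdf+nt' ;;
        *.n3)        echo 'text/rdf+n3' ;;
        *.json)      echo 'application/rdf+json' ;;
        *)           return 1 ;;  # not mapped in this sketch
    esac
}
```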
+
+### Support for compressed test data
+
+Both plain text and RDF data compress well. Because of that, this utility also
supports compressed files. The compression format is detected by the file
ending.
+
+Supported formats are:
+
+* '<code>.gz</code>'
+* '<code>.bz2</code>'
+* '<code>.zip</code>' - only the first entry in the ZIP archive is processed
+
+Compressed files need to use double endings (e.g.
'<code>test-data.txt.gz</code>' or '<code>test-data.rdf.bz2</code>').
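+
+Compressed test data with such double endings can be created with the
standard '<code>gzip</code>' and '<code>bzip2</code>' command line tools. The
following is a minimal sketch assuming both tools are installed; the function
names '<code>compress_test_data</code>' and '<code>compression_of</code>' are
hypothetical, and the latter only mirrors the detection rule listed above.
+
```shell
#!/bin/sh
# Hypothetical helper: compresses a test data file so that the result uses a
# double ending (e.g. test-data.txt -> test-data.txt.gz). The original file
# is kept. $2 selects the format: gz (default) or bz2.
compress_test_data() {
    case ${2:-gz} in
        gz)  gzip -c "$1" > "$1.gz" ;;
        bz2) bzip2 -c "$1" > "$1.bz2" ;;
        *)   echo "unsupported format: $2" >&2; return 1 ;;
    esac
}

# Hypothetical helper: prints the compression format detected from the last
# file ending, or 'none' for uncompressed files.
compression_of() {
    case $1 in
        *.gz)  echo gz ;;
        *.bz2) echo bz2 ;;
        *.zip) echo zip ;;   # only the first ZIP entry would be processed
        *)     echo none ;;
    esac
}
```
+
+For example, '<code>compress_test_data stanbol-test-data.txt bz2</code>' writes
'<code>stanbol-test-data.txt.bz2</code>' next to the original file.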
\ No newline at end of file