Author: rwesten
Date: Tue Jul 10 07:45:59 2012
New Revision: 1359510

URL: http://svn.apache.org/viewvc?rev=1359510&view=rev
Log:
ManagedSite documentation; Copied README.md from commons/solr to the webpage

Added:
    
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/entityhub/entityhub-managedsite-clerezzayard-config.png
   (with props)
    
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/entityhub/entityhub-managedsite-yardsite-config.png
   (with props)
    
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/entityhub/entityhub-manatedsite-solryard-config.png
   (with props)
    
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/entityhub/managedsite.mdtext
    
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/utils/commons-solr.mdtext

Added: 
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/entityhub/entityhub-managedsite-clerezzayard-config.png
URL: 
http://svn.apache.org/viewvc/incubator/stanbol/site/trunk/content/stanbol/docs/trunk/entityhub/entityhub-managedsite-clerezzayard-config.png?rev=1359510&view=auto
==============================================================================
Binary file - no diff available.

Propchange: 
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/entityhub/entityhub-managedsite-clerezzayard-config.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: 
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/entityhub/entityhub-managedsite-yardsite-config.png
URL: 
http://svn.apache.org/viewvc/incubator/stanbol/site/trunk/content/stanbol/docs/trunk/entityhub/entityhub-managedsite-yardsite-config.png?rev=1359510&view=auto
==============================================================================
Binary file - no diff available.

Propchange: 
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/entityhub/entityhub-managedsite-yardsite-config.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: 
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/entityhub/entityhub-manatedsite-solryard-config.png
URL: 
http://svn.apache.org/viewvc/incubator/stanbol/site/trunk/content/stanbol/docs/trunk/entityhub/entityhub-manatedsite-solryard-config.png?rev=1359510&view=auto
==============================================================================
Binary file - no diff available.

Propchange: 
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/entityhub/entityhub-manatedsite-solryard-config.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: 
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/entityhub/managedsite.mdtext
URL: 
http://svn.apache.org/viewvc/incubator/stanbol/site/trunk/content/stanbol/docs/trunk/entityhub/managedsite.mdtext?rev=1359510&view=auto
==============================================================================
--- 
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/entityhub/managedsite.mdtext
 (added)
+++ 
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/entityhub/managedsite.mdtext
 Tue Jul 10 07:45:59 2012
@@ -0,0 +1,105 @@
+Title: ManagedSite
+
+A ManagedSite allows users to manage a collection of Entities by using the 
RESTful API of the Entityhub. Unlike the ReferencedSite implementation, it 
does not refer to remote services. Therefore all changes to Entities 
managed by a ManagedSite are performed via the RESTful API of the Entityhub.
+
+Users can configure multiple ManagedSites with the Stanbol Entityhub. They 
are identified by their id and share the id-space with other Sites (e.g. 
ReferencedSites). The RESTful services of a ManagedSite are available via the 
URL pattern
+
+    http://{stanbol-instance}/entityhub/site/{siteId}
+
+_NOTE:_ To make this documentation less abstract it uses a scenario that 
assumes that someone wants to manage the [IPTC Descriptive 
NewsCodes](http://www.iptc.org/cms/site/index.html?channel=CH0103#descrncd) by 
using a ManagedSite. Typical Stanbol users will want to manage their own 
Entities (e.g. Tags/Categories of their CMS) instead.
+
+### Manage Entities by using RESTful services
+
+The RESTful API of ManagedSites is the same as that of other Sites, except that 
the "/entity" endpoint also supports creating, updating and deleting Entities.
+
+The following Example shows how to upload a SKOS vocabulary to a ManagedSite:
+
+    :::bash
+    curl -i -X PUT -H "Content-Type: application/rdf+xml" -T subject-code.rdf \
+        "http://localhost:8080/entityhub/site/iptc/entity"
+
+This example assumes that Stanbol is running on 'localhost' port '8080' and 
that a ManagedSite with the id 'iptc' was configured. The uploaded file 
'subject-code.rdf' contains the IPTC 
[subject-codes](http://cv.iptc.org/newscodes/subjectcode/). To also upload the 
vocabulary containing the [genres](http://cv.iptc.org/newscodes/genre/), one 
needs to call
+
+    :::bash
+    curl -i -X PUT -H "Content-Type: application/rdf+xml" -T genre.rdf \
+        "http://localhost:8080/entityhub/site/iptc/entity"
+
+Calls like that will create/update all Entities contained in the parsed RDF 
data. If one wants to ensure that only a single Entity is created/updated one 
can specify the 'id' parameter.
+
+    :::bash
+    curl -i -X PUT -H "Content-Type: application/rdf+xml" -T genre.rdf \
+        "http://localhost:8080/entityhub/site/iptc/entity?id=http://cv.iptc.org/newscodes/genre/Exclusive"
+
+This will ignore all other RDF data and only update the 'genre:Exclusive' 
entity.
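+
+When the 'id' parameter is set programmatically, the entity URI should be 
URL-encoded. The following is a minimal Java sketch that only builds the 
request URL, following the URL pattern 
http://{stanbol-instance}/entityhub/site/{siteId} given at the top of this 
page. The class and method names are hypothetical and not part of Stanbol.
+

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not a Stanbol class): builds the URL of the
// "/entity" endpoint of a ManagedSite.
final class EntityUrlExample {

    // Returns "{base}/entityhub/site/{siteId}/entity" and appends
    // "?id=<url-encoded entity URI>" when an entity id is given.
    static String entityUrl(String stanbolBase, String siteId, String entityId) {
        String url = stanbolBase + "/entityhub/site/" + siteId + "/entity";
        if (entityId == null) {
            return url; // create/update all Entities of the uploaded RDF data
        }
        return url + "?id=" + URLEncoder.encode(entityId, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(entityUrl("http://localhost:8080", "iptc",
            "http://cv.iptc.org/newscodes/genre/Exclusive"));
    }
}
```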
+
+For the full documentation of the CRUD interface of the '/entity' endpoint of 
a ManagedSite please have a look at the RESTful API documentation served by the 
Web UI of the Stanbol Entityhub.
+
+### Configuration of ManagedSites
+
+Currently there is a single implementation of the ManagedSite interface, the 
YardSite, which uses a <code>Yard</code> instance for managing the entities.
+
+To use a YardSite, users need to configure two services:
+
+1. Yard: The Entityhub currently includes two Yard implementations: 
the SolrYard and the ClerezzaYard. The SolrYard is best suited for use with the 
Stanbol Enhancer because it allows very fast label-based retrieval of Entities. 
So if you plan to use the ManagedSite primarily with the Stanbol Enhancer, this 
is the Yard implementation to choose. The ClerezzaYard stores the 
managed Entities within a TripleStore. While the ClerezzaYard is not as 
efficient for use with the Stanbol Enhancer, its data can be queried by using 
the SPARQL endpoint of Apache Stanbol.
+2. YardSite: This configures the ManagedSite. This configuration links to the 
configured Yard via its id.
+
+#### Configuration of a SolrYard:
+
+This describes how to configure a SolrYard to be used with a YardSite by 
using the Configuration tab of the Apache Felix Webconsole 
[http://{stanbol-instance}/system/console/configMgr](http://localhost:8080/system/console/configMgr).
+
+![Typical SolrYard configuration for a 
YardSite](entityhub-manatedsite-solryard-config.png)
+
+The above figure shows a typical SolrYard configuration for a YardSite. 
Important properties are 
+
+* __ID__: This MUST be unique among all Yards. It is recommended to use 
"{siteId}Yard".
+* __Solr Index/Core__: This is the name of the SolrCore that will be used to 
store the data. It is recommended to use the same name as the {siteId}, 
because the RESTful API of the SolrCore is published under 
<code>http://{stanbol-instance}/solr/default/{solrCore}</code>. Using the 
same name for {siteId} and {solrCore} makes it easier to map the RESTful API of 
the SolrCore to the ManagedSite published under 
<code>http://{stanbol-instance}/entityhub/site/{siteId}</code>.
+* __Use default SolrCore configuration__: If enabled the SolrCore will be 
automatically created by using the default configuration. Users will typically 
want to use this option. Only users that want to use a special SolrCore 
configuration will need to deactivate this option and to provide a 
<code>{solrCore}.solrindex.zip</code> archive containing the special 
configuration in the <code>{stanbol-workingdir}/stanbol/datafiles</code> 
directory. See the [Managing Solr 
Indexes](../utils/commons-solr.html#managingsolrindexes) section for detailed 
information. 
+
+#### Configuration of a ClerezzaYard:
+
+This describes how to configure a ClerezzaYard to be used with a YardSite by 
using the Configuration tab of the Apache Felix Webconsole 
[http://{stanbol-instance}/system/console/configMgr](http://localhost:8080/system/console/configMgr).
+
+![Typical ClerezzaYard configuration for a 
YardSite](entityhub-managedsite-clerezzayard-config.png)
+
+The above figure shows a typical ClerezzaYard configuration for a YardSite. 
Important properties are
+
+* __ID__: This MUST be unique among all Yards. It is recommended to use 
"{siteId}Yard".
+* __Graph URI__: The URI of the named graph used to 
store the RDF data. If a graph with this URI is already present then it will be 
reused by this Yard. Otherwise an empty graph with this URI is created using 
the Clerezza 
[TcManager](http://incubator.apache.org/clerezza/mvn-site/rdf.core/apidocs/org/apache/clerezza/rdf/core/access/TcManager.html).
 If this field is empty a URN will be used as the default graph URI.
+
+The ClerezzaYard also registers its RDF graph with the Apache Stanbol 
SPARQL service available at <code>http://{stanbol-instance}/sparql</code>.
+
+To query the RDF graph of a ClerezzaYard you need to specify its 
configured Graph URI in SPARQL queries posted to the Stanbol SPARQL endpoint:
+
+    :::bash
+    curl -i -X POST -d "graphuri=http://cv.iptc.org/newscodes" \
+        --data-urlencode "query@sparqlQuery.txt" \
+        "http://localhost:8080/sparql"
+
+where 'sparqlQuery.txt' refers to a file containing the SPARQL query e.g.
+
+    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
+    SELECT distinct ?concept ?prefLabel ?altLabel ?parent
+    WHERE {
+        ?concept a skos:Concept .
+        ?concept skos:prefLabel ?prefLabel .
+        OPTIONAL {
+            ?concept skos:altLabel ?altLabel .
+        }
+    }
+
+#### Configuration of the YardSite
+
+Finally you need to configure the YardSite that uses the previously configured 
Yard instance (either SolrYard or ClerezzaYard). Again this will show how to 
configure the YardSite by using the Configuration tab of the Apache Felix 
Webconsole 
[http://{stanbol-instance}/system/console/configMgr](http://localhost:8080/system/console/configMgr).
+
+![Typical YardSite configuration](entityhub-managedsite-yardsite-config.png)
+
+The above figure shows the configuration of the YardSite. The important 
properties are
+
+* __ID__: This is the {siteId} used to map this ManagedSite to the RESTful API 
of the Stanbol Entityhub. Make sure that the ID is unique over all configured 
Sites.
+* __Yard ID__: Here you need to put the ID of the Yard configured in the 
previous step. If no Yard with that ID is active, the ManagedSite will not be 
initialized and will therefore not be available via the RESTful API.
+
+The __Entity Prefix(es)__ are an optional configuration. They are used by the 
SiteManager (the "/entityhub/sites" endpoint) to decide whether requested 
entities can be dereferenced via a registered site. If not present the 
SiteManager will try to dereference every request by using this ManagedSite. So 
correctly configuring this may slightly improve performance by avoiding 
unnecessary requests.
+
+The __Field Mappings__ can be used to copy property values of created/updated 
Entities to other properties. The mappings used in the above figure ensure that 
SKOS preferred/alternate labels, FOAF (Friend of a Friend) names, Dublin Core 
titles as well as the name property of the schema.org ontology are copied over 
to rdfs:label. This configuration is the default as the Stanbol Enhancer uses 
<code>rdfs:label</code> as default property for linking entities based on their 
names.
+
+After completing all those steps you should see a new empty ManagedSite under
+
+    http://{stanbol-instance}/entityhub/site/iptc

Added: 
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/utils/commons-solr.mdtext
URL: 
http://svn.apache.org/viewvc/incubator/stanbol/site/trunk/content/stanbol/docs/trunk/utils/commons-solr.mdtext?rev=1359510&view=auto
==============================================================================
--- 
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/utils/commons-solr.mdtext
 (added)
+++ 
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/utils/commons-solr.mdtext
 Tue Jul 10 07:45:59 2012
@@ -0,0 +1,311 @@
+Title: Stanbol Commons Solr
+
+Solr is used by several Apache Stanbol components. The Apache Stanbol Solr 
Commons artifacts provide a set of utilities that ease the use of Solr within 
OSGi, allow the initialization and management of Solr indexes as well as the 
publishing of Solr's RESTful interface on the OSGi HttpService.
+
+Although these utilities were implemented with the requirements of Apache 
Stanbol in mind, they do not depend on other Stanbol components that are not 
themselves part of
+"stanbol.commons".
+
+
+## Solr OSGi Bundle
+
+The "org.apache.commons.solr.core" bundle currently includes all dependencies 
required by Solr and also exports the client as well as the server API. For 
details please have a look at the pom file of the "solr.core" artifact.
+
+Please also note the exclusion list: some libraries currently not 
directly used by Stanbol are explicitly excluded. Using features that depend on 
them within a "solrconfig.xml" or "schema.xml" will result in 
"ClassNotFoundException" or "NoClassDefFoundError".
+
+If you require an additional Library that is currently not included please 
give us a short notice on the stanbol-dev mailing list.
+
+
+## Solr Server Components
+
+This section provides information on how to manage and access the server 
side CoreContainer and SolrCore components of Solr.
+
+
+### Accessing CoreContainers and SolrCores
+
+All CoreContainers and SolrCores initialized by the Stanbol Solr framework are 
registered with the OSGi Service Registry. This means that other bundles can 
obtain them by using
+
+    CoreContainer defaultSolrServer;
+    ServiceReference ref = bundleContext.getServiceReference(
+        CoreContainer.class.getName());
+    if (ref != null) {
+        defaultSolrServer = (CoreContainer) bundleContext.getService(ref);
+    } else {
+        defaultSolrServer = null; //no SolrServer available
+    }
+
+It is also possible to track service registration and unregistration events by 
using the OSGi ServiceTracker utility.
+
+The above code snippet always returns the SolrServer with the highest 
priority (the highest value for the "service.ranking" property). However the 
OSGi Service Registry also allows services to be obtained/tracked by using 
filters. For specifying such filters it is important to know what metadata are 
provided when services are registered with the OSGi Service Registry.
+
+
+#### Metadata for CoreContainer:
+
+* **org.apache.solr.core.CoreContainer.name**: The name of the SolrServer. The 
name MUST be provided for each Solr CoreContainer registered with this 
framework; it is a required field of every configuration. If two CoreContainers 
are registered with the same name, the "service.ranking" property is used 
to determine the currently active CoreContainer for a request, while the others 
registered for the same name may be used as fallbacks. The container name is 
used as a URL path component when the `publishREST` parameter is true. It is 
recommended to use lowercase names containing only ASCII characters.
+* **org.apache.solr.core.CoreContainer.dir**: The directory of a 
CoreContainer. This is the directory containing the "solr.xml" file.
+* **org.apache.solr.core.CoreContainer.solrXml**: The name of the Solr 
CoreContainer configuration file. Currently always "solr.xml".
+* **org.apache.solr.core.CoreContainer.cores**: A read only collection of the 
names of all cores registered with the CoreContainer.
+* **service.ranking**: The OSGi "service.ranking" property is used to specify 
the ranking of a CoreContainer. The CoreContainer with the highest ranking is 
considered the default server and will be returned by calls to 
bundleContext.getServiceReference(..) without the use of a filter.
+* **org.apache.solr.core.CoreContainer.publishREST**: Boolean switch that 
enables/disables the publishing of the Solr RESTful API on 
"http://{host}:{port}/solr/{server-name}". Requires the 
"SolrServerPublishingComponent" to be active.
+
+
+#### Metadata for SolrCores:
+
+* **org.apache.solr.core.SolrCore.name**: The name of the SolrCore as 
registered with the CoreContainer
+* **org.apache.solr.core.SolrCore.dir**: The instance directory of the SolrCore
+* **org.apache.solr.core.SolrCore.datadir**: The data directory of the SolrCore
+* **org.apache.solr.core.SolrCore.indexdir**: The directory of the index used 
by this SolrCore
+* **org.apache.solr.core.SolrCore.schema**: The name (excluding the directory) 
of the Solr schema used by this core
+* **org.apache.solr.core.SolrCore.solrconf**: The name (excluding the 
directory) of the Solr core configuration file
+
+In addition the following metadata of the CoreContainer for this SolrCore are 
also available
+
+* **org.apache.solr.core.CoreContainer.id**: The `SERVICE_ID` of the 
CoreContainer this SolrCore is registered with. This is usually the easiest way 
to obtain the ServiceReference to the CoreContainer of a SolrCore.
+* **org.apache.solr.core.CoreContainer.name**: The name of the CoreContainer 
this SolrCore is registered with. Note that multiple CoreContainers may be 
registered for the same name. Therefore this property MUST NOT be used to 
filter for the ServiceReference to the CoreContainer of a SolrCore.
+* **org.apache.solr.core.CoreContainer.dir**: The Solr directory of the 
CoreContainer for this SolrCore.
+* **service.ranking**: The OSGi service.ranking of the CoreContainer this 
SolrCore is registered with. SolrCores do not define their own service.ranking 
but use the ranking of the CoreContainer they are registered with.
+
+The keys mentioned above for the metadata of registered CoreContainers and 
SolrCores are defined as public constants in the 
[SolrConstants](http://svn.apache.org/repos/asf/incubator/stanbol/trunk/commons/solr/core/src/main/java/org/apache/stanbol/commons/solr/SolrConstants.java)
 class.
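+
+As an illustration of how this metadata can be used, the following sketch 
builds an OSGi filter string that selects CoreContainers by name. Such a string 
could be passed as the filter argument of 
bundleContext.getServiceReferences(..). The class and method names are 
hypothetical; only the property key and the standard OSGi "objectClass" 
property are taken from the lists above. The escaping follows the OSGi filter 
syntax, where `\`, `*`, `(` and `)` in values are prefixed with a backslash.
+

```java
// Hypothetical helper (not part of Stanbol): builds an OSGi filter string
// that matches CoreContainers registered under a given server name.
final class CoreContainerFilterExample {

    static final String OBJECT_CLASS = "objectClass"; // standard OSGi property
    static final String CORE_CONTAINER_CLASS = "org.apache.solr.core.CoreContainer";
    static final String PROPERTY_SERVER_NAME = "org.apache.solr.core.CoreContainer.name";

    // Escapes the characters that have a special meaning in OSGi filter values.
    static String escape(String value) {
        StringBuilder sb = new StringBuilder();
        for (char c : value.toCharArray()) {
            if (c == '\\' || c == '*' || c == '(' || c == ')') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Combines the interface constraint and the name constraint with a logical AND.
    static String serverFilter(String serverName) {
        return "(&(" + OBJECT_CLASS + "=" + CORE_CONTAINER_CLASS + ")("
            + PROPERTY_SERVER_NAME + "=" + escape(serverName) + "))";
    }

    public static void main(String[] args) {
        System.out.println(serverFilter("myserver"));
    }
}
```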
+
+
+### ReferencedSolrServer
+
+This component allows to initialize a Solr server running within the same JVM 
as Stanbol based on indexes provided by a directory on the local file system. 
This does not support management capabilities, but it initializes a Solr 
CoreContainer based on the data in the file system and registers it (including 
all SolrCores) with the OSGi Service Registry as described above.
+
+The ReferencedSolrServer uses the ManagedServiceFactory pattern. This means 
that instances are created by passing configurations to the OSGi 
ConfigurationAdmin service. Practically this means that:
+
+* users can create instances by using the Configuration tab of the Apache 
Felix Web Console
+* programmers can directly use the ConfigurationAdmin service to create/update 
and delete configurations
+* configurations can also be provided via the Apache Sling [OSGi 
installer](http://sling.apache.org/site/osgi-installer.html) framework. This 
means configurations can be included within the Stanbol launchers or bundles, 
or copied to a directory configured for the [File 
Provider](http://svn.apache.org/repos/asf/sling/trunk/installer/providers/file/)
+
+Configurations need to include the following properties (see also section 
"Metadata for CoreContainer" for details about such properties)
+
+* **org.apache.solr.core.CoreContainer.name**: The name for the Solr Server
+* **org.apache.solr.core.CoreContainer.dir**: The path to the directory on the 
local file system that is used to initialize the CoreContainer
+* **service.ranking**: The OSGi service ranking used to register the 
CoreContainer and its SolrCores. If not specified '0' will be used as default. 
The value MUST BE an integer number.
+* **org.apache.solr.core.CoreContainer.publishREST**: Boolean switch that 
enables/disables the publishing of the Solr RESTful API on 
"http://{host}:{port}/solr/{server-name}". Requires the 
"SolrServerPublishingComponent" to be active.
+
+**NOTE:** Keep in mind that if the RESTful API of the SolrServer is published, 
users might use the Admin Request handler to manipulate the Solr configuration. 
In such cases the metadata provided by the ServiceReferences for the 
CoreContainer and SolrCores might get out of sync with the actual configuration 
of the server.
+
+
+### ManagedSolrServer
+
+This component allows to manage a multi core Solr server. It provides an API 
to create, update and remove SolrCores. In addition cores can be activated and 
deactivated.
+
+
+#### Creating ManagedServerInstances
+
+The ManagedSolrServer uses the ManagedServiceFactory pattern. This means that 
instances are created by passing configurations to the OSGi ConfigurationAdmin 
service. Practically this means that:
+
+* users can create instances by using the Configuration tab of the Apache 
Felix Web Console
+* programmers can directly use the ConfigurationAdmin service to create/update 
and delete configurations
+* configurations can also be provided via the Apache Sling [OSGi 
installer](http://sling.apache.org/site/osgi-installer.html) framework. This 
means configurations can be included within the Stanbol launchers or bundles, 
or copied to a directory configured for the [File 
Provider](http://svn.apache.org/repos/asf/sling/trunk/installer/providers/file/)
+
+Configurations need to include the following properties (see also section 
"Metadata for CoreContainer" for details about such properties). Although the 
properties are the same as for the ReferencedSolrServer their semantics differs 
in some aspects.
+
+* **org.apache.solr.core.CoreContainer.name**: The name for the Solr Server
+* **org.apache.solr.core.CoreContainer.dir**: Optionally a directory to store 
the data. If not specified the data will be stored in a directory with the 
configured server-name at the default location (currently 
"${sling.home}/indexes/" or "indexes/" if the environment variable 'sling.home' 
is not present). Users that want to create multiple ManagedSolrServers with the 
same name need to specify the directory, or the servers will overwrite each 
other's data.
+* **service.ranking**: The OSGi service ranking used to register the 
CoreContainer and its SolrCores. If not specified '0' will be used as default. 
The value MUST BE an integer number. In scenarios where a single 
ManagedSolrServer is expected it is highly recommended to specify 
`Integer.MAX_VALUE` (2147483647) as service ranking. This will ensure that this 
server can not be overridden by others.
+* **org.apache.solr.core.CoreContainer.publishREST**: Boolean switch that 
enables/disables the publishing of the Solr RESTful API on 
"http://{host}:{port}/solr/{server-name}". Requires the 
"SolrServerPublishingComponent" to be active.
+
+**NOTE:** Keep in mind that if the RESTful API of the SolrServer is published, 
users might use the Admin Request handler to manipulate the Solr configuration. 
In such cases the metadata provided by the ServiceReferences for the 
CoreContainer and SolrCores might get out of sync with the actual configuration 
of the server.
+
+
+#### Managing Solr Indexes
+
+This describes how to manage (create, update, remove, activate, deactivate) 
Indexes on a ManagedSolrServer.
+
+Managed Indexes do not 1:1 correspond to SolrCores registered on the 
CoreContainer. However all SolrCores on the CoreContainer do have a 1:1 mapping 
with a managed index on the Managed SolrServer.
+
+A managed index can be in one of the following states (defined by the 
ManagedIndexState enumeration):
+
+* **UNINITIALISED**: An index that was created but is still missing the 
configuration and/or index data is in this state. The ManagedSolrServer API 
allows creating indexes by referring to a Solr-Index-Archive. Such archives 
are then requested via the Stanbol DataFileProvider service. Usually users can 
provide them by copying the referenced archive to the "/sling/datafiles" folder.
+* **INACTIVE**: This indicates that an index was deactivated via the 
ManagedSolrServer API. The data are still kept, but the SolrCore was removed 
from the CoreContainer.
+* **ACTIVE**: This indicates that an index is active and can be used. Only 
indexes that are ACTIVE are registered with the CoreContainer.
+* **ERROR**: This state indicates an error during the initialization. 
The stack trace of the error is available in the IndexMetadata.
+
+Indexes can not only be managed by calls to the API of the ManagedSolrServer. 
The "org.apache.stanbol.commons.solr.install" bundle provides also support for 
installing/uninstalling indexes by using the Apache Sling [OSGi 
installer](http://sling.apache.org/site/osgi-installer.html) framework. This 
allows to install indexes by providing Solr-Index-Archives or 
Solr-Index-Archive-References to any available Provider. By default Apache 
Stanbol includes Provider for the Launchers and Bundles. However the Sling 
Installer Framework also includes Providers for Directories on the File and JCR 
Repositories.
+
+Solr-Index-Archives do use the following name pattern:
+
+    {name}.solrindex[.zip|.gz|.bz2]
+
+* They are normal archives starting with the instance directory of a Solr Core.
+* The name of this instance directory MUST be the same as the {name} of the 
archive.
+* The second extension specifies the type of the archive. If no extension is 
specified the type of the archive might still be detected by reading the first 
few bytes of the archive.
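+
+The name pattern above can be checked with a regular expression. The 
following is a minimal sketch with a hypothetical class name; it is not the 
parser used by Stanbol and only covers the three archive types named in the 
pattern.
+

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Stanbol): splits a file name of the form
// {name}.solrindex[.zip|.gz|.bz2] into the index name and the archive type.
final class SolrIndexArchiveName {

    private static final Pattern ARCHIVE =
        Pattern.compile("^(.+)\\.solrindex(?:\\.(zip|gz|bz2))?$");

    // The {name} part, or empty if the file name does not follow the pattern.
    static Optional<String> indexName(String fileName) {
        Matcher m = ARCHIVE.matcher(fileName);
        return m.matches() ? Optional.of(m.group(1)) : Optional.empty();
    }

    // The archive type, or empty if the second extension is missing
    // (in which case the type has to be detected from the file content).
    static Optional<String> archiveType(String fileName) {
        Matcher m = ARCHIVE.matcher(fileName);
        return m.matches() ? Optional.ofNullable(m.group(2)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(indexName("dbpedia_43k.solrindex.zip"));
        System.out.println(archiveType("dbpedia_43k.solrindex.zip"));
    }
}
```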
+
+Solr-Index-Archive-References are normal Java properties files and do use the 
following name pattern:
+
+    {name}.solrindex.ref
+
+The following keys are used (see also 
org.apache.stanbol.commons.solr.managed.ManagedIndexConstants):
+
+* **Index-Archive**: Comma separated list of Solr-Index-Archives that can be 
used for initializing this index. The first index archive in the list has the 
highest priority. Higher priority archives will replace the data of lower 
priority ones as soon as they become available. This feature is intended to be 
used to allow the replacement of a small sample dataset (e.g. shipped within a 
Bundle or the Launcher) with the full dataset downloaded later from a remote 
Internet archive or pushed manually to the `sling/datafiles` folder of a 
previously installed Stanbol instance. For instance the `dbpedia.solrindex.ref` 
archive reference configuration provided in the default launcher has the line 
`Index-Archive=dbpedia.solrindex.zip,dbpedia_43k.solrindex.zip`, and only 
`dbpedia_43k.solrindex.zip` is shipped in the default launchers, allowing it to 
be overridden by any archive named `dbpedia.solrindex.zip`.
+* **Index-Name**: The name of the Index. If not specified the {name} part of 
the first Index-Archive in the list will be used.
+* **Server-Name**: The name of the ManagedSolrServer this Solr index MUST be 
deployed on. If not present it will be deployed on the default 
ManagedSolrServer (the ManagedSolrServer with the highest priority).
+* **Synchronized**: Boolean switch. If enabled the index will be synchronized 
with the referenced Solr-Index-Archives. That means the DataFileTracker service 
will be used to periodically track the states of referenced 
Solr-Index-Archives. This makes it possible to initialize, update and 
uninitialise managed Solr indexes by simply making Solr-Index-Archives 
available or unavailable to the DataFileProvider infrastructure (such as users 
copying/deleting files in the "/sling/datafiles" directory).
+* **other Properties**: All parsed properties are forwarded to the 
DataFileProvider/DataFileTracker service when looking for the referenced 
Solr-Index-Archives. These components might also define some special keys 
associated with specific functionalities. Please look at the documentation of 
these services for details.
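+
+Because Solr-Index-Archive-References are plain Java properties files, they 
can be read with `java.util.Properties`. The sketch below uses a hypothetical 
class that is not part of Stanbol. It reads the `Index-Archive` list in 
priority order and applies the documented default for `Index-Name`, i.e. the 
{name} part of the first archive in the list.
+

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical reader (not part of Stanbol) for {name}.solrindex.ref files.
final class IndexReferenceFileExample {

    // Parses the content of a reference file given as a string.
    static Properties parse(String content) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(content));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Archives in priority order: the first entry has the highest priority.
    static List<String> archives(Properties ref) {
        List<String> result = new ArrayList<>();
        for (String part : ref.getProperty("Index-Archive", "").split(",")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                result.add(name);
            }
        }
        return result;
    }

    // Explicit Index-Name, else the {name} part of the first archive, else null.
    static String indexName(Properties ref) {
        String explicit = ref.getProperty("Index-Name");
        if (explicit != null && !explicit.trim().isEmpty()) {
            return explicit.trim();
        }
        List<String> archives = archives(ref);
        if (archives.isEmpty()) {
            return null;
        }
        String first = archives.get(0);
        int end = first.indexOf(".solrindex");
        return end < 0 ? first : first.substring(0, end);
    }

    public static void main(String[] args) {
        Properties ref = parse(
            "Index-Archive=dbpedia.solrindex.zip,dbpedia_43k.solrindex.zip\n"
            + "Synchronized=true\n");
        System.out.println(archives(ref));
        System.out.println(indexName(ref));
    }
}
```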
+
+
+#### Other interesting Notes
+
+* SolrCore directory names created by the ManagedSolrServer use the current 
date as suffix. If a directory with that name already exists (e.g. because the 
same index was already updated on the very same day) then an additional 
"-{count}" suffix will be added to the end.
+* The ManagedSolrServer stores its configuration within the persistent space 
of the Bundle provided by the OSGi environment. When using one of the default 
Stanbol launchers this is within "{sling.home}/felix/bundle{bundle-id}/data". 
The "{bundle-id}" of the "org.apache.stanbol.commons.solr.managed" bundle can 
be looked up in the [Bundle tab](http://localhost:8080/system/console/bundles) 
of the Apache Felix Webconsole. The actual configuration of a ManagedSolrServer 
is then in ".config/index-config/{service.pid}". The "{service.pid}" can 
also be looked up via the Apache Felix Webconsole in the [Configuration Status 
tab](http://localhost:8080/system/console/config). This folder contains the 
Solr index reference files (normal Java properties files) with all the 
information about the current state of the managed indexes.
+* Errors that occur during the asynchronous initialization of SolrCores are 
stored within the IndexingProperties. They can therefore be requested via the 
API of the ManagedSolrServer but also be looked up within the persistent state 
of the ManagedSolrServer (see above where such files are located).
+
+
+## Solr Client Components
+
+This section describes how to use Solr servers and indexes referenced and 
managed by the "org.apache.stanbol.commons.solr" framework.
+Principally there are two possibilities: (1) to directly access Solr indexes 
via the SolrServer Java API and (2) to publish locally managed indexes on the 
OSGi HttpService and then use such indexes via the Solr RESTful API.
+
+The Stanbol Solr framework does not provide utilities for accessing remote 
Solr servers, because this is already easily possible by using SolrJ.
+
+
+### Java API
+
+This describes how to lookup and access a Solr Server initialized by the 
"org.apache.stanbol.commons.solr" framework. The client side Java API of Solr 
is defined by the SolrServer abstract class. The implementation used for 
accessing a SolrCore running in the same JVM is the EmbeddedSolrServer.
+
+All Solr servers (CoreContainer) and Solr indexes (SolrCore) initialized by the 
ReferencedSolrServer and/or ManagedSolrServer are registered with the OSGi 
service registry. More information about this can be found in the first part of 
the "Solr Server Components" section of this documentation.
+
+OSGi already provides APIs and utilities to lookup and track registered 
services. In the following I will provide some examples how to lookup 
SolrServers registered as OSGi services.
+
+
+#### IndexReference
+
+The IndexReference is a Java class that manages a reference to an Index. It 
defines a constructor that takes a serverName and coreName. In addition there 
is a static parse(String ref) method that takes
+
+* file URLs
+* file paths and
+* [server-name:]core-name like references.
+
+The IndexMetadata class also defines a getter to get the IndexReference.
+
+The IndexReference also provides getters for the Filters 
used to lookup/track the referenced CoreContainer/SolrCore in the OSGi 
service registry. The returned filters include the constraint for the registered 
interface (OBJECTCLASS). Therefore when using these filters one can pass null 
for the class parameter.
+
+To lookup the CoreContainer of the referenced index:
+
+    bundleContext.getServiceReferences(null, indexReference.getServerFilter());
+
+To look up the SolrCore for the referenced index:
+
+    bundleContext.getServiceReferences(null, indexReference.getIndexFilter());
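+
+The exact filter strings returned by these getters are not shown here. As a rough sketch of why the class parameter can be omitted, the plain-Java example below builds an OSGi (LDAP style) filter that already carries the interface constraint. The helper class `FilterSketch` is hypothetical, and the combination of properties is an assumption. "objectClass" is the value of the OSGi OBJECTCLASS constant.

```java
/** Hypothetical helper; shows the shape of an OSGi filter string only. */
final class FilterSketch {

    /** Builds "(key=value)". Values are assumed to need no LDAP escaping. */
    static String eq(String key, String value) {
        return "(" + key + "=" + value + ")";
    }

    /** Combines constraints with a logical AND: "(&(a=1)(b=2))". */
    static String and(String... constraints) {
        StringBuilder sb = new StringBuilder("(&");
        for (String constraint : constraints) {
            sb.append(constraint);
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        // Assumed shape: interface constraint plus the server name property
        String filter = and(
            eq("objectClass", "org.apache.solr.core.CoreContainer"),
            eq("org.apache.solr.core.CoreContainer.name", "myserver"));
        System.out.println(filter);
    }
}
```
+
+Because the interface is part of the filter itself, the service registry does not need a separate class argument to narrow the result.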
+
+
+#### Lookup Solr Indexes
+
+This example shows how to look up the default CoreContainer and create a 
SolrServer for the core "mydata".
+
+    ComponentContext context; // typically passed to the activate method
+    BundleContext bc = context.getBundleContext();
+    ServiceReference coreContainerRef =
+        bc.getServiceReference(CoreContainer.class.getName());
+    CoreContainer coreContainer =
+        (CoreContainer) bc.getService(coreContainerRef);
+    SolrServer server = new EmbeddedSolrServer(coreContainer, "mydata");
+
+Now there might be cases where several CoreContainers are available and 
"mydata" is not available on the default one. The "default" refers to the one 
with the highest "service.ranking" value. In such cases we need to know a 
property we can use to filter for the right CoreContainer. Here we assume 
the index is on a CoreContainer registered with the name "myserver".
+
+    ComponentContext context; // typically passed to the activate method
+    BundleContext bc = context.getBundleContext();
+
+    // Now let's use the IndexReference to create the filter
+    IndexReference indexRef = new IndexReference("myserver", "mydata");
+    ServiceReference[] coreContainerRefs = bc.getServiceReferences(
+        null, indexRef.getServerFilter());
+
+    // TODO: check that coreContainerRefs != null AND not empty!
+    // Now we have all References to CoreContainers with the name "myserver"
+    // Yes one can register several for the same name (e.g. to have fallbacks)
+    // let's get the one with the highest service.ranking
+    Arrays.sort(coreContainerRefs, ServiceReferenceRankingComparator.INSTANCE);
+
+    // Create the SolrServer (same as above)
+    CoreContainer coreContainer =
+        (CoreContainer) bc.getService(coreContainerRefs[0]);
+    SolrServer server =
+        new EmbeddedSolrServer(coreContainer, indexRef.getIndex());
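+
+The ranking-based selection used above can be sketched without any OSGi dependency. The class `Ref` below is a hypothetical stand-in for a ServiceReference. Breaking ties by the lowest "service.id" follows the rule OSGi applies in getServiceReference; whether ServiceReferenceRankingComparator does the same is an assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class RankingSketch {

    /** Hypothetical stand-in for an OSGi ServiceReference. */
    static final class Ref {
        final String label;
        final int ranking; // models the "service.ranking" property
        final long id;     // models the "service.id" property

        Ref(String label, int ranking, long id) {
            this.label = label;
            this.ranking = ranking;
            this.id = id;
        }
    }

    /** Highest ranking first; ties go to the lowest service id. */
    static final Comparator<Ref> HIGHEST_FIRST =
        Comparator.comparingInt((Ref r) -> r.ranking)
            .reversed()
            .thenComparingLong(r -> r.id);

    /** Returns the preferred reference, or fails if none is registered. */
    static Ref best(List<Ref> candidates) {
        if (candidates == null || candidates.isEmpty()) {
            throw new IllegalStateException("no matching service registered");
        }
        List<Ref> sorted = new ArrayList<>(candidates);
        sorted.sort(HIGHEST_FIRST);
        return sorted.get(0);
    }

    public static void main(String[] args) {
        List<Ref> refs = new ArrayList<>();
        refs.add(new Ref("fallback", 0, 10L));
        refs.add(new Ref("primary", 100, 12L));
        System.out.println(best(refs).label);
    }
}
```
+
+The explicit empty check corresponds to the TODO in the example above: getServiceReferences may return null when nothing matches.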
+
+In cases where one only knows the name of the SolrCore (and not the 
CoreContainer) the initialization looks like this.
+
+    ComponentContext context; // typically passed to the activate method
+    BundleContext bc = context.getBundleContext();
+    String nameFilter = String.format("(%s=%s)",
+        SolrConstants.PROPERTY_CORE_NAME, "mydata");
+    ServiceReference[] solrCoreRefs = bc.getServiceReferences(
+        SolrCore.class.getName(), nameFilter);
+
+    // TODO: check that solrCoreRefs != null AND not empty!
+    // Now we have all references to SolrCores named "mydata"
+    // let's get the one with the highest service.ranking
+    Arrays.sort(solrCoreRefs, ServiceReferenceRankingComparator.INSTANCE);
+
+    // Now get the SolrCore and create the SolrServer
+    SolrCore core = (SolrCore) bc.getService(solrCoreRefs[0]);
+
+    // core.getCoreDescriptor() might be null if SolrCore is not
+    // registered with a CoreContainer
+    SolrServer server = new EmbeddedSolrServer(
+        core.getCoreDescriptor().getCoreContainer(), "mydata");
+
+
+#### Tracking Solr Indexes
+
+The above examples do a lookup at a single point in time. However, because OSGi 
is a dynamic environment where services can come and go at any time, in most 
cases users will rather want to track services. To do this OSGi provides the 
ServiceTracker utility.
+
+To ease the tracking of SolrServers the "org.apache.stanbol.commons.solr.core" 
bundle provides the RegisteredSolrServerTracker. The following examples show 
how to create a managed Solr index and then track the SolrServer.
+
+First, during the activation, we need to check if "mydata" is already created 
and create it if not. Then we can start tracking the index:
+
+    BundleContext bc;
+    // The ManagedSolrServer instance can be looked up manually using a service
+    // reference or using declarative services / SCR injection
+    IndexMetadata metadata = managedServer.getIndexMetadata("mydata");
+    if (metadata == null) {
+        // No index with that name:
+        // Asynchronously init the index as soon as the solrindex archive
+        // is available
+        metadata = managedServer.createSolrIndex(
+            "mydata", "mydata.solrindex.zip", null);
+    }
+    RegisteredSolrServerTracker indexTracker =
+        new RegisteredSolrServerTracker(bc, metadata.getIndexReference());
+
+    // Do not forget to close the tracker while deactivating
+    indexTracker.open();
+
+Now every time we need the SolrServer we can retrieve it from the indexTracker:
+
+    private SolrServer getServer() {
+        SolrServer server = indexTracker.getService();
+        if(server == null) {
+            // Report the missing server
+            throw new IllegalStateException("Server 'mydata' not active");
+        } else {
+            return server;
+        }
+    }
+
+The RegisteredSolrServerTracker takes "service.ranking" into account. If 
more than one service matches the passed IndexReference, the tracker's 
methods always return the one with the highest "service.ranking". Where 
arrays are returned, they are sorted accordingly.
+
+
+### RESTful API
+
+The following describes how to publish the RESTful API of CoreContainers 
registered as OSGi services on the OSGi HttpService. The functionality 
described in this section is provided by the 
"org.apache.stanbol.commons.solr.web" artifact.
+
+
+#### SolrServerPublishingComponent
+
+This is an OSGi component that starts immediately and does not require a 
configuration. Its main purpose is to track all CoreContainers with the 
property "org.apache.solr.core.CoreContainer.publishREST=true". For all such 
CoreContainers it publishes the RESTful API under the URL
+
+    http://{host}:{port}/solr/{server-name}
+
+If two CoreContainers with the same {server-name} (the value of the 
"org.apache.solr.core.CoreContainer.name" property) are registered, the one with 
the highest "service.ranking" is published.
+
+The root-prefix ("/solr" by default) can be configured by setting the 
"org.apache.stanbol.commons.solr.web.dispatchfilter.prefix" property.
+
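+As a rough sketch of how a client could address a published server, the plain-Java example below composes the URL pattern shown above. The class `SolrUrlSketch` and its methods are hypothetical. The `/{core-name}/select` path segment is an assumption based on the usual Solr multi-core layout; it is not stated in this document.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical helper for composing URLs of a published CoreContainer. */
final class SolrUrlSketch {

    /** Builds http://{host}:{port}{prefix}/{server-name}. */
    static String serverUrl(String host, int port, String prefix, String serverName) {
        return "http://" + host + ":" + port + prefix + "/" + serverName;
    }

    /** Appends the assumed "/{core-name}/select" path and the encoded query. */
    static String selectUrl(String serverUrl, String coreName, String query) {
        try {
            return serverUrl + "/" + coreName + "/select?q="
                + URLEncoder.encode(query, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is guaranteed to be available on every Java platform
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String server = serverUrl("localhost", 8080, "/solr", "myserver");
        System.out.println(server);
        System.out.println(selectUrl(server, "mydata", "*:*"));
    }
}
```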
+
+#### SolrDispatchFilterComponent
+
+This Component provides the same functionality as the 
SolrServerPublishingComponent, but can be configured specifically for a 
CoreContainer. It is intended to be used if one wants to publish the RESTful 
API of a specific CoreContainer under a specific location. To deactivate the 
publishing of the same core on the SolrServerPublishingComponent users need to 
set the "org.apache.solr.core.CoreContainer.publishREST" property to false.
+
+This component is configured by two properties:
+
+* **org.apache.stanbol.commons.solr.web.dispatchfilter.name**: The 
{server-name} of the CoreContainer to publish ({server-name} refers to the 
value of the "org.apache.solr.core.CoreContainer.name" property).
+* **org.apache.stanbol.commons.solr.web.dispatchfilter.prefix**: The prefix 
path to publish the server. The {server-name} is NOT appended to the configured 
prefix. Note that a Servlet Filter with `{prefix}/.*` is registered with the 
OSGi HttpService.
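+
+To make the effect of the `{prefix}/.*` pattern concrete, the sketch below checks request paths against it using java.util.regex. The class `DispatchPrefixSketch` and the prefix "/custom/index" are hypothetical, and how the HttpService actually evaluates the pattern is an assumption.

```java
import java.util.regex.Pattern;

/** Hypothetical illustration of which paths a "{prefix}/.*" filter covers. */
final class DispatchPrefixSketch {

    /** Quotes the prefix literally and appends "/.*". */
    static Pattern filterPattern(String prefix) {
        return Pattern.compile(Pattern.quote(prefix) + "/.*");
    }

    /** True if the whole request path matches the filter pattern. */
    static boolean handles(String prefix, String requestPath) {
        return filterPattern(prefix).matcher(requestPath).matches();
    }

    public static void main(String[] args) {
        String prefix = "/custom/index"; // the {server-name} is not appended
        System.out.println(handles(prefix, "/custom/index/mydata/select"));
        System.out.println(handles(prefix, "/solr/myserver/mydata/select"));
    }
}
```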
+
+If two CoreContainers with the same {server-name} (the value of the 
"org.apache.solr.core.CoreContainer.name" property) are registered, the one with 
the highest "service.ranking" is published.
+

