Author: rwesten
Date: Tue Jul 10 07:45:59 2012
New Revision: 1359510
URL: http://svn.apache.org/viewvc?rev=1359510&view=rev
Log:
ManagedSite documentation; Copied README.md from commons/solr to the webpage
Added:
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/entityhub/entityhub-managedsite-clerezzayard-config.png
(with props)
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/entityhub/entityhub-managedsite-yardsite-config.png
(with props)
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/entityhub/entityhub-manatedsite-solryard-config.png
(with props)
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/entityhub/managedsite.mdtext
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/utils/commons-solr.mdtext
Added:
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/entityhub/entityhub-managedsite-clerezzayard-config.png
URL:
http://svn.apache.org/viewvc/incubator/stanbol/site/trunk/content/stanbol/docs/trunk/entityhub/entityhub-managedsite-clerezzayard-config.png?rev=1359510&view=auto
==============================================================================
Binary file - no diff available.
Propchange:
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/entityhub/entityhub-managedsite-clerezzayard-config.png
------------------------------------------------------------------------------
svn:mime-type = application/octet-stream
Added:
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/entityhub/entityhub-managedsite-yardsite-config.png
URL:
http://svn.apache.org/viewvc/incubator/stanbol/site/trunk/content/stanbol/docs/trunk/entityhub/entityhub-managedsite-yardsite-config.png?rev=1359510&view=auto
==============================================================================
Binary file - no diff available.
Propchange:
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/entityhub/entityhub-managedsite-yardsite-config.png
------------------------------------------------------------------------------
svn:mime-type = application/octet-stream
Added:
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/entityhub/entityhub-manatedsite-solryard-config.png
URL:
http://svn.apache.org/viewvc/incubator/stanbol/site/trunk/content/stanbol/docs/trunk/entityhub/entityhub-manatedsite-solryard-config.png?rev=1359510&view=auto
==============================================================================
Binary file - no diff available.
Propchange:
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/entityhub/entityhub-manatedsite-solryard-config.png
------------------------------------------------------------------------------
svn:mime-type = application/octet-stream
Added:
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/entityhub/managedsite.mdtext
URL:
http://svn.apache.org/viewvc/incubator/stanbol/site/trunk/content/stanbol/docs/trunk/entityhub/managedsite.mdtext?rev=1359510&view=auto
==============================================================================
---
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/entityhub/managedsite.mdtext
(added)
+++
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/entityhub/managedsite.mdtext
Tue Jul 10 07:45:59 2012
@@ -0,0 +1,105 @@
+Title: ManagedSite
+
+A ManagedSite allows users to manage a collection of Entities by using the
RESTful API of the Entityhub. Unlike the ReferencedSite implementation, it
does not refer to remote services. Therefore all changes to Entities
managed by a ManagedSite are performed via the RESTful API of the Entityhub.
+
+Users can configure multiple ManagedSites with the Stanbol Entityhub. They
are identified by their id and share the id-space with other Sites (e.g.
ReferencedSites). The RESTful services of a ManagedSite are available via the
URL pattern
+
+ http://{stanbol-instance}/entityhub/site/{siteId}
+
+_NOTE:_ To make this documentation less abstract, it uses a scenario in
which someone wants to manage the [IPTC Descriptive
NewsCodes](http://www.iptc.org/cms/site/index.html?channel=CH0103#descrncd) by
using a ManagedSite. Typical Stanbol users will want to manage their own
Entities (e.g. Tags/Categories of their CMS) instead.
+
+### Manage Entities by using RESTful services
+
+The RESTful API of ManagedSites is the same as that of other Sites, except
that the "/entity" endpoint also supports creating, updating and deleting
Entities.
+
+The following Example shows how to upload a SKOS vocabulary to a ManagedSite:
+
+ :::bash
+ curl -i -X PUT -H "Content-Type: application/rdf+xml" -T subject-code.rdf \
+ "http://localhost:8080/entityhub/site/iptc/entity"
+
+This example assumes that Stanbol is running on 'localhost' port '8080' and
that a ManagedSite with the id 'iptc' was configured. The uploaded file
'subject-code.rdf' contains the IPTC
[subject-codes](http://cv.iptc.org/newscodes/subjectcode/). To also upload the
vocabulary containing the [genre](http://cv.iptc.org/newscodes/genre/)s, call
+
+ :::bash
+ curl -i -X PUT -H "Content-Type: application/rdf+xml" -T genre.rdf \
+ "http://localhost:8080/entityhub/site/iptc/entity"
+
+Calls like these create/update all Entities contained in the uploaded RDF
data. To ensure that only a single Entity is created/updated, specify the
'id' parameter.
+
+ :::bash
+ curl -i -X PUT -H "Content-Type: application/rdf+xml" -T genre.rdf \
+ "http://localhost:8080/entityhub/site/iptc/entity?id=http://cv.iptc.org/newscodes/genre/Exclusive"
+
+This will ignore all other RDF data and only update the 'genre:Exclusive'
entity.
+
+For the full documentation of the CRUD interface of the '/entity' endpoint of
a ManagedSite please have a look at the RESTful API documentation served by the
Web UI of the Stanbol Entityhub.
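+
+For clients that build these request URLs programmatically, the following is a
minimal Java sketch of the URL pattern described above. The class and method
names are hypothetical and not part of Stanbol. It also URL-encodes the 'id'
value, which is an assumption about good client practice rather than a
documented requirement of the endpoint.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of Stanbol): builds the '/entity' endpoint
// URL of a ManagedSite following the pattern
// http://{stanbol-instance}/entityhub/site/{siteId}
final class ManagedSiteUrls {

    /**
     * @param stanbolBase e.g. "http://localhost:8080" (no trailing slash)
     * @param siteId      the id of the ManagedSite, e.g. "iptc"
     * @param entityId    URI of a single Entity, or null to address all
     *                    Entities contained in the uploaded RDF data
     */
    static String entityEndpoint(String stanbolBase, String siteId, String entityId) {
        String url = stanbolBase + "/entityhub/site/" + siteId + "/entity";
        if (entityId == null) {
            return url;
        }
        // Encode the Entity URI so that it is a valid query parameter value.
        return url + "?id=" + URLEncoder.encode(entityId, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(entityEndpoint("http://localhost:8080", "iptc", null));
        System.out.println(entityEndpoint("http://localhost:8080", "iptc",
                "http://cv.iptc.org/newscodes/genre/Exclusive"));
    }
}
```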
+
+### Configuration of ManagedSites
+
+Currently there is a single implementation of the ManagedSite interface, the
YardSite, which uses a <code>Yard</code> instance for managing the entities.
+
+To use a YardSite, users need to configure two services:
+
+1. Yard: The Entityhub currently includes two different Yard implementations:
the SolrYard and the ClerezzaYard. The SolrYard is optimal for use with the
Stanbol Enhancer as it allows very fast label based retrieval of Entities. So
if you plan to use the ManagedSite primarily with the Stanbol Enhancer, this is
definitely the Yard implementation to choose. The ClerezzaYard stores the
managed Entities within a TripleStore. While the ClerezzaYard is not as
efficient for use with the Stanbol Enhancer, its data can be queried by using
the SPARQL endpoint of Apache Stanbol.
+2. YardSite: This configures the ManagedSite. This configuration links to the
configured Yard via its id.
+
+#### Configuration of a SolrYard:
+
+This describes how to configure a SolrYard to be used with a YardSite by
using the Configuration tab of the Apache Felix Webconsole
[http://{stanbol-instance}/system/console/configMgr](http://localhost:8080/system/console/configMgr).
+
+
+
+The above figure shows a typical SolrYard configuration for a YardSite.
Important properties are
+
+* __ID__: This MUST be unique among all Yards. It is recommended to use
"{siteId}Yard".
+* __Solr Index/Core__: This is the name of the SolrCore that will be used to
store the data. It is recommended to use the same name as the {siteId},
because the RESTful API of the SolrCore is published under
<code>http://{stanbol-instance}/solr/default/{solrCore}</code>. Using the
same name for {siteId} and {solrCore} makes it easier to map the RESTful API of
the SolrCore to the ManagedSite published under
<code>http://{stanbol-instance}/entityhub/site/{siteId}</code>.
+* __Use default SolrCore configuration__: If enabled, the SolrCore will be
created automatically by using the default configuration. Users will typically
want to use this option. Only users that need a special SolrCore
configuration have to deactivate this option and provide a
<code>{solrCore}.solrindex.zip</code> archive containing the special
configuration in the <code>{stanbol-workingdir}/stanbol/datafiles</code>
directory. See the [Managing Solr
Indexes](../utils/commons-solr.html#managingsolrindexes) section for detailed
information.
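+
+The naming recommendations above can be summarised in a few lines of Java.
This is only an illustration of the conventions. The class and method names
are hypothetical, and the sketch assumes the SolrCore is named after the
{siteId} as recommended.

```java
// Hypothetical helper (not part of Stanbol) that derives the recommended
// names and URLs for a ManagedSite backed by a SolrYard.
final class YardNaming {

    /** Recommended Yard ID: "{siteId}Yard". */
    static String yardId(String siteId) {
        return siteId + "Yard";
    }

    /** RESTful API of the SolrCore, assuming {solrCore} equals {siteId}. */
    static String solrCoreUrl(String stanbolBase, String siteId) {
        return stanbolBase + "/solr/default/" + siteId;
    }

    /** RESTful API of the ManagedSite. */
    static String siteUrl(String stanbolBase, String siteId) {
        return stanbolBase + "/entityhub/site/" + siteId;
    }

    public static void main(String[] args) {
        String base = "http://localhost:8080";
        System.out.println(yardId("iptc"));
        System.out.println(solrCoreUrl(base, "iptc"));
        System.out.println(siteUrl(base, "iptc"));
    }
}
```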
+
+#### Configuration of a ClerezzaYard:
+
+This describes how to configure a ClerezzaYard to be used with a YardSite by
using the Configuration tab of the Apache Felix Webconsole
[http://{stanbol-instance}/system/console/configMgr](http://localhost:8080/system/console/configMgr).
+
+
+
+The above figure shows a typical ClerezzaYard configuration for a YardSite.
Important properties are
+
+* __ID__: This MUST be unique among all Yards. It is recommended to use
"{siteId}Yard".
+* __Graph URI__: This configures the URI of the named graph used to
store the RDF data. If a graph with this URI is already present, then it will be
reused by this Yard. Otherwise an empty graph with this URI is created using
the Clerezza
[TcManager](http://incubator.apache.org/clerezza/mvn-site/rdf.core/apidocs/org/apache/clerezza/rdf/core/access/TcManager.html).
If this field is empty, a URN will be used as the default graph URI.
+
+The ClerezzaYard also registers its RDF graph with the Apache Stanbol
SPARQL service available at <code>http://{stanbol-instance}/sparql</code>.
+
+To query the RDF graph of a ClerezzaYard you need to specify its
configured Graph URI in SPARQL queries posted to the Stanbol SPARQL endpoint:
+
+ :::bash
+ curl -i -X POST -d "graphuri=http://cv.iptc.org/newscodes" \
+ --data-urlencode "query@sparqlQuery.txt" \
+ "http://localhost:8080/sparql"
+
+where 'sparqlQuery.txt' refers to a file containing the SPARQL query e.g.
+
+ PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
+ SELECT distinct ?concept ?prefLabel ?altLabel ?parent
+ WHERE {
+ ?concept a skos:Concept .
+ ?concept skos:prefLabel ?prefLabel .
+ OPTIONAL {
+ ?concept skos:altLabel ?altLabel .
+ }
+ }
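+
+A Java client would send the same two form fields as the curl call. The
following sketch only builds the URL-encoded request body and leaves the actual
HTTP POST to whatever client library is in use. The class name is hypothetical.
'graphuri' is taken from the curl example, while 'query' is assumed to be the
query parameter name as defined by the SPARQL Protocol.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of Stanbol): builds an
// application/x-www-form-urlencoded body for the Stanbol SPARQL endpoint.
final class SparqlForm {

    /**
     * @param graphUri the Graph URI configured for the ClerezzaYard
     * @param sparql   the SPARQL query text
     */
    static String formBody(String graphUri, String sparql) {
        return "graphuri=" + URLEncoder.encode(graphUri, StandardCharsets.UTF_8)
                + "&query=" + URLEncoder.encode(sparql, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(formBody("http://cv.iptc.org/newscodes",
                "SELECT ?c WHERE { ?c a ?t }"));
    }
}
```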
+
+#### Configuration of the YardSite
+
+Finally you need to configure the YardSite that uses the previously configured
Yard instance (either SolrYard or ClerezzaYard). Again this will show how to
configure the YardSite by using the Configuration tab of the Apache Felix
Webconsole
[http://{stanbol-instance}/system/console/configMgr](http://localhost:8080/system/console/configMgr).
+
+
+
+The above figure shows the configuration of the YardSite. The important
properties are
+
+* __ID__: This is the {siteId} used to map this ManagedSite to the RESTful API
of the Stanbol Entityhub. Make sure that the ID is unique over all configured
Sites.
+* __Yard ID__: Here you need to put the ID of the Yard configured in the
previous step. If no Yard with that ID is active, the ManagedSite will not be
initialized and will therefore not be available via the RESTful API.
+
+The __Entity Prefix(es)__ are an optional configuration. They are used by the
SiteManager (the "/entityhub/sites" endpoint) to decide whether requested
entities can be dereferenced via a registered site. If not present, the
SiteManager will try to dereference every request by using this ManagedSite.
So correctly configuring this may slightly improve performance by avoiding
unnecessary requests.
+
+The __Field Mappings__ can be used to copy property values of created/updated
Entities to other properties. The mappings used in the above figure ensure that
SKOS preferred/alternate labels, FOAF (Friend of a Friend) names, Dublin Core
titles as well as the name property of the schema.org ontology are copied over
to rdfs:label. This configuration is the default as the Stanbol Enhancer uses
<code>rdfs:label</code> as default property for linking entities based on their
names.
+
+After completing all those steps you should see a new empty ManagedSite under
+
+ http://{stanbol-instance}/entityhub/site/iptc
Added:
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/utils/commons-solr.mdtext
URL:
http://svn.apache.org/viewvc/incubator/stanbol/site/trunk/content/stanbol/docs/trunk/utils/commons-solr.mdtext?rev=1359510&view=auto
==============================================================================
---
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/utils/commons-solr.mdtext
(added)
+++
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/utils/commons-solr.mdtext
Tue Jul 10 07:45:59 2012
@@ -0,0 +1,311 @@
+Title: Stanbol Commons Solr
+
+Solr is used by several Apache Stanbol components. The Apache Stanbol Solr
Commons artifacts provide a set of utilities that ease the use of Solr within
OSGi, allow the initialization and management of Solr indexes as well as the
publishing of Solr's RESTful interface on the OSGi HttpService.
+
+Although these utilities were implemented with the requirements of Apache
Stanbol in mind, they do not depend on other Stanbol components that are not
themselves part of
+"stanbol.commons".
+
+
+## Solr OSGi Bundle
+
+The "org.apache.commons.solr.core" bundle currently includes all dependencies
required by Solr and also exports the client as well as the server API. For
details please have a look at the pom file of the "solr.core" artifact.
+
+Please also note the exclusion list, because some libraries currently not
directly used by Stanbol are explicitly excluded. Using such features within a
"solrconfig.xml" or "schema.xml" will result in "ClassNotFoundException" or
"NoClassDefFoundError".
+
+If you require an additional library that is currently not included, please
let us know on the stanbol-dev mailing list.
+
+
+## Solr Server Components
+
+This section provides information on how to manage and access the server
side CoreContainer and SolrCore components of Solr.
+
+
+### Accessing CoreContainers and SolrCores
+
+All CoreContainers and SolrCores initialized by the Stanbol Solr framework are
registered with the OSGi Service Registry. This means that other Bundles can
obtain them by using
+
+ CoreContainer defaultSolrServer;
+ ServiceReference ref = bundleContext.getServiceReference(
+     CoreContainer.class.getName());
+ if (ref != null) {
+     defaultSolrServer = (CoreContainer) bundleContext.getService(ref);
+ } else {
+     defaultSolrServer = null; // no SolrServer available
+ }
+
+It is also possible to track service registration and unregistration events by
using the OSGi ServiceTracker utility.
+
+The above code snippet always returns the SolrServer with the highest
priority (the highest value for the "service.ranking" property). However, the
OSGi Service Registry also allows services to be obtained/tracked by using
filters. To specify such filters it is important to know what metadata are
provided when services are registered with the OSGi Service Registry.
+
+
+#### Metadata for CoreContainer:
+
+* **org.apache.solr.core.CoreContainer.name**: The name of the SolrServer. The
name MUST be provided for each Solr CoreContainer registered with this
framework. It is a required field for each configuration. If two CoreContainers
are registered with the same name, the "service.ranking" property is used
to determine the currently active CoreContainer for a request. The others
registered for the same name may be used as fallbacks. The container name is
used as a URL path component when the `publishREST` parameter is true. It is
recommended to use lowercase names without non-ASCII characters.
+* **org.apache.solr.core.CoreContainer.dir**: The directory of a
CoreContainer. This is the directory containing the "solr.xml" file.
+* **org.apache.solr.core.CoreContainer.solrXml**: The name of the Solr
CoreContainer configuration file. Currently always "solr.xml".
+* **org.apache.solr.core.CoreContainer.cores**: A read only collection of the
names of all cores registered with the CoreContainer.
+* **service.ranking**: The OSGi "service.ranking" property is used to specify
the ranking of a CoreContainer. The CoreContainer with the highest ranking is
considered the default server and will be returned by calls to
bundleContext.getServiceReference(..) without the use of a filter.
+* **org.apache.solr.core.CoreContainer.publishREST**: Boolean switch that
allows to enable/disable the publishing of the Solr RESTful API on
"http://{host}:{port}/solr/{server-name}". Requires the
"SolrServerPublishingComponent" to be active.
+
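+As an illustration of how these metadata can be used, the following sketch
builds an OSGi filter string that selects CoreContainers by name. It uses only
plain string handling so it runs without an OSGi framework. The class, method
and constant names are hypothetical; in Stanbol code the property keys would be
taken from the SolrConstants class. Escaping of the characters `\`, `*`, `(`
and `)` follows the OSGi filter syntax.

```java
// Hypothetical helper (not part of Stanbol): builds OSGi (LDAP style) filter
// strings for looking up CoreContainer services by their registered name.
final class CoreContainerFilters {

    // Property key as listed in the metadata above.
    static final String SERVER_NAME_KEY = "org.apache.solr.core.CoreContainer.name";
    // Fully qualified class name of Solr's CoreContainer.
    static final String CORE_CONTAINER_CLASS = "org.apache.solr.core.CoreContainer";

    /** Escapes the characters that have a special meaning in filter values. */
    static String escape(String value) {
        StringBuilder sb = new StringBuilder();
        for (char c : value.toCharArray()) {
            if (c == '\\' || c == '*' || c == '(' || c == ')') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    /** Filter matching all CoreContainers registered under the given name. */
    static String byServerName(String serverName) {
        return String.format("(&(objectClass=%s)(%s=%s))",
                CORE_CONTAINER_CLASS, SERVER_NAME_KEY, escape(serverName));
    }

    public static void main(String[] args) {
        System.out.println(byServerName("myserver"));
    }
}
```

+The resulting string could then be passed as the filter argument of
bundleContext.getServiceReferences(..) or used to create a ServiceTracker.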
+
+#### Metadata for SolrCores:
+
+* **org.apache.solr.core.SolrCore.name**: The name of the SolrCore as
registered with the CoreContainer
+* **org.apache.solr.core.SolrCore.dir**: The instance directory of the SolrCore
+* **org.apache.solr.core.SolrCore.datadir**: The data directory of the SolrCore
+* **org.apache.solr.core.SolrCore.indexdir**: The directory of the index used
by this SolrCore
+* **org.apache.solr.core.SolrCore.schema**: The name (excluding the directory)
of the Solr schema used by this core
+* **org.apache.solr.core.SolrCore.solrconf**: The name (excluding the
directory) of the Solr core configuration file
+
+In addition the following metadata of the CoreContainer for this SolrCore are
also available
+
+* **org.apache.solr.core.CoreContainer.id**: The `SERVICE_ID` of the
CoreContainer this SolrCore is registered with. This is usually the easiest way
to obtain the ServiceReference to the CoreContainer of a SolrCore.
+* **org.apache.solr.core.CoreContainer.name**: The name of the CoreContainer
this SolrCore is registered with. Note that multiple CoreContainers may be
registered for the same name. Therefore this property MUST NOT be used to
filter for the ServiceReference to the CoreContainer of a SolrCore.
+* **org.apache.solr.core.CoreContainer.dir**: The Solr directory of the
CoreContainer for this SolrCore.
+* **service.ranking**: The OSGi service.ranking of the CoreContainer this
SolrCore is registered with. SolrCores do not define their own service.ranking
but use the ranking of the CoreContainer they are registered with.
+
+The keys used for metadata of registered CoreContainers and
SolrCores are defined as public constants in the
[SolrConstants](http://svn.apache.org/repos/asf/incubator/stanbol/trunk/commons/solr/core/src/main/java/org/apache/stanbol/commons/solr/SolrConstants.java)
class.
+
+
+### ReferencedSolrServer
+
+This component initializes a Solr server running within the same JVM
as Stanbol based on indexes provided by a directory on the local file system.
It does not support management capabilities, but it initializes a Solr
CoreContainer based on the data in the file system and registers it (including
all SolrCores) with the OSGi Service Registry as described above.
+
+The ReferencedSolrServer uses the ManagedServiceFactory pattern. This means
that instances are created by passing configurations to the OSGi
ConfigurationAdmin service. Practically this means that:
+
+* users can create instances by using the Configuration tab of the Apache
Felix Web Console
+* programmers can directly use the ConfigurationAdmin service to create/update
and delete configurations
+* configurations can also be passed via the Apache Sling [OSGi
installer](http://sling.apache.org/site/osgi-installer.html) framework. This means
configurations can be included within the Stanbol launchers, Bundles or copied
to a directory configured for the [File
Provider](http://svn.apache.org/repos/asf/sling/trunk/installer/providers/file/)
+
+Configurations need to include the following properties (see also section
"Metadata for CoreContainer" for details about such properties)
+
+* **org.apache.solr.core.CoreContainer.name**: The name for the Solr Server
+* **org.apache.solr.core.CoreContainer.dir**: The path to the directory on the
local file system that is used to initialize the CoreContainer
+* **service.ranking**: The OSGi service ranking used to register the
CoreContainer and its SolrCores. If not specified '0' will be used as default.
The value MUST BE an integer number.
+* **org.apache.solr.core.CoreContainer.publishREST**: Boolean switch that
allows to enable/disable the publishing of the Solr RESTful API on
"http://{host}:{port}/solr/{server-name}". Requires the
"SolrServerPublishingComponent" to be active.
+
+**NOTE:** Keep in mind that if the RESTful API of the SolrServer is published,
users might use the Admin Request handler to manipulate the Solr configuration.
In such cases the metadata provided by the ServiceReferences for the
CoreContainer and SolrCores might get out of sync with the actual configuration
of the Server.
+
+
+### ManagedSolrServer
+
+This component allows to manage a multi core Solr server. It provides an API
to create, update and remove SolrCores. In addition cores can be activated and
deactivated.
+
+
+#### Creating ManagedServerInstances
+
+The ManagedSolrServer uses the ManagedServiceFactory pattern. This means that
instances are created by passing configurations to the OSGi ConfigurationAdmin
service. Practically this means that:
+
+* users can create instances by using the Configuration tab of the Apache
Felix Web Console
+* programmers can directly use the ConfigurationAdmin service to create/update
and delete configurations
+* configurations can also be passed via the Apache Sling [OSGi
installer](http://sling.apache.org/site/osgi-installer.html) framework. This means
configurations can be included within the Stanbol launchers, Bundles or copied
to a directory configured for the [File
Provider](http://svn.apache.org/repos/asf/sling/trunk/installer/providers/file/)
+
+Configurations need to include the following properties (see also section
"Metadata for CoreContainer" for details about such properties). Although the
properties are the same as for the ReferencedSolrServer, their semantics differ
in some aspects.
+
+* **org.apache.solr.core.CoreContainer.name**: The name for the Solr Server
+* **org.apache.solr.core.CoreContainer.dir**: Optionally a directory to store
the data. If not specified, the data will be stored in a directory with the
configured server-name at the default location (currently
"${sling.home}/indexes/" or "indexes/" if the environment variable 'sling.home'
is not present). Users that want to create multiple ManagedSolrServers with the
same name need to specify the directory, or the servers will overwrite each
other's data.
+* **service.ranking**: The OSGi service ranking used to register the
CoreContainer and its SolrCores. If not specified '0' will be used as default.
The value MUST BE an integer number. In scenarios where a single
ManagedSolrServer is expected it is highly recommended to specify
`Integer.MAX_VALUE` (2147483647) as service ranking. This will ensure that this
server can not be overridden by others.
+* **org.apache.solr.core.CoreContainer.publishREST**: Boolean switch that
allows to enable/disable the publishing of the Solr RESTful API on
"http://{host}:{port}/solr/{server-name}". Requires the
"SolrServerPublishingComponent" to be active.
+
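+The default location rule for the data directory described above can be
sketched as follows. This is an illustration only, not the actual Stanbol
implementation. The class and method names are hypothetical, and the value of
'sling.home' is passed in explicitly so that the sketch makes no assumption
about where it is read from.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not part of Stanbol): resolves the default data
// directory of a ManagedSolrServer when no directory is configured.
final class ManagedServerDirs {

    /**
     * @param slingHome  value of 'sling.home', or null/empty if not present
     * @param serverName the configured name of the ManagedSolrServer
     * @return "{sling.home}/indexes/{server-name}" or "indexes/{server-name}"
     */
    static Path defaultDir(String slingHome, String serverName) {
        Path indexes = (slingHome == null || slingHome.isEmpty())
                ? Paths.get("indexes")
                : Paths.get(slingHome, "indexes");
        return indexes.resolve(serverName);
    }

    public static void main(String[] args) {
        System.out.println(defaultDir(null, "default"));
        System.out.println(defaultDir("/opt/stanbol/sling", "default"));
    }
}
```

+Because the directory is derived from the server name alone, two servers with
the same name end up in the same directory unless one is configured explicitly.
+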
+**NOTE:** Keep in mind that if the RESTful API of the SolrServer is published,
users might use the Admin Request handler to manipulate the Solr configuration.
In such cases the metadata provided by the ServiceReferences for the
CoreContainer and SolrCores might get out of sync with the actual configuration
of the Server.
+
+
+#### Managing Solr Indexes
+
+This describes how to manage (create, update, remove, activate, deactivate)
Indexes on a ManagedSolrServer.
+
+Managed indexes do not correspond 1:1 to SolrCores registered on the
CoreContainer. However, every SolrCore on the CoreContainer maps 1:1
to a managed index on the ManagedSolrServer.
+
+A managed index can be in one of the following states (defined by the
ManagedIndexState enumeration):
+
+* **UNINITIALISED**: An index that was created but is still missing the
configuration and/or index data is in this state. The ManagedSolrServer API
allows indexes to be created by referring to a Solr-Index-Archive. Such archives
are then requested via the Stanbol DataFileProvider service. Usually users can
provide them by copying the linked index to the "/sling/datafiles" folder.
+* **INACTIVE**: This indicates that an index was deactivated via the
ManagedSolrServer API. The data are still kept, but the SolrCore was removed
from the CoreContainer.
+* **ACTIVE**: This indicates that an index is active and can be used. Only
indexes that are ACTIVE are registered with the CoreContainer.
+* **ERROR**: This state indicates an error during the initialization.
The stack trace of the error is available in the IndexMetadata.
+
+Indexes can not only be managed by calls to the API of the ManagedSolrServer.
The "org.apache.stanbol.commons.solr.install" bundle also provides support for
installing/uninstalling indexes by using the Apache Sling [OSGi
installer](http://sling.apache.org/site/osgi-installer.html) framework. This
allows indexes to be installed by providing Solr-Index-Archives or
Solr-Index-Archive-References to any available Provider. By default Apache
Stanbol includes Providers for the Launchers and Bundles. However, the Sling
Installer Framework also includes Providers for directories on the file system
and for JCR Repositories.
+
+Solr-Index-Archives do use the following name pattern:
+
+ {name}.solrindex[.zip|.gz|.bz2]
+
+* They are normal archives starting with the instance directory of a Solr Core.
+* The name of this instance directory MUST be the same as the {name} of the
archive.
+* The second extension specifies the type of the archive. If no extension is
specified, the type of the archive might still be detected by reading the first
few bytes of the archive.
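+
+The name pattern above can be checked with a regular expression. The following
is a minimal sketch with hypothetical class and method names. It only inspects
the file name and does not attempt the byte based type detection mentioned
above.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Stanbol): parses file names following
// the pattern {name}.solrindex[.zip|.gz|.bz2]
final class SolrIndexArchiveName {

    private static final Pattern PATTERN =
            Pattern.compile("(.+)\\.solrindex(?:\\.(zip|gz|bz2))?");

    /** The {name} part, or empty if the file name does not match. */
    static Optional<String> indexName(String fileName) {
        Matcher m = PATTERN.matcher(fileName);
        return m.matches() ? Optional.of(m.group(1)) : Optional.empty();
    }

    /** The archive type ("zip", "gz" or "bz2"), or empty if not specified. */
    static Optional<String> archiveType(String fileName) {
        Matcher m = PATTERN.matcher(fileName);
        return m.matches() ? Optional.ofNullable(m.group(2)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(indexName("dbpedia.solrindex.zip"));
        System.out.println(archiveType("dbpedia.solrindex.zip"));
        System.out.println(indexName("dbpedia.solrindex.ref"));
    }
}
```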
+
+Solr-Index-Archive-References are normal Java properties files and do use the
following name pattern:
+
+ {name}.solrindex.ref
+
+The following keys are used (see also
org.apache.stanbol.commons.solr.managed.ManagedIndexConstants):
+
+* **Index-Archive**: Comma separated list of Solr-Index-Archives that can be
used for initializing this index. The first index archive in the list has the
highest priority. Higher priority archives replace the data of lower
priority ones as soon as they become available. This feature is intended to
allow the replacement of a small sample dataset (e.g. shipped within a
Bundle or the Launcher) with the full dataset downloaded later from a remote
Internet archive or pushed manually to the `sling/datafiles` folder of a
previously installed Stanbol instance. For instance, the `dbpedia.solrindex.ref`
archive reference configuration provided in the default launcher has the line
`Index-Archive=dbpedia.solrindex.zip,dbpedia_43k.solrindex.zip`, and only
`dbpedia_43k.solrindex.zip` is shipped in the default launchers, allowing it to
be overridden by any archive named `dbpedia.solrindex.zip`.
+* **Index-Name**: The name of the index. If not specified, the {name} part of
the first Index-Archive in the list will be used.
+* **Server-Name**: The name of the ManagedSolrServer this Solr index MUST be
deployed on. If not present, it will be deployed on the default
ManagedSolrServer (the ManagedSolrServer with the highest priority).
+* **Synchronized**: Boolean switch. If enabled, the index will be synchronized
with the referenced Solr-Index-Archives. That means the DataFileTracker service
will be used to periodically track the states of referenced
Solr-Index-Archives. This allows managed Solr indexes to be initialized, updated
and uninitialised by simply making Solr-Index-Archives un-/available to the
DataFileProvider infrastructure (such as users copying/deleting files in the
"/sling/datafiles" directory).
+* **other Properties**: All properties are forwarded to the
DataFileProvider/DataFileTracker service when looking for the referenced
Solr-Index-Archives. These components might also define some special keys
associated with specific functionalities. Please look at the documentation of
these services for details.
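+
+Because Solr-Index-Archive-References are normal Java properties files, they
can be read with java.util.Properties. The following sketch shows how the
'Index-Archive' and 'Index-Name' keys described above could be interpreted. The
class and method names are hypothetical and this is not the Stanbol
implementation.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper (not part of Stanbol): reads the keys of a
// {name}.solrindex.ref properties file.
final class IndexArchiveReference {

    /** Parses the content of a reference file. */
    static Properties parse(String content) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(content));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** The configured archives, highest priority first. */
    static List<String> archives(Properties ref) {
        List<String> result = new ArrayList<>();
        for (String part : ref.getProperty("Index-Archive", "").split(",")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                result.add(name);
            }
        }
        return result;
    }

    /** 'Index-Name' if present, else the {name} part of the first archive. */
    static String indexName(Properties ref) {
        String explicit = ref.getProperty("Index-Name");
        if (explicit != null && !explicit.trim().isEmpty()) {
            return explicit.trim();
        }
        List<String> archives = archives(ref);
        if (archives.isEmpty()) {
            throw new IllegalArgumentException("no Index-Archive configured");
        }
        String first = archives.get(0);
        int idx = first.indexOf(".solrindex");
        return idx < 0 ? first : first.substring(0, idx);
    }

    public static void main(String[] args) {
        Properties ref = parse(
                "Index-Archive=dbpedia.solrindex.zip,dbpedia_43k.solrindex.zip\n");
        System.out.println(archives(ref));
        System.out.println(indexName(ref));
    }
}
```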
+
+
+#### Other interesting Notes
+
+* SolrCore directory names created by the ManagedSolrServer use the current
date as suffix. If a directory with that name already exists (e.g. because the
same index was already updated on the very same day), then an additional
"-{count}" suffix will be added to the end.
+* The ManagedSolrServer stores its configuration within the persistent space
of the Bundle provided by the OSGi environment. When using one of the default
Stanbol launchers this is within "{sling.home}/felix/bundle{bundle-id}/data".
The "{bundle-id}" of the "org.apache.stanbol.commons.solr.managed" bundle can
be looked up in the [Bundle tab](http://localhost:8080/system/console/bundles)
of the Apache Felix Webconsole. The actual configuration of a ManagedSolrServer
is then in ".config/index-config/{service.pid}". The "{service.pid}" can
also be looked up via the Apache Felix Webconsole in the [Configuration Status
tab](http://localhost:8080/system/console/config). This folder contains the Solr
index reference files (normal Java properties files) with all the information
about the current state of the managed indexes.
+* Errors that occur during the asynchronous initialization of SolrCores are
stored within the IndexingProperties. They can therefore be requested via the
API of the ManagedSolrServer but also be looked up within the persistent state
of the ManagedSolrServer (see above where such files are located).
+
+
+## Solr Client Components
+
+This section describes how to use Solr servers and indexes referenced and
managed by the "org.apache.stanbol.commons.solr" framework.
+In principle there are two possibilities: (1) to directly access Solr indexes
via the SolrServer Java API and (2) to publish locally managed indexes on the
OSGi HttpService and then use such indexes via the Solr RESTful API.
+
+The Stanbol Solr framework does not provide utilities for accessing remote
Solr servers, because this is already easily possible by using SolrJ.
+
+
+### Java API
+
+This describes how to lookup and access a Solr Server initialized by the
"org.apache.stanbol.commons.solr" framework. The client side Java API of Solr
is defined by the SolrServer abstract class. The implementation used for
accessing a SolrCore running in the same JVM is the EmbeddedSolrServer.
+
+All Solr servers (CoreContainer) and Solr indexes (SolrCore) initialized by the
ReferencedSolrServer and/or ManagedSolrServer are registered with the OSGi
service registry. More information about this can be found in the first part of
the "Solr Server Components" section of this documentation.
+
+OSGi already provides APIs and utilities to look up and track registered
services. The following examples show how to look up
SolrServers registered as OSGi services.
+
+
+#### IndexReference
+
+The IndexReference is a Java class that manages a reference to an Index. It
defines a constructor that takes a serverName and coreName. In addition there
is a static parse(String ref) method that takes
+
+* file URLs
+* file paths and
+* [server-name:]core-name like references.
+
+The IndexMetadata class also defines a getter to get the IndexReference.
+
+Another feature of the IndexReference is that it provides getters for the
filters used to look up/track the referenced CoreContainer/SolrCore in the OSGi
service registry. The returned filters include the constraint for the registered
interface (OBJECTCLASS). Therefore, when using these filters one can pass null
for the class parameter.
+
+To lookup the CoreContainer of the referenced index:
+
+ bundleContext.getServiceReferences(null, indexReference.getServerFilter());
+
+To lookup the SolrCore for the referenced index:
+
+ bundleContext.getServiceReferences(null, indexReference.getIndexFilter());
+
+
+#### Lookup Solr Indexes
+
+This example shows how to lookup the default CoreContainer and create a
SolrServer for the core "mydata".
+
+ ComponentContext context; // typically passed to the activate method
+ BundleContext bc = context.getBundleContext();
+ ServiceReference coreContainerRef =
+     bc.getServiceReference(CoreContainer.class.getName());
+ CoreContainer coreContainer =
+     (CoreContainer) bc.getService(coreContainerRef);
+ SolrServer server = new EmbeddedSolrServer(coreContainer, "mydata");
+
+There might be cases where several CoreContainers are available and
"mydata" is not available on the default one. The "default" refers to the one
with the highest "service.ranking" value. In this case we need to know an
available property we can use to filter for the right CoreContainer. Here
we assume the index is on a CoreContainer registered with the name
"myserver".
+
+ ComponentContext context; // typically passed to the activate method
+ BundleContext bc = context.getBundleContext();
+
+ // Now let's use the IndexReference to create the filter
+ IndexReference indexRef = new IndexReference("myserver", "mydata");
+ ServiceReference[] coreContainerRefs = bc.getServiceReferences(
+ null, indexRef.getServerFilter());
+
+ // TODO: check that coreContainerRefs != null AND not empty!
+ // Now we have all References to CoreContainers with the name "myserver"
+ // Yes one can register several for the same name (e.g. to have fallbacks)
+ // let's get the one with the highest service.ranking
+ Arrays.sort(coreContainerRefs, ServiceReferenceRankingComparator.INSTANCE);
+
+ // Create the SolrServer (same as above)
+ CoreContainer coreContainer = (CoreContainer)
bc.getService(coreContainerRefs[0]);
+ SolrServer server = new EmbeddedSolrServer(coreContainer,
indexRef.getIndex());
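+
+The selection step above (check for an empty result, then pick the entry
with the highest "service.ranking") can be sketched without any OSGi
dependency. `Ref` is a hypothetical stand-in for a ServiceReference. It is an
assumption that ServiceReferenceRankingComparator orders the highest ranking
first; breaking ties by the lowest service id mirrors what the OSGi framework
does in getServiceReference(..).
+

```java
import java.util.Arrays;
import java.util.Comparator;

final class RankingSketch {
    /** Hypothetical stand-in for an OSGi ServiceReference. */
    static final class Ref {
        final String name;
        final int ranking;    // value of "service.ranking"
        final long serviceId; // value of "service.id"

        Ref(String name, int ranking, long serviceId) {
            this.name = name;
            this.ranking = ranking;
            this.serviceId = serviceId;
        }
    }

    /** Highest service.ranking first; ties are broken by the lowest service.id. */
    static final Comparator<Ref> BEST_FIRST =
            Comparator.comparingInt((Ref r) -> r.ranking).reversed()
                    .thenComparingLong(r -> r.serviceId);

    /** Returns the preferred reference or fails if none is available. */
    static Ref best(Ref[] refs) {
        if (refs == null || refs.length == 0) {
            throw new IllegalStateException("No matching service registered");
        }
        Ref[] sorted = refs.clone(); // do not reorder the caller's array
        Arrays.sort(sorted, BEST_FIRST);
        return sorted[0];
    }

    public static void main(String[] args) {
        Ref[] refs = {
            new Ref("fallback", 0, 5L),
            new Ref("primary", 10, 7L)
        };
        System.out.println(best(refs).name);
    }
}
```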
+
+In cases where one only knows the name of the SolrCore (and not the
CoreContainer) the initialization looks like this.
+
+ ComponentContext context; // typically passed to the activate method
+ BundleContext bc = context.getBundleContext();
+ String nameFilter = String.format("(%s=%s)",
SolrConstants.PROPERTY_CORE_NAME, "mydata");
+ ServiceReference[] solrCoreRefs = bc.getServiceReferences(
+ SolrCore.class.getName(), nameFilter);
+
+ // TODO: check that solrCoreRefs != null AND not empty!
+ // Now we have all References to SolrCores with the name "mydata"
+ // let's get the one with the highest service.ranking
+ Arrays.sort(solrCoreRefs, ServiceReferenceRankingComparator.INSTANCE);
+
+ // Now get the SolrCore and create the SolrServer
+ SolrCore core = (SolrCore) bc.getService(solrCoreRefs[0]);
+
+ // core.getCoreDescriptor() might be null if SolrCore is not
+ // registered with a CoreContainer
+ SolrServer server = new EmbeddedSolrServer(
+ core.getCoreDescriptor().getCoreContainer(), "mydata");
+
+
+#### Tracking Solr Indexes
+
+The above examples do a lookup at a single point in time. However, because OSGi
is a dynamic environment where services can come and go at any time, in most
cases users will want to track services instead. To do this OSGi provides the
ServiceTracker utility.
+
+To ease the tracking of SolrServers the "org.apache.stanbol.commons.solr.core"
bundle provides the RegisteredSolrServerTracker. The following examples show
how to create a Managed SolrIndex and then track the SolrServer.
+
+First, during activation, we need to check whether "mydata" already exists
and create it if not. Then we can start tracking the index:
+
+ BundleContext bc;
+ // The ManagedSolrServer instance can be looked up manually using a service
+ // reference or using declarative services / SCR injection
+ IndexMetadata metadata = managedServer.getIndexMetadata("mydata");
+ if (metadata == null) {
+ // No index with that name:
+ // Asynchronously init the index as soon as the solrindex archive
+ // is available
+ metadata = managedServer.createSolrIndex("mydata",
"mydata.solrindex.zip", null);
+ }
+ RegisteredSolrServerTracker indexTracker =
+ new RegisteredSolrServerTracker(bc, metadata.getIndexReference());
+
+ // Do not forget to close the tracker while deactivating
+ indexTracker.open();
+
+Now every time we need the SolrServer we can retrieve it from the indexTracker:
+
+ private SolrServer getServer() {
+ SolrServer server = indexTracker.getService();
+ if(server == null) {
+ // Report the missing server
+ throw new IllegalStateException("Server 'mydata' not active");
+ } else {
+ return server;
+ }
+ }
+
+The RegisteredSolrServerTracker takes "service.ranking" into account. If
several services match the passed IndexReference, its methods always return
the one with the highest "service.ranking". Methods that return arrays sort
them accordingly.
+
+
+### RESTful API
+
+The following describes how to publish the RESTful API of CoreContainers
registered as OSGi services on the OSGi HttpService. The functionality
described in this section is provided by the
"org.apache.stanbol.commons.solr.web" artifact.
+
+
+#### SolrServerPublishingComponent
+
+This is an OSGi component that starts immediately and does not require a
configuration. Its main purpose is to track all CoreContainers with the
property "org.apache.solr.core.CoreContainer.publishREST=true". For all such
CoreContainers it publishes the RESTful API under the URL
+
+ http://{host}:{port}/solr/{server-name}
+
+If two CoreContainers with the same {server-name} (the value of the
"org.apache.solr.core.CoreContainer.name" property) are registered, the one with
the highest "service.ranking" is published.
+
+The root-prefix ("/solr" by default) can be configured by setting the
"org.apache.stanbol.commons.solr.web.dispatchfilter.prefix" property.
+
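+The URL layout described above can be sketched as a small helper that joins
the root prefix and the {server-name}. `PublishPathSketch` is a hypothetical
class; falling back to "/solr" and normalizing slashes are assumptions made
for this illustration, not documented behavior of the component.
+

```java
/** Hypothetical helper for illustration; not part of the commons.solr.web bundle. */
final class PublishPathSketch {
    static final String DEFAULT_PREFIX = "/solr";

    /** Returns "{prefix}/{server-name}" with exactly one slash between the parts. */
    static String publishPath(String prefix, String serverName) {
        String p = (prefix == null || prefix.isEmpty()) ? DEFAULT_PREFIX : prefix;
        if (!p.startsWith("/")) {
            p = "/" + p;
        }
        while (p.endsWith("/")) { // drop trailing slashes, "/" becomes ""
            p = p.substring(0, p.length() - 1);
        }
        return p + "/" + serverName;
    }

    public static void main(String[] args) {
        System.out.println(publishPath(null, "myserver"));
        System.out.println(publishPath("/indexes/", "myserver"));
    }
}
```
+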
+
+#### SolrDispatchFilterComponent
+
+This Component provides the same functionality as the
SolrServerPublishingComponent, but can be configured specifically for a
CoreContainer. It is intended to be used if one wants to publish the RESTful
API of a specific CoreContainer under a specific location. To deactivate the
publishing of the same CoreContainer by the SolrServerPublishingComponent, users
need to set the "org.apache.solr.core.CoreContainer.publishREST" property to false.
+
+This component is configured by two properties:
+
+* **org.apache.stanbl.commons.solr.web.dispatchfilter.name**: The
{server-name} of the CoreContainer to publish ({server-name} refers to the
value of the "org.apache.solr.core.CoreContainer.name" property).
+* **org.apache.stanbl.commons.solr.web.dispatchfilter.prefix**: The prefix
path to publish the server. The {server-name} is NOT appended to the configured
prefix. Note that a Servlet Filter with `{prefix}/.*` is registered with the
OSGi HttpService.
+
+If two CoreContainers with the same {server-name} (the value of the
"org.apache.solr.core.CoreContainer.name" property) are registered, the one with
the highest "service.ranking" is published.
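+
+The `{prefix}/.*` pattern mentioned for the prefix property determines which
request paths reach the filter. The following sketch evaluates that pattern
with java.util.regex. `DispatchPatternSketch` is a hypothetical class, and
treating the prefix as a literal string is an assumption.
+

```java
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not the SolrDispatchFilterComponent. */
final class DispatchPatternSketch {
    /** Builds the "{prefix}/.*" pattern, quoting the prefix as a literal. */
    static Pattern forPrefix(String prefix) {
        return Pattern.compile(Pattern.quote(prefix) + "/.*");
    }

    /** True if the whole request path matches "{prefix}/.*". */
    static boolean handles(String prefix, String requestPath) {
        return forPrefix(prefix).matcher(requestPath).matches();
    }

    public static void main(String[] args) {
        String prefix = "/solr/myserver";
        for (String path : new String[] {
                "/solr/myserver/select", "/solr/myserver", "/solr/other/select"}) {
            System.out.println(path + " handled: " + handles(prefix, path));
        }
    }
}
```

+Under these assumptions a path without anything after the prefix (for example
"/solr/myserver" without a trailing slash) does not match the pattern.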
+