Author: rwesten
Date: Thu Mar 22 13:52:08 2012
New Revision: 1303784
URL: http://svn.apache.org/viewvc?rev=1303784&view=rev
Log:
Updates to the eHealth demo:
* The shell script now downloads the RDF dumps from a mirror, as the original
sites have been down for nearly two days
* Added the next parts to the README.md
* Removed an unneeded config from indexing.properties
Modified:
incubator/stanbol/trunk/demos/ehealth/README.md
incubator/stanbol/trunk/demos/ehealth/index.sh
incubator/stanbol/trunk/demos/ehealth/src/main/indexing/config/indexing.properties
Modified: incubator/stanbol/trunk/demos/ehealth/README.md
URL:
http://svn.apache.org/viewvc/incubator/stanbol/trunk/demos/ehealth/README.md?rev=1303784&r1=1303783&r2=1303784&view=diff
==============================================================================
--- incubator/stanbol/trunk/demos/ehealth/README.md (original)
+++ incubator/stanbol/trunk/demos/ehealth/README.md Thu Mar 22 13:52:08 2012
@@ -68,4 +68,67 @@ After that the you will be able to
## Backround information about this demo
-TODO!!
\ No newline at end of file
+### Indexing
+
+The configuration used for indexing can be found at
+
+ ./src/main/indexing/config
+
+It consists of the following parts:
+
+* __indexing.properties__: Core configuration for the indexing tools. TODO: link to documentation
+* __mappings.txt__: Configures the indexed fields, data types and property mappings. TODO: link to documentation
+* __fieldboost.properties__: Configuration for the field boosts. TODO: link to documentation
+* __ehealth/__: The SolrCore configuration used for indexing. This example uses it to customize how Solr indexes labels and IDs. See the following section for details.
+
+#### Customizing the Solr Schema used for indexing
+
+The default SolrCore configuration used by the Apache Entityhub is contained in the SolrYard module and can be found [here](http://svn.apache.org/repos/asf/incubator/stanbol/trunk/entityhub/yard/solr/src/main/resources/solr/core/default.solrindex.zip). This configuration is used if no customized configuration is present in "{indexing-root}/indexing/config/{name}", where {name} refers to the value of the "name" property in "indexing.properties".
+
+Users who want or need to customize the SolrCore configuration should start with the [default configuration](http://svn.apache.org/repos/asf/incubator/stanbol/trunk/entityhub/yard/solr/src/main/resources/solr/core/default.solrindex.zip): extract this zip file to "{indexing-root}/indexing/config" and then rename the folder to the "name" configured in "indexing.properties". After that you can start customizing the SolrCore configuration used for indexing.
+
+This demo uses this procedure to define two special Solr field types for indexing labels and IDs (see ./src/main/indexing/config/ehealth/conf/schema.xml).
+
+    :::xml
+    <!-- intended to be used for labels of drugs -->
+    <fieldType name="label" class="solr.TextField" positionIncrementGap="100" omitNorms="false">
+      <analyzer>
+        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
+        <filter class="solr.ASCIIFoldingFilterFactory"/>
+        <filter class="solr.WordDelimiterFilterFactory"
+            catenateWords="1" catenateNumbers="1" catenateAll="1"
+            generateWordParts="1" generateNumberParts="0"
+            splitOnCaseChange="0" splitOnNumerics="0" stemEnglishPossessive="0"
+            preserveOriginal="0" />
+        <filter class="solr.LowerCaseFilterFactory"/>
+      </analyzer>
+    </fieldType>
+
+    <!-- Field type used for searching drugs based on their various IDs -->
+    <fieldType name="code_field" class="solr.TextField" positionIncrementGap="100" omitNorms="false">
+      <analyzer>
+        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
+        <filter class="solr.ASCIIFoldingFilterFactory"/>
+        <filter class="solr.WordDelimiterFilterFactory"
+            catenateWords="1" catenateNumbers="1" catenateAll="1"
+            generateWordParts="1" generateNumberParts="0"
+            splitOnCaseChange="0" splitOnNumerics="0" stemEnglishPossessive="0"
+            preserveOriginal="0" />
+      </analyzer>
+    </fieldType>
+
+For more information on the tokenizers and filters used by this configuration, please see the [Analyzers, Tokenizers, and Token Filters](http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters) documentation.
+
+Such field types are then applied to specific properties with the following configuration:
+
+    <!-- fields that store codes -->
+    <field name="@/skos:notation/" type="code_field" indexed="true" stored="true" multiValued="true"/>
+    <field name="@/drugbank:ahfsCode/" type="code_field" indexed="true" stored="true" multiValued="true"/>
+    <field name="@/drugbank:atcCode/" type="code_field" indexed="true" stored="true" multiValued="true"/>
+    [...]
+    <!-- String fields (e.g. chemical formulas) -->
+    <field name="@/drugbank:smilesStringCanonical/" type="string" indexed="true" stored="true" multiValued="true"/>
+    <field name="@/drugbank:smilesStringIsomeric/" type="string" indexed="true" stored="true" multiValued="true"/>
+    [...]
+
+Field
Modified: incubator/stanbol/trunk/demos/ehealth/index.sh
URL:
http://svn.apache.org/viewvc/incubator/stanbol/trunk/demos/ehealth/index.sh?rev=1303784&r1=1303783&r2=1303784&view=diff
==============================================================================
--- incubator/stanbol/trunk/demos/ehealth/index.sh (original)
+++ incubator/stanbol/trunk/demos/ehealth/index.sh Thu Mar 22 13:52:08 2012
@@ -17,6 +17,22 @@
# 1. build the indexing tool and copy it to the /data directory
+# Such servers are often down, so use a mirror for now.
+# Try these links if you want to get the newest version
+
+# SIDER_DUMP=http://www4.wiwiss.fu-berlin.de/sider/sider_dump.nt.bz2
+# DRUGBANK_DUMP=http://www4.wiwiss.fu-berlin.de/drugbank/drugbank_dump.nt.bz2
+# DAILYMED_DUMP=http://www4.wiwiss.fu-berlin.de/dailymed/dailymed_dump.nt.bz2
+# DISEASOME_DUMP=http://www4.wiwiss.fu-berlin.de/diseasome/diseasome_dump.nt.bz2
+
+# Mirror hosted by the IKS project (http://www.iks-project.org)
+export IKS_MIRROR=http://dev.iks-project.eu/downloads/stanbol-indices/ehealth/source-files/
+export SIDER_DUMP="${IKS_MIRROR}sider_dump.nt.bz2"
+export DRUGBANK_DUMP="${IKS_MIRROR}drugbank_dump.nt.bz2"
+export DAILYMED_DUMP="${IKS_MIRROR}dailymed_dump.nt.bz2"
+export DISEASOME_DUMP="${IKS_MIRROR}diseasome_dump.nt.bz2"
+
+
if [ ! -f target/indexing ]
then
mkdir -p target/indexing
@@ -56,25 +72,25 @@ then
if [ ! -f sider_dump.nt.bz2 ]
then
echo "Downloading SIDER"
- wget -c http://www4.wiwiss.fu-berlin.de/sider/sider_dump.nt.bz2
+ wget -c "$SIDER_DUMP"
fi
if [ ! -f drugbank_dump.nt.bz2 ]
then
echo "Downloading DrugBank"
- wget -c http://www4.wiwiss.fu-berlin.de/drugbank/drugbank_dump.nt.bz2
+ wget -c "$DRUGBANK_DUMP"
fi
if [ ! -f dailymed_dump.nt.bz2 ]
then
echo "Downloading Dailymed"
- wget -c http://www4.wiwiss.fu-berlin.de/dailymed/dailymed_dump.nt.bz2
+ wget -c "$DAILYMED_DUMP"
fi
if [ ! -f diseasome_dump.nt.bz2 ]
then
echo "Downloading Diseasome"
- wget -c http://www4.wiwiss.fu-berlin.de/diseasome/diseasome_dump.nt.bz2
+ wget -c "$DISEASOME_DUMP"
fi
cd ..
else
Modified: incubator/stanbol/trunk/demos/ehealth/src/main/indexing/config/indexing.properties
URL:
http://svn.apache.org/viewvc/incubator/stanbol/trunk/demos/ehealth/src/main/indexing/config/indexing.properties?rev=1303784&r1=1303783&r2=1303784&view=diff
==============================================================================
--- incubator/stanbol/trunk/demos/ehealth/src/main/indexing/config/indexing.properties (original)
+++ incubator/stanbol/trunk/demos/ehealth/src/main/indexing/config/indexing.properties Thu Mar 22 13:52:08 2012
@@ -15,9 +15,6 @@
# Indexing Properties
-#change the destination folder
-destination=../..
-
# Here the name of the dataset MUST be specified by the user
# It MUST BE a single word with no spaces.
name=ehealth