Author: rwesten
Date: Thu Mar 22 13:52:08 2012
New Revision: 1303784

URL: http://svn.apache.org/viewvc?rev=1303784&view=rev
Log:
Updates to the eHealth demo:

* The shell script now downloads the RDF dumps from a mirror, as the original sites have been down for nearly two days
* Added the next parts to the README.md
* Removed an unneeded config from indexing.properties

Modified:
    incubator/stanbol/trunk/demos/ehealth/README.md
    incubator/stanbol/trunk/demos/ehealth/index.sh
    incubator/stanbol/trunk/demos/ehealth/src/main/indexing/config/indexing.properties

Modified: incubator/stanbol/trunk/demos/ehealth/README.md
URL: http://svn.apache.org/viewvc/incubator/stanbol/trunk/demos/ehealth/README.md?rev=1303784&r1=1303783&r2=1303784&view=diff
==============================================================================
--- incubator/stanbol/trunk/demos/ehealth/README.md (original)
+++ incubator/stanbol/trunk/demos/ehealth/README.md Thu Mar 22 13:52:08 2012
@@ -68,4 +68,67 @@ After that the you will be able to 
 
 ## Background information about this demo
 
-TODO!!
\ No newline at end of file
+### Indexing
+
+The configuration used for indexing can be found at
+
+    ./src/main/indexing/config
+
+It consists of the following parts:
+
+* __indexing.properties__: Core configuration for the indexing tools. TODO: link to documentation
+* __mappings.txt__: Configures the indexed fields, data types and property mappings. TODO: link to documentation
+* __fieldboost.properties__: Configuration for the field boosts. TODO: link to documentation
+* __ehealth/__: The SolrCore configuration used for indexing. This demo uses it to customize how Solr indexes labels and IDs. See the following section for details.
+
+#### Customizing the Solr Schema used for indexing
+
+The default SolrCore configuration used by the Apache Stanbol Entityhub is contained in the SolrYard module and can be found [here](http://svn.apache.org/repos/asf/incubator/stanbol/trunk/entityhub/yard/solr/src/main/resources/solr/core/default.solrindex.zip). This configuration is used if no customized configuration is present in "{indexing-root}/indexing/config/{name}", where {name} refers to the value of the property "name" in "indexing.properties".
+
+Users who want or need to customize the SolrCore configuration should start with the [default configuration](http://svn.apache.org/repos/asf/incubator/stanbol/trunk/entityhub/yard/solr/src/main/resources/solr/core/default.solrindex.zip), extract this zip file to "{indexing-root}/indexing/config" and then rename the extracted folder to the "name" configured in "indexing.properties". After that, they can start to customize the configuration of the SolrCore used for indexing.
+
+This demo uses this procedure to define two special Solr field types for indexing labels and IDs (see ./src/main/indexing/config/ehealth/conf/schema.xml).
+
+    :::xml
+    <!-- intended to be used for labels of drugs -->
+    <fieldType name="label" class="solr.TextField" positionIncrementGap="100" omitNorms="false">
+      <analyzer>
+        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
+        <filter class="solr.ASCIIFoldingFilterFactory"/>
+        <filter class="solr.WordDelimiterFilterFactory" 
+            catenateWords="1" catenateNumbers="1" catenateAll="1" 
+            generateWordParts="1" generateNumberParts="0"
+            splitOnCaseChange="0" splitOnNumerics="0" stemEnglishPossessive="0" 
+            preserveOriginal="0" />
+        <filter class="solr.LowerCaseFilterFactory"/>
+      </analyzer>
+    </fieldType>
+
+    <!-- Field Type used for searching Drugs based on their various IDs -->
+    <fieldType name="code_field" class="solr.TextField" positionIncrementGap="100" omitNorms="false">
+      <analyzer>
+        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
+        <filter class="solr.ASCIIFoldingFilterFactory"/>
+        <filter class="solr.WordDelimiterFilterFactory" 
+            catenateWords="1" catenateNumbers="1" catenateAll="1" 
+            generateWordParts="1" generateNumberParts="0"
+            splitOnCaseChange="0" splitOnNumerics="0" stemEnglishPossessive="0" 
+            preserveOriginal="0" />
+      </analyzer>
+    </fieldType>
+
+For more information on the tokenizers and filters used by this configuration, please see the [Analyzers, Tokenizers, and Token Filters](http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters) documentation.
+
+These field types are then applied to specific properties with the following configuration:
+
+    :::xml
+    <!-- fields that store codes -->
+    <field name="@/skos:notation/" type="code_field" indexed="true" stored="true" multiValued="true"/>
+    <field name="@/drugbank:ahfsCode/" type="code_field" indexed="true" stored="true" multiValued="true"/>
+    <field name="@/drugbank:atcCode/" type="code_field" indexed="true" stored="true" multiValued="true"/>
+    [...]
+    <!-- String fields (e.g. chemical formulas) -->
+    <field name="@/drugbank:smilesStringCanonical/" type="string" indexed="true" stored="true" multiValued="true"/>
+    <field name="@/drugbank:smilesStringIsomeric/" type="string" indexed="true" stored="true" multiValued="true"/>
+    [...]
+
+Field

Modified: incubator/stanbol/trunk/demos/ehealth/index.sh
URL: http://svn.apache.org/viewvc/incubator/stanbol/trunk/demos/ehealth/index.sh?rev=1303784&r1=1303783&r2=1303784&view=diff
==============================================================================
--- incubator/stanbol/trunk/demos/ehealth/index.sh (original)
+++ incubator/stanbol/trunk/demos/ehealth/index.sh Thu Mar 22 13:52:08 2012
@@ -17,6 +17,22 @@
 
 # 1. build the indexing tool and copy it to the /data directory
 
+# These servers are often down, so use a mirror for now.
+# Use the links below if you want to get the newest version.
+
+# SIDER_DUMP=http://www4.wiwiss.fu-berlin.de/sider/sider_dump.nt.bz2
+# DRUGBANK_DUMP=http://www4.wiwiss.fu-berlin.de/drugbank/drugbank_dump.nt.bz2
+# DAILYMED_DUMP=http://www4.wiwiss.fu-berlin.de/dailymed/dailymed_dump.nt.bz2
+# DISEASOME_DUMP=http://www4.wiwiss.fu-berlin.de/diseasome/diseasome_dump.nt.bz2
+
+# Mirror hosted by the IKS project (http://www.iks-project.org)
+export IKS_MIRROR=http://dev.iks-project.eu/downloads/stanbol-indices/ehealth/source-files/
+export SIDER_DUMP=$IKS_MIRROR"sider_dump.nt.bz2"
+export DRUGBANK_DUMP=$IKS_MIRROR"drugbank_dump.nt.bz2"
+export DAILYMED_DUMP=$IKS_MIRROR"dailymed_dump.nt.bz2"
+export DISEASOME_DUMP=$IKS_MIRROR"diseasome_dump.nt.bz2"
+
+
 if [ ! -d target/indexing ]
 then
     mkdir -p target/indexing
@@ -56,25 +72,25 @@ then
     if [ ! -f sider_dump.nt.bz2 ]
     then
         echo "Downloading SIDER"
-        wget -c http://www4.wiwiss.fu-berlin.de/sider/sider_dump.nt.bz2
+        wget -c $SIDER_DUMP
     fi
 
     if [ ! -f drugbank_dump.nt.bz2 ]
     then
         echo "Downloading DrugBank"
-        wget -c http://www4.wiwiss.fu-berlin.de/drugbank/drugbank_dump.nt.bz2
+        wget -c $DRUGBANK_DUMP
     fi
 
     if [ ! -f dailymed_dump.nt.bz2 ]
     then
         echo "Downloading Dailymed"
-        wget -c http://www4.wiwiss.fu-berlin.de/dailymed/dailymed_dump.nt.bz2
+        wget -c $DAILYMED_DUMP
     fi
 
     if [ ! -f diseasome_dump.nt.bz2 ]
     then
         echo "Downloading Diseasome"
-        wget -c http://www4.wiwiss.fu-berlin.de/diseasome/diseasome_dump.nt.bz2
+        wget -c $DISEASOME_DUMP
     fi
     cd ..
 else
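
The four download blocks above follow the same pattern. A minimal sketch of the same logic written as a loop, assuming every dump on the mirror is named `<dataset>_dump.nt.bz2` under IKS_MIRROR (as the exports above show); `dump_urls` and `download_missing` are hypothetical helper names, not part of index.sh:

```shell
#!/bin/sh
# Hypothetical sketch, not part of index.sh: the per-dataset download
# blocks rewritten as a loop over the dataset names.

# Mirror used by index.sh; can be overridden from the environment.
IKS_MIRROR=${IKS_MIRROR:-http://dev.iks-project.eu/downloads/stanbol-indices/ehealth/source-files/}

# Print one dump URL per line (assumes each dump is named <dataset>_dump.nt.bz2).
dump_urls() {
    for dataset in sider drugbank dailymed diseasome; do
        echo "${IKS_MIRROR}${dataset}_dump.nt.bz2"
    done
}

# Download every dump that is not yet present in the current directory.
download_missing() {
    for url in $(dump_urls); do
        file=$(basename "$url")
        if [ ! -f "$file" ]; then
            echo "Downloading $file"
            wget -c "$url"
        fi
    done
}
```

Calling `download_missing` from the directory where index.sh stores the dumps would replace the four `if` blocks; files that are already present are skipped, as in the original script.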

Modified: incubator/stanbol/trunk/demos/ehealth/src/main/indexing/config/indexing.properties
URL: http://svn.apache.org/viewvc/incubator/stanbol/trunk/demos/ehealth/src/main/indexing/config/indexing.properties?rev=1303784&r1=1303783&r2=1303784&view=diff
==============================================================================
--- incubator/stanbol/trunk/demos/ehealth/src/main/indexing/config/indexing.properties (original)
+++ incubator/stanbol/trunk/demos/ehealth/src/main/indexing/config/indexing.properties Thu Mar 22 13:52:08 2012
@@ -15,9 +15,6 @@
 
 # Indexing Properties
 
-#change the destination folder
-destination=../..
-
 # Here the name of the dataset MUST be specified by the user
 # It MUST BE a single word with no spaces.
 name=ehealth
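
The `name` property has to be a single word with no spaces, and the README uses the same value as the folder name of a customized SolrCore configuration. A minimal pre-flight check could look like the sketch below, assuming a plain `key=value` properties file without line continuations or `:` separators; `check_index_name` is a hypothetical helper, not part of the Stanbol indexing tool:

```shell
#!/bin/sh
# Hypothetical helper, not part of the Stanbol indexing tool.
# Prints the value of the "name" property and fails if it is missing
# or contains whitespace.
# Assumes simple key=value lines (no line continuations, no ':' separator).
check_index_name() {
    props=$1
    tab=$(printf '\t')
    # Value of the first line starting with "name=" (comment lines start with '#').
    name=$(sed -n 's/^name=//p' "$props" | head -n 1)
    case "$name" in
        "")
            echo "no name property found in $props" >&2
            return 1
            ;;
        *" "*|*"$tab"*)
            echo "name '$name' must be a single word with no spaces" >&2
            return 1
            ;;
    esac
    echo "$name"
}
```

Running `check_index_name src/main/indexing/config/indexing.properties` from the demo directory should print `ehealth` with this configuration; a value containing a space or tab makes it return a non-zero status.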

