http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=15555

            Bug ID: 15555
           Summary: Index 024$a into Identifier-other:u url register when
                    source $2 is uri
 Change sponsored?: ---
           Product: Koha
           Version: master
          Hardware: All
                OS: All
            Status: NEW
          Severity: enhancement
          Priority: P5 - low
         Component: Z39.50 / SRU / OpenSearch Servers
          Assignee: gmcha...@gmail.com
          Reporter: dc...@prosentient.com.au
        QA Contact: testo...@bugs.koha-community.org
                CC: m.de.r...@rijksmuseum.nl

Currently, 024$a is indexed into Identifier-other:w, even when it is a URI
(e.g. http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=14217).

This causes problems because the "w" index type replaces punctuation with
spaces, and tokenizes on spaces, so that the URI is decomposed into a series of
values which are indexed separately. This is definitely not what you want when
indexing a 024$a when it is a URI.

For example:

The URL "http://libris.kb.se/resource/bib/219553" becomes the following:

<index name="Identifier-other" type="w" seq="28">@^</index>
<index name="Identifier-other" type="w" seq="1"></index>
<index name="Identifier-other" type="w" seq="29">http</index>
<index name="Identifier-other" type="w" seq="30">libris</index>
<index name="Identifier-other" type="w" seq="31">kb</index>
<index name="Identifier-other" type="w" seq="32">se</index>
<index name="Identifier-other" type="w" seq="33">resource</index>
<index name="Identifier-other" type="w" seq="34">bib</index>
<index name="Identifier-other" type="w" seq="35">219553</index>
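
The decomposition above can be roughly reproduced in a few lines of Python. This is only a sketch of the behaviour described (punctuation replaced with spaces, then split on spaces); it is not Zebra's actual tokenizer, and the function name word_tokens is made up for illustration.

```python
import re

def word_tokens(value):
    """Approximate the "w" behaviour described above (an assumption,
    not Zebra's real implementation): turn each run of
    non-alphanumeric characters into a space, then split on spaces."""
    normalized = re.sub(r"[^0-9A-Za-z]+", " ", value)
    return normalized.split()

print(word_tokens("http://libris.kb.se/resource/bib/219553"))
# ['http', 'libris', 'kb', 'se', 'resource', 'bib', '219553']
```

A search for the full URI therefore has no single indexed value to match against.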

Fortunately, the 024$2 subfield value tells us the source of the identifier,
and "uri" is one of the valid options. So, when we have a 024$2=uri, we can
index the 024$a using the "url" index type. 
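
As a sketch of that rule, here is the selection logic in Python. The function name, the dict representation of the field, and the case-insensitive comparison of $2 are all assumptions for illustration. The sketch also assumes the "u" register replaces "w" for URIs; keeping both would be an equally plausible reading of the proposal.

```python
def index_targets_for_024(subfields):
    """Pick index targets for a 024 field.

    `subfields` is a hypothetical dict of subfield code -> value,
    e.g. {"a": "http://example.org/x", "2": "uri"}.
    """
    source = subfields.get("2", "").strip().lower()
    if source == "uri":
        # $2 says the identifier is a URI: keep it whole in the url register.
        return ["Identifier-other:u"]
    # Otherwise fall back to the current word index.
    return ["Identifier-other:w"]

field = {"a": "http://libris.kb.se/resource/bib/219553", "2": "uri"}
print(index_targets_for_024(field))  # ['Identifier-other:u']
```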

(I'm also planning to index all 024$a values into the "phrase" index type. It
performs the normalization but doesn't tokenize on the spaces, so the
normalized form may still be of use for URLs and other identifiers that rely on
punctuation for meaning.)
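
To illustrate the difference, a phrase-style value could look like this in Python. This is a sketch under the same assumption as the description above (punctuation becomes spaces, but the value stays in one piece); it is not Zebra's real normalization, and phrase_value is a made-up name.

```python
import re

def phrase_value(value):
    """Normalize punctuation to single spaces but keep the result
    as one indexed value instead of splitting it into words."""
    return re.sub(r"[^0-9A-Za-z]+", " ", value).strip()

print(phrase_value("http://libris.kb.se/resource/bib/219553"))
# http libris kb se resource bib 219553
```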

