[jira] Updated: (SOLR-1690) JSONKeyValueTokenizerFactory -- JSON Tokenizer

Ryan McKinley (JIRA) Wed, 30 Dec 2009 10:57:57 -0800

     [ https://issues.apache.org/jira/browse/SOLR-1690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Ryan McKinley updated SOLR-1690:
--------------------------------

    Description: 
Sometimes it is nice to group structured data into a single field.

This (rough) patch takes JSON input and indexes tokens based on the key-value 
pairs in the JSON.

{code:xml|title=schema.xml}
    <!-- JSON Field Type -->
    <fieldtype name="json" class="solr.TextField" positionIncrementGap="100"
               omitNorms="true">
      <analyzer type="index">
        <tokenizer class="solr.JSONKeyValueTokenizerFactory" keepArray="true"
                   hierarchicalKey="false"/>
        <filter class="solr.TrimFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.TrimFilterFactory" />
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldtype>
{code}
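
As a rough illustration of the query side of this field type: KeywordTokenizerFactory keeps the whole query string as a single token, which is then trimmed and lowercased, so a query term ends up normalized the same way as the indexed tokens. The Python below only mimics that chain for ASCII input. The function name `analyze_query` is made up for this sketch and is not part of Solr or the patch.

```python
def analyze_query(query_text):
    """Mimic KeywordTokenizer -> TrimFilter -> LowerCaseFilter on a query string."""
    token = query_text      # keyword tokenizer: the whole input is one token
    token = token.strip()   # trim filter: drop leading/trailing whitespace
    token = token.lower()   # lowercase filter
    return [token]


print(analyze_query("  Hello:World "))
```

The result is always a single token, so it can only match an indexed term that is exactly one normalized key:value pair.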

Given the text:
{code}
 { "hello": "world", "rank":5 }
{code}

the field is indexed as two tokens:

|| term position | 1 | 2 |
|| term text | hello:world | rank:5 |
|| term type | word | word |
|| source start,end | 12,17 | 27,28 |
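
The term text in the table can be reproduced with a minimal sketch, assuming a flat JSON object with scalar values. This is not the patch's Java code: `key_value_tokens` is a hypothetical helper, it does not track source offsets, and it skips nested objects and arrays because the behavior of `keepArray` and `hierarchicalKey` is not described here.

```python
import json


def key_value_tokens(text):
    """Return one 'key:value' token per top-level pair of a flat JSON object."""
    data = json.loads(text)
    tokens = []
    for key, value in data.items():
        if isinstance(value, (dict, list)):
            # Nested values depend on keepArray / hierarchicalKey; their exact
            # behavior is an unknown here, so this sketch leaves them out.
            continue
        # Strings are used as-is; numbers and booleans keep their JSON spelling.
        rendered = value if isinstance(value, str) else json.dumps(value)
        # Trim and lowercase, as the TrimFilter and LowerCaseFilter steps would.
        tokens.append(f"{key}:{rendered}".strip().lower())
    return tokens


print(key_value_tokens('{ "hello": "world", "rank":5 }'))
```

Each top-level pair becomes a single token, in the order the keys appear in the input.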

  was:
Sometimes it is nice to group structured data into a single field.

This (rough) patch, takes JSON input and indexes tokens based on the key values 
pairs in the json.

For example, the text:
{code}
 { "hello": "world", "rank":5 }
{code}
gets indexed as two tokens:

|| term position |      1 |     2 |
|| term text |  hello:world     | rank:5 |
|| term type |  word |  word |
|| source start,end |   12,17   | 27,28 |


> JSONKeyValueTokenizerFactory -- JSON Tokenizer
> ----------------------------------------------
>
>                 Key: SOLR-1690
>                 URL: https://issues.apache.org/jira/browse/SOLR-1690
>             Project: Solr
>          Issue Type: New Feature
>          Components: Schema and Analysis
>            Reporter: Ryan McKinley
>            Priority: Minor
>         Attachments: noggit-1.0-A1.jar, 
> SOLR-1690-JSONKeyValueTokenizerFactory.patch
>

