Hi,

I am trying to work out how to store, query and facet machine tags [1] in Solr using a combination of copy fields and pattern tokenizer factories.

I am still relatively new to Solr, so although I feel like I've been over the docs (and friends), it's entirely possible I've missed something glaringly obvious.

The short version is: Faceting works. Yay! You can facet on the individual parts of a machine tag (namespace, predicate, value) and it does what you'd expect. For example:

?q=*:*&facet=true&facet.field=mt_namespace&rows=0

numFound:115
foo:65
dc:48
lastfm:2

The longer version is: even though faceting seems to work, I can't query (as in ?q=) on the individual fields.

For example, if a single "machinetag" (foo:bar=example) field is copied to "mt_namespace", "mt_predicate" and "mt_value" fields I still can't query for "?q=mt_namespace:foo".

It appears as though the entire machine tag is being copied to mt_namespace, even though my reading of the docs is that if a group attribute is present in a solr.PatternTokenizerFactory analyzer, then only the matching capture group will be stored.

Is that incorrect?
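For reference, this is how I understand group mode, as a rough Java sketch using plain java.util.regex: every match the pattern finds in the input contributes the text of capture group 1 as one token. It is an assumption on my part that this mirrors what solr.PatternTokenizerFactory does internally; the class and method names are made up for illustration and none of this is Solr code. The three patterns are copied from the fieldType definitions below, with backslashes doubled for Java string literals, and each one is fed the same raw machine tag, the way the copyField rules feed each destination field.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MachineTagTokens {
    // Hypothetical approximation of "group" mode: each regex match found in
    // the input yields one token, namely the text of the requested group.
    static List<String> tokenize(String regex, int group, String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            tokens.add(m.group(group));
        }
        return tokens;
    }

    // Patterns copied from the mt_* fieldType definitions.
    static final Map<String, String> PATTERNS = new LinkedHashMap<>();
    static {
        PATTERNS.put("mt_namespace", "([a-zA-Z[0-9]](?:\\w+)?):.+");
        PATTERNS.put("mt_predicate", "[a-zA-Z[0-9]](?:\\w+)?:([a-zA-Z[0-9]](?:\\w+)?)=.+");
        PATTERNS.put("mt_value", "[a-zA-Z[0-9]](?:\\w+)?:[a-zA-Z[0-9]](?:\\w+)?=(.+)");
    }

    public static void main(String[] args) {
        // Every destination field sees the same raw machine tag.
        String tag = "foo:bar=example";
        for (Map.Entry<String, String> e : PATTERNS.entrySet()) {
            System.out.println(e.getKey() + " <- " + tokenize(e.getValue(), 1, tag));
        }
        // A bare term with no ':' does not match the namespace pattern at all.
        System.out.println("foo <- " + tokenize(PATTERNS.get("mt_namespace"), 1, "foo"));
    }
}
```

Run with `java MachineTagTokens.java` (Java 11 or later). Under this sketch, foo:bar=example gives foo, bar and example for the three fields, while the bare string foo gives no tokens at all from the namespace pattern.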

I've included the field/fieldType definitions I'm using below. [2] Any help/suggestions would be appreciated.

Cheers,

[1] http://www.flickr.com/groups/api/discuss/72157594497877875/

[2]

<field name="machine_tags" type="machinetag" indexed="true" stored="true" required="false" multiValued="true"/>

<field name="mt_namespace" type="mt_namespace" indexed="true" stored="true" required="false" multiValued="true" />

<field name="mt_predicate" type="mt_predicate" indexed="true" stored="true" required="false" multiValued="true" />

<field name="mt_value" type="mt_value" indexed="true" stored="true" required="false" multiValued="true" />

<copyField source="machine_tags" dest="mt_namespace" />
<copyField source="machine_tags" dest="mt_predicate" />
<copyField source="machine_tags" dest="mt_value" />

<fieldType name="machinetag" class="solr.TextField" />

<fieldType name="mt_namespace" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" pattern="([a-zA-Z[0-9]](?:\w+)?):.+" group="1" />
  </analyzer>
</fieldType>

<fieldType name="mt_predicate" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[a-zA-Z[0-9]](?:\w+)?:([a-zA-Z[0-9]](?:\w+)?)=.+" group="1" />
  </analyzer>
</fieldType>

<fieldType name="mt_value" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[a-zA-Z[0-9]](?:\w+)?:[a-zA-Z[0-9]](?:\w+)?=(.+)" group="1" />
  </analyzer>
</fieldType>
