Hi,
I am trying to work out how to store, query and facet machine tags [1]
in Solr using a combination of copy fields and pattern tokenizer factories.
I am still relatively new to Solr, so although I feel like I've gone over
the docs, and asked friends, it's entirely possible I've missed something
glaringly obvious.
The short version is: Faceting works. Yay! You can facet on the
individual parts of a machine tag (namespace, predicate, value) and it
does what you'd expect. For example:
?q=*:*&facet=true&facet.field=mt_namespace&rows=0
numFound:115
foo:65
dc:48
lastfm:2
The longer version is: Even though faceting seems to work I can't query
(as in ?q=) on the individual fields.
For example, if a single "machine_tags" value (foo:bar=example) is copied
to the "mt_namespace", "mt_predicate" and "mt_value" fields, I still can't
query for "?q=mt_namespace:foo".
It appears as though the entire machine tag is being copied to
mt_namespace, even though my reading of the docs is that if a "group"
attribute is present in a solr.PatternTokenizerFactory analyzer then only
the matching capture group will be stored.
Is that incorrect?
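As a sanity check on the patterns themselves, here is a minimal sketch
using plain java.util.regex. It is not Solr's PatternTokenizer, whose
behaviour may differ. The class name MachineTagGroups and the helper
capture() are made up for illustration; the three patterns are the ones
from [2].

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MachineTagGroups {
    // Patterns copied verbatim from the fieldType definitions in [2].
    static final String NAMESPACE = "([a-zA-Z[0-9]](?:\\w+)?):.+";
    static final String PREDICATE = "[a-zA-Z[0-9]](?:\\w+)?:([a-zA-Z[0-9]](?:\\w+)?)=.+";
    static final String VALUE = "[a-zA-Z[0-9]](?:\\w+)?:[a-zA-Z[0-9]](?:\\w+)?=(.+)";

    // Hypothetical helper: returns capture group 1 of the first match,
    // or null when the pattern does not match the input at all.
    static String capture(String pattern, String input) {
        Matcher m = Pattern.compile(pattern).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String tag = "foo:bar=example";
        System.out.println(capture(NAMESPACE, tag));   // foo
        System.out.println(capture(PREDICATE, tag));   // bar
        System.out.println(capture(VALUE, tag));       // example
        // A bare term has no ":" in it, so the namespace pattern finds nothing.
        System.out.println(capture(NAMESPACE, "foo")); // null
    }
}
```

So against a full machine tag the capture groups pick out the parts I'd
expect. A bare "foo" does not match the namespace pattern at all, which
may or may not be relevant to how the query term is analyzed.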
I've included the field/fieldType definitions I'm using below. [2] Any
help/suggestions would be appreciated.
Cheers,
[1] http://www.flickr.com/groups/api/discuss/72157594497877875/
[2]
<field name="machine_tags" type="machinetag" indexed="true"
       stored="true" required="false" multiValued="true" />
<field name="mt_namespace" type="mt_namespace" indexed="true"
       stored="true" required="false" multiValued="true" />
<field name="mt_predicate" type="mt_predicate" indexed="true"
       stored="true" required="false" multiValued="true" />
<field name="mt_value" type="mt_value" indexed="true"
       stored="true" required="false" multiValued="true" />

<copyField source="machine_tags" dest="mt_namespace" />
<copyField source="machine_tags" dest="mt_predicate" />
<copyField source="machine_tags" dest="mt_value" />

<fieldType name="machinetag" class="solr.TextField" />

<fieldType name="mt_namespace" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory"
               pattern="([a-zA-Z[0-9]](?:\w+)?):.+" group="1" />
  </analyzer>
</fieldType>

<fieldType name="mt_predicate" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory"
               pattern="[a-zA-Z[0-9]](?:\w+)?:([a-zA-Z[0-9]](?:\w+)?)=.+" group="1" />
  </analyzer>
</fieldType>

<fieldType name="mt_value" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory"
               pattern="[a-zA-Z[0-9]](?:\w+)?:[a-zA-Z[0-9]](?:\w+)?=(.+)" group="1" />
  </analyzer>
</fieldType>