I am not sure I am following correctly. The field I upload the document to
is "content"; the analyzed field is "ColonCancerField". The "content" field
contains the entire text of the document, in my case a PubMed abstract. It
was a tokenized field. I made it untokenized and still received the same
results: the results for "not" instead of "not necessarily". (In my current
example, 2 docs contain "not" and 1 doc contains "not necessarily"; "not" is
of course also in the document that contains "not necessarily".) The results
are shown here:

http://imgur.com/a/1bfXT

I also tried this:

http://localhost:8983/solr/Cytokine/select?&q=ColonCancerField:"not+necessarily"

I still receive the two documents, which is the same result as querying
ColonCancerField:"not"
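
One possible explanation, sketched here in Python rather than confirmed
against Solr itself: at query time the phrase goes through the same shingle
and keep-word chain as the indexed text, so "not" and "not necessarily" could
both survive at the same position and act as alternatives. The keep-word
contents, the sample documents, and the helper names (shingles, analyze,
matches) are all assumptions made for illustration, and the synonym filter is
left out.

```python
def shingles(tokens, min_size=2, max_size=5):
    """Emit (term, position) pairs: each unigram plus the shingles that start at it."""
    out = []
    for pos, tok in enumerate(tokens):
        out.append((tok, pos))
        for size in range(min_size, max_size + 1):
            if pos + size <= len(tokens):
                out.append((" ".join(tokens[pos:pos + size]), pos))
    return out


def analyze(text, keep_words):
    """Rough stand-in for: whitespace tokenizer, shingles, lowercase, keep-word filter."""
    terms = shingles(text.split())
    lowered = [(term.lower(), pos) for term, pos in terms]
    return [(term, pos) for term, pos in lowered if term in keep_words]


def matches(query, doc_text, keep_words):
    """Simplification: surviving query terms are treated as alternatives.

    This only mirrors a phrase query when all surviving query terms share
    one position, which is the case examined here.
    """
    query_terms = {term for term, _ in analyze(query, keep_words)}
    doc_terms = {term for term, _ in analyze(doc_text, keep_words)}
    return bool(query_terms & doc_terms)


# Hypothetical keep-word list: assumes "necessarily" alone is NOT a kept term.
KEEP = {"not", "not necessarily"}

# Hypothetical documents standing in for the abstracts.
DOCS = {
    "doc1": "This is not the case",
    "doc2": "This is not necessarily true",
    "doc3": "Cytokine levels were elevated",
}

query_terms = analyze("not necessarily", KEEP)
hits = [name for name in sorted(DOCS) if matches("not necessarily", DOCS[name], KEEP)]
print(query_terms)
print(hits)
```

Under these assumptions the query keeps both "not" and "not necessarily" at
position 0, so any document containing "not" matches. If "necessarily" were
also in the keep-word file, the outcome could differ.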

Just to clarify, the structure looks like this: *content* (untokenized,
unanalyzed) ==[copied to]==> *ColonCancerField* (tokenized, analyzed). When I
browse ColonCancerField, the facets state that there is 1 document for
"not necessarily", but when I select it, Solr returns 2 results.
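
If the goal is to filter on the exact indexed term that the facet reports,
one option that may be worth trying is Solr's term query parser
({!term f=...}), which builds a single term query without running the
query-time analyzer. The sketch below only builds such a URL in Python; the
helper name facet_filter_url is hypothetical, and the host and core are the
ones from the query above.

```python
from urllib.parse import urlencode


def facet_filter_url(base_url, field, value):
    """Build a select URL whose filter query uses the {!term} parser.

    The facet value is passed through as-is, so a value with a space such as
    "not necessarily" is looked up as one indexed term.
    """
    params = {
        "q": "*:*",
        "fq": "{!term f=%s}%s" % (field, value),
    }
    # urlencode percent-escapes the braces, "!" and "=" and turns the space into "+".
    return base_url + "?" + urlencode(params)


url = facet_filter_url(
    "http://localhost:8983/solr/Cytokine/select",
    "ColonCancerField",
    "not necessarily",
)
print(url)
```

Whether this returns only the one expected document depends on the term
actually being indexed as "not necessarily" in ColonCancerField, which is
what the facet count suggests.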

-Kevin

On Mon, Dec 28, 2015 at 10:22 AM, Jamie Johnson <jej2...@gmail.com> wrote:

> Can you do the opposite?  Index into an unanalyzed field and copy into the
> analyzed?
>
> If I remember correctly, facets are based on indexed values, so if you
> tokenize the field, the facets will be as you are seeing now.
> On Dec 28, 2015 9:45 AM, "Kevin Lopez" <kevin.lopez...@gmail.com> wrote:
>
> > *What I am trying to accomplish:*
> > Generate a facet based on the documents uploaded and a text file
> > containing terms from a domain/ontology, such that a facet is shown if a
> > term is in the text file and in a document (key phrase extraction).
> >
> > *The problem:*
> > When I select the facet for the term "*not necessarily*" (note the
> > space), I get the results for the term "*not*". The field is tokenized
> > and multivalued. This leads me to believe that I cannot use a tokenized
> > field as a facet field. I tried to copy the values of the field to a text
> > field with a KeywordTokenizer. When I check the schema browser, I am told:
> > "Sorry, no Term Info available :(". This is after I deleted the old index
> > and uploaded the documents again. The facet comes from a field that is
> > itself copied from another field, so I cannot copy this field to a text
> > field with a KeywordTokenizer or to a StrField. What can I do to fix this?
> > Is there an alternate way to accomplish this?
> >
> > *Here is my configuration:*
> >
> > <copyField source="ColonCancerField" dest="cytokineField"/>
> >
> > <field name="cytokineField" indexed="true" stored="true"
> >        multiValued="true" type="Cytokine_Pass"/>
> > <fieldType name="Cytokine_Pass" class="solr.TextField">
> >   <analyzer>
> >     <tokenizer class="solr.KeywordTokenizerFactory"/>
> >   </analyzer>
> > </fieldType>
> >
> > <field name="ColonCancerField" type="ColonCancer" indexed="true"
> >        stored="true" multiValued="true"
> >        termPositions="true"
> >        termVectors="true"
> >        termOffsets="true"/>
> > <fieldType name="ColonCancer" class="solr.TextField"
> >            sortMissingLast="true" omitNorms="true">
> >   <analyzer>
> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >     <filter class="solr.ShingleFilterFactory"
> >             minShingleSize="2" maxShingleSize="5"
> >             outputUnigramsIfNoShingles="true"/>
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >     <filter class="solr.SynonymFilterFactory"
> >             synonyms="synonyms_ColonCancer.txt" ignoreCase="true"
> >             expand="true"
> >             tokenizerFactory="solr.KeywordTokenizerFactory"/>
> >     <filter class="solr.KeepWordFilterFactory"
> >             words="prefLabels_ColonCancer.txt" ignoreCase="true"/>
> >   </analyzer>
> > </fieldType>
> > <copyField source="content" dest="ColonCancerField"/>
> >
> > Regards,
> >
> > Kevin
> >
>
