Solr is built for search. Stored fields exist to make retrieval easier.
When you hit an edge case, you need to step back and reconsider the
price you are paying for that "easier" bit. Solr can play at being a
"NoSQL database", but that is not its primary use case, and its
behaviour at the edge cases is not optimal.

Storing a whole wiki page of potentially unlimited size will cause you
all sorts of grief later, not just now. You can certainly index it if
you want to search its content later. Indexing (with a text type, not a
string type) breaks the content up into lots of little tokens, and
there is a much higher limit for that. But I would recommend storing
the original page somewhere else. That makes it a bit harder to return
(you may need a custom post-processor or some work in the client), but
it is easier than the kinds of issues you are dealing with already.
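
To make the "store it somewhere else" idea concrete, here is a rough
sketch of the client-side split, not a definitive implementation. It
assumes the original page goes to a plain directory on disk and that
Solr only gets the text for indexing. The names split_page, load_page
and the field name content_text are made up for illustration; the
sketch also assumes content_text is declared in your schema as a
tokenised text type that is indexed but not stored, and that page ids
are safe to use as file names.

```python
import json
import tempfile
from pathlib import Path


def split_page(page_id, title, body, store_dir):
    """Save the full page outside Solr and return a document for indexing.

    Hypothetical layout: one UTF-8 file per page, named after its id.
    """
    store = Path(store_dir)
    store.mkdir(parents=True, exist_ok=True)
    (store / f"{page_id}.txt").write_text(body, encoding="utf-8")
    # "content_text" is an assumed field name: indexed, not stored.
    return {"id": page_id, "title": title, "content_text": body}


def load_page(page_id, store_dir):
    """Fetch the original page by the id that a Solr search returned."""
    return (Path(store_dir) / f"{page_id}.txt").read_text(encoding="utf-8")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        doc = split_page("wiki-42", "Example page", "Some long wiki text", tmp)
        # This JSON is what the existing indexing program would send to Solr.
        print(json.dumps([doc]))
        # At display time, look the page up by the id from the search hit.
        print(load_page("wiki-42", tmp))
```

At query time you would ask Solr only for the id (and title), then call
something like load_page in the client to get the text for display. A
database or key-value store would do the same job as the directory.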

Hope this helps you understand the bigger picture.

Regards,
   Alex.

Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


On 15 September 2014 12:40, Christopher Gross <cogr...@gmail.com> wrote:
> [sorry if this double posts -- I got an error on sending so I'm trying it
> again..]
>
> I'm storing the page content in a "string" in Solr -- for display later.
> I'm indexing that content into a text field (text_en_splitting) for
> full-text searching.
>
> I'm getting an error on the "string" portion, but perhaps it is coming from
> the copy that I have that pushes the "content" field to a "text" field that
> I use for a full text search.
>
> I'm not familiar with the update request processor, and I don't think I
> want to just truncate the whole field to a set length.  Is there somewhere
> I can read up on the update request processor?  I already have a program
> that I'm using to push documents into Solr, so writing a script wouldn't
> help -- unless it is something that I can include in my current program.
>
> I'm also having trouble making sense of the error -- I don't see any errant
> UTF 8 characters, and the page doesn't appear to have any when I check it
> out in a browser.  It's all pretty basic text & URLs.
>
> -- Chris
