On 9/17/2010 7:22 PM, Chris Hostetter wrote:
a) not really. assuming you have no problem modifying the indexing code in the way you want, and are primarily worried about searching from various clients, then the most straightforward approach is probably to use RewriteRules (or something equivalent) to do regex replacements in your query strings before Solr ever sees them.
That's an interesting idea. I am using haproxy, which might be able to do that. We don't have various clients; the index is used almost exclusively by our web applications. One set of apps (the one we are phasing out) uses code originally written for our old search engine's HTTP interface. We hacked together a shim that translates the old query syntax and uses XSLT to reformat Solr's output for it. The other set of apps is Java, using SolrJ.
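For illustration, the regex replacement suggested above could also be done inside the shim or the Java apps rather than in the proxy. This is only a minimal sketch, assuming the goal is to point queries written against ft_text at catchall while ft_text is out of the schema. The class and method names are hypothetical, and it assumes the q parameter has already been URL-decoded; it does not try to skip text inside quoted phrases.

```java
import java.util.regex.Pattern;

// Hypothetical helper: rewrites the field prefix "ft_text:" to "catchall:"
// in a decoded Solr query string before it is sent to Solr.
public class QueryFieldRewriter {
    // Match "ft_text:" only where it begins a field name, i.e. when it is
    // not preceded by a word character (so "my_ft_text:" is left alone).
    private static final Pattern FT_TEXT_PREFIX =
            Pattern.compile("(?<!\\w)ft_text:");

    public static String rewrite(String query) {
        // Assumption: the query is plain text, not URL-encoded
        // (an encoded colon, %3A, would not match this pattern).
        return FT_TEXT_PREFIX.matcher(query).replaceAll("catchall:");
    }

    public static void main(String[] args) {
        System.out.println(rewrite("+ft_text:solr -ft_text:lucene title:haproxy"));
    }
}
```

Doing the rewrite in one place like this would mean the client code itself does not need to change when ft_text later comes back and catchall goes away; only the pattern would.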
b) i'm not sure if you realize that you can't make your index smaller by removing a field from your schema -- not unless you also reindex all of the documents that (used to) have a value in that field. depending on your priorities, doing this twice (once to remove ft_text, and then once again later to add ft_text back and remove catchall) may not be the best use of your time/resources -- it might be more productive to accelerate your switch to using dismax, and only do the reindexing once to eliminate your catchall field.
I do know that I have to reindex. It's a process that only takes about six hours. Afterwards, about three quarters of each index will fit into the disk cache, instead of only a little more than half. As it might be a few months before we can start using dismax effectively, I'm OK with doing rebuilds twice.
Thanks, Shawn