Yonik,

Could you please revert your commit until we've reached some consensus on this discussion?
Maybe post alternative patches on the issue (SOLR-2519), and we can iterate there?

Adding a new example field type ("text_nwd") is one way to go, and I agree it is the least risk/effort, a "quick fix", but I don't think we should use a quick fix here.

I think it's important for Solr to have good out-of-the-box defaults for all languages, like ElasticSearch, even if that means we have to do some extra work now (ie, fixing up the wiki/tutorials) to make that change.

More below:

On Sun, May 15, 2011 at 12:20 PM, Yonik Seeley <yo...@lucidimagination.com> wrote:

> As far as Solr defaults... perhaps way way back "text" should have
> been named "text_en".
> But any changes now should be comprehensive (we need to consider
> impacts to the example
> data, the example schema, the solr tutorial which relies on some of
> the current behavior, and a ton of documentation
> on the wiki related to both analysis components (multi-word synonyms,
> WDF, etc) and other quickstart guides.
>
> Anyway, changes to the example schema (or the behavior of the example
> schema) can have a large impact.

I agree: we need to fix the wiki pages/examples that rely on auto-phrase.

But, really, how much work is this? Can you point to an example or two in the wiki/tutorial that "advertise"/rely on auto-phrase? This would help me get a sense of how much additional work I'm signing up for ;)

I just went through the tutorial and didn't see one...

(Also, we should add some CJK docs and queries to the tutorial... a simple pair is the test case in my patch on SOLR-2519.)

We shouldn't avoid/fear good changes to our defaults just because fixing them will be more work, especially if someone (me!) is signing up to do that work....

> I personally think that adding a new field is much easier and less
> disruptive, and given the potential impact

I agree the quick fix is somewhat easier than doing it right, but I think in this case we should do it right.

Solr really should just work well out-of-the-box on all (including non-whitespace) languages.

> we should hear what others have to say about it too

+1

Mike

http://blog.mikemccandless.com