Yonik,

Could you please revert your commit until we've reached some
consensus in this discussion?

Maybe post alternative patches on the issue (SOLR-2519), and we can
iterate there?

Adding a new example field type ("text_nwd") is one way to go, and I
agree it's the least risk/effort, a "quick fix", but I don't think we
should use a quick fix here.

I think it's important for Solr to have good out-of-the-box defaults
for all languages, like ElasticSearch, even if that means we have to
do some extra work now (i.e., fixing up the wiki/tutorials) to make that
change.

More below:

On Sun, May 15, 2011 at 12:20 PM, Yonik Seeley
<yo...@lucidimagination.com> wrote:

> As far as Solr defaults... perhaps way way back "text" should have
> been named "text_en".
> But any changes now should be comprehensive (we need to consider
> impacts to the example
> data, the example schema, the solr tutorial which relies on some of
> the current behavior, and a ton of documentation
> on the wiki related to both analysis components (multi-word synonyms,
> WDF, etc) and other quickstart guides.
>
> Anyway, changes to the example schema (or the behavior of the example
> schema) can have a large impact.

I agree: we need to fix the wiki pages/examples that rely on
auto-phrase.

But, really, how much work is this?  Can you point to an example or
two in the wiki/tutorial that "advertise"/rely on auto-phrase?  This
would help me get a sense of how much additional work I'm signing up
for ;)

I just went through the tutorial and didn't see one...

(Also, we should add some CJK docs and queries to the tutorial... a
simple pair is the test case in my patch on SOLR-2519.)

We shouldn't avoid/fear good changes to our defaults just because
making them will be more work, especially if someone (me!) is signing
up to do that work...

> I personally think that adding a new field is much easier and less
> disruptive, and given the potential impact

I agree the quick fix is somewhat easier than doing it right, but I
think in this case we should do it right.  Solr really should just
work well out-of-the-box on all (including non-whitespace) languages.

> we should hear what others have to say about it too

+1

Mike

http://blog.mikemccandless.com
