Good, I'll try.
But suppose I have 100 documents containing "go pro" and 150 documents
containing "gopro". Whichever form I search for, the other one is never
suggested.
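
For reference, this is a sketch of the wordbreak spellchecker as I understand
the suggestion quoted below: "maxChanges" lowered from 10 to 2 and
"minBreakLength" lowered from 3 to 2. Everything else is unchanged from my
original configuration. The exact values are my assumption and not yet tested.

```xml
<lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">catch_all_original</str>
    <str name="combineWords">true</str>
    <str name="breakWords">true</str>
    <!-- was 10: keep the number of breaks/combinations per query small -->
    <int name="maxChanges">2</int>
    <!-- was 3: must be no more than 2 so that "go" can be broken off "gopro" -->
    <int name="minBreakLength">2</int>
</lst>
```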

2015-01-27 16:21 GMT+01:00 Dyer, James-2 [via Lucene] <
ml-node+s472066n4182254...@n3.nabble.com>:

> I think the word break spellchecker will do what you want.  But, if I were
> you, I'd dial back "maxChanges" to 1 or 2.  You don't want it slicing a
> word into 10 parts or trying to combine 10 adjacent words.  You also need
> the "minBreakLength" to be no more than 2, if you want it to break "go"
> (length=2) off of "gopro".
>
> James Dyer
> Ingram Content Group
>
>
> -----Original Message-----
> From: fabio.bozzo [mailto:[hidden email]]
> Sent: Tuesday, January 27, 2015 2:58 AM
> To: [hidden email]
> Subject: Suggesting broken words with solr.WordBreakSolrSpellChecker
>
> I indexed an electronics e-commerce product catalog.
>
> This is a typical document from my collection:
>
>
> "docs": [
>       {
>         "prezzo_vendita_d": 39.9,
>         "codice_produttore_s": "DK00150020",
>         "codice_s": "5.BAT.27407",
>         "descrizione": "BATTERIA GO PRO HERO ",
>         "barcode_interno_s": "185323000958",
>         "categoria": "Batterie",
>         "prezzo_acquisto_d": 16.12,
>         "marchio": "GO PRO",
>         "data_aggiornamento_dt": "2012-06-21T00:00:00Z",
>         "id": "27407",
>         "_version_": 1491274123542790100
>       },
>       {
>         "codice_produttore_s": "DK0052043",
>         "codice_s": "05.SP.42760",
>         "id": "42760",
>         "marchio": "SP GADGETS",
>         "barcode_interno_s": "4028017520430",
>         "prezzo_acquisto_d": 34.4,
>         "data_aggiornamento_dt": "2014-11-04T00:00:00Z",
>         "descrizione": "SP POS CASE GOPRO OLIVE LARGE",
>         "prezzo_vendita_d": 59.95,
>         "_version_": 1491274406746390500
>       }
> ...]
> I want my spellchecker to suggest "go pro" to users searching "gopro"
> (without whitespace).
>
> I also want users searching "go pro" to find "gopro" products, too.
>
> Here's a little bit of my configuration:
>
> *schema.xml*
> <field name="marchio" type="string" indexed="true" stored="true"/>
> <field name="categoria" type="string" indexed="true" stored="true"/>
> <field name="fornitore" type="string" indexed="true" stored="true"/>
> <field name="descrizione" type="string" indexed="true" stored="true"/>
>
> <field name="catch_all_original" type="text_general" indexed="true" stored="false" multiValued="true" />
> <field name="catch_all" type="text_it" indexed="true" stored="false" multiValued="true" />
>
> <copyField source="marchio" dest="catch_all" />
> <copyField source="categoria" dest="catch_all" />
> <copyField source="descrizione" dest="catch_all" />
> <copyField source="fornitore" dest="catch_all" />
>
> <copyField source="marchio" dest="catch_all_original" />
> <copyField source="categoria" dest="catch_all_original" />
> <copyField source="descrizione" dest="catch_all_original" />
> <copyField source="fornitore" dest="catch_all_original" />
> ...
>
> <fieldType name="text_it" class="solr.TextField" positionIncrementGap="100">
>     <analyzer type="index">
>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>         <filter class="solr.WordDelimiterFilterFactory"
>                 generateWordParts="1" generateNumberParts="1" catenateWords="1"
>                 catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"
>                 preserveOriginal="1" />
>         <filter class="solr.ElisionFilterFactory" ignoreCase="true"
>                 articles="lang/contractions_it.txt"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.ASCIIFoldingFilterFactory"/>
>         <filter class="solr.StopFilterFactory" ignoreCase="true"
>                 words="lang/stopwords_it.txt" format="snowball" />
>         <filter class="solr.ItalianLightStemFilterFactory"/>
>     </analyzer>
>     <analyzer type="query">
>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>         <filter class="solr.WordDelimiterFilterFactory"
>                 generateWordParts="1" generateNumberParts="1" catenateWords="1"
>                 catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"
>                 preserveOriginal="1" />
>         <filter class="solr.ElisionFilterFactory" ignoreCase="true"
>                 articles="lang/contractions_it.txt"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.ASCIIFoldingFilterFactory"/>
>         <filter class="solr.StopFilterFactory" ignoreCase="true"
>                 words="lang/stopwords_it.txt" format="snowball" />
>         <filter class="solr.ItalianLightStemFilterFactory"/>
>         <filter class="solr.SynonymFilterFactory"
>                 synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>     </analyzer>
> </fieldType>
>
> *solrconfig.xml*
> <requestHandler name="/select" class="solr.SearchHandler">
>
>         <lst name="defaults">
>             <str name="echoParams">explicit</str>
>             <int name="rows">10</int>
>             <str name="df">catch_all</str>
>
>             <str name="spellcheck">on</str>
>             <str name="spellcheck.dictionary">default</str>
>             <str name="spellcheck.dictionary">wordbreak</str>
>             <str name="spellcheck.extendedResults">false</str>
>             <str name="spellcheck.count">5</str>
>             <str name="spellcheck.alternativeTermCount">2</str>
>             <str name="spellcheck.maxResultsForSuggest">5</str>
>             <str name="spellcheck.collate">true</str>
>             <str name="spellcheck.collateExtendedResults">true</str>
>             <str name="spellcheck.maxCollationTries">5</str>
>             <str name="spellcheck.maxCollations">3</str>
>         </lst>
>
>         <arr name="last-components">
>             <str>spellcheck</str>
>         </arr>
>
>     </requestHandler>
> ...
> <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
>
>         <str name="queryAnalyzerFieldType">text_general</str>
>
>         <lst name="spellchecker">
>             <str name="name">default</str>
>             <str name="field">catch_all_original</str>
>             <str name="classname">solr.DirectSolrSpellChecker</str>
>             <str name="distanceMeasure">internal</str>
>             <float name="accuracy">0.5</float>
>             <int name="maxEdits">2</int>
>             <int name="minPrefix">1</int>
>             <int name="maxInspections">5</int>
>             <int name="minQueryLength">4</int>
>             <float name="maxQueryFrequency">0.01</float>
>         </lst>
>
>         <lst name="spellchecker">
>             <str name="name">wordbreak</str>
>             <str name="classname">solr.WordBreakSolrSpellChecker</str>
>
>             <str name="field">catch_all_original</str>
>             <str name="combineWords">true</str>
>             <str name="breakWords">true</str>
>             <int name="maxChanges">10</int>
>             <int name="minBreakLength">3</int>
>         </lst>
>
>     </searchComponent>
>
>
> *Is the spellchecker the right solution, or is this a case for something
> else, like the "more like this" feature?*
>
> Thank you
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Suggesting-broken-words-with-solr-WordBreakSolrSpellChecker-tp4182172.html
> Sent from the Solr - User mailing list archive at Nabble.com.



-- 
Fabio Bozzo
SW Engineer

3W s.r.l.
Via Luisetti,7
13900-Biella ( BI )
Tel. 015.84.97.804 / 015.89.76.350
Fax 015.84.70.450

Registro imprese Biella n.01965270026
R.E.A. BI 175416

This e-mail transmission may contain legally privileged and/or confidential
information.
Please do not read it if you are not the intended recipient(s). Any use,
distribution, reproduction or disclosure by any other person is strictly
prohibited. If you have received this e-mail in error, please notify.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Suggesting-broken-words-with-solr-WordBreakSolrSpellChecker-tp4182172p4182263.html
Sent from the Solr - User mailing list archive at Nabble.com.
