Re: Kate Winslet vs Winslet Kate

2015-11-01 Thread Alexandre Rafalovitch
Which is what I believe Ted Sullivan is working on and presented at the latest Lucene/Solr Revolution. His presentation does not seem to be up, but he was writing about it on: http://lucidworks.com/blog/author/tedsullivan/ Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and

Re: Kate Winslet vs Winslet Kate

2015-11-01 Thread Paul Libbrecht
I believe that very many installations of Solr actually need a query expansion such as the one you describe below, with each textual field indexed in multiple forms (string, straight (whitespace/ideograms), stemmed, phonetic). Thanks to edismax, I think, you would do the following
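
The message above is cut off before its concrete suggestion, so what follows is only a rough sketch of the general shape such an edismax request might take, not the poster's actual proposal. `defType` and `qf` are standard Solr request parameters; the field names (`name_exact`, `name_ws`, `name_stemmed`, `name_phonetic`) and the boost values are hypothetical and stand in for the "multiple forms" of one textual field.

```python
from urllib.parse import urlencode

# Hypothetical field names: one copy of the same text per analysis chain.
# The boosts are illustrative assumptions, not recommended values.
FIELD_BOOSTS = {
    "name_exact": 10,     # untokenized string form
    "name_ws": 5,         # whitespace-tokenized form
    "name_stemmed": 2,    # stemmed form
    "name_phonetic": 1,   # phonetic form
}

def edismax_params(user_query, field_boosts=FIELD_BOOSTS):
    """Build request parameters that search every form of the field at once."""
    qf = " ".join(f"{field}^{boost}" for field, boost in field_boosts.items())
    return {"defType": "edismax", "q": user_query, "qf": qf}

params = edismax_params("winslet kate")
print(urlencode(params))
```

Giving the stricter forms higher boosts would let an exact match outrank a merely phonetic one while still returning both.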

Re: Kate Winslet vs Winslet Kate

2015-11-01 Thread Erick Erickson
Yeah, that's actually a tough one. You have no control over what the user types, you have to try to guess what they meant. To do that right, you really have to have some meta-data besides what the user typed in, i.e. recognize "kate" and "winslet" are proper names and "movies" is something else

Re: Kate Winslet vs Winslet Kate

2015-11-01 Thread Daniel Valdivia
Perhaps q=name:("Kate AND Winslet") q=name:("Kate Winslet") Sent from my iPhone > On Oct 31, 2015, at 10:21 PM, Yangrui Guo wrote: > > Thanks for the reply. Putting the name: before the terms did the work. I > just wanted to generalize the search query because users

Invalid parsing with solr edismax operators

2015-11-01 Thread Mahmoud Almokadem
Hello, I'm using Solr 4.8.1. Using edismax as the parser, we get undesirable parsed queries and results. The following are two different cases with strange behavior: Searching with these parameters "mm":"2", "df":"TotalField", "debug":"true", "indent":"true", "fl":"Title",

How turn on logging for segment merging

2015-11-01 Thread Pushkar Raste
Is segment merging information logged at a level finer than INFO? I have an application set up with INFO-level logging, and I am indexing documents at a rate of a few hundred per minute. I am using the default merge policy parameters. However, I never see logs that give me information about segment merging.

Re: Kate Winslet vs Winslet Kate

2015-11-01 Thread Yangrui Guo
Could you tell me more about the edismax approach? I'm new to it. Thanks a lot On Sunday, November 1, 2015, Erick Erickson wrote: > If your goal is to have docs with "kate" and "winslet" > in the _name_ field be scored higher, just make that > explicit as > name:(kate

logical steps to configuring file-based spell-check

2015-11-01 Thread Mark Fenbers
Greetings! I want my spell-checker to be based on a file (/usr/share/dict/linux.words should suffice). Word-break features would also be a benefit. I have previously indexed my docs for searching with minimal alterations to the baseline Solr configuration. My "docs" are user-typed text,

Re: How turn on logging for segment merging

2015-11-01 Thread Tomás Fernández Löbbe
You can turn on "infoStream" from the solrconfig: https://cwiki.apache.org/confluence/display/solr/IndexConfig+in+SolrConfig#IndexConfiginSolrConfig-OtherIndexingSettings Tomás On Sun, Nov 1, 2015 at 8:59 AM, Pushkar Raste wrote: > Is segment merging information logged

Re: Kate Winslet vs Winslet Kate

2015-11-01 Thread Yangrui Guo
I debugged the query and found that it has been translated into _text_:Kate AND _text_:Winslet, where _text_ is the default search field. Because my documents use a parent/child relation, it appeared that if there's no exact match of Kate Winslet, Solr will return all documents containing "Kate" and

Re: Kate Winslet vs Winslet Kate

2015-11-01 Thread Erick Erickson
If your goal is to have docs with "kate" and "winslet" in the _name_ field be scored higher, just make that explicit as name:(kate AND winslet) perhaps boosting as name:(kate AND winslet)^10 or add it as a clause q=kate AND winslet OR name:(kate AND winslet)^10 or even q=kate AND winslet OR
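
A minimal sketch of how the boosted clause suggested above could be assembled and URL-encoded for a request. Only the `name:(kate AND winslet)^10` syntax comes from the message; the helper name `boosted_name_query` and its defaults are illustrative assumptions.

```python
from urllib.parse import urlencode

def boosted_name_query(terms, field="name", boost=10):
    """Build 'a AND b OR field:(a AND b)^boost' from a list of terms.

    Hypothetical helper: requires every term to match, and adds a boosted
    clause for documents that match all terms in the given field.
    """
    conj = " AND ".join(terms)
    return f"{conj} OR {field}:({conj})^{boost}"

q = boosted_name_query(["kate", "winslet"])
params = urlencode({"q": q})  # percent-encodes ':', '(', ')' and '^'
print(q)
print(params)
```

Because the terms are joined with AND, their order does not matter here: "kate winslet" and "winslet kate" produce clauses that match the same documents.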

Re: org.apache.solr.common.SolrException: Document is missing mandatory uniqueKey field: id

2015-11-01 Thread fabigol
Hi, if I understand correctly, in the following configuration:

Re: Kate Winslet vs Winslet Kate

2015-11-01 Thread Yangrui Guo
I've just read the post and it has addressed much of my issue. It is hard to detect phrases and disambiguate phrases but some existing approaches seem really promising. On Sunday, November 1, 2015, Paul Libbrecht wrote: > Alexandre, > > I guess you are talking about that

Re: Question on index time de-duplication

2015-11-01 Thread shamik
That's what I observed as well. Perhaps there's a way to customize SignatureUpdateProcessorFactory to support my use case. I'll look into the source code and figure out if there's a way to do it.

Re: Is it possible to use JiebaTokenizer for multilingual documents?

2015-11-01 Thread Zheng Lin Edwin Yeo
Here's my configuration in schema.xml for the JiebaTokenizerFactory. Could there be any problems that might be causing the English characters issue? Regards, Edwin On 29 October 2015 at 17:51, Zheng Lin Edwin Yeo wrote: > I would

Re: Kate Winslet vs Winslet Kate

2015-11-01 Thread Paul Libbrecht
Alexandre, I guess you are talking about that post: http://lucidworks.com/blog/2015/06/06/query-autofiltering-extended-language-logic-search/ I think it is very often impossible to solve properly. Words such as "direction" have very many meanings and would come in different fields. In IMDB,

Very high memory and CPU utilization.

2015-11-01 Thread Modassar Ather
Hi, I have a 12-shard cluster, with each shard started with 28 GB of memory, on a single server. There are no replicas. The size of the index is around 90 GB on each shard. The Solr version is 5.2.1. When I query "network se*", the memory utilization goes up to 24-26 GB and the query takes around 3+ minutes to

warning

2015-11-01 Thread Midas A
Please explain the following warning: Starting log replay tlog{file=/mnt/vol1/path/data/tlog/tlog.0060544 refcount=2} active=false starting pos=0 Is there any harm in this warning?