Re: Solr Pagination

2015-10-12 Thread Jan Høydahl
Salman, You say that you optimized your index from Admin. You should not do that, however strange it sounds. 70M docs on 2 shards means 35M docs per shard. What you do when you call optimize is to force Lucene to merge all those 35M docs into ONE SINGLE index segment. You get better HW

Re: Indexing logs when using post.jar

2015-10-12 Thread Jan Høydahl
Hi. The answer is no. When you run the tool, you are responsible for redirecting its output to a file yourself if you want to keep it. Also, the tool is mostly meant as a quick way to post docs during development and testing, not for production. A tool built for production would need things like
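A minimal sketch of the redirection Jan describes, assuming the post tool is launched from a Python script. The tool itself only writes to stdout/stderr, so the caller captures both streams into a log file (the same idea as `> post.log 2>&1` in a POSIX shell). The helper name `run_and_log` is hypothetical.

```python
import subprocess
from pathlib import Path


def run_and_log(cmd, log_path):
    """Run cmd (a list of arguments) and append its stdout and stderr to log_path.

    Returns the process exit code so the caller can detect a failed post.
    """
    with Path(log_path).open("a", encoding="utf-8") as log:
        # Merge stderr into stdout so progress messages and errors share one file.
        result = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT, check=False)
    return result.returncode
```

Usage, with placeholder paths you would adjust for your install: `run_and_log(["java", "-jar", "post.jar", "docs/example.xml"], "post.log")`.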

Re: Indexing logs when using post.jar

2015-10-12 Thread Zheng Lin Edwin Yeo
Hi Jan, Thank you for your reply. I've managed to direct the output to a log file. As for production, which tool would you recommend for indexing? Regards, Edwin On 12 October 2015 at 15:36, Jan Høydahl wrote: > Hi > > The answer is no. When you run the tool

Spell Check and Privacy

2015-10-12 Thread Arnon Yogev
Hi, Our system supports many users from different organizations and with different ACLs. We are considering adding spell check ("did you mean") functionality using DirectSolrSpellChecker. However, a privacy concern was raised, as this might lead to private information being revealed between users

Re: Using SimpleNaiveBayesClassifier in solr

2015-10-12 Thread Tommaso Teofili
Hi Yewint, the SNB classifier is not an online one, so you should retrain it every time you want to update it. What you pass to the Classifier is a Reader, therefore you should ensure that it stays accessible (do not close it) for classification to work. Regarding performance SNB becomes

Re: Selective field query

2015-10-12 Thread Colin Hunter
Thanks Erick, I'm sure this will be valuable in implementing ngram filter factory On Fri, Oct 9, 2015 at 4:38 PM, Erick Erickson wrote: > Colin: > > Adding &debug=all to your query is your friend here, the > parsed_query.toString will show you exactly what > is searched

How to formulate query

2015-10-12 Thread Prasanna S. Dhakephalkar
Hi, I am trying to write a Solr search query that returns results as described below, but I have not been able to get it working. I have a search term, say "pit". The result should have (in that order): All docs that have "pit" as first WORD in search field (pit\ *)+ All docs that have first WORD that starts with "pit"

RE: NullPointerException

2015-10-12 Thread Duck Geraint (ext) GBJH
"When I use the Admin UI (v5.3.0), and check the spellcheck.build box" Out of interest, where is this option within the Admin UI? I can't find anything like it in mine... Do you get the same issue by submitting the build command directly with something like this instead:

Re: Solr cross core join special condition

2015-10-12 Thread Ali Nazemian
Thank you very much. Sincerely yours. On Mon, Oct 12, 2015 at 6:15 AM, Susheel Kumar wrote: > Yes, Ali. These are targeted for Solr 6 but you have the option download > source from trunk, build it and try out these features if that helps in the > meantime. > > Thanks >

Re: How to use FuzzyQuery in schema.xml

2015-10-12 Thread Upayavira
The fuzzy query does not need mentioning in schema.xml. A search for Steve~ or Steve~0.5 will trigger a fuzzy query. Upayavira On Sat, Oct 10, 2015, at 08:27 PM, vit wrote: > I am using Solr 4.2 > For some reason I cannot find an example of FuzzyQuery > filter in schema.xml. > Maybe I am in a

Re: Solr Pagination

2015-10-12 Thread Toke Eskildsen
On Mon, 2015-10-12 at 10:05 +0200, Jan Høydahl wrote: > What you do when you call optimize is to force Lucene to merge all > those 35M docs into ONE SINGLE index segment. You get better HW > utilization if you let Lucene/Solr automatically handle merging, > meaning you’ll have around 10 smaller

Re: NullPointerException

2015-10-12 Thread Mark Fenbers
On 10/12/2015 5:38 AM, Duck Geraint (ext) GBJH wrote: "When I use the Admin UI (v5.3.0), and check the spellcheck.build box" Out of interest, where is this option within the Admin UI? I can't find anything like it in mine... This is in the expanded options that open up once I put a checkmark in

Re: Solr cross core join special condition

2015-10-12 Thread Ali Nazemian
Dear Shawn, Hi, Since Yonik's Solr blog mentions that this feature is one of the Solr 5.4 features, I assume it will be back-ported to the next stable release (5.4). Please correct me if that assumption is wrong. Thank you very much. Sincerely yours. On

Re: admin-extra

2015-10-12 Thread Upayavira
Do you use it? If so, how? Upayavira On Mon, Oct 12, 2015, at 02:05 AM, Bill Au wrote: > admin-extra allows one to include additional links and/or information in > the Solr admin main page: > > https://cwiki.apache.org/confluence/display/solr/Core-Specific+Tools > > Bill > > On Wed, Oct 7,

Re: Using SimpleNaiveBayesClassifier in solr

2015-10-12 Thread Alessandro Benedetti
Hi Yewint, > > The sample test code inside seems like that classifier read the whole index > db to train the model everytime when classification happened for > inputDocument. or am I misunderstanding something here? I would suggest you take a look at a couple of articles I wrote last summer

Re: How to use FuzzyQuery in schema.xml

2015-10-12 Thread vit
Thanks Upayavira for the clarification. This works for a one-token query, but when I try it on a multi-token query like "Home Builders~" or "Home Builders~0.5" it does not work.
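A likely explanation, hedged: in the standard Lucene/Solr query parser the `~` operator attaches only to the single term before it, so in `Home Builders~` only `Builders` is fuzzy, and after a quoted phrase (`"Home Builders"~2`) the number is a proximity slop, not an edit distance. The sketch below (hypothetical helper `fuzzify`) adds the operator to every term; it assumes plain alphanumeric terms and does not escape query syntax. With Lucene 4.x, an integer edit distance of 0 to 2 is the usual form.

```python
from typing import Optional


def fuzzify(text: str, max_edits: Optional[int] = None) -> str:
    """Append the fuzzy operator to every whitespace-separated term.

    Assumes plain alphanumeric terms; query-syntax characters are not escaped.
    """
    suffix = "~" if max_edits is None else f"~{max_edits}"
    return " ".join(term + suffix for term in text.split())
```

For example, `fuzzify("Home Builders", 1)` yields `Home~1 Builders~1`, which the parser treats as two independent fuzzy terms.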

Re: catchall fields or multiple fields

2015-10-12 Thread Jack Krupansky
I think it may all depend on the nature of your application and how much commonality there is between fields. One interesting area is auto-suggest: you can certainly suggest from the union of all fields, but you may want to give priority to suggestions from preferred fields. For example, for

Re: Replication and soft commits for NRT searches

2015-10-12 Thread Erick Erickson
First of all, setting soft commit with maxDocs=1 is almost (but not quite) guaranteed to lead to problems. For _every_ document you add to Solr, all your top-level caches (i.e. the ones configured in solrconfig.xml) will be thrown away, all autowarming will be performed etc. Essentially assuming a
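Some rough arithmetic makes Erick's point concrete. Each soft commit opens a new searcher, discarding the top-level caches and triggering autowarming. The indexing rate of 100 docs/second used below is a hypothetical figure, and the maxTime function is an upper bound that assumes documents arrive continuously.

```python
def soft_commits_per_minute_by_docs(docs_per_second: float, max_docs: int) -> float:
    """Soft commits (and new searchers) per minute when autoSoftCommit uses maxDocs."""
    return docs_per_second * 60 / max_docs


def soft_commits_per_minute_by_time(max_time_ms: int) -> float:
    """Upper bound on soft commits per minute when autoSoftCommit uses maxTime,
    assuming documents keep arriving so the timer is always re-armed."""
    return 60_000 / max_time_ms
```

At 100 docs/second, `maxDocs=1` gives 6000 new searchers per minute, while a `maxTime` of 1000 ms caps it at 60, with the same visibility latency of about one second.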

Replication and soft commits for NRT searches

2015-10-12 Thread MOIS Martin (MORPHO)
Hello, I am running Solr 5.2.1 in a cluster with 6 nodes. My collections have been created with replicationFactor=2, i.e. I have one replica for each shard. Beyond that I am using autoCommit/maxDocs=1 and autoSoftCommit/maxDocs=1 in order to achieve near realtime search behavior. As far

Re: No live SolrServers available to handle this request

2015-10-12 Thread Steve
Thanks Mark, I rebuilt and made sure the versions matched. It works. Not sure how that happened tho.. thx. .strick On Thu, Oct 8, 2015 at 4:31 PM, Mark Miller wrote: > Your Lucene and Solr versions must match. > > On Thu, Oct 8, 2015 at 4:02 PM Steve

Re: catchall fields or multiple fields

2015-10-12 Thread Trey Grainger
Elisabeth, Yes, it will almost always be more efficient to search within a catch-all field than to search across multiple fields. Think of it this way: when you search on a single field, you are doing a single keyword search against the index per term. When you search across multiple fields, you

Re: How do I set up custom collection cores?

2015-10-12 Thread espeake
From: Shawn Heisey To: solr-user@lucene.apache.org Date: 10/09/2015 12:33 PM Subject:Re: How do I set up custom collection cores? On 10/9/2015 10:03 AM, espe...@oreillyauto.com wrote: > We are installing Alfresco One 5.0.1 with solr4 on a server that

catchall fields or multiple fields

2015-10-12 Thread elisabeth benoit
Hello, We're using Solr 4.10 and storing all data in a catchall field. It seems to me that one good reason for using a catchall field is when scoring with idf (with idf, a word might not have the same score in all fields). We got rid of idf and are now considering using multiple fields. I

RE: Spell Check and Privacy

2015-10-12 Thread Dyer, James
Arnon, Use "spellcheck.collate=true" with "spellcheck.maxCollationTries" set to a non-zero value. This will give you re-written queries that are guaranteed to return hits, given the original query and filters. If you are using an "mm" value other than 100%, you will also want to specify
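A minimal sketch of the request James describes, assuming each user's ACL is enforced through a filter query. The field name `org_id` and the helper `spellcheck_params` are hypothetical. Because collations are verified against the original query plus its filters, a suggestion drawn from documents the user cannot see should not come back as a collation.

```python
from urllib.parse import urlencode


def spellcheck_params(user_query: str, acl_filter: str, max_tries: int = 5) -> str:
    """Build query-string parameters for a spellcheck request with verified collations."""
    params = [
        ("q", user_query),
        ("fq", acl_filter),  # the user's ACL restriction (hypothetical field)
        ("spellcheck", "true"),
        ("spellcheck.collate", "true"),
        # Solr tries up to this many candidate collations against q + fq
        # and only returns collations that actually produce hits.
        ("spellcheck.maxCollationTries", str(max_tries)),
    ]
    return urlencode(params)
```

Note that this covers collations; the raw per-word suggestions in the response are not filtered this way, so a client concerned about privacy would display only the collations.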

Re: Have anyone used Automatic Phrase Tokenization (AutoPhrasingTokenFilterFactory) ?

2015-10-12 Thread RohanaR
Has this been fixed now so that phrase queries given in double quotes work? I am trying this and encountered the same problem, because the original order of tokens is not preserved in the index. How can I fix this (if it is not fixed yet)? RohanaR

Re: catchall fields or multiple fields

2015-10-12 Thread Ahmet Arslan
Hi, Catch-all field: No need to worry about how to aggregate scores coming from different fields. But you cannot utilize different analysers for different fields. Multiple-fields: You can play with edismax's parameters on-the-fly, without having to re-index. It is flexible in that you can

Fwd: Grouping facets: Possible to get facet results for each Group?

2015-10-12 Thread Peter Sturge
Hello Solr Forum, Been trying to coerce Group faceting to give some faceting back for each group, but maybe this use case isn't catered for in Grouping? So the use case is this: Let's say I do a grouped search that returns, say, 9 distinct groups, and in these groups are various numbers of

Re: Spell Check and Privacy

2015-10-12 Thread Susheel Kumar
Hi Arnon, I couldn't fully understand your use case regarding privacy. Are you concerned that SpellCheck may reveal user names as part of suggestions that could belong to different organizations / ACLs, OR after providing suggestions you are concerned that user may be able to click and view

Re: catchall fields or multiple fields

2015-10-12 Thread Walter Underwood
Why get rid of idf? Most often, idf is a big help in relevance. I’ve used different weights for different parts of the document, like weighting the title 8X the body. I’ve used different weights for different analysis chains. If we have three fields, one lowercased, one stemmed, and one a
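A small sketch of the multiple-field approach with per-field weights, such as title weighted 8X the body, using edismax's `qf` parameter. The field names `title` and `body` and the helper name are hypothetical. As Ahmet notes, the weights live in the request, so they can be tuned without re-indexing.

```python
from urllib.parse import urlencode


def edismax_params(user_query: str, field_weights: dict) -> str:
    """Build edismax request parameters searching several fields with per-field boosts."""
    qf = " ".join(
        field if weight == 1 else f"{field}^{weight}"  # weight 1 needs no explicit boost
        for field, weight in field_weights.items()
    )
    return urlencode({"q": user_query, "defType": "edismax", "qf": qf})
```

`edismax_params("solr pagination", {"title": 8, "body": 1})` produces `q=solr+pagination&defType=edismax&qf=title%5E8+body`.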

Re: How to formulate query

2015-10-12 Thread Erick Erickson
Nothing exists currently that would do this. I would urge you to revisit the requirements; this kind of super-specific ordering is often not worth the effort to enforce. How does the _user_ benefit here? Best, Erick On Mon, Oct 12, 2015 at 12:47 AM, Prasanna S. Dhakephalkar

Re: How to formulate query

2015-10-12 Thread Mikhail Khludnev
Hello, Even the number of words can be used as a scoring factor, but just as a starting point. You can cut the first word into a separate field with a _field mutating update processor_, see http://lucene.apache.org/solr/5_3_1/solr-core/org/apache/solr/update/processor/FieldMutatingUpdateProcessorFactory.html then
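Mikhail suggests doing this inside Solr with a field-mutating update processor. As a swapped-in alternative, the sketch below does the same preparation on the client before documents are sent, which can be handy for prototyping. The field names `name` and `first_word` are hypothetical.

```python
def add_first_word_field(doc: dict, source_field: str = "name",
                         target_field: str = "first_word") -> dict:
    """Return a copy of doc with the lowercased first word of source_field in target_field.

    Documents without the source field (or with an empty one) are returned unchanged.
    """
    words = str(doc.get(source_field, "")).split()
    out = dict(doc)  # leave the caller's document untouched
    if words:
        out[target_field] = words[0].lower()
    return out
```

For example, `{"id": "1", "name": "Pit Stop Garage"}` gains `"first_word": "pit"`, which can then be matched exactly or by prefix at query time.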

AutoComplete Feature in Solr

2015-10-12 Thread Salman Ansari
Hi, I have been trying to get the autocomplete feature in Solr working with no luck up to now. First I read that "suggest component" is the recommended way as in the below article (and this is the exact functionality I am looking for, which is to autocomplete multiple words)

EdgeNGramFilterFactory for phrases

2015-10-12 Thread vit
I use Solr 4.2. I created a field with the following analyzer : for both index and search. Maybe KStem is overkill but I do not think it is important here. On phrase search "Peak physical" it

Re: How to formulate query

2015-10-12 Thread Susheel Kumar
Hi Prasanna, This is a highly custom relevancy/ordering requirement, and one possible approach is to create multiple fields, write a query for each of the searches, and boost them accordingly. Thnx On Mon, Oct 12, 2015 at 12:50 PM, Erick Erickson wrote:
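One way to express Susheel's suggestion is a single query with one boosted clause per tier. The sketch assumes a hypothetical lowercased keyword field `first_word` holding each document's first word and a hypothetical text field `name`. It also assumes a simple alphanumeric search term, because nothing is escaped. Large boost gaps push the tiers apart, but they do not strictly guarantee the order, since other scoring factors still contribute.

```python
def tiered_query(term: str) -> str:
    """Build a Lucene query that ranks exact first-word matches above prefix matches,
    and prefix matches above matches anywhere in the text."""
    t = term.lower()
    clauses = [
        f"first_word:{t}^100",  # tier 1: the first word is exactly the term
        f"first_word:{t}*^10",  # tier 2: the first word starts with the term
        f"name:{t}",            # tier 3: the term appears anywhere in the field
    ]
    return " OR ".join(clauses)
```

`tiered_query("Pit")` yields `first_word:pit^100 OR first_word:pit*^10 OR name:pit`.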

File-based Spelling

2015-10-12 Thread Mark Fenbers
Greetings! I'm attempting to use a file-based spell checker. My sourceLocation is /usr/share/dict/linux.words, and my spellcheckIndexDir is set to ./data/spFile. BuildOnStartup is set to true, and I see nothing to suggest any sort of problem/error in solr.log. However, in my

Re: How do I set up custom collection cores?

2015-10-12 Thread Shawn Heisey
On 10/12/2015 10:31 AM, espe...@oreillyauto.com wrote: > WARNING: A docBase /var/lib/Tomcat7/webapps/solr4.war inside the host > appBase has been specified, and will be ignored That is a Tomcat configuration problem. I googled to see what I could find. It sounds to me like you have specified

Re: File-based Spelling

2015-10-12 Thread Erick Erickson
Let's see your solrconfig entries? Doubtless something innocent seeming isn't quite right. This might provide some clues: http://lucidworks.com/blog/2015/03/04/solr-suggester/ The reference guide is the first place to look, a lot of this functionality has changed in recent years so I always try

Re: AutoComplete Feature in Solr

2015-10-12 Thread Erick Erickson
Some of the links you're looking at are quite old, and a lot has changed, assuming you're on a recent Solr version. It's usually best to look at the Solr reference guide, see: https://cwiki.apache.org/confluence/display/solr/Suggester This might also help:
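A short sketch of a SuggestComponent request as the reference guide describes it, using the `suggest`, `suggest.dictionary`, `suggest.q` and `suggest.count` parameters. The handler path `/suggest`, the dictionary name `mySuggester` and the core URL are assumptions that must match your own solrconfig.xml.

```python
from urllib.parse import urlencode


def suggest_url(base_url: str, prefix: str, dictionary: str = "mySuggester",
                count: int = 5) -> str:
    """Build a suggester request URL for the (assumed) /suggest handler."""
    params = {
        "suggest": "true",
        "suggest.dictionary": dictionary,  # name of the suggester defined in solrconfig.xml
        "suggest.q": prefix,               # multi-word input is allowed, e.g. "home bu"
        "suggest.count": count,
    }
    return f"{base_url}/suggest?{urlencode(params)}"
```

For example, `suggest_url("http://localhost:8983/solr/mycore", "home bu")` targets the hypothetical core `mycore`.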

Re: Grouping facets: Possible to get facet results for each Group?

2015-10-12 Thread Alexandre Rafalovitch
Could you use the new nested facets syntax? http://yonik.com/solr-subfacets/ Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 11 October 2015 at 09:51, Peter Sturge wrote: > Been trying to coerce Group
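A sketch of the nested-facet idea Alexandre links to: instead of `group.field`, a terms facet on the grouping field carries a sub-facet, so every bucket (group) gets its own facet counts. The field names `host` and `status` are hypothetical, and this approximates grouping with faceting rather than extending the Grouping feature itself.

```python
import json
from urllib.parse import urlencode


def nested_facet_params(query: str, group_field: str, sub_field: str, limit: int = 10) -> str:
    """Build request parameters with a json.facet terms facet and a nested sub-facet."""
    facet = {
        "groups": {
            "type": "terms",
            "field": group_field,  # one bucket per distinct value, like one group each
            "limit": limit,
            "facet": {
                # computed separately inside every bucket of the outer facet
                "per_group": {"type": "terms", "field": sub_field},
            },
        }
    }
    return urlencode({"q": query, "rows": 0, "json.facet": json.dumps(facet)})
```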

Re: Highlight with NGram and German S Sharp "ß"

2015-10-12 Thread Scott Stults
My guess is that the boundary scanner isn't configured right for your highlighter. Try setting the hl.bs.language and hl.bs.country parameters either in your request or in the requestHandler. k/r, Scott On Mon, Oct 5, 2015 at 4:57 AM, Jérôme Bernardes wrote: > Dear Solr
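A hedged sketch of Scott's suggestion as request parameters. It assumes the FastVectorHighlighter with the `breakIterator` boundary scanner, which is where the `hl.bs.*` settings apply. FVH also requires term vectors with positions and offsets on the highlighted field. The field name `text_de` and the helper name are hypothetical.

```python
from urllib.parse import urlencode


def german_highlight_params(query: str, field: str) -> str:
    """Request parameters for FVH highlighting with a German-locale breakIterator."""
    params = {
        "q": query,
        "hl": "true",
        "hl.fl": field,
        "hl.useFastVectorHighlighter": "true",
        "hl.boundaryScanner": "breakIterator",  # locale-aware fragment boundaries
        "hl.bs.type": "WORD",
        "hl.bs.language": "de",
        "hl.bs.country": "DE",
    }
    return urlencode(params)
```

Whether this resolves the "ß" offsets with NGram fields is untested here; it only sets the locale that Scott points to.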

Re: are there any SolrCloud supervisors?

2015-10-12 Thread Scott Stults
Something like Exhibitor for Zookeeper? Very cool! Don't worry too much about cleaning up the repo. When it comes time to integrate it with Solr or make it an Apache top-level project you can start with a fresh commit history :) -Scott On Fri, Oct 2, 2015 at 3:09 PM, r b

Re: Selective field query

2015-10-12 Thread Scott Stults
Colin, The other thing you'll want to keep in mind (and you'll find this out with debugQuery) is that the query parser is going to take your ServiceName:(Search Service) and turn it into two queries -- ServiceName:(Search) ServiceName:(Service). That's because the query parser breaks on
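A small sketch of the alternative Scott's explanation implies: quoting the value keeps `Search Service` together as one phrase query instead of two separate clauses, and adding `debug=all` lets you confirm what was actually parsed. Whether a phrase is right for you still depends on the field's analysis chain (for example the ngram filter mentioned earlier). Both helper names are hypothetical.

```python
from urllib.parse import urlencode


def phrase_query(field: str, text: str) -> str:
    """Build field:"text" so the words are matched as one phrase."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')  # escape backslashes and quotes
    return f'{field}:"{escaped}"'


def debug_params(q: str) -> str:
    """Add debug=all so the response includes the parsed query Solr actually ran."""
    return urlencode({"q": q, "debug": "all"})
```

`phrase_query("ServiceName", "Search Service")` yields `ServiceName:"Search Service"`; compare its parsed query in the debug output with that of `ServiceName:(Search Service)`.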

Re: are there any SolrCloud supervisors?

2015-10-12 Thread Trey Grainger
I'd be very interested in taking a look if you post the code. Trey Grainger Co-Author, Solr in Action Director of Engineering, Search & Recommendations @ CareerBuilder On Fri, Oct 2, 2015 at 3:09 PM, r b wrote: > I've been working on something that just monitors ZooKeeper