That was just one example. Users can also add phrase searches with wildcards, proximity operators, etc., so we can't really rely on stemming.
Sharding is definitely something we are already looking into.

On Wed, Mar 19, 2014 at 6:59 PM, Erick Erickson <erickerick...@gmail.com> wrote:

> Yes, that'll be slow. Wildcards are, at best, interesting and at worst
> resource consumptive. Especially when you're doing this kind of
> positioning information as well.
>
> Consider looking at the problem sideways. That is, what is your
> purpose in searching for, say, "buy*"? You want to find buy, buying,
> buyers, etc? Would you get better results if you just stemmed and
> omitted the wildcards?
>
> Do you have a restricted vocabulary that would allow you to define
> synonyms for the "important" words and all their variants at index
> time and use that?
>
> Finally, of course, you could shard your index (or add more shards if
> you're already sharding) if you really _must_ support these kinds of
> queries and can't work around the problem.
>
> Best,
> Erick
>
> On Tue, Mar 18, 2014 at 11:21 PM, Salman Akram
> <salman.ak...@northbaysolutions.net> wrote:
> > Anyone?
> >
> > On Mon, Mar 17, 2014 at 12:03 PM, Salman Akram
> > <salman.ak...@northbaysolutions.net> wrote:
> >
> >> Below is one of the sample slow queries, which takes minutes!
> >>
> >> ((stock or share*) w/10 (sale or sell* or sold or bought or buy* or
> >> purchase* or repurchase*)) w/10 (executive or director)
> >>
> >> If a filter is used it comes in fq but what can be done about plain
> >> keyword search?
> >>
> >> On Sun, Mar 16, 2014 at 4:37 AM, Erick Erickson
> >> <erickerick...@gmail.com> wrote:
> >>
> >>> What are your complex queries? You say that your app will very
> >>> rarely see the same query thus you aren't using caches... But, if
> >>> you can move some of your clauses to fq clauses, then the
> >>> filterCache might well be used to good effect.
> >>>
> >>> On Thu, Mar 13, 2014 at 7:22 AM, Salman Akram
> >>> <salman.ak...@northbaysolutions.net> wrote:
> >>> > 1- SOLR 4.6
> >>> > 2- We do but right now I am talking about plain keyword queries
> >>> > just sorted by date. Once this is better will start looking into
> >>> > caches which we already changed a little.
> >>> > 3- As I said the contents are not stored in this index. Some
> >>> > other metadata fields are but with normal queries it's super fast
> >>> > so I guess even if I change there it will be a minor difference.
> >>> > We have SSD and quite fast too.
> >>> > 4- That's something we need to do but even in low workload those
> >>> > queries take a lot of time
> >>> > 5- Every 10 mins and currently no auto warming as user queries
> >>> > are rarely same and also once it's fully warmed those queries are
> >>> > still slow.
> >>> > 6- Nope.
> >>> >
> >>> > On Thu, Mar 13, 2014 at 5:38 PM, Dmitry Kan <solrexp...@gmail.com>
> >>> > wrote:
> >>> >
> >>> >> 1. What is your solr version? In 4.x family the proximity
> >>> >> searches have been optimized among other query types.
> >>> >> 2. Do you use the filter queries? What is the situation with the
> >>> >> cache utilization ratios? Optimize (i.e. bump up the respective
> >>> >> cache sizes) if you have low hit ratios and many evictions.
> >>> >> 3. Can you avoid storing some fields and only index them? When
> >>> >> the field is stored and it is retrieved in the result, there are
> >>> >> a couple of disk seeks per field => search slows down. Consider
> >>> >> SSD disks.
> >>> >> 4. Do you monitor your system in terms of RAM / cache stats / GC?
> >>> >> Do you observe STW GC pauses?
> >>> >> 5. How often do you commit & do you have the autowarming /
> >>> >> external warming configured?
> >>> >> 6. If you use faceting, consider storing DocValues for facet
> >>> >> fields.
> >>> >>
> >>> >> some solr wiki docs:
> >>> >>
> >>> >> https://wiki.apache.org/solr/SolrPerformanceProblems?highlight=%28%28SolrPerformanceFactors%29%29
> >>> >>
> >>> >> On Thu, Mar 13, 2014 at 8:52 AM, Salman Akram
> >>> >> <salman.ak...@northbaysolutions.net> wrote:
> >>> >>
> >>> >> > Well some of the searches take minutes.
> >>> >> >
> >>> >> > Below are some stats about this particular index that I am
> >>> >> > talking about:
> >>> >> >
> >>> >> > Index size = 400GB (Using CommonGrams so without that the
> >>> >> > index is around 180GB)
> >>> >> > Position File = 280GB
> >>> >> > Total Docs = 170 million (just indexed for searching - for
> >>> >> > highlighting contents are stored in another index)
> >>> >> > Avg Doc Size = Few hundred KBs
> >>> >> > RAM = 384GB (it has other indexes too but still OS cache can
> >>> >> > have 60-80% of the total index cached)
> >>> >> >
> >>> >> > Phrase queries run pretty fast with CG but complex versions of
> >>> >> > wildcard and proximity queries can be really slow. I know
> >>> >> > using CG will make them slow but they just take too long. By
> >>> >> > default sorting is on date but users have a few other
> >>> >> > parameters too on which they can sort.
> >>> >> >
> >>> >> > I wanted to avoid creating multiple indexes (maybe based on
> >>> >> > years) but it seems that to search on partial data that's the
> >>> >> > only feasible way.
> >>> >> >
> >>> >> > On Wed, Mar 12, 2014 at 2:47 PM, Dmitry Kan
> >>> >> > <solrexp...@gmail.com> wrote:
> >>> >> >
> >>> >> > > As Hoss pointed out above, different projects have different
> >>> >> > > requirements. Some want to sort by date of ingestion
> >>> >> > > reverse, which means that having posting lists organized in
> >>> >> > > a reverse order with the early termination is the way to go
> >>> >> > > (no such feature in Solr directly). Some other projects want
> >>> >> > > to collect all docs matching a query, and then sort by rank,
> >>> >> > > but you cannot guarantee that the most recently inserted
> >>> >> > > document is the most relevant in terms of your ranking.
> >>> >> > >
> >>> >> > > Do your current searches take too long?
> >>> >> > >
> >>> >> > > On Tue, Mar 11, 2014 at 11:51 AM, Salman Akram
> >>> >> > > <salman.ak...@northbaysolutions.net> wrote:
> >>> >> > >
> >>> >> > > > It's a long video and I will definitely go through it but
> >>> >> > > > it seems this is not possible with SOLR as it is?
> >>> >> > > >
> >>> >> > > > I just thought it would be quite a common issue; I mean
> >>> >> > > > generally for search engines it's more important to show
> >>> >> > > > the first page results, rather than using timeAllowed
> >>> >> > > > which might not even return a single result.
> >>> >> > > >
> >>> >> > > > Thanks!
> >>> >> > > >
> >>> >> > > > --
> >>> >> > > > Regards,
> >>> >> > > >
> >>> >> > > > Salman Akram
> >>> >> > >
> >>> >> > > --
> >>> >> > > Dmitry
> >>> >> > > Blog: http://dmitrykan.blogspot.com
> >>> >> > > Twitter: http://twitter.com/dmitrykan
> >>> >> >
> >>> >> > --
> >>> >> > Regards,
> >>> >> >
> >>> >> > Salman Akram
> >>> >>
> >>> >> --
> >>> >> Dmitry
> >>> >> Blog: http://dmitrykan.blogspot.com
> >>> >> Twitter: http://twitter.com/dmitrykan
> >>> >
> >>> > --
> >>> > Regards,
> >>> >
> >>> > Salman Akram
> >>
> >> --
> >> Regards,
> >>
> >> Salman Akram
> >
> > --
> > Regards,
> >
> > Salman Akram

--
Regards,

Salman Akram
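
For reference, the fq and timeAllowed ideas discussed in this thread could be combined roughly as follows. This is only a minimal sketch, not how our application actually builds requests: the helper `build_solr_params`, the field names `year`, `doctype`, and `date`, and the example values are all hypothetical, and the keyword query is a simplified stand-in for the slow proximity query quoted above. `q`, `fq`, `sort`, `rows`, and `timeAllowed` are standard Solr request parameters.

```python
from urllib.parse import urlencode


def build_solr_params(keywords, filters=(), time_allowed_ms=None):
    """Build the parameter list for a Solr /select request.

    keywords        -- the user's keyword query, sent as q
    filters         -- restricting clauses, each sent as its own fq so
                       Solr can cache them separately in the filterCache
    time_allowed_ms -- optional cap in milliseconds; Solr may return
                       partial (possibly empty) results when it is hit
    """
    params = [
        ("q", keywords),
        ("sort", "date desc"),  # "date" is a hypothetical field name
        ("rows", "10"),         # only the first page of results
    ]
    for clause in filters:
        params.append(("fq", clause))
    if time_allowed_ms is not None:
        params.append(("timeAllowed", str(time_allowed_ms)))
    return params


# Hypothetical example: restrict by year and document type via fq
# instead of putting those clauses into the main query.
params = build_solr_params(
    "stock AND (executive OR director)",
    filters=["year:2013", "doctype:filing"],
    time_allowed_ms=5000,
)
query_string = urlencode(params)
print(query_string)
```

Keeping each restriction in its own fq means a filter such as `year:2013` can be reused from the filterCache across otherwise different user queries, even when the q part is rarely the same twice. It does not make the wildcard and proximity part itself cheaper.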