This was just one example. Users can also add phrase searches with
wildcards, proximity, etc., so we can't really use stemming.

Sharding is definitely something we are already looking into.
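Splitting the index by year (or any date range) only pays off if queries that carry a date restriction skip the irrelevant pieces. Below is a minimal sketch of choosing per-year cores for one distributed request. The `shards` request parameter is standard Solr distributed search. The host, the `docs_<year>` core naming scheme and the `date` sort field are hypothetical placeholders.

```python
from urllib.parse import urlencode

# Hypothetical host and per-year core naming scheme; adjust to the real deployment.
SOLR_HOST = "localhost:8983/solr"
CORE_PREFIX = "docs_"


def shards_for_years(first_year: int, last_year: int) -> str:
    """Return a comma-separated `shards` value covering only the cores in range."""
    if first_year > last_year:
        raise ValueError("first_year must not be after last_year")
    cores = [f"{SOLR_HOST}/{CORE_PREFIX}{year}"
             for year in range(first_year, last_year + 1)]
    return ",".join(cores)


def build_params(query: str, first_year: int, last_year: int) -> str:
    """Build a query string whose distributed search touches only those years."""
    return urlencode({
        "q": query,
        "shards": shards_for_years(first_year, last_year),
        "sort": "date desc",  # hypothetical date field name
        "rows": 10,
    })


if __name__ == "__main__":
    print(shards_for_years(2012, 2014))
    print(build_params("executive director", 2013, 2014))
```

A query without a date restriction would still fan out to every core. The gain comes only from requests that can be narrowed.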


On Wed, Mar 19, 2014 at 6:59 PM, Erick Erickson <erickerick...@gmail.com> wrote:

> Yes, that'll be slow. Wildcards are, at best, interesting and, at worst,
> resource-intensive, especially when you're also using positional
> information like this.
>
> Consider looking at the problem sideways. That is, what is your
> purpose in searching for, say, "buy*"? You want to find buy, buying,
> buyers, etc.? Would you get better results if you just stemmed and
> omitted the wildcards?
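The cost of "buy*" comes from query rewriting. The prefix is expanded into every indexed term that starts with it, and with proximity operators the positions of each expanded term must then be read. The toy sketch below runs over a small, made-up sorted term list and uses `bisect` to find the prefix range. The suffix stripper is purely illustrative and is not Solr's Porter or KStem filter.

```python
import bisect

# Made-up sorted term dictionary standing in for an index's term list.
TERMS = sorted([
    "buy", "buyback", "buyer", "buyers", "buying", "buyout", "buys",
    "sale", "sell", "seller", "selling", "sells",
    "share", "shared", "shareholder", "shares", "sharing", "stock", "stocks",
])


def expand_prefix(prefix: str, terms: list[str]) -> list[str]:
    """Return every term starting with `prefix`, i.e. what a trailing * matches."""
    start = bisect.bisect_left(terms, prefix)
    end = start
    while end < len(terms) and terms[end].startswith(prefix):
        end += 1
    return terms[start:end]


def toy_stem(word: str) -> str:
    """Illustrative suffix stripper, NOT a real stemming algorithm."""
    for suffix in ("ing", "ers", "er", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


if __name__ == "__main__":
    expanded = expand_prefix("buy", TERMS)
    print(expanded)                                 # 7 terms, each with its own positions
    print(sorted({toy_stem(t) for t in expanded}))  # stemming collapses most of them
```

Stemming folds buy, buying, buyers and buys into a single indexed term. Words like buyback and buyout stay distinct, which is one reason wildcards and stemming are not interchangeable.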
>
> Do you have a restricted vocabulary that would allow you to define
> synonyms for the "important" words and all their variants at index
> time and use that?
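With a restricted vocabulary, index-time synonyms amount to rewriting each known variant to one canonical token before indexing, so a query needs only a single term lookup. In Solr this would normally be done with a synonyms file and a synonym filter in the field's analyzer. Explicit-mapping lines in that file look like `buying, buys, buyer, buyers, bought => buy`. The sketch below only imitates the effect in plain Python, using a hypothetical variant table.

```python
# Hypothetical variant table (canonical token -> variants to fold into it).
CANONICAL = {
    "buy": ["buying", "buys", "buyer", "buyers", "bought"],
    "sell": ["selling", "sells", "seller", "sellers", "sold", "sale"],
}

# Invert to variant -> canonical so each token is a single dict lookup.
VARIANT_TO_CANONICAL = {
    variant: canonical
    for canonical, variants in CANONICAL.items()
    for variant in variants
}


def analyze(text: str) -> list[str]:
    """Lowercase, split on whitespace, and map known variants to canonical tokens."""
    tokens = text.lower().split()
    return [VARIANT_TO_CANONICAL.get(tok, tok) for tok in tokens]


if __name__ == "__main__":
    print(analyze("Director bought shares and sold stock"))
```

Each token is replaced one-for-one, so token positions, and therefore proximity distances, are unchanged by the mapping.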
>
> Finally, of course, you could shard your index (or add more shards if
> you're already sharding) if you really _must_ support these kinds of
> queries and can't work around the problem.
>
> Best,
> Erick
>
> On Tue, Mar 18, 2014 at 11:21 PM, Salman Akram
> <salman.ak...@northbaysolutions.net> wrote:
> > Anyone?
> >
> >
> > On Mon, Mar 17, 2014 at 12:03 PM, Salman Akram <
> > salman.ak...@northbaysolutions.net> wrote:
> >
> >> Below is one of the sample slow queries; it takes minutes!
> >>
> >> ((stock or share*) w/10 (sale or sell* or sold or bought or buy* or
> >> purchase* or repurchase*)) w/10 (executive or director)
> >>
> >> If a filter is used, it goes in fq, but what can be done about plain
> >> keyword searches?
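Every `w/10` in a query like this requires comparing term positions inside each candidate document, which is why position data dominates the cost. The `w/10` syntax looks application-specific, so this sketch assumes it means "within 10 positions, in either order". It checks one document using a two-pointer merge over sorted position lists, and the positions are made up. A nested clause such as `(A w/10 B) w/10 C` would need span start/end positions rather than single term positions, and that case is not covered here.

```python
def within(pos_a: list[int], pos_b: list[int], max_dist: int) -> bool:
    """True if some position in pos_a is within max_dist of some position in pos_b.

    Both lists must be sorted ascending, as positions are stored in an index.
    Term order does not matter, matching an unordered reading of "w/N".
    """
    i = j = 0
    while i < len(pos_a) and j < len(pos_b):
        if abs(pos_a[i] - pos_b[j]) <= max_dist:
            return True
        # Advance the pointer that is behind; only that can shrink the gap.
        if pos_a[i] < pos_b[j]:
            i += 1
        else:
            j += 1
    return False


def any_within(groups_a: list[list[int]], groups_b: list[list[int]], max_dist: int) -> bool:
    """OR'ed or wildcard-expanded terms (e.g. stock OR share*) contribute several lists."""
    merged_a = sorted(p for plist in groups_a for p in plist)
    merged_b = sorted(p for plist in groups_b for p in plist)
    return within(merged_a, merged_b, max_dist)


if __name__ == "__main__":
    # Positions of "stock" and "sale" in one made-up document.
    print(within([3, 40], [12], 10))
    print(within([3, 40], [15, 90], 10))
```

Every term produced by a wildcard adds another position list to merge, in every candidate document, so the work grows with both the expansion size and the number of documents.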
> >>
> >>
> >> On Sun, Mar 16, 2014 at 4:37 AM, Erick Erickson <erickerick...@gmail.com> wrote:
> >>
> >>> What are your complex queries? You
> >>> say that your app will very rarely see the
> >>> same query thus you aren't using caches...
> >>> But, if you can move some of your
> >>> clauses to fq clauses, then the filterCache
> >>> might well be used to good effect.
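Clauses that restrict results without needing to affect scoring can go in `fq`. Each distinct filter is cached in the filterCache and reused across otherwise different queries. Below is a minimal sketch of building such a request. `q`, `fq`, `sort` and `rows` are standard Solr request parameters. The field names (`doc_type`, `date`) and values are hypothetical.

```python
from urllib.parse import urlencode


def build_request(keywords: str, filters: list[str],
                  sort: str = "date desc", rows: int = 10) -> str:
    """Put the free-text part in q and each reusable restriction in its own fq."""
    params = [("q", keywords)]
    # One fq per clause: each is cached independently, so a clause shared by many
    # different user queries can be served from the filterCache after first use.
    params += [("fq", f) for f in filters]
    params += [("sort", sort), ("rows", str(rows))]
    return urlencode(params)


if __name__ == "__main__":
    print(build_request("executive director",
                        ["doc_type:filing", "date:[2010-01-01T00:00:00Z TO *]"]))
```

This helps only when the same filter clauses recur across requests. Free-text keyword clauses that rarely repeat gain little from being moved.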
> >>>
> >>>
> >>>
> >>> On Thu, Mar 13, 2014 at 7:22 AM, Salman Akram
> >>> <salman.ak...@northbaysolutions.net> wrote:
> >>> > 1- Solr 4.6
> >>> > 2- We do, but right now I am talking about plain keyword queries, just
> >>> > sorted by date. Once this is better, we will start looking into caches,
> >>> > which we have already changed a little.
> >>> > 3- As I said, the contents are not stored in this index. Some other
> >>> > metadata fields are, but normal queries are super fast, so I guess even
> >>> > if I change that it will make only a minor difference. We have SSDs,
> >>> > and quite fast ones too.
> >>> > 4- That's something we need to do, but even under low load those
> >>> > queries take a lot of time.
> >>> > 5- Every 10 minutes, and currently there is no autowarming, as user
> >>> > queries are rarely the same; also, even once it's fully warmed, those
> >>> > queries are still slow.
> >>> > 6- No.
> >>> >
> >>> > On Thu, Mar 13, 2014 at 5:38 PM, Dmitry Kan <solrexp...@gmail.com> wrote:
> >>> >
> >>> >> 1. What is your Solr version? In the 4.x family, proximity searches
> >>> >> have been optimized, among other query types.
> >>> >> 2. Do you use filter queries? What do your cache utilization (hit)
> >>> >> ratios look like? Optimize (i.e., bump up the respective cache sizes)
> >>> >> if you have low hit ratios and many evictions.
> >>> >> 3. Can you avoid storing some fields and only index them? When a field
> >>> >> is stored and retrieved in the result, there are a couple of disk seeks
> >>> >> per field, so search slows down. Consider SSD disks.
> >>> >> 4. Do you monitor your system in terms of RAM / cache stats / GC? Do
> >>> >> you observe stop-the-world (STW) GC pauses?
> >>> >> 5. How often do you commit, and do you have autowarming / external
> >>> >> warming configured?
> >>> >> 6. If you use faceting, consider storing DocValues for facet fields.
> >>> >>
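The hit-ratio check in point 2 is simple arithmetic on the per-cache counters that Solr's admin statistics report (`lookups`, `hits`, `evictions`). Below is a minimal sketch. The thresholds and sample numbers are made-up assumptions, not recommendations.

```python
from dataclasses import dataclass


@dataclass
class CacheStats:
    name: str
    lookups: int
    hits: int
    evictions: int

    @property
    def hit_ratio(self) -> float:
        # Guard against a cache that has not been queried yet.
        return self.hits / self.lookups if self.lookups else 0.0


def should_grow(stats: CacheStats, min_ratio: float = 0.5,
                max_evictions: int = 1000) -> bool:
    """Flag a cache whose hit ratio is low AND which is evicting heavily.

    A low ratio with few evictions often just means queries don't repeat, in
    which case a bigger cache would not help. Thresholds are hypothetical.
    """
    return stats.hit_ratio < min_ratio and stats.evictions > max_evictions


if __name__ == "__main__":
    filter_cache = CacheStats("filterCache", lookups=20000, hits=6000, evictions=9000)
    print(round(filter_cache.hit_ratio, 2), should_grow(filter_cache))
```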
> >>> >> Some Solr wiki docs:
> >>> >>
> >>> >> https://wiki.apache.org/solr/SolrPerformanceProblems?highlight=%28%28SolrPerformanceFactors%29%29
> >>> >>
> >>> >> On Thu, Mar 13, 2014 at 8:52 AM, Salman Akram <
> >>> >> salman.ak...@northbaysolutions.net> wrote:
> >>> >>
> >>> >> > Well, some of the searches take minutes.
> >>> >> >
> >>> >> > Below are some stats about the particular index I am talking about:
> >>> >> >
> >>> >> > Index size = 400GB (using CommonGrams; without it the index is around
> >>> >> > 180GB)
> >>> >> > Position file = 280GB
> >>> >> > Total docs = 170 million (indexed only for searching; for
> >>> >> > highlighting, the contents are stored in another index)
> >>> >> > Avg doc size = a few hundred KB
> >>> >> > RAM = 384GB (the server has other indexes too, but the OS cache can
> >>> >> > still hold 60-80% of the total index)
> >>> >> >
> >>> >> > Phrase queries run pretty fast with CommonGrams (CG), but complex
> >>> >> > wildcard and proximity queries can be really slow. I know CG will make
> >>> >> > them slower, but they just take too long. By default, results are
> >>> >> > sorted by date, but users have a few other fields they can sort on too.
> >>> >> >
> >>> >> > I wanted to avoid creating multiple indexes (maybe based on years), but
> >>> >> > it seems that, to search on partial data, that's the only feasible way.
> >>> >> >
> >>> >> >
> >>> >> >
> >>> >> >
> >>> >> > On Wed, Mar 12, 2014 at 2:47 PM, Dmitry Kan <solrexp...@gmail.com> wrote:
> >>> >> >
> >>> >> > > As Hoss pointed out above, different projects have different
> >>> >> > > requirements. Some want to sort by reverse ingestion date, which
> >>> >> > > means that having posting lists organized in reverse order with
> >>> >> > > early termination is the way to go (there is no such feature in Solr
> >>> >> > > directly). Other projects want to collect all docs matching a query
> >>> >> > > and then sort by rank, but you cannot guarantee that the most
> >>> >> > > recently inserted document is the most relevant in terms of your
> >>> >> > > ranking.
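The reverse-ordered posting list idea can be shown with a toy model. Assume document IDs are assigned so that a lower ID means a newer document. This is an assumption; Solr does not do it out of the box. Under that assumption, a "newest first, top N" query can stop as soon as N matches are found instead of visiting every match. Below is a minimal sketch of an AND of two made-up posting lists with that early exit.

```python
# Toy postings: term -> doc IDs in ascending order. ASSUMPTION: docs were
# indexed so that a lower ID means a newer document (reverse ingestion order).
POSTINGS = {
    "stock": [1, 4, 7, 9, 12, 15, 20],
    "director": [2, 4, 9, 11, 15, 18, 20],
}


def top_n_newest(term_a: str, term_b: str, n: int) -> tuple[list[int], int]:
    """Intersect two posting lists, stopping once n matches are collected.

    Returns the matching doc IDs (newest first) and the number of merge steps
    performed, to show that early termination skips the tail of both lists.
    """
    a, b = POSTINGS[term_a], POSTINGS[term_b]
    i = j = steps = 0
    hits: list[int] = []
    while i < len(a) and j < len(b) and len(hits) < n:
        steps += 1
        if a[i] == b[j]:
            hits.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return hits, steps


if __name__ == "__main__":
    print(top_n_newest("stock", "director", 2))    # stops early
    print(top_n_newest("stock", "director", 100))  # walks both lists fully
```

The early exit is valid only because the list order matches the requested sort. For relevance ranking, as noted above, all matches still have to be scored.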
> >>> >> > >
> >>> >> > >
> >>> >> > > Do your current searches take too long?
> >>> >> > >
> >>> >> > >
> >>> >> > > On Tue, Mar 11, 2014 at 11:51 AM, Salman Akram <
> >>> >> > > salman.ak...@northbaysolutions.net> wrote:
> >>> >> > >
> >>> >> > > > It's a long video and I will definitely go through it, but it seems
> >>> >> > > > this is not possible with Solr as it is?
> >>> >> > > >
> >>> >> > > > I just thought it would be quite a common issue; I mean, generally
> >>> >> > > > for search engines it's more important to show the first page of
> >>> >> > > > results than to use timeAllowed, which might not return even a
> >>> >> > > > single result.
> >>> >> > > >
> >>> >> > > > Thanks!
> >>> >> > > >
> >>> >> > > >
> >>> >> > > > --
> >>> >> > > > Regards,
> >>> >> > > >
> >>> >> > > > Salman Akram
> >>> >> > > >
> >>> >> > >
> >>> >> > >
> >>> >> > >
> >>> >> > > --
> >>> >> > > Dmitry
> >>> >> > > Blog: http://dmitrykan.blogspot.com
> >>> >> > > Twitter: http://twitter.com/dmitrykan
> >>> >> > >
> >>> >> >
> >>> >>
> >>> >>
> >>> >
> >>> >
> >>>
> >>
> >>
> >
>



-- 
Regards,

Salman Akram
