On Wed, May 11, 2016 at 10:16 AM, Derek Poh <d...@globalsources.com> wrote:
> Hi Erick
>
> Yes, we have identified and fixed the slow page loading.
>

Derek,
Can you elaborate more? What did you fix?

>
> I was wondering if there are any best practices when it comes to deciding
> whether to create a single collection that stores all the information or
> to create multiple sub-collections. I understand now that it depends on
> the use case.
> My apologies for not giving it much thought before asking the questions.
> Thank you for your patience.
>
> - Derek
>
>
> On 5/10/2016 12:10 PM, Erick Erickson wrote:
>
>> Not quite sure where you are at with this. It sounds
>> like your slow loading is fixed and was a coding
>> issue on your part; that happens to us all.
>>
>> bq: Is it advisable to have as few queries to Solr
>> in a page as possible?
>>
>> Of course it is advisable to have as few Solr queries
>> executed to display a page as possible. Every one
>> costs you at least _some_ turnaround time. You can
>> mitigate this (assuming your Solr server isn't running
>> flat out) by issuing the subsequent queries in parallel
>> threads.
>>
>> But it's not really a question to me of advisability, it's a
>> question of what your application needs to deliver. The
>> use-case drives all. You can do some tricks like display
>> partial pages and fill in the rest behind the scenes to
>> display when your user clicks something and the like.
>>
>> bq: In my case, by denormalizing, that means putting the
>> product and supplier information into one collection?
>> The supplier information is stored but not indexed in the collection.
>>
>> It Depends(tm). If all you want to do is provide supplier
>> information when people do product searches then stored-only
>> is fine.
>>
>> If you want to perform queries like "show me all the products
>> supplied by supplier X", then you need to index at least
>> some values too.
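
To make the "parallel threads" suggestion concrete, here is a minimal sketch. It assumes the queries are independent of each other (a supplier lookup that needs the product results first cannot be parallelized this way). `run_query` is a hypothetical stand-in that only sleeps to simulate a round trip; it does not talk to Solr, and the query names are made up for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_query(name, latency_s=0.05):
    """Stand-in for one Solr request (hypothetical; no real Solr call is made)."""
    time.sleep(latency_s)  # simulated round-trip time
    return f"{name}: ok"


def fetch_all(query_names, max_workers=4):
    """Issue independent queries concurrently and return results keyed by name."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each name with its result.
        return dict(zip(query_names, pool.map(run_query, query_names)))


if __name__ == "__main__":
    names = ["products", "facets", "related"]  # hypothetical page queries
    start = time.perf_counter()
    results = fetch_all(names)
    elapsed = time.perf_counter() - start
    print(results)
    print(f"elapsed: {elapsed:.2f}s for {len(names)} queries")
```

With real requests the page would wait roughly as long as the slowest query rather than the sum of all of them, provided the Solr servers have spare capacity, as noted above.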
>>
>> Best,
>> Erick
>>
>> On Sun, May 8, 2016 at 10:36 PM, Derek Poh <d...@globalsources.com>
>> wrote:
>>
>>> Hi Erick
>>>
>>> In my case, by denormalizing, that means putting the product and supplier
>>> information into one collection?
>>> The supplier information is stored but not indexed in the collection.
>>>
>>> We have identified that it was a combination of a loop and bad source
>>> data that caused an endless loop under certain scenarios.
>>>
>>> Is it advisable to have as few queries to Solr in a page as possible?
>>>
>>>
>>> On 5/6/2016 11:17 PM, Erick Erickson wrote:
>>>
>>>> Denormalizing the data is usually the first thing to try. That's
>>>> certainly the preferred option if it doesn't bloat the index
>>>> unacceptably.
>>>>
>>>> But my real question is what have you done to try to figure out _why_
>>>> it's slow? Do you have some loop
>>>> like
>>>> for (each found document)
>>>>     extract all the supplier IDs and query Solr for them
>>>>
>>>> ? That's a fundamental design decision that will be expensive.
>>>>
>>>> Have you examined the time each query takes to see if Solr is really
>>>> the bottleneck or whether it's "something else"? Mind you, I have no
>>>> clue what "something else" is here....
>>>>
>>>> Do you ever return lots of rows (i.e. thousands)?
>>>>
>>>> Solr serves queries very quickly, so I'd concentrate on identifying what
>>>> is slow before jumping to a solution....
>>>>
>>>> Best,
>>>> Erick
>>>>
>>>> On Wed, May 4, 2016 at 10:28 PM, Derek Poh <d...@globalsources.com>
>>>> wrote:
>>>>
>>>>> Hi
>>>>>
>>>>> We have a "product" collection and a "supplier" collection.
>>>>> The "product" collection contains product information and the
>>>>> "supplier" collection contains the products' supplier information.
>>>>> We have a subsidiary page that queries the "product" collection for the
>>>>> search.
>>>>> The displayed result includes product and supplier information.
>>>>> This page will query the "product" collection to get the matching
>>>>> product records.
>>>>> From this query, a list of the matching products' supplier IDs is
>>>>> extracted and used in a filter query against the "supplier" collection
>>>>> to get the necessary supplier information.
>>>>>
>>>>> The loading of this page is very slow; it leads to timeouts at times as
>>>>> well.
>>>>> Besides looking at tweaking the code of the page, we are also looking
>>>>> at what tweaking can be done on the Solr side. Reducing the number of
>>>>> queries generated by this page was one of the options to try.
>>>>>
>>>>> The main "product" collection is also used by our site's main search
>>>>> page and other subsidiary pages as well, so the query load on it is
>>>>> substantial.
>>>>> It has about 6.5 million documents and an index size of 38-39 GB.
>>>>> It is set up as 1 shard with 5 replicas. Each replica is on its own
>>>>> server.
>>>>> Total of 5 servers.
>>>>> There are other smaller collections with a similar 1 shard, 5 replicas
>>>>> setup residing on these servers as well.
>>>>>
>>>>> I am thinking of either
>>>>> 1. Index supplier information into the "product" collection.
>>>>> 2. Create another similar "product" collection for this page to use.
>>>>> This collection will have fewer product fields and will include the
>>>>> required supplier fields. But the number of documents in it will be the
>>>>> same as the main "product" collection. The index size will be smaller
>>>>> though.
>>>>>
>>>>> With either of the 2 options we do not need to query the "supplier"
>>>>> collection. So there is one less query and hopefully it will improve
>>>>> the performance of this page.
>>>>>
>>>>> Which of the 2 options would you advise?
>>>>> Any other advice or options?
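
Both options above come down to copying supplier fields onto each product document before it is indexed. A minimal sketch of that merge step follows, under stated assumptions: the thread does not give the schema, so `id`, `supplier_id`, `supplier_name` and `supplier_country` are hypothetical field names, and the documents are plain dicts rather than anything sent to Solr.

```python
def denormalize(products, suppliers,
                supplier_fields=("supplier_name", "supplier_country")):
    """Copy selected supplier fields onto each product document.

    All field names are hypothetical; the real schema is not shown in the thread.
    """
    by_id = {s["id"]: s for s in suppliers}  # one lookup table, no per-product query
    merged = []
    for product in products:
        doc = dict(product)  # copy so the caller's document is not mutated
        supplier = by_id.get(product.get("supplier_id"))
        if supplier is not None:
            for field in supplier_fields:
                if field in supplier:
                    doc[field] = supplier[field]
        merged.append(doc)
    return merged


if __name__ == "__main__":
    suppliers = [{"id": "S1", "supplier_name": "Acme", "supplier_country": "SG"}]
    products = [
        {"id": "P1", "name": "Widget", "supplier_id": "S1"},
        {"id": "P2", "name": "Gadget", "supplier_id": "S9"},  # unknown supplier
    ]
    for doc in denormalize(products, suppliers):
        print(doc)
```

Whether the copied fields are stored-only or also indexed is a separate schema decision. The trade-off of either option is that a change to one supplier means re-indexing every product document that carries its fields.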
>>>>>
>>>>> Derek
>>>>>
>>>>> ----------------------
>>>>> CONFIDENTIALITY NOTICE
>>>>> This e-mail (including any attachments) may contain confidential and/or
>>>>> privileged information. If you are not the intended recipient or have
>>>>> received this e-mail in error, please inform the sender immediately and
>>>>> delete this e-mail (including any attachments) from your computer, and
>>>>> you must not use, disclose to anyone else or copy this e-mail
>>>>> (including any attachments), whether in whole or in part.
>>>>> This e-mail and any reply to it may be monitored for security, legal,
>>>>> regulatory compliance and/or other appropriate reasons.
>>>>>

--
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
<mkhlud...@griddynamics.com>