On Wed, May 11, 2016 at 10:16 AM, Derek Poh <d...@globalsources.com> wrote:

> Hi Erick
>
> Yes, we have identified and fixed the cause of the slow page loading.
>

Derek,
Can you elaborate more? What did you fix?


>
> I was wondering if there are any best practices when it comes to deciding
> whether to create a single collection that stores all the information or
> multiple sub-collections. I understand now that it depends on the use-case.
> My apologies for not giving it much thought before asking the questions.
> Thank you for your patience.
>
> - Derek
>
>
> On 5/10/2016 12:10 PM, Erick Erickson wrote:
>
>> Not quite sure where you are at with this. It sounds
>> like your slow loading is fixed and was a coding
>> issue on your part, that happens to us all.
>>
>> bq: Is it advisable to have as few queries to Solr
>> as possible on a page?
>>
>> Of course it is advisable to have as few Solr queries
>> executed to display a page as possible. Every one
>> costs you at least _some_ turnaround time. You can
>> mitigate this (assuming your Solr server isn't running
>> flat out) by issuing the subsequent queries in parallel
>> threads.
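A minimal sketch of issuing independent follow-up queries in parallel threads. `run_query` is a hypothetical stand-in for whatever Solr client call the page really makes (here it only sleeps to mimic latency), and the collection names are made up. With a thread pool, the page waits roughly as long as the slowest query instead of the sum of all of them.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_query(collection: str, q: str) -> dict:
    """Hypothetical stand-in for a real Solr request; it only sleeps to mimic latency."""
    time.sleep(0.2)
    return {"collection": collection, "q": q}


def run_in_parallel(queries):
    """Run independent (collection, query) pairs concurrently.

    Results come back in the same order as `queries`, so the page can
    still match each response to the section it fills in.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(queries))) as pool:
        futures = [pool.submit(run_query, c, q) for c, q in queries]
        return [f.result() for f in futures]


if __name__ == "__main__":
    queries = [("supplier", "*:*"), ("category", "*:*"), ("banner", "*:*")]
    start = time.perf_counter()
    results = run_in_parallel(queries)
    elapsed = time.perf_counter() - start
    # Sequentially this would take about 0.6 s; in parallel it is close to 0.2 s.
    print(f"{len(results)} queries in {elapsed:.2f}s")
```

This only helps when the queries do not depend on each other's results, and only while the Solr servers have spare capacity.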
>>
>> But to me it's not really a question of advisability, it's a
>> question of what your application needs to deliver. The
>> use-case drives all. You can do some tricks, like displaying
>> partial pages and filling in the rest behind the scenes, ready
>> to display when your user clicks something, and the like.
>>
>> bq: In my case, by denormalizing, that means putting the
>> product and supplier information into one collection?
>> The supplier information is stored but not indexed in the collection.
>>
>> It Depends(tm). If all you want to do is provide supplier
>> information when people do product searches then stored-only
>> is fine.
>>
>> If you want to perform queries like "show me all the products
>> supplied by supplier X", then you need to index at least
>> some values too.
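A sketch of the stored-only versus indexed distinction, expressed as a body for Solr's Schema API (POSTed as JSON to the collection's `/schema` endpoint, assuming the collection uses a managed schema). The supplier field names and the `string` type are hypothetical. Every field is stored so it can be shown on the page; only `supplier_id` is also indexed, which is the minimum needed for a filter such as `fq=supplier_id:X`.

```python
import json

# Hypothetical supplier fields copied into each product document.
SUPPLIER_FIELDS = ["supplier_id", "supplier_name", "supplier_country"]


def supplier_field_payload(searchable):
    """Build a Solr Schema API body that adds the supplier fields.

    Every field is stored so it can be returned for display; only the
    fields listed in `searchable` are also indexed and therefore queryable.
    """
    return {
        "add-field": [
            {"name": name, "type": "string", "stored": True, "indexed": name in searchable}
            for name in SUPPLIER_FIELDS
        ]
    }


if __name__ == "__main__":
    # Display-only supplier info, except supplier_id, which must be indexed
    # to answer "show me all the products supplied by supplier X".
    print(json.dumps(supplier_field_payload({"supplier_id"}), indent=2))
```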
>>
>> Best,
>> Erick
>>
>> On Sun, May 8, 2016 at 10:36 PM, Derek Poh <d...@globalsources.com>
>> wrote:
>>
>>> Hi Erick
>>>
>>> In my case, by denormalizing, that means putting the product and supplier
>>> information into one collection?
>>> The supplier information is stored but not indexed in the collection.
>>>
>>> We have identified that it was a combination of a loop and bad source data
>>> that caused an endless loop under certain scenarios.
>>>
>>> Is it advisable to have as few queries to Solr as possible on a page?
>>>
>>>
>>> On 5/6/2016 11:17 PM, Erick Erickson wrote:
>>>
>>>> Denormalizing the data is usually the first thing to try. That's
>>>> certainly the preferred option if it doesn't bloat the index
>>>> unacceptably.
>>>>
>>>> But my real question is: what have you done to try to figure out _why_
>>>> it's slow? Do you have some loop
>>>> like
>>>> for (each found document)
>>>>      extract all the supplier IDs and query Solr for them
>>>>
>>>> ? That's a fundamental design decision that will be expensive.
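To make the cost of that loop concrete, here is a sketch that only builds request parameters (it sends nothing) for the per-document pattern and for a single batched request. The `supplier_id` field name is hypothetical, and the IDs are assumed to contain no commas. The batched version uses Solr's terms query parser, `{!terms f=...}`, and raises `rows` to the number of IDs because Solr returns only 10 rows by default.

```python
def per_document_requests(product_docs):
    """The loop pattern above: one supplier request per product on the page."""
    return [{"q": "*:*", "fq": "supplier_id:" + d["supplier_id"]} for d in product_docs]


def batched_request(product_docs):
    """A single supplier request for the whole page.

    Uses Solr's terms query parser, {!terms f=...}, which matches any of a
    comma-separated list of values. Duplicate IDs are removed first.
    """
    ids = sorted({d["supplier_id"] for d in product_docs})
    return {"q": "*:*", "fq": "{!terms f=supplier_id}" + ",".join(ids), "rows": len(ids)}


if __name__ == "__main__":
    page = [
        {"id": "p1", "supplier_id": "s1"},
        {"id": "p2", "supplier_id": "s2"},
        {"id": "p3", "supplier_id": "s1"},
    ]
    print(len(per_document_requests(page)), "requests vs 1:", batched_request(page))
```

With a page of N products the loop costs N round trips, while the batched form costs one, regardless of N.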
>>>>
>>>> Have you examined the time each query takes to see if Solr is really
>>>> the bottleneck or whether it's "something else"? Mind you, I have no
>>>> clue what "something else" is here....
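One way to check whether Solr is really the bottleneck is to compare the `QTime` that Solr reports in `responseHeader` (its own processing time, which excludes network transfer) with the wall-clock time the page actually waits. This is a sketch: `fake_solr_call` is a hypothetical stand-in for a real request, and the `ratio` threshold is an arbitrary choice for illustration.

```python
import time


def timed(call):
    """Run `call` (any zero-argument function that performs a Solr request and
    returns the parsed JSON response) and measure wall-clock time in ms."""
    start = time.perf_counter()
    response = call()
    return response, (time.perf_counter() - start) * 1000


def diagnose(response, wall_ms, ratio=3.0):
    """Compare Solr's own QTime with the time the page actually waited.

    `ratio` is an arbitrary threshold chosen for illustration.
    """
    qtime = response["responseHeader"]["QTime"]
    if wall_ms > ratio * max(qtime, 1):
        return f"QTime {qtime} ms, wall {wall_ms:.0f} ms: most time is spent outside Solr"
    return f"QTime {qtime} ms, wall {wall_ms:.0f} ms: Solr itself dominates"


def fake_solr_call():
    """Hypothetical request: Solr reports 2 ms, the rest is simulated overhead."""
    time.sleep(0.05)
    return {"responseHeader": {"status": 0, "QTime": 2}}


if __name__ == "__main__":
    print(diagnose(*timed(fake_solr_call)))
```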
>>>>
>>>> Do you ever return lots of rows (i.e. thousands)?
>>>>
>>>> Solr serves queries very quickly, so I'd concentrate on identifying what
>>>> is slow before jumping to a solution....
>>>>
>>>> Best,
>>>> Erick
>>>>
>>>> On Wed, May 4, 2016 at 10:28 PM, Derek Poh <d...@globalsources.com>
>>>> wrote:
>>>>
>>>>> Hi
>>>>>
>>>>> We have a "product" collection and a "supplier" collection.
>>>>> The "product" collection contains product information and the "supplier"
>>>>> collection contains information about the products' suppliers.
>>>>> We have a subsidiary page that queries the "product" collection for its
>>>>> search. The displayed results include product and supplier information.
>>>>> This page queries the "product" collection to get the matching product
>>>>> records. From that result, the list of the matching products' supplier IDs
>>>>> is extracted and used in a filter query against the "supplier" collection
>>>>> to get the necessary supplier information.
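After the second query returns, the page has to join supplier details back onto the product rows in its own code. The sketch below is a minimal version of that join, assuming both product and supplier documents carry a hypothetical `supplier_id` field. Indexing the suppliers by ID first keeps the merge linear in the number of rows, and a product whose supplier is missing (for example, because of bad source data) is simply left without supplier details.

```python
def attach_suppliers(products, suppliers):
    """Merge supplier details into each product row, keyed by supplier ID.

    Building a dict first makes the merge linear in the number of rows,
    instead of scanning the supplier list once per product.
    """
    by_id = {s["supplier_id"]: s for s in suppliers}
    merged = []
    for product in products:
        row = dict(product)  # copy so the original Solr docs stay untouched
        supplier = by_id.get(product.get("supplier_id"))
        if supplier is not None:  # missing or bad source data: leave it out
            row["supplier"] = supplier
        merged.append(row)
    return merged


if __name__ == "__main__":
    products = [{"id": "p1", "supplier_id": "s1"}, {"id": "p2", "supplier_id": "s9"}]
    suppliers = [{"supplier_id": "s1", "name": "Acme"}]
    for row in attach_suppliers(products, suppliers):
        print(row)
```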
>>>>>
>>>>> The loading of this page is very slow, and at times it leads to timeouts
>>>>> as well. Besides tweaking the page's code, we are also looking at what
>>>>> tuning can be done on the Solr side. Reducing the number of queries
>>>>> generated by this page was one of the options to try.
>>>>>
>>>>> The main "product" collection is also used by our site's main search page
>>>>> and other subsidiary pages, so the query load on it is substantial.
>>>>> It has about 6.5 million documents and an index size of 38-39 GB.
>>>>> It is set up as 1 shard with 5 replicas. Each replica is on its own
>>>>> server, for a total of 5 servers.
>>>>> There are other, smaller collections with a similar 1-shard, 5-replica
>>>>> setup residing on these servers as well.
>>>>>
>>>>> I am thinking of either:
>>>>> 1. Indexing supplier information into the "product" collection.
>>>>> 2. Creating another, similar "product" collection for this page to use.
>>>>> This collection would have fewer product fields and would include the
>>>>> required supplier fields. The number of documents in it would be the same
>>>>> as in the main "product" collection, though the index size would be
>>>>> smaller.
>>>>>
>>>>> With either option we would not need to query the "supplier" collection,
>>>>> so there would be one less query, which will hopefully improve the
>>>>> performance of this page.
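Both options boil down to denormalizing: copying supplier fields into each product document at index time. A minimal sketch of that transformation, with hypothetical field names and a hypothetical `supplier_` prefix to avoid clashes with product fields:

```python
def denormalize(product, supplier, prefix="supplier_"):
    """Return a copy of `product` with the supplier's fields copied in.

    The supplier's own `id` is skipped because the product already carries
    `supplier_id`; every other field gets the `prefix` to avoid name clashes.
    """
    doc = dict(product)
    for key, value in supplier.items():
        if key != "id":
            doc[prefix + key] = value
    return doc


if __name__ == "__main__":
    product = {"id": "p1", "name": "Widget", "supplier_id": "s1"}
    supplier = {"id": "s1", "name": "Acme", "country": "HK"}
    print(denormalize(product, supplier))
```

The trade-off is that supplier fields are repeated in every product from that supplier, which grows the index, and a change to one supplier's details means reindexing all of that supplier's product documents.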
>>>>>
>>>>> Which of the two options would you advise?
>>>>> Any other advice or options?
>>>>>
>>>>> Derek
>>>>>
>>>>> ----------------------
>>>>> CONFIDENTIALITY NOTICE
>>>>> This e-mail (including any attachments) may contain confidential and/or
>>>>> privileged information. If you are not the intended recipient or have
>>>>> received this e-mail in error, please inform the sender immediately and
>>>>> delete this e-mail (including any attachments) from your computer, and
>>>>> you
>>>>> must not use, disclose to anyone else or copy this e-mail (including
>>>>> any
>>>>> attachments), whether in whole or in part.
>>>>> This e-mail and any reply to it may be monitored for security, legal,
>>>>> regulatory compliance and/or other appropriate reasons.
>>>>>
>>>>
>>>>
>>>
>>
>>
>
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
<mkhlud...@griddynamics.com>
