Hi Joshua,

What is the use case?
Do you need the facets for only one field per query?
Do you need all facet values, or only the first 10 with facet.sort=index 
(FACET_SORT_INDEX / numeric order) or with facet.sort=count (FACET_SORT_COUNT)?
How many different facet values do you have per field?
Do you need these fields only for faceted search?
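
To make the questions above concrete, here is a rough sketch of the request 
parameters I have in mind for faceting on a single field. The helper name 
facet_params and the match-all query are only assumptions for illustration; 
facet.field, facet.sort, facet.limit and facet.method are the usual Solr 
faceting parameters.

```python
from urllib.parse import urlencode


def facet_params(field, sort="count", limit=10, method=None):
    """Build a query string that facets on one field (illustrative helper)."""
    if sort not in ("count", "index"):
        raise ValueError("facet.sort must be 'count' or 'index'")
    params = [
        ("q", "*:*"),              # assumption: match all documents
        ("rows", "0"),             # only the facet counts are of interest
        ("facet", "true"),
        ("facet.field", field),
        ("facet.sort", sort),
        ("facet.limit", str(limit)),
    ]
    if method is not None:
        params.append(("facet.method", method))
    return urlencode(params)


# First 10 facet values of one price list, in index order.
query_string = facet_params("priceList1Price", sort="index", limit=10)
print(query_string)
```

The resulting string would be appended to the select URL of your Solr 
instance, one request per price list field.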


Your problem will be that Solr normally keeps an int[searcher.maxDoc()] array in 
main memory for each field you facet on.
You can avoid this with facet.method=enum, but that will probably not fit your case.
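
A back-of-the-envelope estimate of what those arrays cost, using the numbers 
from this thread (about 150,000 products, 1,500 to 5,000 price list fields). 
This is only a lower bound under the assumption of 4 bytes per int; it ignores 
the term values themselves and any JVM overhead, and it assumes every price 
list field is actually faceted on at some point.

```python
BYTES_PER_INT = 4  # size of a Java int


def facet_array_bytes(max_doc, faceted_fields):
    """Bytes needed for one int[maxDoc] array per faceted field."""
    return max_doc * BYTES_PER_INT * faceted_fields


max_doc = 150_000  # roughly the number of products mentioned in the thread
for fields in (1, 1_500, 5_000):
    size = facet_array_bytes(max_doc, fields)
    print(f"{fields:>5} fields: {size / 1e6:,.1f} MB")
```

So a single field is cheap (about 0.6 MB), but 5,000 fields held in memory at 
the same time would need about 3 GB for the arrays alone.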

Because you do not have multiple tokens per document, your facets will be computed by 
SimpleFacets#getFieldCacheCounts. In version 3.1 you will find a TODO there that fits 
your needs :-(
In this method you will also see that it indirectly uses a WeakHashMap, 
so if you only use 100 fields per hour you should not have a problem :-)
But there will be no warm-up for your application (the first facet search on a 
field will take a while).

From my point of view, you should write your own Solr plugin for your 
purpose. This is not so hard, I assure you.

Best regards
  Karsten



-------- Joshua

> Name equals the product name. 
> 
> Each separate product can have 1 to n prices based upon pricelist.
> 
> A single document represents that single product.
> 
> <doc>
>       <field name="id">1</field>
>       <field name="name">The product name.</field>
>       <field name="price">1.00</field>
>       <field name="priceList1Price">0.99</field>
>       <field name="priceList2Price">0.98</field>
>       <field name="priceList1500Price">0.85</field>
> </doc>
> <doc>
>       <field name="id">2</field>
>       <field name="name">The product name.</field>
>       <field name="price">1.10</field>
>       <field name="priceList1Price">1.09</field>
>       <field name="priceList2Price">1.08</field>
>       <field name="priceList1500Price">1.05</field>
> </doc>
> 
> Yes, the number of pricelists could grow from 1000 to 5000 as the user
> base grows.
> 
> There are currently about 150,000 products.
> 
> We do need to index the products, since they change frequently.
> 
> Thanks everyone for all your responses so far!!!!!
> 
> -----Original Message-----
> From: kenf_nc [mailto:ken.fos...@realestate.com] 
> Sent: Wednesday, April 13, 2011 1:15 PM
> To: solr-user@lucene.apache.org
> Subject: RE: Indexing Question for large dataset
> 
> Is NAME a product name? Why would it be multivalue? And why would it appear
> on more than one document?  Is each 'document' a package of products? And
> the pricing tiers are on the package, not individual pieces?
> 
> So sounds like you could, potentially, have a PriceListX column for each
> user. As your User base grows, the number of columns you need may grow (you
> already bumped up from 2000 to 5000 in the space of a couple posts :) ). Is
> that right?
> 
> How many products (or packages of products) do you have? Could you flip this
> on its ear and make a User the document. Then it could have just 3
> multivalue fields (beyond any you need to identify the user like user_id)
>     product_id
>     product_name
>     product_price
> 
> Downside is if a new product is introduced you have to re-index all users
> that have a price point on that product.  
> 
> 
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Indexing-Question-for-large-dataset-tp2816344p2816994.html
> Sent from the Solr - User mailing list archive at Nabble.com.
