Hi Joshua,

what is the use case? Do you need the facets for only one field (per query)? Do you need all facet values, or only the first 10 with facet.sort=index (FACET_SORT_INDEX, index order) or with facet.sort=count (FACET_SORT_COUNT)? How many different facet values do you have per field? Do you need these fields only for faceted search?
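As a rough illustration of the request parameters the questions above refer to, here is a minimal sketch that builds a facet query for a single price field. The field name comes from the example documents in this thread; the helper name `facet_params` and the choice of `q=*:*` with `rows=0` are assumptions made only for the example.

```python
from urllib.parse import urlencode


def facet_params(field, sort="count", limit=10, method=None):
    """Build the query string for faceting on one field (illustrative helper)."""
    params = [
        ("q", "*:*"),          # assumption: match all documents
        ("rows", "0"),         # assumption: only facet counts are wanted
        ("facet", "true"),
        ("facet.field", field),
        ("facet.sort", sort),  # "index" or "count"
        ("facet.limit", str(limit)),
    ]
    if method is not None:
        # Per-field override of the facet method, e.g. "enum" or "fc".
        params.append((f"f.{field}.facet.method", method))
    return urlencode(params)


# First 10 facet values of one price list, in index order.
query = facet_params("priceList1Price", sort="index", limit=10)
print(query)
```

The resulting string would be appended to the select handler URL of your own Solr instance. Whether index order or count order is the right choice depends on the answers to the questions above.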
Your problem will be that Solr normally keeps an int[searcher.maxDoc()] array in main memory for each field with facets. You can avoid this by using facet.method=enum, which probably does not fit your case. Because your fields do not have multiple tokens per document, your facets will be computed by SimpleFacets#getFieldCacheCounts. In version 3.1 you will find a TODO there that fits your needs :-( In this method you will also see that it indirectly uses a WeakHashMap, so if you only use 100 fields per hour you should not have a problem :-) But there will be no warm-up for your application (the first facet search will take a while).

From my point of view you should write your own Solr plugin for your purpose. This is not so hard, I assure you.

Best regards
Karsten

-------- Joshua
> Name equals the product name.
>
> Each separate product can have 1 to n prices based upon pricelist.
>
> A single document represents that single product.
>
> <doc>
> <field name="id">1</field>
> <field name="name">The product name.</field>
> <field name="price">1.00</field>
> <field name="priceList1Price">0.99</field>
> <field name="priceList2Price">0.98</field>
> <field name="priceList1500Price">0.85</field>
> </doc>
> <doc>
> <field name="id">2</field>
> <field name="name">The product name.</field>
> <field name="price">1.10</field>
> <field name="priceList1Price">1.09</field>
> <field name="priceList2Price">1.08</field>
> <field name="priceList1500Price">1.05</field>
> </doc>
>
> Yes, the amount of pricelist could grow from 1000 to 5000 given the user base grows.
>
> There are currently about 150,000 products.
>
> We do need to index the products, since they change frequently.
>
> Thanks everyone for all your responses so far!!!!!
>
> -----Original Message-----
> From: kenf_nc [mailto:ken.fos...@realestate.com]
> Sent: Wednesday, April 13, 2011 1:15 PM
> To: solr-user@lucene.apache.org
> Subject: RE: Indexing Question for large dataset
>
> Is NAME a product name? Why would it be multivalue?
> And why would it appear on more than one document? Is each 'document' a package of products? And the pricing tiers are on the package, not individual pieces?
>
> So sounds like you could, potentially, have a PriceListX column for each user. As your User base grows, the number of columns you need may grow (you already bumped up from 2000 to 5000 in the space of a couple posts :) ). Is that right?
>
> How many products (or packages of products) do you have? Could you flip this on its ear and make a User the document. Then it could have just 3 multivalue fields (beyond any you need to identify the user like user_id)
> product_id
> product_name
> product_price
>
> Downside is if a new product is introduced you have to re-index all users that have a price point on that product.
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Indexing-Question-for-large-dataset-tp2816344p2816994.html
> Sent from the Solr - User mailing list archive at Nabble.com.
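
To put numbers on the memory concern raised in the reply (one int[searcher.maxDoc()] array per faceted field), here is a back-of-the-envelope sketch. It assumes maxDoc is roughly the 150,000 products mentioned in the thread and counts only 4 bytes per document per field. It ignores deleted documents and any per-field value tables, so real usage would differ. The function name is made up for the example.

```python
def field_cache_bytes(max_doc, num_fields, bytes_per_entry=4):
    """Rough size of one int-per-document array for each faceted field."""
    return max_doc * bytes_per_entry * num_fields


MAX_DOC = 150_000  # assumption: about one document per product

# Compare the "100 fields per hour" case with 1000 and 5000 pricelist fields.
for num_fields in (100, 1000, 5000):
    mib = field_cache_bytes(MAX_DOC, num_fields) / (1024 * 1024)
    print(f"{num_fields} faceted fields: about {mib:.0f} MiB")
```

Under these assumptions a single field costs about 600 KB, while faceting on all 5000 pricelist fields at once would need roughly 3 GB. That is why it matters how many fields are actually faceted on in a given period.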