Status of Spelt integration

2009-11-30 Thread Andrey Klochkov
Hi all, I searched through the mailing-list archives and saw that some time ago Toby Cole was going to integrate a spellchecker named Spelt into Solr. Does anyone know what the status of this is? Has anyone tried to use it with Solr? Does it make sense to try it instead of the standard spell checker? Some

Re: Retrieving large num of docs

2009-11-28 Thread Andrey Klochkov
Hi Raghu, Let me describe our use case in more detail; that will probably clarify things. The usual use case for Lucene/Solr is retrieving a small portion of the result set (10-20 documents). In our case we need to read the whole result set, and this creates a huge load on the Lucene index, meaning a

Re: restore space between words by spell checker

2009-11-28 Thread Andrey Klochkov
For example, if we get the query tommyhitfiger and have the terms tommy and hitfiger in the index, how can we fix the query? The usual approach to solving this is to index compound words, i.e. when producing a spellchecker dictionary, add a record tommyhitfiger with a field that points to tommy
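The compound-word approach described in this message can be sketched roughly as follows. This is only an illustration under stated assumptions, not Solr's spellchecker API: the class CompoundDictionary and its methods are hypothetical, and it assumes that each compound record points to the spaced form of two adjacent terms.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch (not part of Solr or Lucene): a dictionary that maps
 * run-together compounds such as "tommyhitfiger" to their spaced form.
 */
final class CompoundDictionary {
    // Assumption: each record maps the concatenated compound to the spaced phrase.
    private final Map<String, String> compoundToPhrase = new HashMap<>();

    /** Registers every pair of adjacent terms in a phrase as a compound record. */
    void addPhrase(List<String> terms) {
        for (int i = 0; i + 1 < terms.size(); i++) {
            String first = terms.get(i);
            String second = terms.get(i + 1);
            compoundToPhrase.put(first + second, first + " " + second);
        }
    }

    /** Returns the spaced form for a known compound, or the query unchanged. */
    String restoreSpaces(String query) {
        return compoundToPhrase.getOrDefault(query, query);
    }
}
```

A lookup like this only handles exact compounds; a real spellchecker dictionary would presumably also apply fuzzy matching to the compound record itself.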

restore space between words by spell checker

2009-11-27 Thread Andrey Klochkov
Hi, If a user issues a misspelled query, forgetting to put a space between words, is it possible to fix it with a spell checker or by some other mechanism? For example, if we get the query tommyhitfiger and have the terms tommy and hitfiger in the index, how can we fix the query? -- Andrew Klochkov Senior

WordDelimiterFilter and acronyms normalization

2009-11-26 Thread Andrey Klochkov
Hi all! Is there a ready-to-use filter that performs acronym normalization, such as I.N.C. -> INC? I see that Lucene's StandardFilter can do this, but we can't use it because we're using WhitespaceTokenizer instead of StandardTokenizer. -- Andrew Klochkov Senior Software Engineer, Grid Dynamics
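The kind of normalization asked about here could be sketched as a plain string function applied to each whitespace-separated token. This is a minimal sketch, not a Solr or Lucene filter: the class AcronymNormalizer is hypothetical, and it assumes an acronym is a token made up entirely of single letters, each followed by a dot.

```java
import java.util.regex.Pattern;

/** Hypothetical helper (not a Lucene class): strips dots from dotted acronyms. */
final class AcronymNormalizer {
    // Assumption: an acronym is two or more "letter + dot" pairs and nothing else.
    private static final Pattern DOTTED_ACRONYM = Pattern.compile("(?:\\p{L}\\.){2,}");

    /** Returns the token without dots if it is a dotted acronym, else the token unchanged. */
    static String normalize(String token) {
        if (DOTTED_ACRONYM.matcher(token).matches()) {
            return token.replace(".", "");
        }
        return token;
    }
}
```

Ordinary words with a trailing period do not match the pattern, so only tokens shaped like I.N.C. are rewritten. Wiring this logic into an analysis chain after WhitespaceTokenizer would require a custom token filter, which is left out here.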

Re: Retrieving large num of docs

2009-11-26 Thread Andrey Klochkov
Hi, We retrieve ALL documents for every query; the index size is about 50k. We use a number of stored fields. The result set is often several thousand docs. We did the following to make it faster: 1. Use EmbeddedSolrServer 2. Patch Solr to avoid unnecessary marshalling while

Re: Get one document from each category

2009-11-24 Thread Andrey Klochkov
Hi, I think you need field collapsing; look here: http://wiki.apache.org/solr/FieldCollapsing 2009/11/24 Tomasz Kępski tom...@kepski.pl Hi, I have the following case: In my index I do have documents categorized (category_id - int sortable field). I would like to get three top documents

Re: Huge load and long response times during search

2009-11-23 Thread Andrey Klochkov
Tom, AFAIK Lucene performance depends heavily on file system cache size when the index is large. So if you see lots of IO, this probably means that your system doesn't have enough memory to hold a file system cache large enough for your index. In this case you don't need to give more

Re: lucid kstem group and artifact id to put in POM

2009-11-04 Thread Andrey Klochkov
Hi, Just install it manually with mvn install. On Wed, Nov 4, 2009 at 1:13 AM, darniz rnizamud...@edmunds.com wrote: Hello Right now we are using lucid Kstemmer and it works fine and the two jars required lucid-kstem.jar and lucid-solr-kstem.jar are present in our web app. i am trying to

Re: lock issue

2009-05-28 Thread Andrey Klochkov
Even if you point multiple embedded solr servers to the same index, you should write with only one. Once you do a commit on the writer, you'd need But that's what Lucene index locking is for, isn't it? Locking should handle that issue. Doesn't Solr use index locking properly? -- Andrew

Re: query clause and filter query

2009-05-20 Thread Andrey Klochkov
Read the "Consider using filters" section here: http://wiki.apache.org/lucene-java/ImproveSearchingSpeed On Wed, May 20, 2009 at 10:24 AM, Ashish P ashish.ping...@gmail.com wrote: what is the difference between query clause and filter query?? Thanks, Ashish -- View this message in context:

Re: How to retrieve all available Cores in a static way ?

2009-05-20 Thread Andrey Klochkov
AFAIK there's no way of getting it in a static way. If you look into SolrDispatchFilter.java, you'll see these lines: // put the core container in request attribute req.setAttribute("org.apache.solr.CoreContainer", cores); So later in your servlet you can get this request attribute. I do it in this

Re: Search in all the fields q=(*:test)

2009-05-19 Thread Andrey Klochkov
I suppose that when you use * as the field name, Solr searches in the default search field, since Lucene doesn't support searching across several fields, as far as I know. Read here: http://wiki.apache.org/solr/SchemaXml#head-b80c539a0a01eef8034c3776e49e8fe1c064f496 On Tue, May 19, 2009 at 5:46 PM,

Re: Solr vs Sphinx

2009-05-14 Thread Andrey Klochkov
My most recent example of this is BooleanQuery's performance. It turns out, if you setAllowDocsOutOfOrder(true), it yields a sizable performance gain (27% on my most recent test) for OR queries. Mike, Can you please point me to some information concerning allowDocsOutOfOrder? What's this

Re: Restarting tomcat deletes all Solr indexes

2009-05-12 Thread Andrey Klochkov
Hi, I know that on startup Solr checks whether the index directory exists and creates a fresh index if it doesn't. Does that help? If not, the next step I'd take in your case is patching the SolrCore.initIndex method (insert some logging), or running EmbeddedSolrServer under a debugger, etc. On Mon, May 11,

Re: Creating new QParserPlugin

2009-05-07 Thread Andrey Klochkov
Hi! I agree that Solr is difficult to extend in many cases. We just patch Solr, and I guess many other users patch it too. What I propose is to create a Solr community site (Solr incubator?) to publish patches there, and the Solr core team could then look there and choose patches to apply to the

Re: Last modified time for cores, taking into account uncommitted changes

2009-05-04 Thread Andrey Klochkov
Hi, I use SolrIndexReader.isCurrent() for this purpose. On Fri, May 1, 2009 at 1:42 AM, James Brady james.colin.br...@gmail.com wrote: Hi, The lastModified field the Solr status seems to only be updated when a commit/optimize operation takes place. Is there any way to determine when a core

function query scoring

2009-04-29 Thread Andrey Klochkov
Hi! Based on the docs in the wiki, I thought that the following query should return a constant score of 5 for all socks in the index: http://localhost:8080/solr/select?q=name:socks _val_:5&fl=name,score But in fact it finds all the products in the index, and it seems that socks products have a higher score

Re: function query scoring

2009-04-29 Thread Andrey Klochkov
On Wed, Apr 29, 2009 at 6:44 PM, Umar Shah u...@wisdomtap.com wrote: On Wed, Apr 29, 2009 at 7:16 PM, Andrey Klochkov akloch...@griddynamics.com wrote: Hi! Base on docs in the wiki I thought that the following query should return constant score 5 for all socks in the index: http

Re: Solr Performance bottleneck

2009-04-28 Thread Andrey Klochkov
On Mon, Apr 27, 2009 at 10:27 PM, Jon Bodner jbod...@blackboard.com wrote: Trying to point multiple Solrs on multiple boxes at a single shared directory is almost certainly doomed to failure; the read-only Solrs won't know when the read/write Solr instance has updated the index. I'm

Re: Solr Performance bottleneck

2009-04-28 Thread Andrey Klochkov
On Tue, Apr 28, 2009 at 3:18 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Hi, You should probably just look at the index version number to figure out if the name changed. If you are looking at segments.gen, you are looking at a file that may not exist in Lucene in the future.

different scoring for different types of found documents

2009-04-09 Thread Andrey Klochkov
Hi, We have a quite complex requirement concerning scoring logic customization, but I guess it's quite useful and something like it has probably been done already. So we're searching through the product catalog. Products have types (e.g. Electronics, Apparel, Furniture etc). What we need is to

external fields storage

2009-03-24 Thread Andrey Klochkov
Hi Solr users, Our index could be much smaller if we could store some of the fields not in the index directly but in some kind of external storage. All I've found so far is the ExternalFileField class, which shows that it's possible to implement such a storage, but I'm quite sure that the requirement is

Re: external fields storage

2009-03-24 Thread Andrey Klochkov
Our index could be much smaller if we could store some of the fields not in the index directly but in some kind of external storage. All I've found so far is the ExternalFileField class, which shows that it's possible to implement such a storage, but I'm quite sure that the requirement is common and

Re: external fields storage

2009-03-24 Thread Andrey Klochkov
On Tue, Mar 24, 2009 at 4:43 PM, Mark Miller markrmil...@gmail.com wrote: Thats a tall order. It almost sounds as if you want to be able to not use the index to store fields, but have them still fully functional as if indexed. That would be quite the magic trick. Look here, people wanted

Re: alternative lucene directories support

2009-03-20 Thread Andrey Klochkov
://return new SolrIndexSearcher(this, schema, main, IndexReader.open(FSDirectory.getDirectory(getIndexDir()), readOnly), true, false); Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Andrey Klochkov akloch...@griddynamics.com

alternative lucene directories support

2009-03-19 Thread Andrey Klochkov
Hi all, We want to use Solr with a Lucene Directory implementation that places the index in a Coherence data grid. In fact I managed to run Solr in such a configuration, although I had to patch it. I think that the issue about alternate directories support (SOLR-465) should be re-opened because there are