Re: word frequency counting

2010-08-13 Thread Greg Gershman
Absolutely! Index your documents, then open an IndexReader and take a look at the terms() method. You can grab each term, and pass it to the IndexReader using the docFreq(Term t) method and get back the number of documents that term appears in. Greg From: S
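
To make the idea concrete, here is a minimal plain-Java sketch of what "the number of documents that term appears in" means. It is an in-memory model, not Lucene code: the class and method names are hypothetical, and tokenizing on whitespace is an assumption made only for illustration. Against a real index, the terms() and docFreq(Term t) calls mentioned above would supply the same numbers.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Plain-Java model of per-term document frequency; hypothetical, not Lucene's API.
final class DocFreqSketch {
    // Returns term -> number of documents that contain the term at least once.
    static Map<String, Integer> docFreq(List<String> docs) {
        Map<String, Integer> freq = new TreeMap<>();
        for (String doc : docs) {
            // Track terms already seen in this document so repeats count once.
            Set<String> seen = new HashSet<>();
            for (String token : doc.toLowerCase().split("\\s+")) {
                if (!token.isEmpty() && seen.add(token)) {
                    freq.merge(token, 1, Integer::sum);
                }
            }
        }
        return freq;
    }
}
```

Note that this counts documents, not occurrences: a term repeated five times in one document still contributes 1.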

Re: Negative Filtering (such as for profanity)

2007-03-07 Thread Greg Gershman
ase, but you could also have it update a table in a db or something.) Then on search, you could just say (regular query) +profanity:false HTH, Grant On Mar 7, 2007, at 10:07 AM, Greg Gershman wrote: > I'm attempting to create a profanity filter. I thought to use a > QueryFilter

Re: Negative Filtering (such as for profanity)

2007-03-07 Thread Greg Gershman
One point: if you use stemming, or some other modification of the terms before indexing, you'll need to make sure the terms you create to match against are also stemmed. Greg - Original Message From: Greg Gershman <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent
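
A small sketch of that point, in plain Java rather than Lucene: the same normalization has to be applied to the blocklist as to the indexed text, or the two sides never meet. TOY_STEM is a deliberately crude, hypothetical stand-in for whatever stemmer the analyzer really uses, and all names here are assumptions.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.UnaryOperator;

// Hypothetical illustration: blocklist terms must pass through the index-time stemmer.
final class StemmedBlocklistSketch {
    // Toy stand-in for a real stemmer: lowercases and strips one trailing "s".
    static final UnaryOperator<String> TOY_STEM = w -> {
        String lower = w.toLowerCase();
        return lower.length() > 3 && lower.endsWith("s")
                ? lower.substring(0, lower.length() - 1)
                : lower;
    };

    // Applies the given stemmer to every word and collects the results.
    static Set<String> normalizeAll(List<String> words, UnaryOperator<String> stem) {
        Set<String> out = new HashSet<>();
        for (String w : words) {
            out.add(stem.apply(w));
        }
        return out;
    }

    // True if any stemmed token of the text is in the given blocklist.
    static boolean containsBlocked(String text, Set<String> blocklist, UnaryOperator<String> stem) {
        for (String token : text.split("\\s+")) {
            if (!token.isEmpty() && blocklist.contains(stem.apply(token))) {
                return true;
            }
        }
        return false;
    }
}
```

With a blocklist built through normalizeAll, the text "he darns socks" matches a blocked "Darns"; with the raw, unstemmed blocklist it does not, because the text side has already been reduced to "darn".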

Re: Negative Filtering (such as for profanity)

2007-03-07 Thread Greg Gershman
n front of a NOT query clause. Greg Gershman wrote: > I'm attempting to create a profanity filter. I thought to use a QueryFilter > created with a Query of (-$#!+ AND [EMAIL PROTECTED] AND etc). The problem I > have run into is that, as a pure negative query is not supported (a

Negative Filtering (such as for profanity)

2007-03-07 Thread Greg Gershman
I'm attempting to create a profanity filter. I thought to use a QueryFilter created with a Query of (-$#!+ AND [EMAIL PROTECTED] AND etc). The problem I have run into is that, as a pure negative query is not supported (a query for (-term) DOES NOT return the inverse of a query for (term)), I b
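
The behaviour described here can be modelled with plain sets, as a hedged sketch rather than Lucene's actual implementation: prohibited clauses only subtract from whatever the positive clauses matched, so a query with no positive clause has nothing to subtract from. The class and method names are hypothetical. The usual way out is to supply a positive clause that matches every document (Lucene's MatchAllDocsQuery, where the version in use provides it, or a field that every document carries).

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical set-based model of required vs. prohibited clauses; not Lucene code.
final class NegativeClauseSketch {
    // Prohibited clauses can only remove ids from what the positive side matched.
    static Set<Integer> search(Set<Integer> positiveMatches, Set<Integer> prohibitedMatches) {
        Set<Integer> result = new TreeSet<>(positiveMatches);
        result.removeAll(prohibitedMatches);
        return result;
    }
}
```

Passing an empty positive set yields an empty result no matter what is prohibited; passing the set of all document ids yields the true inverse.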

Re: Help with mass delete from large index

2006-02-14 Thread Greg Gershman
at least the NPE in compareTo (I don't recall the > rest of the stack). > Have you tried debugging this? I suppose the Term > field or value is null somehow... not sure why. > > Otis > P.S. > Deleting files - don't :) > > - Original Message >

Re: Size + memory restrictions

2006-02-14 Thread Greg Gershman
You may consider incrementally adding documents to your index; I'm not sure why there would be problems adding to an existing index, but you can always add additional documents. You can optimize later to get everything back into a single segment. Querying is a different story; if you are using th

Re: Help with mass delete from large index

2006-02-13 Thread Greg Gershman
niel Naber <[EMAIL PROTECTED]> wrote: > On Montag 13 Februar 2006 19:42, Greg Gershman > wrote: > > > I'm still wondering if anyone has any thoughts on > the > > NullPointerException and/or the delete/optimize > > problems I'm having. They seem to be ve

Re: Help with mass delete from large index

2006-02-13 Thread Greg Gershman
Thanks, that is the way things will be done in the future. I'm still wondering if anyone has any thoughts on the NullPointerException and/or the delete/optimize problems I'm having. They seem to be very real issues. Greg --- "Michael D. Curtin" <[EMAIL PROTECTED]> wro

Re: Help with mass delete from large index

2006-02-13 Thread Greg Gershman
I'm open to other suggestions as to how to approach this. Also, I neglected to mention I'm using version 1.4.3. Greg --- "Michael D. Curtin" <[EMAIL PROTECTED]> wrote: > Greg Gershman wrote: > > > I'm trying to delete a large number of documents &g

Help with mass delete from large index

2006-02-13 Thread Greg Gershman
onfused, and the only other option I can think of is to reindex the documents I need, which would take much longer than deleting the ones I don't. Thanks! Greg Gershman

Revisiting FieldCacheImpl

2005-09-29 Thread Greg Gershman
Our search engine updates frequently, adding and removing documents from the index. After an index update, we create a new Searcher in the background, and execute a search against it to "prime" the sorting by fields. The new Searcher is swapped for the old. From my understanding, this is a fair
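
The warm-then-swap routine described above might look roughly like the following. This is a generic sketch under stated assumptions, not the poster's code: WarmSwapSketch and its methods are hypothetical names, the type parameter S stands in for the Searcher, and the warm-up callback stands in for the priming search.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

// Hypothetical holder that publishes a new searcher only after it has been warmed.
final class WarmSwapSketch<S> {
    private final AtomicReference<S> current;

    WarmSwapSketch(S initial) {
        current = new AtomicReference<>(initial);
    }

    // Readers always get the most recently published instance.
    S get() {
        return current.get();
    }

    // Runs the warm-up against the fresh instance, then swaps it in atomically.
    // Returns the previous instance so the caller can close it when safe.
    S warmAndSwap(S fresh, Consumer<S> warmUp) {
        warmUp.accept(fresh);
        return current.getAndSet(fresh);
    }
}
```

Because the warm-up runs before the swap, in-flight searches keep using the old instance and never pay the priming cost themselves.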

Re: TermDocs.freq()

2005-09-29 Thread Greg Gershman
Save user queries in a database along with number of results from last time queried, use that as suggestion base. Notice that Google's result count in Suggest differs from the actual result count. They are not computing results on the fly. Greg --- Jérôme BENOIS <[EMAIL PROTECTED]> wrote: > He
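
As a rough illustration of the suggestion above, the sketch below keeps past queries with the result count seen the last time each was run, and answers prefix lookups from that store without touching the index. An in-memory TreeMap stands in for the database, and every name here is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical suggestion store: past queries plus their last known result counts.
final class QuerySuggestSketch {
    private final TreeMap<String, Integer> lastCounts = new TreeMap<>();

    // Remembers (or overwrites) the result count observed for a query.
    void record(String query, int resultCount) {
        lastCounts.put(query.toLowerCase(), resultCount);
    }

    // Returns stored queries starting with the prefix, each with its cached count.
    List<String> suggest(String prefix) {
        String p = prefix.toLowerCase();
        List<String> out = new ArrayList<>();
        // Sorted keys let us start at the prefix and stop at the first non-match.
        for (Map.Entry<String, Integer> e : lastCounts.tailMap(p, true).entrySet()) {
            if (!e.getKey().startsWith(p)) {
                break;
            }
            out.add(e.getKey() + " (" + e.getValue() + ")");
        }
        return out;
    }
}
```

The counts shown are whatever was recorded last, so they can drift from the live result count, which is the trade-off the message points out.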