TermDocs

2010-05-12 Thread roy-lucene-user
Hi guys, I've had this code for some time but am only now questioning whether it works. I have a custom filter that I've been using from Lucene 1.4 through Lucene 2.2.0, and it essentially builds up a BitSet like so: for ( int x = 0; x < fields.length; x++ ) { for ( int y = 0; y < values.length; y++ …
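The loop above is cut off, but the pattern it describes (OR-ing the postings of every field/value pair into one BitSet) can be sketched in plain Java. This is a minimal sketch under stated assumptions: the `postings` map is a hypothetical in-memory stand-in for what Lucene 2.x's `TermDocs` would enumerate (typically `seek(term)`, then `next()` and `doc()` in a loop), and the class and method names are illustrative, not the poster's code.

```java
import java.util.BitSet;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a term-based filter. "postings" maps a
// "field:value" term to the IDs of the documents that contain it, which is
// the information TermDocs would enumerate in Lucene.
class TermBitSetBuilder {
    static BitSet build(Map<String, List<Integer>> postings, int maxDoc,
                        String[] fields, String[] values) {
        BitSet bits = new BitSet(maxDoc);
        for (String field : fields) {
            for (String value : values) {
                List<Integer> docs = postings.get(field + ":" + value);
                if (docs == null) {
                    continue; // term does not occur in the index
                }
                for (int doc : docs) {
                    bits.set(doc); // doc matches at least one field/value pair
                }
            }
        }
        return bits;
    }
}
```

A document ends up in the set if any of the fields contains any of the values, so the resulting filter behaves as an OR across all pairs.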

Re: Spell check of a large text

2008-12-12 Thread Lucene User no 1981
…r sections of the file that have these payloads. Definitely pushing my area of expertise, but maybe one of the Highlighter experts can chime in. HTH, Grant. On Dec 11, 2008, at 6:18 AM, Lucene User no 1981 wrote: > Hi, …

Spell check of a large text

2008-12-11 Thread Lucene User no 1981
Hi, the problem is as follows: there is a text of about 30 KB that has to be spell-checked automatically, with no manual intervention and no suggestions needed. All I want is a simple check of whether the text contains any spelling errors or not. It has to be fairly fast because there are ton…
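Since only a yes/no answer is needed and no suggestions, one option is to skip suggestion machinery entirely and look each word up in a hash set. This is a minimal sketch assuming a lowercase word list has already been loaded into `dictionary`; it swaps in a plain `HashSet` lookup instead of Lucene's contrib SpellChecker, and the class and method names are hypothetical. Each lookup is expected constant time, and the scan stops at the first unknown word.

```java
import java.util.Locale;
import java.util.Set;

class QuickSpellCheck {
    // Returns true if every word in the text appears in the dictionary.
    // Tokens are runs of letters (apostrophes kept, e.g. "don't"), lowercased.
    static boolean allWordsKnown(String text, Set<String> dictionary) {
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}']+")) {
            if (!token.isEmpty() && !dictionary.contains(token)) {
                return false; // first misspelling found; no need to scan further
            }
        }
        return true;
    }
}
```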

Re: getting started

2008-08-01 Thread roy-lucene-user
That certainly works if the intent is to grab the entire file. If all you want is for that particular line to be returned in the search, then that's not going to work. Let's say the file was made up of a million lines and the text was stored in the index (I know, absurd). When grabbing the Document…

Re: getting started

2008-08-01 Thread roy-lucene-user
Hello Brittany, I think the easiest thing for you to do is make each line a Document. You might want FileName and LineNumber fields on top of a "Text" field; this way, if you need to gather all the lines of your file back together again, you can do a search on the FileName. So in your case: Docu…
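The one-Document-per-line layout can be illustrated without Lucene. In this minimal sketch, a hypothetical `LineDoc` record stands in for a Lucene Document with FileName, LineNumber and Text fields, and a substring match stands in for a real term query; the point is only that a hit is a single line, and that the whole file can be regathered by FileName and ordered by LineNumber.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical in-memory stand-in for a Lucene Document with
// FileName, LineNumber and Text fields (one per line of a file).
record LineDoc(String fileName, int lineNumber, String text) {}

class LineIndex {
    private final List<LineDoc> docs = new ArrayList<>();

    // "Index" a file by turning each line into its own document.
    void addFile(String fileName, List<String> lines) {
        for (int i = 0; i < lines.size(); i++) {
            docs.add(new LineDoc(fileName, i + 1, lines.get(i)));
        }
    }

    // A hit is a single line, not the whole file (substring match stands in for a query).
    List<LineDoc> search(String word) {
        return docs.stream()
                .filter(d -> d.text().contains(word))
                .collect(Collectors.toList());
    }

    // Regather a file by selecting on FileName and sorting by LineNumber.
    List<String> reassemble(String fileName) {
        return docs.stream()
                .filter(d -> d.fileName().equals(fileName))
                .sorted(Comparator.comparingInt(LineDoc::lineNumber))
                .map(LineDoc::text)
                .collect(Collectors.toList());
    }
}
```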

Lucene search time in real production use?

2008-05-31 Thread lucene user
Hi, Folks: What are some average search and retrieval times for Lucene queries in real production use? Could people include relevant details, such as the number of documents in their index? Thanks for your help!

Handling when a field does not exist in the document

2008-05-22 Thread lucene user
We have a requirement to inform users on a regular basis of new material in which they have expressed interest. How are we to know what is "new" from the point of view of a particular user? Our idea is to tag each new item in some way (perhaps a date/time stamp in the Lucene index indicating when t…
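One common way to make "new since the user was last notified" a simple term comparison is to store the timestamp as a fixed-width, UTC, most-significant-first string, so that lexicographic order matches chronological order and a range query over the field works. This minimal sketch assumes each user's last-notification time is stored as a string in the same format; the format and names are illustrative, similar in spirit to what Lucene's `DateTools` produces, but not taken from the original post.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

class NewItemStamp {
    // Fixed width, UTC, most significant field first:
    // comparing two stamps as strings compares them as times.
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmss").withZone(ZoneOffset.UTC);

    static String stamp(Instant when) {
        return STAMP.format(when);
    }

    // An item is "new" for a user if it was stamped after that user's last notification.
    static boolean isNewFor(String itemStamp, String lastNotifiedStamp) {
        return itemStamp.compareTo(lastNotifiedStamp) > 0;
    }
}
```

Documents indexed before the stamp field existed have no term in it, so a range query on the field will simply never match them.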

Re: Searching user-private annotations associated with indexed documents

2007-11-27 Thread lucene user
…having to reindex the article. > However, the trade-off is that by having the article and annotation in separate documents, you'll lose the relevance boost you would otherwise get when the search terms appear both in the annotation and in the article. > Pete

Re: Searching user-private annotations associated with indexed documents

2007-11-27 Thread lucene user
…? > Do you want to do things like phrase search, e.g. "PERSON_ANNOTATION works for Google"? > Or is your idea of an annotation more simply a del.icio.us-style tag associated with the whole document? > Cheers, Mark

Re: Searching user-private annotations associated with indexed documents

2007-11-27 Thread lucene user
I'd be VERY grateful for your help, folks! Thanks! I really need some insight on this. THANKS!! On Nov 26, 2007 6:43 PM, lucene user wrote: > Here are the three options that seem practical to us right now. > (1) Do the annotation search in postgr…

Re: Searching user-private annotations associated with indexed documents

2007-11-26 Thread lucene user
…object, add the document object to the Lucene index. Got a better idea on this? Thanks!! On Nov 26, 2007 5:33 PM, lucene user wrote: > Folks, I have some additional textual data that is user-specific, basically annotations about document…

Searching user-private annotations associated with indexed documents

2007-11-26 Thread lucene user
Folks, I have some additional textual data that is user-specific, basically annotations about documents. I would like to be able to do **combined** searches, looking for some words in the document and some in my users' private annotations about that document. Any suggestions about how I should hand…

Re: Optimizing index takes too long

2007-11-12 Thread Lucene User
What type of documents are you indexing? Regards, Gaurav. On 11/11/07, Barry Forrest wrote: > Hi, > Optimizing my index of 1.5 million documents takes days and days. > I have a collection of 10 million documents that I am trying to index with Lucene. I've divided the colle…

Comparing Two Indexes

2007-11-09 Thread Lucene User
Hi, I want to compare two indexes. Please recommend an algorithm that takes into account all the factors that affect the index being created, such as the versions of Lucene and of the application in use. We can also compare on certain fields and on the text. Regards

Re: Lucene Queries Over User-Editable Dynamic Categories of Documents

2007-10-25 Thread lucene user
…been super helpful! Very grateful! Thanks! On 10/24/07, markharw00d wrote: > lucene user wrote: > > Thanks for all your help! > > We are using Lucene 2.1.0, and TermsFilter seems to be new in Lucene 2.2.0. > > I have not been a…

Re: Lucene Queries Over User-Editable Dynamic Categories of Documents

2007-10-24 Thread lucene user
…for the original suggestion: I have used thousands of primary-key terms in a terms filter before now (i.e. terms with only one doc) and was surprised at the speed. I can't recall exact stats, so try it yourself.

Re: Lucene Queries Over User-Editable Dynamic Categories of Documents

2007-10-24 Thread lucene user
…k the above code has the makings of an optimisation to CachingWrapperFilter: it could choose to cache SortedVIntLists or BitSets depending on the sparseness of the set and transparently handle any required conversions.
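The sparse/dense choice can be reasoned about with simple arithmetic: a BitSet sized to an index of maxDoc documents needs about maxDoc/8 bytes regardless of how many bits are set, while a plain sorted int array needs 4 bytes per matching document, so the array is smaller whenever fewer than maxDoc/32 documents match. This sketch is hypothetical: it uses a plain `int[]` rather than Lucene's delta-encoded SortedVIntList (which is usually more compact, moving the real break-even point to a higher density), and the class names are illustrative.

```java
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical sketch of choosing a sparse or dense cached doc-ID set.
class DocSetChooser {
    interface DocSet {
        boolean contains(int doc);
    }

    // Sparse form: sorted doc IDs (a plain int[] standing in for SortedVIntList).
    static final class SortedIds implements DocSet {
        private final int[] ids;
        SortedIds(int[] ids) { this.ids = ids; }
        public boolean contains(int doc) { return Arrays.binarySearch(ids, doc) >= 0; }
    }

    // Dense form: one bit per document in the index.
    static final class Bits implements DocSet {
        private final BitSet bits;
        Bits(BitSet bits) { this.bits = bits; }
        public boolean contains(int doc) { return bits.get(doc); }
    }

    // BitSet: about maxDoc/8 bytes. int[]: 4 bytes per match.
    // The int[] is smaller while cardinality < maxDoc / 32.
    static DocSet choose(BitSet matches, int maxDoc) {
        if (matches.cardinality() < maxDoc / 32) {
            // BitSet.stream() yields the set bits in ascending order
            return new SortedIds(matches.stream().toArray());
        }
        return new Bits(matches);
    }
}
```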

Lucene Queries Over User-Editable Dynamic Categories of Documents

2007-10-23 Thread lucene user
Folks! We are building a web-based multi-user system. Users of our system are able to categorize items that they have found into groups of related documents. We would like users to be able to search these document groups and rapidly find matches. Each user might have ten of these categories and mi

Re: Amount of RAM needed to support a growing lucene index?

2007-08-13 Thread lucene user
… wrote: > On 12 Aug 2007, at 14:01, lucene user wrote: > > Do you know if 290k articles and 234 million words is a large Lucene index or a medium one? Do people build them this big all the time? > If the calculator in my hea…

Re: Amount of RAM needed to support a growing lucene index?

2007-08-12 Thread lucene user
Thanks, Karl. Do you know if 290k articles and 234 million words is a large Lucene index or a medium one? Do people build them this big all the time? Thanks! On 8/12/07, karl wettin wrote: > On 12 Aug 2007, at 09:03, lucene user wrote: > > If I…

Amount of RAM needed to support a growing lucene index?

2007-08-12 Thread lucene user
Hi, Folks. Two quick questions: we need to size a server to run our new index. If I have an index with 111k articles and 90 million words indexed, how much RAM should I have to get really fast access speeds? If I have an index with 290k articles and 234 million words indexed, how much RAM should…

Re: next() not called in FilterIndexReader.FilterTermDocs

2007-06-10 Thread lucene user
…ed for. Especially if, by "certain set", you mean that these are predefined values, you could create your filters at start-up time and use CachingWrapperFilter to keep them around. Either way, I suspect it would be much simpler. Best, Erick. On 6/10/07, lucene user wrote: …
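Erick's suggestion amounts to computing each predefined filter once and reusing it. The following is a minimal plain-Java sketch of that memoization idea, not CachingWrapperFilter itself; the `FilterCache` name and the `compute` function (a hypothetical stand-in for building a filter's BitSet from the index) are assumptions. A real cache would also need to be rebuilt when the index is reopened, which this sketch ignores.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical memoizing cache: each named filter's BitSet is built once
// (e.g. at start-up) and then reused by every query that needs it.
// Callers must treat the returned BitSet as read-only, since it is shared.
class FilterCache {
    private final Map<String, BitSet> cache = new ConcurrentHashMap<>();
    private final Function<String, BitSet> compute;

    FilterCache(Function<String, BitSet> compute) {
        this.compute = compute;
    }

    // Build all predefined filters up front.
    void warm(Iterable<String> names) {
        for (String name : names) {
            get(name);
        }
    }

    BitSet get(String name) {
        return cache.computeIfAbsent(name, compute);
    }
}
```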

next() not called in FilterIndexReader.FilterTermDocs

2007-06-10 Thread lucene user
I am trying to use a FilterIndexReader to provide access only to the subset of my archive in which a certain field contains one of a given list of values. The idea is to create a special term for this field that means 'any term in the list' and to add an 'AND' clause to every query to enforce t…

What's the best way to filter based on a function of an indexed term or field value

2007-06-05 Thread lucene user
We have a large and growing number of articles (fewer than 60k so far), and we want to divide the articles from some sources into groups so that we can run queries against just the members of one or two groups, without finding articles from publications outside those groups. We would like to b…

Re: Searching Special Characters

2005-11-16 Thread Lucene User
As we have a very large index, I'm interested in knowing what others do before I commit to the approach below. If I do go down that route, I assume I use a StandardAnalyzer again? In a test, I did the following: public class TestLuceneIndexCreateAndIndex extends TestCase { public void i…

Searching Special Characters

2005-11-15 Thread Lucene User
Hi, our index contains articles with special characters. For instance, the string P&O is indexed as P&amp;O. The correct entity codes are indexed for all the special characters we use. My question is that a typical user searching for the above will enter P&O, but that will not match P&amp;O. I know I coul…
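One way to make P&O match is to normalize the text before it reaches the analyzer, decoding entity codes back to the characters they stand for (which would mean reindexing). This is a minimal sketch under that assumption: the entity table covers only the five XML entities plus numeric references, anything else the articles use would need adding, and `EntityDecoder` is a hypothetical name. Decoding in a single regex pass means `&amp;lt;` becomes the literal text `&lt;` instead of being decoded twice to `<`.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EntityDecoder {
    // Assumed entity table: only the five XML entities.
    private static final Map<String, String> NAMED = Map.of(
            "amp", "&", "lt", "<", "gt", ">", "quot", "\"", "apos", "'");

    // Matches &#x26; (hex), &#38; (decimal) or &name;
    private static final Pattern ENTITY =
            Pattern.compile("&(#x[0-9a-fA-F]+|#[0-9]+|[a-zA-Z]+);");

    static String decode(String text) {
        Matcher m = ENTITY.matcher(text);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String body = m.group(1);
            String replacement;
            if (body.startsWith("#x")) {
                replacement = new String(Character.toChars(Integer.parseInt(body.substring(2), 16)));
            } else if (body.startsWith("#")) {
                replacement = new String(Character.toChars(Integer.parseInt(body.substring(1))));
            } else {
                replacement = NAMED.getOrDefault(body, m.group()); // unknown entity: keep as-is
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```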

Re: Lucene faster on JDK 1.5?

2005-07-08 Thread roy-lucene-user
This might be a good time to ask another question: are there any advantages to Lucene using the java.nio package? Roy. On 7/8/05, Otis Gospodnetic wrote: > Nothing significant, but I've been using 1.5 on Simpy.com (lots of Lucene behind it) for over…

Re: new added documents not showing

2005-03-23 Thread roy-lucene-user
Pasha, in short, that is all I'm trying to do. It wasn't really an issue before. Otis, I'm not sure what Luke is, but the documents appear after we optimize. Roy.

Re: new added documents not showing

2005-03-22 Thread roy-lucene-user
Pasha, in short, that is all I'm trying to do. It wasn't really an issue before. Otis, I'm not sure what Luke is, but the documents appear after we optimize. Roy. On Mon, 21 Mar 2005 18:20:32 -0800 (PST), Otis Gospodnetic wrote: …

Re: new added documents not showing

2005-03-21 Thread roy-lucene-user
Correct, we also can't see the new documents when we open an IndexReader on the main index. Roy.

Re: new added documents not showing

2005-03-21 Thread roy-lucene-user
> When do you open the index writer? Where is the code? Ah, sorry. That last section is in a method that gets called in a loop. IndexWriter writer = null; try { writer = new IndexWriter( mainindex, new StandardAnalyzer(), false ); for ( int i = 0; i < dir

Re: new added documents not showing

2005-03-21 Thread roy-lucene-user
On Sat, 19 Mar 2005 22:43:44 +0300, Pasha Bizhan wrote: > Could you provide the code snippets for your process? Sure (thanks for helping, btw). I just realized that the way I described our process was a little off. Here's the process again: 1. grab all index Directorys…

Re: new added documents not showing

2005-03-18 Thread roy-lucene-user
Hi guys, Just trying to understand where problems can occur. Maybe I need to describe our indexing process some more. We create new indexes as "index parts" with Documents that are supposed to contain unique ID key fields. These "index parts" get merged into two separate indexes: a main index

Re: new added documents not showing

2005-03-18 Thread roy-lucene-user
> > However, after optimizing, suddenly those new documents appear. It's almost as if the new segments are not being read by the IndexReader. > You need to close the IndexWriter before opening the IndexReader, or reopen the IndexReader. > See TestIndexReader.java:: private void deleteReaderWriterC…

new added documents not showing

2005-03-17 Thread roy-lucene-user
Hi guys, we noticed some odd behavior recently with our indexes. We have a process that adds new documents to our indexes. When we iterate through all the documents with an IndexReader, we don't see the new documents, and we don't see them when we run a search either.