Adding synonym-index to another index

2006-07-11 Thread Ramesh Salla
Hi, can we ever add the WordNet Synonym-Index to another Index? I think this is a rather painful process. For now, I retrieve the synonyms of the words in the Search-Query and rewrite the Search-Query with them. Will AddIndexes(indexes) do this for us? Does the Merged Index give meaningful resul

Storing Part of Speech information in Lucene Indices

2006-07-11 Thread Amit Kumar
Hi, a new project that I am investigating Lucene for needs part-of-speech information for the tokens. I can get that information using NLP techniques (GATE etc.) by preprocessing the documents, but I would like to store that information in the indices. Something along the lines of

Re: combined filesystem and web search

2006-07-11 Thread Tomi NA
On 7/12/06, Steven Rowe <[EMAIL PROTECTED]> wrote: Tomi NA wrote: > I wish people would start selling .pdf books online... :( Your wish is granted: Wow, that was fast! Thanks for the link. >> Then there's IndexMergeTool which I haven't used, but looks

Re: combined filesystem and web search

2006-07-11 Thread Steven Rowe
Tomi NA wrote: I wish people would start selling .pdf books online... :( Your wish is granted: Then there's IndexMergeTool which I haven't used, but looks interesting. I haven't run into it. Can you direct me to a document or two? It's in contrib und

Re: combined filesystem and web search

2006-07-11 Thread Tomi NA
On 7/11/06, Erick Erickson <[EMAIL PROTECTED]> wrote: I can answer a few of these. If you haven't yet, you'd do yourself a favor to pick up the book "Lucene in Action". It's written to the 1.4 code-base, the examples compile but give deprecated warnings for the 1.9 code base, and need a few more

Re: modify existing non-indexed field

2006-07-11 Thread Doron Cohen
> I've tried changing to one indexing thread > (instead of 5) but still get the same problem. can't figure out why this > happens. The program as listed seems to access an existing index - since 'create' is always for both 'FSDirectory.getDirectory(,)' and 'new IndexWriter(,,)'. Perhaps an old lo

Re: Searching for a phrase which spans on 2 pages

2006-07-11 Thread Erick Erickson
I can think of several approaches, but the experts will no doubt show me up .. 1> index the entire book as a single document. Also, index the beginning and ending offset of each page in separate "documents". Assuming you can find the offset in the big doc of each matching phrase, you can also fin
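A minimal sketch of the offset-to-page lookup behind approach 1>, assuming the start offset of every page within the whole-book document has already been recorded. The class name, method names, and sample offsets are hypothetical; how the match offsets are obtained from the index is not shown.

```java
import java.util.Arrays;

/** Hypothetical helper: maps character offsets in a whole-book document to page numbers. */
final class PageOffsetSketch {

    // pageStarts[i] is the offset of the first character of page i.
    // Assumed to be sorted ascending with pageStarts[0] == 0.
    static int pageOf(int[] pageStarts, int offset) {
        int idx = Arrays.binarySearch(pageStarts, offset);
        // Exact hit: the offset is the first character of page idx.
        // Miss: binarySearch returns -(insertionPoint) - 1, and the page that
        // contains the offset is the one just before the insertion point.
        return idx >= 0 ? idx : -idx - 2;
    }

    // Returns {firstPage, lastPage} for a match covering [matchStart, matchEnd).
    static int[] pagesOf(int[] pageStarts, int matchStart, int matchEnd) {
        return new int[] { pageOf(pageStarts, matchStart), pageOf(pageStarts, matchEnd - 1) };
    }

    public static void main(String[] args) {
        int[] pageStarts = {0, 100, 250};            // made-up offsets for three pages
        int[] pages = pagesOf(pageStarts, 95, 105);  // a phrase that crosses a page break
        System.out.println("match spans pages " + pages[0] + " to " + pages[1]);
    }
}
```

A phrase whose first and last page differ is one that straddles a page break, so both pages can be shown to the user.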

RangeQuery question?

2006-07-11 Thread Van Nguyen
Is there a RangeQuery equivalent that can query date range on two different fields? Term startTerm = new Term("startDate", "20060710"); Term endTerm = new Term("endDate", "20060711"); RangeQuery q = new RangeQuery(startTerm, endTerm, true);
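A sketch of the comparison logic only, under one possible reading of the question: each document carries a startDate and an endDate, and the goal is to find documents whose interval overlaps the queried interval. As far as I recall, RangeQuery expects both terms to belong to the same field, so a two-field test like this would have to be expressed as one clause per field combined in a boolean query. The class and method names below are hypothetical and no Lucene API is used.

```java
/** Hypothetical illustration of an interval-overlap test on yyyyMMdd strings. */
final class DateOverlapSketch {

    // Dates are fixed-width yyyyMMdd strings, so lexicographic order
    // is the same as chronological order.
    static boolean overlaps(String docStart, String docEnd,
                            String queryStart, String queryEnd) {
        // Clause on the startDate field: the document starts on or before the query end.
        boolean startsInTime = docStart.compareTo(queryEnd) <= 0;
        // Clause on the endDate field: the document ends on or after the query start.
        boolean endsLateEnough = docEnd.compareTo(queryStart) >= 0;
        return startsInTime && endsLateEnough;
    }

    public static void main(String[] args) {
        // Query interval taken from the message above; document dates are made up.
        System.out.println(overlaps("20060701", "20060710", "20060710", "20060711"));
        System.out.println(overlaps("20060701", "20060709", "20060710", "20060711"));
    }
}
```

Each of the two boolean clauses touches a single field, which is why the condition maps naturally onto two single-field range clauses that must both match.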

Re: Missing fields used for a sort

2006-07-11 Thread Erick Erickson
On 7/11/06, Rob Staveley (Tom) <[EMAIL PROTECTED]> wrote: I can't thank you enough, Yonik :-) send money .

SortComparatorSources and ScoreDocComparators

2006-07-11 Thread James Pine
Hey Everyone, I've had success in the past creating my own SortComparatorSources and ScoreDocComparators (basing my code on sec 6.1 from LIA); however, I'm starting to run into some performance issues with large indexes. When I started to probe deeper it seems that enumerating through the TermDocs

Re: Missing fields used for a sort

2006-07-11 Thread Yonik Seeley
Oh, and here is how Solr uses it to construct the correct lucene Sort objects: http://svn.apache.org/viewvc/incubator/solr/trunk/src/java/org/apache/solr/search/Sorting.java?view=markup -Yonik http://incubator.apache.org/solr Solr, the open-source Lucene search server --

RE: Missing fields used for a sort

2006-07-11 Thread Rob Staveley (Tom)
I can't thank you enough, Yonik :-) -Original Message- From: Yonik Seeley [mailto:[EMAIL PROTECTED] Sent: 11 July 2006 18:05 To: java-user@lucene.apache.org Subject: Re: Missing fields used for a sort On 7/11/06, Rob Staveley (Tom) <[EMAIL PROTECTED]> wrote: > Thanks for the info both of

Re: Missing fields used for a sort

2006-07-11 Thread Yonik Seeley
On 7/11/06, Rob Staveley (Tom) <[EMAIL PROTECTED]> wrote: Thanks for the info both of you. Of course Lucene obeys Murphy's law that the missing ones appear first when you reverse sort, which is what Murphy's law says you want to do. Does solr have a custom build of Lucene in it, or is the functi

RE: Missing fields used for a sort

2006-07-11 Thread Rob Staveley (Tom)
Thanks for the info both of you. Of course Lucene obeys Murphy's law that the missing ones appear first when you reverse sort, which is what Murphy's law says you want to do. Does Solr have a custom build of Lucene in it, or is the functionality required to get the missing ones to the

Searching for a phrase which spans on 2 pages

2006-07-11 Thread Mile Rosu
Hello, I am working on an application similar to Google Books that allows searching documents, each of which represents a scanned page. Of course, one might search for a phrase starting at the end of one page and ending at the beginning of the next one. In this case I do not know how I might treat

RE: Query?

2006-07-11 Thread WATHELET Thomas
OK, I have now made this field UN_TOKENIZED, and in Luke I see the entire term (SEC(2006) 0123), whereas before I only saw SEC. And the wonderful thing is that it's now working. Thanks a lot to Erik and Erick. -Original Message- From: Erick Erickson [mailto:[EMAIL PROTECTED] Sent: 11 July 2006

Re: Query?

2006-07-11 Thread Erick Erickson
What tokenizer did you use to index the document number? Just about all tokenizers split on spaces, so you'd have indexed this as at least two separate terms because of the space before the 0350. I'd really recommend downloading a copy of Luke so you can examine your index and see exactly what got
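To illustrate the point about spaces, here is a plain-Java sketch that contrasts splitting a value on whitespace with keeping it as a single term. It is not a Lucene analyzer: real analyzers may also lowercase text and drop punctuation, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical illustration of tokenized versus untokenized field values. */
final class WhitespaceSplitSketch {

    // Roughly what a whitespace-splitting tokenizer would store: one term per chunk.
    static List<String> tokenized(String value) {
        return Arrays.asList(value.trim().split("\\s+"));
    }

    // An untokenized field keeps the whole value as one term.
    static List<String> untokenized(String value) {
        return Collections.singletonList(value);
    }

    public static void main(String[] args) {
        String docNumber = "SEC(2006) 0350";
        System.out.println(tokenized(docNumber));    // two separate terms
        System.out.println(untokenized(docNumber));  // one term containing the space
    }
}
```

An exact-match term lookup for the full document number can only succeed against the second form, because the first form never stores the combined string.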

Re: Missing fields used for a sort

2006-07-11 Thread Yonik Seeley
On 7/11/06, Erick Erickson <[EMAIL PROTECTED]> wrote: So I guess all the documents without a particular field all get defaulted for you. Which end of the list they get placed at I guess you'll find out ... For lucene, it depends on what direction you are sorting. Solr gives control over this i
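As a plain-Java illustration of what controlling the placement of missing values means, the sketch below sorts a list in which null stands for a document without the sort field and keeps the nulls at the end in both sort directions. This is not how Lucene or Solr implement it; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical illustration: missing sort values stay last in either direction. */
final class MissingSortSketch {

    // null represents a document that lacks the sort field.
    static List<Integer> sortMissingLast(List<Integer> values, boolean reverse) {
        Comparator<Integer> order = reverse
                ? Comparator.<Integer>reverseOrder()
                : Comparator.<Integer>naturalOrder();
        List<Integer> copy = new ArrayList<>(values);
        // nullsLast wraps the chosen order so that nulls always sort after real values.
        copy.sort(Comparator.nullsLast(order));
        return copy;
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(3, null, 1);
        System.out.println(sortMissingLast(values, false));
        System.out.println(sortMissingLast(values, true));
    }
}
```

Substituting a fixed default such as 0 or MAXINT for the missing value would instead make the placement flip when the sort direction is reversed.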

Re: Query using parenthesis

2006-07-11 Thread Erik Hatcher
On Jul 11, 2006, at 9:28 AM, WATHELET Thomas wrote: I have an index with this field: stored/uncompressed,indexed,tokenized. I'm using LukeAll to query myIndex and when I try to search in the docnumber field with this query COM\(2005\) 0123 in the query detail panel I retrieve this: docnumber:sec

RE: Lucene WordExtractor

2006-07-11 Thread mcarcelen
Thanks suba Sorry -Original Message- From: Suba Suresh [mailto:[EMAIL PROTECTED] Sent: Tuesday, 11 July 2006 15:51 To: java-user@lucene.apache.org Subject: Re: Lucene WordExtractor There is a separate user mailing list for poi. Use it. There are three jar files. Check the scr

Re: Missing fields used for a sort

2006-07-11 Thread Erick Erickson
Quote from Chris... "you can only sort on fields with 0 or 1 terms per doc", from a post of his today, even. So I guess all the documents without a particular field all get defaulted for you. Which end of the list they get placed at I guess you'll find out ... Erick

Re: Some obvious questions that I'll be happy to put on the WIKI

2006-07-11 Thread Furash Gary
Thanks. It sounds like putting tokens in the same spot for names makes sense, so I end up with: GaryFurash [Soundex Gary] [Soundex Furash] is the way to go. I had seen a quote that mentioned positioning at the same spot but it didn't make any sense at t

Re: How do you use a different analyzer by field?

2006-07-11 Thread Furash Gary
In my defense I assumed it would be more obscurely named ;-) G - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]

Re: General Approach: Analyzer versus Query

2006-07-11 Thread Furash Gary
Disk is cheap, and I can always put the field elsewhere if I need to sort on it, but user response time ... Priceless. I'll give #1 a shot. I'm going to try encoding them in the same position, see what happens. Thanks!

RE: Query?

2006-07-11 Thread WATHELET Thomas
I have an index with this field: stored/uncompressed,indexed,tokenized. I'm using RangeQuery to query the index: TermQuery termQuery = new TermQuery(new Term("docnumber", "SEC(2006) 0350")); combinedQueries.add(termQuery, MUST); The query that I send is: SEC(2006) 0350 The resulti

Re: combined filesystem and web search

2006-07-11 Thread Erick Erickson
I can answer a few of these. If you haven't yet, you'd do yourself a favor to pick up the book "Lucene in Action". It's written to the 1.4 code-base, the examples compile but give deprecated warnings for the 1.9 code base, and need a few more tweaks for the 2.0 code base. Also, download a copy of

Re: Lucene WordExtractor

2006-07-11 Thread Suba Suresh
There is a separate user mailing list for poi. Use it. There are three jar files. Check the scratchpad jar. You have to send in a FileInputStream (not the filename) as an argument to the WordExtractor class. suba suresh. mcarcelen wrote: Hi all! I'm working with poi-bin-3.0-alpha2-20060616 I

Re: Query?

2006-07-11 Thread Erick Erickson
Could you provide a bit more information? What's important or not about this query? And how does that import relate to what you've indexed? In other words, what do you *want* it to mean? Best Erick

RE: Query using parenthesis

2006-07-11 Thread WATHELET Thomas
I have an index with this field: stored/uncompressed,indexed,tokenized. I'm using LukeAll to query myIndex and when I try to search in the docnumber field with this query COM\(2005\) 0123 in the query detail panel I retrieve this: docnumber:sec () Do you know LukeAll? -Original Message- Fro

Re: Query using parenthesis

2006-07-11 Thread Erik Hatcher
On Jul 11, 2006, at 8:57 AM, WATHELET Thomas wrote: How to parse this query COM(2005) 0123 in LukeAll? I have this result cocnumber: com Your question is not clear. But I'm always happy to lend a hand... Try the query: COM\(2005\) 0123 Parentheses are special characters with Luc
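A minimal sketch of the backslash escaping suggested above, written in plain Java so it does not depend on any particular Lucene version. The class name and the exact set of special characters are assumptions for illustration; if your Lucene version ships its own escaping helper, prefer that.

```java
/** Hypothetical helper: backslash-escapes characters that a query parser treats specially. */
final class QueryEscapeSketch {

    // Assumed set of special characters: \ + - ! ( ) : ^ [ ] " { } ~ * ? | &
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&";

    static String escape(String input) {
        StringBuilder out = new StringBuilder(input.length() + 8);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                out.append('\\');   // prefix each special character with a backslash
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("COM(2005) 0123"));
    }
}
```

Escaping only stops the parentheses from being read as grouping syntax. The space is left alone, so the parser may still treat the input as two separate clauses, and the analyzer applied to the field still decides which terms are actually searched.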

Query using parenthesis

2006-07-11 Thread WATHELET Thomas
How to parse this query COM(2005) 0123 in LukeAll? I have this result cocnumber: com

Missing fields used for a sort

2006-07-11 Thread Rob Staveley (Tom)
If I want to sort on a field that doesn't exist in all documents in my index, can I have a default value for documents which lack that field (e.g. MAXINT or 0)?

combined filesystem and web search

2006-07-11 Thread Tomi NA
I plan to make lucene (and nutch) a key element in an intranet solution, but I only know about lucene what I've read in the last couple of days. Here's what I'd like opinions about. I would like to build a single point of access to data on intranet web pages and LAN shared documents. I've looked

Query?

2006-07-11 Thread WATHELET Thomas
How to parse this kind of query? COM(2006) 0001

Compressed fields

2006-07-11 Thread Rob Staveley (Tom)
What's a sensible guideline for length of an un-indexed field and whether to store it compressed or not? I have a 300 character document synopsis, which I store. Would there be any saving having it compressed? Can you have an index with a stored un-indexed field which is sometimes compressed and s

Re: modify existing non-indexed field

2006-07-11 Thread dan2000
Thanks for your advice Doron. I've tried changing to one indexing thread (instead of 5) but still get the same problem. can't figure out why this happens. -- View this message in context: http://www.nabble.com/modify-existing-non-indexed-field-tf1905726.html#a5266343 Sent from the Lucene - Java

Lucene WordExtractor

2006-07-11 Thread mcarcelen
Hi all! I'm working with poi-bin-3.0-alpha2-20060616 I'm trying to extract text from a Word document using the class org.apache.poi.hwpf.extractor.WordExtractor but I get the following error "Exception in thread main java.lang.NoSuchMethodError" I have also tried with the parameter -doc and the nam