Re: regaridng Reader.terms()

2007-05-23 Thread Mohammad Norouzi
Hi Walter, let me explain my problem in detail. I have a web page that lets a user create his own query simply. For example, a user wants to locate a service with a specific value, but he/she doesn't know the exact name of the service, so I have to provide a list of the available services (say, in a combo box)

Re: search result problem

2007-05-23 Thread Stefan Colella
Hello, I used setMaxFieldLength() and it works now. Thanks, all. Doron Cohen wrote: Stefan Colella wrote: I tried adding only the content of the page where that expression can be found (instead of the whole document), and then the search works. Do I have to split my PDF text into more

CAD files, Images

2007-05-23 Thread jim shirreffs
Is it possible to index CAD formats such as AutoCAD or CGM? I know some commercial products (excalaber) claim to be able to do that. If so, what about TIFF? thanks jim s

Re: regaridng Reader.terms()

2007-05-23 Thread Erick Erickson
You may have to index things twice, once for searching and once UN_TOKENIZED for display. Say you have a bunch of service names you want to display: "service one", "service two", "service three". If you use WhitespaceAnalyzer, TOKENIZED, you index the tokens: service (note, there are three of these), one
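A minimal sketch of the distinction described above, in plain Java with no Lucene dependency. It only approximates what a whitespace-tokenized field and an untokenized field would each contribute as terms. The class and method names (IndexTwiceSketch, tokenizedTerms, untokenizedTerms) are hypothetical and are not part of Lucene.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class IndexTwiceSketch {
    // Approximates a whitespace-tokenized field: every word becomes its own term,
    // so the original multi-word names cannot be read back from the term list.
    static TreeSet<String> tokenizedTerms(List<String> names) {
        TreeSet<String> terms = new TreeSet<>();
        for (String name : names) {
            for (String token : name.trim().split("\\s+")) {
                if (!token.isEmpty()) {
                    terms.add(token);
                }
            }
        }
        return terms;
    }

    // Approximates an untokenized field: the whole value is kept as a single term,
    // which is what a display list (for example a combo box) needs.
    static TreeSet<String> untokenizedTerms(List<String> names) {
        return new TreeSet<>(names);
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("service one", "service two", "service three");
        System.out.println("tokenized:   " + tokenizedTerms(names));
        System.out.println("untokenized: " + untokenizedTerms(names));
    }
}
```

The tokenized set collapses the three names into the words service, one, two and three, while the untokenized set keeps each full name, which is why a second, untokenized copy of the field is useful for display.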

Re: CAD files, Images

2007-05-23 Thread Erick Erickson
No, you can only index text. It's the same thing as indexing HTML documents or XML documents. You index important stuff from the guts of the doc. So, if you can get some text out of these docs, you can index *that*, possibly along with page information that allows you to, say, display the

Re: regaridng Reader.terms()

2007-05-23 Thread Mohammad Norouzi
Wow, very nice comments. Thank you so much, Erick. You really showed me the way. -- Regards, Mohammad -- see my blog: http://brainable.blogspot.com/

Re: MoreLikeThis?

2007-05-23 Thread Donna L Gresh
Thank you-- Donna L. Gresh Services Research, Mathematical Sciences Department IBM T.J. Watson Research Center (914) 945-2472 http://www.research.ibm.com/people/g/donnagresh [EMAIL PROTECTED] Otis Gospodnetic [EMAIL PROTECTED] 05/22/2007 05:33 PM Please respond to java-user@lucene.apache.org

Re: CAD files, Images

2007-05-23 Thread jim shirreffs
Thank you for the reply. I knew the answer but was compelled to ask anyway. CAD files like AutoCAD/ProE/CATIA do contain some useful text, and it is possible to get at that and index it. But mostly it's vectors, and there is not much a text engine can do with vectors. thanks again. jim s

How to avoid score calculation completely?

2007-05-23 Thread Zhang, Lisheng
Hi, We have been using Lucene for years and it serves us well. Sometimes when we issue a query, we only want to know how many hits it yields and do not want any docs back. Is it possible to completely avoid score calculation to get the total count back? I understand score calculation needs a loop for all

Re: How to avoid score calculation completely?

2007-05-23 Thread Yonik Seeley
On 5/23/07, Zhang, Lisheng [EMAIL PROTECTED] wrote: We have been using Lucene for years and it serves us well. Sometimes when we issue a query, we only want to know how many hits it yields and do not want any docs back. Is it possible to completely avoid score calculation to get the total count back? I

WhitespaceAnalyzer [was: Re: regaridng Reader.terms()]

2007-05-23 Thread Steven Rowe
Hi Mohammad, WhitespaceAnalyzer uses Java's Character.isWhitespace(char) method to determine whether or not a character should be part of a token. As far as I know, this method is problematic only for characters outside of the Basic Multilingual Plane (BMP). I think Lucene should switch to
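To illustrate the char versus code point distinction raised above, here is a small plain-Java sketch of a whitespace splitter that walks the text by code point and uses the int overload Character.isWhitespace(int). This is an assumption-laden illustration, not Lucene's tokenizer and not necessarily the change proposed in the message; the class and method names (CodePointWhitespaceSketch, tokenize) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class CodePointWhitespaceSketch {
    // Splits text on whitespace, examining whole code points rather than
    // individual UTF-16 chars, so surrogate pairs are never inspected halfway.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (Character.isWhitespace(cp)) {
                // Whitespace ends the current token, if there is one.
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.appendCodePoint(cp);
            }
            // Advance by 1 char for BMP code points, 2 for supplementary ones.
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // U+1D11E (a supplementary character) is written as a surrogate pair.
        System.out.println(tokenize("service one\ttwo"));
        System.out.println(tokenize("a\uD834\uDD1Eb c").size());
    }
}
```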

How to filter fields with hits from result set

2007-05-23 Thread Andreas Guther
Hi, If a search returns a document that has multiple fields with the same name, is there a way to filter only those fields that contain hits? Background: I am indexing documents and we store all content in our index for display reasons. We want to show only those pages containing hits. My

Re: How to filter fields with hits from result set

2007-05-23 Thread Erick Erickson
As luck would have it, I've done something very similar. What I had to do is index a special token at the end of each page. Then I could get the term offsets for each page. Then I used one of the SpanQuery.getSpans to get all of the offsets of the hits throughout all of the pages. Now I have
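The page-marker idea above can be sketched in plain Java. Assume the positions of the special page-end token have already been collected into an ascending array and that each hit is a single position; mapping a hit to its page is then a binary search. How those positions are obtained from the index is left out, and the names used here (PageLookupSketch, pageOf, pageEnds) are hypothetical.

```java
import java.util.Arrays;

public class PageLookupSketch {
    // pageEnds[i] is the position of the marker token that closes page i,
    // in ascending order. Returns the zero-based page containing hitPosition,
    // or -1 if the position lies beyond the last marker.
    static int pageOf(int[] pageEnds, int hitPosition) {
        int idx = Arrays.binarySearch(pageEnds, hitPosition);
        // A non-negative result means the position is a marker itself, which is
        // counted as part of the page it closes. Otherwise binarySearch returns
        // -(insertionPoint) - 1, and the insertion point is the page index.
        int page = idx >= 0 ? idx : -idx - 1;
        return page < pageEnds.length ? page : -1;
    }

    public static void main(String[] args) {
        int[] pageEnds = {100, 250, 400};
        System.out.println(pageOf(pageEnds, 42));
        System.out.println(pageOf(pageEnds, 101));
        System.out.println(pageOf(pageEnds, 500));
    }
}
```

With the pages resolved this way, only the pages that contain at least one hit need to be shown.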

Who has sample code of remote multiple servers multiple indexes searching?

2007-05-23 Thread Su.Cheng
Hi, I studied section 5.6, "Searching across multiple Lucene indexes" (page 178), in Lucene in Action. I have 2 remote search computers (SearchServer) working as index servers, and search requests come from a search client (SearchClient, the 3rd computer). An error message, Exception in thread main

RE: How to filter fields with hits from result set

2007-05-23 Thread Andreas Guther
Erick, Thank you very much for your response. That sounds very interesting. Let me do some experimenting to see if I fully understood your solution. Otherwise I will have to come back to you with more questions. Andreas

Re: How to filter fields with hits from result set

2007-05-23 Thread Erick Erickson
Two things to watch... 1. Think about indexing the special page-end token with an increment gap of 0 (see SynonymAnalyzer in Lucene in Action). That preserves the sense of phrases across page breaks. 2. Assembling the span query is tricky. Search the mail archive for SpanQuery to see an exchange

Highlighting fast and highlighting all text

2007-05-23 Thread Michael Mitiaguin
I browsed this list and the contributions and have difficulty determining whether there is anything that can be used straightforwardly to highlight all hits (no fragmenting) for a large chunk of text. Probably my query should be sent as 3 separate ones: 1. The fastest possible fragment

HitCollector or Hits

2007-05-23 Thread Carlos Pita
Hi folks, I need to collect some global information from my first 1000 search results in order to build up some search refining components containing only relevant values (those which correspond to at least one of the first 1000 hits). For example, the results are products and there is a store

WITH_POSITIONS_OFFSETS versus WITH_OFFSETS

2007-05-23 Thread Michael Mitiaguin
What is the practical use of WITH_POSITIONS_OFFSETS? Isn't WITH_OFFSETS sufficient, if iterating getStartOffset effectively gives the value from the array element of getTermPositions?

Integrate Lucene search facilities with existing databases

2007-05-23 Thread Huajing Li
Hi all, I am working on an application that must deal with ranking on highly dynamic metadata. For example, suppose I want to provide ranking based on the number of downloads of hit documents. A user may log in to the system and send a query, which will be answered by Lucene in a traditional

Re: WhitespaceAnalyzer [was: Re: regaridng Reader.terms()]

2007-05-23 Thread Mohammad Norouzi
Hi Steven, thank you so much for your thorough comments about Analyzer. I wrote that class a couple of months ago. Looking at my customized Analyzer now, the only change I made is as follows. The original class has this method: protected boolean isTokenChar(char c) { return

Re: WhitespaceAnalyzer [was: Re: regaridng Reader.terms()]

2007-05-23 Thread Mohammad Norouzi
Sorry Steven, that change is in WhitespaceTokenizer, not WhitespaceAnalyzer, but in the Analyzer I had to call the tokenizer. On 5/24/07, Mohammad Norouzi [EMAIL PROTECTED] wrote: Hi Steven, thank you so much for your thorough comments about Analyzer. I wrote that class a couple of months ago, now I