RE: How do i prevent the HTML tags being added to Lucene Index..

2004-05-19 Thread Karthik N S
Hey, look at the file Test.java under Lucene 1.4; it strips out the HTML tags and gives you the content... With regards, Karthik -Original Message- From: root [mailto:root] On Behalf Of Mahesh Sent: Thursday, May 20, 2004 11:13 AM To: [EMAIL PROTECTED] Subject: How do i prevent the HTML tags being add
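As a rough illustration of what stripping tags before indexing involves, here is a minimal sketch in plain Java. It is not the Test.java mentioned above and does not use any Lucene API; the class name TagStripperSketch and the method stripTags are made up for this example. The regular expression is deliberately naive: it ignores entities, comments, and the contents of script and style elements, so a real HTML parser is the safer choice for anything beyond simple markup.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene: removes anything that looks like a tag.
class TagStripperSketch {
    // '<', then any run of characters other than '>', then '>'.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");
    // Runs of whitespace left behind once the tags are gone.
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    public static String stripTags(String html) {
        // Replace each tag with a space so adjacent words do not get glued together.
        String noTags = TAG.matcher(html).replaceAll(" ");
        // Collapse the leftover whitespace and trim the ends.
        return WHITESPACE.matcher(noTags).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        System.out.println(stripTags("<p>Hello <b>world</b></p>"));
    }
}
```

The returned text is what would be handed to the indexer in place of the raw markup.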

How do i prevent the HTML tags being added to Lucene Index..

2004-05-19 Thread Mahesh
I am using Lucene 1.4 to index the information. I have a lot of HTML tags in the information that I will be indexing, so let me know if there is any way of keeping the HTML tags from being indexed. MAHESH

Re: Lucene and MVC (was Re: Bad file descriptor (IOException) using SearchBean contribution)

2004-05-19 Thread petite_abeille
On May 20, 2004, at 04:38, Erik Hatcher wrote: OffTopic: havoc and Struts go well together ;) Pick up Tapestry instead! Nah. Keep it really Simple [1] instead :o) http://simpleweb.sourceforge.net/ PA.

Lucene and MVC (was Re: Bad file descriptor (IOException) using SearchBean contribution)

2004-05-19 Thread Erik Hatcher
On May 19, 2004, at 8:04 AM, Timothy Stone wrote: Could you elaborate on what you mean by MVC here? A value list handler piece has been developed and links posted to it on this list - if this is the type of thing you're referring to. Again, maybe I was naively associating the "SearchBean" with s

Re: Internal full content store within Lucene

2004-05-19 Thread Kevin Burton
Morus Walter wrote: Kevin Burton writes: How much interest is there for this? I have to do this for work and will certainly take the extra effort into making this a standard Lucene feature. Sounds interesting. How would you handle deletions? They aren't a requirement in our scenario

Re: Possible to fetch a document without all fields for performance?

2004-05-19 Thread Kevin Burton
Morus Walter wrote: I don't understand that. You get the document object, which does not contain the document's field contents. It just provides access to this data. It's up to you which fields you access. And remember that you don't have to store fields at all, if you don't need to retrieve them (e

RE: org.apache.lucene.search.highlight.Highlighter

2004-05-19 Thread Bruce Ritchie
> Thanks for "highlighting" the problem with the Javadocs... Groan. :) Regards, Bruce Ritchie

Re: org.apache.lucene.search.highlight.Highlighter

2004-05-19 Thread markharw00d
>>Was Investigating,found some Compile time error.. I see the code you have is taken from the example in the Javadocs. Unfortunately, that example wasn't complete because the class didn't include the method defined in the Formatter interface. I have updated the Javadocs to correct this oversight.

RE: AW: Problem indexing Spanish Characters

2004-05-19 Thread wallen
Here is an example method in org.apache.lucene.demo.html.HTMLParser that uses a different buffered reader for a different encoding. public Reader getReader() throws IOException { if (pipeIn == null) { pipeInStream = new MyPip

RE: AW: Problem indexing Spanish Characters

2004-05-19 Thread Martin Remy
The tokenizers deal with Unicode characters (CharStream, char), so the problem is not there. This problem must be solved at the point where the bytes from your source files are turned into CharSequences/Strings, i.e. by connecting an InputStreamReader to your FileInputStream (or whatever you're using)
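A minimal sketch of that byte-to-character step, assuming modern Java (StandardCharsets did not exist in 2004) and using made-up names (EncodedReaderSketch, open, readAll). The point is only that InputStreamReader wraps an InputStream, for files a FileInputStream, and is told the charset explicitly instead of falling back to the platform default. An in-memory byte array stands in for the source file here.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Lucene: decodes bytes with an explicit charset.
class EncodedReaderSketch {
    // Wrap the raw byte stream in a Reader that decodes with the given charset.
    public static Reader open(InputStream in, Charset charset) {
        return new BufferedReader(new InputStreamReader(in, charset));
    }

    // Drain a Reader into a String so the decoded text can be inspected.
    public static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // "Málaga" encoded as ISO-8859-1; for a real file, pass a FileInputStream instead.
        byte[] bytes = "M\u00e1laga".getBytes(StandardCharsets.ISO_8859_1);
        Reader reader = open(new ByteArrayInputStream(bytes), StandardCharsets.ISO_8859_1);
        System.out.println(readAll(reader).equals("M\u00e1laga"));
    }
}
```

The Reader returned by open is what would be passed on to the tokenizer; decoding the same bytes with the wrong charset is what produces the broken characters.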

RE: AW: Problem indexing Spanish Characters

2004-05-19 Thread Hannah c
Hi, I had a quick look at the sandbox, but my problem is that I don't need a Spanish stemmer. However, there must be a replacement tokenizer that supports foreign characters to go along with the foreign language Snowball stemmers. Does anyone know where I could find one? In answer to Peter's quest

AW: Problem indexing Spanish Characters

2004-05-19 Thread PEP AD Server Administrator
Hi Hannah, Otis, I cannot help, but I have exactly the same problems with special German characters. I used the Snowball analyser, but this does not help because the problem (tokenizing) appears before the analyser comes into action. I just posted the question "Problem tokenizing UTF-8 with geman umlauts"

Re: Problem indexing Spanish Characters

2004-05-19 Thread Otis Gospodnetic
It looks like the Snowball project supports Spanish: http://www.google.com/search?q=snowball spanish If it does, take a look at the Lucene Sandbox. There is a project that allows you to use Snowball analyzers with Lucene. Otis --- Hannah c <[EMAIL PROTECTED]> wrote: > > Hi, > > I am indexing a numb

Re: Possible to fetch a document without all fields for performance?

2004-05-19 Thread Otis Gospodnetic
Hi Kevin, There is no API for this, and I agree it would be handy. Otis --- Kevin Burton <[EMAIL PROTECTED]> wrote: > Say I have a query result for the term Linux... now I just want the > TITLE of these documents not the BODY. > > To further this scenario imagine the TITLE is 500 bytes but the

Problem tokenizing UTF-8 with geman umlauts

2004-05-19 Thread PEP AD Server Administrator
Hello, I have HTML documents which are UTF-8 encoded and contain English and/or German content. I have written my own Analyser and Filter to replace the German umlauts with the commonly used pairs of characters (ü=ue, ä=ae, ö=oe) to avoid any problems. Still in the HTML-code the german umlauts are sh
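The replacement described above can be sketched as a plain string function. This is not the poster's Analyser or Filter and uses no Lucene API; the names UmlautFolderSketch and foldUmlauts are invented for the example. It covers only the three lowercase mappings listed in the message (ü=ue, ä=ae, ö=oe); uppercase umlauts and ß are left untouched because the message does not say how they were handled.

```java
// Hypothetical helper, not part of Lucene: replaces lowercase German umlauts
// with the two-letter spellings listed in the message.
class UmlautFolderSketch {
    public static String foldUmlauts(String input) {
        StringBuilder out = new StringBuilder(input.length() + 8);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '\u00fc': out.append("ue"); break; // ü
                case '\u00e4': out.append("ae"); break; // ä
                case '\u00f6': out.append("oe"); break; // ö
                default: out.append(c);                  // everything else unchanged
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(foldUmlauts("M\u00fcller"));
    }
}
```

A function like this only helps if the text was decoded correctly first; if the UTF-8 bytes were read with the wrong charset, the umlauts never arrive as single characters and nothing matches.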

Problem indexing Spanish Characters

2004-05-19 Thread Hannah c
Hi, I am indexing a number of English articles on Spanish resorts. As such, there are a number of Spanish characters throughout the text; most of these are in the place names, which are the type of words I would like to use as queries. My problem is with the StandardTokenizer class which cuts the w

Re: How to handle range queries over large ranges and avoid Too Many Boolean clauses

2004-05-19 Thread Claude Devarenne
Thanks, I will look at the sorting code. Sorting results by date is next on list. For now, I only have a small number of documents but the set is to grow to over 8 million documents for the collection I am working on. Another collection we have is 40 million documents or so. From what you

Re: Bad file descriptor (IOException) using SearchBean contribution

2004-05-19 Thread Timothy Stone
Erik Hatcher wrote: On May 18, 2004, at 1:43 PM, Timothy Stone wrote: Erik Hatcher wrote: Lucene 1.4 (now in release candidate stage) includes built-in sorting capabilities, so I definitely recommend you have a look at that. SearchBean is effectively deprecated based on this new much more po

RE: about search and update one index simultaneously

2004-05-19 Thread David Townsend
There is no problem with updating and searching simultaneously. Two threads updating simultaneously on the same index on NFS can be a problem, as the locking does not work reliably. Have a look through the archives for NFS, there are some solutions scattered about. David -Original Messag

RE: SELECTIVE Indexing

2004-05-19 Thread Karthik N S
Hey Lucene Users, my original intention for indexing was to index certain portions of HTML [not the whole document]. If JTidy does not support this, then what are my options? Karthik -Original Message- From: Viparthi, Kiran (AFIS) [mailto:[EMAIL PROTECTED] Sent: Wednesday, May 19, 200

RE: SELECTIVE Indexing

2004-05-19 Thread Viparthi, Kiran (AFIS)
I doubt if it can be used as a plug in. Would be good to know if it can be used as a plug in. Regards, Kiran. -Original Message- From: Karthik N S [mailto:[EMAIL PROTECTED] Sent: 17 May 2004 12:30 To: Lucene Users List Subject: RE: SELECTIVE Indexing Hi Can I Use TIDY [as plug in ] wi

org.apache.lucene.search.highlight.Highlighter

2004-05-19 Thread Karthik N S
Hey guys, I found a Highlighter package in the CVS directory. While investigating, I found some compile-time errors. Could somebody please tell me what this is? The code: private IndexReader reader = null; private Highlighter highlighter = null; public SearchFiles() { } public void searchIndex0(S