Terry,
These are really not Lucene questions. Lucene will let you index text,
but you need to figure out how to parse your XHTML files.
Take a look at JTidy on sf.net; I think it can help you parse XHTML,
or perhaps Xerces from xml.apache.org can.
Otis
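
A minimal sketch of the "parse it yourself" step, assuming the XHTML is well-formed XML. It uses the JAXP parser bundled with the JDK rather than JTidy or a separately installed Xerces, and the class and method names (XhtmlText, extractText) are made up for illustration. It only pulls out the text; handing that text to Lucene as a field value is left out.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Lucene, JTidy or Xerces.
public class XhtmlText {

    // Returns the text content of a well-formed XHTML document,
    // with runs of whitespace collapsed to single spaces.
    public static String extractText(String xhtml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Assumption: the JDK's built-in Xerces-derived parser is in use.
            // This feature stops it from fetching the XHTML DTD over the network.
            // Named entities such as &nbsp; will then be undefined.
            factory.setFeature(
                "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xhtml)));
            // getTextContent() concatenates all text nodes below the root element.
            String text = doc.getDocumentElement().getTextContent();
            return text.replaceAll("\\s+", " ").trim();
        } catch (Exception e) {
            // Malformed markup ends up here; tag-soup HTML needs a lenient parser.
            throw new IllegalArgumentException("could not parse input as XHTML", e);
        }
    }
}
```

Because the parser looks at the content and not the file name, the files can keep whatever extension they already have.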
--- Terry McGregor <[EMAIL PROT
Hi,
I will be working on a project with Lucene. I was wondering if anyone knows of a
consultant with Lucene experience that might be interested in some work. If so,
please give me a ring.
Sincerely,
Eric Thoman
Director
The Manchester
Java Users' Group
http://www.manjug.org
603.627.8419
--
Hi,
I'm new to Lucene, and I was wondering how I should parse XHTML files.
Should I name them with the .HTML file extension and use
org.apache.lucene.demo.IndexHTML or name them with the .XML file extension
and use an XML parser?
Also, I would like to keep my XHTML files with a .XHTML file e
Ahh. As it stands, Lucene is more of a *search API* than a drop-in indexing
application like htDig.
Some other developers and I will be working on adding application
extensions to Lucene (still in the planning stage):
http://jakarta.apache.org/lucene/docs/luceneplan.html
At current there i
Grim,
>I am looking at using lucene to index a large set of documents. In
>order to be able to search a subset of documents, I've added a
>"path"-field to each document (indexed, not stored, not tokenized).
>Using a prefix-query seems to work fine.
>
>My problem: Our documents can have several d
Hello Andy,
I have actually stepped through the getting started docs on apache (with success!). I
am fairly new to Java and find it difficult configuring Lucene to securely search my
company's intranet (including .htm, .jsp, .pdf, .doc, ...). I'm just taking a shot in
the dark here hoping fo
http://jakarta.apache.org/lucene/docs/gettingstarted.html
If this isn't good enough, please let me know what I can do to make it
better. Documenting Lucene is something I have a big interest in.
-Andy
>On Tue, 05 Mar 2002 08:43:59 -0600 "Ryan Ogaard" <[EMAIL PROTECTED]> wrote.
>Hello All,
>
Hello,
I think you should just try your two suggestions and see.
The answer depends on how exactly you do it, OS configuration, etc.
Does this happen on an optimized index, too?
Otis
--- Tihon One <[EMAIL PROTECTED]> wrote:
> Hi all;
>
> I've tried to index a 100K text file on an empty Index fo
Hello All,
I am in the process of testing Lucene for our intranet, and am having a difficult time
finding good documentation. Any recommendations on good Web sites with tips, how-tos,
code examples, etc. for Lucene?
Thank you for your time and consideration...
Ryan
You have to do it yourself, or at least find code that does this. The
Lucene sample code has an HTML parser, and I've posted (to lucene-dev) an
alternative way of using JTidy to do this.
Erik
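
A rough sketch of the "do it yourself" option, for illustration only. This is neither the parser from the Lucene sample code nor the JTidy approach posted to lucene-dev. It assumes the lenient HTML parser that ships with the JDK (javax.swing.text.html) is acceptable, and the class name HtmlTextExtractor is made up. It keeps only the text between tags, which is what would then be given to Lucene.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper, not part of Lucene.
public class HtmlTextExtractor {

    // Returns the text found between the tags of an HTML document.
    public static String extractText(String html) {
        final StringBuilder text = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                // Called once per run of text; tags never reach this method.
                if (text.length() > 0) {
                    text.append(' ');
                }
                text.append(data);
            }
        };
        try {
            // 'true' tells the parser to ignore any charset declared in the page.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }
}
```

Unlike an XML parser, this one tolerates unclosed or badly nested tags, which matters for typical intranet HTML.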
- Original Message -
From: "Melissa Mifsud" <[EMAIL PROTECTED]>
To: "Lucene User" <[EMAIL P
Hi,
Is it necessary to strip the HTML tags from HTML documents BEFORE telling Lucene to
index them? Does Lucene do this or will it index the tags too?!
Melissa
Hi!
Can anyone tell me what kind of indexer Lucene is? Statistical, Probabilistic,
Boolean, Extended Boolean?
I can't seem to find the answer in any documentation or article and it's really
important that I know the type before I use Lucene for my application!
Thanks!
Melissa
I am looking at using lucene to index a large set of documents. In
order to be able to search a subset of documents, I've added a
"path"-field to each document (indexed, not stored, not tokenized).
Using a prefix-query seems to work fine.
My problem: Our documents can have several different pat
Hi all;
I've tried to index a 100K text file on an empty Index folder (0 MB of
indexed files) and it took 0.77 seconds. However, when my index folder gets
larger (~20MB of indexed files), the same 100K text file can take up to 30
seconds.
I'm using EJB to do the index processing and my SessionB
Hi all
In my application, if I search on the letter "e" only, I get the right result, but if I
search on "t", it does not show any result. There must be many common words
like "to" or "the" that should show up in the search. More interestingly, if I
search with a wildcard like "t*" th