We read from the database and convert the data into valid XML, then I
hand the XML file over to Lucene, which in turn digests it and indexes
the information.
N.
-Original Message-
From: Jagdip Singh [mailto:[EMAIL PROTECTED]
Sent: Tuesday, July 08, 2003 10:39 AM
To: 'Lucene Users List'; [EM
Hi Nader,
You mentioned using Lucene for your http://www.bayt.com web site.
Do you convert CVs or any other documents to XML format before
submitting them to Lucene for indexing?
Regards,
Jagdip
-Original Message-
From: Nader S. Henein [mailto:[EMAIL PROTECTED]
Sent: Tuesday, July 08
I will try coding this.
Thanks,
Jagdip
-Original Message-
From: Nader S. Henein [mailto:[EMAIL PROTECTED]
Sent: Tuesday, July 08, 2003 1:55 AM
To: 'Lucene Users List'
Subject: RE: converting text/doc to XML
XML is an organized, standardized format so let's say your document has
the following characteristics
File name : foobar.doc
First line title : Foo Bar
File content :
Blah blah blah blah
Blah blah blah blah
Blah blah blah blah
Blah blah blah blah
Then you have t
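A minimal sketch of the kind of conversion described above, assuming the text has already been extracted from the .doc file and that the first line is the title. The element names (`document`, `filename`, `title`, `content`) and the function name are hypothetical, chosen only for illustration; they are not part of Lucene or of the original message.

```python
import xml.etree.ElementTree as ET


def document_to_xml(file_name: str, text: str) -> str:
    """Wrap already-extracted plain text in a small XML envelope."""
    lines = text.splitlines()
    # Assumption: the first line of the text is the title.
    title = lines[0].strip() if lines else ""
    body = "\n".join(lines[1:]).strip()

    # Hypothetical element names; use whatever your indexer expects.
    root = ET.Element("document")
    ET.SubElement(root, "filename").text = file_name
    ET.SubElement(root, "title").text = title
    ET.SubElement(root, "content").text = body
    # ElementTree escapes &, < and > in the text for us.
    return ET.tostring(root, encoding="unicode")


xml_text = document_to_xml(
    "foobar.doc",
    "Foo Bar\nBlah blah blah blah\nBlah blah blah blah",
)
```

The resulting string can then be parsed field by field by whatever code feeds the indexer.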
Hi,
How can I convert text/doc to XML?
Please help.
Regards,
Jagdip
What I do during the index process is basically dump the database via
a select-all statement.
Once selected, the record sets are looped through and the relevant fields, as
well as the record's key, are indexed, so when I need to retrieve the
data I do a select on the relevant record based
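The flow described here (select everything, index the relevant fields together with the record key, then fetch the full record by key at search time) might look roughly like the sketch below. It is not Lucene code: a plain dictionary stands in for the Lucene index, an in-memory SQLite database stands in for the real one, and the table and column names are made up for the example.

```python
import sqlite3
from collections import defaultdict

# Assumption: an in-memory SQLite table stands in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cv (id INTEGER PRIMARY KEY, name TEXT, summary TEXT)")
conn.executemany(
    "INSERT INTO cv (id, name, summary) VALUES (?, ?, ?)",
    [
        (1, "Alice", "Java developer with search experience"),
        (2, "Bob", "Database administrator"),
    ],
)

# Toy inverted index (word -> set of record keys) standing in for Lucene.
index = defaultdict(set)

# "Dump" the table with a select-all and index the relevant fields plus key.
for key, name, summary in conn.execute("SELECT id, name, summary FROM cv"):
    for word in f"{name} {summary}".lower().split():
        index[word].add(key)


def search(term: str) -> list:
    """Look the term up in the index, then fetch each hit by its key."""
    rows = []
    for key in sorted(index.get(term.lower(), ())):
        row = conn.execute(
            "SELECT id, name, summary FROM cv WHERE id = ?", (key,)
        ).fetchone()
        rows.append(row)
    return rows
```

The point of storing the key alongside the indexed fields is that the index only has to answer "which records match"; the database stays the source of the actual data.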
Please check Lucene's jGuru FAQ; your question is answered there.
Otis
--- Flavio Eduardo de Cordova <[EMAIL PROTECTED]> wrote:
> People...
>
> I've created a custom analyser that uses the StandardTokenizer class
> to get the tokens from the reader.
> It seemed to work fine but I
People...
I've created a custom analyser that uses the StandardTokenizer class
to get the tokens from the reader.
It seemed to work fine but I just noticed that some large documents
are not having all their content properly indexed, but just [the starting]
part of them.
Aft
Is this a JDBC issue? If so, see http://archives.postgresql.org/pgsql-jdbc/
> -Original Message-
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
> Sent: Monday, July 07, 2003 3:22 PM
> To: Lucene Users List
> Subject: Re: Postgres and lucene
>
>
> I'm trying to index in a single
Wonderful! I can't wait to try this. I'll try to provide some
comparisons as I get to it, but I'd love to hear from anyone else that
tries this...
Thanks,
Scott
Francesco Bellomi wrote:
Hi,
I developed a Directory implementation that accesses an index stored on the
filesystem using memory-ma
People...
Do you know of any non-native API to extract text from PPT/PPS files?
I've been trying POI, but it seems to me they only support .xls (well) and
.doc (a little) files for now...
Flavio Cordova
-
To unsubscribe, e-m
I'm trying to index in a single table with multiple fields. I understand
how lucene does that and I have some code that I think will work, except
that I don't think it's finding that database to index it in the first
place, and therein lies my problem.
Quoting Jeff Linwood <[EMAIL PROTECTED]>:
>
I think you might need to post a little bit more detail about what you are
trying to solve. Are you trying to index one field in one table in your
database, several fields, several tables?
As a general idea, you will need to create a Lucene Document object for each
record you put into the search
Thanks, I'll look into it. It looks like DSpace might be compatible with
what I need. I'm also looking into how to implement this myself, since
the scope is fairly small, and DSpace might be too much for what I need.
Quoting Xuheng Xu <[EMAIL PROTECTED]>:
> DSpace is using Postgres and lucene.
>
DSpace is using Postgres and lucene.
http://www.dspace.org
There's also a WAR that's already built, that's available at
http://www.brownsite.net/docsearch.htm
It works with OpenOffice documents, Word doc, Excel, PDF, XML, RTF, TXT, etc.
It can work via a servlet interface or a standalone application.
Eric Anderson
LanRx Network Solutions
Quoting "Wi
Hi,
I'm new to lucene and I have had a lot of trouble finding information
on how exactly to use lucene to search a postgres database. I've
searched the archives for this list, but found nothing specific enough
to help me. Has anyone used Lucene to search a postgres database who
could help?
The Lucene FAQ on Java Guru gives some hints on this:
http://www.jguru.com/faq/Lucene
-Original Message-
From: Maurice Coyle [mailto:[EMAIL PROTECTED]
Sent: Monday, July 07, 2003 9:07 AM
To: [EMAIL PROTECTED]
Subject: lucene handling different docum
Could anyone tell me if there's some sort of repository somewhere that contains parsers for document types such as .doc, .pdf, .xls? Or how I'd begin to go about writing one (tutorials etc. much appreciated)?
thanks,
maurice
Jagdip,
You could try using the .NET port of Lucene. I haven't tried doing
any asp.net yet but I imagine it should be fairly easy to connect to.
http://sourceforge.net/projects/nlucene
I've been using it for several months now. It seems pretty stable and
most of the functionality is there.
>>// just remove invalid characters: in php
>>$pattern = '/[\x00-\x08\x0b-\x0c\x0e-\x1f]/';
>>$string = preg_replace($pattern, '', $string);
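For anyone not working in PHP, a rough Python equivalent of the quoted snippet is sketched below. It assumes the intended range is the set of control characters that XML 1.0 disallows (0x00 to 0x08, 0x0B, 0x0C, and 0x0E to 0x1F), which leaves tab, line feed and carriage return alone. The function name is made up for the example.

```python
import re

# Control characters not allowed in XML 1.0 text.
# Tab (0x09), line feed (0x0A) and carriage return (0x0D) are kept.
INVALID_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def strip_invalid_xml_chars(text: str) -> str:
    """Remove characters that would make an XML parser reject the text."""
    return INVALID_XML_CHARS.sub("", text)
```

Running the text through a filter like this before building the XML avoids parse errors caused by stray control bytes in converted documents.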
- Original Message -
From: "Jagdip Singh" <[EMAIL PROTECTED]>
To: "'Lucene Users List'" <[EMAIL PROTECTED]>
Sent: Monday, July 07, 20