RE: converting text/doc to XML

2003-07-07 Thread Nader S. Henein
We read from the database and parse the data into valid XML, then hand the XML file over to Lucene, which in turn digests it and indexes the information. N. -Original Message- From: Jagdip Singh [mailto:[EMAIL PROTECTED] Sent: Tuesday, July 08, 2003 10:39 AM To: 'Lucene Users List'; [EM

RE: converting text/doc to XML

2003-07-07 Thread Jagdip Singh
Hi Nader, You mentioned using Lucene for your http://www.bayt.com web site. Do you convert CVs or any other documents to XML format before submitting them to Lucene for indexing? Regards, Jagdip -Original Message- From: Nader S. Henein [mailto:[EMAIL PROTECTED] Sent: Tuesday, July 08

RE: converting text/doc to XML

2003-07-07 Thread Jagdip Singh
I will try coding this. Thanks, Jagdip -Original Message- From: Nader S. Henein [mailto:[EMAIL PROTECTED] Sent: Tuesday, July 08, 2003 1:55 AM To: 'Lucene Users List' Subject: RE: converting text/doc to XML XML is an organized, standardized format so let's say your document has the follo

RE: converting text/doc to XML

2003-07-07 Thread Nader S. Henein
XML is an organized, standardized format so let's say your document has the following characteristics File name : foobar.doc First line title : Foo Bar File content : Blah blah blah blah Blah blah blah blah Blah blah blah blah Blah blah blah blah Then you have t
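
A minimal sketch of the idea described in this message: take the file name, the first-line title, and the body text, and wrap each in its own XML element. The element names (document, filename, title, content) and the class name are assumptions made for illustration; the message is cut off before it shows the structure Nader actually uses.

```java
// Illustrative sketch only: element and class names are assumptions,
// not taken from the thread.
final class DocToXml {

    /** Escapes the five characters that are special in XML text and attributes. */
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    /** Wraps the three pieces of the document in a small XML envelope. */
    static String toXml(String fileName, String title, String content) {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
             + "<document>\n"
             + "  <filename>" + escape(fileName) + "</filename>\n"
             + "  <title>" + escape(title) + "</title>\n"
             + "  <content>" + escape(content) + "</content>\n"
             + "</document>\n";
    }

    public static void main(String[] args) {
        // Values taken from the example in the message.
        System.out.print(toXml("foobar.doc", "Foo Bar", "Blah blah blah blah"));
    }
}
```

Escaping the text before wrapping it matters because document bodies routinely contain characters such as & and <, which would otherwise make the output ill-formed.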

converting text/doc to XML

2003-07-07 Thread Jagdip Singh
Hi, How can I convert text/doc to XML? Please help. Regards, Jagdip

RE: Postgres and lucene

2003-07-07 Thread Stephen Eaton
What I am doing through the index process is basically dump the database via a select all statement. Once selected the record sets are looped through and the relevant fields as well as the record's key are indexed, so then when I need to retrieve the data I do a select on the relevant record based
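
The pattern Stephen describes (loop over every record, index the searchable fields together with the record's key, then use the key from a search hit to select the full row) can be sketched in plain Java. This is not Lucene's API: a small in-memory map stands in for the Lucene index, and the hard-coded rows stand in for the result of the select-all query. All names here are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Stand-in for a search index: maps each term to the keys of the
// records that contain it. Hypothetical names; not Lucene's API.
final class RecordIndexSketch {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    /** Indexes the searchable field values of one record under its key. */
    void indexRecord(int key, String... fieldValues) {
        for (String value : fieldValues) {
            for (String term : value.toLowerCase().split("\\W+")) {
                if (!term.isEmpty()) {
                    postings.computeIfAbsent(term, t -> new TreeSet<>()).add(key);
                }
            }
        }
    }

    /** Returns the keys of all records containing the given term. */
    Set<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(), Collections.emptySet());
    }

    public static void main(String[] args) {
        // Stand-in for the rows returned by the select-all query.
        Map<Integer, String[]> rows = new LinkedHashMap<>();
        rows.put(1, new String[] {"Lucene in practice", "Indexing a Postgres table"});
        rows.put(2, new String[] {"JDBC notes", "Connecting to Postgres"});

        RecordIndexSketch index = new RecordIndexSketch();
        for (Map.Entry<Integer, String[]> row : rows.entrySet()) {
            index.indexRecord(row.getKey(), row.getValue());
        }

        // Each key found here would be used in a follow-up select
        // against the database to fetch the full record.
        System.out.println(index.search("postgres"));
    }
}
```

Keeping only the key and the searchable text in the index leaves the database as the single source of the full record, which is the division of labour the message describes.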

Re: Problems with StandardTokenizer

2003-07-07 Thread Otis Gospodnetic
Please check Lucene's jGuru FAQ; your question is answered there. Otis --- Flavio Eduardo de Cordova <[EMAIL PROTECTED]> wrote: > People... > > I've created a custom analyser that uses the StandardTokenizer class > to get the tokens from the reader. > It seemed to work fine but I

Problems with StandardTokenizer

2003-07-07 Thread Flavio Eduardo de Cordova
People... I've created a custom analyser that uses the StandardTokenizer class to get the tokens from the reader. It seemed to work fine but I just noticed that some large documents are not having all their content properly indexed, but just [the starting] part of them. Aft

RE: Postgres and lucene

2003-07-07 Thread Jon Crowell
Is this a JDBC issue? If so, see http://archives.postgresql.org/pgsql-jdbc/ > -Original Message- > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] > Sent: Monday, July 07, 2003 3:22 PM > To: Lucene Users List > Subject: Re: Postgres and lucene > > > I'm trying to index in a single

Re: Directory implementation using NIO

2003-07-07 Thread Scott Ganyo
Wonderful! I can't wait to try this. I'll try to provide some comparisons as I get to it, but I'd love to hear from anyone else that tries this... Thanks, Scott Francesco Bellomi wrote: Hi, I developed a Directory implementation that accesses an index stored on the filesystem using memory-ma

Text Extracting

2003-07-07 Thread Flavio Eduardo de Cordova
People... Do you know any non-native API to extract text from PPT/PPS files? I've been trying POI but it seems to me they just support .xls (well) and .doc (a little) files for now... Flavio Cordova - To unsubscribe, e-m

Re: Postgres and lucene

2003-07-07 Thread jessica . maryott
I'm trying to index in a single table with multiple fields. I understand how lucene does that and I have some code that I think will work, except that I don't think it's finding that database to index it in the first place, and therein lies my problem. Quoting Jeff Linwood <[EMAIL PROTECTED]>: >

Re: Postgres and lucene

2003-07-07 Thread Jeff Linwood
I think you might need to post a little bit more detail about what you are trying to solve. Are you trying to index one field in one table in your database, several fields, several tables? As a general idea, you will need to create a Lucene Document object for each record you put into the search

Re: Postgres and lucene

2003-07-07 Thread jessica . maryott
Thanks, I'll look into it. It looks like DSpace might be compatible with what I need. I'm also looking for how to implement this myself, since the scope is fairly small, and DSpace might be too much for what I need. Quoting Xuheng Xu <[EMAIL PROTECTED]>: > DSpace is using Postgres and lucene. >

Re: Postgres and lucene

2003-07-07 Thread Xuheng Xu
DSpace is using Postgres and lucene. http://www.dspace.org - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]

RE: lucene handling different document formats

2003-07-07 Thread Eric Anderson
There's also a WAR that's already built, that's available at http://www.brownsite.net/docsearch.htm It works with OpenOffice documents, Word doc, Excel, PDF, XML, RTF, TXT, etc. It can work via a servlet interface or a standalone application. Eric Anderson LanRx Network Solutions Quoting "Wi

Postgres and lucene

2003-07-07 Thread jessica . maryott
Hi, I'm new to lucene and I have had a lot of trouble finding information on how exactly to use lucene to search a postgres database. I've searched the archives for this list, but found nothing specific enough to help me. Has anyone used Lucene to search a postgres database who could help?

RE: lucene handling different document formats

2003-07-07 Thread Wilton, Reece
The Lucene FAQ on Java Guru gives some hints on this: http://www.jguru.com/faq/Lucene -Original Message- From: Maurice Coyle [mailto:[EMAIL PROTECTED] Sent: Monday, July 07, 2003 9:07 AM To: [EMAIL PROTECTED] Subject: lucene handling different docum

lucene handling different document formats

2003-07-07 Thread Maurice Coyle
Could anyone tell me if there's some sort of repository somewhere that contains parsers for document types such as .doc, .pdf, .xls? Or how I'd begin to go about writing one (tutorials etc. much appreciated)? Thanks, Maurice

Re: accessing Lucene functionality from asp.net

2003-07-07 Thread Brian Mila
Jagdip, You could try using the .NET port of Lucene. I haven't tried doing any asp.net yet but I imagine it should be fairly easy to connect to. http://sourceforge.net/projects/nlucene I've been using it for several months now. It seems pretty stable and most of the functionality is there.

Re: making XML from articles

2003-07-07 Thread Che Dong
>>// just remove invalid characters: in php >>$pattern = "/[\x00-\x08\x0b-\x0c\x0e-\x1f]/"; >>$string = preg_replace($pattern, '', $string); - Original Message - From: "Jagdip Singh" <[EMAIL PROTECTED]> To: "'Lucene Users List'" <[EMAIL PROTECTED]> Sent: Monday, July 07, 20
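
For readers working in Java rather than PHP, the same clean-up can be sketched with java.util.regex. The character class below removes the control characters 0x00-0x08, 0x0B, 0x0C and 0x0E-0x1F, which XML 1.0 does not allow, while keeping tab, line feed and carriage return. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical helper: a Java counterpart to the PHP snippet quoted above.
final class XmlCleaner {
    // Control characters that are not legal in XML 1.0 text.
    // Tab (0x09), line feed (0x0A) and carriage return (0x0D) are kept.
    private static final Pattern INVALID =
            Pattern.compile("[\\x00-\\x08\\x0B\\x0C\\x0E-\\x1F]");

    /** Returns the input with the disallowed control characters removed. */
    static String stripInvalid(String s) {
        return INVALID.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        String dirty = "CV text\u0000 with\u0008 stray\u001F bytes\n";
        System.out.print(stripInvalid(dirty));
    }
}
```

Running text through a filter like this before building the XML avoids parser errors caused by stray bytes in converted documents.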