Re: Document Get question

2006-08-24 Thread Suba Suresh
Index the "filename" as its own field when you are indexing, as you did with the "path". You can get it back with doc.get("filename"); suba suresh. Mag Gam wrote: Is it possible to get Document Name, instead of its entire path? Currently, I have something like this: out.println (doc.g
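A minimal sketch of deriving the value for such a "filename" field from the path that is already being indexed. The class name, method name, and sample path are hypothetical; adding the value to the Lucene document as a stored field is left out because the exact Field call depends on the Lucene version in use.

```java
import java.io.File;

// Hypothetical helper: derives the value for a "filename" field from the
// same path string that is already being indexed as "path".
class FileNameField {

    // Returns only the last component of the path (the file name).
    static String fileNameOf(String path) {
        return new File(path).getName();
    }

    public static void main(String[] args) {
        // The sample path is an illustrative assumption.
        System.out.println(fileNameOf("/data/docs/report.pdf"));
    }
}
```

The returned string is what would be stored under "filename" at indexing time, so that doc.get("filename") can return it at search time.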

Re: How to combine multiple fields to a single field for indexing

2006-08-24 Thread Suba Suresh
Thanks for everyone's help. I understand how it works now. I can get rid of MultiFieldQueryParser in search. thanks suba suresh. Erik Hatcher wrote: Yeah, I used a cruder form by appending all the text together into a single string with a space separator in that LIA example. Give
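The approach Erik describes, appending the text of all fields into one string with a space separator, could be sketched roughly as below. The class name, method name, and the field names in the example are assumptions for illustration; the combined string would then be indexed as one additional field so that a single-field query can replace MultiFieldQueryParser.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: builds the text of one catch-all field from the
// values of several named fields.
class CombinedField {

    // Joins all non-empty field values with a single space, in map order.
    static String combine(Map<String, String> fields) {
        List<String> parts = new ArrayList<>();
        for (String value : fields.values()) {
            if (value != null && !value.trim().isEmpty()) {
                parts.add(value.trim());
            }
        }
        return String.join(" ", parts);
    }

    public static void main(String[] args) {
        // Field names and values here are illustrative assumptions.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("title", "Lucene in Action");
        fields.put("author", "Erik Hatcher");
        System.out.println(combine(fields));
    }
}
```

The individual fields can still be indexed separately alongside the combined one when field-specific searches are also needed.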

How to combine multiple fields to a single field for indexing

2006-08-23 Thread Suba Suresh
n someone give me an example of how to do it? thanks, suba suresh. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]

Search and Hits

2006-08-23 Thread Suba Suresh
o it? When the Hits comes back from both the search how will it be organized? Is there another way to approach it. Any suggestion? thanks, suba suresh.

Information: For Indexing Emails

2006-08-22 Thread Suba Suresh
I thought I would pass it along if anyone is interested. If the emails are in mbox format "Tropo" as suggested in lucene's faq works perfectly if it is in an imap store. For archived emails stored in mbox format I used mstor.jar from mstor.sourceforge.net with "Tropo"

Re: Indexing existing email archives

2006-08-15 Thread Suba Suresh
The mail archives and saved mail folders can be opened up and read as a text file using textpad or wordpad. Will it be possible to index them as it is? The mail client is Thunderbird Mailbox. If I have to use third party software is there anything you can suggest? suba suresh. Rob Staveley

Re: Indexing existing email archives

2006-08-14 Thread Suba Suresh
Hi! Can someone help me? suba suresh Suba Suresh wrote: I was looking at "http://www.tropo.com/techno/java/lucene/imap.html" and my understanding is it is used to retrieve and index the emails that is on the email server. I have some stored emails in folders in my local

Indexing existing email archives

2006-08-14 Thread Suba Suresh
y I could index them? I have to have this working in next couple of days. Any help and suggestions are appreciated. thanks, suba suresh.

Re: email libraries

2006-07-31 Thread Suba Suresh
Thanks for all the responses. I am going to investigate the JavaMail API along with Michael's "Email Analyzer" code that was posted in this group. thanks, suba suresh. John Haxby wrote: Andrzej Bialecki wrote: Just for the record - I've been using javamail POP and IMAP pr

Re: EMAIL ADDRESS: Tokenize (i.e. an EmailAnalyzer)

2006-07-31 Thread Suba Suresh
retrieval. thanks, suba suresh. Michael J. Prichard wrote: Kewl :) I updated the Filter(for anyone interested). Actually..if anyone wants I can zip it up and send it to them...let me know. EmailFilter import org.apache.lucene.analysis.TokenStream; import

Re: email libraries

2006-07-26 Thread Suba Suresh
Ok. I will try it. I am a little stupid. When you said go down POP or IMAP route what did you mean? Is that path for Unix/Linux only? thanks, suba suresh. John Haxby wrote: Suba Suresh wrote: Anyone know of good free email libraries I can use for lucene indexing for Windows Outlook

Re: Out of memory error

2006-07-26 Thread Suba Suresh
PDFTextStripper.writeText(org.pdfbox.pdmodel.PDDocument, java.io.Writer) did not work for us. Thanks for all the help. suba suresh. Rob Staveley (Tom) wrote: Let us know how you get on. There are a lot of people fighting very similar battles on this list. -Original Message- From: Suba Suresh

email libraries

2006-07-26 Thread Suba Suresh
Anyone know of good free email libraries I can use for lucene indexing for Windows Outlook Express and Unix emails?? suba suresh.

Re: Out of memory error

2006-07-13 Thread Suba Suresh
Definitely. Thanks for both the suggestions. Yes, it is 300MB (typo). suba suresh. Rob Staveley (Tom) wrote: Let us know how you get on. There are a lot of people fighting very similar battles on this list. -Original Message- From: Suba Suresh [mailto:[EMAIL PROTECTED] Sent: 13 July

Re: Out of memory error

2006-07-13 Thread Suba Suresh
Thanks. I am using the getText(PDDocument) method of the PDFTextStripper. I will try the other suggestion. suba suresh. Rob Staveley (Tom) wrote: If you are using http://www.pdfbox.org/javadoc/org/pdfbox/util/PDFTextStripper.html#getText(org.pdfbox.pdmodel.PDDocument), you are going to get

Out of memory error

2006-07-13 Thread Suba Suresh
in my code. writer.setMergeFactor(1000); writer.setMaxMergeDocs(999); writer.setMaxBufferedDocs(1000); writer.setMaxFieldLength(Integer.MAX_VALUE); I would like any help and suggestions. thanks, suba suresh.

Re: Lucene WordExtractor

2006-07-11 Thread Suba Suresh
There is a separate user mailing list for poi. Use it. There are three jar files. Check the scratchpad jar. You have to send in a FileInputStream (not the filename) as an argument to the WordExtractor class. suba suresh. mcarcelen wrote: Hi all! I'm working with poi-bin-3.0-alpha2-20060616

Re: Lucene indexing RDF

2006-06-27 Thread Suba Suresh
I used java libraries for rtf file formats. Refer to Manning's Lucene In Action book. It is helpful and gives pointers where you can access different libraries. suba suresh. mcarcelen wrote: Hi, Do you know another library for indexing RDF? Thanks a lot for your help Teresa -Me

Re: Lucene indexing pdf

2006-06-27 Thread Suba Suresh
I used PDFBox library as mentioned in Lucene in Action. It works for me. You can access it from www.pdfbox.org. suba suresh mcarcelen wrote: Hi, I'm new with Lucene and I'm trying to index a pdf but when I query everything it returns nothing. Can anyone help me? Thanks a lot Teresa