Index the "filename" as its own field when you build the index, just as
you did with the "path". You can then get it back with doc.get("filename").
suba suresh.
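A minimal sketch of how the file name might be derived from the full path before it is added as a separate "filename" field. It uses only java.io.File; the Lucene calls that add the field are left out because they differ between versions. The class name, method name, and sample path are hypothetical.

```java
import java.io.File;

final class FileNameFieldSketch {
    // Returns the last path segment, i.e. the value one might store
    // in a "filename" field alongside the existing "path" field.
    static String fileNameOf(String path) {
        return new File(path).getName();
    }

    public static void main(String[] args) {
        String path = "docs/reports/summary.txt"; // hypothetical "path" value
        System.out.println(fileNameOf(path));
    }
}
```

If re-indexing is not an option, the same call could be applied to the value returned by doc.get("path") at display time.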
Mag Gam wrote:
Is it possible to get Document Name, instead of its entire path?
Currently, i have something like this:
out.println (doc.g
Thanks for everyone's help. I understand how it works now. I can get rid
of MultiFieldQueryParser in search.
thanks
suba suresh.
Erik Hatcher wrote:
Yeah, I used a cruder form by appending all the text together into a
single string with a space separator in that LIA example.
Give
n
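A rough sketch of the "append everything into one string" idea described above, assuming the per-field text is already available as plain strings. The combined string would then be indexed as a single catch-all field, so that one ordinary query parser can search it. The class and method names are hypothetical, and the Lucene indexing call itself is omitted.

```java
import java.util.Arrays;
import java.util.List;

final class CatchAllFieldSketch {
    // Joins the non-empty field values with a single space separator.
    static String joinForCatchAll(List<String> fieldValues) {
        StringBuilder sb = new StringBuilder();
        for (String value : fieldValues) {
            if (value == null || value.isEmpty()) {
                continue; // skip missing fields so no double spaces appear
            }
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(value);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical field values for one document.
        List<String> values = Arrays.asList("Quarterly report", "", "Sales figures by region");
        System.out.println(joinForCatchAll(values));
    }
}
```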
someone give me an example of how to do it?
thanks,
suba suresh.
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
o it? When the Hits come back from both
the searches, how will they be organized?
Is there another way to approach it? Any suggestions?
thanks,
suba suresh.
I thought I would pass it along if anyone is interested.
"Tropo", as suggested in Lucene's FAQ, works perfectly if the emails
are in an IMAP store.
For archived emails stored in mbox format I used mstor.jar from
mstor.sourceforge.net with "Tropo"
The mail archives and saved mail folders can be opened and read as
text files using TextPad or WordPad. Is it possible to index them as
they are? The mail client is Thunderbird.
If I have to use third-party software, is there anything you can suggest?
suba suresh.
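One way to treat such a file as plain text, sketched under the assumption that it follows the traditional mbox layout in which every message begins with a separator line starting with "From ". The sketch only splits the text into per-message strings that could then be indexed one document at a time. It ignores quoting of "From " lines inside message bodies, so it is a simplification and no replacement for a real mail library. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class MboxSplitSketch {
    // Splits mbox-style text into one string per message.
    // A line starting with "From " is treated as a message separator.
    static List<String> splitMessages(String mboxText) {
        List<String> messages = new ArrayList<String>();
        StringBuilder current = null;
        for (String line : mboxText.split("\r?\n", -1)) {
            if (line.startsWith("From ")) {
                if (current != null) {
                    messages.add(current.toString());
                }
                current = new StringBuilder();
                continue; // the separator line itself is not kept
            }
            if (current != null) {
                current.append(line).append('\n');
            }
        }
        if (current != null) {
            messages.add(current.toString());
        }
        return messages;
    }

    public static void main(String[] args) {
        // Hypothetical two-message mailbox.
        String sample = "From a@example.org Mon\nSubject: one\n\nfirst body\n"
                + "From b@example.org Tue\nSubject: two\n\nsecond body\n";
        System.out.println(splitMessages(sample).size() + " messages found");
    }
}
```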
Rob Staveley
Hi!
Can someone help me?
suba suresh
Suba Suresh wrote:
I was looking at "http://www.tropo.com/techno/java/lucene/imap.html" and
my understanding is that it is used to retrieve and index the emails that are
on the email server. I have some stored emails in folders in my local
y I
could index them?
I have to have this working in the next couple of days. Any help and
suggestions are appreciated.
thanks,
suba suresh.
Thanks for all the responses. I am going to investigate the JavaMail API
along with Michael's "Email Analyzer" code that was posted in this group.
thanks,
suba suresh.
John Haxby wrote:
Andrzej Bialecki wrote:
Just for the record - I've been using javamail POP and IMAP pr
retrieval.
thanks,
suba suresh.
Michael J. Prichard wrote:
Kewl :)
I updated the Filter (for anyone interested). Actually, if anyone
wants, I can zip it up and send it to them... let me know.
EmailFilter
import org.apache.lucene.analysis.TokenStream;
import
Ok. I will try it. I am a little stupid. When you said go down the POP or
IMAP route, what did you mean? Is that path for Unix/Linux only?
thanks,
suba suresh.
John Haxby wrote:
Suba Suresh wrote:
Anyone know of good free email libraries I can use for lucene indexing
for Windows Outlook
PDFTextStripper.writeText(org.pdfbox.pdmodel.PDDocument, java.io.Writer)
did not work for us.
Thanks for all the help.
suba suresh.
Rob Staveley (Tom) wrote:
Let us know how you get on. There are a lot of people fighting very similar
battles on this list.
-Original Message-
From: Suba Suresh
Anyone know of good free email libraries I can use for lucene indexing
for Windows Outlook Express and Unix emails??
suba suresh.
Definitely. Thanks for both the suggestions. Yes, it is 300MB (typo).
suba suresh.
Rob Staveley (Tom) wrote:
Let us know how you get on. There are a lot of people fighting very similar
battles on this list.
-Original Message-
From: Suba Suresh [mailto:[EMAIL PROTECTED]
Sent: 13 July
Thanks.
I am using the getText(PDDocument) method of the PDFTextStripper. I will
try the other suggestion.
suba suresh.
Rob Staveley (Tom) wrote:
If you are using
http://www.pdfbox.org/javadoc/org/pdfbox/util/PDFTextStripper.html#getText(org.pdfbox.pdmodel.PDDocument),
you are going to get
in my code.
writer.setMergeFactor(1000);
writer.setMaxMergeDocs(999);
writer.setMaxBufferedDocs(1000);
writer.setMaxFieldLength(Integer.MAX_VALUE);
I would like any help and suggestions.
thanks,
suba suresh.
There is a separate user mailing list for POI. Use it.
There are three jar files. Check the scratchpad jar. You have to pass
a FileInputStream (not the filename) as an argument to the WordExtractor
class.
suba suresh.
mcarcelen wrote:
Hi all!
I'm working with poi-bin-3.0-alpha2-20060616
I used Java libraries for RTF file formats. Refer to Manning's Lucene
in Action book. It is helpful and gives pointers to where you can get
the different libraries.
suba suresh.
mcarcelen wrote:
Hi,
Do you know another library for indexing RDF?
Thanks a lot for your help
Teresa
-Me
I used the PDFBox library as mentioned in Lucene in Action. It works for me.
You can get it from www.pdfbox.org
suba suresh
mcarcelen wrote:
Hi,
I'm new to Lucene and I'm trying to index a PDF, but when I query
everything it returns nothing. Can anyone help me?
Thanks a lot
Teresa
19 matches