Re: Zip Files

2005-03-01 Thread Luke Shannon
e, etc. > //get the proper parser for the current entry > //use the parser with zis (ZipInputStream) > } > > good luck > Ernesto > > Luke Shannon wrote: > > >Hello; > > > >Does anyone have any ideas on how to index the contents of zip files? > > >
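Ernesto's reply outlines a loop over the entries of a ZipInputStream that picks a parser for each entry. Below is a minimal JDK-only sketch of that loop. It assumes, as a hypothetical choice, that only .txt and .xml entries are parsed. In a real indexer each returned string would become the body of a Lucene Document, and other extensions would be routed to a format-specific parser such as PDFBox.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

class ZipTextExtractor {
    // Reads the bytes of the current entry; ZipInputStream.read returns -1 at the end of the entry.
    private static byte[] readEntry(ZipInputStream zis) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = zis.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    // Hypothetical dispatch: only plain-text entries are handled in this sketch.
    private static boolean isPlainText(String name) {
        String lower = name.toLowerCase();
        return lower.endsWith(".txt") || lower.endsWith(".xml");
    }

    /** Returns entry name -> extracted text for every entry this sketch knows how to parse. */
    static Map<String, String> extract(InputStream zipBytes) throws IOException {
        Map<String, String> texts = new LinkedHashMap<>();
        try (ZipInputStream zis = new ZipInputStream(zipBytes)) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                if (!entry.isDirectory() && isPlainText(entry.getName())) {
                    texts.put(entry.getName(), new String(readEntry(zis), StandardCharsets.UTF_8));
                }
                zis.closeEntry();
            }
        }
        return texts;
    }
}
```

Entries with unknown extensions are skipped rather than failing the whole archive, which keeps one odd file from blocking the rest.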

Zip Files

2005-03-01 Thread Luke Shannon
Hello; Does anyone have any ideas on how to index the contents of zip files? Thanks, Luke

Filtering Question

2005-02-23 Thread Luke Shannon
Hello; I'm trying to create a Filter that only retrieves documents with a path field containing a substring (or substrings). I can get the Filter to work if the BooleanQuery below (used to create the Filter) contains only TermQueries (this requires me to know the exact path). But not if it contains Wildcard?

Re: MultiField Queries without the QueryParser

2005-02-22 Thread Luke Shannon
Responding to this post: please disregard. Sorry. - Original Message - From: "Luke Shannon" <[EMAIL PROTECTED]> To: "Lucene Users List" Sent: Tuesday, February 22, 2005 5:16 PM Subject: MultiField Queries without the QueryParser > Hello

MultiField Queries without the QueryParser

2005-02-22 Thread Luke Shannon
Hello; The book mentions the MultiFieldQueryParser as one way of dealing with multifield queries. Can someone point me in the direction of other ways? Thanks, Luke

Re: Optional Terms in a single query

2005-02-21 Thread Luke Shannon
. Thanks! Luke - Original Message - From: "Todd VanderVeen" <[EMAIL PROTECTED]> To: "Lucene Users List" Sent: Monday, February 21, 2005 6:26 PM Subject: Re: Optional Terms in a single query > Luke Shannon wrote: > > >The API I'm working with combi

Re: Optional Terms in a single query

2005-02-21 Thread Luke Shannon
Sent: Monday, February 21, 2005 5:33 PM Subject: Re: Optional Terms in a single query > Luke Shannon wrote: > > >Hi; > > > >I'm trying to create a query that looks for a field containing type:181 and > >name doesn't contain tim, bill or harry. >

Re: Optional Terms in a single query

2005-02-21 Thread Luke Shannon
1 PM Subject: Re: Optional Terms in a single query > On Monday 21 February 2005 23:23, Luke Shannon wrote: > > Hi; > > > > I'm trying to create a query that looks for a field containing type:181 and > > name doesn't contain tim, bill or harry. > > typ

Optional Terms in a single query

2005-02-21 Thread Luke Shannon
Hi; I'm trying to create a query that looks for a field containing type:181 and name doesn't contain tim, bill or harry. +(type: 181) +((-name: tim -name:bill -name:harry +oldfaith:stillHere)) +(type: 181) +((-name: tim OR bill OR harry +oldfaith:stillHere)) +(type: 181) +((-name:*(tim bill harry)

Handling Synonyms

2005-02-21 Thread Luke Shannon
Hello; Does anyone see a problem with the following approach? For synonyms, rather than putting them in the index, I put the original term and all the synonyms in the query. Every time I create a query, I check if the term has any synonyms. If it does, I create a BooleanQuery OR'ing one Query obj
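Here is a sketch of the query-time expansion described above, using a hypothetical in-memory synonym table. SynonymExpander and its map are illustrative, not part of Lucene. For readability it renders the expansion in QueryParser syntax. With the Lucene 1.4 API, each expanded term would instead become an optional clause, as in booleanQuery.add(termQuery, false, false).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class SynonymExpander {
    private final Map<String, List<String>> synonyms; // hypothetical synonym table

    SynonymExpander(Map<String, List<String>> synonyms) {
        this.synonyms = synonyms;
    }

    /** The original term first, followed by its synonyms, without duplicates. */
    List<String> expand(String term) {
        List<String> out = new ArrayList<>();
        out.add(term);
        for (String s : synonyms.getOrDefault(term, List.of())) {
            if (!out.contains(s)) out.add(s);
        }
        return out;
    }

    /** Renders the expansion as (field:a OR field:b ...); terms are not escaped here. */
    String toQueryString(String field, String term) {
        List<String> terms = expand(term);
        StringBuilder sb = new StringBuilder("(");
        for (int i = 0; i < terms.size(); i++) {
            if (i > 0) sb.append(" OR ");
            sb.append(field).append(':').append(terms.get(i));
        }
        return sb.append(')').toString();
    }
}
```

Expanding at query time keeps the index smaller and lets the synonym list change without reindexing. The cost is larger queries.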

More Analyzer Question

2005-02-18 Thread Luke Shannon
return result; } } Luke Shannon | Software Developer FutureBrand Toronto 207 Queen's Quay, Suite 400 Toronto, ON, M5J 1A7 416 642 7935 (office)

Re: Analyzing Advise

2005-02-18 Thread Luke Shannon
This is exactly what I was looking for. Thanks - Original Message - From: "Steven Rowe" <[EMAIL PROTECTED]> To: "Lucene Users List" Sent: Friday, February 18, 2005 4:41 PM Subject: Re: Analyzing Advise > Luke Shannon wrote: > > But now that I'

Analyzing Advise

2005-02-18 Thread Luke Shannon
Hi; I have a situation where my synonyms weren't working for a particular field. When I looked at the index I noticed it was a Keyword field, thus not tokenized. The problem is that when I switched that field to Text (now tokenized with my SynonymAnalyzer) a bunch of queries broke that were t

Re: Lucene in the Humanties

2005-02-18 Thread Luke Shannon
Nice work Erik. I would like to spend more time playing with it, but I saw a few things I really liked. When a specific query turns up no results you prompt the client to perform a free-form search. Less savvy search users will benefit from this strategy. I also like the display of information when

Re: Query Question

2005-02-18 Thread Luke Shannon
Thanks Erik. Option 2 sounds like the path of least resistance. Luke - Original Message - From: "Erik Hatcher" <[EMAIL PROTECTED]> To: "Lucene Users List" Sent: Thursday, February 17, 2005 9:05 PM Subject: Re: Query Question > On Feb 17, 2005, at 5:51 P

Re: Query Question

2005-02-17 Thread Luke Shannon
hen you call the toString(): +(type:203) +(name:*home\**) This looks right to me. Any theories as to why it would not match: Document (relevant fields): Keyword Keyword Is the \ escaping both * characters? Thanks, Luke ----- Original Message - From: "Luke Shannon"

Re: Query Question

2005-02-17 Thread Luke Shannon
That is a query toString(). I created the Query using a WildcardQuery object. Luke - Original Message - From: "Erik Hatcher" <[EMAIL PROTECTED]> To: "Lucene Users List" Sent: Thursday, February 17, 2005 3:00 PM Subject: Re: Query Question > > O

Query Question

2005-02-17 Thread Luke Shannon
Hello; Why won't this query find the document below? Query: +(type:203) +(name:*home\**) Document (relevant fields): Keyword Keyword I was hoping by escaping the * it would be treated as a string. What am I doing wrong? Thanks, Luke -

Searches Contain Special Characters

2005-02-17 Thread Luke Shannon
Hi All; How could I handle doing a wildcard search on the input *mario? Basically I would be interested in finding all the Documents containing *mario Here is an example of such a Query generated: +(type:138) +(name:**mario*) How can I let Lucene know that the star closest to Mario on the left
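The sketch below backslash-escapes the characters that Lucene's query syntax documentation lists as special: + - && || ! ( ) { } [ ] ^ " ~ * ? : \. QuerySyntaxEscaper is a hypothetical helper. It only prepares user text for QueryParser. Whether an escaped * inside a wildcard term is then treated literally depends on the Lucene version, and that appears to be the problem this thread runs into.

```java
class QuerySyntaxEscaper {
    // Single characters with special meaning in Lucene's query syntax
    // (& and | cover the two-character operators && and ||).
    private static final String SPECIAL = "+-&|!(){}[]^\"~*?:\\";

    static String escape(String input) {
        StringBuilder sb = new StringBuilder(input.length() * 2);
        for (char c : input.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\'); // prefix each special character with a backslash
            }
            sb.append(c);
        }
        return sb.toString();
    }
}
```

For the input *mario, a caller would build something like "name:" + escape(input) + "*", producing name:\*mario*, where the trailing * is meant to remain a wildcard.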

Re: Negative Match

2005-02-11 Thread Luke Shannon
Thanks Erik. This is indeed the way to go. - Original Message - From: "Erik Hatcher" <[EMAIL PROTECTED]> To: "Lucene Users List" Sent: Friday, February 11, 2005 10:25 AM Subject: Re: Negative Match > > On Feb 11, 2005, at 9:52 AM, Luke Shannon wro

Re: Negative Match

2005-02-11 Thread Luke Shannon
e - Original Message - From: "Erik Hatcher" <[EMAIL PROTECTED]> To: "Lucene Users List" Sent: Thursday, February 10, 2005 7:23 PM Subject: Re: Negative Match > > On Feb 10, 2005, at 4:06 PM, Luke Shannon wrote: > > > I think I found a pretty good way

Negative Match

2005-02-10 Thread Luke Shannon
I think I found a pretty good way to do a negative match. In this query I am looking for all the Documents that have a kcfileupload field with any value except for jpg. Query negativeMatch = new WildcardQuery(new Term("kcfileupload", "*jpg*")); BooleanQuery typeNegAll = new Boole

Re: Problem searching Field.Keyword field

2005-02-10 Thread Luke Shannon
Are there any issues with having a bunch of boolean queries and then adding them to one big boolean query (making them all required)? Or should I be looking at Query.combine()? Thanks, Luke - Original Message - From: "Erik Hatcher" <[EMAIL PROTECTED]> To: "Lucene Users List" Sent: Tu

Re: Starts With x and Ends With x Queries

2005-02-07 Thread Luke Shannon
I implemented this concept for my ends with query. It works very well! - Original Message - From: "Chris Hostetter" <[EMAIL PROTECTED]> To: "Lucene Users List" Sent: Friday, February 04, 2005 9:37 PM Subject: Re: Starts With x and Ends With x Queries > > : Also keep in mind that QueryP

Re: RangeQuery With Date

2005-02-07 Thread Luke Shannon
Bingo. Thanks! Luke - Original Message - From: "Chris Hostetter" <[EMAIL PROTECTED]> To: "Lucene Users List" Sent: Monday, February 07, 2005 5:10 PM Subject: Re: RangeQuery With Date > : Your dates need to be stored in lexicographical order for the RangeQuery > : to work. > : > : Inde

RangeQuery With Date

2005-02-07 Thread Luke Shannon
Hi; I am working on a set of queries that allow you to find modification dates before, after and equal to a given date. Here are some of the before queries I have been playing with. I want a query that pulls up dates modified before Nov 11 2004: Query query = new RangeQuery(null, new Term("modifi
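One of the replies listed above notes that dates must be stored in lexicographic order for RangeQuery to work. Below is a JDK-only sketch that assumes a day-resolution yyyyMMdd encoding. SortableDates is hypothetical. A fixed-width value with the most significant field first makes string order match chronological order. The unpadded form shown for contrast does not.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

class SortableDates {
    // yyyyMMdd: fixed width, year first, so comparing strings == comparing dates.
    static String encode(LocalDate date) {
        return date.format(DateTimeFormatter.BASIC_ISO_DATE);
    }

    // Counter-example: no zero padding, so "2004-2-1" sorts after "2004-11-11".
    static String naive(LocalDate date) {
        return date.getYear() + "-" + date.getMonthValue() + "-" + date.getDayOfMonth();
    }
}
```

The same encoding would be used both for the indexed Keyword value and for the Term bounds given to RangeQuery, so that "before Nov 11 2004" becomes an upper bound of 20041111.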

Starts With x and Ends With x Queries

2005-02-04 Thread Luke Shannon
Hello; I have these two documents: Text Keyword Text Text Text Text Text Text Text Text Text Text Text Text Keyword Keyword Text Text Text Text Text Text Text Text Text Text I would like to be able to match name fields that start with testing (specifically) and those that end with it. I th
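One common way to support "ends with" matching is to index a reversed copy of the value, so that an ends-with test becomes a prefix test. This is not necessarily the concept suggested in the reply listed above, whose details are cut off. Below is a JDK-only sketch of the equivalence. ReversedField is hypothetical. In Lucene, the reversed value would go in a separate field queried with a PrefixQuery.

```java
class ReversedField {
    static String reverse(String s) {
        // StringBuilder.reverse keeps surrogate pairs intact.
        return new StringBuilder(s).reverse().toString();
    }

    // "original ends with suffix" is equivalent to "reversed copy starts with reversed suffix".
    static boolean endsWithViaReversed(String storedReversed, String suffix) {
        return storedReversed.startsWith(reverse(suffix));
    }
}
```

"Starts with" needs no extra field; a plain prefix match on the original value covers it.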

Re: Parsing The Query: Every document that doesn't have a field containing x (but still has the field)

2005-02-04 Thread Luke Shannon
Hello; I think Chris's approach might be helpful, but I can't seem to get it to work. So since I'm running out of time and I still need to figure out "starts with" and "ends with" queries, I have implemented a hacky solution to getting all documents with a kcfileupload field present that does not

Re: Parsing The Query: Every document that doesn't have a field containing x

2005-02-04 Thread Luke Shannon
Hi Chris; So the result would contain all documents that don't have field f containing x? What I need to figure out is how to return all documents that have a field f but do not contain x. Thanks for your post. Luke - Original Message - From: "Chris Hostetter" <[EMAIL PROTECTED

Re: Parsing The Query: Every document that doesn't have a field containing x

2005-02-04 Thread Luke Shannon
ord("owner", "jake")); document.add(Field.Text("keywords", "jakes sensitive info")); writer.addDocument(document); writer.close(); } public void testSecurityFilter() throws Exception { TermQuery query = new TermQuery(new Term("keywords",

Re: Parsing The Query: Every document that doesn't have a field containing x

2005-02-04 Thread Luke Shannon
rtEquals(1, hits.length()); assertEquals("elwood is safe", "jakes sensitive info", hits.doc(0).get("keywords")); } } On Thu, 3 Feb 2005 13:04:50 -0500, Luke Shannon <[EMAIL PROTECTED]> wrote: > Hello; > > I have a query that finds documen

Re: Parsing The Query: Every document that doesn't have a field containing x

2005-02-03 Thread Luke Shannon
ex term of "stillhere" in that field. It depends on how you built the index (index and stored fields are different), but I would check on that. Also maybe try out TermQuery and see if that does anything for you. > -Original Message- > From: Luke Shannon [mailto:[EMAIL

Re: Parsing The Query: Every document that doesn't have a field containing x

2005-02-03 Thread Luke Shannon
ng x First thing that jumps out is case-sensitivity. Does your olFaithFull field contain "stillHere" or "stillhere"? --Leto > -Original Message- > From: Luke Shannon [mailto:[EMAIL PROTECTED] > This works: > > query1 = QueryParser.parse("jpg&q

Re: Parsing The Query: Every document that doesn't have a field containing x

2005-02-03 Thread Luke Shannon
This works: query1 = QueryParser.parse("jpg", "kcfileupload", new StandardAnalyzer()); query2 = QueryParser.parse("stillHere", "olFaithFull", new StandardAnalyzer()); BooleanQuery typeNegativeSearch = new BooleanQuery(); typeNegativeSearch.add(query1, false, false); typeNegativeSearch.add(query2,

Re: Parsing The Query: Every document that doesn't have a field containing x

2005-02-03 Thread Luke Shannon
I did; I have run both queries in Luke. kcfileupload:ppt returns 1; olFaithfull:stillhere returns 119. Luke - Original Message - From: "Maik Schreiber" <[EMAIL PROTECTED]> To: "Lucene Users List" Sent: Thursday, February 03, 2005 4:55 PM Subject: Re: Parsing The Query: Every document

Re: Parsing The Query: Every document that doesn't have a field containing x

2005-02-03 Thread Luke Shannon
Yes. There should be 119 with stillHere, and if I run a query in Luke on kcfileupload = ppt, it returns one result. I am thinking I should at least get this result back with: -kcfileupload:jpg +olFaithFull:stillhere? Luke - Original Message - From: "Maik Schreiber" <[EMAIL PROTECTED]> To

Re: Parsing The Query: Every document that doesn't have a field containing x

2005-02-03 Thread Luke Shannon
Hello, Still working on the same query, here is the code I am currently working with. I am thinking this should bring up all the documents that have olFaithFull=stillHere and kcfileupload!=jpg (so anything else) query1 = QueryParser.parse("jpg", "kcfileupload", new StandardAnalyzer()); query2 =

Re: Parsing The Query: Every document that doesn't have a field containing x

2005-02-03 Thread Luke Shannon
Ok. I have added the following to every document: doc.add(Field.UnIndexed("olFaithfull", "stillHere")); The plan is a query that says: olFaithfull = stillHere and kcfileupload!=jpg. I have been experimenting with the MultiFieldQueryParser; this is not working out for me. From a syntax how is thi

Re: Synonyms Not Showing In The Index

2005-02-03 Thread Luke Shannon
Thanks! I can wait for the release. Luke - Original Message - From: "Andrzej Bialecki" <[EMAIL PROTECTED]> To: "Lucene Users List" Sent: Thursday, February 03, 2005 2:53 PM Subject: Re: Synonyms Not Showing In The Index > Andrzej Bialecki

Parsing The Query: Every document that doesn't have a field containing x

2005-02-03 Thread Luke Shannon
Hello; I have a query that finds documents that contain fields with a specific value. query1 = QueryParser.parse("jpg", "kcfileupload", new StandardAnalyzer()); This works well. I would like a query that finds all documents with a kcfileupload field that doesn't contain jpg. The example I fou
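Several messages in this thread (listed above) revolve around one rule: a Lucene BooleanQuery whose clauses are all prohibited matches nothing. Prohibited clauses only remove documents that some positive clause has already matched. Below is a simplified in-memory model of that rule. BooleanModel and its document maps are illustrative, not Lucene classes.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

class BooleanModel {
    // Simplified stand-ins for the required / optional / prohibited clause flags.
    enum Occur { REQUIRED, OPTIONAL, PROHIBITED }

    static final class Clause {
        final String field;
        final String term;
        final Occur occur;

        Clause(String field, String term, Occur occur) {
            this.field = field;
            this.term = term;
            this.occur = occur;
        }

        // A "document" here is just field name -> set of indexed terms.
        boolean hits(Map<String, Set<String>> doc) {
            Set<String> terms = doc.get(field);
            return terms != null && terms.contains(term);
        }
    }

    /** Matches only if no prohibited clause hits, every required clause hits,
     *  and at least one positive (required or optional) clause hits. */
    static boolean matches(Map<String, Set<String>> doc, List<Clause> clauses) {
        boolean positiveHit = false;
        for (Clause c : clauses) {
            boolean hit = c.hits(doc);
            switch (c.occur) {
                case REQUIRED:
                    if (!hit) return false;
                    positiveHit = true;
                    break;
                case OPTIONAL:
                    if (hit) positiveHit = true;
                    break;
                case PROHIBITED:
                    if (hit) return false;
                    break;
            }
        }
        return positiveHit;
    }
}
```

This is why the later messages anchor the negation with a field that every document carries, such as olFaithFull:stillHere. Three details matter for that anchor:

* A field added with Field.UnIndexed is stored but not indexed, so it cannot be searched.
* StandardAnalyzer lowercases query text, so stillHere is looked up as stillhere.
* Field names are case-sensitive.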

Re: Lock failure recovery

2005-02-03 Thread Luke Shannon
The indexing process is totally synchronized in our system. Thus if an Indexing thread starts up and the index exists, but is locked, I know this to be the only indexing process running so the lock must be from a process that got stopped before it could finish. So right before I begin writing t

Synonyms Not Showing In The Index

2005-02-02 Thread Luke Shannon
Hello; It seems my Synonym analyzer is working (based on some successful queries). But I can't see the synonyms in the index using Luke. Is this correct? Thanks, Luke

Re: QueryParser Help

2005-02-02 Thread Luke Shannon
Actually now that I am looking at it, I think I am already accomplishing it. I wanted all the documents with Mario in either field to show up. There are two, but one has them in both fields in the Document. This is correct. Thanks for the help. It would have taken me a while to catch that. Luke

Re: QueryParser Help

2005-02-02 Thread Luke Shannon
This is it. Thanks, Maik. One of the docs had the result in both name and desc. Not sure how to handle this yet; I still don't know enough about QueryParsing. Luke - Original Message - From: "Maik Schreiber" <[EMAIL PROTECTED]> To: "Lucene Users List" Sent: Wednesday, February 02, 2005 6

QueryParser Help

2005-02-02 Thread Luke Shannon
Hello; Getting acquainted with Query Parsing. I have a question: Query query = MultiFieldQueryParser .parse("mario", new String[] { "name", "desc" }, new int[] { MultiFieldQueryParser.NORMAL_FIELD, MultiFieldQueryParser.NORMAL_FIELD }, new StandardAnalyzer()); Inde

Re: which HTML parser is better?

2005-02-02 Thread Luke Shannon
In our application I use regular expressions to strip all tags in one situation and specific ones in another situation. Here is sample code for both: This strips all html 4.0 tags except , , , , , , : html_source = Pattern.compile("", Pattern.CASE_INSENSITIVE).matcher(html_source).replaceAll("");
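The tag lists in the snippet above were lost when the archive page was rendered, so the original patterns cannot be recovered. Below is a hedged sketch of the two cases using java.util.regex. It assumes a hypothetical allow-list of b, i and p. HtmlTagStripper is illustrative, not the code from the message.

```java
import java.util.regex.Pattern;

class HtmlTagStripper {
    // Removes every tag-like run "<...>".
    private static final Pattern ANY_TAG = Pattern.compile("<[^>]+>");

    // Removes opening/closing tags whose name is not exactly b, i or p (hypothetical allow-list);
    // the word boundary keeps <br>, <img>, <pre> from slipping through as b, i, p.
    private static final Pattern TAG_EXCEPT_ALLOWED =
        Pattern.compile("</?(?!(?:b|i|p)\\b)[a-z][^>]*>", Pattern.CASE_INSENSITIVE);

    static String stripAll(String html) {
        return ANY_TAG.matcher(html).replaceAll("");
    }

    static String stripAllExceptAllowed(String html) {
        return TAG_EXCEPT_ALLOWED.matcher(html).replaceAll("");
    }
}
```

Regular expressions are a pragmatic shortcut here. They do not handle comments, script bodies, or a ">" inside attribute values, which an HTML parser would.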

Combining Documents

2005-02-01 Thread Luke Shannon
Hello; I have a situation where I need to combine the fields returned from one document to an existing document. Is there something in the API for this that I'm missing or is this the best way: //add the fields contained in the PDF document to the existing doc Document Document attachedDoc = Luc

Re: How to get document count?

2005-02-01 Thread Luke Shannon
Not sure if the API provides a method for this, but you could use Luke: http://www.getopt.org/luke/ It gives you a count and lets you step through each Doc looking at their fields. - Original Message - From: "Jim Lynch" <[EMAIL PROTECTED]> To: "Lucene Users List" Sent: Tuesday, Februar

Re: Boosting Questions

2005-01-27 Thread Luke Shannon
the top. > You should check out the Explanation class, which can dump all scoring > factors in text or HTML format. > > Otis > > > --- Luke Shannon <[EMAIL PROTECTED]> wrote: > > > Hi All; > > > > I just want to make sure I have the right idea about

Boosting Questions

2005-01-27 Thread Luke Shannon
Hi All; I just want to make sure I have the right idea about boosting. So if I boost a document (Document A) after I index it (let's say a score of 2.0) Lucene will now consider this document relatively more important than other documents in the index with a boost factor less than 2.0. This boost f

Re: Getting Into Search

2005-01-26 Thread Luke Shannon
+range > > The grayed-out text has the section name and page number, so you can > quickly locate this stuff in your ebook. > > Otis > P.S. > Do you know if Indigo/Chapters has Lucene in Action on their book > shelves yet? > > > --- Luke Shannon <[EMAIL PROTEC

Getting Into Search

2005-01-26 Thread Luke Shannon
Hello; My Lucene application has been performing well in our company's CMS application. The plan now is to offer "advanced searching". I just bought the eBook version of Lucene in Action to help with my research (it is taking Amazon forever to ship the printed version to Canada). The book look

Re: FOP Generated PDF and PDFBox

2005-01-21 Thread Luke Shannon
just for debugging > purposes like this. > > java org.pdfbox.searchengine.lucene.LucenePDFDocument > > and it should print out the fields of the lucene Document object. Is the > url there and is it correct? > > Ben > > On Fri, 21 Jan 2005, Luke Shannon wrote: >

Re: FOP Generated PDF and PDFBox

2005-01-21 Thread Luke Shannon
ePDFDocument.getDocument() > method? > > Ben > > On Fri, 21 Jan 2005, Luke Shannon wrote: > > > Hello; > > > > Our CMS now allows users to create PDF documents (uses FOP) and than search > > them. > > > > I seem to be able to index these documents o

FOP Generated PDF and PDFBox

2005-01-21 Thread Luke Shannon
Hello; Our CMS now allows users to create PDF documents (uses FOP) and then search them. I seem to be able to index these documents ok. But when I am generating the results to display I get a Null Pointer Exception while trying to use a variable that should contain the url keyword for one of thes

Re: where to place the index directory

2005-01-14 Thread Luke Shannon
gt; String indexLocation = "/home/quilombo/indexHtmlCapoeira/index"; > > thanks for your help > > philippe > > > On Friday 14 January 2005 18:33, Luke Shannon wrote: > > The jsp is having some trouble locating the index folder. It is probably > > the path y

Re: where to place the index directory

2005-01-14 Thread Luke Shannon
sage has been > "unable to open the directory" > > thanks > philippe > > On Friday 14 January 2005 17:56, Luke Shannon wrote: > > Does it give some sort of error message? > > > > Luke > > > > - Original Message - > > From: "phil

Re: where to place the index directory

2005-01-14 Thread Luke Shannon
Does it give some sort of error message? Luke - Original Message - From: "philippe" <[EMAIL PROTECTED]> To: Sent: Friday, January 14, 2005 11:39 AM Subject: where to place the index directory > Hi everybody, > > can someone help me ? > > i have a problem with my index ? > > on my loca

Re: what if the IndexReader crashes, after delete, before close.

2005-01-11 Thread Luke Shannon
lt;[EMAIL PROTECTED]> To: "Lucene Users List" Sent: Tuesday, January 11, 2005 3:24 AM Subject: RE: what if the IndexReader crashes, after delete, before close. -Original Message- From: Luke Shannon [mailto:[EMAIL PROTECTED] Sent: Monday, January 10, 2005 15:46 To:

Re: How do you handle dynamic html pages?

2005-01-10 Thread Luke Shannon
I run the indexer in our CMS every time a content change has occurred. It is an incremental update, so only documents that generate a different UID than the corresponding UID in the index get processed. Luke - Original Message - From: "Kevin L. Cobb" <[EMAIL PROTECTED]> To: "Lucene Users Lis
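Below is a minimal sketch of the incremental update described above. It assumes that the UIDs already in the index and the UIDs computed from the current files are both available as sets. IncrementalPlan is a hypothetical helper. The UID embeds the modification time, so an edited file shows up as one stale UID to delete and one new UID to add.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;

class IncrementalPlan {
    final List<String> toDelete = new ArrayList<>();
    final List<String> toAdd = new ArrayList<>();

    static IncrementalPlan compute(SortedSet<String> indexed, SortedSet<String> current) {
        IncrementalPlan plan = new IncrementalPlan();
        for (String uid : indexed) {
            if (!current.contains(uid)) plan.toDelete.add(uid); // file removed or changed
        }
        for (String uid : current) {
            if (!indexed.contains(uid)) plan.toAdd.add(uid);    // file new or changed
        }
        return plan;
    }
}
```

Deletions would be applied with an IndexReader and additions with an IndexWriter afterwards, keeping the two from overlapping.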

Re: what if the IndexReader crashes, after delete, before close.

2005-01-10 Thread Luke Shannon
One thing that will happen is the lock file will get left behind. This means when you start back up and try to create another Reader you will get a file lock error. Our system is threaded and synchronized. Thus when a Reader is being created I know it is the only one (the Writer comes after the re

Re: Check to see if index is optimized

2005-01-07 Thread Luke Shannon
This may not be a simple way, but you could just do a quick check on the folder to see if there is more than one file containing the name segment. Luke - Original Message - From: "Crump, Michael" <[EMAIL PROTECTED]> To: Sent: Friday, January 07, 2005 2:24 PM Subject: Check to see if ind
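Below is a rough sketch of the folder check suggested above. It assumes the Lucene 1.x file layout, in which per-segment files are named _<segment>.<extension> (for example _4d.cfs) and sit alongside the segments and deletable files. SegmentCounter is hypothetical. A single segment is a heuristic sign of an optimized index, not a guarantee.

```java
import java.io.File;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

class SegmentCounter {
    /** Distinct segment names, assuming per-segment files look like "_<name>.<ext>". */
    static Set<String> segmentNames(Collection<String> fileNames) {
        Set<String> names = new HashSet<>();
        for (String f : fileNames) {
            int dot = f.indexOf('.');
            if (f.startsWith("_") && dot > 1) {
                names.add(f.substring(0, dot));
            }
        }
        return names;
    }

    /** Heuristic only: at most one segment suggests the index is optimized. */
    static boolean looksOptimized(File indexDir) {
        String[] files = indexDir.list();
        return files != null && segmentNames(Arrays.asList(files)).size() <= 1;
    }
}
```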

Re: questions

2005-01-07 Thread Luke Shannon
Hello Jac; If you have verified that the index folder is indeed being created and there is a segment(s) file(s) in it, check that the IndexSearcher in the demo is pointing to that location. This is an easy error to make and would account for the error message no segments folder. Luke - Origin

Re: how to create a long lasting unique key?

2005-01-04 Thread Luke Shannon
This is taken from the example code written by Doug Cutting that ships with Lucene. It is the key our system uses. It also comes in handy when incrementally updating. Luke public static String uid(File f) { // Append path and date into a string in such a way that lexicographic // sorting give
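The snippet above is cut off before the body of uid(). Below is a JDK-only approximation of the idea its comment starts to describe. Path separators become NUL characters, which sort before every other character. The modification time is appended as a fixed-width base-36 string. FileUid is hypothetical, and the 13-character padding is an assumption standing in for Lucene's DateField encoding, not a copy of it.

```java
import java.io.File;

class FileUid {
    // 13 base-36 digits are enough for any non-negative long (Long.MAX_VALUE needs 13).
    private static final int WIDTH = 13;

    /** Zero-padded base-36 millis, so string order equals numeric order. */
    static String timeToSortableString(long millis) {
        String digits = Long.toString(millis, Character.MAX_RADIX);
        StringBuilder sb = new StringBuilder(WIDTH);
        for (int i = digits.length(); i < WIDTH; i++) {
            sb.append('0');
        }
        return sb.append(digits).toString();
    }

    /** Path with separators replaced by NUL, then NUL, then the sortable timestamp. */
    static String uid(String path, long lastModified) {
        return path.replace(File.separatorChar, '\u0000') + '\u0000' + timeToSortableString(lastModified);
    }

    static String uid(File f) {
        return uid(f.getPath(), f.lastModified());
    }
}
```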

Re: Problems...

2005-01-04 Thread Luke Shannon
I had a similar situation with the same problem. I found the previous system was creating all the objects (including the Searcher) and then updating the Index. The result was the Searcher was not able to find any of the data just added to the Index. The solution for me was to move the creation of

Re: Deleting an index

2005-01-04 Thread Luke Shannon
If you opened an IndexReader, has it also been closed before you attempt to delete? - Original Message - From: "Scott Smith" <[EMAIL PROTECTED]> To: Sent: Monday, January 03, 2005 7:39 PM Subject: Deleting an index I'm writing some junit tests for my search code (which layers on top

Re: LIMO problems

2004-12-13 Thread Luke Shannon
This is a good place to start for extracting the content from power point files: http://www.mail-archive.com/poi-user@jakarta.apache.org/msg04809.html Luke - Original Message - From: "Daniel Cortes" <[EMAIL PROTECTED]> To: "Lucene Users List" <[EMAIL PROTECTED]> Sent: Monday, December 1

Re: Lucene in Action e-book now available!

2004-12-10 Thread Luke Shannon
Nice Work! Congratulations Guys. - Original Message - From: "Erik Hatcher" <[EMAIL PROTECTED]> To: "Lucene User" <[EMAIL PROTECTED]>; "Lucene List" <[EMAIL PROTECTED]> Sent: Friday, December 10, 2004 3:52 AM Subject: Lucene in Action e-book now available! > The Lucene in Action e-book

Re: LIMO problems

2004-12-09 Thread Luke Shannon
I use "Luke". It is pretty good. http://www.getopt.org/luke/ Luke - Original Message - From: "Daniel Cortes" <[EMAIL PROTECTED]> To: <[EMAIL PROTECTED]> Sent: Thursday, December 09, 2004 8:32 AM Subject: LIMO problems > Hi, I'm tying Limo (Index Monitor of Lucene) and I have a problem,

Re: Help to remove document

2004-12-08 Thread Luke Shannon
Hi; The IndexReader has a delete method that can do this: public final void delete(int docNum) throws IOException Deletes the document numbered docNum. Once a document is deleted it will not appear in TermDocs or TermPositions enumerations. Attempts to read its field with the document(int) metho

Re: Weird Behavior On Windows

2004-12-07 Thread Luke Shannon
there be logic in the flaw (swap that), or could you be catching an > Exception that is thrown only on Winblows due to Windows not letting > you do certain things with referenced files and dirs? > > Otis > > --- Luke Shannon <[EMAIL PROTECTED]> wrote: > > > Hello

Re: Weird Behavior On Windows

2004-12-07 Thread Luke Shannon
ed, so you need a new IndexSearcher. Could > there be logic in the flaw (swap that), or could you be catching an > Exception that is thrown only on Winblows due to Windows not letting > you do certain things with referenced files and dirs? > > Otis > > --- Luke Shannon <[EM

Weird Behavior On Windows

2004-12-07 Thread Luke Shannon
Hello All; Things have been running smoothly on Linux for some time. We set up a version of the site on a Win2K machine; this is when all the "fun" started. A pdf would be added to the system. The indexer would run, find the new file, index it and successfully complete the update of the index fold

Re: Read locks on indexes

2004-12-07 Thread Luke Shannon
I think the read locks are preventing you from deleting from the index with your reader and writing to the index with a writer at the same time. If you never use a writer then I guess you don't need to worry about this. But how do you create the indexes? Luke - Original Message - From:

Re: PDF Indexing Error

2004-12-03 Thread Luke Shannon
fully support PDF security. > > Ben > > On Thu, 2 Dec 2004, Luke Shannon wrote: > > > Hello All; > > > > Perhaps this should be on the PDFBox forum but I was curious if anyone has > > seen this error parsing PDF documents using packages other than PDFBox. >

PDF Indexing Error

2004-12-02 Thread Luke Shannon
Hello All; Perhaps this should be on the PDFBox forum but I was curious if anyone has seen this error parsing PDF documents using packages other than PDFBox. /usr/tomcat/fb_hub/GM/Administration/Document/java/java_io.pdf java.io.IOException: You do not have permission to extract text The weird t

Re: Optimized??

2004-11-22 Thread Luke Shannon
As I understand it, optimization merges several segments into one, allowing for faster queries. The FAQs and API have further details. http://lucene.sourceforge.net/cgi-bin/faq/faqmanager.cgi?file=chapter.indexing&toc=faq#q24 Luke - Original Message - From: "Miguel Angel" <[EM

Re: How much time indexing doc ??

2004-11-22 Thread Luke Shannon
PDFs can definitely slow things down, depending on their size. If there are a few large PDF documents, that time is definitely possible. Luke - Original Message - From: "Miguel Angel" <[EMAIL PROTECTED]> To: <[EMAIL PROTECTED]> Sent: Saturday, November 20, 2004 11:25 AM Subject: How m

False Locking Conflict?

2004-11-19 Thread Luke Shannon
Hey All; Is it possible for there to be a situation where the locking file is in place after the reader has been closed? I have extra logging in place and have followed the code execution. The reader finishes deleting old content and closes (I know this for sure). This is the only reader insta

Re: DOC, PPT index???

2004-11-18 Thread Luke Shannon
Check out: http://jakarta.apache.org/poi/ - Original Message - From: "Miguel Angel" <[EMAIL PROTECTED]> To: <[EMAIL PROTECTED]> Sent: Thursday, November 18, 2004 4:49 PM Subject: DOC, PPT index??? > Hi !!! > Lucene can index the files (do, ppt the MS OFFICE ??) > How do you can this in

Re: version documents

2004-11-18 Thread Luke Shannon
Thank you for the suggestion. I ended up biting the bullet and re-working my indexing logic. Luckily the system itself knows what the "current" version of a document is (otherwise it won't know which one to display to the user) for any given folder. I was able to get a static method I could call

Re: PDF Index Time

2004-11-18 Thread Luke Shannon
I have not extensively tried the snowtide package but they have a trial > download and the docs show that it should be just as easy to integrate as > PDFBox is. They list pricing on their site as well, which is nice; > it is not hidden as some software companies do. > > Ben &

PDF Index Time

2004-11-18 Thread Luke Shannon
Hi; I am using the PDFBox's getLuceneDocument method to parse my PDF documents. It returns good results and was very easy to integrate into the project. However it is slow. Does anyone know of a faster package? Someone mentioned snowtide on an earlier post. Anyone have experience with this pac

Re: urgent help needed

2004-11-18 Thread Luke Shannon
These are the ones I think. They were the first things I read on Lucene and were very helpful. http://www.onjava.com/pub/a/onjava/2003/03/05/lucene.html http://www.onjava.com/pub/a/onjava/2003/01/15/lucene.html - Original Message - From: "Neelam Bhatnagar" <[EMAIL PROTECTED]> To: "Otis G

Re: version documents

2004-11-17 Thread Luke Shannon
efilename" and "version" and make each a keyword. > > Sort your query by version descending, and only use the first > "basefile" you encounter. > > On Wed, 17 Nov 2004 15:05:19 -0500, Luke Shannon > <[EMAIL PROTECTED]> wrote: > > Hey all; > >
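The suggestion quoted above is to sort by version descending and keep only the first hit per base file. It can be sketched on plain Java objects. LatestVersions and Hit are hypothetical stand-ins for Lucene hits that carry the base file name and version fields.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class LatestVersions {
    static final class Hit {
        final String baseFile;
        final int version;

        Hit(String baseFile, int version) {
            this.baseFile = baseFile;
            this.version = version;
        }

        @Override
        public String toString() {
            return baseFile + "@" + version;
        }
    }

    /** Sorts by version descending and keeps the first (highest) hit per base file. */
    static List<Hit> latestPerBaseFile(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingInt((Hit h) -> h.version).reversed());
        Set<String> seen = new HashSet<>();
        List<Hit> result = new ArrayList<>();
        for (Hit h : sorted) {
            if (seen.add(h.baseFile)) {
                result.add(h);
            }
        }
        return result;
    }
}
```

If the sort happens inside Lucene on a keyword field, versions would need zero padding so that "10" does not sort before "9".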

version documents

2004-11-17 Thread Luke Shannon
Hey all; I have run into an interesting case. Our system has notes. These need to be indexed. They are xml files called default.xml and are easily parsed and indexed. No problem, have been doing it all week. The problem is if someone edits the note, the system doesn't update the default.xml.

Re: index document pdf

2004-11-17 Thread Luke Shannon
Hello; Hopefully I understand the question. 1. Modify the indexDoc(file) method to consider the file type pdf: else if (file.getPath().endsWith(".html") || file.getPath().endsWith(".pdf")) { 2. Create a specific branch of code to create the lucene document from the file type and then add it to t

Re: tool to check the index field

2004-11-17 Thread Luke Shannon
Try this: http://www.getopt.org/luke/ Luke - Original Message - From: "lingaraju" <[EMAIL PROTECTED]> To: "Lucene Users List" <[EMAIL PROTECTED]> Sent: Wednesday, November 17, 2004 10:00 AM Subject: tool to check the index field > HI ALL > > I am having index file created by other pe

Index Locking Issues Resolved...I hope

2004-11-16 Thread Luke Shannon
sing > 'Concurrent' and 'updates' in the same sentence sounds like a possible > source of the problem. You have to use a single IndexWriter and it > should not overlap with an IndexReader that is doing deletes. > > Otis > > --- Luke Shannon <[EMAIL PR

Re: _4c.fnm missing

2004-11-16 Thread Luke Shannon
e been indexed correctly. > > my two cents > > Nader > > > > Otis Gospodnetic wrote: > > >'Concurrent' and 'updates' in the same sentence sounds like a possible > >source of the problem. You have to use a single IndexWriter and it &g

Re: _4c.fnm missing

2004-11-16 Thread Luke Shannon
index, the IndexWriter runs one instance at a time, so what kind of increments are we talking about it takes a bit of doing to overwhelm Lucene. > > What's your update schedule, how big is the index, and after how many updates does the system crash? > > Nader Henein > > > &

Re: _4c.fnm missing

2004-11-16 Thread Luke Shannon
of increments are we talking about it takes a bit of doing to overwhelm Lucene. > > What's your update schedule, how big is the index, and after how many updates does the system crash? > > Nader Henein > > > > Luke Shannon wrote: > > >It consistently breaks when

Re: _4c.fnm missing

2004-11-16 Thread Luke Shannon
fs file (cfs files are compound files > that contain all index files described at the above URL). Maybe you > can provide the code that causes this error in Bugzilla for somebody to > look at. Does it consistently break? > > Otis > > > --- Luke Shannon <[EMAIL PROTECTED]

_4c.fnm missing

2004-11-16 Thread Luke Shannon
I received the error below when I was attempting to overwhelm my system with incremental update requests. What is this file it is looking for? I checked the index. It contains: _4c.del _4d.cfs deletable segments Where does _4c.fnm come from? Here is the error: Unable to create the create the

Re: IndexSearcher Refresh

2004-11-16 Thread Luke Shannon
rta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexReader.html#getCurrentVersion(org.apache.lucene.store.Directory) > > Otis > > > --- Luke Shannon <[EMAIL PROTECTED]> wrote: > > > It would nice if the IndexerSearcher contained a method that could > &

Re: how do you work with PDF

2004-11-16 Thread Luke Shannon
www.pdfbox.org Once you get the package installed the code you can use is: Document doc = LucenePDFDocument.getDocument(file); writer.addDocument(doc); This method returns the PDF in Lucene document format. Luke - Original Message - From: "Miguel Angel" <[EMAIL PROTECTED]> To: <

Re: IndexSearcher Refresh

2004-11-16 Thread Luke Shannon
It would be nice if the IndexSearcher contained a method that could return the last modified date of the index folder it was created with. This would make it easier to know when you need to create a new Searcher. - Original Message - From: "Otis Gospodnetic" <[EMAIL PROTECTED]> To: "Lucen
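Otis's reply, listed above, points at IndexReader.getCurrentVersion(Directory). Below is a generic sketch of the refresh pattern. It assumes that the version check and the searcher construction are passed in as functions. RefreshingHolder is hypothetical. The cached instance is reused until the reported version changes.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

class RefreshingHolder<T> {
    private final LongSupplier currentVersion; // e.g. wraps IndexReader.getCurrentVersion(dir) (assumption)
    private final Supplier<T> opener;          // e.g. opens a new IndexSearcher (assumption)
    private T instance;
    private long openedVersion;

    RefreshingHolder(LongSupplier currentVersion, Supplier<T> opener) {
        this.currentVersion = currentVersion;
        this.opener = opener;
    }

    /** Returns the cached instance, reopening it only when the index version has changed. */
    synchronized T get() {
        long v = currentVersion.getAsLong();
        if (instance == null || v != openedVersion) {
            instance = opener.get();
            openedVersion = v;
        }
        return instance;
    }
}
```

Closing the previous searcher is left out. A real version would close it only after in-flight searches using it have finished.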

Re: Is opening IndexReader multiple times safe?

2004-11-15 Thread Luke Shannon
Hi Satoshi; I troubleshot a problem similar to this by moving an IndexReader.isLocked(indexFileLocation) call around to determine exactly when the reader was closed. Note: the method throws an error if the index file you are checking on doesn't exist. Luke -

Re: Lucene : avoiding locking (incremental indexing)

2004-11-15 Thread Luke Shannon
itional load that might be caused by locking off pieces of the database rather then the whole database. I think I need to look in the developer archives. > > JohnE > > > > - Original Message - > From: Luke Shannon <[EMAIL PROTECTED]> > Date: Monday, November 15,
