Re: Numbers in the Query String

2005-02-03 Thread (sender name unreadable)
I agree with their viewpoint! On Thu, 3 Feb 2005 14:29:13 -0800 (PST), Otis Gospodnetic <[EMAIL PROTECTED]> wrote: > Using different analyzers for indexing and searching is not > recommended. > Your numbers are not even in the index because you are using > StandardAnalyzer. Use Luke to look at your i

Re: Parsing The Query: Every document that doesn't have a field containing x

2005-02-03 Thread (sender name unreadable)
I think you can use a filter to get the right result! See the example below: package lia.advsearching; import junit.framework.TestCase; import org.apache.lucene.analysis.WhitespaceAnalyzer; import org.apache.lucene.document.Document; import org.apache.lucene.document.Field; import org.apache.lucene.

Re: Optimize not deleting all files

2005-02-03 Thread (sender name unreadable)
Your understanding is right! The old existing files should be deleted, but it will build new files! On Thu, 03 Feb 2005 17:36:27 -0800 (PST), [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: > Hi, > > When I run an optimize in our production environment, old index files are > left in the directory and ar

Optimize not deleting all files

2005-02-03 Thread yahootintin . 1247688
Hi, When I run an optimize in our production environment, old index files are left in the directory and are not deleted. My understanding is that an optimize will create new index files and all existing index files should be deleted. Is this correct? We are running Lucene 1.4.2 on Windows.

Re: Parsing The Query: Every document that doesn't have a field containing x

2005-02-03 Thread Luke Shannon
Bingo! Nice catch. That was it. Made everything lower case when I set the field. Works great now. Thanks! Luke - Original Message - From: "Kauler, Leto S" <[EMAIL PROTECTED]> To: "Lucene Users List" Sent: Thursday, February 03, 2005 6:48 PM Subject: RE: Parsing The Query: Every documen

RE: Parsing The Query: Every document that doesn't have a field containing x

2005-02-03 Thread Kauler, Leto S
Because you are building the query with QueryParser rather than a TermQuery, all search terms in the query are being lowercased by StandardAnalyzer. So your query of "olFaithFull:stillhere" requires that there is an exact index term of "stillhere" in that field. It depends on how you built the index (index an
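
The mismatch described in this message can be sketched in plain Java, without any Lucene dependency. This is only a toy model: the class `TermMatchSketch` and its methods are hypothetical names, and lowercasing stands in for the single aspect of StandardAnalyzer that matters here (a real analyzer also tokenizes and removes stop words).

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class TermMatchSketch {
    // Hypothetical stand-in for the terms stored in one field of the index.
    private final Set<String> terms = new HashSet<String>();

    // Models a field stored as a single untouched term, case preserved.
    public void addVerbatim(String value) {
        terms.add(value);
    }

    // Models an analyzed field, reduced here to lowercasing only.
    public void addLowercased(String value) {
        terms.add(value.toLowerCase(Locale.ROOT));
    }

    // Models a parsed query term: lowercased first, then matched exactly.
    public boolean matchesParsedQuery(String queryText) {
        return terms.contains(queryText.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        TermMatchSketch verbatim = new TermMatchSketch();
        verbatim.addVerbatim("stillHere");
        // The query becomes "stillhere", which is not the stored term "stillHere".
        System.out.println(verbatim.matchesParsedQuery("stillHere")); // prints false

        TermMatchSketch lowered = new TermMatchSketch();
        lowered.addLowercased("stillHere");
        // Both sides are lowercase now, so the exact match succeeds.
        System.out.println(lowered.matchesParsedQuery("stillHere")); // prints true
    }
}
```

The second case corresponds to the fix reported later in the thread, where the value was lowercased when the field was set.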

Re: Parsing The Query: Every document that doesn't have a field containing x

2005-02-03 Thread Luke Shannon
"stillHere" Capital H. - Original Message - From: "Kauler, Leto S" <[EMAIL PROTECTED]> To: "Lucene Users List" Sent: Thursday, February 03, 2005 6:40 PM Subject: RE: Parsing The Query: Every document that doesn't have a field containing x First thing that jumps out is case-sensitivity

RE: Parsing The Query: Every document that doesn't have a field containing x

2005-02-03 Thread Kauler, Leto S
First thing that jumps out is case-sensitivity. Does your olFaithFull field contain "stillHere" or "stillhere"? --Leto > -Original Message- > From: Luke Shannon [mailto:[EMAIL PROTECTED] > This works: > > query1 = QueryParser.parse("jpg", "kcfileupload", new > StandardAnalyzer()); qu

Re: Parsing The Query: Every document that doesn't have a field containing x

2005-02-03 Thread Luke Shannon
This works: query1 = QueryParser.parse("jpg", "kcfileupload", new StandardAnalyzer()); query2 = QueryParser.parse("stillHere", "olFaithFull", new StandardAnalyzer()); BooleanQuery typeNegativeSearch = new BooleanQuery(); typeNegativeSearch.add(query1, false, false); typeNegativeSearch.add(query2,

Re: Numbers in the Query String

2005-02-03 Thread Otis Gospodnetic
Using different analyzers for indexing and searching is not recommended. Your numbers are not even in the index because you are using StandardAnalyzer. Use Luke to look at your index. Otis --- Hetan Shah <[EMAIL PROTECTED]> wrote: > Hello, > > How can one search for a document based on the qu

Re: Numbers in the Query String

2005-02-03 Thread Andrzej Bialecki
Hetan Shah wrote: Hello, How can one search for a document based on the query which has numbers in the query string. e.g. query = Java 2 Platform J2EE What do I need to do so that the numbers do not get neglected. I am using StandardAnalyzer to index the pages and using StopAnalyzer to search th

Numbers in the Query String

2005-02-03 Thread Hetan Shah
Hello, How can one search for a document based on the query which has numbers in the query string. e.g. query = Java 2 Platform J2EE What do I need to do so that the numbers do not get neglected. I am using StandardAnalyzer to index the pages and using StopAnalyzer to search the documents. Would

Re: Parsing The Query: Every document that doesn't have a field containing x

2005-02-03 Thread Luke Shannon
I did; I have run both queries in Luke. kcfileupload:ppt returns 1 olFaithfull:stillhere returns 119 Luke - Original Message - From: "Maik Schreiber" <[EMAIL PROTECTED]> To: "Lucene Users List" Sent: Thursday, February 03, 2005 4:55 PM Subject: Re: Parsing The Query: Every document

Re: Parsing The Query: Every document that doesn't have a field containing x

2005-02-03 Thread Maik Schreiber
Yes. There should be 119 with stillHere, You have double-checked that, haven't you? :) and if I run a query in Luke on kcfileupload = ppt, it returns one result. I am thinking I should at least get this result back with: -kcfileupload:jpg +olFaithFull:stillhere? You really should. -- Maik Schreiber

Re: Parsing The Query: Every document that doesn't have a field containing x

2005-02-03 Thread Luke Shannon
Yes. There should be 119 with stillHere, and if I run a query in Luke on kcfileupload = ppt, it returns one result. I am thinking I should at least get this result back with: -kcfileupload:jpg +olFaithFull:stillhere? Luke - Original Message - From: "Maik Schreiber" <[EMAIL PROTECTED]> To

Re: Parsing The Query: Every document that doesn't have a field containing x

2005-02-03 Thread Maik Schreiber
-kcfileupload:jpg +olFaithFull:stillhere This looks right to me. Why the 0 results? Looks good to me, too. You sure all your documents have olFaithFull:stillhere and there is at least a document with kcfileupload not being "jpg"? -- Maik Schreiber * http://www.blizzy.de <-- Get GMail invites

RE: google mini? who needs it when Lucene is there

2005-02-03 Thread Hauck, William B.
Jian, I disagree that the Google Mini is useless. $5000 is quite inexpensive for a commercial search engine. I know of search engines where the cost is practically 20 cents per document. Heck, a decent server capable of running a heavily loaded search engine costs $3000. Also, don't forget you

Re: Parsing The Query: Every document that doesn't have a field containing x

2005-02-03 Thread Luke Shannon
Hello, Still working on the same query, here is the code I am currently working with. I am thinking this should bring up all the documents that have olFaithFull=stillHere and kcfileupload!=jpg (so anything else) query1 = QueryParser.parse("jpg", "kcfileupload", new StandardAnalyzer()); query2 =

Re: Parsing The Query: Every document that doesn't have a field containing x

2005-02-03 Thread Luke Shannon
Ok. I have added the following to every document: doc.add(Field.UnIndexed("olFaithfull", "stillHere")); The plan is a query that says: olFaithfull = stillHere and kcfileupload!=jpg. I have been experimenting with the MultiFieldQueryParser, this is not working out for me. From a syntax how is thi

Re: which HTML parser is better?

2005-02-03 Thread Ian Soboroff
One which we've been using can be found at: http://www.ltg.ed.ac.uk/~richard/ftp-area/html-parser/ We absolutely need to be able to recover gracefully from malformed HTML and/or SGML. Most of the nicer SAX/DOM/TLA parsers out there failed this criterion when we started our effort. The above one

Re: Searching for doc without a field

2005-02-03 Thread Paul Elschot
On Thursday 03 February 2005 20:18, Bill Tschumy wrote: > Is there any way to construct a query to locate all documents without a > specific field? By this I mean the Document was created without ever > having that field added to it. One way is to add an extra document field containing the fiel

Re: Synonyms Not Showing In The Index

2005-02-03 Thread Luke Shannon
Thanks! I can wait for the release. Luke - Original Message - From: "Andrzej Bialecki" <[EMAIL PROTECTED]> To: "Lucene Users List" Sent: Thursday, February 03, 2005 2:53 PM Subject: Re: Synonyms Not Showing In The Index > Andrzej Bialecki wrote: > > Luke Shannon wrote: > > > >> Hell

Re: Synonyms Not Showing In The Index

2005-02-03 Thread Andrzej Bialecki
Andrzej Bialecki wrote: Luke Shannon wrote: Hello; It seems my Synonym analyzer is working (based on some successful queries). But I can't see the synonyms in the index using Luke. Is this correct? Did you use the combined JAR to run? It contains an oldish version of Lucene... Other than that, I

Searching for doc without a field

2005-02-03 Thread Bill Tschumy
Is there any way to construct a query to locate all documents without a specific field? By this I mean the Document was created without ever having that field added to it. -- Bill Tschumy Otherwise -- Austin, TX http://www.otherwise.com --

Re: Parsing The Query: Every document that doesn't have a field containing x

2005-02-03 Thread Kelvin Tan
Alternatively, add a dummy field-value to all documents, like doc.add(Field.Keyword("foo", "bar")) Waste of space, but allows you to perform negated queries. On Thu, 03 Feb 2005 19:19:15 +0100, Maik Schreiber wrote: >> Negating a term must be combined with at least one nonnegated >> term to retu

Re: Rewrite causes BooleanQuery to loose required terms

2005-02-03 Thread Paul Elschot
On Thursday 03 February 2005 11:38, Nick Burch wrote: > Hi All > > I'm using lucene from CVS, and I've discovered that when rewriting a > BooleanQuery created with the old style (Query,boolean,boolean) method, > the rewrite will cause the required parameters to get lost. > > Using old style (Query,boo

Re: Parsing The Query: Every document that doesn't have a field containing x

2005-02-03 Thread Maik Schreiber
Negating a term must be combined with at least one nonnegated term to return documents; in other words, it isn't possible to use a query like NOT term to find all documents that don't contain a term. So does that mean the above example wouldn't work? Exactly. You cannot search for "-kcfileupload:jp
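
The rule quoted above can be illustrated with a small set-based model in plain Java. This is a sketch under stated assumptions, not Lucene's actual scoring code: `NegationSketch` and `search` are hypothetical names, document ids are plain integers, and the `all` set plays the role of the dummy field-value added to every document that is suggested elsewhere in this thread.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class NegationSketch {
    // Evaluates a toy boolean query over document ids.
    // Each set in 'required' holds the ids matching one required clause;
    // each set in 'prohibited' holds the ids matching one prohibited clause.
    public static Set<Integer> search(List<Set<Integer>> required,
                                      List<Set<Integer>> prohibited) {
        Set<Integer> result = new TreeSet<Integer>();
        if (required.isEmpty()) {
            // No positive clause: there is no candidate set to subtract from.
            return result;
        }
        result.addAll(required.get(0));
        for (Set<Integer> r : required) {
            result.retainAll(r); // keep only ids matching every required clause
        }
        for (Set<Integer> p : prohibited) {
            result.removeAll(p); // drop ids matching any prohibited clause
        }
        return result;
    }

    public static void main(String[] args) {
        // Four documents; documents 1 and 2 have kcfileupload:jpg.
        Set<Integer> jpg = new HashSet<Integer>(Arrays.asList(1, 2));
        // Every document carries the dummy term, so this set covers the index.
        Set<Integer> all = new HashSet<Integer>(Arrays.asList(1, 2, 3, 4));

        // "-kcfileupload:jpg" alone: nothing to start from.
        System.out.println(search(Collections.<Set<Integer>>emptyList(),
                                  Collections.singletonList(jpg))); // prints []

        // "+dummy -kcfileupload:jpg": start from everything, remove the jpg documents.
        System.out.println(search(Collections.singletonList(all),
                                  Collections.singletonList(jpg))); // prints [3, 4]
    }
}
```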

Parsing The Query: Every document that doesn't have a field containing x

2005-02-03 Thread Luke Shannon
Hello; I have a query that finds documents that contain fields with a specific value. query1 = QueryParser.parse("jpg", "kcfileupload", new StandardAnalyzer()); This works well. I would like a query that finds documents containing all kcfileupload fields that don't contain jpg. The example I fou

Re: when indexing, java.io.FileNotFoundException

2005-02-03 Thread Chris Lu
Thank you for your reply. I am already using compound file format, and the minMergeDocs is already increased to 50. From my understanding and observation, files are compounded at the end of indexing. The error happens when indexing, so compound file format should not matter. Chris Lu Will Allen

RE: when indexing, java.io.FileNotFoundException

2005-02-03 Thread Will Allen
Increase the minMergeDocs and use the compound file format when creating your index. http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexWriter.html http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexWriter.html#setUseCompoundFile(boolean) -Original Mes

when indexing, java.io.FileNotFoundException

2005-02-03 Thread Chris Lu
Hi, I am getting this exception now and then when I am indexing content. It doesn't always happen. But when it happens, I have to delete the index and start over again. This is a serious problem for us. In this email, Doug was saying it has something to do with win32's lack of atomic renaming. http://

Re: Lock failure recovery

2005-02-03 Thread Luke Shannon
The indexing process is totally synchronized in our system. Thus if an indexing thread starts up and the index exists, but is locked, I know this to be the only indexing process running, so the lock must be from a process that got stopped before it could finish. So right before I begin writing t

Lock failure recovery

2005-02-03 Thread Claes Holmerson
Hello, A commit.lock can get left by a process that dies in the middle of reading the index, for example because of an OutOfMemoryError. How can I handle such a leftover lock gracefully the next time the process runs? Checking if there is a lock is straightforward - but how can I be sure that it is

Re: Subversion conversion

2005-02-03 Thread John Haxby
Kevin L. Cobb wrote: We recently started using SVN for SCM; we were using VSS before. We're trying out approach A, branching off for each release. Development always develops on the trunk, except when a bug is discovered that needs to be patched to a previous version of the product. When that scenario comes

Hits and HitCollector performance

2005-02-03 Thread aurora
I am trying to do some filtering and rearrangement of search results. Two possibilities that come to mind are iterating through the Hits or making a custom HitCollector. All documentation invariably warns about the performance impact of using HitCollector with a large result set. The scenario that google

Re: which HTML parser is better?

2005-02-03 Thread aurora
For all the parser suggestions, I think there is one important attribute. Some parsers return data provided that the input HTML is sensible. Some parsers are designed to be as flexible and tolerant as they can be. If the input is clean and controlled, the former class is sufficient. Even some regular

Re: Right way to make analyzer

2005-02-03 Thread Erik Hatcher
On Feb 3, 2005, at 9:26 AM, Owen Densmore wrote: Is this the right way to make a porter analyzer using the standard tokenizer? I'm not sure about the order of the filters. Owen class MyAnalyzer extends Analyzer { public TokenStream tokenStream(String fieldName, Reader reader) {

Right way to make analyzer

2005-02-03 Thread Owen Densmore
Is this the right way to make a porter analyzer using the standard tokenizer? I'm not sure about the order of the filters. Owen class MyAnalyzer extends Analyzer { public TokenStream tokenStream(String fieldName, Reader reader) { return new PorterStemFilter( new Sto

RE: Subversion conversion

2005-02-03 Thread Kevin L. Cobb
We recently started using SVN for SCM; we were using VSS before. We're trying out approach A, branching off for each release. Development always develops on the trunk, except when a bug is discovered that needs to be patched to a previous version of the product. When that scenario comes up (and it never has)

RE: Getting Search Results Pharse

2005-02-03 Thread Pasha Bizhan
Hi, > From: mahaveer jain [mailto:[EMAIL PROTECTED] > I am using lucene to index and search my app. To date I am > just showing file name or title based on my application. We > want to show the phrase that contains the keyword searched. > Has anybody tried this? Can someone help me start this

Getting Search Results Pharse

2005-02-03 Thread mahaveer jain
Hi All, I am using lucene to index and search my app. To date I am just showing file name or title based on my application. We want to show the phrase that contains the keyword searched. Has anybody tried this? Can someone help me start this? Thanks Mahaveer

Re: Subversion conversion

2005-02-03 Thread Erik Hatcher
We can work the 1.x and 2.0 lines of code however we need to. We can branch (a branch or tag in Subversion is inexpensive and a constant time operation). How we want to manage both versions of Lucene is open for discussion. Nothing about Subversion changes how we manage this from how we'd do

Re: Has anyone tried indexing xml files: DigesterXMLHandler.java file before?

2005-02-03 Thread Erik Hatcher
You're missing the Commons Digester JAR, which is in the lib directory of the LIA download. Check the build.xml file for the build details of how the compile class path is set. You'll likely need some other JARs at runtime too. Erik On Feb 3, 2005, at 2:12 AM, jac jac wrote: Hi, I ju

Rewrite causes BooleanQuery to loose required terms

2005-02-03 Thread Nick Burch
Hi All, I'm using lucene from CVS, and I've discovered that when rewriting a BooleanQuery created with the old style (Query,boolean,boolean) method, the rewrite will cause the required parameters to get lost. Using old style (Query,boolean,boolean): query = +contents:test* +(class:1.2 class:1.2.*) rewr

Re: which HTML parser is better? - Thread closed

2005-02-03 Thread Karl Koch
Thank you, I will do that. > Karl Koch wrote: > > >I apologise in advance, if some of my writing here has been said before. > >The last three answers to my question have been suggesting pattern > matching > >solutions and Swing. Pattern matching was introduced in Java 1.4 and > Swing > >is somet

Re: which HTML parser is better?

2005-02-03 Thread Dawid Weiss
Karl, Two things, try to experiment with both: 1) I would try to write a lexical scanner that strips HTML tags, much like the regular expression does. Java lexical scanner packages produce nice pure Java classes that seldom use any advanced API, so they should work on Java 1.1. They are simple s
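
A minimal sketch of the first suggestion, assuming the input is well-formed HTML (as Karl says his is). It is a hand-written single-pass scanner rather than the output of a lexer generator, and it deliberately ignores comments, script blocks, entities, and `>` characters inside attribute values. The class name `TagStripper` is hypothetical. It uses only `String` and `StringBuffer`, with no regular expressions, Swing, or collections, so nothing in it depends on APIs newer than Java 1.1.

```java
public class TagStripper {
    // Removes everything between '<' and the next '>' in a single pass.
    public static String strip(String html) {
        StringBuffer out = new StringBuffer(html.length());
        boolean inTag = false;
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (c == '<') {
                inTag = true;            // start skipping
            } else if (c == '>' && inTag) {
                inTag = false;           // tag closed, resume copying
                out.append(' ');         // keep words on either side of a tag apart
            } else if (!inTag) {
                out.append(c);           // ordinary text is copied through
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(strip("<p>Hello <b>Lucene</b> users</p>"));
    }
}
```

The scanner keeps no tag tree and allocates a single buffer, which fits the memory constraint mentioned in this thread. Each tag is replaced by one space, so runs of whitespace remain in the output; a tokenizer applied afterwards would normally absorb them.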

Re: which HTML parser is better?

2005-02-03 Thread sergiu gordea
Karl Koch wrote: I apologise in advance, if some of my writing here has been said before. The last three answers to my question have been suggesting pattern matching solutions and Swing. Pattern matching was introduced in Java 1.4 and Swing is something I cannot use since I work with Java 1.1 on a

Re: which HTML parser is better?

2005-02-03 Thread Karl Koch
I am using Java 1.1 with a Sharp Zaurus PDA. I have very tight memory constraints. I do not think CPU performance is a big issue though. But I have other parts in my application which use quite a lot of memory and sometimes run short. I therefore am not looking into solutions which build up tag tre

Re: which HTML parser is better?

2005-02-03 Thread sergiu gordea
Karl Koch wrote: Unfortunately I am faithful ;-). Just for practical reasons I want to do that in a single class or even method called by another part in my Java application. It should also run on Java 1.1 and it should be small and simple. As I said before, I am in control of the HTML and it will b

Re: which HTML parser is better?

2005-02-03 Thread Karl Koch
I apologise in advance, if some of my writing here has been said before. The last three answers to my question have been suggesting pattern matching solutions and Swing. Pattern matching was introduced in Java 1.4 and Swing is something I cannot use since I work with Java 1.1 on a PDA. I am wonde

Re: Subversion conversion

2005-02-03 Thread Miles Barr
On Wed, 2005-02-02 at 22:11 -0500, Erik Hatcher wrote: > I've seen both of these types of procedures followed on Apache > projects. It really just depends. Lucene's codebase is not being > modified frequently, so it is not necessary to branch and merge back. > Rather we simply develop off of

Re: which HTML parser is better?

2005-02-03 Thread sergiu gordea
Karl Koch wrote: Hello Sergiu, thank you for your help so far. I appreciate it. I am working with Java 1.1 which does not include regular expressions. Why are you using Java 1.1? Are you so limited in resources? What operating system do you use? I assume that you just need to index the html files

Re: which HTML parser is better?

2005-02-03 Thread Karl Koch
Unfortunately I am faithful ;-). Just for practical reasons I want to do that in a single class or even method called by another part in my Java application. It should also run on Java 1.1 and it should be small and simple. As I said before, I am in control of the HTML and it will be well formatted,

Re: which HTML parser is better?

2005-02-03 Thread Karl Koch
Hello Sergiu, thank you for your help so far. I appreciate it. I am working with Java 1.1 which does not include regular expressions. Your turn ;-) Karl > Karl Koch wrote: > > >I am in control of the html, which means it is well formatted HTML. I use > >only HTML files which I have transformed

Re: Synonyms Not Showing In The Index

2005-02-03 Thread Andrzej Bialecki
Luke Shannon wrote: Hello; It seems my Synonym analyzer is working (based on some successful queries). But I can't see the synonyms in the index using Luke. Is this correct? Did you use the combined JAR to run? It contains an oldish version of Lucene... Other than that, I'm not sure - if you can't