RE: test case - RE: Slash Problem

2002-11-25 Thread Spencer, Dave
Good point, though I thought the rule was that you were supposed to use the same Analyzer on your query as the one you built the index with. Of course I suspect that this will break down if the Field.Keyword text has spaces in it. But it gets past all reasonable uri/url/filename cases, so thanks. -Origin

Re: Query Syntax Continued.

2002-11-25 Thread Otis Gospodnetic
To answer your question: I haven't heard of this idea before. Otis --- "Mark R. Diggory" <[EMAIL PROTECTED]> wrote: > I've also been working on the idea of a Generic Query Markup Language > (QML), that describes any search query in XML format; this allows one to > use a SAX Parser or an XS

Re: test case - RE: Slash Problem

2002-11-25 Thread Otis Gospodnetic
Maybe there is a good reason for using WhitespaceAnalyzer in TestQueryParser.java :). Try it. public void testEscaped() throws Exception { Analyzer a = new WhitespaceAnalyzer(); assertQueryEquals("\\[brackets", a, "\\[brackets"); assertQueryEquals("\\[brackets", null,
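For readers without TestQueryParser handy, a rough standalone sketch of the same kind of check; the static QueryParser.parse signature and the exact escaping behavior are assumptions based on the Lucene 1.x API of the time, not a quote from the test itself:

```java
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;

public class EscapedTermSketch {
    public static void main(String[] args) throws Exception {
        // WhitespaceAnalyzer splits only on whitespace, so the escaped
        // bracket is left inside the term instead of being stripped.
        Query q = QueryParser.parse("\\[brackets", "field", new WhitespaceAnalyzer());
        System.out.println(q.toString("field"));
    }
}
```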

test case - RE: Slash Problem

2002-11-25 Thread Spencer, Dave
I'm sure there's something that I'm missing here. Let's say we have an index of a web site with 2 fields, "body" and "url". The body is formed via Field.Text(..., Reader) and the url field via Field.Keyword(), so the URL is not tokenized but is searchable. I use StandardAnalyzer and I want to find
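A minimal sketch of the document construction being described, assuming the Lucene 1.x Field factory methods; bodyReader and url are hypothetical stand-ins for the real page data:

```java
import java.io.Reader;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public class WebPageDocSketch {
    static Document makeDoc(Reader bodyReader, String url) {
        Document doc = new Document();
        // Field.Text(name, Reader): indexed and tokenized by the analyzer, not stored.
        doc.add(Field.Text("body", bodyReader));
        // Field.Keyword(name, value): stored and indexed as a single un-tokenized term.
        doc.add(Field.Keyword("url", url));
        return doc;
    }
}
```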

RE: Slash Problem

2002-11-25 Thread Spencer, Dave
OK, sorry for the noise then. If I can reproduce it, I'll be more precise. -Original Message- From: Terry Steichen [mailto:[EMAIL PROTECTED]] Sent: Monday, November 25, 2002 12:13 PM To: Lucene Users List Subject: Re: Slash Problem Dave, My recent testing suggests that when the field is no

Re: Slash Problem

2002-11-25 Thread Terry Steichen
Dave, My recent testing suggests that when the field is not tokenized, it is not split as you suggest. When I search the "path" field using "path:1102/A*" I get precisely what I am looking for (though I discovered the lowercase mechanism isn't applied to this field and the query is case-sensitive

RE: Book

2002-11-25 Thread Spencer, Dave
I didn't see anyone mention my favorite text, "Managing Gigabytes". My amazon link is: http://www.amazon.com/exec/obidos/ASIN/1558605703/tropoA -Original Message- From: William W [mailto:[EMAIL PROTECTED]] Sent: Wednesday, November 20, 2002 12:14 PM To: [EMAIL PROTECTED] Subject: Book

RE: PDF parser

2002-11-25 Thread Spencer, Dave
I've tried all 3 of those and none have worked out for me. Our intranet has 802 PDFs from lots of (vendor) sources and all the pure java parsers have trouble w/ some of them. I've since gone to pdftotext from xpdf at the link below. True, not pure java, but it works on all platforms w/ my doc set a

RE: Slash Problem

2002-11-25 Thread Spencer, Dave
Funny, I have more or less the same question I've been meaning to post. I think the answer is going to be that the analyzer applies to all parts of a query, even to untokenized fields, which to me seems wrong. So I think if you have a query like body:foo uri:"/alpha/beta", with 'body' bei
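A sketch of the behavior being described, assuming the Lucene 1.x static QueryParser.parse method; exactly which terms StandardAnalyzer produces for "/alpha/beta" is a guess, the point is only that the keyword field's value gets analyzed too:

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;

public class AnalyzedKeywordSketch {
    public static void main(String[] args) throws Exception {
        // QueryParser runs the same analyzer over every clause, including
        // the un-tokenized "uri" field, so the single term stored in the
        // index ("/alpha/beta") no longer matches what the parser produces.
        Query q = QueryParser.parse("body:foo uri:\"/alpha/beta\"", "body", new StandardAnalyzer());
        System.out.println(q);
    }
}
```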

Re: Slash Problem

2002-11-25 Thread Terry Steichen
I confirmed that I can search properly when I replace all my backslashes with forward slashes. (I had the terminology reversed in my last message.) I now have to check if the Windows OS will accept the replacement or if I'll have to do dynamic conversions when I use the content. It appears that th

Re: Is "id" a special case?

2002-11-25 Thread Terry Steichen
I was in error about the structures. The "pub_date" field is indexed and stored but *not* tokenized. When I changed the "id" field to that form, it then appeared to work fine (based on a very small sample - will test more soon). Can anyone help me understand why the search takes such a huge amoun

Is "id" a special case?

2002-11-25 Thread Terry Steichen
I've encountered some very puzzling Lucene behavior (I'm using 1.3dev1, StandardAnalyzer, QueryParser). My indexed documents have, among other fields, two Text fields (indexed, tokenized, stored) called "pub_date" and "id". These two fields have similar values. A typical pub_date value is "

Re: Slash Problem

2002-11-25 Thread Terry Steichen
Rob, I presume that means that you used backslashes (in the url) rather than forward slashes (in the path). I had planned to test that as a workaround and it's good to know that you've already tested that successfully. But why is this necessary? Why doesn't the escape ('\') allow the use of a b

RE: Slash Problem

2002-11-25 Thread Rob Outar
I don't know if this helps, but I had the exact same problem. I then stored the URI instead of the path, and I was then able to search on the URI. Thanks, Rob -Original Message- From: Terry Steichen [mailto:[EMAIL PROTECTED]] Sent: Monday, November 25, 2002 11:53 AM To: Lucene Users Group Subjec

Slash Problem

2002-11-25 Thread Terry Steichen
I've got a Text field (tokenized, indexed, stored) called 'path' which contains a string in the form of '1102\A3345-12RT.XML'. When I submit a query like "path:1102*" it works fine. But, when I try to be more specific (such as "path:1102\a*" or "path:1102*a*") it fails. I've tried escaping th
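One workaround worth sketching here (an assumption on my part, not something proposed in the thread) is to build the wildcard query programmatically, so neither QueryParser nor the analyzer touches the term; the path value is the one from the message:

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.WildcardQuery;

public class PathWildcardSketch {
    public static void main(String[] args) {
        // The term text goes into the Term untouched, so the backslash and
        // the original case are matched exactly as they were indexed.
        Query q = new WildcardQuery(new Term("path", "1102\\A*"));
        System.out.println(q.toString("path"));
    }
}
```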

RE: Searches are not case insensitive

2002-11-25 Thread Otis Gospodnetic
Ah, field NAME, not field VALUE. Normally when people refer to a field they tend to mean the field value. Yes, the field name does not get touched in an Analyzer. I have not tested that, but apparently so (case sensitivity). Don't worry about finding out, just use lower case field names everywhere. O

RE: Searches are not case insensitive

2002-11-25 Thread Rob Outar
From briefly looking at the code, it looks like the "field" does not get touched; it seems like the only part that gets converted to lower case is the value, so I am assuming that the field name is case sensitive but the value is not? Thanks, Rob -Original Message- From: Otis Gospodneti

Re: Searches are not case insensitive

2002-11-25 Thread Otis Gospodnetic
Why not add print statements to your analyzer to ensure that what you think is happening really is happening? Token has an attribute called 'text' that you could print, I believe. Otis --- Rob Outar <[EMAIL PROTECTED]> wrote: > Hello all, > > I created the following analyzer so that clien
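A small sketch of the kind of debug printing being suggested; in the Lucene releases of that era the accessor is Token.termText() rather than a public 'text' attribute (that detail is my assumption, check your version):

```java
import java.io.StringReader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;

public class TokenDumpSketch {
    // Prints every token the analyzer emits for the given text, so you can
    // see exactly what would be indexed or searched for that field.
    static void dump(Analyzer analyzer, String field, String text) throws Exception {
        TokenStream ts = analyzer.tokenStream(field, new StringReader(text));
        for (Token t = ts.next(); t != null; t = ts.next()) {
            System.out.println(t.termText());
        }
        ts.close();
    }
}
```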

Searches are not case insensitive

2002-11-25 Thread Rob Outar
Hello all, I created the following analyzer so that clients could pose case-insensitive searches, but queries are still case sensitive: // do not tokenize any field TokenStream t = new CharTokenizer(reader) { protected boolean isTokenChar(char c) {
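A sketch of one way to finish that idea: keep the whole value as a single token and lower-case it with a filter, using the same analyzer at index and query time. The class name is mine and this is an assumption about the intent, not Rob's actual code:

```java
import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.CharTokenizer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;

public class LowerCaseKeywordAnalyzer extends Analyzer {
    public TokenStream tokenStream(String fieldName, Reader reader) {
        // Never split: every character counts as part of the token,
        // so the whole field value comes out as one term.
        TokenStream t = new CharTokenizer(reader) {
            protected boolean isTokenChar(char c) {
                return true;
            }
        };
        // Lower-case that single term so queries match regardless of case.
        return new LowerCaseFilter(t);
    }
}
```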

Re: problems with search on Russian content

2002-11-25 Thread Andrey Grishin
I got the nightly build from the CVS. When I try to use IndexWriter this way: writer = new IndexWriter(indexDirectory, new RussianAnalyzer("Cp1251".toCharArray()), true); I get the following exception ---