Good point, though I thought the rule was that you were supposed
to use the same Analyzer on your Query as you built the
index with.
Of course I suspect that this will break down if the
Field.Keyword text has spaces in it.
But: it gets past all reasonable uri/url/filename cases so thanks.
-Origin
To answer your question: I haven't heard of this idea before.
Otis
--- "Mark R. Diggory" <[EMAIL PROTECTED]> wrote:
> I've also been working on the idea of a Generic Query Markup Language
> (QML) that describes any search query in XML format; this allows one
> to use a SAX Parser or an XS…
Maybe there is a good reason for using WhitespaceAnalyzer in
TestQueryParser.java :). Try it.
public void testEscaped() throws Exception {
    Analyzer a = new WhitespaceAnalyzer();
    assertQueryEquals("\\[brackets", a, "\\[brackets");
    assertQueryEquals("\\[brackets", null, "brackets");
}
I'm sure there's something that I'm missing here.
Let's say we have an index of a web site with two fields,
"body" and "url".
Body is formed via Field.Text(..., Reader) and the url field by
Field.Keyword(), so the URL is not tokenized but is searchable.
I use StandardAnalyzer and I want to find…
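For concreteness, here is roughly how I'd expect such an index to be built. This is my own sketch against the Lucene 1.x API; the index path and field values are invented, not the original poster's code:

import java.io.StringReader;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class BuildIndex {
    public static void main(String[] args) throws Exception {
        IndexWriter writer =
            new IndexWriter("/tmp/index", new StandardAnalyzer(), true);
        Document doc = new Document();
        // "body": tokenized by the analyzer (Reader-based Text fields are not stored)
        doc.add(Field.Text("body", new StringReader("some page text")));
        // "url": indexed as a single untokenized term, and stored
        doc.add(Field.Keyword("url", "/alpha/beta.html"));
        writer.addDocument(doc);
        writer.close();
    }
}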
OK, sorry for the noise then.
If I can reproduce I'll be more precise.
-Original Message-
From: Terry Steichen [mailto:[EMAIL PROTECTED]]
Sent: Monday, November 25, 2002 12:13 PM
To: Lucene Users List
Subject: Re: Slash Problem
Dave,
My recent testing suggests that when the field is not tokenized, it is not
split as you suggest. When I search the "path" field using "path:1102/A*" I
get precisely what I am looking for (though I discovered the lowercase
mechanism isn't applied to this field and the query is case-sensitive).
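That matches my reading of it: an untokenized field never goes through the analyzer, so nothing lowercases it. One workaround (my sketch, not something tested in this thread) is to lowercase the value yourself at index time and build the prefix query directly, bypassing QueryParser:

import org.apache.lucene.index.Term;
import org.apache.lucene.search.PrefixQuery;
import org.apache.lucene.search.Query;

public class PathQueries {
    // assumes the "path" value was lowercased when indexed,
    // e.g. doc.add(Field.Keyword("path", path.toLowerCase()))
    public static Query pathPrefix(String prefix) {
        return new PrefixQuery(new Term("path", prefix.toLowerCase()));
    }
}

With that, pathPrefix("1102/A") and pathPrefix("1102/a") match the same documents.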
I didn't see anyone mention my favorite text, "Managing Gigabytes".
My amazon link is:
http://www.amazon.com/exec/obidos/ASIN/1558605703/tropoA
-Original Message-
From: William W [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, November 20, 2002 12:14 PM
To: [EMAIL PROTECTED]
Subject: Book
I've tried all 3 of those and none have worked out for me.
Our intranet has 802 PDFs from lots of (vendor) sources and
all the pure java parsers have trouble w/ some of them.
I've since gone to pdftotext from xpdf at the link below.
True, not pure java, but it works on all platforms
w/ my doc set a…
Funny, I have more or less the same question I've been meaning to post.
I think the answer is going to be that the analyzer applies to all parts
of a query, even to untokenized fields, which to me seems wrong.
So I think if you have a query like
body:foo uri:"/alpha/beta"
With 'body' being…
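One way around that (my sketch; I haven't seen this confirmed as the recommended approach) is to keep the untokenized clause away from QueryParser entirely: parse only the body part, and add the uri clause as a raw TermQuery that no analyzer ever touches:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class MixedQuery {
    public static Query build() throws Exception {
        BooleanQuery q = new BooleanQuery();
        // tokenized field: let QueryParser and the analyzer handle it
        q.add(QueryParser.parse("foo", "body", new StandardAnalyzer()),
              false, false);
        // untokenized field: exact term, no analysis applied
        q.add(new TermQuery(new Term("uri", "/alpha/beta")), false, false);
        return q;
    }
}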
I confirmed that I can search properly when I replace all my backslashes
with forward slashes. (I had the terminology reversed in my last message.)
I now have to check if the Windows OS will accept the replacement or if I'll
have to do dynamic conversions when I use the content.
It appears that th…
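The conversion itself is a one-liner; a sketch (variable names are mine) assuming you keep the original string for opening the file on Windows and index only the normalized copy:

// keep rawPath for filesystem access; index the forward-slash form
String rawPath = "1102\\A3345-12RT.XML";
String indexedPath = rawPath.replace('\\', '/');  // "1102/A3345-12RT.XML"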
I was in error about the structures. The "pub_date" field is indexed,
stored but *not* tokenized. When I changed the "id" field to that form, it
then appeared to work fine (based on a very small sample - will test more
soon).
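In Lucene 1.x terms, "indexed, stored but not tokenized" is exactly what Field.Keyword produces, so the change presumably amounts to something like this (the date value is invented, and doc is an existing Document):

// before: Field.Text runs the value through the analyzer
// doc.add(Field.Text("pub_date", "2002-11-25"));
// after: Field.Keyword indexes and stores it as one untouched term
doc.add(Field.Keyword("pub_date", "2002-11-25"));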
Can anyone help me understand why the search takes such a huge amount…
I've encountered some very puzzling Lucene behavior (I'm using 1.3dev1,
StandardAnalyzer, QueryParser).
My indexed documents have, among other fields, two Text fields (indexed, tokenized,
stored) called "pub_date" and "id". These two fields have similar values. A typical
pub_date value is "…"
Rob,
I presume that means that you used backslashes (in the url) rather than
forward slashes (in the path). I had planned to test that as a workaround
and it's good to know that you've already tested that successfully.
But why is this necessary? Why doesn't the escape ('\') allow the use of a
backslash…
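One layer worth keeping straight here (my note; whether 1.3dev1 actually honors the escape is exactly what this thread is probing): each backslash that should reach QueryParser must be doubled in a Java string literal, and QueryParser's escape syntax doubles it again, so a single literal backslash in the term becomes four in the source:

// what the user types:          path:1102\a*
// escaped for QueryParser:      path:1102\\a*
// as a Java string literal:
String queryText = "path:1102\\\\a*";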
I don't know if this helps, but I had the exact same problem; I then stored the
URI instead of the path, and I was then able to search on the URI.
Thanks,
Rob
-Original Message-
From: Terry Steichen [mailto:[EMAIL PROTECTED]]
Sent: Monday, November 25, 2002 11:53 AM
To: Lucene Users Group
Subject: Slash Problem
I've got a Text field (tokenized, indexed, stored) called 'path' which contains a
string in the form of '1102\A3345-12RT.XML'. When I submit a query like "path:1102*"
it works fine. But, when I try to be more specific (such as "path:1102\a*" or
"path:1102*a*") it fails. I've tried escaping the…
Ah, field NAME, not field VALUE. Normally when people refer to field
they tend to refer to field value.
Yes, the field name does not get touched by an Analyzer.
I have not tested that, but apparently so (case sensitivity). Don't
worry about finding out, just use lower case field names everywhere.
Otis
From briefly looking at the code, it looks like the "field" does not get
touched; it seems like the only part that gets converted to lower case is the
value. So I am assuming that the field name is case sensitive but the value
is not?
Thanks,
Rob
-Original Message-
From: Otis Gospodnetic
Why not add print statements to your analyzer to ensure that what you
think is happening really is happening? Token has an attribute called
'text' that you could print, I believe.
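Something along these lines (a sketch against the 1.x TokenStream API; 'analyzer' stands for whichever Analyzer you are testing) would show exactly which terms come out:

import java.io.StringReader;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;

// print every token the analyzer emits for a sample field value
TokenStream ts = analyzer.tokenStream("path",
        new StringReader("1102\\A3345-12RT.XML"));
for (Token tok = ts.next(); tok != null; tok = ts.next()) {
    System.out.println("token: [" + tok.termText() + "]");
}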
Otis
--- Rob Outar <[EMAIL PROTECTED]> wrote:
> Hello all,
>
> I created the following analyzer so that clients…
Hello all,
I created the following analyzer so that clients could pose
case-insensitive searches, but queries are still case-sensitive:
// do not tokenize any field
TokenStream t = new CharTokenizer(reader) {
    protected boolean isTokenChar(char c) {
        return true;
    }
};
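For what it's worth, a complete version of that idea might look like the sketch below (my reconstruction, not Rob's actual code): the whole field value becomes one token, and a LowerCaseFilter normalizes it so index-time and query-time terms agree on case.

import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.CharTokenizer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;

public class LowerCaseKeywordAnalyzer extends Analyzer {
    public TokenStream tokenStream(String fieldName, Reader reader) {
        // emit the entire field value as one token
        TokenStream t = new CharTokenizer(reader) {
            protected boolean isTokenChar(char c) {
                return true;
            }
        };
        // lowercase it so searches become case insensitive
        return new LowerCaseFilter(t);
    }
}

Note that both indexing and searching have to use this same analyzer, or the terms will not line up.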
I got the nightly build from CVS.
When I try to use IndexWriter this way:
writer = new IndexWriter(indexDirectory,
    new RussianAnalyzer("Cp1251".toCharArray()), true);
I get the following exception:
---
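If I remember the sandbox API right (this is an assumption, so please check), the char[] argument of RussianAnalyzer is a charset table, not an encoding name; "Cp1251".toCharArray() just hands it the six characters C, p, 1, 2, 5, 1. I believe the intended call uses one of the predefined tables in RussianCharsets:

import org.apache.lucene.analysis.ru.RussianAnalyzer;
import org.apache.lucene.analysis.ru.RussianCharsets;
import org.apache.lucene.index.IndexWriter;

// pass the predefined CP1251 table, not the literal string "Cp1251"
IndexWriter writer = new IndexWriter(indexDirectory,
        new RussianAnalyzer(RussianCharsets.CP1251), true);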