: Stored = as-is value stored in the Lucene index
:
: Tokenized = field is analyzed using the specified Analyzer - the tokens
: emitted are indexed
:
: Indexed = the text (either as-is with keyword fields, or the tokens
: from tokenized fields) is made searchable (aka inverted)
:
: Vectored = term
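To make the four combinations above concrete, here is a sketch using the factory methods of `org.apache.lucene.document.Field` as they existed in the Lucene 1.x line (this example is illustrative, not from the original message):

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public class FieldFlavors {
    public static Document example() {
        Document doc = new Document();
        // stored + indexed, NOT tokenized: matched only as one exact term
        doc.add(Field.Keyword("id", "RES-42"));
        // stored + indexed + tokenized: run through the Analyzer
        doc.add(Field.Text("title", "Lucene in practice"));
        // stored only: retrievable with the hit, but not searchable
        doc.add(Field.UnIndexed("url", "http://example.com/42"));
        // indexed + tokenized, NOT stored: searchable but not retrievable
        doc.add(Field.UnStored("body", "full text goes here"));
        return doc;
    }
}
```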
On Jan 6, 2005, at 6:23 PM, Ross Rankin wrote:
Could you explain this piece further, Erik "BooleanQuery and AND in
TermQuery for resellerId"
Your code did a textual concatenation (and I'm paraphrasing as I don't
have your previous e-mail handy) of "id:" + resellerId. And then it
parsed the expre
: Hoss, could you tell me which exceptions I'm missing? Thanks!
Anytime you have a "catch" block, you should be doing something with that
exception. If possible, recover from the exception, but no matter
what, you should log the exception in some way so that you know it
happened.
Your
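A minimal sketch of that advice (my own illustration, not code from the thread; the method name `parseOrDefault` and the fallback design are made up for the example):

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class SafeParse {
    private static final Logger LOG = Logger.getLogger(SafeParse.class.getName());

    // Recover with a fallback value, but always log so the failure is visible.
    public static int parseOrDefault(String s, int fallback) {
        try {
            return Integer.parseInt(s);
        } catch (NumberFormatException e) {
            LOG.log(Level.WARNING, "bad number: " + s, e);  // never swallow silently
            return fallback;
        }
    }
}
```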
stetter
Sent: Tuesday, January 04, 2005 6:48 PM
To: Lucene Users List
Subject: Re: Problems...
To start with, there has to be more to the "search" side of things than
what you included. This search function is not static, which means it's
getting called on an object, which obviously has some internal state
(paramOffset, hits, and pathToIndex are a few that jump out at me) what
are the va
On Jan 4, 2005, at 10:53 AM, Ross Rankin wrote:
I'm not sure where or how to troubleshoot. Can I examine the indexes with
anything to see what is there and that it's meaningful? Is there
something simple I can do to track down what doesn't work in the process?
Thanks.
Echoing a previous sugge
I had a similar situation with the same problem.
I found the previous system was creating all the objects (including the
Searcher) and then updating the Index.
The result was the Searcher was not able to find any of the data just added
to the Index.
The solution for me was to move the creation of
[EMAIL PROTECTED] writes:
>
> this solution was the first that i tried.. but this does not run correctly..
> because:
>
> when we try to sort these numbers in alphanumeric order, we find that
> -0010 sorts higher than -0001
>
right. I failed to see that.
So you would have to use a complemen
hi morus & company;
On Thursday 18 November 2004 12:49, Morus Walter wrote:
> [EMAIL PROTECTED] writes:
> > i need to solve this search:
> > number: -10
> > range: -50 TO 5
> >
> > i need help..
> > i dont find anything using google..
>
> If your numbers are in the interval MIN/MAX and MIN<0 you c
[EMAIL PROTECTED] writes:
>
> i need to solve this search:
> number: -10
> range: -50 TO 5
>
> i need help..
> i dont find anything using google..
>
If your numbers are in the interval MIN/MAX and MIN<0 you can shift
that to a positive interval 0 ... (MAX-MIN) by subtracting MIN from
each numb
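The shift-then-pad trick from this thread can be sketched like this (my own illustration; the class and method names are made up). Shifting by MIN makes every value non-negative, and zero-padding to a fixed width makes lexicographic order agree with numeric order, which also fixes the -0010 vs. -0001 sorting problem mentioned earlier:

```java
public class RangeEncode {
    // Shift [min, max] to [0, max-min], then zero-pad so that
    // lexicographic (string) order equals numeric order.
    public static String encode(long n, long min, long max) {
        long shifted = n - min;               // always >= 0 when min <= n
        int width = Long.toString(max - min).length();
        StringBuilder sb = new StringBuilder(Long.toString(shifted));
        while (sb.length() < width) sb.insert(0, '0');
        return sb.toString();
    }
}
```

With number -10 in the interval [-50, 5], `encode(-10, -50, 5)` yields "40", and encoded strings sort the same way the underlying numbers do.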
Paul,
We are doing similar stuff. We actually do create a hash of database
name, table name and id to form a unique id. So far I have not had any
problems with it.
Cheers,
Aad
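One possible shape for such a hash-based id (an illustration under my own assumptions, not Aad's actual code; the separator choice and MD5 are mine):

```java
import java.security.MessageDigest;

public class DocIds {
    // Combine database, table and row id into one index-wide key.
    // The separator guards against ("ab","c") colliding with ("a","bc").
    public static String uniqueId(String db, String table, String rowId) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] h = md.digest((db + "|" + table + "|" + rowId).getBytes("UTF-8"));
            StringBuilder sb = new StringBuilder();
            for (byte b : h) sb.append(String.format("%02x", b));
            return sb.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```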
Hi,
I'm creating an index from several database tables. Every item within
every table has a unique id which is saved in
Try setUseCompoundFile(false) on your IndexWriter as soon as you create
it or before you call optimize
-Original Message-
From: Christian Rodriguez [mailto:[EMAIL PROTECTED]
Sent: Tuesday, September 21, 2004 1:10 PM
To: Lucene Users List
Subject: Re: Problems with Lucene + BDB (Berkeley
Andy, you are right.
I tried with Lucene 1.3 and it worked perfectly. This should be added
to a README in the Lucene + BDB sandbox (or somewhere) so people don't
spend days struggling with those weird non-deterministic bugs I am
getting...
Now, I do need to use version 1.4, so I'd like to see if
I used BDB + lucene successfully using the lucene 1.3 distribution,
but it broke in my application with the 1.4 distribution. The 1.4
dist uses a different file format by default, the "compound file"
format, so maybe that is the source of the issues.
good luck,
andy g
On Mon, 20 Sep 2004 19:36:5
'Lucene Users List'
Subject: RE: Problems indexing Japanese with CJKAnalyzer ... Or French with
UTF-8 and MetaData
I don't think I understand correctly your proposal.
As a basis, I am using Demo3 with indexHTML, HTMLDocument and HTMLParser.
Inside HTML parser, I am calling getMetaTags (ca
4 15:12
À : Lucene Users List
Objet : Re: Problems indexing Japanese with CJKAnalyzer
If it's a web application, you have to call
request.setCharacterEncoding("UTF-8") before reading any parameters.
Also make sure the HTML page encoding is specified as "UTF-8" in the
meta tag. Most web app server
If you call the above
method, I think it will solve your problem.
Praveen
- Original Message -
From: "Bruno Tirel" <[EMAIL PROTECTED]>
To: "'Lucene Users List'" <[EMAIL PROTECTED]>
Sent: Thursday, July 15, 2004 6:15 AM
Subject: RE: Problems indexin
Any help available?
Best regards,
Bruno
-Message d'origine-
De : Jon Schuster [mailto:[EMAIL PROTECTED]
Envoyé : mercredi 14 juillet 2004 22:51
À : 'Lucene Users List'
Objet : RE: Problems indexing Japanese with CJKAnalyzer
Hi all,
Thanks for the help on indexing Japanes
Hi all,
Thanks for the help on indexing Japanese documents. I eventually got things
working, and here's an update so that other folks might have an easier time
in similar situations.
The problem I had was indeed with the encoding, but it was more than just
the encoding on the initial creation of
Jon,
Java expects your files to be in the encoding of the native locale. In most cases in
the U.S., this will be English. If you want to read in files that are in a different
encoding, you have to tell Java what that encoding is, in this case Shift JIS. See
the javadocs for java.io.InputStr
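A small sketch of reading with an explicit charset instead of the platform default (my own illustration; the helper name `readAll` is made up):

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

public class EncodedRead {
    // Decode the stream with the named charset rather than the JVM default.
    public static String readAll(InputStream in, String charset) throws IOException {
        BufferedReader r = new BufferedReader(new InputStreamReader(in, charset));
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = r.read()) != -1) sb.append((char) c);
        return sb.toString();
    }
}
```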
Hi Jon,
It sounds to me like you have a character encoding problem. The
native2ascii tool is designed to produce input for the Java compiler;
the "\u7aef" notation you're seeing is understood by Java string
interpreters to mean the corresponding hexadecimal Unicode code point.
Other Java progr
How about creating a special-char-converting reader like this?
import java.io.IOException;
import java.io.Reader;
public class LuceneReader extends Reader {
    private Reader source;
    public LuceneReader(Reader sourceReader) {
        this.source = sourceReader;
    }
    // Convert special characters in cbuf[off..off+n] here before returning.
    public int read(char[] cbuf, int off, int len) throws IOException {
        return source.read(cbuf, off, len);
    }
    public void close() throws IOException {
        source.close();
    }
}
I had a similar problem.
I don't know whether there is a more intelligent solution, but the quickest
one I had in mind was to convert the special characters I needed to look up
into a fixed random character string. For example: prior to indexing I
replace all occurrences of '+' by 'PLUSsdfaEGsgfAE'.
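That replacement trick can be sketched as follows (my illustration, using the token string from the message; the same substitution must be applied to queries so they match the indexed form):

```java
public class PlusEscape {
    // Replace a query-syntax character with an improbable token before
    // indexing; apply the same replacement to query text.
    static final String PLUS_TOKEN = "PLUSsdfaEGsgfAE";

    public static String escape(String text) {
        return text.replace("+", PLUS_TOKEN);
    }

    public static String unescape(String text) {
        return text.replace(PLUS_TOKEN, "+");
    }
}
```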
--- Doug Cutting <[EMAIL PROTECTED]> wrote: > Jayant
Kumar wrote:
> > Thanks for the patch. It helped in increasing the
> > search speed to a good extent.
>
> Good. I'll commit it. Thanks for testing it.
>
> > But when we tried to
> > give about 100 queries in 10 seconds, then again
> we
> > f
Jayant Kumar wrote:
Thanks for the patch. It helped in increasing the
search speed to a good extent.
Good. I'll commit it. Thanks for testing it.
But when we tried to
give about 100 queries in 10 seconds, then again we
found that after about 15 seconds, the response time
per query increased.
This
Thanks for the patch. It helped in increasing the
search speed to a good extent. But when we tried to
give about 100 queries in 10 seconds, then again we
found that after about 15 seconds, the response time
per query increased. Enclosed is the dump which we
took after about 30 seconds of starting t
Doug Cutting wrote:
Please tell me if you are able to simplify your queries and if that
speeds things. I'll look into a ThreadLocal-based solution too.
I've attached a patch that should help with the thread contention,
although I've not tested it extensively.
I still don't fully understand why
Jayant Kumar wrote:
Please find enclosed jvmdump.txt which contains a dump
of our search program after about 20 seconds of
starting the program.
Also enclosed is the file queries.txt which contains
few sample search queries.
Thanks for the data. This is exactly what I was looking for.
"Thread-14"
We conducted a test on our search for 500 requests
given in 27 seconds. We noticed that in the first 5
seconds, the results were coming in 100 to 500 ms. But
as the queue size kept increasing, the response time
of the search increased drastically to approx 80-100
seconds.
Please find enclosed jvm
I noticed delays when concurrent threads query an IndexSearcher too.
Our index is about 550MB with about 850,000 docs, each doc with 20-30
fields of which only 3 are indexed. Our queries are not very complex --
just 3 required term queries.
This is what my test did:
initialize an array of terms
Jayant Kumar wrote:
We recently tested lucene with an index size of 2 GB
which has about 1,500,000 documents, each document
having about 25 fields. The frequency of search was
about 20 queries per second. This resulted in an
average response time of about 20 seconds approx
per search.
That sounds s
On Apr 30, 2004, at 8:52 AM, Terry Steichen wrote:
Erik,
Maybe you could donate some of those demo modules (and the accompanying
article/text) to Lucene, so they'd be incorporated officially in the
website?
Sure... and in fact that has been my intention all along. One idea
that I had with the Lu
" <[EMAIL PROTECTED]>
Sent: Friday, April 30, 2004 8:48 AM
Subject: Re: Problems From the Word Go
> Unfortunately the demo that comes with Lucene is harder to run than it
> really should be. My suggestion is to just get the Lucene JAR, and try
> out examples from the many art
Unfortunately the demo that comes with Lucene is harder to run than it
really should be. My suggestion is to just get the Lucene JAR, and try
out examples from the many articles available. My intro Lucene article
at java.net should be easy to get up and running in only a few minutes
of having
Hi Alex,
I just installed Lucene one week ago on a W2K box and it took me some time to get it
running. But now Lucene is fully integrated in our intranet. I index and search our
menu, all documents and the user profiles with Lucene, and display them according to
the user's access rights. By habit I
Alex,
Could you send along whatever error messages you are
receiving?
Thanks,
Jim
--- Alex Wybraniec <[EMAIL PROTECTED]>
wrote:
> I'm sorry if this is not the correct place to post
> this, but I'm very
> confused, and getting towards the end of my tether.
>
> I need to install/compile and run L
Alex,
What kind of errors are you getting? Is the Lucene JAR in your classpath? Have you
read http://jakarta.apache.org/lucene/docs/gettingstarted.html?
-Grant
>>> [EMAIL PROTECTED] 04/29/04 11:53AM >>>
I'm sorry if this is not the correct place to post this, but I'm very
confused, and getti
Hi,
when you index your field as a keyword, it's not tokenized, and thus the
analyzer is not used for this field during indexing.
But, if you make a search using the query parser with the
StandardAnalyzer, the analyzer will be used for the parsing.
So, I suppose that in your query 'fieldname:Rev*', the
Maurice,
Please look at the tool Luke,
http://www.getopt.org/luke
And that can help you see into your index.
Maybe there is some trouble with spaces
or trimming of strings...but Luke can help
you there!
Good luck,
Maurits
- Original Message -
From: "Maurice Coyle" <[EMAIL PROTECTED]
Please check the Lucene's jGuru FAQ, your question is answered there.
Otis
--- Flavio Eduardo de Cordova <[EMAIL PROTECTED]> wrote:
> People...
>
> I've created a custom analyser that uses the StandardTokenizer class
> to get the tokens from the reader.
> It seemed to work fine but I
Hi
I also got the IndexOutOfBoundsException while optimizing the index (index
size about 1GB, 50 docs with 25 fields each).
(Optimizing was called via merging of a RAMDirectory to an FSDirectory.)
The problem was that the FieldsReader tried to read more fields than
existed... I've no clue how to fi
Everything works fine - I can search (and find :)) Russian words...
Am I doing something wrong?
Regards, Andrey
- Original Message -
From: "Otis Gospodnetic" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Thursday, November 21, 2002
Sorry, my bad! Didn't read this informative post :-)
mvh karl øie
On Thursday, Nov 21, 2002, at 16:35 Europe/Oslo, Otis Gospodnetic wrote:
Look at CHANGES.txt document in CVS - there is some new stuff in
org.apache.lucene.analysis.ru package that you will want to use.
Get the Lucene from the
Hi, I took a look at Andrey Grishin's Russian character problem and found
something strange happening while we tried to debug it. It seems that
he has avoided the usual "querying with a different encoding than
indexed" problem, as he can dump out correctly encoded Russian at all
points in his applica
Look at CHANGES.txt document in CVS - there is some new stuff in
org.apache.lucene.analysis.ru package that you will want to use.
Get the Lucene from the nightly build...
Otis
--- Andrey Grishin <[EMAIL PROTECTED]> wrote:
> Hi All,
> I have a problems with searching on Russian content using luce
> Doesn't that one do just that - treats fields differently, based on
> their name?
yes it does, but look at the question's title
"How do I write my own Analyzer?"
if someone has a problem with a non-tokenized field (which was the
problem of the mail thread that started this) then he doesn't kno
Not sure which FAQ entry you are refering to.
This one http://www.jguru.com/faq/view.jsp?EID=1006122 ?
Doesn't that one do just that - treats fields differently, based on
their name?
Otis
--- Stefanos Karasavvidis <[EMAIL PROTECTED]> wrote:
> I came accross the same problem and I think that the
I came across the same problem and I think that the FAQ entry you
(Otis) propose should get a better title so that users can find an
answer to this problem more easily.
Correct me if I'm wrong (and please forgive any wrong assumptions I may
have made), but the problem is "how to query on a
Thanks, it's a FAQ entry now:
How do I write my own Analyzer?
http://www.jguru.com/faq/view.jsp?EID=1006122
Otis
--- Doug Cutting <[EMAIL PROTECTED]> wrote:
> karl øie wrote:
> > I have a Lucene Document with a field named "element" which is
> stored
> > and indexed but not tokenized. The val
it works :-) when i see this i understand that the term being parsed by
the queryparser is sent through the analyzer as well... thanks!
mvh karl øie
On torsdag, sep 26, 2002, at 18:44 Europe/Oslo, Doug Cutting wrote:
> karl øie wrote:
>> I have a Lucene Document with a field named "element" whi
, September 27, 2002 2:24 PM
To: Lucene Users List
Subject: Re: Problems with exact matces on non-tokenized fields...
Alex Murzaku wrote:
> I was trying this as well but now I get something I can't understand:
> My query (Query: +element:POST +nr:3) is supposed to match only one
> re
Alex Murzaku wrote:
> I was trying this as well but now I get something I can't understand:
> My query (Query: +element:POST +nr:3) is supposed to match only one
> record. Indeed Lucene returns that record with the highest score but it
> also returns others that shouldn't be there at all even if it
ord
0.63916886 Keyword Keyword
0.6044586 Keyword Keyword
0.5773442 Keyword Keyword
0.56318253 Keyword Keyword
0.54449975 Keyword Keyword
0.5247468 Keyword Keyword
0.45054603 Keyword Keyword
-Original Message-
From: Doug Cutting [mailto:[EMAIL PROTECTED]]
Sent: Thursday, September 26,
karl øie wrote:
> I have a Lucene Document with a field named "element" which is stored
> and indexed but not tokenized. The value of the field is "POST"
> (uppercase). But the only way i can match the field is by entering
> "element:POST?" or "element:POST*" in the QueryParser class.
There ar
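A sketch of the approach this thread converges on: since QueryParser runs the term through the Analyzer (lowercasing "POST" to "post"), a non-tokenized field is better matched with a TermQuery built directly (illustrative code, not from the original reply):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class ExactMatch {
    // Bypass QueryParser and its Analyzer: match the stored term exactly
    // as it was indexed in the untokenized field.
    public static Query exact(String field, String value) {
        return new TermQuery(new Term(field, value));
    }
}
```

For example, `ExactMatch.exact("element", "POST")` matches the untokenized field value "POST" without any analysis step.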
I have also observed this behavior.
- Original Message -
From: "karl øie" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Thursday, September 26, 2002 4:50 AM
Subject: Problems with exact matces on non-tokenized fields...
Hi, i have a problem with getting a exact m
Message-
From: karl øie [mailto:[EMAIL PROTECTED]]
Sent: Thursday, September 26, 2002 8:22 AM
To: Lucene Users List
Subject: Re: Problems with exact matces on non-tokenized fields...
Hm.. a misunderstanding: i don't create the field with the value
"POST?" i create it with "POS
Hm.. a misunderstanding: i don't create the field with the value
"POST?" i create it with "POST". "element:POST?" or "element:POST*" are
the strings i send to the QueryParser for searching.
mvh Karl Øie
On torsdag, sep 26, 2002, at 14:13 Europe/Oslo, Alex Murzaku wrote:
> But indeed "POST" do
But indeed "POST" does not match to "POST?". If you are not tokenizing
the field, the character "?" remains there together with everything
else.
-Original Message-
From: karl øie [mailto:[EMAIL PROTECTED]]
Sent: Thursday, September 26, 2002 7:50 AM
To: Lucene Users List
Subject: Problems
Angre,
It sounds like there is more to this than compiling Lucene :)
Look inside those jars to see if that is compiled code (.class files).
If you don't know how to do that you need to learn how to use jar tool.
If you have .class files in there, you don't need to compile anything,
just put that
Hi Otis,
Thanks, I have followed your suggestion about
installing Ant, but I tried to test Ant by running
build.xml, in vain. Instead, it said the file
couldn't be found, although I put the file in the bin
folder. Can you help me build Ant successfully?
thanks..
Hmm.. I have jar files of Luc
I don't see any mention of using Ant, in your email, so I assume you're
not using Ant to compile Lucene. The first place to look then is
http://jakarta.apache.org/ant/.
Read about it, download and install it, and then go to the directory
where you unpacked Lucene and type: ant jar
That will create
If you're parsing HTML files, have a look in Lucene
to see the terms that are indexed, and see if you can
spot any joined terms.
The PDF parser as you can see from the other mail is from
www.pdfbox.org and i highly recommend it (thanks again Ben!)
On Wed, 14 Aug 2002, Maurits van Wijland wrote:
>
Maurits,
You can get a PDF parser from http://www.pdfbox.org
-Ben
On Wed, 14 Aug 2002, Maurits van Wijland wrote:
> Keith,
>
> I haven't noticed the problem with the Parser...but you trigger me
> by saying that you have a PDFParser!!!
>
> Are you able to contribute this PDFParser??
>
> Maurit
Keith,
I haven't noticed the problem with the Parser...but you trigger me
by saying that you have a PDFParser!!!
Are you able to contribute this PDFParser??
Maurits.
- Original Message -
From: "Keith Gunn" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Wednesday,
thank you, that works! :-) and saves my day!
mvh karl øie
-Original Message-
From: Terry Steichen [mailto:[EMAIL PROTECTED]]
Sent: 10. august 2002 18:29
To: Lucene Users List; [EMAIL PROTECTED]
Subject: Re: Problems understanding RangeQuery...
Hi Karl,
I have discovered that with
Hi Karl,
I have discovered that with range queries you *must* ensure there is a space
on either side of the dash.
That is, [1971 - 1979] rather than [1971-1979]. If you don't, Lucene will
interpret it as [1979 - null].
To illustrate a bit more, here are some result totals that I get on my
inde
Kent,
I can't tell from this code and without exception stack traces with
line numbers...
Looks fine to me.
I think I added some methods to IndexWriter that allow you to check if
the index is locked, maybe that can help you write more robust code...
Nice service there. I created a service much l
> From: Doug Cutting [mailto:[EMAIL PROTECTED]]
> It is most efficient to batch deletions and insertions, i.e., perform a
> bunch of deletions on a single IndexReader, close it, then perform a bunch
> of insertions on a single IndexWriter. Usually the IndexReader that you do
> the deletions on
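The batching pattern Doug describes can be sketched like this (my own illustration against the Lucene 1.x API; the field name "id" and the method names are assumptions):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

public class BatchUpdate {
    // Batch all deletions on one IndexReader, close it to release the
    // write lock, then batch all insertions on one IndexWriter.
    public static void update(String indexDir, String[] staleIds,
                              Document[] fresh) throws Exception {
        IndexReader reader = IndexReader.open(indexDir);
        for (int i = 0; i < staleIds.length; i++) {
            reader.delete(new Term("id", staleIds[i]));
        }
        reader.close();                       // releases the write lock

        IndexWriter writer = new IndexWriter(indexDir, new StandardAnalyzer(), false);
        for (int i = 0; i < fresh.length; i++) {
            writer.addDocument(fresh[i]);
        }
        writer.close();
    }
}
```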
> From: Daniel Calvo [mailto:[EMAIL PROTECTED]]
>
> Problem solved, thanks!
Great!
> BTW, is the way I'm doing the deletion the correct one? I
> reckon I can't use a cached reader, since I have to close it after the
> deletion to release the write lock. Does it make sense?
Yes. Looks good to
or the fix and, most important, for Lucene)
--Daniel
> -Original Message-
> From: Doug Cutting [mailto:[EMAIL PROTECTED]]
> Sent: domingo, 10 de fevereiro de 2002 19:55
> To: 'Lucene Users List'
> Subject: RE: problems with last patch (obtain write.lock while deleting
&g
> From: Daniel Calvo [mailto:[EMAIL PROTECTED]]
>
> I've just updated my version (via CVS) and now I'm having
> problems with document deletion. I'm trying to delete a document using
> IndexReader's delete(Term) method and I'm getting an IOException:
>
> java.io.IOException: Index locked for wr
Hi,
I forgot to mention that during this deletion there's no index writer opened and no
write lock in the index. The lock that's causing
the problem is created by the reader when invoking delete(docNum).
--Daniel
> -Original Message-
> From: Daniel Calvo [mailto:[EMAIL PROTECTED]]
> Se
left side
of a BooleanQuery subtract. Sure, it works, but it ain't pretty...
Scott
> -Original Message-
> From: Doug Cutting [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, November 01, 2001 10:49 AM
> To: 'Lucene Users List'
> Subject: RE: Problems with prohibi
> From: Scott Ganyo [mailto:[EMAIL PROTECTED]]
>
> How difficult would it be to get BooleanQuery to do a
> standalone NOT, do you
> suppose? That would be very useful in my case.
It would not be that difficult, but it would make queries slow. All terms
not containing a term would need to be e
How difficult would it be to get BooleanQuery to do a standalone NOT, do you
suppose? That would be very useful in my case.
Scott
> -Original Message-
> From: Doug Cutting [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, October 31, 2001 2:36 PM
> To: 'Lucene Users Li
Lucene does not implement a standalone "NOT" query. (Probably BooleanQuery
should throw an exception if all clauses are prohibited clauses.) Negation
is only implemented with respect to other non-negated clauses.
So you cannot directly model your query tree as a Lucene query tree. NOT
nodes mu