Hello,
I used setMaxFieldLength() and it works now. Thanks, all.
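For anyone hitting the same truncation, a minimal sketch of the fix, assuming a Lucene 2.x-era API (the index path and analyzer choice here are placeholders):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

public class MaxFieldLengthExample {
    public static void main(String[] args) throws Exception {
        IndexWriter writer = new IndexWriter("/tmp/index", new StandardAnalyzer(), true);
        // By default only the first 10,000 terms of a field are indexed;
        // anything beyond that is silently dropped, so the tail of a long
        // PDF body appears unsearchable.
        writer.setMaxFieldLength(Integer.MAX_VALUE);
        // ... add documents here ...
        writer.close();
    }
}
```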
Doron Cohen wrote:
Stefan Colella wrote:
I tried adding only the content of the page where that expression can be
found (instead of the whole document), and then the search works.
Do I have to split my PDF text into more fields
Hi Walter,
let me explain my problem in detail
I have a web page that lets the user build his own query simply.
For example, a user wants to locate a service with a specific value, but he/she
doesn't know the exact name of the service, so I have to provide a list of
available services (say, in a combo box) and
Hi Steve,
No, I didn't make any changes to WhitespaceAnalyzer. I just extended my classes
from the original classes and then overrode them with my changes, so I don't think
I need to contribute my classes.
My language is Persian, and the only change I've made is to stop ignoring
Unicode characters in Persian
The indexing part of Hibernate Search relies on the Java Persistence
API to trigger the index update transparently. Otherwise you can
trigger it manually to follow the crawling approach (not transparent).
Event-driven and crawling-driven index updates both have their use cases; I
would not say that on
You're right, I am suggesting that you use the Lucene
caching and see if it is adequate.
Mind you, I have no clue whether your application will be well
served by this or not; I've just seen too many examples of folks
(including me) jumping into a solution to a problem that doesn't
exist to be ab
Hoss,
My Lucene scaling strategy involves creating
numerous indexes, so I was looking for a way
to read them in together for quickness.
For those interested, your suggestion of using a
single IndexSearcher on a MultiReader works well
by itself.
Or, you can still place in memory like this:
Inde
Erick,
Thanks for the reply, this is a web application.
If you want to serve image files in a scalable fashion
on the Internet you make Apache serve them from
memory, not the filesystem.
For databases, some sites use a distributed object
memory caching system such as memcached.
I was hoping th
Donna, this is what you need to do to get the jar, and after that you just use
MLT according to its API.
$ cd lucene-trunk
otis:~/dev/workspace/lucene-trunk otis$ cd contrib/queries/
otis:~/dev/workspace/lucene-trunk/contrib/queries otis$ ff MoreLikeThis.java
./src/java/org/apache/lucene/search/s
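Once the contrib jar is built, usage is roughly this (a sketch against the Lucene contrib/queries API of the time; the index path, field name, and doc number are placeholders):

```java
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.similar.MoreLikeThis;

public class MltExample {
    public static void main(String[] args) throws Exception {
        IndexReader reader = IndexReader.open("/path/to/index");
        MoreLikeThis mlt = new MoreLikeThis(reader);
        mlt.setFieldNames(new String[] { "contents" }); // fields to mine for interesting terms
        Query query = mlt.like(42);                     // internal doc number of the source document
        Hits hits = new IndexSearcher(reader).search(query);
        System.out.println(hits.length() + " similar documents");
    }
}
```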
Hello,
I'm sorry if this is a naive question, but I have implemented my own
"MoreLikeThis" functionality, and
in re-reading the FAQ saw that it looks like something like this is
already built, so I wanted to try it out and see
if it would simplify my code:
How do I find similar documents?
See
This is actually more for java-dev, but anyway.
On Tuesday 22 May 2007 11:04, Mark Miller wrote:
> Sorry, didn't mean to imply that that whole spiel was a technical
> explanation...just a "how I like to think of it" to get my head around
> the BooleanQuery system. If you're reading that, think hig
Hi Mohammad,
May I ask what your language is? And what kind of changes to
WhitespaceAnalyzer were required to make it work with your language?
If you have made modifications to WhitespaceAnalyzer that are generally
useful, please consider contributing your changes back to the Lucene
project. Th
You have to turn on term vectors when indexing. Take a look at the
Field constructor that passes in TermVector.
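A sketch of what that looks like at indexing time (assuming a Lucene 2.x-era Field constructor; the field name and value are made up):

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public class TermVectorFieldExample {
    static Document buildDoc() {
        Document doc = new Document();
        // Field.TermVector.YES stores the term vector for this field,
        // which is what getTermFreqVector reads back later.
        doc.add(new Field("field3", "text2 text3 text4",
                          Field.Store.YES, Field.Index.TOKENIZED,
                          Field.TermVector.YES));
        return doc;
    }
}
```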
-Grant
On May 22, 2007, at 8:09 AM, Mohammad Norouzi wrote:
I would use a term vector to get this. See
IndexReader.getTermFreqVector. You can get the term vector for just
field
I would use a term vector to get this. See
IndexReader.getTermFreqVector. You can get the term vector for just
field 3.
Grant, thanks. In my case getTermFreqVector returns null; I don't know why.
It accepts a docnumber as a parameter; what is it? Is that the same doc id?
If yes, it restricts the r
I would use a term vector to get this. See
IndexReader.getTermFreqVector. You can get the term vector for just
field 3.
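Reading the vector back looks roughly like this (a sketch; the docNumber argument is Lucene's internal 0-based document number, not an application-level id, and the call returns null when the field was indexed without term vectors, which would explain the null above):

```java
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.TermFreqVector;

public class TermVectorReadExample {
    public static void main(String[] args) throws Exception {
        IndexReader reader = IndexReader.open("/path/to/index"); // placeholder path
        // 0 is the internal Lucene doc number, not an application id
        TermFreqVector tfv = reader.getTermFreqVector(0, "field3");
        if (tfv != null) { // null means "field3" was indexed without term vectors
            String[] terms = tfv.getTerms();
            int[] freqs = tfv.getTermFrequencies();
            for (int i = 0; i < terms.length; i++) {
                System.out.println(terms[i] + " x" + freqs[i]);
            }
        }
        reader.close();
    }
}
```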
-Grant
On May 22, 2007, at 5:29 AM, Mohammad Norouzi wrote:
Hi all
consider following index
field1 field2 field3
text1
Let's suppose you modify your WhitespaceAnalyzer not to use a
WhitespaceTokenizer, but a modified version of the Tokenizer which
tokenizes not by space but by something else, like '/' (this is just an
example, of course).
So suppose your real text document contains:
/text2 text3/text4 text5/text6
Wh
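A sketch of such a tokenizer against the CharTokenizer base class of that era (SlashTokenizer and SlashAnalyzer are made-up names for the example):

```java
import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.CharTokenizer;
import org.apache.lucene.analysis.TokenStream;

// Splits tokens on '/' instead of whitespace
class SlashTokenizer extends CharTokenizer {
    public SlashTokenizer(Reader in) {
        super(in);
    }

    protected boolean isTokenChar(char c) {
        return c != '/'; // every character except '/' belongs to a token
    }
}

// Analyzer wrapper so the tokenizer can be used for both indexing and querying
public class SlashAnalyzer extends Analyzer {
    public TokenStream tokenStream(String fieldName, Reader reader) {
        return new SlashTokenizer(reader);
    }
}
```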
Walter,
Yes, I am using a customized WhitespaceAnalyzer while indexing.
I say customized because I realized that the standard WhitespaceAnalyzer doesn't
accept Unicode terms in my language, so I made some changes to support that.
But for reading, no Analyzer is used.
If I want to get that result, which ana
If Reader.terms() gives you:
text3
text4
while you expect
text3 text4
you should, I presume, change the Analyzer, maybe by writing your own.
Mohammad Norouzi wrote:
> Hi all
>
> consider following index
>
> field1 field2 field3
> text1 text1 text2
Hi,
I need to execute a query on a subset of documents (I know their ids),
and it has to be very fast. I've made a Filter that sets the bits only
for the needed docids. The point is, the subset is very small compared to
the index, which is very big (the subset size is always below 0.05% of the
total number of
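A sketch of that kind of Filter against the Lucene 2.x Filter API (the class name is made up, and the ids are assumed to be internal document numbers):

```java
import java.io.IOException;
import java.util.BitSet;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Filter;

// Restricts a search to a known set of internal document numbers
public class DocIdSubsetFilter extends Filter {
    private final int[] docIds;

    public DocIdSubsetFilter(int[] docIds) {
        this.docIds = docIds;
    }

    public BitSet bits(IndexReader reader) throws IOException {
        BitSet bits = new BitSet(reader.maxDoc()); // all bits start cleared
        for (int i = 0; i < docIds.length; i++) {
            bits.set(docIds[i]); // allow only the wanted documents
        }
        return bits;
    }
}
```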
Hi all
consider the following index
field1 field2 field3
text1 text1 text2 text3 text4
text4 text2 text2 text3 text5
I want to get all terms in field3.
If I use Reader.terms() it will return
Sorry, didn't mean to imply that that whole spiel was a technical
explanation...just a "how I like to think of it" to get my head around
the BooleanQuery system. If you're reading that, think high level overview
more than technically accurate. I'll be more specific in the future --
as always, the
: I'd *strongly* recommend, if you haven't, just using the regular
: FSDirectories rather than RAMDirectories and only getting
: complex if that's too slow...
...and if you are "Multi Searching" over a bunch of local directories
anyway, then use a single IndexSearcher on a MultiReader instead ...
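In code, that suggestion is roughly the following (a sketch; the index paths are placeholders):

```java
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiReader;
import org.apache.lucene.search.IndexSearcher;

public class MultiReaderExample {
    public static void main(String[] args) throws Exception {
        IndexReader[] readers = new IndexReader[] {
            IndexReader.open("/indexes/part1"), // placeholder paths
            IndexReader.open("/indexes/part2"),
        };
        // One searcher over all sub-indexes; no MultiSearcher needed
        IndexSearcher searcher = new IndexSearcher(new MultiReader(readers));
    }
}
```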
: BooleanQuery.Occur.SHOULD for C, D and E. However the javadocs for
: BooleanClause.Occur.SHOULD states:
:
: "Use this operator for clauses that /should/ appear in the matching
: documents. For a BooleanQuery with two |SHOULD| subqueries, at least one
: of the clauses must appear in the matching
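A sketch of the clause combination being discussed (field and term values are made up):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;

public class ShouldClauseExample {
    static BooleanQuery build() {
        BooleanQuery query = new BooleanQuery();
        query.add(new TermQuery(new Term("f", "A")), BooleanClause.Occur.MUST);
        query.add(new TermQuery(new Term("f", "C")), BooleanClause.Occur.SHOULD);
        query.add(new TermQuery(new Term("f", "D")), BooleanClause.Occur.SHOULD);
        // With no MUST clauses, at least one SHOULD clause must match;
        // when MUST clauses are present, SHOULD clauses only raise the score.
        return query;
    }
}
```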
: Each doc is going to get a score -- if the score is positive the doc
: will be a hit, if the score is 0 the doc will not be a hit.
that's actually a fairly misleading statement ... the guts of Lucene
don't prevent documents from "matching" with a negative score
(specifically: a HitCollector ca