Hi Walter,
Let me explain my problem in detail.
I have a web page that lets a user build his own simple query.
For example, a user wants to locate a service with a specific value. He/she
doesn't know the exact name of the service, so I have to provide a list of
available services (say, in a combo box)
Hello,
I used setMaxFieldLength() and it works now. Thanks, all.
Doron Cohen wrote:
Stefan Colella wrote:
I tried to only add the content of the page where that expression can be
found (instead of the whole document) and then the search works.
Do I have to split my PDF text into more
Is it possible to index CAD formats such as AutoCad or CGM? I know some
commercial products (Excalibur) claim to be able to do that. If so, what about
TIFF?
thanks
jim s
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For
You may have to index things twice, once for searching and once
UN_TOKENIZED for display. Say you have a bunch of service names
you want to display
service one
service two
service three
If you use WhitespaceAnalyzer, TOKENIZED you index the tokens
service (note, there are three of these)
one
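The difference between the two ways of indexing can be sketched in plain Java. This is only an illustration and uses no Lucene classes: the class and method names are hypothetical, and splitting on whitespace stands in for what WhitespaceAnalyzer does with a TOKENIZED field, while keeping each value whole stands in for UN_TOKENIZED.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Plain-Java illustration only; no Lucene classes are used here.
final class TokenizedVsUntokenized {

    // Roughly what whitespace tokenization yields: one term per run of non-whitespace.
    static List<String> whitespaceTerms(String value) {
        String trimmed = value.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(Arrays.asList(trimmed.split("\\s+")));
    }

    // Distinct terms across all values when each value is tokenized.
    static Set<String> tokenizedTerms(List<String> values) {
        Set<String> terms = new LinkedHashSet<>();
        for (String value : values) {
            terms.addAll(whitespaceTerms(value));
        }
        return terms;
    }

    // Distinct terms when each value is kept whole (untokenized).
    static Set<String> untokenizedTerms(List<String> values) {
        return new LinkedHashSet<>(values);
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("service one", "service two", "service three");
        System.out.println(tokenizedTerms(names));   // [service, one, two, three]
        System.out.println(untokenizedTerms(names)); // [service one, service two, service three]
    }
}
```

The untokenized set is the one suited to filling a display list such as a combo box, since each entry is a complete service name rather than a single word.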
No, you can only index text. It's the same thing as indexing HTML
documents or XML documents. You index important stuff from
the guts of the doc.
So, if you can get some text out of these docs, you can index *that*,
possibly along with page information that allows you to, say, display
the
Wow, very nice comments
Thank you so much Erick. You really showed me the way
--
Regards,
Mohammad
--
see my blog: http://brainable.blogspot.com/
Thank you--
Donna L. Gresh
Services Research, Mathematical Sciences Department
IBM T.J. Watson Research Center
(914) 945-2472
http://www.research.ibm.com/people/g/donnagresh
[EMAIL PROTECTED]
Otis Gospodnetic [EMAIL PROTECTED]
05/22/2007 05:33 PM
Please respond to
java-user@lucene.apache.org
Thank you for the reply. I knew the answer but was compelled to ask anyway.
CAD files like AutoCad/ProE/CaTia do contain some useful text, and it is
possible to get at that and
index it. But mostly it's vectors, and there is not much a text engine can do
with vectors.
thanks again.
jim s
Hi,
We have been using lucene for years and it serves us well.
Sometimes when we issue a query, we only want to know
how many hits it yields and do not want any docs back. Is it possible
to completely avoid score calculation and just get the total count back?
I understand score calculation needs a loop for all
On 5/23/07, Zhang, Lisheng [EMAIL PROTECTED] wrote:
We have been using lucene for years and it serves us well.
Sometimes when we issue a query, we only want to know
how many hits it yields and do not want any docs back. Is it possible
to completely avoid score calculation and just get the total count back?
I
Hi Mohammad,
WhitespaceAnalyzer uses Java's Character.isWhitespace(char) method to
determine whether or not a character should be part of a token. As far
as I know, this method is problematic only for characters outside of the
Basic Multilingual Plane (BMP). I think Lucene should switch to
Hi,
If a search returns a document that has multiple fields with the same
name, is there a way to filter only those fields that contain hits?
Background:
I am indexing documents and we store all content in our index for
display reasons. We want to show only those pages containing hits. My
As luck would have it, I've done something very similar. What I had
to do is index a special token at the end of each page. Then I could
get the term offsets for each page
Then I used one of the SpanQuery.getSpans to get all of the
offsets of the hits throughout all of the pages.
now I have
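Once both lists are available, mapping hits to pages is a lookup of each hit position among the page-end positions. The following is a minimal plain-Java sketch of that step only. It assumes the page-end marker positions and the hit positions have already been extracted from the index; the class and method names are hypothetical and no Lucene API is used.

```java
import java.util.Arrays;
import java.util.SortedSet;
import java.util.TreeSet;

// Plain-Java illustration only; positions are assumed to come from the index.
final class HitPages {

    // pageEnds[i] is the position of the end-of-page marker for page i, ascending.
    // Returns the 0-based page containing hitPosition, or -1 if it lies past the last marker.
    static int pageOf(int[] pageEnds, int hitPosition) {
        int idx = Arrays.binarySearch(pageEnds, hitPosition);
        // When not found, binarySearch returns -(insertionPoint) - 1; the insertion
        // point is the index of the first marker located after the hit.
        int page = idx >= 0 ? idx : -idx - 1;
        return page < pageEnds.length ? page : -1;
    }

    // Collects the distinct pages that contain at least one hit.
    static SortedSet<Integer> pagesWithHits(int[] pageEnds, int[] hitPositions) {
        SortedSet<Integer> pages = new TreeSet<>();
        for (int hit : hitPositions) {
            int page = pageOf(pageEnds, hit);
            if (page >= 0) {
                pages.add(page);
            }
        }
        return pages;
    }

    public static void main(String[] args) {
        int[] pageEnds = {100, 250, 400};
        int[] hits = {5, 120, 130, 399};
        System.out.println(pagesWithHits(pageEnds, hits)); // [0, 1, 2]
    }
}
```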
Hi,
I studied section 5.6, "Searching across multiple Lucene indexes" (p. 178), in
Lucene in Action.
I have 2 remote search computers (SearchServer) that work as index servers
and receive search requests from a search client (SearchClient, the 3rd
computer).
An error message, Exception in thread "main"
Erick,
Thank you very much for your response. That sounds very interesting.
Let me do some experimenting to see if I fully understood your solution.
Otherwise I have to come back to you with more questions.
Andreas
-Original Message-
From: Erick Erickson [mailto:[EMAIL PROTECTED]
Two things to watch...
1. Think about indexing the special page-end token with an
   increment gap of 0 (see SynonymAnalyzer in Lucene in
   Action). That preserves the sense of phrases across
   page breaks.
2. Assembling the span query is tricky. Search the mail archive
   for SpanQuery to see an exchange
I browsed this list and its contributions and have difficulty
determining whether there is anything that can be used
straightforwardly to highlight all hits (no fragmenting) in a large
chunk of text. Probably my query should be sent as 3 separate ones:
1. The fastest possible fragment
Hi folks,
I need to collect some global information from my first 1000 search results
in order to build up some search refining components containing only
relevant values (those which correspond to at least one of the first 1000
hits). For example, the results are products and there is a store
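Collecting the values that occur among the first 1000 hits can be sketched as a simple counting pass. This plain-Java sketch assumes the relevant field value of each hit has already been read into a list in rank order; the class name, method name, and sample values are hypothetical, and no Lucene API is used.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration only; valuePerHit is assumed to hold one field
// value per hit, in rank order.
final class TopHitsValues {

    // Counts how often each value occurs among the first `limit` hits.
    // Values that only occur in later hits are not included.
    static Map<String, Integer> valuesInTopHits(List<String> valuePerHit, int limit) {
        Map<String, Integer> counts = new TreeMap<>();
        int n = Math.min(limit, valuePerHit.size());
        for (int i = 0; i < n; i++) {
            counts.merge(valuePerHit.get(i), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> valuePerHit = Arrays.asList("A", "B", "A", "C");
        System.out.println(valuesInTopHits(valuePerHit, 3)); // {A=2, B=1}
    }
}
```

The keys of the returned map are the values that correspond to at least one of the first `limit` hits, which is what a refining component would list.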
What is the practical use of WITH_POSITIONS_OFFSETS? Isn't WITH_OFFSETS
sufficient, if iterating over getStartOffset effectively gives the values
from the array elements of getTermPositions?
Hi all,
I am working on an application that must deal with ranking on highly dynamic
metadata. For example, suppose I want to provide ranking based on the number
of downloads of hit documents. A user may log in to the system and send a
query, which will be answered by Lucene in a traditional
Hi Steven
Thank you so much for your thorough comments about Analyzer.
I wrote that class a couple of months ago. Now that I take a look at my
customized Analyzer,
the only change I made is as follows:
the original class has this method:
protected boolean isTokenChar(char c) {
return
Sorry Steven,
that change is in WhitespaceTokenizer, not WhitespaceAnalyzer, but in the Analyzer
I had to call the tokenizer
On 5/24/07, Mohammad Norouzi [EMAIL PROTECTED] wrote:
Hi Steven
Thank you so much for your thorough comments about Analyzer.
I wrote that class a couple of months ago. Now that I
22 matches