Hi,
We are having an issue while indexing Chinese documents in Lucene.
Some background first:
Since CJK languages don't put spaces between words, we first have to
determine the words in each sentence. For example,
a sentence containing the characters ABC may be segmented into AB, C or A, BC.
the proble
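The segmentation ambiguity described above can be illustrated with a small sketch. It is not taken from the original message and does not use any Lucene analyzer; the class name SegmentationDemo, the method segmentations, and the four-word dictionary are all hypothetical, chosen only to reproduce the ABC example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration: enumerate every dictionary-based segmentation of a string.
final class SegmentationDemo {

    // Returns every way to split `text` into consecutive words found in `dictionary`.
    static List<List<String>> segmentations(String text, Set<String> dictionary) {
        List<List<String>> results = new ArrayList<>();
        collect(text, 0, dictionary, new ArrayList<>(), results);
        return results;
    }

    // Depth-first search: try each dictionary word that starts at `start`,
    // then recurse on the remainder of the text.
    private static void collect(String text, int start, Set<String> dictionary,
                                List<String> current, List<List<String>> results) {
        if (start == text.length()) {
            results.add(new ArrayList<>(current)); // consumed the whole text
            return;
        }
        for (int end = start + 1; end <= text.length(); end++) {
            String candidate = text.substring(start, end);
            if (dictionary.contains(candidate)) {
                current.add(candidate);
                collect(text, end, dictionary, current, results);
                current.remove(current.size() - 1); // backtrack
            }
        }
    }

    public static void main(String[] args) {
        // Assumed dictionary: A, AB, BC and C are all valid words.
        Set<String> dictionary = new HashSet<>(Arrays.asList("A", "AB", "BC", "C"));
        System.out.println(segmentations("ABC", dictionary)); // prints [[A, BC], [AB, C]]
    }
}
```

With this dictionary both AB, C and A, BC are valid, so a dictionary alone cannot decide which segmentation to index.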
The link you suggested is very interesting, Mr. Grant Ingersoll.
Let me see if I understand how the ranking issue could be implemented in Lucene:
1. First I must create my own query class extending the abstract Query
class. The only method I must implement from this class is toString.
Is this right?
Hi Tobias,
I had a similar problem with Lucene custom sorting about two years ago.
Please take a look at these two email threads:
http://www.gossamer-threads.com/lists/lucene/java-dev/39100
http://www.gossamer-threads.com/lists/lucene/java-user/38016
It seems like they did not make any patch f
Hi,
We have implemented a custom sort following the pattern in Lucene in Action.
Unfortunately this has led to quite serious memory problems. When analyzing
those (with a profiler) it seems that there are as many remaining instances of
our SortComparatorSource as there have been queries against th
What about http://www.openxml4j.org/? Any experience there?
On Nov 8, 2007, at 10:04 AM, jm wrote:
I haven't upgraded yet, but I think I read on the Aperture list that
they already had some extractors for some Office 2007 stuff in trunk
some time ago.
On Nov 8, 2007 3:09 PM, Grant Ingersoll <[EMAIL
I haven't upgraded yet, but I think I read on the Aperture list that
they already had some extractors for some Office 2007 stuff in trunk
some time ago.
On Nov 8, 2007 3:09 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
> You might also consider asking on the Tika (a Lucene subproject
> currently in Incu
I'm using Lucene core 2.2.0
The field I'm trying to sort on is not present in every document in the
index. I add the field only if I have metadata for it.
Maybe that's the problem?
My field is untokenized, stored, and indexed.
-Original Message-
From: Mark Miller [mailto:[EMAIL PROTECTED]
S
You might also consider asking on the Tika (a Lucene subproject
currently in Incubation) and Aperture project sites (http://aperture.sourceforge.net).
Not sure if you will have any luck, but they are also focused on
the extraction problem and may have thought more about it.
-Grant
On Nov
Hello,
I know this has gone around a bit, but has anyone had any success with
pulling text from Office 2007 files? Any recommendations?
Thanks,
Michael
Any other info or code snippets? I sort on multisearchers all the time
and have never seen that behavior.
- Mark (sorting on multisearchers since Lucene 1.4 )
WATHELET Thomas wrote:
Hi,
I have a few indexes with the same structure.
I'm using MultiSearcher to search those indexes and when I
Hi,
I have a few indexes with the same structure.
I'm using MultiSearcher to search those indexes, and when I try to
sort the results by a field, they come back sorted by field and by index (we
get all results from index1 and then index2, ...), but I would like to
have the result sorted on the all result
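To make the desired behaviour concrete, here is a minimal sketch of merging per-index result lists, each already sorted on the field, into one globally sorted list. It is plain Java and does not use or describe MultiSearcher's actual implementation; the class GlobalSortDemo, the Hit type, and the sample values are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration: k-way merge of per-index sorted hit lists.
final class GlobalSortDemo {

    // Stand-in for a search hit: which index it came from and its sort-field value.
    static final class Hit {
        final String index;
        final String sortValue;

        Hit(String index, String sortValue) {
            this.index = index;
            this.sortValue = sortValue;
        }

        @Override
        public String toString() {
            return sortValue + "@" + index;
        }
    }

    // Each inner list must already be sorted by sortValue.
    // A cursor is {listNumber, positionInList}; the queue orders cursors
    // by the sort value of the hit they currently point at.
    static List<Hit> mergeSorted(List<List<Hit>> perIndexHits) {
        Comparator<int[]> bySortValue = Comparator.comparing(
                (int[] c) -> perIndexHits.get(c[0]).get(c[1]).sortValue);
        PriorityQueue<int[]> queue = new PriorityQueue<>(bySortValue);
        for (int i = 0; i < perIndexHits.size(); i++) {
            if (!perIndexHits.get(i).isEmpty()) {
                queue.add(new int[] {i, 0});
            }
        }
        List<Hit> merged = new ArrayList<>();
        while (!queue.isEmpty()) {
            int[] cursor = queue.poll();
            List<Hit> source = perIndexHits.get(cursor[0]);
            merged.add(source.get(cursor[1]));
            if (cursor[1] + 1 < source.size()) {
                queue.add(new int[] {cursor[0], cursor[1] + 1}); // advance within the same list
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<List<Hit>> perIndex = Arrays.asList(
                Arrays.asList(new Hit("index1", "apple"), new Hit("index1", "cherry")),
                Arrays.asList(new Hit("index2", "banana"), new Hit("index2", "date")));
        // prints [apple@index1, banana@index2, cherry@index1, date@index2]
        System.out.println(mergeSorted(perIndex));
    }
}
```

The output interleaves hits from both indexes by field value, rather than listing all of index1 before index2.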
OK, I've opened https://issues.apache.org/jira/browse/LUCENE-1048 for this issue.
Thanks Nikolay!
Mike
"Nikolay Diakov" <[EMAIL PROTECTED]> wrote:
> In Lucene 2.x, in method Lock#obtain(long lockWaitTimeout) I see the
> following line:
>
> int maxSleepCount = (int)(lockWaitTimeout / LOCK_POLL_
On Nov 8, 2007 12:44 AM, Chris Lu <[EMAIL PROTECTED]> wrote:
> In the Term object, there are variables "field" and "text".
> My question is, why can't the variable "text" be intern()ed?
>
> Wouldn't it save some memory, especially in the FieldCache?
The FieldCache already stores only one string per term,
I suppose you have about 5 minutes to display a single search? :-)
Perhaps before pointing out your possible solutions, you would do better to
start by describing your functional requirements, because your suggested
solution is headed for problems. So you need custom ordering; check out
lucene scoring