Ok, well at first I thought you must be playing a joke on me or something...
Maybe you want to create a Lucene analyzer that mimics Solr's defaults.
Search the mail archives for this recent thread, in which KK posted his code:
Re: How to support stemming and case folding for english content mixed with
That's the thing: there is no actual requirement.
I've been presented with all the languages that the company theoretically provides.
My guess is that what I'm going to end up with is all Western languages, a good
share of the Arabic family, a complete set of Eastern and Eastern European ones, and
of course CJK
Really, you have a requirement that the system should search written Cornish?
I think you might have larger problems!
On Mon, Jun 15, 2009 at 9:18 PM, OBender Hotmail wrote:
> Here is the list of possible languages. Don't laugh :) I know those are
> almost all world languages but it is a true re
Here is the list of possible languages. Don't laugh :) I know those are almost
all world languages, but it is a true requirement. Well, the actual number will be
closer to 70, not 100, but still I don't really know which ones from the list
below will end up in the DB.
---
Afrikaans Albanian Arabic
it's not too bad; here is a simple one that only breaks words on
whitespace and lowercases:

import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;

public class Example extends Analyzer {
    public TokenStream tokenStream(String fieldName, Reader reader) {
        TokenStream ts = new WhitespaceTokenizer(reader);
        ts = new LowerCaseFilter(ts);
        return ts;
    }
}
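For intuition, the same whitespace-split-and-lowercase behavior can be sketched in plain Java without any Lucene classes (the class and method names here are made up for illustration, not Lucene API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Plain-Java sketch of what the analyzer above does:
// split on runs of whitespace, then lowercase each token.
public class TokenizeSketch {
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // "The Quick  Brown FOX" -> [the, quick, brown, fox]
        System.out.println(tokenize("The Quick  Brown FOX"));
    }
}
```

A real analyzer additionally tracks token offsets and positions, which is why Lucene wraps this in TokenStream rather than returning a plain list.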
I've looked over Solr quickly; it is a bit too heavy for my project.
So, what is required (at a minimum) to build an analyzer? The sandbox has a few of
them, varying in complexity.
-Original Message-
From: Robert Muir [mailto:rcm...@gmail.com]
Sent: Monday, June 15, 2009 4:51 PM
To: java-user@
Well, just reply back if Solr is inappropriate for your needs.
In that case, you will need to build a custom analyzer (it's not too
bad), so that you can use Compass.
On Mon, Jun 15, 2009 at 4:19 PM, OBender Hotmail wrote:
> Hi,
>
> My goal is to find a framework that encapsulates as much low level
Hi,
My goal is to find a framework that encapsulates as much low level
indexing/search technology as possible and have it integrate nicely with Spring.
It looked like Compass was/is a good encapsulation of the functionality. I'll
take a look at Solr though, thanks for the pointer.
I have the following method to highlight the terms:

public static String getHighlighter(String colName, Highlighter highlighter,
        IndexSearcher searcher, int id, Analyzer analyzer) throws IOException {
    String text = searcher.doc(id).get(colName);
    TokenStream tokenStream = analyzer.tokenStream(colName, new StringReader(text));
    String highlightTerm = highlighter.getBestFragment(tokenStream, text);
    return highlightTerm;
}
Hi,
(Since this is an issue you brought up on the Compass forums)
I wonder what stage of the development process you are in.
Have you considered Solr, or does Compass provide some other
functionality that you need?
The reason I say this is that the easiest solution might be to use
a nightly Solr
Hi All!
I'm new to Lucene, so forgive me if this question has been asked before.
I have a database with records in the same table in many different languages
(up to 70); it includes all W-European, Arabic, Eastern, CJK, Cyrillic, etc.,
you name it.
I've looked at what people say about Lucene and it l
FuzzyQuery performance is related to the number of unique terms in the index, not
the number of documents; e.g. a single "telephone directory" document could
contain millions of terms.
Each term considered is compared using an "edit distance" algorithm, which is CPU
intensive.
The FuzzyQuery prefix length
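For a sense of why that per-term comparison is costly, here is a minimal sketch of the classic dynamic-programming edit distance (Levenshtein); the class and method names are illustrative, not Lucene's actual implementation:

```java
// Classic O(m*n) Levenshtein edit distance: the kind of comparison a
// fuzzy query must run against every candidate term it enumerates.
public class EditDistanceSketch {
    static int editDistance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        // Distance from the empty string is just the number of insertions.
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(
                        Math.min(d[i - 1][j] + 1,       // deletion
                                 d[i][j - 1] + 1),      // insertion
                        d[i - 1][j - 1] + cost);        // substitution
            }
        }
        return d[a.length()][b.length()];
    }

    public static void main(String[] args) {
        // "telephone" vs "telefone": substitute p->f, delete h -> distance 2
        System.out.println(editDistance("telephone", "telefone"));
    }
}
```

Running this quadratic comparison against every unique term in a large index is what makes the query slow; a non-zero prefix length cuts down the set of candidate terms that ever reach this comparison.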
Erick,
this is a web application running 24 hours a day, thus caching cannot be the
reason. I get the same result after I re-start the same search.
Zsolt
Erick Erickson wrote:
Well, if you're seeing it, it's possible
But the first question is always "what were you measuring?" Be aware
that
Well, if you're seeing it, it's possible.
But the first question is always "what were you measuring?" Be aware
that when you open a searcher, the first few queries can fill caches, etc.,
and may take an anomalously long time, especially if you're sorting. So could
you give more details of your t
Hi,
on 99470 documents (I mean Lucene documents) a FuzzyQuery needs approx.
30 seconds but a PrefixQuery less than one second.
All Lucene files together need 65MB.
I'm a bit surprised by that. Is that possible?
Zsolt
Zsolt Koppany
Phone: +49-711-67400-679
--
Thanks Joel, good point.
We'll definitely be there by 7pm but may be a little earlier if the
will to continue working is elusive.
2009/6/14 Joel Halbert :
> Hi Rich - from what time?
>
>
> -Original Message-
> From: Richard Marr
> Reply-To: java-user@lucene.apache.org
> To: java-user@l
On Mon, Jun 15, 2009 at 1:04 AM, Amin Mohammed-Coleman wrote:
> Hi
>
> I'm looking at Hadoop and Katta and I was wondering if someone may be able to
> clarify the following:
>
> 1) Is Katta replacing the Hadoop Lucene contribution?
You mean the index package in Hadoop's contrib folder?
So far what I ha
Hi Rich - from what time?
-Original Message-
From: Richard Marr
Reply-To: java-user@lucene.apache.org
To: java-user@lucene.apache.org
Subject: Re: London Open Source Search meetup - Mon 15th June
Date: Fri, 12 Jun 2009 12:54:30 +0100
Hi all,
Just a quick reminder that this is happening
Hi
I'm looking at Hadoop and Katta and I was wondering if someone may be able to
clarify the following:
1) Is Katta replacing the Hadoop Lucene contribution?
2) Are people still using Hadoop Lucene to perform indexing?
Cheers
Amin
On Sat, Jun 13, 2009 at 7:46 AM, Amin Mohammed-Coleman wrote:
> Hi
> T