Hello,
This is my first question to the Lucene mailing list, so sorry if the question
sounds funny.
I have been experimenting with storing Lucene index files on Cassandra, but
unfortunately I am overwhelmed by exceptions. Below is the stack trace:
org.apache.lucene.index.CorruptIndexException: codec mismatch:
This means Lucene was attempting to open _0.fnm but somehow got the
contents of _0.cfs instead; seems likely that it's a bug in the
Cassandra Directory implementation? Somehow it's opening the wrong
file name?
Mike McCandless
http://blog.mikemccandless.com
On Fri, Feb 14, 2014 at 3:13 AM,
Hello,
I am designing a system with documents having one field containing
values such as Ae1 Br2 Cy8 ..., i.e. a sequence of items made of
letters and numbers (max=7 per item), all separated by a space,
possibly 200 items per field, with no limit upon the number of
documents (although I would not
This is how Collector works: it is called for every document matching
the query, and then its job is to choose which of those hits to keep.
This is because in general the hits to keep can come at any time, not
just the first N hits you see; e.g. the best scoring hit may be the
very last one.
But
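The Collector contract Mike describes can be sketched in code. Below is a minimal, untested sketch against the Lucene 4.x Collector API; the class name, the EarlyTerminationException, and the keep-the-first-N policy are my own, shown only to illustrate the hooks Lucene calls:

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.Scorer;

// Sketch (untested): a Collector that keeps the first N matching docIDs,
// ignoring scores, and aborts the search once it has enough.
public class FirstNCollector extends Collector {
    // Unchecked exception used to abort the whole search early.
    public static class EarlyTerminationException extends RuntimeException {}

    private final int numHits;
    private final List<Integer> docIds = new ArrayList<>();
    private int docBase;

    public FirstNCollector(int numHits) { this.numHits = numHits; }

    @Override
    public void setScorer(Scorer scorer) { /* scores not needed here */ }

    @Override
    public void collect(int doc) {
        // Lucene calls this once per matching document (segment-relative id).
        docIds.add(docBase + doc);
        if (docIds.size() >= numHits) {
            throw new EarlyTerminationException();
        }
    }

    @Override
    public void setNextReader(AtomicReaderContext context) {
        this.docBase = context.docBase;  // remap per-segment ids to global ids
    }

    @Override
    public boolean acceptsDocsOutOfOrder() { return true; }

    public List<Integer> getDocIds() { return docIds; }
}
```

A search would then wrap `searcher.search(query, collector)` in a try/catch for `EarlyTerminationException`. Note that Lucene's built-in `CollectionTerminatedException` only ends collection for the current index segment, not the whole search, which is why a custom unchecked exception is a common workaround.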
On Fri, Feb 14, 2014 at 6:17 AM, Yann-Erwan Perio ye.pe...@gmail.com wrote:
Hello,
I am designing a system with documents having one field containing
values such as Ae1 Br2 Cy8 ..., i.e. a sequence of items made of
letters and numbers (max=7 per item), all separated by a space,
possibly 200
On Fri, Feb 14, 2014 at 12:33 PM, Michael McCandless
luc...@mikemccandless.com wrote:
This is similar to PathHierarchyTokenizer, I think.
Ah, yes, very much. I'll check it out and see if I can make something
of it. I am not sure to what extent it'll be reusable though, as my
tokenizer also sets
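To see the resemblance concretely, here is an untested sketch that feeds a space-delimited item sequence through PathHierarchyTokenizer; the (delimiter, replacement, skip) constructor arguments are my assumption about the Lucene 4.x API, and the demo class is mine:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.path.PathHierarchyTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Sketch (untested): with ' ' as the delimiter, "Ae1 Br2 Cy8" should yield
// the growing prefixes "Ae1", "Ae1 Br2", "Ae1 Br2 Cy8".
public class ItemPrefixDemo {
    public static List<String> tokens(String text) throws Exception {
        PathHierarchyTokenizer tok = new PathHierarchyTokenizer(
                new StringReader(text), ' ', ' ', 0); // delimiter, replacement, skip
        CharTermAttribute term = tok.addAttribute(CharTermAttribute.class);
        List<String> out = new ArrayList<>();
        tok.reset();
        while (tok.incrementToken()) {
            out.add(term.toString());
        }
        tok.end();
        tok.close();
        return out;
    }

    public static void main(String[] args) throws Exception {
        for (String t : tokens("Ae1 Br2 Cy8")) {
            System.out.println(t);
        }
    }
}
```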
On Fri, Feb 14, 2014 at 1:11 PM, Yann-Erwan Perio ye.pe...@gmail.com wrote:
On Fri, Feb 14, 2014 at 12:33 PM, Michael McCandless
luc...@mikemccandless.com wrote:
Hi again,
That should not be the case: it should match all terms with that
prefix regardless of the term's length. Try to boil it
On Fri, Feb 14, 2014 at 8:21 AM, Yann-Erwan Perio ye.pe...@gmail.com wrote:
I have written a test which demonstrates that the mistake is indeed on
my side. It's probably due to inconsistent rules for indexing/searching
content containing special characters (namely the plus sign).
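One common way to keep index-time and search-time treatment consistent is to escape query-syntax characters before parsing; the classic QueryParser ships an escape() helper for this. A small sketch, assuming the lucene-queryparser module is on the classpath (the demo class is mine):

```java
// Sketch: the classic QueryParser treats characters such as + - ! ( ) as
// operators, so a raw user string like "C++" parses differently from how
// it was indexed. QueryParser.escape() backslash-escapes each special char.
public class EscapeDemo {
    public static String safe(String userInput) {
        return org.apache.lucene.queryparser.classic.QueryParser.escape(userInput);
    }

    public static void main(String[] args) {
        // The '+' signs come back backslash-escaped and lose operator meaning.
        System.out.println(safe("C++"));
    }
}
```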
OK, thanks for
I am not interested in the scores at all. My requirement is simple: I only
need the first 100 hits, or the numHits I specify (irrespective of their
scores). The collector should stop after collecting the numHits specified.
Is there a way to tell in the collector to stop after collecting the
Hi There,
Is there a way to do reverse matching, by indexing the queries in an index and
passing a document to see how many queries match it? I know that I can keep
the queries in memory, parse the document into a memory index, and then
loop through, trying to match each query. The
I'm having a problem with Lucene 4.5.1. Whenever I attempt to index a file
over 2GB in size, it dies with the following exception:
java.lang.IllegalArgumentException: startOffset must be non-negative, and endOffset must be >= startOffset,
startOffset=-2147483648,endOffset=-2147483647
Essentially,
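Those exact offset values are the signature of 32-bit overflow: Lucene stores token offsets as Java ints, so once a character offset passes Integer.MAX_VALUE (2,147,483,647, roughly 2GB of text in a single field) it silently wraps negative. A tiny self-contained illustration:

```java
public class OffsetWrap {
    public static void main(String[] args) {
        int offset = Integer.MAX_VALUE; // 2147483647
        offset = offset + 1;            // int arithmetic wraps, no error raised
        System.out.println(offset);     // -2147483648, the startOffset above
        System.out.println(offset + 1); // -2147483647, the endOffset above
    }
}
```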
Hi guys, this is my first time posting on the Lucene list, so hello everyone.
I really like the way that the StandardTokenizer works, however I'd like for it
to not split tokens on / (forward slash). I've been looking at
http://unicode.org/reports/tr29/#Default_Word_Boundaries to try to
Welcome Diego,
I think you’re right about MidLetter - adding a char to it should disable
splitting on that char, as long as there is a letter on one side or the other.
(If you’d like that behavior to be extended to numeric digits, you should use
MidNumLet instead.)
I tested this by adding
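For reference, such a change goes into the JFlex grammar from which StandardTokenizerImpl is generated, after which the tokenizer must be regenerated (ant jflex under lucene/analysis/common). A hypothetical fragment only; the actual macro names in StandardTokenizerImpl.jflex may differ:

```
// Hypothetical sketch: extend the MidLetter word-break class with '/',
// so letter-slash-letter sequences like "foo/bar" stay one token.
MidLetter = ([\p{WB:MidLetter}] | "/")
```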
I'm not sure in today's world I would call 2GB 'immense' or 'enormous'. At any rate, I don't have control over the size of the
documents that go into my database. Sometimes my customer's log files end up really big. I'm willing to have huge indexes for these
things.
Wouldn't just changing from
You should consider making each _line_ of the log file a (Lucene)
document (assuming it is a log-per-line log file)
-Glen
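Glen's line-per-document suggestion could be sketched like this (untested, against the Lucene 4.5 API; the field names are placeholders I chose):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.IntField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.util.Version;

// Sketch (untested): index a log file one Lucene document per line,
// keeping the line number so hits can be mapped back to the file.
public class LogLineIndexer {
    // Returns the number of lines indexed.
    public static int indexLines(Reader log, Directory dir) throws IOException {
        IndexWriterConfig cfg = new IndexWriterConfig(
                Version.LUCENE_45, new StandardAnalyzer(Version.LUCENE_45));
        int lineNo = 0;
        try (IndexWriter writer = new IndexWriter(dir, cfg);
             BufferedReader in = new BufferedReader(log)) {
            String line;
            while ((line = in.readLine()) != null) {
                Document doc = new Document();
                doc.add(new IntField("lineNo", ++lineNo, Field.Store.YES));
                doc.add(new TextField("line", line, Field.Store.YES));
                writer.addDocument(doc);
            }
        }
        return lineNo;
    }
}
```

For a real log file one would pass a FileReader and an FSDirectory; searches then return individual matching lines rather than one multi-gigabyte document.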
On Fri, Feb 14, 2014 at 4:12 PM, John Cecere john.cec...@oracle.com wrote:
I'm not sure in today's world I would call 2GB 'immense' or 'enormous'. At
any rate, I don't have
As docIDs are ints too, it's most likely he'll hit the limit of 2B documents
per index with that approach, though :) I do agree that indexing huge documents
doesn't seem to have a lot of value; even when you know a doc is a hit for a
certain query, how are you going to display the results to
Hello,
I have recently been given a requirement to improve document highlights within
our system. Unfortunately, the current functionality gives more of a best guess
at which terms to highlight, rather than the actual terms that performed the
match. A couple of examples of issues
Hello,
I am trying to use the lucene-icu library in Solr 4.6.1. I need to change a
character mapping in lucene-icu. I have made changes
to
lucene/analysis/icu/src/data/utr30/DiacriticFolding.txt
and built the jar file using ant, but it did not help.
I took a look at lucene/analysis/icu/build.xml and saw these
Hi Siraj,
MemoryIndex is used for exactly this use case. Here are a couple of pointers:
http://www.slideshare.net/jdhok/diy-percolator
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-percolate.html
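The pattern behind those pointers can be sketched roughly as follows (untested, Lucene 4.6-era API; the class, method, and field names are mine): load the incoming document into a one-document in-memory index, then run each stored query against it.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.memory.MemoryIndex;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.Version;

// Sketch (untested): "reverse search" / percolation with MemoryIndex.
public class Percolator {
    public static List<Query> matching(String text, List<Query> stored) {
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_46);
        MemoryIndex mi = new MemoryIndex();
        mi.addField("body", text, analyzer);  // the single in-memory document
        List<Query> hits = new ArrayList<>();
        for (Query q : stored) {
            if (mi.search(q) > 0.0f) {        // score > 0 means the query matched
                hits.add(q);
            }
        }
        return hits;
    }
}
```

This is fine for moderate numbers of stored queries; the Luwak and Elasticsearch percolator links above add pre-filtering so that not every query has to be executed per document.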
On Friday, February 14, 2014 8:21 PM, Siraj Haider si...@jobdiva.com
Do you get the exception if you run ant before changing the data files?
Header authentication failed, please check if you have a valid ICU data
file
Check with the ICU project as to the proper format for THEIR files. I mean,
this doesn't sound like a Lucene issue.
Maybe it could be as
Hi,
Here are two more relevant links:
https://github.com/flaxsearch/luwak
http://www.lucenerevolution.org/2013/Turning-Search-Upside-Down-Using-Lucene-for-Very-Fast-Stored-Queries
Ahmet
On Saturday, February 15, 2014 3:01 AM, Ahmet Arslan iori...@yahoo.com wrote:
Hi Siraj,
MemoryIndex is
Hi Jack,
I do not get the exception before changing the data files, and I do not get the
exception after changing the data files and creating the lucene-icu...jar with ant.
But changing the data files and running ant does not change the output.
So I decided to manually create the .nrm file using the steps outlined in the