You're right, Lucene changed wrt the 0xffff character: 2.3 now uses
this character internally as an end-of-term marker when storing term
text.
This was done as part of LUCENE-843 (speeding up indexing).
Technically that character is an invalid UTF-16 character (for
interchange), but it looks like ...
Unfortunately, we lost the StandardTokenizerConstants interface as
part of this:
https://issues.apache.org/jira/browse/LUCENE-966
which was a speedup to StandardTokenizer by switching to JFlex instead
of JavaCC.
But, the constants that are used by StandardTokenizer are still
available as static ints in the StandardTokenizer class (ie, ALPHANUM,
APOSTROPHE, etc.).
Thanks for the explanation Mike. It's not a big issue; it's just a test case
where I needed to ensure ordering for the test, so I'll just use a valid
high UTF-16 character. It just seemed odd that the field was showing strangely
in Luke. Your explanation gives the reason, thanks.
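The workaround described here can be sketched in plain Java (the class name below is illustrative, not from any Lucene test): a valid high BMP character such as U+FFFB still sorts after ordinary ASCII text in UTF-16 code-unit order, so it works as an ordering sentinel without colliding with the character Lucene 2.3 reserves internally as its end-of-term marker.

```java
public class SortSentinelDemo {
    public static void main(String[] args) {
        // U+FFFB is a valid, interchangeable BMP character that still
        // sorts after every ASCII character in UTF-16 code-unit order,
        // so it can serve as an ordering sentinel in a test.
        String sentinel = "apple" + '\uFFFB';
        // The bare prefix sorts before the sentinel-suffixed term.
        System.out.println("apple".compareTo(sentinel) < 0);
        // The sentinel-suffixed term sorts after any ASCII suffix,
        // even '~' (0x7E), the highest printable ASCII character.
        System.out.println(sentinel.compareTo("apple~") > 0);
    }
}
```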
Hi,
this is most likely a question for Mike. I'm trying to figure out what
changes we need to make in order to support flexible indexing and
LUCENE-1231. Currently I'm looking into the DocumentsWriter.
If we want to support different posting lists, then we probably want to
change the ...
Impossible to use custom norm encoding/decoding
---
Key: LUCENE-1261
URL: https://issues.apache.org/jira/browse/LUCENE-1261
Project: Lucene - Java
Issue Type: Bug
Components:
[ https://issues.apache.org/jira/browse/LUCENE-1261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12586880#action_12586880 ]
Karl Wettin commented on LUCENE-1261:
Hi John,
see LUCENE-1260
karl
Hi Michael,
I've actually been working on factoring DocumentsWriter, as a first
step towards flexible indexing.
I agree we would have an abstract base Posting class that just tracks
the term text.
Then, DocumentsWriter manages inverting each field, maintaining the
per-field hash of term text -> Posting ...
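A self-contained sketch of the factoring described above, with hypothetical names (this is not the actual DocumentsWriter code): an abstract Posting base class that tracks only the term text, one illustrative subclass, and a per-field hash from term text to Posting.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: the abstract base tracks only the term text;
// a subclass adds whatever a particular postings format needs.
abstract class Posting {
    final String termText;
    Posting(String termText) { this.termText = termText; }
}

// One possible subclass: a frequency-only posting.
class FreqPosting extends Posting {
    int freq;
    FreqPosting(String termText) { super(termText); }
}

public class PostingHashDemo {
    // Per-field hash of term text -> Posting, as described above.
    final Map<String, Posting> termHash = new HashMap<String, Posting>();

    void addTerm(String termText) {
        FreqPosting p = (FreqPosting) termHash.get(termText);
        if (p == null) {
            p = new FreqPosting(termText);
            termHash.put(termText, p);
        }
        p.freq++; // count one occurrence of this term in the field
    }

    public static void main(String[] args) {
        PostingHashDemo field = new PostingHashDemo();
        for (String t : new String[] { "fox", "the", "fox" }) {
            field.addTerm(t);
        }
        System.out.println(((FreqPosting) field.termHash.get("fox")).freq);
    }
}
```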
Setting a flag in a filter is easy:
8<---
package org.apache.lucene.analysis.shingle;
import java.io.IOException;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
/**
* @author Mathieu
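The snippet above is cut off before the filter body. As a self-contained illustration of the pattern, here is a minimal sketch with stand-in Token/TokenStream types (the real code would extend org.apache.lucene.analysis.TokenFilter, which is not reproduced here, and the flag condition is purely illustrative).

```java
// Minimal stand-ins so the pattern is runnable on its own.
class Token {
    final String text;
    boolean flagged; // the "flag" being set by the filter
    Token(String text) { this.text = text; }
}

interface TokenStream {
    Token next(); // returns null when the stream is exhausted
}

public class FlagSettingFilter implements TokenStream {
    private final TokenStream input;
    FlagSettingFilter(TokenStream input) { this.input = input; }

    public Token next() {
        Token t = input.next();
        // Illustrative condition: flag tokens longer than 3 chars.
        if (t != null && t.text.length() > 3) {
            t.flagged = true;
        }
        return t;
    }

    public static void main(String[] args) {
        final Token[] toks = { new Token("the"), new Token("quick") };
        TokenStream src = new TokenStream() {
            int i = 0;
            public Token next() { return i < toks.length ? toks[i++] : null; }
        };
        TokenStream filtered = new FlagSettingFilter(src);
        for (Token t = filtered.next(); t != null; t = filtered.next()) {
            System.out.println(t.text + " " + t.flagged);
        }
    }
}
```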
[ https://issues.apache.org/jira/browse/LUCENE-1260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12586954#action_12586954 ]
Hoss Man commented on LUCENE-1260:
bq. I haven't thought too much about it yet, but it ...
But, the constants that are used by StandardTokenizer are still
available as static ints in the StandardTokenizer class (ie, ALPHANUM,
APOSTROPHE, etc.). Does that work?
Problem as mentioned below is that the StandardTokenizerImpl.java is
package-private and even though the ints and string ...
But, StandardTokenizer is public? It exports those constants for you?
Mike
Antony Bowesman wrote:
But, the constants that are used by StandardTokenizer are still
available as static ints in the StandardTokenizer class (ie,
ALPHANUM,
APOSTROPHE, etc.). Does that work?
Problem as ...
But, StandardTokenizer is public? It exports those constants for you?
Really? Sorry, but I can't find them - in 2.3.1 sources, there are no
references to those statics. Javadocs have no reference to them in
StandardTokenizer
Hi all,
I am new to Lucene and am using it for text search in my web application,
and for that I need to index records in a database.
We are using a JDBC directory to store the indexes. Now the problem is,
when I start the process of indexing the records for the first time, it is
taking a huge amount ...
That is opposite of my testing:...
The 'foreach' is consistently faster. The time difference is
independent of the size of the array. From what I know about JVM
implementations, the foreach version SHOULD always be faster,
because no bounds checking needs to be done on the element ...
On Tue, Apr 8, 2008 at 7:48 PM, robert engels [EMAIL PROTECTED] wrote:
That is opposite of my testing:...
The 'foreach' is consistently faster.
It's consistently slower for me (I tested java5 and java6 both with
-server on a P4).
I'm a big fan of testing different methods in different test ...
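A self-contained harness along the lines both posters describe (class and method names are illustrative); absolute timings will vary by JVM, hardware, and flags such as -server, which is rather the point of the thread.

```java
public class LoopBench {
    static long sumForEach(int[] a) {
        long sum = 0;
        for (int v : a) sum += v; // enhanced for: no explicit index
        return sum;
    }

    static long sumIndexed(int[] a) {
        long sum = 0;
        for (int i = 0; i < a.length; i++) sum += a[i]; // explicit counter
        return sum;
    }

    public static void main(String[] args) {
        int[] a = new int[1 << 20];
        for (int i = 0; i < a.length; i++) a[i] = i;

        // Warm up so the JIT compiles both loops before timing.
        for (int i = 0; i < 10; i++) { sumForEach(a); sumIndexed(a); }

        long t0 = System.nanoTime();
        long s1 = sumForEach(a);
        long t1 = System.nanoTime();
        long s2 = sumIndexed(a);
        long t2 = System.nanoTime();

        System.out.println("foreach: " + (t1 - t0) + " ns, indexed: "
                + (t2 - t1) + " ns, results match: " + (s1 == s2));
    }
}
```

Note that this is a naive microbenchmark: JIT warmup, on-stack replacement, and dead-code elimination can easily dominate such comparisons, which is why the posters above see opposite results.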
NullPointerException from FieldsReader after problem reading the index
--
Key: LUCENE-1262
URL: https://issues.apache.org/jira/browse/LUCENE-1262
Project: Lucene - Java
foreach vs explicit loop counter is pretty academic for Lucene anyway I think.
I can't think of any inner loops where it would really matter.
-Yonik
: But, StandardTokenizer is public? It exports those constants for you?
:
: Really? Sorry, but I can't find them - in 2.3.1 sources, there are no
: references to those statics. Javadocs have no reference to them in
: StandardTokenizer
I think Michael is forgetting that he re-added those ...
There is a FAQ covering this question...
http://wiki.apache.org/lucene-java/LuceneFAQ#head-86d479476c63a2579e867b75d4faa9664ef6cf4d
start by getting your code to compile against 1.9.1 without any
deprecation warnings. The deprecation messages in the 1.9.1 javadocs will
tell you which new ...