Checkstyle has a OneTopLevelClass rule that would enforce this
On October 17, 2017 3:45:01 AM EDT, Uwe Schindler wrote:
>Hi,
>
>this has nothing to do with the Java version. I generally ignore this
>Eclipse failure, as I only develop in Eclipse but run from the command
>line. The
Oh thanks Alan, that's a good suggestion, but I already wrote max and sum double
value sources since it was easy enough. If you think that's a good approach I
could post a patch.
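For what it's worth, the "max" and "sum" value sources mentioned above each reduce to a simple fold over a document's values. This plain-Java sketch is illustrative only — the names and shapes are mine, not Lucene's actual value-source API:

```java
import java.util.Arrays;

// Hypothetical sketch of what a "max" and a "sum" value source would
// compute over a document's numeric values. Not Lucene API code.
public class ValueSourceSketch {
    static double maxValue(double[] docValues) {
        // max over the document's values; empty -> -Infinity sentinel
        return Arrays.stream(docValues).max().orElse(Double.NEGATIVE_INFINITY);
    }

    static double sumValue(double[] docValues) {
        // sum over the document's values
        return Arrays.stream(docValues).sum();
    }

    public static void main(String[] args) {
        double[] values = {1.5, 3.0, 2.25};
        System.out.println(maxValue(values)); // 3.0
        System.out.println(sumValue(values)); // 6.75
    }
}
```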
On October 13, 2017 3:57:30 AM EDT, Alan Woodward wrote:
>Hi,
>
>Yes, moving stuff over to
These are only used in classical Greek, I think, which probably explains why they
are not covered by the simpler filter.
On September 27, 2017 9:48:37 AM EDT, Ahmet Arslan
wrote:
>I may be wrong about ASCIIFoldingFilter. Please go with the
>ICUFoldingFilter.
>Ahmet
>On
There was some interesting work done on optimizing queries including
very common words (stop words) that I think overlaps with your problem.
See this blog post
http://www.hathitrust.org/blogs/large-scale-search/slow-queries-and-common-words-part-2
from the Hathi Trust.
The upshot in a
Maybe high frequency terms that are not evenly distributed throughout
the corpus would be a better definition. Discriminative terms. I'm
sure there is something in the machine learning literature about
unsupervised clustering that would help here. But I don't know what it
is :)
-Mike
On
with an extra 1st page field for the too-huge
documents.
-Paul
-Original Message-
From: Mike Sokolov [mailto:soko...@ifactory.com]
Sent: Saturday, June 23, 2012 7:16 PM
To: java-user@lucene.apache.org
Cc: Jack Krupansky
Subject: Re: Fast way to get the start of document
I got the sense
whether to highlight.
-Mike Sokolov
On 6/23/2012 6:17 PM, Jack Krupansky wrote:
Simply have two fields, full_body and limited_body. The former
would index but not store the full document text from Tika (the
content metadata). The latter would store but not necessarily index
the first 10K or so
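Jack's two-field scheme can be sketched in plain Java like this. The field names come from his message, but the holder class, helper, and the exact 10K cutoff are illustrative; in Lucene terms, full_body would be an analyzed, unstored field and limited_body a stored one:

```java
// Sketch of the two-field scheme: full_body carries the complete text
// for indexing only; limited_body carries a stored prefix for display
// and highlighting. Plain-Java illustration, not Lucene field setup.
public class TwoFieldDoc {
    static final int LIMIT = 10_000; // "the first 10K or so"

    final String fullBody;     // indexed, not stored
    final String limitedBody;  // stored, not necessarily indexed

    TwoFieldDoc(String text) {
        this.fullBody = text;
        this.limitedBody = text.length() <= LIMIT
                ? text
                : text.substring(0, LIMIT);
    }
}
```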
://wiki.apache.org/solr/FunctionQuery#tf
Lucene does have FunctionQuery, ValueSource, and
TermFreqValueSource.
See:
http://lucene.apache.org/solr/api/org/apache/solr/search/function/FunctionQuery.html
-- Jack Krupansky
-Original Message- From: Mike Sokolov
Sent: Saturday, June 16, 2012 2
I imagine this is a question that comes up from time to time, but I
haven't been able to find a definitive answer anywhere, so...
I'm wondering whether there is some type of Lucene query that filters by
term frequency. For example, suppose I want to find all documents that
have exactly 2
It sounds to me as if there could be a market for a new kind of query that
would implement:
A w/5 (B and C)
in the way that people understand it to mean - the same A near both B
and C, not just any A.
Maybe it's too hard to implement using rewrites into existing SpanQueries?
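The intended semantics — the same A near both B and C — can be pinned down over term positions. This is a semantic sketch only (positions in, boolean out), not a SpanQuery implementation; the naive rewrite `(A w/5 B) AND (A w/5 C)` would accept two different occurrences of A, which this check rejects:

```java
// "A w/5 (B and C)" as people understand it: some SINGLE occurrence of A
// must be within k positions of an occurrence of B AND of an occurrence
// of C. Inputs are sorted-or-not arrays of term positions.
public class SameANearBoth {
    static boolean matches(int[] aPos, int[] bPos, int[] cPos, int k) {
        for (int a : aPos) {
            boolean nearB = false, nearC = false;
            for (int b : bPos) if (Math.abs(a - b) <= k) { nearB = true; break; }
            for (int c : cPos) if (Math.abs(a - c) <= k) { nearC = true; break; }
            if (nearB && nearC) return true; // the same A satisfies both
        }
        return false;
    }
}
```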
In term of the
does anybody know how to express a MatchAllDocsQuery in surround query
parser language? I've tried
*
and()
but those don't parse. I looked at the grammar and I don't think there
is a way. Please let us all know if you know otherwise!
Thanks
-Mike Sokolov
know if it would be worth the trouble.
It turns out in my very specific case I have a term that appears in
every document in a particular field, so I am just using a search for
that at the moment.
-Mike
On 5/6/2012 8:04 PM, Mike Sokolov wrote:
I think what I have in mind would be purely
I think you have hit on all the best solutions.
The Jira issues you mentioned do indeed hold out some promising
solutions here, but they are a ways away, requiring some significant
re-plumbing, and I'm not sure a lot of attention is being paid to
that at the moment. You should vote for
My personal view, as a bystander with no more information than you, is
that one has to assume there will be further index format changes before
a 4.0 release. This is based on the number of changes in the last 9
months, and the amount of activity on the dev list.
For us the implication is we
Can you wrap a SpanNearQuery around a DisjunctionSumQuery with
minNrShouldMatch=8?
-Mike
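The minNrShouldMatch idea above (in Lucene, BooleanQuery's minimum-should-match setting) boils down to a counting rule: a document qualifies if it contains at least N of the query terms. A plain-Java sketch of just that rule, with made-up data:

```java
import java.util.Set;

// Minimum-should-match as a counting rule: the document matches when it
// contains at least `min` of the query terms. Plain-Java illustration
// of the semantics, not Lucene scoring code.
public class MinShouldMatch {
    static boolean matches(Set<String> docTerms, Set<String> queryTerms, int min) {
        long hits = queryTerms.stream().filter(docTerms::contains).count();
        return hits >= min;
    }

    public static void main(String[] args) {
        Set<String> query = Set.of("a", "b", "c", "d");
        System.out.println(matches(Set.of("a", "b", "c"), query, 3)); // true
        System.out.println(matches(Set.of("a", "b"), query, 3));      // false
    }
}
```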
On 07/13/2011 08:53 AM, Jeroen Lauwers wrote:
Hi,
I was wondering if anyone could help me on this:
I want to search for:
1. a set of words (e.g. 10)
2. only a couple of words may come in
me in the right direction?
Jeroen
-Original Message-
From: Mike Sokolov [mailto:soko...@ifactory.com]
Sent: Wednesday, July 13, 2011 15:23
To: java-user@lucene.apache.org
Cc: Jeroen Lauwers
Subject: Re: Advanced NearSpanQuery
Can you wrap a SpanNearQuery around a DisjunctionSumQuery
Our apps use highlighting, and I expect that highlighting is an
expensive operation since it requires processing the text of the
documents, but I ran a test and was surprised just how expensive it is.
I made a test index with three fields: path, modified, and contents. I
made the index using
Down to basics, Lucene searches work by locating terms and resolving
documents from them. For standard term queries, a term is located by a
process akin to binary search. That means that it uses log(n) seeks to
get the term. Let's say you have 10M terms in your corpus. If you stored
that in a
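The log(n) claim above is easy to check with quick arithmetic: binary search over 10M terms needs about log2(10,000,000) ≈ 23.3 probes, so roughly 24 seeks in the worst case if every probe went to disk:

```java
// Back-of-envelope for the log(n) seek claim: the number of binary
// search steps over n sorted terms is ceil(log2(n)).
public class SeekCount {
    static int binarySearchSteps(long n) {
        return (int) Math.ceil(Math.log((double) n) / Math.log(2.0));
    }

    public static void main(String[] args) {
        // 10M terms -> about 24 probes in the worst case
        System.out.println(binarySearchSteps(10_000_000L)); // 24
    }
}
```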
It's an idea - sorry I don't have an implementation I can share easily;
it's embedded in our application code and not easy to refactor. I'm not
sure where this would fit in the solr architecture; maybe some subclass
of SearchHandler? I guess the query rewriter would need to be aware of
which
Are the tokens unique within a document? If so, why not store a document
for every doc/token pair with fields:
id (doc#/token#)
doc-id (doc#)
token
weight1
weight2
frequency
Then search for token, sort by weight1, weight2 or frequency.
If the token matches are unique within a document you
that contain foo, but I want them
sorted by frequency.
Then, I would have doc1, doc2.
Now, I want to search for all the documents that contain foo, but I want them
sorted by weight1.
Then, I would have doc2, doc1
Does that clarify?
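The doc-per-(doc,token)-pair scheme sketched earlier makes this a filter plus a comparator. A plain-Java sketch with made-up numbers chosen to reproduce the two orderings described (by frequency: doc1, doc2; by weight1: doc2, doc1) — not Lucene sorting code:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Each record is one (doc, token) pair carrying its weights and
// frequency, so "search for token, sort by X" is filter + sort.
public class TokenPairSort {
    record Pair(String docId, String token, double weight1, int frequency) {}

    static List<String> search(List<Pair> pairs, String token, Comparator<Pair> order) {
        List<String> out = new ArrayList<>();
        pairs.stream()
             .filter(p -> p.token().equals(token))
             .sorted(order)
             .forEach(p -> out.add(p.docId()));
        return out;
    }

    public static void main(String[] args) {
        List<Pair> pairs = List.of(
            new Pair("doc1", "foo", 0.2, 9),
            new Pair("doc2", "foo", 0.8, 4));
        // by frequency, descending: doc1 (9) before doc2 (4)
        System.out.println(search(pairs, "foo",
            Comparator.comparingInt(Pair::frequency).reversed()));
        // by weight1, descending: doc2 (0.8) before doc1 (0.2)
        System.out.println(search(pairs, "foo",
            Comparator.comparingDouble(Pair::weight1).reversed()));
    }
}
```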
On May 5, 2011, at 3:01 PM, Mike Sokolov wrote:
Background: I've been trying to enable hit highlighting of XML documents
in such a way that the highlighting preserves the well-formedness of the
XML.
I thought I could get this to work by implementing a CharFilter that
extracts text from XML (somewhat like HTMLStripCharFilter, except I am