I used the wikimedium2m data set for the second set of tests (the first test
was on a tiny index - 10k docs) -- at least I think I did! I am kind of new
to the benchmarking game. I ran the benchmarks with python
src/python/localrun.py -source wikimedium2m, and I can see that the index
dir is 861M.
Heavy is the head that wears the crown - congrats and thank you! And
here's to a peaceful transition of power in the new year :)
On Mon, Dec 31, 2018 at 1:39 PM Dawid Weiss wrote:
>
> Congratulations, Cassandra!
>
> On Mon, Dec 31, 2018 at 7:04 PM Gus Heck wrote:
> >
> > Congratulations :)
> >
>
This is a great idea. It would also be compelling to modify the term
frequency using this deboosting so that stacked indexed terms can be
weighted according to their closeness to the original term.
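As a rough sketch of the idea (my own toy model, not a Lucene API; the function name and the use of string similarity as "closeness" are assumptions), a stacked variant's term-frequency contribution could be scaled by how close it is to the original term:

```python
# Toy sketch of deboosting stacked terms by closeness to the original.
# Uses plain string similarity as a stand-in "closeness" measure; a real
# implementation would use whatever closeness the analysis chain knows.
from difflib import SequenceMatcher

def deboosted_tf(tf, original, variant):
    """Scale a stacked term's frequency by its similarity to the original."""
    closeness = SequenceMatcher(None, original, variant).ratio()  # 1.0 = identical
    return tf * closeness

# A near-identical variant like "colour" keeps most of the weight of
# "color", while a looser expansion like "hue" is discounted heavily.
```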
On Tue, Nov 20, 2018, 2:19 PM jim ferenczi wrote:
> Sorry for the late reply,
>
> > So perhaps one way forwa
Oh! got it - We run our tests and other release machinery etc against
a single JDK, and it is currently Java 8. I will precommit with Java 8
then. Presumably at some future date JDK11 becomes the system of
record? Historically how long have we waited after a new Java release
before shifting over?
O
I agree with Robert; let's not reinvent solutions that are solved elsewhere. In
an ideal world, wouldn't you want to be able to delegate tokenization of
latin script portions to StandardTokenizer? I know that's not possible
today, and I wouldn't derail the work here to try to make it happen since
it wo
In case it wasn't clear, I am +1 for Alan's plan. We can always restore
offset-alterations here if at some future date we figure out how to do it
correctly.
On Fri, Oct 26, 2018 at 6:08 AM Michael Sokolov wrote:
> The current situation is that it is impossible to apply offsets cor
The current situation is that it is impossible to apply offsets correctly
in a TokenFilter. It seems to work OK most of the time, but truly correct
behavior relies on prior components in the chain not having altered the
length of tokens, which some of them occasionally do. For complete
correctness
If maxMergeCount were 2, you could get into a situation with three large
merges, I think; the largest would be paused, but the others could still
take > 10 mins to complete. Are you sure that your observation is at odds
with what the document says the scheduler is doing?
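As a toy model of that scenario (my own sketch, not ConcurrentMergeScheduler's actual code; the function name and the rule "pause only the single largest merge beyond the limit" are assumptions for illustration):

```python
# Toy model: with maxMergeCount=2, a third concurrent merge causes the
# largest to be paused, but the remaining merges keep running and can
# still take a long time to finish on large segments.
def running_after_pause(merge_sizes_mb, max_merge_count=2):
    """Return (still-running merges, paused merge or None)."""
    running = sorted(merge_sizes_mb)
    if len(running) > max_merge_count:
        paused = running.pop()  # the largest merge gets paused
        return running, paused
    return running, None
```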
On Wed, Oct 10, 2018 at 2:28
My current usage of this filter requires it to be a filter, since I need to
precede it with other filters. I think the idea of not touching offsets
preserves more flexibility, and since the offsets are already unreliable,
we wouldn't be losing much.
On Sun, Sep 30, 2018, 11:32 AM Alan Woodward (JI
iven you that role, Michael, please see if you see
> the Resolve button now.
>
> Cassandra
>
> On Fri, Aug 31, 2018 at 11:09 AM Uwe Schindler wrote:
>
>> Hi,
>>
>> When back in office, I will check the project roles of the Lucene and Solr
>> Jira projects.
>
> So, if you do not see it, the permissions may be in play. I will leave
> the issue as is, to let the discrepancy be figured out.
>
> Regards,
> Alex.
>
> On 29 August 2018 at 15:56, Michael Sokolov wrote:
> > This old issue was still assigned to me:
> > https:/
This old issue was still assigned to me:
https://issues.apache.org/jira/browse/LUCENE-3318. I had worked on it seven
years ago, but it is no longer relevant today, and I'd like to close it,
but I don't see any UI affordance for doing that in JIRA. Am I missing
permissions? Is the issue in some weir
Michael Sokolov wrote:
> I am trying to run ant precommit (on master) and it fails for me with this
> message:
>
> -ecj-javadoc-lint-unsupported:
>
> BUILD FAILED
> /home/
> ANT.AMAZON.COM/sokolovm/workspace/lbench/lucene_baseline/lucene/common-build.xml:2076:
> Lintin
I am trying to run ant precommit (on master) and it fails for me with this
message:
-ecj-javadoc-lint-unsupported:
BUILD FAILED
/home/
ANT.AMAZON.COM/sokolovm/workspace/lbench/lucene_baseline/lucene/common-build.xml:2076:
Linting documentation with ECJ is not supported on this Java version
(unkno
ke into account the fact that the default
> codec changed. However, I did not add backward-codecs.jar to the classpath,
> you should rebuild the index that you use for benchmarking so that it uses
> the Lucene80 codec instead of Lucene70.
>
> On Fri, Aug 24, 2018 at 02:03, Michael Sokolov wrote:
@ def run():
- idFieldPostingsFormat='Lucene50',
+ idFieldPostingsFormat='FST50',
On Thu, Aug 23, 2018 at 5:52 PM Michael Sokolov wrote:
> OK thanks. I guess this benchmark must be run on a large-enough
OK thanks. I guess this benchmark must be run on a large-enough index that
it doesn't fit entirely in RAM already anyway? When I ran it locally using
the vanilla benchmark instructions, I believe the generated index was quite
small (wikimedium10k). At any rate, I don't have any specific use case
y
Can I interest someone in reviewing my patch for
https://issues.apache.org/jira/browse/LUCENE-765? It's additional javadoc
for the index package
I was rooting around for some low-impact helpful thing to do here, and
found this on a list of "newdev" issues. It's fairly high-level but should
be h
I happened to stumble across this chart
https://home.apache.org/~mikemccand/lucenebench/PKLookup.html showing a
pretty drastic drop in this benchmark on 5/13. I looked at the commits
between the previous run and this one and did some investigation, trying to
do some git bisect to find the problem u
Oh! Nice -- I'll have a look. I had started tinkering with my own, but it
would be nice if it already existed thanks!
On Thu, Aug 16, 2018 at 10:42 AM Tomoko Uchida (JIRA)
wrote:
>
> [
> https://issues.apache.org/jira/browse/LUCENE-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:co
Did you mean q=oow in your example? As written, I don't see how there is a
problem.
On Thu, Jul 26, 2018 at 8:41 AM Andrea Gazzarini
wrote:
> Hi, still fighting with synonyms, I have another question.
> I'm not understanding the role, and the effect, of the
> "autoGeneratePhraseQueries" attribut
> In general I’d avoid index-time synonyms in lucene because synonyms can
> create graphs (eg if a single term gets expanded to several terms), and we
> can’t index graphs correctly.
I wonder what it would take to address this. I guess the blast radius of
adding a token "width" could be pretty large.
Can you run a mirror instance and swap traffic, performing reindexing on an
online system, and then bring it online when complete?
On Sun, Jul 8, 2018, 7:46 PM changchun huang (JIRA) wrote:
>
> [
> https://issues.apache.org/jira/browse/LUCENE-8389?page=com.atlassian.jira.plugin.system.issue
You should really try asking on an Atlassian support forum since Jira is
their project and they support it. This bug database is for tracking issues
about Lucene itself. Also please note that Lucene 3 is many years old now,
and no longer receiving bug fixes. The current version is 7, soon to be 8,
Would it make sense to change TimeExceededException so it extends
CollectionTerminatedException?
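The effect of that subclassing can be sketched like so (Python stand-in for the Java classes; the `collect` helper is made up for illustration): any handler that already treats early termination as benign would then absorb timeouts too, with no new catch clause.

```python
# Sketch of the proposed hierarchy. In Lucene these are Java classes in
# org.apache.lucene.search; this just illustrates the subclass relationship.
class CollectionTerminatedException(Exception):
    """Signals that collection of a segment can stop early."""

class TimeExceededException(CollectionTerminatedException):
    """Time limit hit -- under the proposal, also 'just' early termination."""

def collect(raise_timeout=False):
    try:
        if raise_timeout:
            raise TimeExceededException("time limit reached")
    except CollectionTerminatedException:
        return "terminated early"  # timeout flows through the existing path
    return "completed"
```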
On Wed, May 16, 2018 at 4:29 PM, Tony Xu (JIRA) wrote:
> Tony Xu created LUCENE-8319:
> ---
>
> Summary: A Time-limiting collector that works with
> Collector
+1
On Tue, Apr 24, 2018 at 9:58 AM, Alan Woodward (JIRA)
wrote:
>
> [ https://issues.apache.org/jira/browse/LUCENE-8273?page=
> com.atlassian.jira.plugin.system.issuetabpanels:comment-
> tabpanel&focusedCommentId=16449897#comment-16449897 ]
>
> Alan Woodward commented on LUCENE-8273:
> -
yes, thanks!
On Fri, Apr 13, 2018 at 7:05 PM, Michael McCandless (JIRA)
wrote:
>
> [ https://issues.apache.org/jira/browse/LUCENE-8248?page=
> com.atlassian.jira.plugin.system.issuetabpanels:comment-
> tabpanel&focusedCommentId=16438060#comment-16438060 ]
>
> Michael McCandless commented on
Ah true that would be messy! I'll update the patch.
On Tue, Apr 10, 2018 at 7:26 PM, Michael McCandless (JIRA)
wrote:
>
> [ https://issues.apache.org/jira/browse/LUCENE-8248?page=
> com.atlassian.jira.plugin.system.issuetabpanels:comment-
> tabpanel&focusedCommentId=16433177#comment-16433177
Ok that was actually my first implementation. It was a lot messier. I'll
follow up with details when I get back to a keyboard
On Thu, Apr 5, 2018, 9:09 AM Adrien Grand (JIRA) wrote:
>
> [
> https://issues.apache.org/jira/browse/LUCENE-8240?page=com.atlassian.jira.plugin.system.issuetabpanels
The javadocs for both WDF and WDGF include a pretty detailed discussion
about the proper use of the "combinations" parameter, but no such parameter
exists. I don't know the history here, but it sounds as if the docs might
be referring to some previous incarnation of this filter, perhaps in the
cont
Perhaps Robert is a fan of Object.clone()
On Feb 28, 2018 9:59 AM, "Bruno Roustant (JIRA)" wrote:
>
> [ https://issues.apache.org/jira/browse/LUCENE-8159?page=
> com.atlassian.jira.plugin.system.issuetabpanels:comment-
> tabpanel&focusedCommentId=16380407#comment-16380407 ]
>
> Bruno Roustan
On 12/1/2012 7:59 AM, Per Steffensen wrote:
> It is all about information - git has it, SVN doesn't. And my logical
> sense tells me that it has to be git and not github!
> :-) Now tell me that I am stupid :-)
This kind of information (merge tracking) has been in svn since 1.5 (see
http://subversio
I work with a lot of XML data sources and have needed to implement an
analysis chain for Solr/Lucene that accepts XML. In the course of doing
that, I found I needed something very much like HTMLCharFilter, but that
does standard XML parsing (understands XML entities defined in an
internal or ex
I'm not sure you will find anyone wanting to put in this effort now, but
another suggestion for a general approach might be:
1. very basic static analysis to catch what you can - this should be a
pretty minimal effort only, given what can reasonably be achieved
2. throw runtime errors as Hoss sa
My first post too - but if I can offer a suggestion - there are more
modern XML validation technologies available than DTD. I would heartily
recommend RelaxNG/Compact notation (see
http://relaxng.org/compact-tutorial-20030326.html) - you can generate
Relax from a DTD, but it is more expressive