[
https://issues.apache.org/jira/browse/SOLR-6907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14263599#comment-14263599
]
wolfgang hoschek commented on SOLR-6907:
+1 Looks reasonable to me.
URLEncode
[
https://issues.apache.org/jira/browse/SOLR-4509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224815#comment-14224815
]
wolfgang hoschek commented on SOLR-4509:
Would be good to remove that stale check
[
https://issues.apache.org/jira/browse/SOLR-6212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14047223#comment-14047223
]
wolfgang hoschek commented on SOLR-6212:
This is already fixed in the latest stable
[
https://issues.apache.org/jira/browse/SOLR-5109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14047391#comment-14047391
]
wolfgang hoschek commented on SOLR-5109:
FWIW, morphlines currently won't work
[
https://issues.apache.org/jira/browse/SOLR-5109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14047394#comment-14047394
]
wolfgang hoschek edited comment on SOLR-5109 at 6/30/14 5:36 AM
[
https://issues.apache.org/jira/browse/SOLR-5109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14047394#comment-14047394
]
wolfgang hoschek commented on SOLR-5109:
Another potential issue is that hadoop
From our perspective we don’t really see use cases for DIH anymore.
Morphlines was developed primarily with Lucene in mind (even though it doesn’t
require Lucene).
Flume Morphline Solr Sink handles streaming ingestion into Solr in reliable,
scalable, flexible and loosely coupled ways, in
[
https://issues.apache.org/jira/browse/SOLR-6126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14015266#comment-14015266
]
wolfgang hoschek commented on SOLR-6126:
[~dsmiley] It uses the --zk-host CLI
[
https://issues.apache.org/jira/browse/SOLR-6126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14015092#comment-14015092
]
wolfgang hoschek commented on SOLR-6126:
The comment in the code is a bit outdated
[
https://issues.apache.org/jira/browse/SOLR-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13932328#comment-13932328
]
wolfgang hoschek commented on SOLR-5848:
Going forward I'd recommend upgrading
[
https://issues.apache.org/jira/browse/SOLR-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13932378#comment-13932378
]
wolfgang hoschek commented on SOLR-5848:
Sounds good. Thx!
Morphlines
wolfgang hoschek created SOLR-5786:
--
Summary: MapReduceIndexerTool --help text is missing large parts
of the help text
Key: SOLR-5786
URL: https://issues.apache.org/jira/browse/SOLR-5786
Project
[
https://issues.apache.org/jira/browse/SOLR-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13914549#comment-13914549
]
wolfgang hoschek commented on SOLR-5605:
Correspondingly, I filed https
[
https://issues.apache.org/jira/browse/SOLR-5786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
wolfgang hoschek updated SOLR-5786:
---
Summary: MapReduceIndexerTool --help output is missing large parts of the
help text
[
https://issues.apache.org/jira/browse/SOLR-5786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
wolfgang hoschek updated SOLR-5786:
---
Description:
As already mentioned repeatedly and at length, this is a regression introduced
[
https://issues.apache.org/jira/browse/SOLR-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13915037#comment-13915037
]
wolfgang hoschek commented on SOLR-5605:
bq. Are you not a committer? At Apache
[
https://issues.apache.org/jira/browse/SOLR-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13915037#comment-13915037
]
wolfgang hoschek edited comment on SOLR-5605 at 2/27/14 9:23 PM
[
https://issues.apache.org/jira/browse/SOLR-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13911744#comment-13911744
]
wolfgang hoschek commented on SOLR-5605:
I have looked, have you? I have fixed
[
https://issues.apache.org/jira/browse/SOLR-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
wolfgang hoschek reopened SOLR-5605:
Without this the --help text is screwed.
https://issues.apache.org/jira/secure/EditComment
[
https://issues.apache.org/jira/browse/SOLR-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13905806#comment-13905806
]
wolfgang hoschek commented on SOLR-5605:
Yes, as already mentioned, otherwise some
Welcome on board!
Wolfgang.
On Jan 26, 2014, at 4:32 PM, Erick Erickson wrote:
Good to have you aboard!
Erick
On Sat, Jan 25, 2014 at 10:52 PM, Mark Miller markrmil...@gmail.com wrote:
Welcome!
- Mark
http://about.me/markrmiller
On Jan 25, 2014, at 4:40 PM, Michael McCandless
[
https://issues.apache.org/jira/browse/SOLR-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13862272#comment-13862272
]
wolfgang hoschek commented on SOLR-5605:
Thanks for getting to the bottom
[
https://issues.apache.org/jira/browse/SOLR-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13862272#comment-13862272
]
wolfgang hoschek edited comment on SOLR-5605 at 1/4/14 11:42 AM
[
https://issues.apache.org/jira/browse/SOLR-5584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13862273#comment-13862273
]
wolfgang hoschek commented on SOLR-5584:
As mentioned above, morphlines
+1
On Jan 2, 2014, at 10:53 PM, Simon Willnauer wrote:
+1
On Thu, Jan 2, 2014 at 9:51 PM, Mark Miller markrmil...@gmail.com wrote:
bzr is dying; Emacs needs to move
http://lists.gnu.org/archive/html/emacs-devel/2014-01/msg5.html
Interesting thread.
For similar reasons, I
[
https://issues.apache.org/jira/browse/SOLR-5584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13858699#comment-13858699
]
wolfgang hoschek commented on SOLR-5584:
What exactly is failing for you
[
https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13856657#comment-13856657
]
wolfgang hoschek commented on SOLR-1301:
Also see https://issues.cloudera.org
[
https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13848097#comment-13848097
]
wolfgang hoschek edited comment on SOLR-1301 at 12/16/13 2:27 AM
[
https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13848775#comment-13848775
]
wolfgang hoschek commented on SOLR-1301:
bq. it would be convenient if we could
[
https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13848097#comment-13848097
]
wolfgang hoschek commented on SOLR-1301:
Might be best to write a program
[
https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13843443#comment-13843443
]
wolfgang hoschek commented on SOLR-1301:
I'm not aware of anything needing jersey
[
https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13843443#comment-13843443
]
wolfgang hoschek edited comment on SOLR-1301 at 12/9/13 7:30 PM
[
https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13843523#comment-13843523
]
wolfgang hoschek commented on SOLR-1301:
Apologies for the confusion. We
[
https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13842034#comment-13842034
]
wolfgang hoschek commented on SOLR-1301:
There are also some important fixes
[
https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13842034#comment-13842034
]
wolfgang hoschek edited comment on SOLR-1301 at 12/7/13 2:57 AM
[
https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13839308#comment-13839308
]
wolfgang hoschek commented on SOLR-1301:
There are also some fixes downstream
[
https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13839311#comment-13839311
]
wolfgang hoschek commented on SOLR-1301:
Minor nit: could remove
[
https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13839556#comment-13839556
]
wolfgang hoschek commented on SOLR-1301:
FWIW, a current printout of --help showing
[
https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13839556#comment-13839556
]
wolfgang hoschek edited comment on SOLR-1301 at 12/5/13 12:55 AM
On Dec 3, 2013, at 12:11 AM, Uwe Schindler wrote:
Looks like Java's service loader lookup impl has become more strict in Java8.
This issue on Java 8 is kind of unfortunate because morphlines and solr-mr
don't actually use JAXP at all.
For the time being might be best to disable testing
FYI, I filed this saxon ticket: https://saxonica.plan.io/issues/1944
On Dec 3, 2013, at 12:52 AM, Wolfgang Hoschek wrote:
On Dec 3, 2013, at 12:11 AM, Uwe Schindler wrote:
Looks like Java's service loader lookup impl has become more strict in
Java8.
This issue on Java 8 is kind
opinion on the subject in that
stack overflow topic... :)
Dawid
On Tue, Dec 3, 2013 at 9:52 AM, Wolfgang Hoschek whosc...@cloudera.com
wrote:
On Dec 3, 2013, at 12:11 AM, Uwe Schindler wrote:
Looks like Java's service loader lookup impl has become more strict in
Java8.
This issue
[
https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837976#comment-13837976
]
wolfgang hoschek commented on SOLR-1301:
bq. module/dir names
I propose morphlines
[
https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837979#comment-13837979
]
wolfgang hoschek commented on SOLR-1301:
+1 to map-reduce-indexer module name/dir
Subject: Re: [JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.8.0-ea-b117) -
Build # 8549 - Still Failing!
Ha! Thanks for filing the issue, Wolfgang.
D.
On Tue, Dec 3, 2013 at 12:01 PM, Wolfgang Hoschek
whosc...@cloudera.com wrote:
Actually, Mike's opinion has changed because now Saxon
[
https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837976#comment-13837976
]
wolfgang hoschek edited comment on SOLR-1301 at 12/3/13 6:40 PM
[
https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13838054#comment-13838054
]
wolfgang hoschek commented on SOLR-1301:
bq. The problem with these two names
[
https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13838064#comment-13838064
]
wolfgang hoschek commented on SOLR-1301:
+1 on Steve's suggestion as well. Thanks
[
https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13838305#comment-13838305
]
wolfgang hoschek edited comment on SOLR-1301 at 12/3/13 11:11 PM
[
https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13838305#comment-13838305
]
wolfgang hoschek commented on SOLR-1301:
Upon a bit more reflection might be better
[
https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837068#comment-13837068
]
wolfgang hoschek commented on SOLR-1301:
There is also a known issue
Looks like Java's service loader lookup impl has become more strict in Java8.
This issue on Java 8 is kind of unfortunate because morphlines and solr-mr
don't actually use JAXP at all.
For the time being might be best to disable testing on Java8 for this contrib,
in order to get a stable
Welcome Joel!
Wolfgang.
On Oct 3, 2013, at 9:56 AM, Erick Erickson wrote:
Welcome Joel!
On Thu, Oct 3, 2013 at 9:33 AM, Martijn v Groningen
martijn.v.gronin...@gmail.com wrote:
Welcome Joel!
On 3 October 2013 15:45, Shawn Heisey s...@elyograg.org wrote:
On 10/2/2013 11:24 PM,
Thanks to all! Looking forward to more contributions.
Wolfgang.
On Sep 26, 2013, at 3:21 AM, Uwe Schindler wrote:
Hi,
I'm pleased to announce that after a long abstinence, Wolfgang Hoschek
rejoined the Lucene/Solr committer team. He is working now at Cloudera and
plans to help
[
https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768629#comment-13768629
]
wolfgang hoschek commented on SOLR-1301:
cdk-morphlines-solr-core and cdk
[
https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768662#comment-13768662
]
wolfgang hoschek commented on SOLR-1301:
Seems like the patch still misses tika-xmp
[
https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13763618#comment-13763618
]
wolfgang hoschek commented on SOLR-1301:
FYI, one thing that's definitely off
[
https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13763636#comment-13763636
]
wolfgang hoschek commented on SOLR-1301:
By the way, docs and the downstream code
[
https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13763644#comment-13763644
]
wolfgang hoschek commented on SOLR-1301:
This new solr-mr contrib uses morphlines
[
https://issues.apache.org/jira/browse/LUCENE-4661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13547367#comment-13547367
]
wolfgang hoschek commented on LUCENE-4661:
--
Might be good to experiment
,
the performance gain of using equals() on interned strings is no match for
the performance loss of interning the field name of each field.
Wolfgang Hoschek-2 wrote:
I noticed that, too, but in my case the difference was often much
more extreme: it was one of the primary bottlenecks
I need to read the TokenStream at least twice
I used the horribly hacky but quick-for-me method of adding a
method to MemoryIndex that accepts a List of Tokens. Any ideas?
I'm not sure about modifying MemoryIndex. It should be easy enough
to create a subclass of TokenStream -
[
https://issues.apache.org/jira/browse/LUCENE-129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12462579
]
wolfgang hoschek commented on LUCENE-129:
-
Just to clarify: The empty finalize() method body in MemoryIndex
[
http://issues.apache.org/jira/browse/LUCENE-550?page=comments#action_12451817 ]
wolfgang hoschek commented on LUCENE-550:
-
All Lucene unit tests have been adapted to work with my alternate index.
Everything but proximity queries
[
http://issues.apache.org/jira/browse/LUCENE-550?page=comments#action_12451768 ]
wolfgang hoschek commented on LUCENE-550:
-
Ok. That means a basic test passes. For some more exhaustive tests, run all the
queries in
src/test/org
[
http://issues.apache.org/jira/browse/LUCENE-550?page=comments#action_12451731 ]
wolfgang hoschek commented on LUCENE-550:
-
Other question: when running the driver in test mode (checking for equality of
query results against
[
http://issues.apache.org/jira/browse/LUCENE-550?page=comments#action_12451730 ]
wolfgang hoschek commented on LUCENE-550:
-
What's the benchmark configuration? For example, is throughput bounded by
indexing or querying? Measuring N
MemoryIndex was designed to maximize performance for a specific use
case: pure in-memory datastructure, at most one document per
MemoryIndex instance, any number of fields, high frequency reads,
high frequency index writes, no thread-safety required, optional
support for storing offsets.
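To make the design description above concrete, here is a minimal, self-contained sketch of a single-document, multi-field in-memory index. It is only an illustration under stated assumptions: the class name SingleDocIndexSketch, its methods, and the whitespace tokenization are hypothetical and are not the actual MemoryIndex API, which works with Lucene analyzers and queries.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not Lucene's MemoryIndex.
// Holds exactly one "document" made of any number of named fields.
final class SingleDocIndexSketch {
    // field name -> term -> token positions recorded for that term
    private final Map<String, Map<String, List<Integer>>> fields = new HashMap<>();

    // Tokenizes on whitespace (an assumption; a real index would use an Analyzer).
    // Positions are relative to this call, so call it once per field.
    void addField(String fieldName, String text) {
        Map<String, List<Integer>> terms =
                fields.computeIfAbsent(fieldName, f -> new HashMap<>());
        int position = 0;
        for (String token : text.toLowerCase().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // skip the empty token produced by leading whitespace
            }
            terms.computeIfAbsent(token, t -> new ArrayList<>()).add(position++);
        }
    }

    // Number of occurrences of a term in a field; 0 if the field or term is absent.
    int termFrequency(String fieldName, String term) {
        Map<String, List<Integer>> terms = fields.get(fieldName);
        if (terms == null) {
            return 0;
        }
        List<Integer> positions = terms.get(term);
        return positions == null ? 0 : positions.size();
    }
}
```

Plain hash maps keep both writes and reads cheap, and nothing is synchronized, which matches the "no thread-safety required" trade-off described above.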
Initially it might, but probably eventually not. I was
thinking Lucene formats might also be bit more compact
than vanilla hash maps, but I guess that depends on
many factors. But I will probably want to play with
actual queries later on, based on frequencies.
OK.
In the latter case, are
On Dec 17, 2005, at 2:36 PM, Paul Elschot wrote:
Gentlemen,
While maintaining my bookmarks I ran into this:
Case Study: Enabling Low-Cost XML-Aware Searching
Capable of Complex Querying:
http://www.idealliance.org/papers/xmle02/dx_xmle02/papers/03-02-08/03-02-08.html
Some loose thoughts:
done with
extension to this code.
Regards,
Paul Elschot
On Friday 16 December 2005 03:45, Wolfgang Hoschek wrote:
I think implementing an XQuery Full-Text engine is far beyond the
scope of Lucene.
Implementing a building block for the fulltext aspect of it would be
more manageable
I think implementing an XQuery Full-Text engine is far beyond the
scope of Lucene.
Implementing a building block for the fulltext aspect of it would be
more manageable. Unfortunately the W3C fulltext drafts
indiscriminately mix and mingle two completely different languages
into a single
in Java 6, but that doesn't help too much given the
Java 1.4 req.
-Yonik
On 12/15/05, Wolfgang Hoschek [EMAIL PROTECTED] wrote:
STAX would probably make coding easier, but unfortunately complicates
the packaging side: one must ship at least two additional external
jars (stax interfaces and impl
That's basically what I'm implementing with Nux, except that the
syntax and calling conventions are a bit different, and that Lucene
analyzers can optionally be specified, which makes it a lot more
powerful (but also a bit more complicated).
Wolfgang.
On Dec 6, 2005, at 10:48 AM, Incze
Hopefully that makes sense to someone besides just me. It's certainly a
lot more complexity than a simple one-to-one mapping, but it seems to me
like the flexibility is worth spending the extra time to design/build it.
Makes perfect sense to me, and it doesn't seem any more complex
Yonik, I haven't been terribly active lately, but I've been voted in
as committer as well... :-)
http://marc.theaimsgroup.com/?l=lucene-dev&w=2&r=1&s=hoschek+committer&q=b
Cheers,
Wolfgang.
On Dec 2, 2005, at 2:53 PM, Yonik Seeley wrote:
~yonik/yourkit/
On Aug 30, 2005, at 12:47 PM, Doug Cutting wrote:
Yonik Seeley wrote:
I've been looking around... do you have a pointer to the source
where just the suffix is converted from UTF-8?
I understand the index format, but I'm not sure I understand the
problem that would be posed by the prefix
The Nux-1.3 release has been uploaded to
http://dsd.lbl.gov/nux/
Nux is an open-source Java toolkit making efficient and powerful XML
processing easy.
Changelog:
• Upgraded to saxonb-8.5 (saxon-8.4 and 8.3 should continue to work as well).
• Upgraded to xom-1.1-rc1
On Jul 19, 2005, at 12:58 PM, Daniel Naber wrote:
Hi,
currently Analyzer is an abstract class. Shouldn't we make it an
Interface?
Currently that's not possible, but it will be as soon as the deprecated
method is removed (i.e. after Lucene 1.9).
Regards
Daniel
Daniel, what's the use
poor java startup time
For the ones really keen on reducing startup time the Jolt Java VM
daemon may perhaps be of some interest:
http://www.dystance.net/software/jolt/index.html
I played with it a year ago when I was curious to see what could be
done about startup time in the context of
As an aside, in my performance testing of Lucene using JProfiler, it seems
to me that the only way to improve Lucene's performance greatly can come
from 2 areas
1. optimizing the JVM array/looping/JIT constructs/capabilities to avoid
bounds checking/improve performance
2. improve function
Cool stuff. Once this has stabilized and settled down I might start
exposing the surround language from XQuery/XPath as an experimental
match facility.
Wolfgang.
On May 28, 2005, at 10:07 AM, Paul Elschot wrote:
On Saturday 28 May 2005 17:06, Erik Hatcher wrote:
On May 28, 2005,
The nux-1.2 release has been uploaded to
http://dsd.lbl.gov/nux/
Nux is an open-source Java XML toolset geared towards embedded use in
high-throughput XML messaging middleware such as large-scale Peer-to-Peer
infrastructures, message queues, publish-subscribe and
matchmaking systems
For the MemoryIndex, I'm seeing large performance overheads due to
repetitive temporary string interning of o.a.l.index.Term.
For example, consider a FuzzyTermQuery or similar, scanning all terms
via TermEnum in the index: 40% of the time is spent in String.intern()
of new Term().
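The trade-off described above can be sketched in a few lines of self-contained Java. SimpleTerm below is a hypothetical stand-in, not o.a.l.index.Term. It only shows what interning buys (identity comparison on field names) and why paying for String.intern() on every temporary object is a per-construction cost.

```java
// Hypothetical illustration only; SimpleTerm is not Lucene's Term class.
final class InternSketch {
    static final class SimpleTerm {
        final String field;
        final String text;

        // When internField is true, every construction pays for a
        // String.intern() lookup, which is the overhead described above.
        SimpleTerm(String field, String text, boolean internField) {
            this.field = internField ? field.intern() : field;
            this.text = text;
        }
    }

    // Cheap reference comparison; only reliable if both fields were interned.
    static boolean sameFieldByIdentity(SimpleTerm a, SimpleTerm b) {
        return a.field == b.field;
    }

    // Works without interning, at the cost of a character-wise comparison
    // whenever the two references differ.
    static boolean sameFieldByEquals(SimpleTerm a, SimpleTerm b) {
        return a.field.equals(b.field);
    }
}
```

For many short-lived terms that are compared only a few times, the equals() route avoids the interning cost entirely. Whether that wins overall depends on the workload and would need measuring.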
Right. One doesn't need to run those benchmarks to immediately see
that most time is spent in VM startup, class loading, hotspot
compilation rather than anything Lucene related. Even a simple
System.out.println("hello") typically takes some 0.3 secs on a fast
box and JVM.
Wolfgang.
On May
Here's a performance patch for MemoryIndex.MemoryIndexReader that
caches the norms for a given field, avoiding repeated recomputation of
the norms. Recall that, depending on the query, norms() can be called
over and over again with mostly the same parameters. Thus, replace
public byte[]
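The patch itself is cut off in this excerpt, so the following is not its code. As a rough, self-contained sketch of the caching idea described above, assuming a hypothetical NormsCacheSketch class and a caller-supplied function that computes the norms, repeated calls for the same field can return the cached array instead of recomputing it:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration only; not the actual MemoryIndexReader patch.
final class NormsCacheSketch {
    private final Map<String, byte[]> cache = new HashMap<>();
    private final Function<String, byte[]> compute; // assumed expensive computation
    int computations = 0; // counts real computations, for demonstration

    NormsCacheSketch(Function<String, byte[]> compute) {
        this.compute = compute;
    }

    // Returns the cached norms for the field, computing them on first use only.
    byte[] norms(String fieldName) {
        return cache.computeIfAbsent(fieldName, f -> {
            computations++;
            return compute.apply(f);
        });
    }
}
```

A cache like this has to be invalidated whenever the underlying field data changes. The sketch leaves that out.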
Here's a convenience add-on method to MemoryIndex. If it turns out that
this could be of wider use, it could be moved into the core analysis
package. For the moment the MemoryIndex might be a better home.
Opinions, anyone?
Wolfgang.
/**
* Convenience method; Creates and returns a token
On May 3, 2005, at 5:26 PM, Erik Hatcher wrote:
Wolfgang,
I've now added this.
Thanks :-)
I'm not seeing how this could be generally useful. I'm curious how
you are using it and why it is better suited for what you're doing
than any other analyzer.
keyword tokenizer is a bit overloaded
I'm looking at it right now. The tests pass fine when you put
lucene-1.4.3.jar instead of the current lucene onto the classpath which
is what I've been doing so far. Something seems to have changed in the
scoring calculation. No idea what that might be. I'll see if I can find
out.
calculation now be done differently? If so, how?
Thanks for any clues into the right direction.
Wolfgang.
On May 2, 2005, at 9:05 AM, Wolfgang Hoschek wrote:
I'm looking at it right now. The tests pass fine when you put
lucene-1.4.3.jar instead of the current lucene onto the classpath
which
Yes, the svn trunk uses skipTo more often than 1.4.3.
However, your implementation of skipTo() needs some improvement.
See the javadoc of skipTo of class Scorer:
http://lucene.apache.org/java/docs/api/org/apache/lucene/search/Scorer.html#skipTo(int)
What's wrong with the version I sent? Remember
The version I sent returns in O(1), if performance was your concern. Or
did you mean something else?
Since 0 is the only document number in the index, a
return target == 0;
might be nice for skipTo(). It doesn't really help performance, though,
and the next() works just as well.
Regards,
Paul
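A minimal sketch of the skipTo() behaviour discussed above, for an index whose only document number is 0. SingleDocScorerSketch is a hypothetical stand-in and does not extend Lucene's Scorer. It assumes the contract that skipTo(target) advances to the first not-yet-returned match whose document number is at least target, and it adds a "consumed" flag beyond the one-line suggestion so that document 0 is never returned twice.

```java
// Hypothetical illustration only; not Lucene's Scorer class.
final class SingleDocScorerSketch {
    private boolean consumed = false; // true once doc 0 was returned or skipped

    // Advances to the next match; there is at most one, document 0.
    boolean next() {
        if (consumed) {
            return false;
        }
        consumed = true;
        return true;
    }

    // Assumed contract: move to the first remaining match with doc >= target.
    boolean skipTo(int target) {
        if (target > 0) {
            consumed = true; // nothing at or beyond target; scorer is exhausted
            return false;
        }
        return next();
    }

    // The only possible document number.
    int doc() {
        return 0;
    }
}
```

Both methods run in constant time, which fits the O(1) remark in the thread.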
Thanks!
Wolfgang.
I've committed this change after it successfully worked for me.
Thanks!
Erik
Here is the first and most high-priority patch I've settled on to get
Lucene to work efficiently for the typical usage scenarios of
MemoryIndex. More patches are forthcoming if this one is received
favourably...
There's large overhead involved in forcing all IndexReader impls to
have a
Whichever place you settle on is fine with me.
[In case it might make a difference: Just note that MemoryIndex has a
small auxiliary dependency on PatternAnalyzer in addField() because the
Analyzer superclass doesn't have a tokenStream(String fieldName, String
text) method. And PatternAnalyzer
OK. I'll send an update as soon as I get round to it...
Wolfgang.
On Apr 27, 2005, at 12:22 PM, Doug Cutting wrote:
Erik Hatcher wrote:
I'm not quite sure where to put MemoryIndex - maybe it deserves to
stand on its own in a new contrib area?
That sounds good to me.
Ok... once Wolfgang gives me
is running round in the
woods),
* English));
* /pre
On Apr 22, 2005, at 1:53 PM, Wolfgang Hoschek wrote:
I've now got the contrib code cleaned up, tested and documented into a
decent state, ready for your review and comments.
Consider this a formal contrib (Apache license is attached
.
Cheers,
Wolfgang.
On Apr 20, 2005, at 11:26 AM, Wolfgang Hoschek wrote:
On Apr 20, 2005, at 9:22 AM, Erik Hatcher wrote:
On Apr 20, 2005, at 12:11 PM, Wolfgang Hoschek wrote:
By the way, by now I have a version against 1.4.3 that is 10-100
times faster (i.e. 3 - 20 index+query steps/sec
in
the first place!)
Luc
-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Saturday, April 16, 2005 2:09 AM
To: java-dev@lucene.apache.org
Subject: Re: [Performance] Streaming main memory indexing of single
strings
On Apr 15, 2005, at 6:15 PM, Wolfgang Hoschek wrote:
Cool