Re: Info required on licensing of Lucene component

2023-03-22 Thread David Smiley
I suppose this begs the question, why are we including NOTICE.txt in our distribution for *anything* we don't distribute? ~ David Smiley Apache Lucene/Solr Search Developer http://www.linkedin.com/in/davidwsmiley On Tue, Mar 21, 2023 at 7:57 PM Michael Sokolov wrote: > Lucene is licen

Re: [VOTE] Lucene logo contest, third time's a charm

2020-09-03 Thread David Smiley
(binding) vote: D, A1 (thanks Ryan for your thorough vote instructions & preparation)

Re: [VOTE] Lucene logo contest

2020-06-15 Thread David Smiley
C. The current Lucene logo [4] On Mon, Jun 15, 2020 at 6:08 PM Ryan Ernst wrote: > Dear Lucene and Solr developers! > > In February a contest was started to design a new logo for

Re: ComplexPhraseQueryParser performance question

2020-02-12 Thread David Smiley
uates" to Lucene core some day. Its placement in sandbox is why it can't be added to any of Lucene's query parsers like complex phrase. On Wed, Feb 12, 2020 at 11:07 AM wrote: > Hi,- > > Regard

Re: More Spatial Relations

2018-06-01 Thread David Smiley
ough doc-value seems slow. > It needs to load shapes from doc-value for all docs and check each with the > query shape. > > > 2018-06-02 3:31 GMT+08:00 David Smiley : > > > Hi Bingtao, > > > > > If I want to implement query for other relations, should I just >

Re: More Spatial Relations

2018-06-01 Thread David Smiley
Hi Bingtao, > If I want to implement query for other relations, should I just serialize shape to binary format(e.g. wkb) and fall back to jts? Yes. There's a lot already there but you'd need to subclass some stuff to add some other predicates. See CompositeSpatialStrategy and

Re: Spatial Indexing of Polygons

2017-08-15 Thread David Smiley
Hi Tom, If you qualify the solution as using BKD (what Lucene now calls "PointValues") then -- no. But that's an implementation detail. In terms of capabilities, if you want to index polygons (as represented on the surface of a sphere, not 2d) in Lucene, then it's possible. See Geo3dRptTest.

Re: Lucene GeoNear Search and Sort Performance

2017-07-19 Thread David Smiley
I like how you figured out how to add DocValues without having to modify PointVectorStrategy at all -- nice. But I think you are using PointVectorStrategy exclusively, even for filtering (the query)? It does a relatively slow job at that; I suggested using RecursivePrefixTreeStrategy for the

Re: Lucene GeoNear Search and Sort Performance

2017-07-17 Thread David Smiley
Oh yes. So in 5.x there was no DocValues option for this SpatialStrategy impl, which is a shame of course. You could take the code for 6.x: https://github.com/apache/lucene-solr/blob/branch_6x/lucene/spatial-extras/src/java/org/apache/lucene/spatial/vector/PointVectorStrategy.java and back

Re: Lucene GeoNear Search and Sort Performance

2017-07-16 Thread David Smiley
As I mentioned, PointVectorStrategy has an argument that accepts a Lucene FieldType that you can add docValues to. On Sun, Jul 16, 2017 at 2:07 PM sc wrote: > Thanks for the suggestion. > > I changed the strategy to > > this.strategy = new PointVectorStrategy(ctx,

Re: Lucene GeoNear Search and Sort Performance

2017-07-14 Thread David Smiley
Hi "sc", I suspect you are hitting OOM for makeDistanceValueSource call on RecursivePrefixTreeStrategy. That strategy is best for filtering (the query), but it's a memory pig for distance sorting requirements. Instead, use PointVectorStrategy for the makeDistanceValueSource purpose. For that

Re: Term Dictionary taking up lots of memory, looking for solutions, lucene 5.3.1

2017-07-02 Thread David Smiley
If there are no filters, then LatLonDocValuesField is going to be asked to sort all of your docs, which is obviously going to take a while. Can you simply add a filter? Like a distance filter using LatLonPoint? On Thu, Jun 29, 2017 at 11:49 AM sc wrote: > Hi, > >I

Re: Term Dictionary taking up lots of memory, looking for solutions, lucene 5.3.1

2017-06-14 Thread David Smiley
Nice! On Tue, Jun 13, 2017 at 11:12 PM Tom Hirschfeld wrote: > Hey All, > > I was able to solve my problem a few weeks ago and wanted to update you > all. The root issue was with the caching mechanism in > "makedistancevaluesource" method in the lucene spatial module,

Re: Term Dictionary taking up lots of heap memory, looking for solutions, lucene 5.3.1

2017-06-06 Thread David Smiley
I know I'm late to this thread, but I saw this and specifically "reverse geocoding" and it caught my attention. I recently did this on a public project with Solr, which you may find of interest: https://github.com/cga-harvard/hhypermap-bop/tree/master/enrich/solr-geo-admin I'm super pleased with

Re: Highlighting and delineating Passages (fragmenting)

2017-05-30 Thread David Smiley
On Tue, May 30, 2017 at 9:25 AM Dawid Weiss wrote: > > #2 & #3 is the same requirement; you elaborate on #2 with more detail in > #3. > > The UH can't currently do this; but with the OH (original Highlighter) > you > > can but it appears somewhat awkward. See

Re: Highlighting and delineating Passages (fragmenting)

2017-05-30 Thread David Smiley
Looks like you should use the original Highlighter until requirements #2 and #3 can be done with the UnifiedHighlighter. Other than #2 and #3, the UH can handle all these requirements, and the OH can handle all of them. On Sat, May 27, 2017 at 6:08 AM Dawid Weiss wrote: > Thanks for your

Highlighting and delineating Passages (fragmenting)

2017-05-26 Thread David Smiley
I was recently asked if/how the UnifiedHighlighter can return a Passage centered around the highlighted words. I'm responding to a wider audience (java-user list, ...). Each highlighter implementation fragments the content into passages (with highlights) using a different algorithm. The

Re: "Point in polygon" search with Lucene / Spatial4j / JTS

2016-06-05 Thread David Smiley
Hello Randy. If you are on Lucene 6x, or possibly some late 5x releases, there are newer Lucene spatial implementations that have fewer moving parts to them and so will be simpler. I'm almost certain they would be fastest too, although perhaps that's not much of an issue with only 100k's of data

Re: highlighter with query over more than one word

2016-06-03 Thread David Smiley
It would help tremendously if you can give a specific code example showing the problem. On Thu, Jun 2, 2016 at 6:41 AM Sascha Janz wrote: > > we use highlighter to get textfragments for our hit list. > > the code is straight forward like this > >Analyzer analyzer = new

Re: Best way to plug in alternative range query support

2016-05-25 Thread David Smiley
Ken, See BooleanQuery.Builder. p.s. nice to see you at Apache Big Data in Vancouver. ~ David On Thu, May 19, 2016 at 4:28 PM Ken Krugler wrote: > Hi all, > > I’ve got an alternative representation in the index for numeric fields, > and I need to construct an

Re: Equivalent LatLongDistanceFilter in Lucene 4.4 API

2013-10-08 Thread David Smiley (@MITRE.org)
Hi James, The spatial module in v4 is completely different from the one in v3. It would be good for you to review the new API rather than looking for a 1-1 equivalent to a class that existed in v3. Take a look at the top-level javadocs for the spatial module, and in particular look at

Re: Spatial indexing: IndexOutOfBounds in QuadPrefixTree

2013-03-09 Thread David Smiley (@MITRE.org)
Just finished: http://wiki.apache.org/solr/SpatialForTimeDurations - Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book -- View this message in context: http://lucene.472066.n3.nabble.com/Spatial-indexing-IndexOutOfBounds-in-QuadPrefixTree-tp4040511p4045998.html

DiskDocValues vs Lucene42Codec

2013-03-08 Thread David Smiley (@MITRE.org)
DiskDocValues is a codec (or part of a codec, apparently) for accessing the DocValues from disk, with minimal RAM usage for things like offsets. Lucene42Codec alternatively puts all of DocValues in RAM. Is the actual disk-resident data format the same between them? And how do you pick choose

Re: DiskDocValues vs Lucene42Codec

2013-03-08 Thread David Smiley (@MITRE.org)
Thanks Robert; that's very helpful.

Re: Spatial indexing: IndexOutOfBounds in QuadPrefixTree

2013-03-08 Thread David Smiley (@MITRE.org)
Paul, FYI: http://lucene.472066.n3.nabble.com/InvalidShapeException-when-using-SpatialRecursivePrefixTreeFieldType-with-custom-worldBounds-tt4045351.html I suggested filing a bug report. ~ David Paul Alexandrow wrote Hi List, I've encountered this problem using Solr (4.1.0), but as far

Re: question re lucene spatial toolkit aka LSP aka spatial4j

2012-08-08 Thread David Smiley (@MITRE.org)
I responded to the solr-user thread.

Re: Spatial Search

2012-07-10 Thread David Smiley (@MITRE.org)
Amir, CachedDistanceValueSource is indeed poorly named; I need to put renaming it on the TODO list; I've identified this before. Calculating the distance is computationally cheap enough to calculate for the X number of results (top-20-ish) you are returning in your search results to not bother
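
A minimal sketch of the per-result distance calculation the message above alludes to. It assumes a spherical Earth and the haversine formula, neither of which is stated in the thread. `HaversineSketch`, `distanceKm`, and `EARTH_RADIUS_KM` are hypothetical names for illustration, not part of the Lucene or Spatial4j API.

```java
// Hypothetical helper, not a Lucene class: computes great-circle distance
// between two lat/lon points given in degrees.
final class HaversineSketch {
    // Assumed mean Earth radius in kilometers (spherical approximation).
    static final double EARTH_RADIUS_KM = 6371.0088;

    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double sinLat = Math.sin(dLat / 2);
        double sinLon = Math.sin(dLon / 2);
        // Haversine term: sin^2(dLat/2) + cos(lat1) * cos(lat2) * sin^2(dLon/2)
        double a = sinLat * sinLat
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * sinLon * sinLon;
        // Clamp to 1.0 so rounding error cannot push asin out of its domain.
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }
}
```

Calling this once per returned hit (say, the top 20) with the query point and the stored point of each document costs a handful of trigonometric operations per result, which is the sense in which the calculation is cheap.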

Re: Spatial Search

2012-07-09 Thread David Smiley (@MITRE.org)
Amir, The geo query's score happens to be the distance, but I don't expect that to remain so. I plan for it to be 1/distance, which is a better relevancy value -- a score is supposed to be about relevancy after all. If you want to get the distance for a search result, I recommend calculating it

Re: Spatial Search

2012-01-03 Thread David Smiley (@MITRE.org)
The Extras / demo related part arguably doesn't count. ~ David Smiley

Re: Spatial Search

2012-01-01 Thread David Smiley (@MITRE.org)
as TwoDoublesStrategy does; that's something I need to improve, but it works. ~ David Smiley

Re: Spatial Search

2012-01-01 Thread David Smiley (@MITRE.org)
served working with Lucene directly instead of Solr (or Solr's Lucene based competitors). ~ David Smiley