Re: Adding a new PointDocValuesField

2022-05-26 Thread Greg Miller
> Users don't deal with low level docvalues codec APIs, so I see this "as a user" as irrelevant, sorry. Higher-level classes (e.g. Field class) could impl it this way as implementation detail.

Hmm, that's a different perspective than I had, but I understand where you're coming from and I think I

Re: Adding a new PointDocValuesField

2022-05-26 Thread Robert Muir
On Thu, May 26, 2022 at 11:49 AM Greg Miller wrote:
> I agree that technically it's just as good. I also think it's less clear for a user. The concept of "points" is something we've established in Lucene, so I think it makes sense for users to think about indexing points as a doc value as

Re: Adding a new PointDocValuesField

2022-05-26 Thread Greg Miller
I agree that technically it's just as good. I also think it's less clear for a user. The concept of "points" is something we've established in Lucene, so I think it makes sense for users to think about indexing points as a doc value as opposed to having to manage multiple fields for all their

Re: Adding a new PointDocValuesField

2022-05-25 Thread Robert Muir
On Wed, May 25, 2022 at 2:08 PM Greg Miller wrote:
> I guess with an “unsorted” numeric DV type we could get there with aligned indices, as you describe, but that seems less appealing than supporting multi-dim points directly.

Name one technical reason why? Unsorted would be exactly

Re: Adding a new PointDocValuesField

2022-05-25 Thread Marc D'Mello
Read your example again and yes, that makes sense. I was only thinking in terms of single dimensions, my bad!

On Wed, May 25, 2022 at 11:08 AM Greg Miller wrote:
> I appreciate all the feedback, but disagree that we can accomplish what we’re trying to do here with the existing fields.

Re: Adding a new PointDocValuesField

2022-05-25 Thread Greg Miller
I appreciate all the feedback, but disagree that we can accomplish what we’re trying to do here with the existing fields. It’s not sufficient to AND together multiple fields for this use-case, because the different dimensions can be multi-valued and not all combinations are valid.
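
The point about multi-valued dimensions can be illustrated with a minimal sketch. It uses plain Python sets rather than any Lucene API, and the year/make values, the `fitments` list, and the function names are all hypothetical, chosen only for illustration. It compares ANDing two independent multi-valued fields against storing each combination as a single multi-dimensional point:

```python
from itertools import product

# Hypothetical data for ONE document: each (year, make) pair is a valid
# combination. These values are illustrative, not taken from the thread.
fitments = [(2010, "Honda"), (2015, "Toyota")]

# Approach 1: two separate multi-valued fields. The pairing between a
# year and a make is lost once the values are split apart.
years = {year for year, _ in fitments}
makes = {make for _, make in fitments}


def matches_separate_fields(year, make):
    """AND together two independent per-field membership checks."""
    return year in years and make in makes


# Approach 2: each combination kept together as one 2-D point.
points = set(fitments)


def matches_points(year, make):
    """Match only if this exact combination was stored."""
    return (year, make) in points


def false_positives():
    """Combinations the separate-field AND accepts but that were never stored."""
    return sorted(
        (y, m)
        for y, m in product(years, makes)
        if matches_separate_fields(y, m) and not matches_points(y, m)
    )
```

In this sketch, a query for year 2015 AND make "Honda" matches under the separate-field approach even though that combination was never indexed, while the point-based check rejects it.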

Re: Adding a new PointDocValuesField

2022-05-25 Thread Marc D'Mello
> But adding a new type should be the last resort.

I did not realize that was the case, that's good to know. It seems like I should just use BDV (which does make the code change easier/faster so I have no issues with it). As for Patrick's suggestion of using separate numeric fields instead of

Re: Adding a new PointDocValuesField

2022-05-25 Thread Robert Muir
On Wed, May 25, 2022 at 12:17 AM Greg Miller wrote:
> A "two separate field approach" would consist of indexing year and make separately, and you'd lose the information that only certain combinations are valid. Am I overlooking something with your suggestion? Maybe there's something we

Re: Adding a new PointDocValuesField

2022-05-25 Thread Greg Miller
> then use LatLonDocValuesField

Right! Actually, LatLonDocValuesField is a good example of what we're trying to do here, but specialized to the 2D, lat/long case. It stores a doc value representation of a lat/long point that can be used for "slow" queries, which complement the points-based

Re: Adding a new PointDocValuesField

2022-05-25 Thread Robert Muir
On Wed, May 25, 2022 at 8:04 AM Michael Sokolov wrote:
> Also, there should be examples from other fields. Suppose you are indexing map data and want to support a UI that shows "hot spots" on the map where there is a lot of let's say ... activity of some sort. You'd like to facet on 2-d

Re: Adding a new PointDocValuesField

2022-05-25 Thread Michael Sokolov
Also, there should be examples from other fields. Suppose you are indexing map data and want to support a UI that shows "hot spots" on the map where there is a lot of let's say ... activity of some sort. You'd like to facet on 2-d areas. Or for log analytics -- you want to do anomaly detection

Re: Adding a new PointDocValuesField

2022-05-25 Thread Patrick Zhai
Hi Greg, thanks for the explanation! The example makes perfect sense to me. I was under the impression that this was combining two independent fields, and I was wrong. I have no strong preference for or against adding a new field for it, but for multi-value, don't we have a SortedSetDocValuesField that works

Re: Adding a new PointDocValuesField

2022-05-24 Thread Greg Miller
Thanks for the comments Patrick, but I'm not sure I'm fully understanding the suggestion here. I don't see a path forward that uses different fields, but maybe I'm missing something. Imagine you're running an ecommerce site selling automotive parts and you need to index fitment information that

Re: Adding a new PointDocValuesField

2022-05-24 Thread Patrick Zhai
As pointed out by Rob in the issue:
> I would also suggest to start with the simple separate-numeric-docvalues-fields case and use similar logic as the org.apache.lucene.facet.range package, just on 2-D, or maybe 3-D, N-D, etc

I think that's a preferable solution to me, because: 1. It does not

Re: Adding a new PointDocValuesField

2022-05-24 Thread Marc D'Mello
Hi, Thanks for the responses! For Patrick's question, right now in faceting we don't have any good way to AND between two fields. I think the original hyper rectangle issue has a good example of a use case: https://issues.apache.org/jira/browse/LUCENE-10274. As for Robert's point, this feature

Re: Adding a new PointDocValuesField

2022-05-24 Thread Robert Muir
This seems like a really exotic feature to add a dedicated docvalues field for. We should let BINARY be the catchall for stuff like this.

On Mon, May 23, 2022 at 10:17 PM Marc D'Mello wrote:
> Hi,
>
> Some background: I've been working on this PR to add hyper rectangle faceting capabilities to

Re: Adding a new PointDocValuesField

2022-05-23 Thread Patrick Zhai
Hi Marc, thank you for starting the discussion. I think all your points make sense, but I'm wondering if we really need everything packed into one field. What are the advantages of doing that? I *think* most of the facet related use cases can be satisfied using multiple fields, one field per

Adding a new PointDocValuesField

2022-05-23 Thread Marc D'Mello
Hi, Some background: I've been working on this PR to add hyper rectangle faceting capabilities to Lucene facets and I needed to create a new doc values field to support this feature. Initially, I had a field that just extended BinaryDocValues, but then