[GitHub] metron issue #995: METRON-1526: Location field types cause DocValuesField ap...

2018-04-25 Thread justinleet
Github user justinleet commented on the issue: https://github.com/apache/metron/pull/995 +1 by inspection. Thanks a lot for the work on this. ---

[GitHub] metron issue #995: METRON-1526: Location field types cause DocValuesField ap...

2018-04-25 Thread merrimanr
Github user merrimanr commented on the issue: https://github.com/apache/metron/pull/995 Thanks @cestella and @mmiklavc. Just waiting on a +1 from @justinleet and we'll call it done. ---

[GitHub] metron issue #995: METRON-1526: Location field types cause DocValuesField ap...

2018-04-25 Thread mmiklavc
Github user mmiklavc commented on the issue: https://github.com/apache/metron/pull/995 Lgtm with these changes. I'm +1 by inspection. ---

[GitHub] metron issue #995: METRON-1526: Location field types cause DocValuesField ap...

2018-04-25 Thread merrimanr
Github user merrimanr commented on the issue: https://github.com/apache/metron/pull/995 @mmiklavc let me know if the recent commit satisfies your feedback. ---

[GitHub] metron issue #995: METRON-1526: Location field types cause DocValuesField ap...

2018-04-25 Thread cestella
Github user cestella commented on the issue: https://github.com/apache/metron/pull/995 @justinleet @merrimanr Nope, I'm good. I'll give my +1 (pending acceptance by the rest of the people who have commented here). ---

[GitHub] metron issue #995: METRON-1526: Location field types cause DocValuesField ap...

2018-04-25 Thread merrimanr
Github user merrimanr commented on the issue: https://github.com/apache/metron/pull/995 Latest commit fixes the copy/paste error and adds copyFields documentation to the README. ---

[GitHub] metron issue #995: METRON-1526: Location field types cause DocValuesField ap...

2018-04-24 Thread justinleet
Github user justinleet commented on the issue: https://github.com/apache/metron/pull/995 I think, based on my understanding, that we document around the copy fields limitation. @cestella @mmiklavc are you both good with the approach of documenting the schema definitions for L

[GitHub] metron issue #995: METRON-1526: Location field types cause DocValuesField ap...

2018-04-24 Thread merrimanr
Github user merrimanr commented on the issue: https://github.com/apache/metron/pull/995 Yes we can set docValues and stored to false for copyFields. ---

[GitHub] metron issue #995: METRON-1526: Location field types cause DocValuesField ap...

2018-04-24 Thread justinleet
Github user justinleet commented on the issue: https://github.com/apache/metron/pull/995 @merrimanr Can the copyFields destination be `docValues=false`? I'm inclined to just document that requirement (assuming it works), and not worry too much about it until someone actually needs so

[GitHub] metron issue #995: METRON-1526: Location field types cause DocValuesField ap...

2018-04-24 Thread merrimanr
Github user merrimanr commented on the issue: https://github.com/apache/metron/pull/995 > So, from the last few examples discussed it suggests to me that being a polyfield is actually a problem, but it's only part of the total reason for the problem. Under the hood they are setting st

[GitHub] metron issue #995: METRON-1526: Location field types cause DocValuesField ap...

2018-04-24 Thread justinleet
Github user justinleet commented on the issue: https://github.com/apache/metron/pull/995 @mmiklavc That sounds right, yes. For returning dynamic fields (at least on the known problematic fields), I don't think it's ever really advantageous. At most it avoids a split on the LatLon

[GitHub] metron issue #995: METRON-1526: Location field types cause DocValuesField ap...

2018-04-23 Thread mmiklavc
Github user mmiklavc commented on the issue: https://github.com/apache/metron/pull/995 @justinleet - Thanks for that write up - that helps considerably. Just to round out @merrimanr 's comment about polyfields: >At this point I question whether a field being a polyfiel

[GitHub] metron issue #995: METRON-1526: Location field types cause DocValuesField ap...

2018-04-23 Thread merrimanr
Github user merrimanr commented on the issue: https://github.com/apache/metron/pull/995 Thanks for the thorough reply @justinleet. Everything you have said makes sense to me. In my testing I also found copy fields to be an issue if they were set to stored. Same problem as L

[GitHub] metron issue #995: METRON-1526: Location field types cause DocValuesField ap...

2018-04-23 Thread justinleet
Github user justinleet commented on the issue: https://github.com/apache/metron/pull/995 @merrimanr Let me replay my understanding to see if I'm on the right track. The problem we have is that we're returning fields that we can't reindex as a whole document when we run a glob

[GitHub] metron issue #995: METRON-1526: Location field types cause DocValuesField ap...

2018-04-23 Thread merrimanr
Github user merrimanr commented on the issue: https://github.com/apache/metron/pull/995 A note about the added integration test. I moved the test to the SolrUpdateIntegrationTest because this issue is specific to Solr. Adding an additional test exposed a problem in our Solr in memor

[GitHub] metron issue #995: METRON-1526: Location field types cause DocValuesField ap...

2018-04-23 Thread merrimanr
Github user merrimanr commented on the issue: https://github.com/apache/metron/pull/995 To give an idea of how I arrived at the conclusion that other fields are not a problem, this is how I tested it. I will use currency as an example. First I added this to our bro schema: ```

[GitHub] metron issue #995: METRON-1526: Location field types cause DocValuesField ap...

2018-04-23 Thread merrimanr
Github user merrimanr commented on the issue: https://github.com/apache/metron/pull/995 Just to clarify, both `docValues` and `stored` need to be set to false. If either one is true then it fails. At this point I question whether a field being a polyfield is causing t
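The rule stated in the comment above (both attributes must be false, and either one being true is enough to fail) can be sketched as a small check. This is only an illustration: the field definitions are modeled as plain Python dicts, and the names `safe_to_round_trip`, `example_a`, `example_b`, and `example_c` are hypothetical, not part of Metron or Solr.

```python
def safe_to_round_trip(field_def):
    """Return True only when docValues and stored are both explicitly False.

    A missing attribute is treated as unsafe, because this sketch makes no
    assumption about what the schema's defaults are.
    """
    return field_def.get("docValues") is False and field_def.get("stored") is False


# Hypothetical field definitions, for illustration only.
candidates = [
    {"name": "example_a", "docValues": False, "stored": False},
    {"name": "example_b", "docValues": True, "stored": False},
    {"name": "example_c", "docValues": False, "stored": True},
]

# Collect the names of the definitions that break the rule.
failing = [f["name"] for f in candidates if not safe_to_round_trip(f)]
print(failing)
```

Requiring an explicit `False` keeps the check conservative: a definition that omits either attribute is flagged rather than silently accepted.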

[GitHub] metron issue #995: METRON-1526: Location field types cause DocValuesField ap...

2018-04-23 Thread cestella
Github user cestella commented on the issue: https://github.com/apache/metron/pull/995 I'd be ok with going the `docValues=false` route, but I'm really curious as to why other polytypes wouldn't have this issue. I'll hold for that justification. ---

[GitHub] metron issue #995: METRON-1526: Location field types cause DocValuesField ap...

2018-04-23 Thread justinleet
Github user justinleet commented on the issue: https://github.com/apache/metron/pull/995 If we're going to the `docValues=false` + documentation route, we should have the documentation be more generic than just the LatLon and Point (e.g. custom types that may get used in more advance

[GitHub] metron issue #995: METRON-1526: Location field types cause DocValuesField ap...

2018-04-23 Thread merrimanr
Github user merrimanr commented on the issue: https://github.com/apache/metron/pull/995 The latest commit reverts the original fix in this PR, adds the extra dynamic field to the existing schemas, and adds a section in the Solr README documenting this for LatLon and PointType field ty

[GitHub] metron issue #995: METRON-1526: Location field types cause DocValuesField ap...

2018-04-23 Thread merrimanr
Github user merrimanr commented on the issue: https://github.com/apache/metron/pull/995 Adding this dynamic field worked for me as well. I tested in a Solr installation as well as the integration test. I believe this will solve our issue with the older LatLon type. The only other f

[GitHub] metron issue #995: METRON-1526: Location field types cause DocValuesField ap...

2018-04-23 Thread justinleet
Github user justinleet commented on the issue: https://github.com/apache/metron/pull/995 For the integration test, adding the specific dynamic fields for the underlying LatLon seems to take care of the problem for that type. e.g. this makes the test you added pass: ```

[GitHub] metron issue #995: METRON-1526: Location field types cause DocValuesField ap...

2018-04-22 Thread merrimanr
Github user merrimanr commented on the issue: https://github.com/apache/metron/pull/995 Yes it should be simple string values or, in other words, the doc should look the same minus the changes. I should have stated this in my previous comment but ideally those *_coordinate fields wou

[GitHub] metron issue #995: METRON-1526: Location field types cause DocValuesField ap...

2018-04-20 Thread mmiklavc
Github user mmiklavc commented on the issue: https://github.com/apache/metron/pull/995 ``` { "enrichments.geo.ip_dst_addr.location_point_0_coordinate": [ "34.0438", "34.0438" ], "enrichments.geo.ip_dst_addr.location_point_1_coordinate": [ "-

[GitHub] metron issue #995: METRON-1526: Location field types cause DocValuesField ap...

2018-04-20 Thread merrimanr
Github user merrimanr commented on the issue: https://github.com/apache/metron/pull/995 Another potential solution I thought of is to only return the intersection of the fields defined in the schema with fields that are present in a document. This could potentially eliminate the need
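A minimal sketch of the intersection idea described above, assuming the schema's declared field names are already available as a set of strings. How that set is obtained from Solr is not shown here, and the function name, field names, and values are all hypothetical.

```python
def fields_to_reindex(document, schema_fields):
    """Keep only the document fields whose names the schema explicitly declares."""
    return {name: value for name, value in document.items() if name in schema_fields}


# Hypothetical schema field names and document, for illustration only.
schema_fields = {"guid", "source.type", "location_point"}
document = {
    "guid": "abc",
    "location_point": "1.0,2.0",
    "location_point_0_coordinate": "1.0",  # derived sub-field, not declared
    "location_point_1_coordinate": "2.0",  # derived sub-field, not declared
}

filtered = fields_to_reindex(document, schema_fields)
print(filtered)
```

Note that an exact-name intersection like this would also drop any field that is only covered by a dynamic field pattern, so a real implementation would need to decide how such fields are handled.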

[GitHub] metron issue #995: METRON-1526: Location field types cause DocValuesField ap...

2018-04-20 Thread merrimanr
Github user merrimanr commented on the issue: https://github.com/apache/metron/pull/995 I did some additional testing and here are my notes. I've attempted to compile a list of all the proposed solutions and their outcomes. > checking the type for isPolyField() This

[GitHub] metron issue #995: METRON-1526: Location field types cause DocValuesField ap...

2018-04-20 Thread merrimanr
Github user merrimanr commented on the issue: https://github.com/apache/metron/pull/995 Just added an integration test that demonstrates this behavior. I had to switch from the test schema to the schema that is actually deployed and remove the HBaseDao from this particular test to ge

[GitHub] metron issue #995: METRON-1526: Location field types cause DocValuesField ap...

2018-04-19 Thread merrimanr
Github user merrimanr commented on the issue: https://github.com/apache/metron/pull/995 I've been changing/uploading schemas in full dev with the Solr CLI and using the Solr Admin console to test. I can write up a list of commands that mimic what I'm doing in the console or crea

[GitHub] metron issue #995: METRON-1526: Location field types cause DocValuesField ap...

2018-04-19 Thread cestella
Github user cestella commented on the issue: https://github.com/apache/metron/pull/995 @merrimanr Ok, that's fine, we can do it on full-dev. Can you please submit a step-by-step manual testing script to exhibit this issue? I was asking for an integration test because then it'll be r

[GitHub] metron issue #995: METRON-1526: Location field types cause DocValuesField ap...

2018-04-19 Thread merrimanr
Github user merrimanr commented on the issue: https://github.com/apache/metron/pull/995 If you add the "-h node1" option to the start command in install_solr.sh you can develop locally against Solr in full dev. The line ends up looking like: ``` su $SOLR_USER -c "bin/solr -e

[GitHub] metron issue #995: METRON-1526: Location field types cause DocValuesField ap...

2018-04-19 Thread cestella
Github user cestella commented on the issue: https://github.com/apache/metron/pull/995 Can we get this problem exhibited in the integration tests? It would help us try potential solutions out easily. ---

[GitHub] metron issue #995: METRON-1526: Location field types cause DocValuesField ap...

2018-04-19 Thread merrimanr
Github user merrimanr commented on the issue: https://github.com/apache/metron/pull/995 @justinleet I tested that and it looks like it's just what the schema defines. The only way we could use those is if we did a regex match on the actual fields in a document. ---

[GitHub] metron issue #995: METRON-1526: Location field types cause DocValuesField ap...

2018-04-19 Thread justinleet
Github user justinleet commented on the issue: https://github.com/apache/metron/pull/995 @merrimanr Re: dynamic fields. Looks like you can get them in 6.6, but possibly not 5.5 https://lucene.apache.org/solr/6_6_0/solr-solrj/org/apache/solr/client/solrj/response/LukeResponse

[GitHub] metron issue #995: METRON-1526: Location field types cause DocValuesField ap...

2018-04-19 Thread mmiklavc
Github user mmiklavc commented on the issue: https://github.com/apache/metron/pull/995 @justinleet > They may not know, or even be able to easily discover, what fields are poly types and what aren't. @cestella, @merrimanr >>Are date fields polyfields?

[GitHub] metron issue #995: METRON-1526: Location field types cause DocValuesField ap...

2018-04-19 Thread justinleet
Github user justinleet commented on the issue: https://github.com/apache/metron/pull/995 To add onto my last comment, the subfields probably get created right now, but get caught by the "*" field that's type "ignored", which seems wrong. ---

[GitHub] metron issue #995: METRON-1526: Location field types cause DocValuesField ap...

2018-04-19 Thread justinleet
Github user justinleet commented on the issue: https://github.com/apache/metron/pull/995 @merrimanr Not sure if it returns dynamic fields. I didn't see them, but there might be a different way to grab them. Again, it's a really cursory idea that needs more research if other things f

[GitHub] metron issue #995: METRON-1526: Location field types cause DocValuesField ap...

2018-04-19 Thread justinleet
Github user justinleet commented on the issue: https://github.com/apache/metron/pull/995 @mmiklavc The main worry is I'm not honestly sure if that's exactly how it works. I didn't dig into the caveats or any issues that we might hit with it. I also don't know what constraints get pl

[GitHub] metron issue #995: METRON-1526: Location field types cause DocValuesField ap...

2018-04-19 Thread merrimanr
Github user merrimanr commented on the issue: https://github.com/apache/metron/pull/995 @justinleet will this return dynamic fields? Why did you change the schema? ---

[GitHub] metron issue #995: METRON-1526: Location field types cause DocValuesField ap...

2018-04-19 Thread justinleet
Github user justinleet commented on the issue: https://github.com/apache/metron/pull/995 I agree, I'm not seeing how partial updates solve the general problem of "We need to avoid updates against derived fields". End users aren't going to necessarily know what a generated field is.

[GitHub] metron issue #995: METRON-1526: Location field types cause DocValuesField ap...

2018-04-19 Thread mmiklavc
Github user mmiklavc commented on the issue: https://github.com/apache/metron/pull/995 @justinleet Your example suggests that by using a LukeRequest and calling getFieldInfo() on the result, we should be able to naturally obtain the filtered list of fields that can be used as a whitel

[GitHub] metron issue #995: METRON-1526: Location field types cause DocValuesField ap...

2018-04-19 Thread justinleet
Github user justinleet commented on the issue: https://github.com/apache/metron/pull/995 The schema declaration itself might have changed between 5.x and 6.x. I saw something about it, but didn't double check since I hadn't looked in to the request at all yet. The request itself shou

[GitHub] metron issue #995: METRON-1526: Location field types cause DocValuesField ap...

2018-04-19 Thread cestella
Github user cestella commented on the issue: https://github.com/apache/metron/pull/995 @justinleet Looks like [LukeRequestHandler](http://lucene.apache.org/solr/5_4_1/solr-core/org/apache/solr/handler/admin/LukeRequestHandler.html) has been around since Solr 1.2. ---

[GitHub] metron issue #995: METRON-1526: Location field types cause DocValuesField ap...

2018-04-19 Thread cestella
Github user cestella commented on the issue: https://github.com/apache/metron/pull/995 > I'm not following. A patch would only include the field to be added, changed, or removed. JSONPatch, the input that we accept for the `patch` command, offers more than add, change and rem

[GitHub] metron issue #995: METRON-1526: Location field types cause DocValuesField ap...

2018-04-19 Thread merrimanr
Github user merrimanr commented on the issue: https://github.com/apache/metron/pull/995 > Can you confirm that this isn't an issue when you set docValues to false, like in my previous comment? I will research this. > Are date fields polyfields? I don't know

[GitHub] metron issue #995: METRON-1526: Location field types cause DocValuesField ap...

2018-04-19 Thread justinleet
Github user justinleet commented on the issue: https://github.com/apache/metron/pull/995 It might also be possible to do this the other way, rather than getting a list of all fields then figuring out which are polyFields, we might be able to get just all valid fields. Substantially m

[GitHub] metron issue #995: METRON-1526: Location field types cause DocValuesField ap...

2018-04-19 Thread cestella
Github user cestella commented on the issue: https://github.com/apache/metron/pull/995 Before going down the path of partial document update: * Can you confirm that this isn't an issue when you set docValues to false, like in my previous comment? * Are date fields polyfields?

[GitHub] metron issue #995: METRON-1526: Location field types cause DocValuesField ap...

2018-04-19 Thread merrimanr
Github user merrimanr commented on the issue: https://github.com/apache/metron/pull/995 After doing more research on this, managing fields included when updating a document looks to be more complex than we thought. The `isPolyField()` method is part of the internal API and we don't h

[GitHub] metron issue #995: METRON-1526: Location field types cause DocValuesField ap...

2018-04-18 Thread justinleet
Github user justinleet commented on the issue: https://github.com/apache/metron/pull/995 An interesting consequence of not using docValues is https://lucene.apache.org/solr/guide/6_6/docvalues.html#DocValues-RetrievingDocValuesDuringSearch >When useDocValuesAsStored="false", n

[GitHub] metron issue #995: METRON-1526: Location field types cause DocValuesField ap...

2018-04-18 Thread cestella
Github user cestella commented on the issue: https://github.com/apache/metron/pull/995 Another possibility, which these guys [did](https://github.com/ukwa/webarchive-discovery/issues/105) is disabling docValues for the spatial field. I'm not clear on the consequences of that. ---

[GitHub] metron issue #995: METRON-1526: Location field types cause DocValuesField ap...

2018-04-18 Thread cestella
Github user cestella commented on the issue: https://github.com/apache/metron/pull/995 One thing that I would question, looking at that answer, is whether it's just a LatLonType problem or if it really is a problem with all poly fields. I would suggest trying to replicate with a curr

[GitHub] metron issue #995: METRON-1526: Location field types cause DocValuesField ap...

2018-04-18 Thread cestella
Github user cestella commented on the issue: https://github.com/apache/metron/pull/995 Yes, but according to that thread, this became a problem in 6.5. In 5.5, presumably, this worked. Should we, perhaps, have different schema for 5.x vs 6.x? ---

[GitHub] metron issue #995: METRON-1526: Location field types cause DocValuesField ap...

2018-04-18 Thread merrimanr
Github user merrimanr commented on the issue: https://github.com/apache/metron/pull/995 I did see that but that field type doesn't exist in Solr 5.5. ---

[GitHub] metron issue #995: METRON-1526: Location field types cause DocValuesField ap...

2018-04-18 Thread cestella
Github user cestella commented on the issue: https://github.com/apache/metron/pull/995 I'm sure you did this, @merrimanr, but I googled your error and ran upon [this](https://stackoverflow.com/questions/44375034/solr-docvaluesfield-appears-more-than-once-in-this-document-solr-6-5). A

[GitHub] metron issue #995: METRON-1526: Location field types cause DocValuesField ap...

2018-04-18 Thread ottobackwards
Github user ottobackwards commented on the issue: https://github.com/apache/metron/pull/995 @cestella I think that must be the case, given that we cannot assume the fields present beyond a certain set. ---

[GitHub] metron issue #995: METRON-1526: Location field types cause DocValuesField ap...

2018-04-18 Thread cestella
Github user cestella commented on the issue: https://github.com/apache/metron/pull/995 @merrimanr @ottobackwards Would it be fair to say that a sufficient solution would be one which handles all types such that `isPolyField() == true`? This should handle the cases in which users use

[GitHub] metron issue #995: METRON-1526: Location field types cause DocValuesField ap...

2018-04-18 Thread merrimanr
Github user merrimanr commented on the issue: https://github.com/apache/metron/pull/995 There are a finite number of field types that ship with Solr so the fact that someone can create an infinite number of fields really isn't an issue. Any field that is created and stored in Solr ha

[GitHub] metron issue #995: METRON-1526: Location field types cause DocValuesField ap...

2018-04-18 Thread ottobackwards
Github user ottobackwards commented on the issue: https://github.com/apache/metron/pull/995 I am thinking from the Metron-as-a-framework POV here. Right now, someone can create a parser, an enrichment, and (in this PR's world) a Solr/ES schema for their input. This PR makes assumpt

[GitHub] metron issue #995: METRON-1526: Location field types cause DocValuesField ap...

2018-04-18 Thread merrimanr
Github user merrimanr commented on the issue: https://github.com/apache/metron/pull/995 Let's say you want to add/remove/change a field in an existing document. You would lookup the document, make your change, then reindex the document. The problem is that expanded fields (polyfield
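The lookup, change, reindex cycle described above could drop the expanded sub-fields before the document is written back. The sketch below assumes those sub-fields follow the `_0_coordinate` / `_1_coordinate` naming seen elsewhere in this thread; the regex, the function name, and the sample values are hypothetical, and no Solr client calls are shown.

```python
import re

# Assumed naming convention for expanded sub-fields, e.g.
# "..._0_coordinate" and "..._1_coordinate".
SUBFIELD_PATTERN = re.compile(r"_\d+_coordinate$")


def strip_expanded_fields(document):
    """Return a copy of the document without fields that look like expanded sub-fields."""
    return {
        name: value
        for name, value in document.items()
        if not SUBFIELD_PATTERN.search(name)
    }


# Hypothetical document as it might come back from a lookup.
looked_up = {
    "guid": "abc",
    "location_point": "1.0,2.0",
    "location_point_0_coordinate": "1.0",
    "location_point_1_coordinate": "2.0",
}

# Apply the user's change, then strip derived fields before reindexing.
updated = dict(looked_up, comment="reviewed")
to_reindex = strip_expanded_fields(updated)
print(to_reindex)
```

Matching on a name pattern is brittle if the sub-field suffix is configured differently, which is one reason the thread also discusses schema-based approaches.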

[GitHub] metron issue #995: METRON-1526: Location field types cause DocValuesField ap...

2018-04-18 Thread cestella
Github user cestella commented on the issue: https://github.com/apache/metron/pull/995 @ottobackwards At the very least, if this is indeed a problem with all [PointType](http://lucene.apache.org/solr/6_3_0/solr-core/org/apache/solr/schema/PointType.html) fields and not just lat/long f

[GitHub] metron issue #995: METRON-1526: Location field types cause DocValuesField ap...

2018-04-18 Thread mmiklavc
Github user mmiklavc commented on the issue: https://github.com/apache/metron/pull/995 > This causes a problem in our DAO layer because we don't do partial updates (we reindex the whole document) and these expanded fields are included in the updated document. @merrimanr Can y

[GitHub] metron issue #995: METRON-1526: Location field types cause DocValuesField ap...

2018-04-18 Thread ottobackwards
Github user ottobackwards commented on the issue: https://github.com/apache/metron/pull/995 Is it possible for parser writers to create other fields that have this issue? ---

[GitHub] metron issue #995: METRON-1526: Location field types cause DocValuesField ap...

2018-04-18 Thread cestella
Github user cestella commented on the issue: https://github.com/apache/metron/pull/995 If you can do it without schema caching, that'd be best. That being said, hitting solr each query to retrieve type information is likely to be a bad idea as well. ---

[GitHub] metron issue #995: METRON-1526: Location field types cause DocValuesField ap...

2018-04-18 Thread merrimanr
Github user merrimanr commented on the issue: https://github.com/apache/metron/pull/995 I can play around with this. Extending my comment above, we might be able to use this method to determine which fields are expanded rather than just looking for the "subFieldSuffix" attribute. Al

[GitHub] metron issue #995: METRON-1526: Location field types cause DocValuesField ap...

2018-04-18 Thread cestella
Github user cestella commented on the issue: https://github.com/apache/metron/pull/995 It seems as though we should be checking the type for `isPolyField()` (see [here](http://lucene.apache.org/solr/6_3_0/solr-core/org/apache/solr/schema/PointType.html#isPolyField--)). The p