Github user justinleet commented on the issue:
https://github.com/apache/metron/pull/995
+1 by inspection. Thanks a lot for the work on this.
---
Github user merrimanr commented on the issue:
https://github.com/apache/metron/pull/995
Thanks @cestella and @mmiklavc. Just waiting on a +1 from @justinleet and
we'll call it done.
---
Github user mmiklavc commented on the issue:
https://github.com/apache/metron/pull/995
Lgtm with these changes. I'm +1 by inspection.
---
Github user merrimanr commented on the issue:
https://github.com/apache/metron/pull/995
@mmiklavc let me know if the recent commit satisfies your feedback.
---
Github user cestella commented on the issue:
https://github.com/apache/metron/pull/995
@justinleet @merrimanr Nope, I'm good. I'll give my +1 (pending acceptance
by the rest of the people who have commented here).
---
Github user merrimanr commented on the issue:
https://github.com/apache/metron/pull/995
Latest commit fixes the copy/paste error and adds copyFields documentation
to the README.
---
Github user justinleet commented on the issue:
https://github.com/apache/metron/pull/995
I think, based on my understanding, that we document around the copy fields
limitation.
@cestella @mmiklavc are you both good with the approach of documenting the
schema definitions for L
Github user merrimanr commented on the issue:
https://github.com/apache/metron/pull/995
Yes we can set docValues and stored to false for copyFields.
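A minimal sketch of what such a copyField declaration could look like, generated here with Python only so it can be checked. The field names `ip_src_addr` and `ip_addr_all` and the `string` type are hypothetical placeholders, not taken from the Metron schemas.

```python
import xml.etree.ElementTree as ET


def copy_field_xml(source, dest, field_type="string"):
    """Return schema XML for a copyField whose destination is neither
    stored nor backed by docValues (names and type are placeholders)."""
    dest_field = ET.Element("field", {
        "name": dest,
        "type": field_type,
        "indexed": "true",
        "stored": "false",      # keep the copy out of returned documents
        "docValues": "false",   # and out of docValues-as-stored results
    })
    copy = ET.Element("copyField", {"source": source, "dest": dest})
    return [ET.tostring(dest_field, encoding="unicode"),
            ET.tostring(copy, encoding="unicode")]


for line in copy_field_xml("ip_src_addr", "ip_addr_all"):
    print(line)
```

With both flags off, the destination field should not come back in a search result, so it would not be re-sent when the whole document is reindexed.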
---
Github user justinleet commented on the issue:
https://github.com/apache/metron/pull/995
@merrimanr Can the copyFields destination be `docValues=false`? I'm
inclined to just document that requirement (assuming it works), and not worry
too much about it until someone actually needs so
Github user merrimanr commented on the issue:
https://github.com/apache/metron/pull/995
> So, from the last few examples discussed it suggests to me that being a
polyfield is actually a problem, but it's only part of the total reason for the
problem. Under the hood they are setting st
Github user justinleet commented on the issue:
https://github.com/apache/metron/pull/995
@mmiklavc That sounds right, yes. For returning dynamic fields (at least on
the known problematic fields), I don't think it's ever really
advantageous. At most it avoids a split on the LatLon
Github user mmiklavc commented on the issue:
https://github.com/apache/metron/pull/995
@justinleet - Thanks for that write up - that helps considerably.
Just to round out @merrimanr 's comment about polyfields:
>At this point I question whether a field being a polyfiel
Github user merrimanr commented on the issue:
https://github.com/apache/metron/pull/995
Thanks for the thorough reply @justinleet. Everything you have said makes
sense to me.
In my testing I also found copy fields to be an issue if they were set to
stored. Same problem as L
Github user justinleet commented on the issue:
https://github.com/apache/metron/pull/995
@merrimanr Let me replay my understanding to see if I'm on the right track.
The problem we have is that we're returning fields that we can't reindex as
a whole document when we run a glob
Github user merrimanr commented on the issue:
https://github.com/apache/metron/pull/995
A note about the added integration test. I moved the test to the
SolrUpdateIntegrationTest because this issue is specific to Solr. Adding an
additional test exposed a problem in our Solr in memor
Github user merrimanr commented on the issue:
https://github.com/apache/metron/pull/995
To give an idea of how I arrived at the conclusion that other fields are
not a problem, this is how I tested it. I will use currency as an example.
First I added this to our bro schema:
```
Github user merrimanr commented on the issue:
https://github.com/apache/metron/pull/995
Just to clarify, both `docValues` and `stored` need to be set to false.
If either one is true then it fails.
At this point I question whether a field being a polyfield is causing t
Github user cestella commented on the issue:
https://github.com/apache/metron/pull/995
I'd be ok with going the `docValues=false` route, but I'm really curious as
to why other polytypes wouldn't have this issue. I'll hold for that
justification.
---
Github user justinleet commented on the issue:
https://github.com/apache/metron/pull/995
If we're going to the `docValues=false` + documentation route, we should
have the documentation be more generic than just the LatLon and Point (e.g.
custom types that may get used in more advance
Github user merrimanr commented on the issue:
https://github.com/apache/metron/pull/995
The latest commit reverts the original fix in this PR, adds the extra
dynamic field to the existing schemas, and adds a section in the Solr README
documenting this for LatLon and PointType field ty
Github user merrimanr commented on the issue:
https://github.com/apache/metron/pull/995
Adding this dynamic field worked for me as well. I tested in a Solr
installation as well as the integration test. I believe this will solve our
issue with the older LatLon type. The only other f
Github user justinleet commented on the issue:
https://github.com/apache/metron/pull/995
For the integration test, adding the specific dynamic fields for the
underlying LatLon seems to take care of the problem for that type.
e.g. this makes the test you added pass:
```
Github user merrimanr commented on the issue:
https://github.com/apache/metron/pull/995
Yes it should be simple string values or, in other words, the doc should
look the same minus the changes. I should have stated this in my previous
comment but ideally those *_coordinate fields wou
Github user mmiklavc commented on the issue:
https://github.com/apache/metron/pull/995
```
{
"enrichments.geo.ip_dst_addr.location_point_0_coordinate": [
"34.0438",
"34.0438"
],
"enrichments.geo.ip_dst_addr.location_point_1_coordinate": [
"-
Github user merrimanr commented on the issue:
https://github.com/apache/metron/pull/995
Another potential solution I thought of is to only return the intersection
of the fields defined in the schema with fields that are present in a document.
This could potentially eliminate the need
Github user merrimanr commented on the issue:
https://github.com/apache/metron/pull/995
I did some additional testing and here are my notes. I've attempted to
compile a list of all the proposed solutions and their outcomes.
> checking the type for isPolyField()
This
Github user merrimanr commented on the issue:
https://github.com/apache/metron/pull/995
Just added an integration test that demonstrates this behavior. I had to
switch from the test schema to the schema that is actually deployed and remove
the HBaseDao from this particular test to ge
Github user merrimanr commented on the issue:
https://github.com/apache/metron/pull/995
I've been changing/uploading schemas in full dev with the Solr CLI and
using the Solr Admin console to test. I can write up a list of commands
that mimic what I'm doing in the console or crea
Github user cestella commented on the issue:
https://github.com/apache/metron/pull/995
@merrimanr Ok, that's fine, we can do it on full-dev. Can you please
submit a step-by-step manual testing script to exhibit this issue? I was
asking for an integration test because then it'll be r
Github user merrimanr commented on the issue:
https://github.com/apache/metron/pull/995
If you add the "-h node1" option to the start command in install_solr.sh
you can develop locally against Solr in full dev. The line ends up looking
like:
```
su $SOLR_USER -c "bin/solr -e
Github user cestella commented on the issue:
https://github.com/apache/metron/pull/995
Can we get this problem exhibited in the integration tests? It would help
us try potential solutions out easily.
---
Github user merrimanr commented on the issue:
https://github.com/apache/metron/pull/995
@justinleet I tested that and it looks like it's just what the schema
defines. The only way we could use those is if we did a regex match on the
actual fields in a document.
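A rough sketch of that kind of regex match, assuming the dynamic field patterns are already known and use a single leading or trailing `*`. The document below and its values are made up for illustration.

```python
import re


def glob_to_regex(pattern):
    """Translate a dynamic field glob such as '*_coordinate' into a regex."""
    return re.compile("^" + re.escape(pattern).replace(r"\*", ".*") + "$")


def derived_fields(doc, patterns):
    """Return the field names in doc that match any dynamic field pattern."""
    regexes = [glob_to_regex(p) for p in patterns]
    return sorted(name for name in doc
                  if any(r.match(name) for r in regexes))


# Hypothetical document: one LatLon-style field plus its expanded subfields.
doc = {
    "guid": "abc",
    "location_point": "1.0,2.0",
    "location_point_0_coordinate": "1.0",
    "location_point_1_coordinate": "2.0",
}
print(derived_fields(doc, ["*_coordinate"]))
```

The matched names are the ones that would have to be dropped before the document is written back.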
---
Github user justinleet commented on the issue:
https://github.com/apache/metron/pull/995
@merrimanr Re: dynamic fields. Looks like you can get them in 6.6, but
possibly not 5.5
https://lucene.apache.org/solr/6_6_0/solr-solrj/org/apache/solr/client/solrj/response/LukeResponse
Github user mmiklavc commented on the issue:
https://github.com/apache/metron/pull/995
@justinleet
> They may not know, or even be able to easily discover, what fields are
poly types and what aren't.
@cestella, @merrimanr
>>Are date fields polyfields?
Github user justinleet commented on the issue:
https://github.com/apache/metron/pull/995
To add onto my last comment, the subfields probably get created right now,
but get caught by the "*" field that's type "ignored", which seems wrong.
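To illustrate why a catch-all "*" pattern would swallow the subfields, here is a small sketch. It assumes that when several dynamic field patterns match a name, the longest pattern wins; the type names are placeholders.

```python
from fnmatch import fnmatchcase


def resolve_dynamic_field(name, dynamic_fields):
    """Return the (pattern, type) pair that claims name, or None.

    Assumption: among matching patterns, the longest one takes precedence.
    """
    matches = [(pattern, field_type)
               for pattern, field_type in dynamic_fields.items()
               if fnmatchcase(name, pattern)]
    if not matches:
        return None
    return max(matches, key=lambda m: len(m[0]))


subfield = "location_point_0_coordinate"

# Only the catch-all is defined: the subfield falls through to "ignored".
print(resolve_dynamic_field(subfield, {"*": "ignored"}))

# With a specific pattern added, the subfield gets a real (placeholder) type.
print(resolve_dynamic_field(subfield, {"*": "ignored",
                                       "*_coordinate": "some_double_type"}))
```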
---
Github user justinleet commented on the issue:
https://github.com/apache/metron/pull/995
@merrimanr Not sure if it returns dynamic fields. I didn't see them, but
there might be a different way to grab them. Again, it's a really cursory idea
that needs more research if other things f
Github user justinleet commented on the issue:
https://github.com/apache/metron/pull/995
@mmiklavc The main worry is I'm not honestly sure if that's exactly how it
works. I didn't dig into the caveats or any issues that we might hit with it.
I also don't know what constraints get pl
Github user merrimanr commented on the issue:
https://github.com/apache/metron/pull/995
@justinleet will this return dynamic fields? Why did you change the schema?
---
Github user justinleet commented on the issue:
https://github.com/apache/metron/pull/995
I agree, I'm not seeing how partial updates solves the general problem of
"We need to avoid updates against derived fields". End users aren't going to
necessarily know what a generated field is.
Github user mmiklavc commented on the issue:
https://github.com/apache/metron/pull/995
@justinleet Your example suggests that by using a LukeRequest and calling
getFieldInfo() on the result, we should be able to naturally obtain the
filtered list of fields that can be used as a whitel
Github user justinleet commented on the issue:
https://github.com/apache/metron/pull/995
The schema declaration itself might have changed between 5.x and 6.x. I
saw something about it, but didn't double check since I hadn't looked into the
request at all yet. The request itself shou
Github user cestella commented on the issue:
https://github.com/apache/metron/pull/995
@justinleet Looks like
[LukeRequestHandler](http://lucene.apache.org/solr/5_4_1/solr-core/org/apache/solr/handler/admin/LukeRequestHandler.html)
has been around since Solr 1.2.
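For anyone who wants to poke at the handler over HTTP rather than through SolrJ, a minimal sketch follows. The host, port, and collection name are assumptions for a full-dev style setup, and the response shape (a top-level `fields` map keyed by field name) is an assumption that should be checked against the Solr version in use.

```python
from urllib.parse import urlencode


def luke_url(base_url, collection, **params):
    """Build a URL for the Luke request handler of one collection."""
    query = urlencode({"wt": "json", **params})
    return f"{base_url}/{collection}/admin/luke?{query}"


def field_names(luke_response):
    """Pull the field names out of a parsed Luke JSON response.

    Assumes the response carries a top-level "fields" mapping.
    """
    return sorted(luke_response.get("fields", {}))


# Assumed host, port, and collection; adjust for the actual installation.
print(luke_url("http://node1:8983/solr", "bro", numTerms=0))

# Illustrative response fragment, not captured from a real server.
sample = {"fields": {"guid": {"type": "string"},
                     "ip_src_addr": {"type": "string"}}}
print(field_names(sample))
```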
---
Github user cestella commented on the issue:
https://github.com/apache/metron/pull/995
> I'm not following. A patch would only include the field to be added,
changed, or removed.
JSONPatch, the input that we accept for the `patch` command, offers more
than add, change and rem
Github user merrimanr commented on the issue:
https://github.com/apache/metron/pull/995
> Can you confirm that this isn't an issue when you set docValues to false,
like in my previous comment?
I will research this.
> Are date fields polyfields?
I don't know
Github user justinleet commented on the issue:
https://github.com/apache/metron/pull/995
It might also be possible to do this the other way, rather than getting a
list of all fields then figuring out which are polyFields, we might be able to
get just all valid fields. Substantially m
Github user cestella commented on the issue:
https://github.com/apache/metron/pull/995
Before going down the path of partial document update:
* Can you confirm that this isn't an issue when you set docValues to false,
like in my previous comment?
* Are date fields polyfields?
Github user merrimanr commented on the issue:
https://github.com/apache/metron/pull/995
After doing more research on this, managing fields included when updating a
document looks to be more complex than we thought. The `isPolyField()` method
is part of the internal API and we don't h
Github user justinleet commented on the issue:
https://github.com/apache/metron/pull/995
An interesting consequence of not using docValues is
https://lucene.apache.org/solr/guide/6_6/docvalues.html#DocValues-RetrievingDocValuesDuringSearch
>When useDocValuesAsStored="false", n
Github user cestella commented on the issue:
https://github.com/apache/metron/pull/995
Another possibility, which these guys
[did](https://github.com/ukwa/webarchive-discovery/issues/105), is disabling
docValues for the spatial field. I'm not clear on the consequences of that.
---
Github user cestella commented on the issue:
https://github.com/apache/metron/pull/995
One thing that I would question, looking at that answer, is whether it's
just a LatLonType problem or if it really is a problem with all poly fields. I
would suggest trying to replicate with a curr
Github user cestella commented on the issue:
https://github.com/apache/metron/pull/995
Yes, but according to that thread, this became a problem in 6.5. In 5.5,
presumably, this worked. Should we, perhaps, have different schema for 5.x vs
6.x?
---
Github user merrimanr commented on the issue:
https://github.com/apache/metron/pull/995
I did see that but that field type doesn't exist in Solr 5.5.
---
Github user cestella commented on the issue:
https://github.com/apache/metron/pull/995
I'm sure you did this, @merrimanr, but I googled your error and ran upon
[this](https://stackoverflow.com/questions/44375034/solr-docvaluesfield-appears-more-than-once-in-this-document-solr-6-5).
A
Github user ottobackwards commented on the issue:
https://github.com/apache/metron/pull/995
@cestella I think that must be the case, given that we cannot assume the
fields present beyond a certain set.
---
Github user cestella commented on the issue:
https://github.com/apache/metron/pull/995
@merrimanr @ottobackwards Would it be fair to say that a sufficient
solution would be one which handles all types such that `isPolyField() ==
true`? This should handle the cases in which users use
Github user merrimanr commented on the issue:
https://github.com/apache/metron/pull/995
There are a finite number of field types that ship with Solr so the fact
that someone can create an infinite number of fields really isn't an issue.
Any field that is created and stored in Solr ha
Github user ottobackwards commented on the issue:
https://github.com/apache/metron/pull/995
I am thinking of the Metron-as-a-framework POV here. Right now, someone
can create a parser, an enrichment, and (in this PR's world) a Solr/ES schema for
their input.
This PR makes assumpt
Github user merrimanr commented on the issue:
https://github.com/apache/metron/pull/995
Let's say you want to add/remove/change a field in an existing document.
You would lookup the document, make your change, then reindex the document.
The problem is that expanded fields (polyfield
Github user cestella commented on the issue:
https://github.com/apache/metron/pull/995
@ottobackwards At the very least, if this is indeed a problem with all
[PointType](http://lucene.apache.org/solr/6_3_0/solr-core/org/apache/solr/schema/PointType.html)
fields and not just lat/long f
Github user mmiklavc commented on the issue:
https://github.com/apache/metron/pull/995
> This causes a problem in our DAO layer because we don't do partial
updates (we reindex the whole document) and these expanded fields are included
in the updated document.
@merrimanr Can y
Github user ottobackwards commented on the issue:
https://github.com/apache/metron/pull/995
Is it possible for parser writers to create other fields that have this
issue?
---
Github user cestella commented on the issue:
https://github.com/apache/metron/pull/995
If you can do it without schema caching, that'd be best. That being said,
hitting Solr on each query to retrieve type information is likely to be a bad idea
as well.
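One middle ground between the two, sketched very roughly: keep the schema information in memory but reload it after a fixed interval. The class name, the loader callback, and the 300 second default are all hypothetical, and this is not how Metron currently does it.

```python
import time

_UNSET = object()


class SchemaCache:
    """Cache the result of a schema lookup and refresh it after a TTL."""

    def __init__(self, loader, ttl_seconds=300.0, clock=time.monotonic):
        self._loader = loader      # callable that fetches type info from Solr
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so the expiry can be tested
        self._value = _UNSET
        self._loaded_at = 0.0

    def get(self):
        """Return the cached schema info, reloading it if it has expired."""
        now = self._clock()
        if self._value is _UNSET or now - self._loaded_at >= self._ttl:
            self._value = self._loader()
            self._loaded_at = now
        return self._value
```

Queries inside the TTL window reuse the cached value, so Solr is only hit for type information once per interval, at the cost of schema changes taking up to one interval to be noticed.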
---
Github user merrimanr commented on the issue:
https://github.com/apache/metron/pull/995
I can play around with this. Extending my comment above, we might be able
to use this method to determine which fields are expanded rather than just
looking for the "subFieldSuffix" attribute. Al
Github user cestella commented on the issue:
https://github.com/apache/metron/pull/995
It seems as though we should be checking the type for `isPolyField()` (see
[here](http://lucene.apache.org/solr/6_3_0/solr-core/org/apache/solr/schema/PointType.html#isPolyField--)).
The p