Please give us some more time to get back to you. ~ ashutosh
On 2021/04/01 17:49:17, "Rao, Chaitra" <[email protected]> wrote:
> Hi All,
>
> We have a usecase to create lineage between two hive tables.
> This works well if the table (which is input to lineage process) does not
> have any classification (tag) associated with it.
>
> However, we have seen cases in production, where input hive table has
> classifications and it is also associated in other 40,000 lineage processes.
>
> Also another point to note here is the classification propagation is modelled
> as ‘ONE_TO_TWO’.
>
> The case where it fails is,
> When the 40001 lineage process is created between the input and output
> tables, as tag propagation is marked as ‘ONE_TO_TWO’, the propagation starts
> happening to all the other 40,000 lineage processes as well that the input
> hive table is associated with.
>
> We are on Atlas2.0 and in this version, this above operation did not complete
> even after 10 hours.
>
> On further investigation, we found that Atlas2.0 code refers to this gremlin
> query “TAG_PROPAGATION_IMPACTED_INSTANCES_FOR_REMOVAL” and this query is
> responsible in fetching the 40000 lineage processes.
>
> We also saw that in the latest versions of Atlas, gremlin query is replaced
> with in-memory traversal, but it does traverse all the 40,000
> (https://issues.apache.org/jira/browse/ATLAS-3563).
>
> However, what we want in our usecase is to propagate classification only to
> the new hive table output that was being created in the lineage workflow,
> instead of iterating over the remaining older 40,000 processes.
>
> In order to achieve this, we have made some code changes as below in
> DeleteHandlerV1.
>
> Before
>
> final List<AtlasVertex> propagatedEntityVertices =
>     CollectionUtils.isNotEmpty(classificationVertices) ?
>     graphHelper.getIncludedImpactedVerticesWithReferences(toVertex,
>     getRelationshipGuid(edge)) : null;
>
> After
>
> final List<AtlasVertex> propagatedEntityVertices = new ArrayList<>();
>
> propagatedEntityVertices.add(toVertex);
>
> We wanted to know if you see any other side effects if we go via this route,
> where we do not get the impactedVertices at all?
>
> It would be very helpful if you could please shed some light on this.
>
> Many Thanks,
> Chaitra
>
