Hi All,

We have a use case to create lineage between two Hive tables. This works well if the table that is the input to the lineage process does not have any classification (tag) associated with it.
However, we have seen cases in production where the input Hive table has classifications and is also associated with 40,000 other lineage processes. Another point to note is that classification propagation is modelled as 'ONE_TO_TWO'.

The failing case is this: when the 40,001st lineage process is created between the input and output tables, because tag propagation is marked as 'ONE_TO_TWO', propagation also starts happening across all of the other 40,000 lineage processes that the input Hive table is associated with. We are on Atlas 2.0, and in this version the operation did not complete even after 10 hours.

On further investigation, we found that the Atlas 2.0 code refers to the Gremlin query "TAG_PROPAGATION_IMPACTED_INSTANCES_FOR_REMOVAL", and this query is responsible for fetching the 40,000 lineage processes. We also saw that in the latest versions of Atlas the Gremlin query has been replaced with an in-memory traversal, but it still traverses all 40,000 (https://issues.apache.org/jira/browse/ATLAS-3563).

What we want in our use case is to propagate the classification only to the new output Hive table being created in the lineage workflow, instead of iterating over the 40,000 older processes. To achieve this, we made the following code change in DeleteHandlerV1.

Before:

    final List<AtlasVertex> propagatedEntityVertices = CollectionUtils.isNotEmpty(classificationVertices)
            ? graphHelper.getIncludedImpactedVerticesWithReferences(toVertex, getRelationshipGuid(edge))
            : null;

After:

    final List<AtlasVertex> propagatedEntityVertices = new ArrayList<>();
    propagatedEntityVertices.add(toVertex);

Do you see any side effects if we go this route, where we do not fetch the impacted vertices at all? It would be very helpful if you could shed some light on this.

Many Thanks,
Chaitra
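
P.S. To make the question concrete, here is a minimal sketch of the difference as we understand it. It is a toy model and not Atlas code: vertices are plain strings, and the class, method, and vertex names (PropagationSketch, allReachable, targetOnly, downstream_process, downstream_table) are all hypothetical. allReachable stands in for a full impacted-vertices traversal, and targetOnly stands in for our patched behaviour.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: vertices are strings and edges are a plain map. None of this is Atlas API.
class PropagationSketch {

    // Hypothetical stand-in for a full impacted-vertices traversal:
    // breadth-first walk over every vertex reachable from 'start'.
    static List<String> allReachable(Map<String, List<String>> edges, String start) {
        Set<String> visited = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(start);
        while (!queue.isEmpty()) {
            String vertex = queue.poll();
            if (visited.add(vertex)) {
                // Only expand a vertex the first time it is seen.
                queue.addAll(edges.getOrDefault(vertex, List.of()));
            }
        }
        return new ArrayList<>(visited);
    }

    // Hypothetical stand-in for the patched behaviour: the target vertex only.
    static List<String> targetOnly(String start) {
        List<String> result = new ArrayList<>();
        result.add(start);
        return result;
    }

    public static void main(String[] args) {
        // Assumed example graph: the new output table already feeds one further process.
        Map<String, List<String>> edges = Map.of(
                "new_output_table", List.of("downstream_process"),
                "downstream_process", List.of("downstream_table"));

        System.out.println("full traversal: " + allReachable(edges, "new_output_table"));
        System.out.println("target only:    " + targetOnly("new_output_table"));
    }
}
```

In this toy model the target-only version never visits downstream_process or downstream_table. Whether skipping such vertices matters for classification propagation in Atlas is part of what we would like to confirm.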
