Hi All,

We have a use case that creates lineage between two Hive tables.
This works well if the input table of the lineage process does not have
any classification (tag) associated with it.

However, we have seen cases in production where the input Hive table has
classifications and is already associated with 40,000 other lineage processes.

Another point to note is that classification propagation is modelled as
‘ONE_TO_TWO’.

The failing case is as follows.
When the 40,001st lineage process is created between the input and output
tables, tag propagation is marked as ‘ONE_TO_TWO’, so propagation also starts
for all of the other 40,000 lineage processes that the input Hive table is
associated with.
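
To make the fan-out concrete, here is a toy model in plain Java. It is not Atlas code: the class, the vertex names and the graph shape (one input table feeding N processes, each with one output table) are assumptions made only for illustration. It shows how a walk that follows every propagation edge from the input table grows with the number of existing processes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class PropagationFanOut {
    // Toy directed graph: vertex name -> vertices a tag would propagate to.
    private final Map<String, List<String>> edges = new HashMap<>();

    void addEdge(String from, String to) {
        edges.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
    }

    // Breadth-first walk collecting every vertex reachable from start (start excluded).
    Set<String> reachableFrom(String start) {
        Set<String> seen = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(start);
        while (!queue.isEmpty()) {
            String vertex = queue.poll();
            if (seen.add(vertex)) {
                queue.addAll(edges.getOrDefault(vertex, List.of()));
            }
        }
        seen.remove(start);
        return seen;
    }

    // Hypothetical lineage shape: input_table -> process_i -> output_table_i.
    static PropagationFanOut build(int processCount) {
        PropagationFanOut graph = new PropagationFanOut();
        for (int i = 1; i <= processCount; i++) {
            graph.addEdge("input_table", "process_" + i);
            graph.addEdge("process_" + i, "output_table_" + i);
        }
        return graph;
    }

    public static void main(String[] args) {
        // 40,001 processes plus 40,001 output tables are visited.
        System.out.println(build(40_001).reachableFrom("input_table").size()); // prints 80002
    }
}
```

In this toy model, adding one more process still means walking every vertex that was already reachable, which matches the behaviour we observe.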

We are on Atlas 2.0, and in this version the operation did not complete
even after 10 hours.

On further investigation, we found that the Atlas 2.0 code uses the Gremlin
query “TAG_PROPAGATION_IMPACTED_INSTANCES_FOR_REMOVAL”, and this query is
responsible for fetching the 40,000 lineage processes.

We also saw that in the latest versions of Atlas the Gremlin query has been
replaced with an in-memory traversal, but it still traverses all 40,000
processes (https://issues.apache.org/jira/browse/ATLAS-3563).

However, what we want in our use case is to propagate the classification only
to the new output Hive table created in the lineage workflow, instead of
iterating over the 40,000 older processes.

To achieve this, we made the following code change in DeleteHandlerV1.

Before:

    final List<AtlasVertex> propagatedEntityVertices =
        CollectionUtils.isNotEmpty(classificationVertices)
            ? graphHelper.getIncludedImpactedVerticesWithReferences(toVertex, getRelationshipGuid(edge))
            : null;

After:

    // Propagate only to the edge's target vertex; skip the impacted-vertices traversal.
    final List<AtlasVertex> propagatedEntityVertices = new ArrayList<>();
    propagatedEntityVertices.add(toVertex);
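
To make our question concrete, the toy model below contrasts the two lists. It is plain Java, not Atlas code: the class, method and vertex names are hypothetical, and including the start vertex in the full traversal is an assumption of the sketch rather than something we have confirmed in Atlas. It only illustrates that the modified code leaves out anything reachable beyond toVertex.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ImpactedVerticesComparison {
    // Toy graph: the edge's target vertex has further vertices downstream of it.
    static final Map<String, List<String>> EDGES = Map.of(
        "to_vertex", List.of("downstream_1"),
        "downstream_1", List.of("downstream_2"),
        "downstream_2", List.of("downstream_3"));

    // Stand-in for the original behaviour: the start vertex plus everything reachable from it.
    static List<String> fullTraversal(String toVertex) {
        Set<String> seen = new LinkedHashSet<>();
        collect(toVertex, seen);
        return new ArrayList<>(seen);
    }

    // Depth-first helper; LinkedHashSet keeps visit order and guards against cycles.
    private static void collect(String vertex, Set<String> seen) {
        if (seen.add(vertex)) {
            for (String next : EDGES.getOrDefault(vertex, List.of())) {
                collect(next, seen);
            }
        }
    }

    // Stand-in for the modified behaviour: only the edge's target vertex.
    static List<String> targetOnly(String toVertex) {
        List<String> vertices = new ArrayList<>();
        vertices.add(toVertex);
        return vertices;
    }

    public static void main(String[] args) {
        System.out.println(fullTraversal("to_vertex")); // [to_vertex, downstream_1, downstream_2, downstream_3]
        System.out.println(targetOnly("to_vertex"));    // [to_vertex]
    }
}
```

Whether the omitted vertices matter in Atlas depends on whether anything already exists downstream of toVertex at the moment the edge is created, which we have not verified.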





We would like to know whether you see any side effects if we take this route,
where we do not fetch the impacted vertices at all.

It would be very helpful if you could shed some light on this.

Many Thanks,
Chaitra
