Please give us some more time to get back to you.

~ ashutosh

On 2021/04/01 17:49:17, "Rao, Chaitra" <[email protected]> wrote: 
> Hi All,
> 
> We have a use case to create lineage between two Hive tables.
> This works well if the table (which is the input to the lineage process) does 
> not have any classification (tag) associated with it.
> 
> However, we have seen cases in production where the input Hive table has 
> classifications and is also associated with 40,000 other lineage processes.
> 
> Another point to note is that classification propagation is modelled as 
> ‘ONE_TO_TWO’.
> 
> The failing case is this: when the 40,001st lineage process is created 
> between the input and output tables, because tag propagation is marked as 
> ‘ONE_TO_TWO’, propagation also starts for all the other 40,000 lineage 
> processes that the input Hive table is associated with.
> 
> We are on Atlas 2.0, and in this version the above operation did not 
> complete even after 10 hours.
> 
> On further investigation, we found that the Atlas 2.0 code uses the Gremlin 
> query “TAG_PROPAGATION_IMPACTED_INSTANCES_FOR_REMOVAL”, and this query is 
> responsible for fetching the 40,000 lineage processes.
> 
> We also saw that in the latest versions of Atlas the Gremlin query is 
> replaced with an in-memory traversal, but it still traverses all 40,000 
> processes (https://issues.apache.org/jira/browse/ATLAS-3563).
> 
> However, what we want in our use case is to propagate the classification only 
> to the new output Hive table being created in the lineage workflow, instead 
> of iterating over the 40,000 older processes.
> 
> To achieve this, we have made the following code change in DeleteHandlerV1.
> 
> Before
> 
> final List<AtlasVertex> propagatedEntityVertices = 
>     CollectionUtils.isNotEmpty(classificationVertices) 
>         ? graphHelper.getIncludedImpactedVerticesWithReferences(toVertex, 
>               getRelationshipGuid(edge)) 
>         : null;
> 
> After
> 
> final List<AtlasVertex> propagatedEntityVertices = new ArrayList<>();
> propagatedEntityVertices.add(toVertex);
> 
> We would like to know whether you see any side effects of taking this route, 
> where we do not fetch the impactedVertices at all.
> 
> It would be very helpful if you could please shed some light on this.
> 
> Many Thanks,
> Chaitra
> 
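
One thing the Before/After change in the quoted message raises is whether vertices downstream of toVertex still receive the tag. The sketch below is a toy model only, assuming a simple directed graph with strings as vertices. The class PropagationSketch, its methods, and the vertex names are hypothetical and are not Atlas code or Atlas API. It contrasts collecting everything reachable from a vertex with collecting only the vertex itself.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: strings stand in for graph vertices; nothing here is Atlas API.
class PropagationSketch {
    // Adjacency list: vertex -> vertices a tag would propagate to.
    private final Map<String, List<String>> edges = new HashMap<>();

    void addEdge(String from, String to) {
        edges.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
    }

    // Analogue of the "Before" snippet: every vertex reachable from start (breadth-first).
    Set<String> reachableFrom(String start) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(start);
        while (!queue.isEmpty()) {
            String v = queue.poll();
            if (seen.add(v)) {
                queue.addAll(edges.getOrDefault(v, Collections.emptyList()));
            }
        }
        return seen;
    }

    // Analogue of the "After" snippet: only the vertex at the far end of the new edge.
    Set<String> onlyTarget(String toVertex) {
        Set<String> result = new LinkedHashSet<>();
        result.add(toVertex);
        return result;
    }

    // Hypothetical lineage: one tagged input table, one old and one new process.
    static PropagationSketch example() {
        PropagationSketch g = new PropagationSketch();
        g.addEdge("input_table", "old_process");
        g.addEdge("old_process", "old_output");
        g.addEdge("input_table", "new_process");
        g.addEdge("new_process", "new_output");
        return g;
    }

    public static void main(String[] args) {
        PropagationSketch g = example();
        System.out.println(g.reachableFrom("new_process")); // [new_process, new_output]
        System.out.println(g.onlyTarget("new_process"));    // [new_process]
    }
}
```

In this toy graph the restricted version never reaches new_output, so if the real code relies on the traversal to carry a tag beyond toVertex, that is one place to check. Whether this applies to Atlas depends on how its edges and propagation are actually wired, which the sketch does not model.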
