No problem. I think your suggestion makes sense (it is essentially namespacing), although you might want to be more verbose (i.e. “Relationship” instead of “r”). Or maybe you really want to introduce proper namespacing here, not just a naming convention.
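This is a rough illustration of the namespacing idea. Relationship-introduced attributes live under a reserved prefix instead of sharing one flat namespace with the attributes a type declares itself. It is a hypothetical Python sketch, not how Atlas actually resolves attributes. `resolve_attributes` and `RELATIONSHIP_PREFIX` are made-up names, and `Relationship:` only stands in for whichever prefix (Madhan's `r:` or something more verbose) a real fix would adopt.

```python
# Hypothetical sketch, not Atlas code: attributes introduced by relationshipDefs
# are stored under a reserved prefix, so they cannot collide with the
# attributes an entity type declares itself.
RELATIONSHIP_PREFIX = "Relationship:"  # illustrative; Madhan proposed "r:"


def resolve_attributes(own_attrs, relationship_attrs, prefix=RELATIONSHIP_PREFIX):
    """Merge an entity type's own attributes with relationship-injected ones.

    own_attrs: attribute name -> type name, as declared in attributeDefs
    relationship_attrs: relationship end name -> name of the relationshipDef
    """
    resolved = dict(own_attrs)
    for end_name, rel_name in relationship_attrs.items():
        key = prefix + end_name
        # Only an own attribute that literally starts with the prefix can clash now.
        if key in resolved:
            raise ValueError(f"attribute {key!r} from {rel_name} is already defined")
        resolved[key] = rel_name
    return resolved


# spark_table declares its own "schema"; it also inherits the "schema" end that
# avro_schema_associatedEntities adds to DataSet. Both now coexist.
attrs = resolve_attributes(
    {"schema": "array<spark_column>"},
    {"schema": "avro_schema_associatedEntities"},
)
print(sorted(attrs))  # ['Relationship:schema', 'schema']
```

Under this scheme the only remaining collision is an own attribute whose name literally begins with the reserved prefix, which is easy to reject when a type is registered.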
I was also surprised that the definition on DataSet superseded the definition in the “inheriting” entity. What happened in our case is that we were testing the Atlas Spark connector against a fairly recent version of Atlas. You probably want to keep a somewhat stable “API” in this case: Atlas 1.0 was working, but the Atlas 1.1 branch wasn’t. Those are unexpected changes for a minor version bump, I would argue. Another suggestion is to fail with a conflict notice when the definition is loaded, instead of when entities are created. As mentioned, none of the error messages were helpful, and I really had to dig into Atlas’ source code to grasp what was happening.

B.

Sent from my iPad

> On 21 Oct 2018, at 18:34, Madhan Neethiraj <[email protected]> wrote:
>
> Bolke,
>
> Good to know you found the root cause; I apologize for the delay in
> responding to your first report of this issue.
>
> One approach to avoid such attribute name conflicts (between an entity’s own
> attribute and an attribute injected into an entity by a relationship) is to
> scope relationship-introduced attributes with a name prefix like “r:”. If you
> have any suggestions, please share. I will create a JIRA to track this issue
> shortly.
>
> Thanks,
> Madhan
>
>
> From: Bolke de Bruin <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Saturday, October 20, 2018 at 5:47 AM
> To: "[email protected]" <[email protected]>
> Subject: Re: invalid relationshipDef: avro_schema_associatedEntities: end
> type 1: DataSet, end type 2: spark_table
>
> Thanks Madhan,
>
> We figured this out ourselves and it was indeed the issue (sorry for not
> reporting back). One thing that could be a lot better is the error message,
> as it was not at all helpful and actually pointed us in the wrong direction.
> Obviously, handling this better in general would be appreciated.
>
> Cheers
> Bolke
>
>
> On 20 Oct 2018, at 03:23, Madhan Neethiraj <[email protected]> wrote:
>
> Bolke,
>
> This issue is caused by the use of an attribute named schema in spark_table,
> which clashes with the same-named attribute in its super-type DataSet; type
> DataSet got a relationship attribute named schema from the type definitions in
> 1065-avro_model.json (shown below).
>
> We need to think through the details of handling such conflicts, which might
> occur outside the control of type authors.
>
> In the meantime, here are a couple of options you can try:
> 1. If you are not using the types defined in 1065-avro_model.json, I would
> suggest removing these types and trying again. This should get you going.
> 2. Another option is to rename the attribute ‘spark_table.schema’ to, say,
> ‘spark_table.columns’.
>
> Hope this helps.
>
> Madhan
>
>
>
> Entity-def avro_schema: (from 1065-avro_model.json)
> {
>   "name": "avro_schema",
>   "description": "Atlas Type representing Abstract Top-level Avro Schema",
>   "superTypes": [
>     "avro_record"
>   ],
>   "typeVersion": "1.0",
>   "attributeDefs": [
>     {
>       "name": "namespace",
>       "typeName": "string",
>       "cardinality": "SINGLE",
>       "isIndexable": true,
>       "isOptional": false,
>       "isUnique": false
>     },
>     {
>       "name": "associatedEntities",
>       "typeName": "array<DataSet>",
>       "cardinality": "LIST",
>       "isIndexable": false,
>       "isOptional": true,
>       "isUnique": false
>     }
>   ]
> }
>
> Relationship-def avro_schema_associatedEntities: (from 1065-avro_model.json)
> {
>   "name": "avro_schema_associatedEntities",
>   "typeVersion": "1.0",
>   "relationshipCategory": "ASSOCIATION",
>   "endDef1": {
>     "type": "avro_schema",
>     "name": "associatedEntities",
>     "isContainer": false,
>     "cardinality": "SET",
>     "isLegacyAttribute": true
>   },
>   "endDef2": {
>     "type": "DataSet",
>     "name": "schema",      <== this adds a relationship attribute ‘schema’ to DataSet type
>     "isContainer": false,
>     "cardinality": "SET"
>   },
>   "propagateTags": "NONE"
> }
>
>
> From: Bolke de Bruin <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Wednesday, October 10, 2018 at 6:42 AM
> To: "[email protected]" <[email protected]>
> Subject: Re: invalid relationshipDef: avro_schema_associatedEntities: end
> type 1: DataSet, end type 2: spark_table
>
> This is the relevant debug output from Atlas:
>
> 2018-10-10 13:27:31,717 DEBUG - [pool-1-thread-9 - fc7a2150-23f3-4f53-8d09-420dbcbd4ffd:] ~ Finding edges for vertex[id=8248 type=spark_table guid=b6183317-63c8-4147-a91f-cddef2e5fdab] with label __spark_table.schema (GraphHelper:337)
> 2018-10-10 13:27:31,717 DEBUG - [pool-1-thread-9 - fc7a2150-23f3-4f53-8d09-420dbcbd4ffd:] ~ getRelationshipEdgeLabel(avro_schema_associatedEntities) (AtlasRelationshipStoreV2:757)
> 2018-10-10 13:27:31,718 DEBUG - [pool-1-thread-9 - fc7a2150-23f3-4f53-8d09-420dbcbd4ffd:] ~ Finding edges for vertex[id=8248 type=spark_table guid=b6183317-63c8-4147-a91f-cddef2e5fdab] with label r:avro_schema_associatedEntities (GraphHelper:337)
> 2018-10-10 13:27:31,719 DEBUG - [pool-1-thread-9 - fc7a2150-23f3-4f53-8d09-420dbcbd4ffd:] ~ <== AtlasErrorCode.getMessage([avro_schema_associatedEntities, DataSet, spark_table]) (AtlasErrorCode:221)
> 2018-10-10 13:27:31,720 DEBUG - [pool-1-thread-9 - fc7a2150-23f3-4f53-8d09-420dbcbd4ffd:] ~ ==> AtlasErrorCode.getMessage([avro_schema_associatedEntities, DataSet, spark_table]): invalid relationshipDef: avro_schema_associatedEntities: end type 1: DataSet, end type 2: spark_table (AtlasErrorCode:228)
> 2018-10-10 13:27:31,720 DEBUG - [pool-1-thread-9 - fc7a2150-23f3-4f53-8d09-420dbcbd4ffd:] ~ PERF|createOrUpdate()|1527 (AtlasPerfTracer:77)
> 2018-10-10 13:27:31,721 ERROR - [pool-1-thread-9 - fc7a2150-23f3-4f53-8d09-420dbcbd4ffd:] ~ graph rollback due to exception (GraphTransactionInterceptor:154)
>
>
> On 10 Oct 2018, at 15:13, Bolke de Bruin <[email protected]> wrote:
>
> Hi,
>
> We are trying to use the Spark connector for Atlas and we are encountering an
> issue we do not understand. To reproduce, use a clean Atlas installation,
> build the Atlas 1.0 connector from
> https://github.com/hortonworks-spark/spark-atlas-connector and use Spark
> 2.3. Follow the instructions to add the listeners and then run
>
> scala> Seq((1,2)).toDF("i","j").write.mode("overwrite").saveAsTable("default.atlas_bolke")
>
> This is the result:
>
> org.apache.atlas.AtlasServiceException: Metadata service API
> org.apache.atlas.AtlasClientV2$API_V2@1dfc2ecd failed with status 400 (Bad
> Request) Response Body
> ({"errorCode":"ATLAS-400-00-036","errorMessage":"invalid relationshipDef: avro_schema_associatedEntities: end type 1: DataSet, end type 2: spark_table"})
>
> We have no clue why this error occurs. This relationship is neither defined
> nor referenced by the Spark connector. This is the JSON dump of the entity
> definition of spark_table:
>
> {
>   "enumDefs": [],
>   "structDefs": [],
>   "classificationDefs": [],
>   "entityDefs": [{
>     "category": "ENTITY",
>     "guid": "3bd9315c-f159-4865-ac8d-11dbcca79adc",
>     "createdBy": "admin",
>     "updatedBy": "admin",
>     "createTime": 1539112556115,
>     "updateTime": 1539112556115,
>     "version": 1,
>     "name": "spark_table",
>     "description": "spark_table",
>     "typeVersion": "1.0",
>     "attributeDefs": [{
>       "name": "qualifiedName",
>       "typeName": "string",
>       "isOptional": false,
>       "cardinality": "SINGLE",
>       "valuesMinCount": 1,
>       "valuesMaxCount": 1,
>       "isUnique": true,
>       "isIndexable": true,
>       "includeInNotification": false
>     }, {
>       "name": "database",
>       "typeName": "spark_db",
>       "isOptional": true,
>       "cardinality": "SINGLE",
>       "valuesMinCount": 0,
>       "valuesMaxCount": 1,
>       "isUnique": false,
>       "isIndexable": false,
>       "includeInNotification": false
>     }, {
>       "name": "tableType",
>       "typeName": "string",
>       "isOptional": true,
>       "cardinality": "SINGLE",
>       "valuesMinCount": 0,
>       "valuesMaxCount": 1,
>       "isUnique": false,
>       "isIndexable": false,
>       "includeInNotification": false
>     }, {
>       "name": "storage",
>       "typeName": "spark_storagedesc",
>       "isOptional": true,
>       "cardinality": "SINGLE",
>       "valuesMinCount": 0,
>       "valuesMaxCount": 1,
>       "isUnique": false,
>       "isIndexable": false,
>       "includeInNotification": false,
>       "constraints": [{
>         "type": "ownedRef"
>       }]
>     }, {
>       "name": "schema",
>       "typeName": "array<spark_column>",
>       "isOptional": true,
>       "cardinality": "SINGLE",
>       "valuesMinCount": 0,
>       "valuesMaxCount": 1,
>       "isUnique": false,
>       "isIndexable": false,
>       "includeInNotification": false,
>       "constraints": [{
>         "type": "ownedRef"
>       }]
>     }, {
>       "name": "provider",
>       "typeName": "string",
>       "isOptional": true,
>       "cardinality": "SINGLE",
>       "valuesMinCount": 0,
>       "valuesMaxCount": 1,
>       "isUnique": false,
>       "isIndexable": false,
>       "includeInNotification": false
>     }, {
>       "name": "partitionColumnNames",
>       "typeName": "array<string>",
>       "isOptional": true,
>       "cardinality": "SINGLE",
>       "valuesMinCount": 0,
>       "valuesMaxCount": 1,
>       "isUnique": false,
>       "isIndexable": false,
>       "includeInNotification": false
>     }, {
>       "name": "bucketSpec",
>       "typeName": "map<string,string>",
>       "isOptional": true,
>       "cardinality": "SINGLE",
>       "valuesMinCount": 0,
>       "valuesMaxCount": 1,
>       "isUnique": false,
>       "isIndexable": false,
>       "includeInNotification": false
>     }, {
>       "name": "owner",
>       "typeName": "string",
>       "isOptional": true,
>       "cardinality": "SINGLE",
>       "valuesMinCount": 0,
>       "valuesMaxCount": 1,
>       "isUnique": false,
>       "isIndexable": false,
>       "includeInNotification": false
>     }, {
>       "name": "createTime",
>       "typeName": "long",
>       "isOptional": true,
>       "cardinality": "SINGLE",
>       "valuesMinCount": 0,
>       "valuesMaxCount": 1,
>       "isUnique": false,
>       "isIndexable": false,
>       "includeInNotification": false
>     }, {
>       "name": "lastAccessTime",
>       "typeName": "long",
>       "isOptional": true,
>       "cardinality": "SINGLE",
>       "valuesMinCount": 0,
>       "valuesMaxCount": 1,
>       "isUnique": false,
>       "isIndexable": false,
>       "includeInNotification": false
>     }, {
>       "name": "properties",
>       "typeName": "map<string,string>",
>       "isOptional": true,
>       "cardinality": "SINGLE",
>       "valuesMinCount": 0,
>       "valuesMaxCount": 1,
>       "isUnique": false,
>       "isIndexable": false,
>       "includeInNotification": false
>     }, {
>       "name": "comment",
>       "typeName": "string",
>       "isOptional": true,
>       "cardinality": "SINGLE",
>       "valuesMinCount": 0,
>       "valuesMaxCount": 1,
>       "isUnique": false,
>       "isIndexable": false,
>       "includeInNotification": false
>     }, {
>       "name": "unsupportedFeatures",
>       "typeName": "array<string>",
>       "isOptional": true,
>       "cardinality": "SINGLE",
>       "valuesMinCount": 0,
>       "valuesMaxCount": 1,
>       "isUnique": false,
>       "isIndexable": false,
>       "includeInNotification": false
>     }],
>     "superTypes": ["DataSet"],
>     "subTypes": []
>   }],
>   "relationshipDefs": []
> }
>
> This is the request that errors:
>
> {
>   "referredEntities": null,
>   "entities": [{
>     "typeName": "spark_table",
>     "attributes": {
>       "schema": [{
>         "typeName": "spark_column",
>         "attributes": {
>           "metadata": "{}",
>           "nullable": true,
>           "qualifiedName": "local-1539114333944.default.atlas_bolke.col-i",
>           "name": "i",
>           "type": "integer"
>         },
>         "guid": "-79141485348090",
>         "status": null,
>         "createdBy": null,
>         "updatedBy": null,
>         "createTime": null,
>         "updateTime": null,
>         "version": 0,
>         "relationshipAttributes": null,
>         "classifications": null,
>         "meanings": null
>       }, {
>         "typeName": "spark_column",
>         "attributes": {
>           "metadata": "{}",
>           "nullable": true,
>           "qualifiedName": "local-1539114333944.default.atlas_bolke.col-j",
>           "name": "j",
>           "type": "integer"
>         },
>         "guid": "-79141485348091",
>         "status": null,
>         "createdBy": null,
>         "updatedBy": null,
>         "createTime": null,
>         "updateTime": null,
>         "version": 0,
>         "relationshipAttributes": null,
>         "classifications": null,
>         "meanings": null
>       }],
>       "owner": "bolke",
>       "lastAccessTime": 0,
>       "unsupportedFeatures": [],
>       "qualifiedName": "local-1539114333944.default.atlas_bolke",
>       "storage": {
>         "typeName": "spark_storagedesc",
>         "attributes": {
>           "serde": "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe",
>           "qualifiedName": "local-1539114333944.default.atlas_bolke.storageFormat",
>           "compressed": false,
>           "locationUri": {
>             "typeName": "fs_path",
>             "attributes": {
>               "path": "/users/bolke/downloads/spark-2.3.2-bin-hadoop2.7/spark-warehouse/atlas_bolke",
>               "qualifiedName": "file:/Users/bolke/Downloads/spark-2.3.2-bin-hadoop2.7/spark-warehouse/atlas_bolke",
>               "name": "/users/bolke/downloads/spark-2.3.2-bin-hadoop2.7/spark-warehouse/atlas_bolke"
>             },
>             "guid": "-79141485348089",
>             "status": null,
>             "createdBy": null,
>             "updatedBy": null,
>             "createTime": null,
>             "updateTime": null,
>             "version": 0,
>             "relationshipAttributes": null,
>             "classifications": null,
>             "meanings": null
>           },
>           "inputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",
>           "outputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat",
>           "properties": {
>             "serialization.format": "1"
>           }
>         },
>         "guid": "-79141485348088",
>         "status": null,
>         "createdBy": null,
>         "updatedBy": null,
>         "createTime": null,
>         "updateTime": null,
>         "version": 0,
>         "relationshipAttributes": null,
>         "classifications": null,
>         "meanings": null
>       },
>       "tableType": "MANAGED",
>       "partitionColumnNames": [],
>       "database": {
>         "typeName": "spark_db",
>         "attributes": {
>           "owner": "bolke",
>           "qualifiedName": "local-1539114333944.default",
>           "name": "default",
>           "description": "Default Hive database",
>           "locationUri": {
>             "typeName": "fs_path",
>             "attributes": {
>               "path": "/users/bolke/downloads/spark-2.3.2-bin-hadoop2.7/spark-warehouse",
>               "qualifiedName": "file:/Users/bolke/Downloads/spark-2.3.2-bin-hadoop2.7/spark-warehouse",
>               "name": "/users/bolke/downloads/spark-2.3.2-bin-hadoop2.7/spark-warehouse"
>             },
>             "guid": "-79141485348087",
>             "status": null,
>             "createdBy": null,
>             "updatedBy": null,
>             "createTime": null,
>             "updateTime": null,
>             "version": 0,
>             "relationshipAttributes": null,
>             "classifications": null,
>             "meanings": null
>           },
>           "properties": {}
>         },
>         "guid": "-79141485348086",
>         "status": null,
>         "createdBy": null,
>         "updatedBy": null,
>         "createTime": null,
>         "updateTime": null,
>         "version": 0,
>         "relationshipAttributes": null,
>         "classifications": null,
>         "meanings": null
>       },
>       "provider": "parquet",
>       "createTime": 1539114452000,
>       "name": "atlas_bolke",
>       "properties": {
>         "transient_lastDdlTime": "1539114452"
>       }
>     },
>     "guid": "-79141485348092",
>     "status": null,
>     "createdBy": null,
>     "updatedBy": null,
>     "createTime": null,
>     "updateTime": null,
>     "version": 0,
>     "relationshipAttributes": null,
>     "classifications": null,
>     "meanings": null
>   }, {
>     "typeName": "spark_db",
>     "attributes": {
>       "owner": "bolke",
>       "qualifiedName": "local-1539114333944.default",
>       "name": "default",
>       "description": "Default Hive database",
>       "locationUri": {
>         "typeName": "fs_path",
>         "attributes": {
>           "path": "/users/bolke/downloads/spark-2.3.2-bin-hadoop2.7/spark-warehouse",
>           "qualifiedName": "file:/Users/bolke/Downloads/spark-2.3.2-bin-hadoop2.7/spark-warehouse",
>           "name": "/users/bolke/downloads/spark-2.3.2-bin-hadoop2.7/spark-warehouse"
>         },
>         "guid": "-79141485348087",
>         "status": null,
>         "createdBy": null,
>         "updatedBy": null,
>         "createTime": null,
>         "updateTime": null,
>         "version": 0,
>         "relationshipAttributes": null,
>         "classifications": null,
>         "meanings": null
>       },
>       "properties": {}
>     },
>     "guid": "-79141485348086",
>     "status": null,
>     "createdBy": null,
>     "updatedBy": null,
>     "createTime": null,
>     "updateTime": null,
>     "version": 0,
>     "relationshipAttributes": null,
>     "classifications": null,
>     "meanings": null
>   }, {
>     "typeName": "fs_path",
>     "attributes": {
>       "path": "/users/bolke/downloads/spark-2.3.2-bin-hadoop2.7/spark-warehouse",
>       "qualifiedName": "file:/Users/bolke/Downloads/spark-2.3.2-bin-hadoop2.7/spark-warehouse",
>       "name": "/users/bolke/downloads/spark-2.3.2-bin-hadoop2.7/spark-warehouse"
>     },
>     "guid": "-79141485348087",
>     "status": null,
>     "createdBy": null,
>     "updatedBy": null,
>     "createTime": null,
>     "updateTime": null,
>     "version": 0,
>     "relationshipAttributes": null,
>     "classifications": null,
>     "meanings": null
>   }, {
>     "typeName": "spark_storagedesc",
>     "attributes": {
>       "serde": "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe",
>       "qualifiedName": "local-1539114333944.default.atlas_bolke.storageFormat",
>       "compressed": false,
>       "locationUri": {
>         "typeName": "fs_path",
>         "attributes": {
>           "path": "/users/bolke/downloads/spark-2.3.2-bin-hadoop2.7/spark-warehouse/atlas_bolke",
>           "qualifiedName": "file:/Users/bolke/Downloads/spark-2.3.2-bin-hadoop2.7/spark-warehouse/atlas_bolke",
>           "name": "/users/bolke/downloads/spark-2.3.2-bin-hadoop2.7/spark-warehouse/atlas_bolke"
>         },
>         "guid": "-79141485348089",
>         "status": null,
>         "createdBy": null,
>         "updatedBy": null,
>         "createTime": null,
>         "updateTime": null,
>         "version": 0,
>         "relationshipAttributes": null,
>         "classifications": null,
>         "meanings": null
>       },
>       "inputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",
>       "outputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat",
>       "properties": {
>         "serialization.format": "1"
>       }
>     },
>     "guid": "-79141485348088",
>     "status": null,
>     "createdBy": null,
>     "updatedBy": null,
>     "createTime": null,
>     "updateTime": null,
>     "version": 0,
>     "relationshipAttributes": null,
>     "classifications": null,
>     "meanings": null
>   }, {
>     "typeName": "fs_path",
>     "attributes": {
>       "path": "/users/bolke/downloads/spark-2.3.2-bin-hadoop2.7/spark-warehouse/atlas_bolke",
>       "qualifiedName": "file:/Users/bolke/Downloads/spark-2.3.2-bin-hadoop2.7/spark-warehouse/atlas_bolke",
>       "name": "/users/bolke/downloads/spark-2.3.2-bin-hadoop2.7/spark-warehouse/atlas_bolke"
>     },
>     "guid": "-79141485348089",
>     "status": null,
>     "createdBy": null,
>     "updatedBy": null,
>     "createTime": null,
>     "updateTime": null,
>     "version": 0,
>     "relationshipAttributes": null,
>     "classifications": null,
>     "meanings": null
>   }, {
>     "typeName": "spark_column",
>     "attributes": {
>       "metadata": "{}",
>       "nullable": true,
>       "qualifiedName": "local-1539114333944.default.atlas_bolke.col-i",
>       "name": "i",
>       "type": "integer"
>     },
>     "guid": "-79141485348090",
>     "status": null,
>     "createdBy": null,
>     "updatedBy": null,
>     "createTime": null,
>     "updateTime": null,
>     "version": 0,
>     "relationshipAttributes": null,
>     "classifications": null,
>     "meanings": null
>   }, {
>     "typeName": "spark_column",
>     "attributes": {
>       "metadata": "{}",
>       "nullable": true,
>       "qualifiedName": "local-1539114333944.default.atlas_bolke.col-j",
>       "name": "j",
>       "type": "integer"
>     },
>     "guid": "-79141485348091",
>     "status": null,
>     "createdBy": null,
>     "updatedBy": null,
>     "createTime": null,
>     "updateTime": null,
>     "version": 0,
>     "relationshipAttributes": null,
>     "classifications": null,
>     "meanings": null
>   }]
> }
>
>
> Can someone shed some light on this?
>
> Thanks
>
> Bolke
>

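The thread contains two points that a small checker can illustrate. Madhan's diagnosis is that endDef2 of avro_schema_associatedEntities injects a `schema` attribute into DataSet, and spark_table inherits that attribute while also declaring its own `schema`. Bolke's suggestion is to flag such clashes when definitions are loaded rather than when entities are created.

The checker is a hypothetical Python sketch, not Atlas code, and has three limitations:

* `find_relationship_conflicts` is an illustrative name.
* It only walks subtypes that appear in the supplied `entityDefs`; it does not check supertypes.
* It assumes that an end marked `isLegacyAttribute` intentionally reuses an attribute its end type already declares, so it does not report those ends.

```python
from collections import defaultdict


def find_relationship_conflicts(typedefs):
    """Report relationship ends whose attribute name clashes with an attribute
    that the end type, or one of its subtypes, declares itself.

    typedefs: a dict shaped like an Atlas typedefs payload; only "entityDefs"
    (name, superTypes, attributeDefs) and "relationshipDefs" (name, endDef1,
    endDef2) are read. Returns (relationship, end type, attribute, clashing type).
    """
    entity_defs = {e["name"]: e for e in typedefs.get("entityDefs", [])}

    # parent type name -> names of its direct subtypes, from each superTypes list
    children = defaultdict(set)
    for name, edef in entity_defs.items():
        for parent in edef.get("superTypes", []):
            children[parent].add(name)

    def self_and_descendants(type_name):
        seen, stack = set(), [type_name]
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(children[current])
        return seen

    conflicts = []
    for rel in typedefs.get("relationshipDefs", []):
        for end_key in ("endDef1", "endDef2"):
            end = rel[end_key]
            # Assumption: a legacy-attribute end deliberately maps onto an
            # attribute its end type already declares, so it is skipped.
            if end.get("isLegacyAttribute"):
                continue
            for type_name in sorted(self_and_descendants(end["type"])):
                attr_defs = entity_defs.get(type_name, {}).get("attributeDefs", [])
                if end["name"] in {a["name"] for a in attr_defs}:
                    conflicts.append((rel["name"], end["type"], end["name"], type_name))
    return conflicts


# Trimmed-down versions of the definitions quoted in this thread.
typedefs = {
    "entityDefs": [
        {"name": "avro_schema", "superTypes": ["avro_record"],
         "attributeDefs": [{"name": "namespace"}, {"name": "associatedEntities"}]},
        {"name": "spark_table", "superTypes": ["DataSet"],
         "attributeDefs": [{"name": "qualifiedName"}, {"name": "schema"}]},
    ],
    "relationshipDefs": [{
        "name": "avro_schema_associatedEntities",
        "endDef1": {"type": "avro_schema", "name": "associatedEntities",
                    "isLegacyAttribute": True},
        "endDef2": {"type": "DataSet", "name": "schema"},
    }],
}

print(find_relationship_conflicts(typedefs))
# [('avro_schema_associatedEntities', 'DataSet', 'schema', 'spark_table')]
```

Madhan's second option is to rename spark_table's `schema` attribute to `columns`. With that change, the checker returns an empty list for these definitions.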