Thanks Madhan,

We figured this out ourselves and it was indeed the issue (sorry for not 
reporting back). One thing which could be a lot better is the error message, as 
it was not at all helpful and actually pointed us in the wrong direction. 
Obviously handling this better in general would be appreciated.
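
To make concrete what a more helpful check could look like, here is a minimal sketch in Python. It is not Atlas code: the function name find_attribute_clashes and the trimmed-down typedefs are hypothetical, and treating "isLegacyAttribute": true as an intentional overlap is my assumption. It only uses the typedef fields visible in the JSON quoted below, and it reports any entity attribute that shares its name with a relationship attribute inherited from a super-type.

```python
# Hypothetical helper, not part of Atlas. It works on typedef JSON that has
# already been parsed into plain dicts.

def find_attribute_clashes(entity_defs, relationship_defs):
    """Return (type, attribute, relationship, declaring_type) tuples where an
    entity attribute shares its name with a relationship attribute that is
    visible on the type itself or on one of its super-types."""
    by_name = {e["name"]: e for e in entity_defs}

    def with_super_types(type_name):
        # The type itself plus all of its super-types, transitively.
        seen, stack = [], [type_name]
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.append(current)
            stack.extend(by_name.get(current, {}).get("superTypes", []))
        return seen

    # Relationship attribute names, keyed by the type the end attaches to.
    rel_attrs = {}
    for rel in relationship_defs:
        for end in (rel["endDef1"], rel["endDef2"]):
            # Assumption: a legacy end mirrors an existing attribute on purpose.
            if end.get("isLegacyAttribute"):
                continue
            rel_attrs.setdefault(end["type"], []).append((end["name"], rel["name"]))

    clashes = []
    for entity in entity_defs:
        own = {a["name"] for a in entity.get("attributeDefs", [])}
        for declaring_type in with_super_types(entity["name"]):
            for attr_name, rel_name in rel_attrs.get(declaring_type, []):
                if attr_name in own:
                    clashes.append((entity["name"], attr_name, rel_name, declaring_type))
    return clashes


# Trimmed-down versions of the typedefs from this thread (DataSet is a stub).
entity_defs = [
    {"name": "DataSet", "superTypes": [], "attributeDefs": []},
    {"name": "avro_schema", "superTypes": ["avro_record"],
     "attributeDefs": [{"name": "namespace"}, {"name": "associatedEntities"}]},
    {"name": "spark_table", "superTypes": ["DataSet"],
     "attributeDefs": [{"name": "qualifiedName"}, {"name": "schema"}]},
]
relationship_defs = [
    {"name": "avro_schema_associatedEntities",
     "endDef1": {"type": "avro_schema", "name": "associatedEntities",
                 "isLegacyAttribute": True},
     "endDef2": {"type": "DataSet", "name": "schema"}},
]

clashes = find_attribute_clashes(entity_defs, relationship_defs)
for type_name, attr, rel, declaring_type in clashes:
    print(f"attribute '{type_name}.{attr}' clashes with relationship attribute "
          f"'{declaring_type}.{attr}' added by relationshipDef '{rel}'")
```

A message along those lines, naming both the attribute and the relationshipDef that introduced the clash, would have saved us quite some searching.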

Cheers
Bolke

> On 20 Oct 2018, at 03:23, Madhan Neethiraj <[email protected]> wrote:
> 
> Bolke,
>  
> This issue is caused by the use of an attribute named schema in spark_table, 
> which clashes with the same-named attribute in its super-type DataSet; type 
> DataSet gets a relationship attribute named schema from the type definitions 
> in 1065-avro_model.json (shown below).
>  
> We need to think through the details of handling such conflicts, which might 
> occur outside the control of type authors.
>  
> In the meantime, here are a couple of options you can try:
>   1. If you are not using the types defined in 1065-avro_model.json, I would 
> suggest removing these types and trying again. This should get you going.
>   2. The other option is to rename the attribute ‘spark_table.schema’ to, say, 
> ‘spark_table.columns’.
>  
> Hope this helps.
>  
> Madhan
>  
>  
>  
> Entity-def avro_schema: (from 1065-avro_model.json)
>         {
>             "name": "avro_schema",
>             "description": "Atlas Type representing Abstract Top-level Avro Schema",
>             "superTypes": [
>                 "avro_record"
>             ],
>             "typeVersion": "1.0",
>             "attributeDefs": [
>                 {
>                     "name": "namespace",
>                     "typeName": "string",
>                     "cardinality": "SINGLE",
>                     "isIndexable": true,
>                     "isOptional": false,
>                     "isUnique": false
>                 },
>                 {
>                     "name": "associatedEntities",
>                     "typeName": "array<DataSet>",
>                     "cardinality": "LIST",
>                     "isIndexable": false,
>                     "isOptional": true,
>                     "isUnique": false
>                 }
>             ]
>         }
>  
> Relationship-def avro_schema_associatedEntities: (from 1065-avro_model.json)
>         {
>             "name": "avro_schema_associatedEntities",
>             "typeVersion": "1.0",
>             "relationshipCategory": "ASSOCIATION",
>             "endDef1": {
>                 "type": "avro_schema",
>                 "name": "associatedEntities",
>                 "isContainer": false,
>                 "cardinality": "SET",
>                 "isLegacyAttribute": true
>             },
>             "endDef2": {
>                 "type": "DataSet",
>                 "name": "schema",  <-- this adds a relationship attribute ‘schema’ to the DataSet type
>                 "isContainer": false,
>                 "cardinality": "SET"
>             },
>             "propagateTags": "NONE"
>         }
>  
>  
>  
>  
>  
> From: Bolke de Bruin <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Wednesday, October 10, 2018 at 6:42 AM
> To: "[email protected]" <[email protected]>
> Subject: Re: invalid relationshipDef: avro_schema_associatedEntities: end 
> type 1: DataSet, end type 2: spark_table
>  
> This is the relevant debug from Atlas: 
>  
> 2018-10-10 13:27:31,717 DEBUG - [pool-1-thread-9 - 
> fc7a2150-23f3-4f53-8d09-420dbcbd4ffd:] ~ Finding edges for vertex[id=8248 
> type=spark_table guid=b6183317-63c8-4147-a91f-cddef2e5fdab] with label 
> __spark_table.schema (GraphHelper:337)
> 2018-10-10 13:27:31,717 DEBUG - [pool-1-thread-9 - 
> fc7a2150-23f3-4f53-8d09-420dbcbd4ffd:] ~ 
> getRelationshipEdgeLabel(avro_schema_associatedEntities) 
> (AtlasRelationshipStoreV2:757)
> 2018-10-10 13:27:31,718 DEBUG - [pool-1-thread-9 - 
> fc7a2150-23f3-4f53-8d09-420dbcbd4ffd:] ~ Finding edges for vertex[id=8248 
> type=spark_table guid=b6183317-63c8-4147-a91f-cddef2e5fdab] with label 
> r:avro_schema_associatedEntities (GraphHelper:337)
> 2018-10-10 13:27:31,719 DEBUG - [pool-1-thread-9 - 
> fc7a2150-23f3-4f53-8d09-420dbcbd4ffd:] ~ <== 
> AtlasErrorCode.getMessage([avro_schema_associatedEntities, DataSet, 
> spark_table]) (AtlasErrorCode:221)
> 2018-10-10 13:27:31,720 DEBUG - [pool-1-thread-9 - 
> fc7a2150-23f3-4f53-8d09-420dbcbd4ffd:] ~ ==> 
> AtlasErrorCode.getMessage([avro_schema_associatedEntities, DataSet, 
> spark_table]): invalid relationshipDef: avro_schema_associatedEntities: end 
> type 1: DataSet, end type 2: spark_table (AtlasErrorCode:228)
> 2018-10-10 13:27:31,720 DEBUG - [pool-1-thread-9 - 
> fc7a2150-23f3-4f53-8d09-420dbcbd4ffd:] ~ PERF|createOrUpdate()|1527 
> (AtlasPerfTracer:77)
> 2018-10-10 13:27:31,721 ERROR - [pool-1-thread-9 - 
> fc7a2150-23f3-4f53-8d09-420dbcbd4ffd:] ~ graph rollback due to exception  
> (GraphTransactionInterceptor:154)
>  
>  
> 
> 
>> On 10 Oct 2018, at 15:13, Bolke de Bruin <[email protected]> wrote:
>>  
>> Hi, 
>>  
>> We are trying to use the Spark connector for Atlas and are encountering an 
>> issue we do not understand. To reproduce, use a clean Atlas installation, 
>> build the Atlas 1.0 connector from 
>> https://github.com/hortonworks-spark/spark-atlas-connector, and use Spark 
>> 2.3. Follow the instructions to add the listeners and then run
>>  
>> scala> 
>> Seq((1,2)).toDF("i","j").write.mode("overwrite").saveAsTable("default.atlas_bolke")
>>  
>> This is the result
>>  
>> org.apache.atlas.AtlasServiceException: Metadata service API 
>> org.apache.atlas.AtlasClientV2$API_V2@1dfc2ecd failed with status 400 (Bad 
>> Request) Response Body 
>> ({"errorCode":"ATLAS-400-00-036","errorMessage":"invalid relationshipDef: 
>> avro_schema_associatedEntities: end type 1: DataSet, end type 2: 
>> spark_table"})
>>  
>> We have no clue why this error occurs. This relationship is neither defined 
>> nor referenced by the Spark connector. This is the JSON dump of the 
>> entity definition of spark_table:
>>  
>>  
>> {
>> "enumDefs": [],
>> "structDefs": [],
>> "classificationDefs": [],
>> "entityDefs": [{
>> "category": "ENTITY",
>> "guid": "3bd9315c-f159-4865-ac8d-11dbcca79adc",
>> "createdBy": "admin",
>> "updatedBy": "admin",
>> "createTime": 1539112556115,
>> "updateTime": 1539112556115,
>> "version": 1,
>> "name": "spark_table",
>> "description": "spark_table",
>> "typeVersion": "1.0",
>> "attributeDefs": [{
>> "name": "qualifiedName",
>> "typeName": "string",
>> "isOptional": false,
>> "cardinality": "SINGLE",
>> "valuesMinCount": 1,
>> "valuesMaxCount": 1,
>> "isUnique": true,
>> "isIndexable": true,
>> "includeInNotification": false
>> }, {
>> "name": "database",
>> "typeName": "spark_db",
>> "isOptional": true,
>> "cardinality": "SINGLE",
>> "valuesMinCount": 0,
>> "valuesMaxCount": 1,
>> "isUnique": false,
>> "isIndexable": false,
>> "includeInNotification": false
>> }, {
>> "name": "tableType",
>> "typeName": "string",
>> "isOptional": true,
>> "cardinality": "SINGLE",
>> "valuesMinCount": 0,
>> "valuesMaxCount": 1,
>> "isUnique": false,
>> "isIndexable": false,
>> "includeInNotification": false
>> }, {
>> "name": "storage",
>> "typeName": "spark_storagedesc",
>> "isOptional": true,
>> "cardinality": "SINGLE",
>> "valuesMinCount": 0,
>> "valuesMaxCount": 1,
>> "isUnique": false,
>> "isIndexable": false,
>> "includeInNotification": false,
>> "constraints": [{
>> "type": "ownedRef"
>> }]
>> }, {
>> "name": "schema",
>> "typeName": "array<spark_column>",
>> "isOptional": true,
>> "cardinality": "SINGLE",
>> "valuesMinCount": 0,
>> "valuesMaxCount": 1,
>> "isUnique": false,
>> "isIndexable": false,
>> "includeInNotification": false,
>> "constraints": [{
>> "type": "ownedRef"
>> }]
>> }, {
>> "name": "provider",
>> "typeName": "string",
>> "isOptional": true,
>> "cardinality": "SINGLE",
>> "valuesMinCount": 0,
>> "valuesMaxCount": 1,
>> "isUnique": false,
>> "isIndexable": false,
>> "includeInNotification": false
>> }, {
>> "name": "partitionColumnNames",
>> "typeName": "array<string>",
>> "isOptional": true,
>> "cardinality": "SINGLE",
>> "valuesMinCount": 0,
>> "valuesMaxCount": 1,
>> "isUnique": false,
>> "isIndexable": false,
>> "includeInNotification": false
>> }, {
>> "name": "bucketSpec",
>> "typeName": "map<string,string>",
>> "isOptional": true,
>> "cardinality": "SINGLE",
>> "valuesMinCount": 0,
>> "valuesMaxCount": 1,
>> "isUnique": false,
>> "isIndexable": false,
>> "includeInNotification": false
>> }, {
>> "name": "owner",
>> "typeName": "string",
>> "isOptional": true,
>> "cardinality": "SINGLE",
>> "valuesMinCount": 0,
>> "valuesMaxCount": 1,
>> "isUnique": false,
>> "isIndexable": false,
>> "includeInNotification": false
>> }, {
>> "name": "createTime",
>> "typeName": "long",
>> "isOptional": true,
>> "cardinality": "SINGLE",
>> "valuesMinCount": 0,
>> "valuesMaxCount": 1,
>> "isUnique": false,
>> "isIndexable": false,
>> "includeInNotification": false
>> }, {
>> "name": "lastAccessTime",
>> "typeName": "long",
>> "isOptional": true,
>> "cardinality": "SINGLE",
>> "valuesMinCount": 0,
>> "valuesMaxCount": 1,
>> "isUnique": false,
>> "isIndexable": false,
>> "includeInNotification": false
>> }, {
>> "name": "properties",
>> "typeName": "map<string,string>",
>> "isOptional": true,
>> "cardinality": "SINGLE",
>> "valuesMinCount": 0,
>> "valuesMaxCount": 1,
>> "isUnique": false,
>> "isIndexable": false,
>> "includeInNotification": false
>> }, {
>> "name": "comment",
>> "typeName": "string",
>> "isOptional": true,
>> "cardinality": "SINGLE",
>> "valuesMinCount": 0,
>> "valuesMaxCount": 1,
>> "isUnique": false,
>> "isIndexable": false,
>> "includeInNotification": false
>> }, {
>> "name": "unsupportedFeatures",
>> "typeName": "array<string>",
>> "isOptional": true,
>> "cardinality": "SINGLE",
>> "valuesMinCount": 0,
>> "valuesMaxCount": 1,
>> "isUnique": false,
>> "isIndexable": false,
>> "includeInNotification": false
>> }],
>> "superTypes": ["DataSet"],
>> "subTypes": []
>> }],
>> "relationshipDefs": []
>> }
>>  
>> This is the request that errors:
>>  
>> {
>> "referredEntities": null,
>> "entities": [{
>> "typeName": "spark_table",
>> "attributes": {
>> "schema": [{
>> "typeName": "spark_column",
>> "attributes": {
>> "metadata": "{}",
>> "nullable": true,
>> "qualifiedName": "local-1539114333944.default.atlas_bolke.col-i",
>> "name": "i",
>> "type": "integer"
>> },
>> "guid": "-79141485348090",
>> "status": null,
>> "createdBy": null,
>> "updatedBy": null,
>> "createTime": null,
>> "updateTime": null,
>> "version": 0,
>> "relationshipAttributes": null,
>> "classifications": null,
>> "meanings": null
>> }, {
>> "typeName": "spark_column",
>> "attributes": {
>> "metadata": "{}",
>> "nullable": true,
>> "qualifiedName": "local-1539114333944.default.atlas_bolke.col-j",
>> "name": "j",
>> "type": "integer"
>> },
>> "guid": "-79141485348091",
>> "status": null,
>> "createdBy": null,
>> "updatedBy": null,
>> "createTime": null,
>> "updateTime": null,
>> "version": 0,
>> "relationshipAttributes": null,
>> "classifications": null,
>> "meanings": null
>> }],
>> "owner": "bolke",
>> "lastAccessTime": 0,
>> "unsupportedFeatures": [],
>> "qualifiedName": "local-1539114333944.default.atlas_bolke",
>> "storage": {
>> "typeName": "spark_storagedesc",
>> "attributes": {
>> "serde": "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe",
>> "qualifiedName": "local-1539114333944.default.atlas_bolke.storageFormat",
>> "compressed": false,
>> "locationUri": {
>> "typeName": "fs_path",
>> "attributes": {
>> "path": 
>> "/users/bolke/downloads/spark-2.3.2-bin-hadoop2.7/spark-warehouse/atlas_bolke",
>> "qualifiedName": 
>> "file:/Users/bolke/Downloads/spark-2.3.2-bin-hadoop2.7/spark-warehouse/atlas_bolke",
>> "name": 
>> "/users/bolke/downloads/spark-2.3.2-bin-hadoop2.7/spark-warehouse/atlas_bolke"
>> },
>> "guid": "-79141485348089",
>> "status": null,
>> "createdBy": null,
>> "updatedBy": null,
>> "createTime": null,
>> "updateTime": null,
>> "version": 0,
>> "relationshipAttributes": null,
>> "classifications": null,
>> "meanings": null
>> },
>> "inputFormat": 
>> "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",
>> "outputFormat": 
>> "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat",
>> "properties": {
>> "serialization.format": "1"
>> }
>> },
>> "guid": "-79141485348088",
>> "status": null,
>> "createdBy": null,
>> "updatedBy": null,
>> "createTime": null,
>> "updateTime": null,
>> "version": 0,
>> "relationshipAttributes": null,
>> "classifications": null,
>> "meanings": null
>> },
>> "tableType": "MANAGED",
>> "partitionColumnNames": [],
>> "database": {
>> "typeName": "spark_db",
>> "attributes": {
>> "owner": "bolke",
>> "qualifiedName": "local-1539114333944.default",
>> "name": "default",
>> "description": "Default Hive database",
>> "locationUri": {
>> "typeName": "fs_path",
>> "attributes": {
>> "path": "/users/bolke/downloads/spark-2.3.2-bin-hadoop2.7/spark-warehouse",
>> "qualifiedName": 
>> "file:/Users/bolke/Downloads/spark-2.3.2-bin-hadoop2.7/spark-warehouse",
>> "name": "/users/bolke/downloads/spark-2.3.2-bin-hadoop2.7/spark-warehouse"
>> },
>> "guid": "-79141485348087",
>> "status": null,
>> "createdBy": null,
>> "updatedBy": null,
>> "createTime": null,
>> "updateTime": null,
>> "version": 0,
>> "relationshipAttributes": null,
>> "classifications": null,
>> "meanings": null
>> },
>> "properties": {}
>> },
>> "guid": "-79141485348086",
>> "status": null,
>> "createdBy": null,
>> "updatedBy": null,
>> "createTime": null,
>> "updateTime": null,
>> "version": 0,
>> "relationshipAttributes": null,
>> "classifications": null,
>> "meanings": null
>> },
>> "provider": "parquet",
>> "createTime": 1539114452000,
>> "name": "atlas_bolke",
>> "properties": {
>> "transient_lastDdlTime": "1539114452"
>> }
>> },
>> "guid": "-79141485348092",
>> "status": null,
>> "createdBy": null,
>> "updatedBy": null,
>> "createTime": null,
>> "updateTime": null,
>> "version": 0,
>> "relationshipAttributes": null,
>> "classifications": null,
>> "meanings": null
>> }, {
>> "typeName": "spark_db",
>> "attributes": {
>> "owner": "bolke",
>> "qualifiedName": "local-1539114333944.default",
>> "name": "default",
>> "description": "Default Hive database",
>> "locationUri": {
>> "typeName": "fs_path",
>> "attributes": {
>> "path": "/users/bolke/downloads/spark-2.3.2-bin-hadoop2.7/spark-warehouse",
>> "qualifiedName": 
>> "file:/Users/bolke/Downloads/spark-2.3.2-bin-hadoop2.7/spark-warehouse",
>> "name": "/users/bolke/downloads/spark-2.3.2-bin-hadoop2.7/spark-warehouse"
>> },
>> "guid": "-79141485348087",
>> "status": null,
>> "createdBy": null,
>> "updatedBy": null,
>> "createTime": null,
>> "updateTime": null,
>> "version": 0,
>> "relationshipAttributes": null,
>> "classifications": null,
>> "meanings": null
>> },
>> "properties": {}
>> },
>> "guid": "-79141485348086",
>> "status": null,
>> "createdBy": null,
>> "updatedBy": null,
>> "createTime": null,
>> "updateTime": null,
>> "version": 0,
>> "relationshipAttributes": null,
>> "classifications": null,
>> "meanings": null
>> }, {
>> "typeName": "fs_path",
>> "attributes": {
>> "path": "/users/bolke/downloads/spark-2.3.2-bin-hadoop2.7/spark-warehouse",
>> "qualifiedName": 
>> "file:/Users/bolke/Downloads/spark-2.3.2-bin-hadoop2.7/spark-warehouse",
>> "name": "/users/bolke/downloads/spark-2.3.2-bin-hadoop2.7/spark-warehouse"
>> },
>> "guid": "-79141485348087",
>> "status": null,
>> "createdBy": null,
>> "updatedBy": null,
>> "createTime": null,
>> "updateTime": null,
>> "version": 0,
>> "relationshipAttributes": null,
>> "classifications": null,
>> "meanings": null
>> }, {
>> "typeName": "spark_storagedesc",
>> "attributes": {
>> "serde": "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe",
>> "qualifiedName": "local-1539114333944.default.atlas_bolke.storageFormat",
>> "compressed": false,
>> "locationUri": {
>> "typeName": "fs_path",
>> "attributes": {
>> "path": 
>> "/users/bolke/downloads/spark-2.3.2-bin-hadoop2.7/spark-warehouse/atlas_bolke",
>> "qualifiedName": 
>> "file:/Users/bolke/Downloads/spark-2.3.2-bin-hadoop2.7/spark-warehouse/atlas_bolke",
>> "name": 
>> "/users/bolke/downloads/spark-2.3.2-bin-hadoop2.7/spark-warehouse/atlas_bolke"
>> },
>> "guid": "-79141485348089",
>> "status": null,
>> "createdBy": null,
>> "updatedBy": null,
>> "createTime": null,
>> "updateTime": null,
>> "version": 0,
>> "relationshipAttributes": null,
>> "classifications": null,
>> "meanings": null
>> },
>> "inputFormat": 
>> "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",
>> "outputFormat": 
>> "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat",
>> "properties": {
>> "serialization.format": "1"
>> }
>> },
>> "guid": "-79141485348088",
>> "status": null,
>> "createdBy": null,
>> "updatedBy": null,
>> "createTime": null,
>> "updateTime": null,
>> "version": 0,
>> "relationshipAttributes": null,
>> "classifications": null,
>> "meanings": null
>> }, {
>> "typeName": "fs_path",
>> "attributes": {
>> "path": 
>> "/users/bolke/downloads/spark-2.3.2-bin-hadoop2.7/spark-warehouse/atlas_bolke",
>> "qualifiedName": 
>> "file:/Users/bolke/Downloads/spark-2.3.2-bin-hadoop2.7/spark-warehouse/atlas_bolke",
>> "name": 
>> "/users/bolke/downloads/spark-2.3.2-bin-hadoop2.7/spark-warehouse/atlas_bolke"
>> },
>> "guid": "-79141485348089",
>> "status": null,
>> "createdBy": null,
>> "updatedBy": null,
>> "createTime": null,
>> "updateTime": null,
>> "version": 0,
>> "relationshipAttributes": null,
>> "classifications": null,
>> "meanings": null
>> }, {
>> "typeName": "spark_column",
>> "attributes": {
>> "metadata": "{}",
>> "nullable": true,
>> "qualifiedName": "local-1539114333944.default.atlas_bolke.col-i",
>> "name": "i",
>> "type": "integer"
>> },
>> "guid": "-79141485348090",
>> "status": null,
>> "createdBy": null,
>> "updatedBy": null,
>> "createTime": null,
>> "updateTime": null,
>> "version": 0,
>> "relationshipAttributes": null,
>> "classifications": null,
>> "meanings": null
>> }, {
>> "typeName": "spark_column",
>> "attributes": {
>> "metadata": "{}",
>> "nullable": true,
>> "qualifiedName": "local-1539114333944.default.atlas_bolke.col-j",
>> "name": "j",
>> "type": "integer"
>> },
>> "guid": "-79141485348091",
>> "status": null,
>> "createdBy": null,
>> "updatedBy": null,
>> "createTime": null,
>> "updateTime": null,
>> "version": 0,
>> "relationshipAttributes": null,
>> "classifications": null,
>> "meanings": null
>> }]
>> }
>>  
>>  
>> Can someone shed some light on this?
>>  
>> Thanks
>>  
>> Bolke
