Hi,

Which version of Spark are you using?

A UUID is generated by the DB through a low-level OS call and is 36 characters long:

UUID=$(uuidgen)
echo $UUID
ef080790-4c3f-4a5f-8db7-1024338d34f2


In other words, a string will do it, i.e. VARCHAR(36).
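
As a quick sanity check in PySpark (a minimal sketch; the one-row DataFrame and the column name "uuid" are purely illustrative):

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("uuid_check").getOrCreate()

# a single generated UUID, read back as a plain string
df = spark.createDataFrame([("ef080790-4c3f-4a5f-8db7-1024338d34f2",)], ["uuid"])

# a UUID rendered as text is always 36 characters, so StringType/VARCHAR(36) is enough
df.select(F.length("uuid").alias("len")).show()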

When you run that SQL directly on the database itself, what do you get?

The alternative is to make two calls directly via JDBC to the underlying
database, get the data from those two tables back into DataFrames, and do
the join in PySpark itself as a test (see the usage sketch after the
function below).

A Spark connection to any DB which allows JDBC is generic:

import sys

def loadTableFromJDBC(spark, url, tableName, user, password, driver,
                      fetchsize):
    """Load a table over JDBC into a DataFrame, or exit on failure."""
    try:
        df = spark.read \
            .format("jdbc") \
            .option("url", url) \
            .option("dbtable", tableName) \
            .option("user", user) \
            .option("password", password) \
            .option("driver", driver) \
            .option("fetchsize", fetchsize) \
            .load()
        return df
    except Exception as e:
        print(f"{e}, quitting")
        sys.exit(1)
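
As a test you could then pull the two tables and do the join in PySpark itself, along these lines (a sketch only: the URL, credentials, fetch size and the join column device_uuid are made-up placeholders; device and children are the table names from your message):

url = "jdbc:mariadb://myhost:3306/mydb"      # hypothetical host and database
driver = "org.mariadb.jdbc.Driver"           # MariaDB Connector/J driver class

devices = loadTableFromJDBC(spark, url, "device", "user", "password", driver, 1000)
children = loadTableFromJDBC(spark, url, "children", "user", "password", driver, 1000)

# do the join in PySpark rather than pushing a subquery down through JDBC
joined = devices.join(children, devices["uuid"] == children["device_uuid"])
joined.show(10)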

HTH

On Fri, 16 Apr 2021 at 12:21, Alun ap Rhisiart <cod...@alunaprhisiart.uk>
wrote:

> I’m just starting to use PySpark (Databricks) for an education application.
> Part of this is monitoring children’s online behaviour to alert teachers
> whether there may be problems with bullying, extreme diets, suicide
> ideation, and so on. I have IoT data which I need to combine with
> information from MariaDB (this is all in Azure). I have SparkJDBC42 and
> mariadb_java_client_2_7_2 jars installed. The connection to the database is
> established, in that I can see it can retrieve the schema for tables.
>
> I have a couple of issues. The first is that I can never retrieve any id
> columns (which are all defined as BigInt(20)), as I get a ‘long out of
> range’ error. I’m currently working around that by not including the ids
> themselves in the return. However, the big problem is that I get the column
> names returned in each row instead of the values for each row, where the
> columns are defined as strings (VARCHAR etc.). Also, columns defined as
> TinyInt are returned as booleans, but reversed (0 is returned as
> True). I have tried running the SQL outside of Databricks/Spark (e.g. in
> DataGrip) and it returns perfectly sensible data every time.
>
> The code at gist:412e1f3324136a574303005a0922f610
> <https://gist.github.com/alunap/412e1f3324136a574303005a0922f610>
>
>
> Returned:
> +----+------+----+-----------+----+--------------+
> |uuid|gender| cpp|young_carer| spp|asylum_refugee|
> +----+------+----+-----------+----+--------------+
> |uuid|gender|true|       true|true|          true|
> |uuid|gender|true|       true|true|          true|
> |uuid|gender|true|       true|true|          true|
> |uuid|gender|true|       true|true|          true|
> |uuid|gender|true|       true|true|          true|
> |uuid|gender|true|       true|true|          true|
> |uuid|gender|true|       true|true|          true|
> |uuid|gender|true|       true|true|          true|
> |uuid|gender|true|       true|true|          true|
> |uuid|gender|true|       true|true|          true|
> +----+------+----+-----------+----+--------------+
> only showing top 10 rows
>
> On the database, device.uuid field is VARCHAR(255) and contains valid
> uuids (no nulls).
> children.gender is VARCHAR(255) and contains ‘M’, ‘F’, ‘MALE’, ‘FEMALE’,
> ‘NONE’, or null.
> children.cpp, young_carer, spp, and asylum_refugee are all tinyint(1) = 0.
> They are nearly all 0, but the first 10 rows contain some nulls.
>
> I tried enclosing the query with brackets ‘(SELECT…) t’ as I gather it is
> a subquery, and I tried adding a WHERE d.uuid = ‘an id’ with an id being
> one where there are no nulls in the column, but no difference. So,
> completely baffled at this point.
>
> Thanks for any suggestions,
>
> Alun ap Rhisiart
>
