jornfranke opened a new issue, #926: URL: https://github.com/apache/sedona/issues/926
## Expected behavior

I have a spatial dataset of points which I load from a Parquet file. Essentially it has an id, a longitude and a latitude. The size does not really matter; the problem shows up even with a small dataset of a few points (e.g. 5).

I join the spatial dataset with a GeoTIFF raster file (https://cidportal.jrc.ec.europa.eu/ftp/jrc-opendata/FLOODS/EuropeanMaps/floodMap_RP100.zip , overview page: https://data.jrc.ec.europa.eu/dataset/1d128b6c-a4ee-4858-9e34-6210707f3c81). Then I need to get the value of the raster for each point.

The code is the following:

```
from pyspark.sql.functions import expr

df = spark.read.parquet("path/to/dataset/with/longitude_latitude")
df.createOrReplaceTempView("pointDF")
# Build a point geometry from the longitude/latitude columns
df = spark.sql('SELECT id, ST_Point(CAST(longitude AS Decimal(24,20)), CAST(latitude AS Decimal(24,20))) as geometry FROM pointDF').withColumnRenamed("geometry", "geometry_points")
pointDf = df.repartition("id")
# Load the GeoTIFF as a binary file and parse it into a raster column
rasterDf = spark.read.format("binaryFile").load("path/to/raster/floodmap_EFAS_RP100_C.tif")\
    .withColumn("raster", expr("RS_FromGeoTiff(content)"))
# Join every point with the single raster row, then read the raster value at each point
pointDf = pointDf.join(rasterDf)\
    .withColumn("raster_value", expr("RS_Value(raster, geometry_points)"))\
    .drop("raster", "content")
pointDf.show(2)
```

This should work in reasonable time.

## Actual behavior

Even for very few points it takes ages to get the value (> 10 min) on a very powerful cluster, whose resources are nowhere near fully used. For other rasters (much smaller, < 2 MB) this is reasonably fast, even for millions of points.

## Steps to reproduce the problem

See the description above.

## Settings

Sedona version = 1.4.1

Apache Spark version = 3.2.0

Apache Flink version = not used

API type = Scala, Java, Python?

Scala version = 2.12

JRE version = 1.8

Python version = 3.9

Environment = Cloudera CDP
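
For readers unfamiliar with what "getting the value of the raster for each point" involves, here is a minimal conceptual sketch in plain Python. It is not Sedona's `RS_Value` implementation. It assumes a north-up raster with square-aligned pixels held in memory as a list of rows, and the names `world_to_pixel`, `raster_value`, `origin_x`, `origin_y`, `pixel_width` and `pixel_height` are hypothetical, chosen only for illustration.

```python
import math


def world_to_pixel(x, y, origin_x, origin_y, pixel_width, pixel_height):
    """Map world coordinates to (col, row) indices of a north-up raster.

    (origin_x, origin_y) is the world coordinate of the top-left corner;
    pixel_width and pixel_height are positive cell sizes in world units.
    """
    col = math.floor((x - origin_x) / pixel_width)
    # Rows grow downwards while the y coordinate grows upwards.
    row = math.floor((origin_y - y) / pixel_height)
    return col, row


def raster_value(grid, x, y, origin_x, origin_y, pixel_width, pixel_height):
    """Return the cell value under the point (x, y), or None if outside the grid."""
    col, row = world_to_pixel(x, y, origin_x, origin_y, pixel_width, pixel_height)
    if 0 <= row < len(grid) and 0 <= col < len(grid[0]):
        return grid[row][col]
    return None


# Toy 2x3 raster whose top-left corner sits at (10.0, 50.0) with 1x1 cells.
grid = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
]
inside = raster_value(grid, 11.5, 48.5, 10.0, 50.0, 1.0, 1.0)
outside = raster_value(grid, 20.0, 48.5, 10.0, 50.0, 1.0, 1.0)
print(inside, outside)
```

In this sketch the lookup is plain index arithmetic once the pixel grid is in memory, so its cost does not depend on the raster size; loading and decoding the raster is a separate step that the sketch does not model.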