I do think this is the right way: you will have to test with test data,
verifying that the calculation produces the expected output.
Even if the logical plan is correct, your calculation might not be. E.g., there
can be bugs in Spark, in the UI, or (as is very often the case) in the client.
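As a rough sketch of that kind of check (all names and data here are illustrative, not from the thread; it assumes a local SparkSession), one could run the aggregation on a tiny hand-made dataset and compare against a hand-computed expected result:

```scala
// Hypothetical sketch: verify an aggregation against hand-computed expected output.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("calc-check").getOrCreate()
import spark.implicits._

// Tiny test dataset with an obvious, hand-checkable answer.
val input = Seq(("a", 1), ("a", 2), ("b", 3)).toDF("key", "value")

val result = input.groupBy("key").sum("value").as[(String, Long)].collect().toMap

// Expected output computed by hand from the test data above.
assert(result == Map("a" -> 3L, "b" -> 3L))
```

If the plan looks right but this kind of assertion fails, the bug is in the calculation (or the client code feeding it), not in the planner.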
Hi All,
This is another issue I was facing with Spark-S3 interoperability, and I
wanted to ask the broader community whether anyone else has run into it.
I have a rather simple aggregation query with a basic transformation. The
output, however, has a lot of output partitions (20K partitions). The Spark
Hi fellow Spark Devs,
If anyone here has experience with Spark Kinesis streaming, would it be
possible to share your thoughts on this pull request [1]?
Some info:
The patch removes two important hard-coded values for Kinesis retries and
will make Kinesis recovery from crashes more reliable.
If you just want to emulate pushing down a join, you can wrap the IN-list
query in a JDBCRelation directly:
scala> val r_df = spark.read.format("jdbc").option("url", "jdbc:h2:/tmp/testdb").option("dbtable", "R").load()
r_df: org.apache.spark.sql.DataFrame = [A: int]

scala> r_df.show
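To spell that out (a sketch, assuming the same H2 table R as above; the IN values are made up for illustration), the whole IN-list query can be passed as the dbtable subquery, so the database evaluates the filter itself rather than Spark:

```scala
// Sketch: push the IN-list filter down to the database by handing the whole
// query to the JDBC source as an aliased subquery. Table R and the values
// (1, 2, 3) are illustrative.
val pushed = spark.read.format("jdbc")
  .option("url", "jdbc:h2:/tmp/testdb")
  .option("dbtable", "(SELECT * FROM R WHERE A IN (1, 2, 3)) AS t")
  .load()

pushed.show()
```

Spark treats the parenthesized subquery as the relation, so the filtering happens entirely on the database side.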
2017-04-06 4:00 GMT+02:00 Michael Segel :
> Just out of curiosity, what would happen if you put your 10K values in to a
> temp table and then did a join against it?
The answer is predicate pushdown.
In my case I'm using this kind of query on a JDBC table, and IN