Hi,
I have the following code that reads a table into an Apache Spark
DataFrame:
val df = spark.read.format("jdbc")
  .option("url", "jdbc:postgresql://host/database")
  .option("dbtable", "tablename")
  .option("user", "username")
  .option("password", "password")
  .load()
When I first invoke df.count() I get a smaller number than the next time I
invoke the same count method.
Why does this happen?
Doesn't Spark load a snapshot of my table into a DataFrame on my Spark
cluster when I first read that table?
My Postgres table is continuously receiving new rows, and my DataFrame
seems to reflect that.
How can I load a static snapshot of my table into a Spark DataFrame, as
of the moment the `read` method was invoked?
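To make the question concrete, here is a minimal sketch of the workaround
I am considering. It assumes the DataFrame is only a lazy query plan, so
each action such as count() re-runs the JDBC query against Postgres. The
idea is to write the rows out once and read that copy back. The object
name SnapshotSketch and the path /tmp/tablename_snapshot are placeholders
I made up, the URL, table and credentials are the same placeholders as in
my code, and the PostgreSQL JDBC driver is assumed to be on the classpath.
I have not verified that this is the recommended approach.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical sketch: meant to be launched with spark-submit,
// which supplies the master URL and the Spark jars.
object SnapshotSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("snapshot-sketch")
      .getOrCreate()

    // The JDBC read from my code; load() only defines the source,
    // it does not copy the table into the cluster.
    val live: DataFrame = spark.read.format("jdbc")
      .option("url", "jdbc:postgresql://host/database")
      .option("dbtable", "tablename")
      .option("user", "username")
      .option("password", "password")
      .load()

    // Placeholder location; any storage the cluster can reach would do.
    val snapshotPath = "/tmp/tablename_snapshot"

    // Run the JDBC query once and persist the rows it returned.
    live.write.mode("overwrite").parquet(snapshotPath)

    // From here on, read only the persisted copy, never Postgres.
    val snapshot: DataFrame = spark.read.parquet(snapshotPath)

    // Both counts come from the same files, so they should match
    // even if the Postgres table keeps growing.
    println(snapshot.count())
    println(snapshot.count())

    spark.stop()
  }
}
```

My understanding is that df.cache() alone might not guarantee the same
thing, since a cached partition that gets evicted could be recomputed
from the live table, but I may be wrong about that.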
Any help is appreciated,
--
Saulo