Hi Chris,
When schemaSampleSize is set to -1, the connector scans all documents in the database, so this value adds the most overhead. A value of 1 scans only the first document. A positive integer N scans the first N documents (if N is greater than the number of documents in the database, -1 is applied, i.e. a full scan). A value of 0 or any non-integer is not permitted and will result in an error. Below is an example of adding the setting directly to your Spark context:
from pyspark.sql import SparkSession

spark = SparkSession\
    .builder\
    .appName("Multiple schema test")\
    .config("cloudant.host", "ACCOUNT.cloudant.com")\
    .config("cloudant.username", "USERNAME")\
    .config("cloudant.password", "PASSWORD")\
    .config("jsonstore.rdd.schemaSampleSize", -1)\
    .getOrCreate()
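To see why the sample size matters, here is a small, self-contained sketch (plain Python, no Spark or Cloudant required; the documents and the infer_schema helper are made up for illustration) of how inferring a schema from only the first N documents can miss fields that appear later:

```python
# Hypothetical documents with varying schemas: the "discount" field
# appears only in the last document.
docs = [
    {"_id": "1", "item": "pen",  "price": 2},
    {"_id": "2", "item": "book", "price": 12},
    {"_id": "3", "item": "lamp", "price": 30, "discount": 0.1},
]

def infer_schema(documents, sample_size):
    """Return the union of field names over the sampled documents.

    sample_size == -1 means scan everything; a sample_size larger
    than the number of documents also scans everything, mirroring
    the connector's described behavior.
    """
    sample = documents if sample_size == -1 else documents[:sample_size]
    fields = set()
    for doc in sample:
        fields.update(doc.keys())
    return sorted(fields)

print(infer_schema(docs, 1))   # first document only: "discount" is missed
print(infer_schema(docs, -1))  # full scan: all fields are present
```

With sample_size=1 the inferred schema is ['_id', 'item', 'price']; only a larger sample (or -1) picks up 'discount'. The same trade-off applies in the connector: a larger sample gives a more complete schema at the cost of scanning more documents.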
And here is how the option can be used as a local setting, applied to a single temporary table:
spark.sql("CREATE TEMPORARY TABLE `schema-test` USING com.cloudant.spark OPTIONS ( schemaSampleSize '10', database 'schema-test')")
schemaTestTable = spark.sql("SELECT * FROM `schema-test`")
This and some additional information can be found here: https://github.com/cloudant-labs/spark-cloudant#schema-variance. The same documentation will soon be added to the bahir/sql-cloudant project.
Thanks,
Esteban