Hi to all,
I'm the maintainer of the OrientDB JDBC driver.
We are trying to load data into Spark from an OrientDB database using the JDBC driver.
I'm facing some issues:
To gather metadata, Spark performs a "test query" of this form: select
* from TABLE_NAME where 1=0
For this case, I wrote a workaround inside the driver that strips
where 1=0 and replaces it with LIMIT 1.
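For reference, the rewrite described above could be sketched roughly as follows. This is only an illustration, not the actual driver code: the class name MetadataQueryRewriter and its rewrite method are hypothetical, and the sketch assumes the where 1=0 clause always appears at the very end of the statement.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the real OrientDB JDBC driver.
final class MetadataQueryRewriter {

    // Matches a trailing "WHERE 1=0" (any case, optional spaces around '=').
    private static final Pattern WHERE_FALSE =
            Pattern.compile("\\s+WHERE\\s+1\\s*=\\s*0\\s*$", Pattern.CASE_INSENSITIVE);

    private MetadataQueryRewriter() {
    }

    // Replaces the always-false predicate with LIMIT 1; other queries pass through unchanged.
    static String rewrite(String sql) {
        return WHERE_FALSE.matcher(sql).replaceFirst(" LIMIT 1");
    }
}
```

Note that LIMIT 1 returns one row where the original query returns none, so this relies on the caller reading only the result set metadata.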
After that query, it performs a query with each field wrapped in
double quotes:
SELECT "stringKey", "intKey" FROM Item
In OrientDB's SQL dialect, double quotes delimit a string literal, so for
each record of the result set the query returns the literals stringKey
and intKey as values:
row 1: stringKey:stringKey, intKey:intKey
row 2: stringKey:stringKey, intKey:intKey
row 3: stringKey:stringKey, intKey:intKey
....
Is there a way to configure SQLContext to avoid the double quoting of
field names?
I'm using Java with Spark 1.6.2:
Map<String, String> options = new HashMap<String, String>() {{
put("url", "jdbc:orient:plocal:./target/databases/sparkTest");
put("dbtable", "Item");
}};
SQLContext sqlCtx = new SQLContext(ctx);
DataFrame jdbcDF = sqlCtx.read().format("jdbc").options(options).load();
I found that someone has the same problem with SAS JDBC.
As a workaround I will implement a query cleaner inside the driver,
but an option to configure the quoting char would be better.
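A minimal sketch of what such a query cleaner might look like, under some assumptions: the class name QuotedIdentifierCleaner is hypothetical, only simple identifiers (letters, digits, underscore) are handled, and only the select list before the first FROM is touched so that string literals later in the statement are left alone.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the real OrientDB JDBC driver.
final class QuotedIdentifierCleaner {

    // A double-quoted simple identifier, e.g. "stringKey".
    private static final Pattern QUOTED_IDENTIFIER =
            Pattern.compile("\"([A-Za-z_][A-Za-z0-9_]*)\"");

    // The first FROM keyword, used to split off the select list.
    private static final Pattern FROM =
            Pattern.compile("\\s+FROM\\s+", Pattern.CASE_INSENSITIVE);

    private QuotedIdentifierCleaner() {
    }

    // Strips double quotes from identifiers in the select list only.
    static String clean(String sql) {
        Matcher from = FROM.matcher(sql);
        if (!from.find()) {
            return sql; // no FROM clause: leave the statement untouched
        }
        String selectList = sql.substring(0, from.start());
        String rest = sql.substring(from.start());
        return QUOTED_IDENTIFIER.matcher(selectList).replaceAll("$1") + rest;
    }
}
```

With this, SELECT "stringKey", "intKey" FROM Item becomes SELECT stringKey, intKey FROM Item. A quoted string literal that looks like an identifier in the select list would also be unquoted, which is one reason a configurable quoting character on the Spark side would be the cleaner fix.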
Regards,
RF
--
Roberto Franchini
"The impossible is inevitable"
https://github.com/robfrank/ https://twitter.com/robfrankie
hangout:ro.franchini skype:ro.franchini