[ https://issues.apache.org/jira/browse/SPARK-748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14212049#comment-14212049 ]
Andrew Ash commented on SPARK-748:
----------------------------------

I agree this would be valuable -- almost like a "Spark Cookbook" of how to read and write data from various other systems.

Step one is probably deciding what software to mention. Tentatively I propose:

Spark Core
- HDFS
- HBase
- Cassandra
- Elasticsearch
- JDBC, with examples for Postgres and MySQL
- General Hadoop InputFormat

Spark Streaming
- Kafka
- Flume
- Storm

As for the destination, this could go in the documentation included in the git repo and published to the Spark website, or on the Spark project wiki. I tend to prefer the former. A possible location would be http://spark.apache.org/docs/latest/programming-guide.html#external-datasets

> Add documentation page describing interoperability with other software (e.g.
> HBase, JDBC, Kafka, etc.)
> ------------------------------------------------------------------------------------------------------
>
>          Key: SPARK-748
>          URL: https://issues.apache.org/jira/browse/SPARK-748
>      Project: Spark
>   Issue Type: New Feature
>   Components: Documentation
>     Reporter: Josh Rosen
>
> Spark seems to be gaining a lot of data input / output features for
> integrating with systems like HBase, Kafka, JDBC, Hadoop, etc.
>
> It might be a good idea to create a single documentation page that provides a
> list of all of the data sources that Spark supports and links to the relevant
> documentation / examples / {{spark-users}} threads. This would help
> prospective users to evaluate how easy it will be to integrate Spark with
> their existing systems.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
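To give a flavor of what a cookbook entry for the list above might look like, here is a minimal sketch of reading two of the proposed sources (an HDFS text file and a JDBC table). The HDFS path, JDBC URL, table name, and credentials are placeholder assumptions, and the Postgres driver jar is assumed to be on the classpath:

```scala
import org.apache.spark.sql.SparkSession

object ExternalDatasetsCookbook {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("external-datasets-cookbook")
      .getOrCreate()

    // Spark Core source: a plain text file on HDFS (placeholder path)
    val lines = spark.sparkContext
      .textFile("hdfs://namenode:8020/data/events.log")

    // JDBC source, Postgres flavor (placeholder URL, table, and credentials)
    val users = spark.read
      .format("jdbc")
      .option("url", "jdbc:postgresql://dbhost:5432/mydb")
      .option("dbtable", "users")
      .option("user", "spark")
      .option("password", "secret")
      .load()

    println(s"read ${lines.count()} log lines and ${users.count()} user rows")
    spark.stop()
  }
}
```

Each entry in the cookbook could follow this shape: a short self-contained program per source, with the connection-specific options called out so users can see at a glance what they need to change for their own systems.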