[ https://issues.apache.org/jira/browse/S2GRAPH-185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chul Kang reassigned S2GRAPH-185:
---------------------------------

    Assignee: Chul Kang

> Support Spark Structured Streaming to work with data in streaming and batch
> ---------------------------------------------------------------------------
>
>                 Key: S2GRAPH-185
>                 URL: https://issues.apache.org/jira/browse/S2GRAPH-185
>             Project: S2Graph
>          Issue Type: New Feature
>            Reporter: Chul Kang
>            Assignee: Chul Kang
>            Priority: Major
>
> By default, S2Graph publishes all edge/vertex requests to Kafka in WAL
> format.
> At Kakao, S2Graph has been used as a master database that stores all user
> activities.
> I have been developing several ETL jobs that suit these use cases, and I
> want to contribute them.
> The use cases are as follows:
> {code:java}
> save edges/vertices arriving through Kafka to other storage systems
>  - Druid sink for slice and dice
>  - ES sink for search
>  - file sink for storing edges/vertices
> ingest from various storage systems into S2Graph
>  - MySQL binlog
>  - HDFS/Hive/HBase
> ETL jobs on edge/vertex data
>  - merge all user activities based on userId
>  - generate statistical information
>  - apply ML libraries to graph-format data
> {code}
>
> Below are some simple requirements for this:
>  * support both streaming and static source data processing
>  * the computation flow is reusable and shared between streaming and batch
>  * jobs are run from a simple job description
>
> Spark Structured Streaming provides a unified API for both streaming and
> batch by using the DataFrame/Dataset API from Spark SQL.
> It allows the same operations to be executed on bounded and unbounded data
> sources and provides exactly-once fault-tolerance guarantees.
> Structured Streaming ships with several built-in sources and sinks, and it
> supports custom implementations of the Source/Sink interfaces.
> Using this, we can easily develop ETL jobs that connect to various
> storage systems.
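
The second requirement above (one computation flow shared by streaming and batch) can be illustrated with a small sketch. This is plain Python, not Spark and not S2Graph code: the record shape, `merge_by_user`, `micro_batches`, and the sample events are all hypothetical, chosen only to mirror the "merge all user activities based on userId" use case. In Structured Streaming the analogous step would be a single DataFrame transformation applied to the result of either `spark.read` or `spark.readStream`.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Optional

# Hypothetical record shape; the real S2Graph WAL format is not shown in this issue.
Record = Dict[str, str]
State = Dict[str, List[str]]


def merge_by_user(records: Iterable[Record], state: Optional[State] = None) -> State:
    """Shared computation flow: append each record's label to its user's activity list."""
    merged = defaultdict(list)
    if state:
        # Carry over activities accumulated from earlier chunks.
        for user, labels in state.items():
            merged[user].extend(labels)
    for rec in records:
        merged[rec["userId"]].append(rec["label"])
    return dict(merged)


def micro_batches(records: List[Record], size: int) -> Iterator[List[Record]]:
    """Simulate an unbounded source by yielding fixed-size chunks of records."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


# Made-up sample events, for illustration only.
EVENTS: List[Record] = [
    {"userId": "u1", "label": "click"},
    {"userId": "u2", "label": "like"},
    {"userId": "u1", "label": "share"},
]


def run_batch(events: List[Record]) -> State:
    """Bounded input: apply the flow once over the whole data set."""
    return merge_by_user(events)


def run_streaming(events: List[Record], size: int = 2) -> State:
    """Unbounded input: apply the same flow chunk by chunk, carrying state forward."""
    state: State = {}
    for chunk in micro_batches(events, size):
        state = merge_by_user(chunk, state)
    return state
```

Both `run_batch(EVENTS)` and `run_streaming(EVENTS)` call the same `merge_by_user` function; only the way records are fed in differs, which is the property the requirement asks for.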
>
> Reference:
> [https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)