+1 for dsls/scio.

Let me know how I can help there!

Thanks
Regards
JB

On 07/01/2016 08:43 PM, Neville Li wrote:
Looks like dsls/scio is the winner :)

I like it too, plus we get to keep the Scio name. This also leaves room for
other Scala wrappers of different flavors.
Scio is a DSL in the domain of functional style data pipelines.

On Mon, Jun 27, 2016 at 3:55 AM Ismaël Mejía <ieme...@gmail.com> wrote:

Just to summarize, at this point:

- Everybody agrees about the fact that scio is not an SDK.
- Almost everybody agrees that given the current choice they would prefer
‘dsls/scio’
- Some of us are not particularly married to the DSL classification.

I have a proposal: we can define two concepts, each with its own
structure in the Beam repository:

1. Beam API: A set of abstractions to program the complete Beam Model in a
given programming language.

These are idiomatic versions of the Beam Model, and ideally should cover
the complete Beam Model; scio is one example. The directory structure
for Beam APIs could be:

apis/scala
apis/clojure
apis/groovy
...

2. Beam DSL: A domain-specific set of abstractions that run on Beam, e.g.
graphs, machine learning, etc.

These represent domain-specific idioms, e.g. a graph DSL would represent
graph concepts (edges, vertices, etc.) as first-class citizens. The directory
structure for Beam DSLs could be:

dsls/graph
dsls/ml
dsls/cep
...

Given these definitions, for the concrete scio case I think the most
accurate directory would be:

apis/scala
or
apis/scala/scio

I personally prefer the first one (apis/scala) because we don’t have any
other Scala API for the moment, and because I think that we shouldn’t have
more than one API per language, to avoid confusion. For example, imagine that
someone creates apis/java/bcollections to represent Beam Pipelines as
distributed collections; that would be confusing. However, I understand the
arguments for the second directory, e.g. to support different APIs per
language and to preserve their original names (scio). Anyway, I would be OK
with either of the two.

I apologize for this long message, and for not choosing either of the two
structures proposed in this thread, but I think it is important to be clear
about the differences in scope between Beam APIs and DSLs, in particular if
we think about new users.

What do you think? Does my proposal make sense? Any suggestions?

Regards,
Ismaël

PS. One last thing: I found this text that in part corroborates my feeling
about scio being an API and not a DSL:

“… a Scala Dataflow API (a nascent open-source version of which already
exists, and which seems likely to flower into maturity in due time given
Dataflow's move to join the ASF).”
https://cloud.google.com/dataflow/blog/dataflow-beam-and-spark-comparison


On Mon, Jun 27, 2016 at 4:52 AM, Raghu Angadi <rang...@google.com.invalid>
wrote:

On Fri, Jun 24, 2016 at 7:05 PM, Dan Halperin <dhalp...@google.com.invalid>
wrote:

I love the name scio. But I think sdks/scala might be most appropriate and
would make it a first class citizen for Beam.


I am strongly against it being in the 'sdks/' top-level module -- it's not
a Beam SDK. Unlike DSL, SDK is a very specific term in Beam.


+1. I agree, it is not a Beam SDK in that sense.

Raghu.



Where would a future python sdk reside?


The Python SDK is in the python-sdk branch on Apache already, and it lives
in `sdks/python`. (And it is aiming to become a proper Beam SDK. ;)




--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com
