Hey Guys,

The big data and machine learning world is dominated by Python, Scala,
and R.

I'm a Swifter at heart, but not so much by tools of the trade.

I'd appreciate a constructive discussion on how that could be changed.

While R is a non-goal for obvious reasons, I'd argue that since both
Scala and Python are general-purpose languages, taking them head to
head might be low-hanging fruit.

To support that claim, I'd like to reference projects such as:

 - Hadoop, Spark, and Hive, which are all huge ecosystems, entirely
JVM-based.
 - Apache Parquet, a highly efficient columnar storage format for big
data analytics, implemented in Java and C++.
 - Apache Arrow, a physical memory specification that big data systems
can use to move data between systems with zero transformation (no
serialization or deserialization), which (for obvious reasons) focuses
on JVM-to-C interoperability.

There is also Python's Buffer Protocol, which ensures its predominance
(for the time being) as a prime candidate for data science projects:
https://jeffknupp.com/blog/2017/09/15/python-is-the-fastest-growing-programming-language-due-to-a-feature-youve-never-heard-of/
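
To illustrate, here is a minimal Swift sketch of that same zero-copy
idea, using only the standard library (the variable names and data are
just illustrative): a contiguous Array exposes its storage as a raw
buffer that a C library could consume in place, without serialization.

    // Expose an Array's contiguous storage without copying it.
    let samples: [Double] = [1.0, 2.0, 3.0, 4.0]

    samples.withUnsafeBufferPointer { buffer in
        // `buffer.baseAddress` points directly at the array's storage;
        // a C routine (e.g. BLAS) could read these doubles in place.
        if let base = buffer.baseAddress {
            print("zero-copy view of \(buffer.count) doubles at \(base)")
        }
    }

The building blocks exist; what Python has, and Swift arguably lacks,
is a single conventional protocol that every numeric library agrees to
speak.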

While Swift's Memory Ownership manifesto touches similar turf,
discussing copy-on-write and optimizing away memory access overhead, it
IMHO takes a system-level perspective, targeting projects such as
kernel code. I'd suggest that viewing the problem from the perspective
of an efficient CPU/GPU data-crunching machine might shed a different
light on the requirements and use cases.
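
To make that concrete, here is a minimal sketch of the copy-on-write
behavior the manifesto builds on, again standard library only:

    // Swift's Array shares storage on assignment and copies on mutation.
    let a = Array(repeating: 0.0, count: 1_000_000)
    var b = a           // no copy yet; both variables share one buffer

    b[0] = 42.0         // the first mutation triggers the actual copy

    print(a[0], b[0])   // 0.0 42.0

For a data-crunching workload the interesting question is how to avoid
that copy entirely (e.g. via the borrowed values the manifesto
proposes), rather than merely deferring it to the first write.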


I'd be happy to learn more and to have a constructive discussion on the
subject.


Thank you,
Max.


-- 
puıɯ ʎɯ ɯoɹɟ ʇuǝs