Hi all,
I’ve been doing some work lately with Spark’s ML interfaces, which include
sparse and dense Vector and Matrix types, backed on the Scala side by
Breeze. Using these interfaces, you can construct DataFrames whose column
types are vectors and matrices, and though the API isn’t terribly rich,
As far as I know, there is an implementation of a tensor type in C++/Python
already. Should we just finalize the spec and add an implementation in Java?
On the Spark side, it's probably more complicated, as Vector and Matrix are
not "first class" types in Spark SQL. Spark ML implements them as UDTs
(user-defined types).
The tensor type in the C++ API is a stand-alone object AFAICT; Phillip and
I were unable to construct an Arrow column of them. I agree that it’s a
good starting point. One interpretation of what I’m suggesting is that we
take it as the reference implementation, add it to the spec, and write the
Java implementation.
> As far as I know, there is an implementation of a tensor type in C++/Python
> already. Should we just finalize the spec and add an implementation in Java?
There is nothing specified yet as far as a *column* of
ndarrays/tensors goes. We defined Tensor metadata for the purposes of
IPC/serialization, but made no provision for using it as a column type.
My gut feeling is that such a column type should specify both the shape and
primitive type of all values in the column. I can’t think of a common use
case that requires differently shaped tensors in a single column.
Can anyone here come up with such a use case?
If not, I can try to draft a proposal.
What do people think: should "shape" be included as an optional part
of schema metadata or a required part of the schema itself?
I feel having it be required might be too restrictive for interop with
other systems.
On Mon, Apr 9, 2018 at 9:13 PM, Leif Walsh wrote:
> My gut feeling is that such a column type should specify both the shape
> and primitive type of all values in the column.
The simplest thing would be to have a "tensor" or "ndarray" type where
each cell has the same shape. This would amount to adding the current
"Tensor" Flatbuffers table to the Type union in
https://github.com/apache/arrow/blob/master/format/Schema.fbs#L194
The benefit of having each cell share the same shape is that every value
has a fixed size.
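As a rough illustration of what adding such a type to the Type union might look like, here is a hypothetical FlatBuffers sketch in the style of Schema.fbs. The table and field names below are assumptions for discussion, not anything in the actual Arrow spec:

```
/// HYPOTHETICAL sketch only -- not part of the Arrow format.
/// A fixed-shape tensor type: every cell in the column shares
/// the same element type and dimensions.
table FixedShapeTensor {
  /// Primitive type of the tensor elements
  value_type: Type;
  /// Shape shared by every cell in the column
  shape: [long];
}
```

Because the shape is fixed per column, the physical layout could reuse existing fixed-width machinery rather than requiring per-cell offsets.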
Thanks, I’ll create a JIRA and a Google doc. I agree those are the main
questions to iron out.
If there’s a desire to avoid scope-creeping this in before 1.0, I think in
parallel I’ll start a conversation with the Spark community about using the
existing FixedSizeBinary type plus some custom metadata.