You can get a SchemaRDD from the Hive table, map it into an RDD of Vectors, and then construct a RowMatrix. The transformations are lazy, so no external storage is needed for intermediate data.

-Xiangrui
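
A minimal sketch of that approach, assuming Spark 1.2 with Hive support on the classpath. The table name `features` and its columns (`row_id` as BIGINT, `f1`, `f2`, `f3` as non-null DOUBLE) are hypothetical, chosen only for illustration. Because the question mentions IndexedRowMatrix, the sketch builds one from a row-index column and then derives a RowMatrix from it. On later Spark versions `sql` returns a DataFrame rather than a SchemaRDD, and the map may need to go through `.rdd`.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.{IndexedRow, IndexedRowMatrix, RowMatrix}

object HiveToMatrix {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("HiveToMatrix"))
    val hiveContext = new HiveContext(sc)

    // Hypothetical table and columns: row_id BIGINT, f1/f2/f3 DOUBLE (assumed non-null).
    val table = hiveContext.sql("SELECT row_id, f1, f2, f3 FROM features")

    // Lazy transformation: nothing is read from Hive until an action runs.
    val indexedRows = table.map { row =>
      IndexedRow(
        row.getLong(0),
        Vectors.dense(row.getDouble(1), row.getDouble(2), row.getDouble(3)))
    }

    // Wrap the RDD as a distributed matrix; this does not copy the data.
    val indexedMatrix = new IndexedRowMatrix(indexedRows)

    // Drop the row indices if only a RowMatrix is needed.
    val rowMatrix: RowMatrix = indexedMatrix.toRowMatrix()

    // numRows() and numCols() are actions, so the Hive query executes here.
    println(s"${indexedMatrix.numRows()} x ${indexedMatrix.numCols()}")

    sc.stop()
  }
}
```

If the matrix is used by several actions, caching `indexedRows` first avoids re-reading the Hive table each time.
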
On Sun, Jan 18, 2015 at 4:07 AM, guxiaobo1982 <guxiaobo1...@qq.com> wrote:
> Hi,
>
> We have large datasets in the data format of a Spark MLlib matrix, but they
> are pre-computed by Hive and stored inside Hive. My question is: can we
> create a distributed matrix such as IndexedRowMatrix directly from Hive
> tables, rather than reading the data from the Hive tables and feeding it
> into an empty matrix?
>
> Regards