Hello,
As per the documentation, lapply works on individual records and lapplyPartition
works on a whole partition.
However, the format of the output should not change between the two.
When I use lapplyPartition, the data is converted to a vertical format.
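To make the distinction concrete, this is how I understand it, as a minimal sketch in plain R with no Spark involved. The sample lines, the hard-coded partitions and the helper names parse_line and sum_partition are made up for illustration and are not part of SparkR:

```r
# Hypothetical sample data: two "partitions" of comma-separated lines.
# In Spark the runtime decides the partitioning; here it is hard-coded.
partitions <- list(
  c("1,2", "3,4"),
  c("5,6")
)

# Per-record function: receives ONE line and returns its parsed fields.
parse_line <- function(line) {
  as.numeric(unlist(strsplit(line, ",")))
}

# Per-partition function: receives ALL lines of one partition and
# returns one pair of column sums for that partition.
sum_partition <- function(part) {
  fields <- do.call(rbind, lapply(part, parse_line))
  colSums(fields)
}

# Element-wise application: one result per line.
per_record <- lapply(unlist(partitions), parse_line)

# Partition-wise application: one result per partition.
per_partition <- lapply(partitions, sum_partition)

# Adding up the per-partition results gives the overall column sums.
total <- Reduce(`+`, per_partition)
print(total)
```

So a function written as a loop over many lines only makes sense when it is applied per partition; applied per record it sees a single line each time.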
Here is my code:
library(SparkR)
sc <- sparkR.init("local")
lines <- textFile(sc, "/sparkdev/datafiles/covariance.txt")
totals <- lapply(lines, function(lines) {
  sumx <- 0
  sumy <- 0
  totaln <- 0
  for (i in 1:length(lines)) {
    dataxy <- unlist(strsplit(lines[i], ","))
    sumx <- sumx + as.numeric(dataxy[1])
    sumy <- sumy + as.numeric(dataxy[2])
  }
  ## list(as.numeric(sumx), as.numeric(sumy), as.numeric(sumxy), as.numeric(totaln))
  ## list does same as below
  c(sumx, sumy)
})
output <- collect(totals)
for (element in output) {
  cat(as.character(element[1]), as.character(element[2]), "\n")
}
I am expecting the output to be 55, 55.
However, it gives:
55,NA
55,NA
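For what it is worth, plain R returns NA whenever a vector is indexed past its end, so a line that splits into a single field would give NA for the second value. I have not confirmed that this is what happens inside the Spark job; the check below runs outside Spark and uses made-up input strings:

```r
# Made-up inputs: one line with two fields, one with a single field.
two_fields <- unlist(strsplit("55,55", ","))
one_field  <- unlist(strsplit("55", ","))

# Indexing past the end of a vector returns NA rather than an error.
second_of_two <- as.numeric(two_fields[2])
second_of_one <- as.numeric(one_field[2])
print(second_of_two)
print(second_of_one)

# Any running sum that picks up an NA stays NA from then on.
running <- 0 + second_of_one
print(running)
```

If this is the cause, printing length(dataxy) inside the function should show 1 instead of 2 for the affected lines.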
Where am I going wrong?
Thanks
Pranay