Let's say I use HashingTF in my Pipeline to hash a string feature. This is available in Python and Scala, but they hash strings to different values since both use their respective runtime's native hash implementation. This means that I create different feature vectors for the same input. While I can load/store something like a NaiveBayesModel across the two languages successfully, it seems like the hashing part doesn't translate.
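To make the concern concrete, here is a minimal sketch of the hashing trick in plain Python. It is not Spark's HashingTF implementation: `term_frequencies`, `crc32_hash` and `md5_hash` are hypothetical names, and the two hash functions only stand in for "the hash the Python side uses" and "the hash the Scala side uses". The point is that the feature index is `hash(term) % num_features`, so two sides that hash differently build different sparse vectors from the same input.

```python
import hashlib
import zlib
from collections import Counter


def term_frequencies(terms, num_features, hash_fn):
    """Hashing trick: bucket each term at hash_fn(term) % num_features and count."""
    counts = Counter(hash_fn(term) % num_features for term in terms)
    return dict(counts)


def crc32_hash(term):
    # Stand-in for one runtime's hash (assumption, not what Spark uses).
    return zlib.crc32(term.encode("utf-8"))


def md5_hash(term):
    # Stand-in for the other runtime's hash (assumption, not what Spark uses).
    digest = hashlib.md5(term.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


terms = ["spark", "hashing", "spark"]
num_features = 1 << 20

vec_a = term_frequencies(terms, num_features, crc32_hash)
vec_b = term_frequencies(terms, num_features, md5_hash)

# Same input, same vector size, but the non-zero indices depend on the hash.
print(sorted(vec_a.items()))
print(sorted(vec_b.items()))
```

Both stand-in hashes are byte-based and deterministic, so either one yields the same indices in any language that implements it the same way. A model trained on one set of indices would only see matching features if the other side used the identical function and the identical `num_features`.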
Is that accurate, or have I completely missed a way to get the same hashing for the same input across languages?