Hello Andrew,
Thanks for your hint.
Yes, that's the approach I found too.
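// Imports assumed for Mahout's Spark bindings.
import org.apache.spark.rdd.RDD
import org.apache.mahout.math.drm._
import org.apache.mahout.sparkbindings._

// Maps the old key of every row that passes the filter to a new, dense index.
def createIndexMap(x: CheckpointedDrm[Int]): RDD[(Int, Int)] = {
  val xIndexFiltered = x.rdd
    .filter(r => r._2.get(0) > 0) // keep rows whose first element is positive
    .map(r => r._1)               // keep only the original row key
  xIndexFiltered.zipWithIndex
    .map(r => (r._1, r._2.toInt)) // (old index, new index)
}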
First, I filter the DRM and create a map from old to new indices, as you
mentioned.
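To make the index map concrete, here is a minimal sketch of the same idea on a plain Scala collection instead of an RDD. The sample rows and the names `rows` and `localIndexMap` are made up for illustration and are not part of Mahout.

```scala
// Hypothetical rows: (original row key, row values). Stands in for x.rdd.
val rows: Seq[(Int, Vector[Double])] = Seq(
  0 -> Vector(1.0),
  1 -> Vector(0.0),
  2 -> Vector(2.0),
  3 -> Vector(-1.0)
)

// Keep the keys of rows whose first element is positive,
// then number the surviving keys 0, 1, 2, ...
val localIndexMap: Seq[(Int, Int)] = rows
  .filter { case (_, values) => values(0) > 0 }
  .map { case (key, _) => key }
  .zipWithIndex

// localIndexMap == Seq((0, 0), (2, 1)): old keys 0 and 2 become new keys 0 and 1.
```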
By joining against this index map, I can reduce the rows in my DRM according to
a certain condition, do some more calculations, and map the newly calculated
values back to the original DRM.
For example:
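// drmFiltriert is the filtered DRM, keyed by the new indices from createIndexMap.
def mergeDrm(drmOrig: CheckpointedDrm[Int],
             drmFiltriert: CheckpointedDrm[Int],
             indexMapping: RDD[(Int, Int)]): CheckpointedDrm[Int] = {
  drmWrap(
    drmOrig.rdd
      .map(r => (r._1, r._2))
      // (oldKey, (row, Option[newKey])); newKey is None for rows that were filtered out
      .leftOuterJoin(indexMapping)
      // re-key by Option[newKey] so it can be joined with the filtered DRM
      .map(r => (r._2._2, (r._1, r._2._1)))
      .leftOuterJoin(drmFiltriert.rdd.map(r => (Option(r._1), r._2)))
      // take the recalculated row if there is one, otherwise keep the original row
      .map(r => (r._2._1._1, r._2._2.getOrElse(r._2._1._2)))
  )
}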
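The two left outer joins are easier to follow on small local data. The following is only a sketch of the same merge logic using plain Scala `Map`s instead of RDDs and DRMs. All names (`orig`, `oldToNew`, `recalculated`, `merged`) and values are hypothetical, and each row is reduced to a single `Double` for brevity.

```scala
// Hypothetical data: original rows keyed by old index, the old -> new index map,
// and the recalculated rows keyed by the NEW index.
val orig: Map[Int, Double] = Map(0 -> 1.0, 1 -> 0.0, 2 -> 2.0, 3 -> -1.0)
val oldToNew: Map[Int, Int] = Map(0 -> 0, 2 -> 1)
val recalculated: Map[Int, Double] = Map(0 -> 10.0, 1 -> 20.0)

val merged: Map[Int, Double] = orig.map { case (oldKey, oldValue) =>
  // First join: old key -> new key (None if the row was filtered out).
  val newKey: Option[Int] = oldToNew.get(oldKey)
  // Second join: new key -> recalculated value, falling back to the original value.
  oldKey -> newKey.flatMap(recalculated.get).getOrElse(oldValue)
}

// merged == Map(0 -> 10.0, 1 -> 0.0, 2 -> 20.0, 3 -> -1.0)
```

Rows that did not pass the filter have no new key, so they keep their original value, which mirrors the `getOrElse` in `mergeDrm`.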
Regards
Kuno
-----Original Message-----
From: Andrew Musselman <[email protected]>
Sent: Tuesday, 7 July 2020 23:16
To: [email protected]
Subject: Re: How to do logical subsetting in Mathout
Kuno, thanks for your note. I don't know of an equivalent function out of the
box, but if you want to get the indices where a condition is true you could try
something in Scala like:
myList.zipWithIndex.collect { case (item, index) if item > 1 => index }
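Applied to the R example from the original question, that one-liner could look like the sketch below. It uses plain Scala `Vector`s rather than Mahout types, and the names `hits` and `updated` are made up for illustration. Note that Scala indices are 0-based, while R's are 1-based.

```scala
val x1 = Vector(1.0, 4.0, 2.0, 5.0)
val x2 = Vector(0.0, 0.0, 0.0, 0.0)

// Indices where the condition holds (0-based).
val hits: Vector[Int] =
  x1.zipWithIndex.collect { case (item, index) if item > 1 => index }

// Rough equivalent of R's x2[x1 > 1] = 2 on an immutable Vector:
// write 2.0 at every matching index.
val updated: Vector[Double] =
  hits.foldLeft(x2)((acc, i) => acc.updated(i, 2.0))

// hits == Vector(1, 2, 3)
// updated == Vector(0.0, 2.0, 2.0, 2.0)
```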
Hope this is helpful.
On Wed, Jun 10, 2020 at 2:53 AM Baeriswyl Kuno SBB CFF FFS (Extern) <
[email protected]> wrote:
> Hi all,
>
> I've bumped into Mahout because I need to migrate an R script that
> uses matrix algebra to a Spark cluster.
>
> Mahout's Scala/Spark bindings provide all of the operations except
> logical subsetting.
>
> Example:
>
> x1 = c(1.0,4.0,2.0,5.0)
> x2 = c(0,0,0,0)
> x2[x1 > 1] = 2
>
> This sets the value 2 at positions 2, 3 and 4 of x2.
>
> Is there an equivalent function in Mahout?
>
>
> Thanks.
>
> Kuno
>
>