Re: Duplicates in Collaborative Filtering Output

2023-01-23 Thread Kartik Ohri
Hi again! Ironically, soon after sending the previous email, I actually found the bug in our setup that was causing the duplicates, and it wasn't MLlib ALS after all. Sorry for the confusion. Regards.

Duplicates in Collaborative Filtering Output

2023-01-22 Thread Kartik Ohri
Hi! We are using the Spark MLlib (on Spark 3.2.0) ALS model for an implicit-feedback collaborative filtering recommendation job. While looking at the output of recommendForUserSubset
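The thread's actual Spark code is not shown in this preview, and the follow-up above notes the duplicates came from the poster's own pipeline rather than from ALS. As a plain-Python sketch (no PySpark dependency; the function name and tuple layout are illustrative, not from the thread), the general remedy when duplicate (user, item) pairs leak into recommendation output is to keep a single entry per pair, e.g. the highest-scoring one:

```python
def dedup_recommendations(recs):
    """Hypothetical helper: recs is a list of (user_id, item_id, score)
    tuples that may contain duplicate (user_id, item_id) pairs.
    Returns one entry per pair, keeping the highest score, ordered by
    user and then by descending score."""
    best = {}
    for user, item, score in recs:
        key = (user, item)
        # Keep only the best score seen for this (user, item) pair.
        if key not in best or score > best[key]:
            best[key] = score
    return [
        (u, i, s)
        for (u, i), s in sorted(best.items(), key=lambda kv: (kv[0][0], -kv[1]))
    ]
```

In Spark itself the equivalent step would typically be a window-and-rank or `dropDuplicates` over the join keys, applied upstream of `recommendForUserSubset`'s input.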

Re: error: java.lang.UnsupportedOperationException: sun.misc.Unsafe or java.nio.DirectByteBuffer.<init>(long, int) not available

2021-08-07 Thread Kartik Ohri
Hi Mich! It looks like the issue comes from the BigQuery Connector and not Spark itself. For reference, see https://github.com/GoogleCloudDataproc/spark-bigquery-connector/issues/256 and https://github.com/GoogleCloudDataproc/spark-bigquery-connector/issues/350. These issues also mention a few
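This "sun.misc.Unsafe ... not available" error is the one Netty raises on JDK 9+ when reflective access is locked down, and the commonly cited workaround (in the linked connector issues and in general Netty/Arrow-on-newer-JDK guidance) is to pass a reflection flag through to driver and executor JVMs. A minimal sketch, assuming the standard Spark configuration keys (verify against your Spark and JDK versions; on very recent JDKs an additional `--add-opens java.base/java.nio=ALL-UNNAMED` may also be needed):

```python
# Conf entries commonly suggested as a workaround; pass them via
# SparkSession.builder.config(...) or spark-submit --conf.
# The flag asks Netty to attempt reflective access to direct buffers.
NETTY_FLAG = "-Dio.netty.tryReflectionSetAccessible=true"

conf = {
    "spark.driver.extraJavaOptions": NETTY_FLAG,
    "spark.executor.extraJavaOptions": NETTY_FLAG,
}
```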

Re: Structuring a PySpark Application

2021-07-01 Thread Kartik Ohri

Re: Structuring a PySpark Application

2021-07-01 Thread Kartik Ohri
Gourav Sengupta. On Wed, Jun 30, 2021 at 3:47 PM Kartik Ohri wrote: Hi all! I am working on a PySpark application and would like suggestions on how it should be structured. We have a number of possible jobs, organized in

Re: Structuring a PySpark Application

2021-06-30 Thread Kartik Ohri

Re: Structuring a PySpark Application

2021-06-30 Thread Kartik Ohri

Structuring a PySpark Application

2021-06-30 Thread Kartik Ohri
Hi all! I am working on a PySpark application and would like suggestions on how it should be structured. We have a number of possible jobs, organized in modules. There is also a "RequestConsumer
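The preview cuts off before the design is described, so the following is only a hypothetical sketch of one common way to structure such an app: each job module registers an entry point in a registry, and a consumer loop dispatches incoming requests to the matching job. All names here (`JOBS`, `register_job`, `run_request`) are illustrative, not taken from the thread, and the real jobs would hold Spark logic:

```python
# Registry mapping job names to their entry-point callables.
JOBS = {}

def register_job(name):
    """Decorator each job module uses to register its entry point."""
    def wrap(fn):
        JOBS[name] = fn
        return fn
    return wrap

@register_job("echo")
def echo_job(params):
    # Trivial stand-in for a real Spark job defined in its own module.
    return {"job": "echo", "params": params}

def run_request(request):
    """What a 'RequestConsumer' loop might call for each message:
    look up the requested job and invoke it with its parameters."""
    job = JOBS[request["name"]]
    return job(request.get("params", {}))
```

A design like this keeps job modules independent of the consumer; adding a job means adding one decorated function rather than editing a central dispatch table.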