You can use a left anti join instead. isin accepts a list of values (or columns of the same DataFrame), not a column from another DataFrame.

On Mon, Jan 24, 2022 at 01:38 Bitfox <bit...@bitfox.top> wrote:

> >>> df.show(3)
> +----+-----+
> |word|count|
> +----+-----+
> |  on|    1|
> | dec|    1|
> |2020|    1|
> +----+-----+
> only showing top 3 rows
>
> >>> df2.show(3)
> +--------+-----+
> |stopword|count|
> +--------+-----+
> |    able|    1|
> |   about|    1|
> |   above|    1|
> +--------+-----+
> only showing top 3 rows
>
> >>> df3=df.filter(~col("word").isin(df2.stopword ))
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/opt/spark/python/pyspark/sql/dataframe.py", line 1733, in filter
>     jdf = self._jdf.filter(condition._jc)
>   File "/opt/spark/python/lib/py4j-0.10.9.2-src.zip/py4j/java_gateway.py", line 1310, in __call__
>   File "/opt/spark/python/pyspark/sql/utils.py", line 117, in deco
>     raise converted from None
> pyspark.sql.utils.AnalysisException: Resolved attribute(s) stopword#4 missing from word#0,count#1L in operator !Filter NOT word#0 IN (stopword#4).;
> !Filter NOT word#0 IN (stopword#4)
> +- LogicalRDD [word#0, count#1L], false
>
> The filter method doesn't work here.
> Maybe I need a join for two DF?
> What's the syntax for this?
>
> Thank you and regards,
> Bitfox

--
Gary Liu