Re: may I need a join here?

Gary Liu Mon, 24 Jan 2022 04:50:09 -0800

You can use a left anti join instead. isin accepts a list of values, not a
column from another DataFrame.


On Mon, Jan 24, 2022 at 01:38 Bitfox <bit...@bitfox.top> wrote:

> >>> df.show(3)
> +----+-----+
> |word|count|
> +----+-----+
> |  on|    1|
> | dec|    1|
> |2020|    1|
> +----+-----+
> only showing top 3 rows
>
> >>> df2.show(3)
> +--------+-----+
> |stopword|count|
> +--------+-----+
> |    able|    1|
> |   about|    1|
> |   above|    1|
> +--------+-----+
> only showing top 3 rows
>
> >>> df3=df.filter(~col("word").isin(df2.stopword))
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/opt/spark/python/pyspark/sql/dataframe.py", line 1733, in filter
>     jdf = self._jdf.filter(condition._jc)
>   File "/opt/spark/python/lib/py4j-0.10.9.2-src.zip/py4j/java_gateway.py", line 1310, in __call__
>   File "/opt/spark/python/pyspark/sql/utils.py", line 117, in deco
>     raise converted from None
> pyspark.sql.utils.AnalysisException: Resolved attribute(s) stopword#4 missing from word#0,count#1L in operator !Filter NOT word#0 IN (stopword#4).;
> !Filter NOT word#0 IN (stopword#4)
> +- LogicalRDD [word#0, count#1L], false
>
> The filter method doesn't work here. Do I need to join the two
> DataFrames instead? If so, what is the syntax?
>
> Thank you and regards,
> Bitfox
>
-- 
Gary Liu
