Source: scikit-learn
X-Debbugs-CC: t...@security.debian.org
Severity: important
Tags: security

Hi,

The following vulnerability was published for scikit-learn.

CVE-2024-5206[0]:
| A sensitive data leakage vulnerability was identified in
| scikit-learn's TfidfVectorizer, specifically in versions up to and
| including 1.4.1.post1, which was fixed in version 1.5.0. The
| vulnerability arises from the unexpected storage of all tokens
| present in the training data within the `stop_words_` attribute,
| rather than only storing the subset of tokens required for the
| TF-IDF technique to function. This behavior leads to the potential
| leakage of sensitive information, as the `stop_words_` attribute
| could contain tokens that were meant to be discarded and not stored,
| such as passwords or keys. The impact of this vulnerability varies
| based on the nature of the data being processed by the vectorizer.

https://huntr.com/bounties/14bc0917-a85b-4106-a170-d09d5191517c
https://github.com/scikit-learn/scikit-learn/commit/70ca21f106b603b611da73012c9ade7cd8e438b8 (1.5.0rc1)

If you fix the vulnerability please also make sure to include the
CVE (Common Vulnerabilities & Exposures) id in your changelog entry.

For further information see:

[0] https://security-tracker.debian.org/tracker/CVE-2024-5206
    https://www.cve.org/CVERecord?id=CVE-2024-5206

Please adjust the affected versions in the BTS as needed.
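
For anyone triaging this, the behaviour described in the CVE text can be
checked with a minimal sketch along the following lines. It is not taken
from the upstream fix. The sample documents and the "hunter2" token are
made up for illustration, and min_df=2 is just one setting that causes
rare tokens to be discarded. On affected versions the discarded tokens
are expected to show up in `stop_words_`; on fixed versions the attribute
is assumed to be absent, so the code guards for both cases.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical training data; "hunter2" stands in for a secret that
# appears only once and should be dropped by min_df=2.
docs = [
    "login ok user alice",
    "login ok user bob",
    "login failed password hunter2",
]

vectorizer = TfidfVectorizer(min_df=2)
vectorizer.fit(docs)

# Tokens the model actually needs for TF-IDF.
vocabulary = set(vectorizer.vocabulary_)

# Tokens retained although they were discarded from the vocabulary.
# Empty when the attribute does not exist (assumed for fixed versions).
leaked = set(getattr(vectorizer, "stop_words_", None) or ())

print("vocabulary:", sorted(vocabulary))
print("retained discarded tokens:", sorted(leaked))

# Possible mitigation on affected versions: clear the attribute before
# pickling or otherwise sharing the fitted vectorizer.
if hasattr(vectorizer, "stop_words_"):
    vectorizer.stop_words_ = None
```

Clearing the attribute does not affect transform(), which relies on the
fitted vocabulary and IDF weights only.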