scrapy@1.8.4 vulnerabilities

A high-level Web Crawling and Web Scraping framework

Direct Vulnerabilities

Known vulnerabilities in the scrapy package. This does not include vulnerabilities belonging to this package’s dependencies.

Each entry below lists the vulnerability, its severity (Medium or High), and the affected version range.
  • Medium severity
URL Redirection to Untrusted Site ('Open Redirect')

Scrapy is a high-level web crawling and web scraping framework, used to crawl websites and extract structured data from their pages.

Affected versions of this package are vulnerable to URL Redirection to Untrusted Site ('Open Redirect') due to the improper handling of scheme-specific proxy settings during HTTP redirects. An attacker can potentially intercept sensitive information by exploiting the failure to switch proxies when redirected from HTTP to HTTPS URLs or vice versa.
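The "scheme-specific proxy settings" referred to here are per-scheme proxies such as those Scrapy's HttpProxyMiddleware reads from the http_proxy and https_proxy environment variables. The fixed behaviour can be sketched in plain Python (this is a simplified model, not Scrapy's actual code, and the proxy URLs are made up): the proxy is re-selected from the scheme of the current URL on every redirect hop, instead of sticking with the proxy chosen for the original request.

```python
from urllib.parse import urlparse

# Hypothetical per-scheme proxy map, mirroring how the http_proxy and
# https_proxy environment variables assign one proxy per scheme.
PROXIES = {
    "http": "http://plain-proxy.example:8080",
    "https": "http://tls-proxy.example:8443",
}

def proxy_for(url: str) -> str:
    """Re-select the proxy from the scheme of the *current* URL."""
    return PROXIES[urlparse(url).scheme]

# The vulnerable behaviour kept the proxy chosen for the original
# request; re-resolving on each hop means an HTTP -> HTTPS redirect
# moves traffic to the HTTPS proxy rather than staying on the old one.
for hop in ["http://example.com/login", "https://example.com/login"]:
    print(hop, "->", proxy_for(hop))
```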

How to fix URL Redirection to Untrusted Site ('Open Redirect')?

Upgrade Scrapy to version 2.11.2 or higher.

[,2.11.2)
  • Medium severity
Files or Directories Accessible to External Parties

Affected versions of this package are vulnerable to Files or Directories Accessible to External Parties via the DOWNLOAD_HANDLERS setting. An attacker can redirect traffic to unintended protocols such as file:// or s3://, potentially accessing sensitive data or credentials by manipulating the start URLs of a spider and observing the output.

Notes:

  1. HTTP redirects should only work between URLs that use the http:// or https:// schemes.

  2. A malicious actor, given write access to the start requests of a spider and read access to the spider output, could exploit this vulnerability to:

a) Redirect to any local file using the file:// scheme to read its contents.

b) Redirect to an ftp:// URL of a malicious FTP server to obtain the FTP username and password configured in the spider or project.

c) Redirect to any s3:// URL to read its content using the S3 credentials configured in the spider or project.

  3. A spider that always outputs the entire contents of a response would be fully exposed.

  4. A spider that extracted only fragments from the response could significantly limit the data at risk.
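If upgrading is not immediately possible, one hedged mitigation (assuming the project itself only needs http:// and https:// URLs) is to disable the unused download handlers in settings.py. Scrapy disables a handler when its scheme is mapped to None, so a redirect cannot reach those schemes even on an unpatched version:

```python
# settings.py: defensive sketch, assuming the project only fetches
# http/https URLs. Mapping a scheme to None disables its download
# handler, so a redirect cannot reach file://, ftp:// or s3:// URLs.
DOWNLOAD_HANDLERS = {
    "file": None,
    "ftp": None,
    "s3": None,
}
```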

How to fix Files or Directories Accessible to External Parties?

Upgrade Scrapy to version 2.11.2 or higher.

[,2.11.2)
  • Medium severity
Exposure of Sensitive Information to an Unauthorized Actor

Affected versions of this package are vulnerable to Exposure of Sensitive Information to an Unauthorized Actor due to improper handling of HTTP headers during cross-origin redirects. An attacker can intercept the Authorization header and potentially access sensitive information by exploiting this misconfiguration in redirect scenarios where the domain remains the same but the scheme or port changes.

Note: in the context of a man-in-the-middle attack, this could be used to gain access to the value of the Authorization header.
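The fixed origin check can be modelled in plain Python (a simplified sketch, not Scrapy's actual implementation; the helper names are illustrative): a redirect keeps the Authorization header only when scheme, host, and port all match, whereas the vulnerable behaviour compared the hostname alone.

```python
from urllib.parse import urlsplit

def same_origin(url_a: str, url_b: str) -> bool:
    """Scheme, host, AND port must all match; the pre-fix behaviour
    effectively compared only the hostname."""
    a, b = urlsplit(url_a), urlsplit(url_b)
    default_ports = {"http": 80, "https": 443}
    return (
        a.scheme == b.scheme
        and a.hostname == b.hostname
        and (a.port or default_ports.get(a.scheme))
        == (b.port or default_ports.get(b.scheme))
    )

def headers_for_redirect(headers: dict, old_url: str, new_url: str) -> dict:
    """Drop Authorization on any cross-origin redirect."""
    if same_origin(old_url, new_url):
        return dict(headers)
    return {k: v for k, v in headers.items() if k.lower() != "authorization"}
```

With this check, a redirect from https://example.com/ to http://example.com/ (same domain, different scheme) is treated as cross-origin and the header is removed.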

How to fix Exposure of Sensitive Information to an Unauthorized Actor?

Upgrade Scrapy to version 2.11.2 or higher.

[,2.11.2)
  • High severity
Information Exposure Through Sent Data

Affected versions of this package are vulnerable to Information Exposure Through Sent Data due to the failure to remove the Authorization header when redirecting across domains. An attacker can potentially allow for account hijacking by exploiting the exposure of the Authorization header to unauthorized actors.

How to fix Information Exposure Through Sent Data?

Upgrade Scrapy to version 2.11.1 or higher.

[,2.11.1)
  • High severity
Regular Expression Denial of Service (ReDoS)

Affected versions of this package are vulnerable to Regular Expression Denial of Service (ReDoS) when parsing content. An attacker can cause extreme CPU and memory usage by handling a malicious response.
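Upgrading is the real fix; as a partial stop-gap, capping response sizes limits how much input the vulnerable parsing code can receive. DOWNLOAD_MAXSIZE and DOWNLOAD_WARNSIZE are standard Scrapy settings; the limits below are illustrative only.

```python
# settings.py: illustrative limits only; upgrading remains the fix.
DOWNLOAD_MAXSIZE = 10 * 1024 * 1024   # cancel downloads larger than 10 MiB
DOWNLOAD_WARNSIZE = 1 * 1024 * 1024   # log a warning above 1 MiB
```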

How to fix Regular Expression Denial of Service (ReDoS)?

Upgrade Scrapy to version 2.11.1 or higher.

[,2.11.1)
  • Medium severity
Information Exposure

Affected versions of this package are vulnerable to Information Exposure: a spider could leak its cookie headers when a request is redirected to a third-party, potentially attacker-controlled, website.
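The fixed behaviour can be sketched in plain Python (a simplified model, not Scrapy's actual CookiesMiddleware): cookies are forwarded only to the domain they were set for, or to its subdomains, so a redirect to an unrelated host no longer receives them.

```python
def should_send_cookie(cookie_domain: str, request_host: str) -> bool:
    """Simplified domain-matching sketch: forward a cookie only to the
    exact domain it was set for, or to a subdomain of it."""
    domain = cookie_domain.lstrip(".")
    return request_host == domain or request_host.endswith("." + domain)

# A redirect to an unrelated (potentially attacker-controlled) host
# no longer receives the cookie.
print(should_send_cookie("example.com", "www.example.com"))  # True
print(should_send_cookie("example.com", "evil.test"))        # False
```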

How to fix Information Exposure?

Upgrade Scrapy to version 2.6.0 or higher.

[,2.6.0)
  • Medium severity
Denial of Service (DoS)

Affected versions of this package are vulnerable to Denial of Service (DoS) via S3FilesStore: files are buffered entirely in memory before being uploaded to S3, so memory usage can grow sharply when very large files, or many files, are uploaded at the same time.

How to fix Denial of Service (DoS)?

There is no fixed version; all versions of scrapy are affected.

[0,)