scrapy@2.7.1 vulnerabilities

A high-level Web Crawling and Web Scraping framework

Direct Vulnerabilities

Known vulnerabilities in the scrapy package. This does not include vulnerabilities belonging to this package’s dependencies.

Vulnerability Vulnerable Version
  • M
URL Redirection to Untrusted Site ('Open Redirect')

Scrapy is a high-level web crawling and web scraping framework, used to crawl websites and extract structured data from their pages.

Affected versions of this package are vulnerable to URL Redirection to Untrusted Site ('Open Redirect') due to the improper handling of scheme-specific proxy settings during HTTP redirects. An attacker can potentially intercept sensitive information by exploiting the failure to switch proxies when redirected from HTTP to HTTPS URLs or vice versa.
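The failure mode can be illustrated with a minimal sketch, which is not Scrapy's actual implementation: the proxy is re-selected from the scheme of the redirect target instead of being carried over from the original request. The names `PROXIES`, `proxy_for_url`, and `proxy_after_redirect`, and the proxy addresses, are hypothetical.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical scheme-specific proxy table, similar in spirit to the
# http_proxy / https_proxy environment variables.
PROXIES = {
    "http": "http://plain-proxy.example:3128",
    "https": "http://secure-proxy.example:3128",
}


def proxy_for_url(url: str, proxies: dict) -> Optional[str]:
    """Return the proxy configured for the URL's scheme, or None."""
    scheme = urlsplit(url).scheme.lower()
    return proxies.get(scheme)


def proxy_after_redirect(original_url: str, redirect_url: str,
                         proxies: dict) -> Optional[str]:
    """Pick the proxy for the redirect target, not the original request.

    The vulnerable behaviour described above corresponds to returning
    proxy_for_url(original_url, proxies) here instead.
    """
    return proxy_for_url(redirect_url, proxies)
```

With this table, a redirect from `http://a.example/` to `https://a.example/` selects the HTTPS proxy rather than reusing the HTTP one.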

How to fix URL Redirection to Untrusted Site ('Open Redirect')?

Upgrade Scrapy to version 2.11.2 or higher.

[,2.11.2)
  • M
Files or Directories Accessible to External Parties

Affected versions of this package are vulnerable to Files or Directories Accessible to External Parties via the DOWNLOAD_HANDLERS setting. An attacker can redirect traffic to unintended protocols such as file:// or s3://, potentially accessing sensitive data or credentials by manipulating the start URLs of a spider and observing the output.

Notes:

  1. HTTP redirects should only work between URLs that use the http:// or https:// schemes.

  2. A malicious actor, given write access to the start requests of a spider and read access to the spider output, could exploit this vulnerability to:

     a) Redirect to any local file using the file:// scheme to read its contents.

     b) Redirect to an ftp:// URL of a malicious FTP server to obtain the FTP username and password configured in the spider or project.

     c) Redirect to any s3:// URL to read its content using the S3 credentials configured in the spider or project.

  3. A spider that always outputs the entire contents of a response would be completely vulnerable.

  4. A spider that extracted only fragments from the response could significantly limit vulnerable data.
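The first note can be sketched as a scheme check on the resolved redirect target. This is an illustration only, not the code shipped in the fixed release; `ALLOWED_REDIRECT_SCHEMES` and `is_allowed_redirect` are hypothetical names. The `DOWNLOAD_HANDLERS` fragment relies on the documented Scrapy behaviour that assigning `None` to a scheme disables its handler, and it assumes the project does not legitimately need those schemes.

```python
from urllib.parse import urljoin, urlsplit

# Hypothetical allow-list: redirects may only lead to these schemes.
ALLOWED_REDIRECT_SCHEMES = {"http", "https"}


def is_allowed_redirect(request_url: str, location: str) -> bool:
    """Return True only if the resolved redirect target uses http or https."""
    # Resolve relative Location values against the URL that was requested.
    target = urljoin(request_url, location)
    return urlsplit(target).scheme.lower() in ALLOWED_REDIRECT_SCHEMES


# settings.py fragment (assumption: the project never fetches these schemes).
# Assigning None to a scheme disables the corresponding download handler.
DOWNLOAD_HANDLERS = {
    "file": None,
    "ftp": None,
    "s3": None,
}
```

Disabling handlers narrows what a redirect can reach, but it does not replace upgrading to a fixed version.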

How to fix Files or Directories Accessible to External Parties?

Upgrade Scrapy to version 2.11.2 or higher.

[,2.11.2)
  • M
Exposure of Sensitive Information to an Unauthorized Actor

Affected versions of this package are vulnerable to Exposure of Sensitive Information to an Unauthorized Actor due to improper handling of HTTP headers during cross-origin redirects. An attacker can intercept the Authorization header and potentially access sensitive information by exploiting this misconfiguration in redirect scenarios where the domain remains the same but the scheme or port changes.

Note: In the context of a man-in-the-middle attack, this could be used to get access to the value of that Authorization header.
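A strict same-origin comparison shows why a change of scheme or port matters even when the domain is unchanged. The sketch below is an assumption-laden illustration rather than the logic of the fixed release; `origin` and `headers_for_redirect` are hypothetical helpers, and headers are modelled as a plain dict.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> tuple:
    """Return (scheme, host, port), filling in the scheme's default port."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    return (scheme, host, port)


def headers_for_redirect(headers: dict, source_url: str, target_url: str) -> dict:
    """Copy headers, dropping Authorization when the origin changes."""
    if origin(source_url) == origin(target_url):
        return dict(headers)
    return {k: v for k, v in headers.items() if k.lower() != "authorization"}
```

Under this rule, a redirect from `https://example.com/` to `http://example.com/` or to `https://example.com:8443/` drops the header, while a redirect that only spells out the default port keeps it.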

How to fix Exposure of Sensitive Information to an Unauthorized Actor?

Upgrade Scrapy to version 2.11.2 or higher.

[,2.11.2)
  • H
Information Exposure Through Sent Data

Affected versions of this package are vulnerable to Information Exposure Through Sent Data due to the failure to remove the Authorization header when redirecting across domains. Exposing the Authorization header to an unauthorized party can allow an attacker to hijack the associated account.

How to fix Information Exposure Through Sent Data?

Upgrade Scrapy to version 2.11.1 or higher.

[,2.11.1)
  • H
Regular Expression Denial of Service (ReDoS)

Affected versions of this package are vulnerable to Regular Expression Denial of Service (ReDoS) when parsing content. An attacker can cause extreme CPU and memory usage by serving a malicious response that Scrapy then parses.

How to fix Regular Expression Denial of Service (ReDoS)?

Upgrade Scrapy to version 2.11.1 or higher.

[,2.11.1)
  • H
Improper Resource Shutdown or Release

Affected versions of this package are vulnerable to Improper Resource Shutdown or Release due to the enforcement of response size limits only during the download of raw, usually-compressed response bodies and not during decompression. A malicious website being scraped could send a small response that, upon decompression, could exhaust the memory available to the process, potentially affecting any other process sharing that memory, and affecting disk usage in case of uncompressed response caching.
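The general technique of enforcing a size limit during decompression, rather than only on the compressed body, can be sketched with the standard library. This is not Scrapy's implementation; `gunzip_bounded`, `DecompressedSizeExceeded`, and the chunk size are hypothetical choices for illustration, and only gzip is handled.

```python
import gzip
import zlib


class DecompressedSizeExceeded(Exception):
    """Raised when decompressed data grows past the configured limit."""


def gunzip_bounded(data: bytes, max_size: int, chunk_size: int = 64 * 1024) -> bytes:
    """Decompress gzip data, aborting once the output exceeds max_size bytes."""
    # Adding 16 to wbits tells zlib to expect a gzip header and trailer.
    decompressor = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16)
    output = bytearray()
    pending = data
    while pending:
        # The second argument caps how much output a single call may produce,
        # so memory grows by at most chunk_size between limit checks.
        output += decompressor.decompress(pending, chunk_size)
        if len(output) > max_size:
            raise DecompressedSizeExceeded(f"output exceeds {max_size} bytes")
        # Input that was not consumed because of the cap is kept here.
        pending = decompressor.unconsumed_tail
    output += decompressor.flush()
    if len(output) > max_size:
        raise DecompressedSizeExceeded(f"output exceeds {max_size} bytes")
    return bytes(output)
```

For example, a payload built with `gzip.compress(b"a" * 1_000_000)` is only about a kilobyte on the wire, yet `gunzip_bounded(payload, 10_000)` raises after the first chunk instead of allocating the full megabyte.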

How to fix Improper Resource Shutdown or Release?

Upgrade Scrapy to version 1.8.4, 2.11.1 or higher.

[,1.8.4) [2.0.0,2.11.1)
  • H
Regular Expression Denial of Service (ReDoS)

Affected versions of this package are vulnerable to Regular Expression Denial of Service (ReDoS) via the XMLFeedSpider class or any subclass that uses the default node iterator iternodes, as well as direct uses of the scrapy.utils.iterators.xmliter function. An attacker who serves a malicious response can cause extreme CPU and memory usage while Scrapy parses its content.

Note:

For versions 2.6.0 to 2.11.0, the vulnerable function is open_in_browser for a response without a base tag.

How to fix Regular Expression Denial of Service (ReDoS)?

Upgrade Scrapy to version 1.8.4, 2.11.1 or higher.

[,1.8.4) [2.0.0,2.11.1)
  • H
Origin Validation Error

Affected versions of this package are vulnerable to Origin Validation Error due to the improper handling of the Authorization header during cross-domain redirects. An attacker can leak sensitive information by inducing the server to redirect a request with the Authorization header to a different domain.

How to fix Origin Validation Error?

Upgrade Scrapy to version 1.8.4, 2.11.1 or higher.

[,1.8.4) [2.0.0,2.11.1)
  • M
Denial of Service (DoS)

Affected versions of this package are vulnerable to Denial of Service (DoS) via S3FilesStore. Files are held in memory before being uploaded to S3, which increases memory usage when very large files, or many files, are uploaded at the same time.

[0,)
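The version ranges on this page use interval notation: a square bracket includes the bound, a parenthesis excludes it, and an empty bound is open-ended. A rough sketch of checking a version against such a range follows. The helpers `parse_version` and `is_affected` are hypothetical, and the sketch assumes plain dotted numeric versions; pre-release or post-release tags would need a proper version parser.

```python
import re

# Matches one interval such as "[,2.11.2)" or "[2.0.0,2.11.1)".
_INTERVAL = re.compile(r"([\[\(])\s*([\d.]*)\s*,\s*([\d.]*)\s*([\]\)])")


def parse_version(text: str) -> tuple:
    """Convert '2.11.1' into (2, 11, 1), padding short versions with zeros."""
    parts = [int(part) for part in text.split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def is_affected(version: str, ranges: str) -> bool:
    """Return True if version falls inside any interval listed in ranges."""
    v = parse_version(version)
    for open_b, low, high, close_b in _INTERVAL.findall(ranges):
        if low:
            lo = parse_version(low)
            # Below the lower bound, or equal to an excluded lower bound.
            if v < lo or (v == lo and open_b == "("):
                continue
        if high:
            hi = parse_version(high)
            # Above the upper bound, or equal to an excluded upper bound.
            if v > hi or (v == hi and close_b == ")"):
                continue
        return True
    return False
```

For instance, `is_affected("2.7.1", "[,1.8.4) [2.0.0,2.11.1)")` is true because 2.7.1 lies in the second interval, while `is_affected("2.11.2", "[,2.11.2)")` is false because the upper bound is excluded.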