scrapy@2.2.1 vulnerabilities

A high-level Web Crawling and Web Scraping framework

Direct Vulnerabilities

Known vulnerabilities in the scrapy package. This does not include vulnerabilities belonging to this package’s dependencies.

Vulnerability (severity: M = medium, H = high) and vulnerable versions

  • M: URL Redirection to Untrusted Site ('Open Redirect')

Scrapy is a high-level web crawling and web scraping framework, used to crawl websites and extract structured data from their pages.

Affected versions of this package are vulnerable to URL Redirection to Untrusted Site ('Open Redirect') due to the improper handling of scheme-specific proxy settings during HTTP redirects. An attacker can potentially intercept sensitive information by exploiting the failure to switch proxies when redirected from HTTP to HTTPS URLs or vice versa.
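
Below is a minimal sketch, not taken from the advisory, of the affected configuration: scheme-specific proxies supplied through the standard environment variables that Scrapy's HttpProxyMiddleware reads at startup. The proxy URLs are hypothetical.

    import os

    # Hypothetical scheme-specific proxies, picked up by Scrapy's
    # HttpProxyMiddleware from the standard environment variables.
    os.environ["http_proxy"] = "http://plain-proxy.example:8080"
    os.environ["https_proxy"] = "http://tls-proxy.example:8443"

    # In affected versions (< 2.11.2), a request to an http:// URL that is
    # redirected to an https:// URL keeps using the http proxy instead of
    # switching to the https one (and vice versa), so traffic intended for
    # one proxy can be observed by the other.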

How to fix URL Redirection to Untrusted Site ('Open Redirect')?

Upgrade Scrapy to version 2.11.2 or higher.

Vulnerable versions: [,2.11.2)

  • M: Files or Directories Accessible to External Parties

Scrapy is a high-level web crawling and web scraping framework, used to crawl websites and extract structured data from their pages.

Affected versions of this package are vulnerable to Files or Directories Accessible to External Parties via the DOWNLOAD_HANDLERS setting. An attacker can redirect traffic to unintended protocols such as file:// or s3://, potentially accessing sensitive data or credentials by manipulating the start URLs of a spider and observing the output.

Notes:

  1. HTTP redirects should only work between URLs that use the http:// or https:// schemes.

  2. A malicious actor, given write access to the start requests of a spider and read access to the spider output, could exploit this vulnerability to:

a) Redirect to any local file using the file:// scheme to read its contents.

b) Redirect to an ftp:// URL of a malicious FTP server to obtain the FTP username and password configured in the spider or project.

c) Redirect to any s3:// URL to read its content using the S3 credentials configured in the spider or project.

  3. A spider that always outputs the entire contents of a response would be completely vulnerable.

  4. A spider that extracted only fragments from the response could significantly limit vulnerable data.
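
For projects that cannot upgrade immediately, one possible mitigation sketch (an assumption, not part of the advisory) is to disable the non-HTTP download handlers in the project settings so a redirect to file://, ftp:// or s3:// cannot be followed:

    # settings.py (sketch): drop the non-HTTP download handlers. The scheme
    # keys below are Scrapy's defaults; assigning None to a scheme disables
    # its handler, so such URLs can no longer be downloaded.
    DOWNLOAD_HANDLERS = {
        "file": None,  # no file:// downloads
        "ftp": None,   # no ftp:// downloads
        "s3": None,    # no s3:// downloads
    }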

How to fix Files or Directories Accessible to External Parties?

Upgrade Scrapy to version 2.11.2 or higher.

Vulnerable versions: [,2.11.2)

  • M: Exposure of Sensitive Information to an Unauthorized Actor

Scrapy is a high-level web crawling and web scraping framework, used to crawl websites and extract structured data from their pages.

Affected versions of this package are vulnerable to Exposure of Sensitive Information to an Unauthorized Actor due to improper handling of HTTP headers during cross-origin redirects. An attacker can intercept the Authorization header and potentially access sensitive information by exploiting this misconfiguration in redirect scenarios where the domain remains the same but the scheme or port changes.

Note: In the context of a man-in-the-middle attack, this could be used to get access to the value of that Authorization header.
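
A minimal sketch of the affected pattern, assuming a hypothetical site that redirects from HTTP to HTTPS on the same domain; the spider name and token are placeholders:

    import scrapy

    class TokenSpider(scrapy.Spider):
        name = "token-example"  # hypothetical spider

        def start_requests(self):
            yield scrapy.Request(
                "http://example.com/api",  # assume this redirects to https://example.com/api
                headers={"Authorization": "Bearer <token>"},
                callback=self.parse,
            )

        def parse(self, response):
            # In affected versions (< 2.11.2), the Authorization header is kept
            # on the redirected request even though the scheme changed, so a
            # man-in-the-middle on the plaintext hop can read the token.
            yield {"status": response.status}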

How to fix Exposure of Sensitive Information to an Unauthorized Actor?

Upgrade Scrapy to version 2.11.2 or higher.

Vulnerable versions: [,2.11.2)

  • H: Information Exposure Through Sent Data

Scrapy is a high-level web crawling and web scraping framework, used to crawl websites and extract structured data from their pages.

Affected versions of this package are vulnerable to Information Exposure Through Sent Data due to the failure to remove the Authorization header when redirecting across domains. An attacker who obtains the exposed Authorization header can potentially hijack the associated account.
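
For projects that cannot upgrade right away, a possible mitigation sketch (an assumption, not an official Scrapy component) is a custom downloader middleware that strips the Authorization header from any request whose host is not the one the credentials were meant for; the trusted host name is hypothetical:

    from urllib.parse import urlparse

    class StripForeignAuthMiddleware:
        # Hypothetical host the credentials belong to.
        TRUSTED_HOST = "api.example.com"

        def process_request(self, request, spider):
            # Drop the Authorization header before any request to another host,
            # including redirected requests.
            if urlparse(request.url).hostname != self.TRUSTED_HOST:
                request.headers.pop(b"Authorization", None)

    # The middleware would be enabled through the DOWNLOADER_MIDDLEWARES setting.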

How to fix Information Exposure Through Sent Data?

Upgrade Scrapy to version 2.11.1 or higher.

Vulnerable versions: [,2.11.1)

  • H: Regular Expression Denial of Service (ReDoS)

Scrapy is a high-level web crawling and web scraping framework, used to crawl websites and extract structured data from their pages.

Affected versions of this package are vulnerable to Regular Expression Denial of Service (ReDoS) when parsing content. A malicious response can cause extreme CPU and memory usage while it is being parsed.

How to fix Regular Expression Denial of Service (ReDoS)?

Upgrade Scrapy to version 2.11.1 or higher.

Vulnerable versions: [,2.11.1)

  • H: Improper Resource Shutdown or Release

Scrapy is a high-level web crawling and web scraping framework, used to crawl websites and extract structured data from their pages.

Affected versions of this package are vulnerable to Improper Resource Shutdown or Release due to the enforcement of response size limits only during the download of raw, usually-compressed response bodies and not during decompression. A malicious website being scraped could send a small response that, upon decompression, could exhaust the memory available to the process, potentially affecting any other process sharing that memory, and affecting disk usage in case of uncompressed response caching.
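
For context, these are the size limits involved; the values shown are Scrapy's documented defaults. In affected versions they are only checked against the raw, usually compressed body, so a small compressed response can still expand far beyond them in memory:

    # settings.py (sketch): Scrapy's response size limits.
    DOWNLOAD_MAXSIZE = 1073741824   # 1 GiB hard limit (default)
    DOWNLOAD_WARNSIZE = 33554432    # 32 MiB warning threshold (default)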

How to fix Improper Resource Shutdown or Release?

Upgrade Scrapy to version 1.8.4, 2.11.1 or higher.

Vulnerable versions: [,1.8.4) [2.0.0,2.11.1)

  • H: Regular Expression Denial of Service (ReDoS)

Scrapy is a high-level web crawling and web scraping framework, used to crawl websites and extract structured data from their pages.

Affected versions of this package are vulnerable to Regular Expression Denial of Service (ReDoS) via the XMLFeedSpider class or any subclass that uses the default node iterator, iternodes, as well as direct uses of the scrapy.utils.iterators.xmliter function. A malicious response can cause extreme CPU and memory usage while its content is parsed.

Note:

For versions 2.6.0 to 2.11.0, the vulnerable function is open_in_browser for a response without a base tag.
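
One possible mitigation sketch for the XMLFeedSpider case, assuming upgrading is not yet an option: switch from the default iternodes iterator to the xml iterator, which uses a different, lxml-based code path. The spider name, feed URL, and node tag are hypothetical.

    from scrapy.spiders import XMLFeedSpider

    class FeedSpider(XMLFeedSpider):
        name = "feed-example"                         # hypothetical spider
        start_urls = ["https://example.com/feed.xml"]
        iterator = "xml"                              # instead of the default "iternodes"
        itertag = "item"

        def parse_node(self, response, node):
            # Extract one field per <item> node in the feed.
            yield {"title": node.xpath("title/text()").get()}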

How to fix Regular Expression Denial of Service (ReDoS)?

Upgrade Scrapy to version 1.8.4, 2.11.1 or higher.

Vulnerable versions: [,1.8.4) [2.0.0,2.11.1)

  • H: Origin Validation Error

Scrapy is a high-level web crawling and web scraping framework, used to crawl websites and extract structured data from their pages.

Affected versions of this package are vulnerable to Origin Validation Error due to the improper handling of the Authorization header during cross-domain redirects. An attacker can leak sensitive information by inducing the server to redirect a request with the Authorization header to a different domain.

How to fix Origin Validation Error?

Upgrade Scrapy to version 1.8.4, 2.11.1 or higher.

Vulnerable versions: [,1.8.4) [2.0.0,2.11.1)

  • M: Credential Exposure

Scrapy is a high-level web crawling and web scraping framework, used to crawl websites and extract structured data from their pages.

Affected versions of this package are vulnerable to Credential Exposure via the process_request() function in downloadermiddlewares/httpproxy.py. Proxy credentials can leak to a different proxy if a third-party downloader middleware updates the proxy metadata of a request but leaves the stale Proxy-Authorization header unchanged.

NOTE: To fully mitigate this vulnerability, replacing or upgrading the affected third-party downloader middleware might be necessary after upgrading Scrapy.
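
A minimal sketch of the middleware pattern described above; the middleware name and proxy URL are hypothetical. When rotating the proxy for a request on affected versions, the stale Proxy-Authorization header must be dropped as well:

    class RotatingProxyMiddleware:
        """Hypothetical third-party-style downloader middleware that rotates proxies."""

        def process_request(self, request, spider):
            # Point the request at a different proxy.
            request.meta["proxy"] = "http://next-proxy.example:8080"
            # Without this, affected versions re-send the previous proxy's
            # credentials to the new proxy.
            request.headers.pop(b"Proxy-Authorization", None)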

How to fix Credential Exposure?

Upgrade Scrapy to version 1.8.3, 2.6.2 or higher.

Vulnerable versions: [,1.8.3) [2.0.0,2.6.2)

  • H: Information Exposure

Scrapy is a high-level web crawling and web scraping framework, used to crawl websites and extract structured data from their pages.

Affected versions of this package are vulnerable to Information Exposure: responses from domain names whose public domain name suffix contains one or more periods (for example, example.co.uk, whose public suffix is co.uk) are able to set cookies that are included in requests to any other domain sharing the same domain name suffix.
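
If upgrading is not immediately possible, a blunt mitigation sketch (an assumption, not taken from the advisory text above) is to disable cookie handling entirely, which is only viable for crawls that do not need cookies:

    # settings.py (sketch): turn off the cookies middleware so no cross-suffix
    # cookie can be stored or sent by the crawl.
    COOKIES_ENABLED = False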

How to fix Information Exposure?

Upgrade Scrapy to version 1.8.2, 2.6.0 or higher.

Vulnerable versions: [,1.8.2) [2.0.0,2.6.0)

  • M: Information Exposure

Scrapy is a high-level web crawling and web scraping framework, used to crawl websites and extract structured data from their pages.

Affected versions of this package are vulnerable to Information Exposure: a spider could leak Cookie headers when a request is redirected to a third-party, potentially attacker-controlled, website.

How to fix Information Exposure?

Upgrade Scrapy to version 2.6.0 or higher.

Vulnerable versions: [,2.6.0)

  • M: Information Exposure

Scrapy is a high-level web crawling and web scraping framework, used to crawl websites and extract structured data from their pages.

Affected versions of this package are vulnerable to Information Exposure. If you use HttpAuthMiddleware (i.e. the http_user and http_pass spider attributes) for HTTP authentication, all requests will expose your credentials to the request target. This includes requests generated by Scrapy components, such as the robots.txt requests sent by Scrapy when the ROBOTSTXT_OBEY setting is set to True, or requests reached through redirects.
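
After upgrading, the fixed usage looks roughly like the sketch below: alongside http_user and http_pass, the spider sets http_auth_domain so the credentials are only sent to that domain. The spider name, domain, and credentials are placeholders.

    import scrapy

    class IntranetSpider(scrapy.Spider):
        name = "intranet-example"                   # hypothetical spider
        http_user = "user"                          # placeholder credentials
        http_pass = "pass"
        http_auth_domain = "intranet.example.com"   # only send credentials to this domain
        start_urls = ["https://intranet.example.com/"]

        def parse(self, response):
            yield {"status": response.status}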

How to fix Information Exposure?

Upgrade Scrapy to version 1.8.1, 2.5.1 or higher.

Vulnerable versions: [,1.8.1) [2.0.0,2.5.1)

  • M: Denial of Service (DoS)

Affected versions of this package are vulnerable to Denial of Service (DoS) via S3FilesStore. Files are stored in memory before being uploaded to S3, increasing memory usage when very large files, or many files, are uploaded at the same time.
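
For reference, a minimal sketch of the configuration that routes the files pipeline through S3FilesStore; the bucket path is hypothetical. With this store, each file body is held fully in memory until its upload to S3 completes.

    # settings.py (sketch): store FilesPipeline output in S3 via S3FilesStore.
    ITEM_PIPELINES = {"scrapy.pipelines.files.FilesPipeline": 1}
    FILES_STORE = "s3://my-bucket/scraped-files/"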

Vulnerable versions: [0,)