Crawlee v2.3.0 Release Notes
Release Date: 2022-04-07 // about 2 years ago-
- feat: accept more social media patterns (#1286)
- ๐ feat: add multiple click support to
enqueueLinksByClickingElements
(#1295) - ๐ง feat: instance-scoped "global" configuration (#1315)
- feat: requestList accepts proxyConfiguration for requestsFromUrls (#1317)
- โก๏ธ feat: update
playwright
to v1.20.2 - โก๏ธ feat: update
puppeteer
to v13.5.2 > We noticed that with this version of puppeteer actor run could crash with >We either navigate top level or have old version of the navigated frame
error > (puppeteer issue here). > It should not happen while running the browser in headless mode. > In case you need to run the browser in headful mode (headless: false
), > we recommend pinning puppeteer version to10.4.0
in actorpackage.json
file. - ๐ feat: stealth deprecation (#1314)
- feat: allow passing a stream to KeyValueStore.setRecord (#1325)
- ๐ fix: use correct apify-client instance for snapshotting (#1308)
- ๐ fix: automatically reset
RequestQueue
state after 5 minutes of inactivity, closes #997 - ๐ fix: improve guessing of chrome executable path on windows (#1294)
- ๐ fix: prune CPU snapshots locally (#1313)
- ๐ fix: improve browser launcher types (#1318)
0 concurrency mitigation
๐ This release should resolve the 0 concurrency bug by automatically resetting the internal
RequestQueue
state after 5 minutes of inactivity.We now track last activity done on a
RequestQueue
instance:- โ added new request
- started processing a request (added to
inProgress
cache) - marked request as handled
- reclaimed request
If we don't detect one of those actions in last 5 minutes, and we have some requests in the
inProgress
cache, we try to reset the state. We can override this limit viaCRAWLEE_INTERNAL_TIMEOUT
env var.This should finally resolve the 0 concurrency bug, as it was always about stuck requests in the
inProgress
cache.