Crawlee v2.3.0 Release Notes

Release Date: 2022-04-07 // about 2 years ago
    • feat: accept more social media patterns (#1286)
    • ๐Ÿ‘ feat: add multiple click support to enqueueLinksByClickingElements (#1295)
    • ๐Ÿ”ง feat: instance-scoped "global" configuration (#1315)
    • feat: requestList accepts proxyConfiguration for requestsFromUrls (#1317)
    • โšก๏ธ feat: update playwright to v1.20.2
    • โšก๏ธ feat: update puppeteer to v13.5.2 > We noticed that with this version of puppeteer actor run could crash with > We either navigate top level or have old version of the navigated frame error > (puppeteer issue here). > It should not happen while running the browser in headless mode. > In case you need to run the browser in headful mode (headless: false), > we recommend pinning puppeteer version to 10.4.0 in actor package.json file.
    • ๐Ÿ—„ feat: stealth deprecation (#1314)
    • feat: allow passing a stream to KeyValueStore.setRecord (#1325)
    • ๐Ÿ›  fix: use correct apify-client instance for snapshotting (#1308)
    • ๐Ÿ›  fix: automatically reset RequestQueue state after 5 minutes of inactivity, closes #997
    • ๐Ÿ›  fix: improve guessing of chrome executable path on windows (#1294)
    • ๐Ÿ›  fix: prune CPU snapshots locally (#1313)
    • ๐Ÿ›  fix: improve browser launcher types (#1318)

    0 concurrency mitigation

    ๐Ÿš€ This release should resolve the 0 concurrency bug by automatically resetting the internal RequestQueue state after 5 minutes of inactivity.

    We now track last activity done on a RequestQueue instance:

    • โž• added new request
    • started processing a request (added to inProgress cache)
    • marked request as handled
    • reclaimed request

    If we don't detect one of those actions in last 5 minutes, and we have some requests in the inProgress cache, we try to reset the state. We can override this limit via CRAWLEE_INTERNAL_TIMEOUT env var.

    This should finally resolve the 0 concurrency bug, as it was always about stuck requests in the inProgress cache.