Crawlee v2.2.0 release notes (2021-12-17)

Release Date: 2021-12-17

Proxy per page

Up until now, browser crawlers used the same session (and therefore the same proxy) for all requests from a single browser. Now they get a new proxy for each session. This means that with incognito pages, each page gets a new proxy, aligning the behaviour with CheerioCrawler.

This feature is not enabled by default. To use it, enable the useIncognitoPages flag under launchContext:
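```javascript
const Apify = require('apify');

const crawler = new Apify.PlaywrightCrawler({
    launchContext: {
        // Opt in to proxy per page: every page runs in its own incognito context,
        // so it gets its own session and therefore its own proxy.
        useIncognitoPages: true,
    },
    // ...other crawler options
});
```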
> Note that there is currently a performance overhead when using useIncognitoPages, so enable this flag at your own discretion.

We are planning to enable this feature by default in SDK v3.0.
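To make the session/proxy relationship concrete, here is a minimal JavaScript model of it. It is not the SDK's implementation: the SessionProxyRotator class, the proxiesForPages() helper, the round-robin assignment, and the proxy URLs are all hypothetical. It only shows why one session per browser means one proxy for every page, while one session per incognito page means a new proxy for each page.

```javascript
// Hypothetical model of "one proxy per session", not the SDK's implementation.
class SessionProxyRotator {
    constructor(proxyUrls) {
        this.proxyUrls = proxyUrls;
        this.nextIndex = 0;
        this.bySession = new Map(); // sessionId -> proxy URL, stable for the session's lifetime
    }

    proxyFor(sessionId) {
        if (!this.bySession.has(sessionId)) {
            // A new session takes the next proxy in round-robin order.
            this.bySession.set(sessionId, this.proxyUrls[this.nextIndex % this.proxyUrls.length]);
            this.nextIndex++;
        }
        return this.bySession.get(sessionId);
    }
}

// Assigns a proxy to each page. Without incognito pages, all pages in the browser share
// one session; with incognito pages, every page gets its own session.
function proxiesForPages(pageCount, useIncognitoPages, rotator) {
    const proxies = [];
    for (let page = 0; page < pageCount; page++) {
        const sessionId = useIncognitoPages ? `page-${page}` : 'browser-0';
        proxies.push(rotator.proxyFor(sessionId));
    }
    return proxies;
}

// Placeholder proxy URLs for illustration only.
const pool = ['http://proxy-a:8000', 'http://proxy-b:8000', 'http://proxy-c:8000'];
console.log(proxiesForPages(3, false, new SessionProxyRotator(pool))); // all three pages use proxy-a
console.log(proxiesForPages(3, true, new SessionProxyRotator(pool))); // proxy-a, proxy-b, proxy-c
```

The stable per-session mapping mirrors the note above: a proxy only changes when a new session starts, so per-page proxies follow from per-page sessions.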

Abortable timeouts

Previously, when a page function timed out, the task kept running, which could lead to requests being processed multiple times. In v2.2, timeouts are abortable and cancel the task as early as possible.
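The difference between a plain timeout and an abortable one can be sketched with the standard AbortController. This is an illustration under assumptions, not the SDK's internal code: runWithAbortableTimeout() and processRequest() are hypothetical names, and the task is assumed to check the abort signal between its steps. When the deadline passes, the caller gets a timeout error and the task stops at its next checkpoint instead of continuing to process the request in the background.

```javascript
// Hypothetical helper, not the SDK's API: runs task(signal) with a deadline and
// aborts the signal when the deadline passes, so the task can stop early.
async function runWithAbortableTimeout(task, timeoutMs) {
    const controller = new AbortController();
    let timer;
    const timeout = new Promise((_, reject) => {
        timer = setTimeout(() => {
            controller.abort(); // tell the still-running task to stop
            reject(new Error(`Task timed out after ${timeoutMs} ms`));
        }, timeoutMs);
    });
    try {
        return await Promise.race([task(controller.signal), timeout]);
    } finally {
        clearTimeout(timer); // no-op if the timer already fired
    }
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical five-step task that checks the signal before each step.
async function processRequest(signal, completedSteps) {
    for (let step = 1; step <= 5; step++) {
        if (signal.aborted) throw new Error(`Aborted before step ${step}`);
        await sleep(50); // stands in for one unit of page work
        completedSteps.push(step);
    }
    return 'done';
}

// Usage: the 120 ms deadline passes before step 3 finishes, so steps 4 and 5 never start.
const steps = [];
runWithAbortableTimeout((signal) => processRequest(signal, steps), 120)
    .catch((err) => console.log(err.message)); // "Task timed out after 120 ms"
```

Without the controller.abort() call, the race would still reject after 120 ms, but processRequest() would quietly run all five steps. That is the duplicate-processing problem described above.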

Mitigation of zero concurrency issue

Several new timeouts were added to the task function, which should help mitigate the zero concurrency bug. Namely, fetching the next request and reclaiming failed requests back to the queue are now executed with a timeout and 3 additional retries before the task fails. The timeout is always at least 300 s (5 minutes), or requestHandlerTimeoutSecs if that value is higher.
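Here is a minimal sketch of the timeout-with-retries pattern described above. It assumes one initial attempt plus 3 retries and a per-attempt timeout of max(300 s, requestHandlerTimeoutSecs). The helper names internalTimeoutMillis(), withTimeout() and withTimeoutAndRetries() are hypothetical. The operation passed in stands for something like fetching the next request or reclaiming a failed one.

```javascript
// Hypothetical helper: the timeout is at least 300 s, or the handler timeout if higher.
function internalTimeoutMillis(requestHandlerTimeoutSecs) {
    return Math.max(300, requestHandlerTimeoutSecs) * 1000;
}

// Hypothetical helper: rejects if operation() does not settle within timeoutMs.
function withTimeout(operation, timeoutMs) {
    let timer;
    const timeout = new Promise((_, reject) => {
        timer = setTimeout(() => reject(new Error(`Timed out after ${timeoutMs} ms`)), timeoutMs);
    });
    // Promise.resolve().then() also turns a synchronous throw into a rejection.
    return Promise.race([Promise.resolve().then(operation), timeout])
        .finally(() => clearTimeout(timer));
}

// One initial attempt plus `retries` additional attempts; the last error is rethrown.
async function withTimeoutAndRetries(operation, timeoutMs, retries = 3) {
    let lastError;
    for (let attempt = 0; attempt <= retries; attempt++) {
        try {
            return await withTimeout(operation, timeoutMs);
        } catch (err) {
            lastError = err;
        }
    }
    throw lastError;
}
```

For example, with requestHandlerTimeoutSecs set to 60, internalTimeoutMillis() still gives each attempt 300 s. With 600, each attempt gets 600 s. An operation that hangs on every attempt fails only after four timed-out attempts.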

Full list of changes
- fix RequestError: URI malformed in cheerio crawler (#1205)
- only provide Cookie header if cookies are present (#1218)
- handle extra cases for diffCookie (#1217)
- add timeout for task function (#1234)
- implement proxy per page in browser crawlers (#1228)
- add fingerprinting support (#1243)
- implement abortable timeouts (#1245)
- add timeouts with retries to runTaskFunction() (#1250)
- automatically convert Google spreadsheet URLs to CSV exports (#1255)
