Changelog History

  • v2.3.0 Changes

    April 07, 2022
    • feat: accept more social media patterns (#1286)
    • feat: add multiple click support to enqueueLinksByClickingElements (#1295)
    • feat: instance-scoped "global" configuration (#1315)
    • feat: requestList accepts proxyConfiguration for requestsFromUrls (#1317)
    • feat: update playwright to v1.20.2
    • feat: update puppeteer to v13.5.2

      > We noticed that with this version of puppeteer, an actor run could crash with the error "We either navigate top level or have old version of the navigated frame" (reported as a puppeteer issue). It should not happen while running the browser in headless mode. In case you need to run the browser in headful mode (headless: false), we recommend pinning the puppeteer version to 10.4.0 in the actor's package.json file.

    • feat: stealth deprecation (#1314)
    • feat: allow passing a stream to KeyValueStore.setRecord (#1325)
    • fix: use correct apify-client instance for snapshotting (#1308)
    • fix: automatically reset RequestQueue state after 5 minutes of inactivity, closes #997
    • fix: improve guessing of chrome executable path on windows (#1294)
    • fix: prune CPU snapshots locally (#1313)
    • fix: improve browser launcher types (#1318)

    0 concurrency mitigation

    This release should resolve the 0 concurrency bug by automatically resetting the internal RequestQueue state after 5 minutes of inactivity.

    We now track the last activity done on a RequestQueue instance:

    • added new request
    • started processing a request (added to inProgress cache)
    • marked request as handled
    • reclaimed request

    If we don't detect one of those actions in the last 5 minutes, and we have some requests in the inProgress cache, we try to reset the state. We can override this limit via the CRAWLEE_INTERNAL_TIMEOUT env var.

    This should finally resolve the 0 concurrency bug, as it was always about stuck requests in the inProgress cache.
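
    The tracking described above can be sketched roughly as follows. This is a minimal illustration, not the SDK's actual code: the ActivityTracker class, its method names, and the injectable now clock are all hypothetical, and the limit is passed as a constructor option instead of being read from the env var.

```javascript
// Hypothetical sketch; names are illustrative and not the SDK's internals.
const DEFAULT_INTERNAL_TIMEOUT_MILLIS = 5 * 60 * 1000; // 5 minutes

class ActivityTracker {
    constructor({ timeoutMillis = DEFAULT_INTERNAL_TIMEOUT_MILLIS, now = Date.now } = {}) {
        this.timeoutMillis = timeoutMillis;
        this.now = now; // injectable clock, handy for testing
        this.inProgress = new Set();
        this.lastActivity = this.now();
    }

    // Every tracked action refreshes the last-activity timestamp.
    touch() {
        this.lastActivity = this.now();
    }

    addRequest() {
        this.touch();
    }

    startProcessing(id) {
        this.inProgress.add(id);
        this.touch();
    }

    markHandled(id) {
        this.inProgress.delete(id);
        this.touch();
    }

    reclaim(id) {
        this.inProgress.delete(id);
        this.touch();
    }

    // Reset only when nothing happened for the whole limit
    // and some requests are still marked as in progress.
    resetIfStuck() {
        const idleMillis = this.now() - this.lastActivity;
        if (idleMillis < this.timeoutMillis || this.inProgress.size === 0) return false;
        this.inProgress.clear();
        this.touch();
        return true;
    }
}
```

    In this sketch a caller would invoke resetIfStuck() periodically, for example whenever the queue reports that no request is available.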

  • v2.2.2 Changes

    February 14, 2022
    • fix: ensure request.headers is set
    • fix: lower RequestQueue API timeout to 30 seconds
    • improve logging for fetching next request and timeouts
  • v2.2.1 Changes

    January 03, 2022
    • fix: ignore requests that are no longer in progress (#1258)
    • fix: do not use tryCancel() from inside sync callback (#1265)
    • fix: revert to puppeteer 10.x (#1276)
    • fix: wait when body is not available in infiniteScroll() from Puppeteer utils (#1238)
    • fix: expose logger classes on the utils.log instance (#1278)
  • v2.2.0 Changes

    December 17, 2021

    Proxy per page

    Up until now, browser crawlers used the same session (and therefore the same proxy) for all requests from a single browser. Now each session gets a new proxy. This means that with incognito pages, each page will get a new proxy, aligning the behaviour with CheerioCrawler.

    This feature is not enabled by default. To use it, enable the useIncognitoPages flag under launchContext:

    new Apify.PlaywrightCrawler({
        launchContext: {
            useIncognitoPages: true,
        },
        // ...
    })
    

    > Note that there is currently a performance overhead for using useIncognitoPages.

    Use this flag at your own discretion.

    We are planning to enable this feature by default in SDK v3.0.

    Abortable timeouts

    Previously when a page function timed out, the task still kept running. This could lead to requests being processed multiple times. In v2.2 we now have abortable timeouts that will cancel the task as early as possible.
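
    As a rough illustration of the idea (not the SDK's implementation), an abortable timeout can be built on the standard AbortController. The helper names runWithAbortableTimeout and processSteps are hypothetical; the key point is that the task checks the signal between steps so it stops soon after the timeout instead of continuing in the background.

```javascript
// Hypothetical helpers; the SDK's own mechanism may differ.
async function runWithAbortableTimeout(task, timeoutMillis) {
    const controller = new AbortController();
    let timer;
    const timeout = new Promise((_, reject) => {
        timer = setTimeout(() => {
            controller.abort(); // ask the task to stop at its next checkpoint
            reject(new Error(`Task timed out after ${timeoutMillis} ms`));
        }, timeoutMillis);
    });
    try {
        return await Promise.race([task(controller.signal), timeout]);
    } finally {
        clearTimeout(timer);
    }
}

// A cooperative task: it checks the signal before starting each step.
async function processSteps(steps, signal) {
    const results = [];
    for (const step of steps) {
        if (signal.aborted) break; // cancelled, do not start further work
        results.push(await step());
    }
    return results;
}
```

    Cancellation here is cooperative: a step that is already running finishes, but no further step starts once the timeout has fired.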

    Mitigation of zero concurrency issue

    Several new timeouts were added to the task function, which should help mitigate the zero concurrency bug. Namely, fetching of next request information and reclaiming failed requests back to the queue are now executed with a timeout and up to 3 additional retries before the task fails. The timeout is always at least 300s (5 minutes), or requestHandlerTimeoutSecs if that value is higher.
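
    The timeout rule and the retry behaviour can be sketched like this. The function names internalTimeoutSecs and withTimeoutAndRetries are hypothetical, and retrying on any failure (not only on a timeout) is an assumption of this sketch.

```javascript
// Hypothetical sketch of the rule above; not the SDK's actual code.

// At least 300 s, or requestHandlerTimeoutSecs if that value is higher.
function internalTimeoutSecs(requestHandlerTimeoutSecs) {
    return Math.max(300, requestHandlerTimeoutSecs);
}

// Runs `operation` with a timeout, retrying up to `retries` more times.
// Assumption: any error (timeout or otherwise) triggers a retry.
async function withTimeoutAndRetries(operation, timeoutSecs, retries = 3) {
    let lastError;
    for (let attempt = 0; attempt <= retries; attempt++) {
        let timer;
        const timeout = new Promise((_, reject) => {
            timer = setTimeout(
                () => reject(new Error(`Timed out after ${timeoutSecs} s`)),
                timeoutSecs * 1000,
            );
        });
        try {
            return await Promise.race([operation(), timeout]);
        } catch (err) {
            lastError = err; // remember the failure and try again
        } finally {
            clearTimeout(timer);
        }
    }
    throw lastError; // all attempts failed
}
```

    With the default of 3 retries, the operation is attempted at most 4 times before the last error is rethrown.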

    Full list of changes

    • fix RequestError: URI malformed in cheerio crawler (#1205)
    • only provide Cookie header if cookies are present (#1218)
    • handle extra cases for diffCookie (#1217)
    • add timeout for task function (#1234)
    • implement proxy per page in browser crawlers (#1228)
    • add fingerprinting support (#1243)
    • implement abortable timeouts (#1245)
    • add timeouts with retries to runTaskFunction() (#1250)
    • automatically convert google spreadsheet URLs to CSV exports (#1255)
  • v2.1.0 Changes

    October 07, 2021
    • automatically convert google docs share urls to csv download ones in request list (#1174)
    • use puppeteer emulating scrolls instead of window.scrollBy (#1170)
    • warn if apify proxy is used in proxyUrls (#1173)
    • fix YOUTUBE_REGEX_STRING being too greedy (#1171)
    • add purgeLocalStorage utility method (#1187)
    • catch errors inside request interceptors (#1188, #1190)
    • add support for cgroups v2 (#1177)
    • fix incorrect offset in fixUrl function (#1184)
    • support channel and user links in YouTube regex (#1178)
    • fix: allow passing requestsFromUrl to RequestListOptions in TS (#1191)
    • allow passing forceCloud down to the KV store (#1186), closes #752
    • merge cookies from session with user provided ones (#1201), closes #1197
    • use ApifyClient v2 (full rewrite to TS)
  • v2.0.7 Changes

    September 08, 2021
    • Fix casting of int/bool environment variables (e.g. APIFY_LOCAL_STORAGE_ENABLE_WAL_MODE), closes #956
    • Fix incognito pages and user data dir (#1145)
    • Add @ts-ignore comments to imports of optional peer dependencies (#1152)
    • Use config instance in sdk.openSessionPool() (#1154)
    • Add a breaking callback to infiniteScroll (#1140)
  • v2.0.6 Changes

    August 27, 2021
    • Fix deprecation messages logged from ProxyConfiguration and CheerioCrawler.
    • Update got-scraping to receive multiple improvements.
  • v2.0.5 Changes

    August 24, 2021
    • Fix error handling in puppeteer crawler
  • v2.0.4 Changes

    August 23, 2021
    • Use sessionToken with got-scraping
  • v2.0.3 Changes

    August 20, 2021
    • BREAKING IN EDGE CASES: We removed forceUrlEncoding in requestAsBrowser because we found out that recent versions of the underlying HTTP client, got, already encode URLs, and forceUrlEncoding could lead to weird behavior. We think of this as fixing a bug, so we're not bumping the major version.
    • Limit handleRequestTimeoutMillis to the maximum valid value to prevent Node.js from falling back to a 1 ms timeout.
    • Use got-scraping@^3.0.1
    • Disable SSL validation on MITM proxies
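
    For context on the handleRequestTimeoutMillis limit: Node.js timers accept a delay of at most 2147483647 ms (2^31 - 1), and a larger delay is replaced with 1 ms. A minimal sketch of such a clamp follows; the helper name clampTimeoutMillis is hypothetical and is not taken from the SDK.

```javascript
// Largest delay Node.js timers accept; anything above it is treated as 1 ms.
const MAX_TIMEOUT_MILLIS = 2 ** 31 - 1; // 2147483647

// Hypothetical helper: keep a user-supplied timeout within the valid range.
function clampTimeoutMillis(timeoutMillis) {
    return Math.min(timeoutMillis, MAX_TIMEOUT_MILLIS);
}
```

    Without a clamp like this, a very large or infinite timeout would fire almost immediately instead of effectively never.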