All Versions: 23
Avg Release Cycle: 21 days
Latest Release: 82 days ago

Changelog History

  • v3.1.2 Changes

    November 15, 2022

    Bug Fixes

    • injectJQuery in context does not survive navs (#1661) (493a7cf)
    • make router error message more helpful for undefined routes (#1678) (ab359d8)
    • MemoryStorage: correctly respect the desc option (#1666) (b5f37f6)
    • requestHandlerTimeout timing (#1660) (493ea0c)
    • shallow clone browserPoolOptions before normalization (#1665) (22467ca)
    • support headfull mode in playwright js project template (ea2e61b)
    • support headfull mode in puppeteer js project template (e6aceb8)

    Features

  • v3.1.1 Changes

    November 07, 2022

    Bug Fixes

    Features

    • add static set and useStorageClient shortcuts to Configuration (2e66fa2)
    • enable migration testing (#1583) (ee3a68f)
    • playwright: disable animations when taking screenshots (#1601) (4e63034)
  • v3.1.0 Changes

    October 13, 2022

    Bug Fixes

    • add overload for KeyValueStore.getValue with defaultValue (#1541) (e3cb509)
    • add retry attempts to methods in CLI (#1588) (9142e59)
    • allow label in enqueueLinksByClickingElements options (#1525) (18b7c25)
    • basic-crawler: handle request.noRetry after errorHandler (#1542) (2a2040e)
    • build storage classes by using this instead of the class (#1596) (2b14eb7)
    • correct some typing exports (#1527) (4a136e5)
    • do not hide stack trace of (retried) Type/Syntax/ReferenceErrors (469b4b5)
    • enqueueLinks: ensure the enqueue strategy is respected alongside user patterns (#1509) (2b0eeed)
    • enqueueLinks: prevent useless request creations when filtering by user patterns (#1510) (cb8fe36)
    • export Cookie from crawlee metapackage (7b02ceb)
    • handle redirect cookies (#1521) (2f7fc7c)
    • http-crawler: do not hang on POST without payload (#1546) (8c87390)
    • remove undeclared dependency on core package from puppeteer utils (827ae60)
    • support TypeScript 4.8 (#1507) (4c3a504)
    • wait for persist state listeners to run when event manager closes (#1481) (aa550ed)

    Features

    • add Dataset.exportToValue (#1553) (acc6344)
    • add Dataset.getData() shortcut (522ed6e)
    • add utils.downloadListOfUrls to crawlee metapackage (7b33b0a)
    • add utils.parseOpenGraph() (#1555) (059f85e)
    • add utils.playwright.compileScript (#1559) (2e14162)
    • add utils.playwright.infiniteScroll (#1543) (60c8289), closes #1528
    • add utils.playwright.saveSnapshot (#1544) (a4ceef0)
    • add global useState helper (#1551) (2b03177)
    • add static Dataset.exportToValue (#1564) (a7c17d4)
    • allow disabling storage persistence (#1539) (f65e3c6)
    • bump puppeteer support to 17.x (#1519) (b97a852)
    • core: add forefront option to enqueueLinks helper (f8755b6), closes #1595
    • don't close page before calling errorHandler (#1548) (1c8cd82)
    • enqueue links by clicking for Playwright (#1545) (3d25ade)
    • error tracker (#1467) (6bfe1ce)
    • make the CLI download directly from GitHub (#1540) (3ff398a)
    • router: add userdata generic to addHandler (#1547) (19cdf13)
    • use JSON5 for INPUT.json to support comments (#1538) (09133ff)
  • v3.0.4 Changes

    August 22, 2022

    Features

    • bump puppeteer support to 15.1

    Bug Fixes

    • key value stores emitting an error when multiple write promises ran in parallel (#1460) (f201cca)
    • fix dockerfiles in project templates
  • v3.0.3 Changes

    August 11, 2022

    Fixes

    • add missing configuration to CheerioCrawler constructor (#1432)
    • sendRequest types (#1445)
    • respect headless option in browser crawlers (#1455)
    • make CheerioCrawlerOptions type more loose (d871d8c)
    • improve dockerfiles and project templates (7c21a64)

    Features

    • add utils.playwright.blockRequests() (#1447)
    • http-crawler (#1440)
    • prefer /INPUT.json files for KeyValueStore.getInput() (#1453)
    • jsdom-crawler (#1451)
    • add RetryRequestError + add error to the context for BC (#1443)
    • add keepAlive to crawler options (#1452)
  • v3.0.2 Changes

    July 28, 2022

    Fixes

    • regression in resolving the base url for enqueue link filtering (#1422)
    • improve file saving on memory storage (#1421)
    • add UserData type argument to CheerioCrawlingContext and related interfaces (#1424)
    • always limit desiredConcurrency to the value of maxConcurrency (bcb689d)
    • wait for storage to finish before resolving crawler.run() (9d62d56)
    • using explicitly typed router with CheerioCrawler (07b7e69)
    • declare dependency on ow in @crawlee/cheerio package (be59f99)
    • use crawlee@^3.0.0 in the CLI templates (6426f22)
    • fix building projects with TS when puppeteer and playwright are not installed (#1404)
    • enqueueLinks should respect full URL of the current request for relative link resolution (#1427)
    • use desiredConcurrency: 10 as the default for CheerioCrawler (#1428)
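The relative link fixes above come down to which base URL a link is resolved against. The following is a minimal sketch of that idea using the standard `URL` class; `resolveLink` is a hypothetical helper for illustration and is not Crawlee's implementation.

```typescript
// Hypothetical helper, not Crawlee's implementation: resolve an href found on a page
// against the full URL of the request that produced the page.
function resolveLink(href: string, currentRequestUrl: string): string {
    return new URL(href, currentRequestUrl).href;
}

const pageUrl = 'https://example.com/docs/guide/index.html';

// Resolved against the full URL, a relative link stays inside /docs/guide/.
const fromFullUrl = resolveLink('page2.html', pageUrl);

// Resolved against the origin only, the same link lands at the site root.
const fromOrigin = resolveLink('page2.html', new URL(pageUrl).origin);
```

Here `fromFullUrl` is `https://example.com/docs/guide/page2.html`, while `fromOrigin` is `https://example.com/page2.html`, which is why the base URL matters when links are later filtered or enqueued.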

    Features

    • feat: allow configuring what status codes will cause session retirement (#1423)
    • feat: add support for middlewares to the Router via use method (#1431)
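To illustrate what router middlewares registered through a `use` method generally look like, here is a small self-contained sketch. `MiniRouter`, `Context` and `Handler` are hypothetical names made up for this example, the code is synchronous for brevity, and it should not be read as Crawlee's actual Router implementation.

```typescript
// Hypothetical types: these names are illustrative and not part of Crawlee's API.
type Context = { label?: string; log: string[] };
type Handler = (ctx: Context) => void;

class MiniRouter {
    private middlewares: Handler[] = [];
    private routes = new Map<string, Handler>();
    private defaultHandler?: Handler;

    // Register a middleware that runs before every matched handler.
    use(middleware: Handler): this {
        this.middlewares.push(middleware);
        return this;
    }

    // Register a handler for requests carrying the given label.
    addHandler(label: string, handler: Handler): this {
        this.routes.set(label, handler);
        return this;
    }

    // Register the fallback handler for unlabeled or unknown labels.
    addDefaultHandler(handler: Handler): this {
        this.defaultHandler = handler;
        return this;
    }

    // Pick the handler by label, run all middlewares in order, then the handler.
    dispatch(ctx: Context): void {
        const byLabel = ctx.label !== undefined ? this.routes.get(ctx.label) : undefined;
        const handler = byLabel ?? this.defaultHandler;
        if (!handler) {
            throw new Error(`No handler for label "${ctx.label}" and no default route`);
        }
        for (const middleware of this.middlewares) middleware(ctx);
        handler(ctx);
    }
}

const router = new MiniRouter()
    .use((ctx) => ctx.log.push('middleware'))
    .addHandler('DETAIL', (ctx) => ctx.log.push('detail'))
    .addDefaultHandler((ctx) => ctx.log.push('default'));

const detailCtx: Context = { label: 'DETAIL', log: [] };
router.dispatch(detailCtx);
```

After the dispatch, `detailCtx.log` holds the middleware entry followed by the handler entry, showing that shared logic runs once per request before the route-specific code.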
  • v3.0.1 Changes

    July 26, 2022

    Fixes

    • remove JSONData generic type arg from CheerioCrawler (#1402)
    • rename default storage folder to just storage (#1403)
    • remove trailing slash for proxyUrl (#1405)
    • run browser crawlers in headless mode by default (#1409)
    • rename interface FailedRequestHandler to ErrorHandler (#1410)
    • ensure default route is not ignored in CheerioCrawler (#1411)
    • add headless option to BrowserCrawlerOptions (#1412)
    • processing custom cookies (#1414)
    • enqueue link not finding relative links if the checked page is redirected (#1416)
    • fix building projects with TS when puppeteer and playwright are not installed (#1404)
    • calling enqueueLinks in browser crawler on page without any links (385ca27)
    • improve error message when no default route provided (04c3b6a)

    Features

    • feat: add parseWithCheerio for puppeteer & playwright (#1418)
  • v3.0.0 Changes

    July 13, 2022

    This section summarizes most of the breaking changes between Crawlee (v3) and Apify SDK (v2). Crawlee is the spiritual successor to Apify SDK, so we decided to keep the versioning and release Crawlee as v3.

    Crawlee vs Apify SDK

    Up until version 3 of apify, the package contained both scraping-related tools and Apify platform-related helper methods. With v3 we are splitting the whole project into two main parts:

    • Crawlee, the new web-scraping library, available as the crawlee package on NPM
    • Apify SDK, helpers for the Apify platform, available as the apify package on NPM

    Moreover, the Crawlee library is published as several packages under the @crawlee namespace:

    • @crawlee/core: the base for all the crawler implementations, also contains things like Request, RequestQueue, RequestList or Dataset classes
    • @crawlee/basic: exports BasicCrawler
    • @crawlee/cheerio: exports CheerioCrawler
    • @crawlee/browser: exports BrowserCrawler (which is used for creating @crawlee/playwright and @crawlee/puppeteer)
    • @crawlee/playwright: exports PlaywrightCrawler
    • @crawlee/puppeteer: exports PuppeteerCrawler
    • @crawlee/memory-storage: @apify/storage-local alternative
    • @crawlee/browser-pool: previously browser-pool package
    • @crawlee/utils: utility methods
    • @crawlee/types: holds TS interfaces mainly about the StorageClient

    Installing Crawlee

    > As Crawlee is not yet released as latest, we need to install from the next distribution tag!

    Most of the Crawlee packages extend and re-export each other, so it's enough to install just the one you plan to use. For example, @crawlee/playwright already contains everything from the @crawlee/browser package, which includes everything from @crawlee/basic, which in turn includes everything from @crawlee/core.

    npm install crawlee@next
    

    Or if all we need is cheerio support, we can install only @crawlee/cheerio

    npm install @crawlee/cheerio@next
    

    When using playwright or puppeteer, we still need to install those dependencies explicitly - this allows the users to be in control of which version will be used.

    npm install crawlee@next playwright
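
The installation rules above (one @crawlee package per crawler type, plus the browser library installed explicitly for playwright or puppeteer) can be summarized in a small sketch. `installCommand` and `browserLibrary` are hypothetical names used only for illustration, and the `next` tag default is an assumption taken from the note at the start of this section; none of this is part of Crawlee or its CLI.

```typescript
// Hypothetical helper for illustration only; it is not part of Crawlee or its CLI.
type CrawlerKind = 'basic' | 'cheerio' | 'playwright' | 'puppeteer';

// Browser libraries that have to be installed explicitly next to the Crawlee package.
const browserLibrary: Partial<Record<CrawlerKind, string>> = {
    playwright: 'playwright',
    puppeteer: 'puppeteer',
};

// Build the npm command for one crawler type; the dist-tag defaults to "next" (assumption).
function installCommand(kind: CrawlerKind, tag = 'next'): string {
    const packages = [`@crawlee/${kind}@${tag}`];
    const browser = browserLibrary[kind];
    if (browser) packages.push(browser);
    return `npm install ${packages.join(' ')}`;
}

const cheerioInstall = installCommand('cheerio');
const playwrightInstall = installCommand('playwright');
```

With these inputs, `cheerioInstall` is `npm install @crawlee/cheerio@next` and `playwrightInstall` is `npm install @crawlee/playwright@next playwright`. The browser library carries no tag because its version is left to the user.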
    
  • v2.3.2 Changes

    May 05, 2022
    • fix: use default user agent for playwright with chrome instead of the default "headless UA"
    • fix: always hide webdriver of chrome browsers
  • v2.3.1 Changes

    May 03, 2022
    • fix: utils.apifyClient early instantiation (#1330)
    • feat: utils.playwright.injectJQuery() (#1337)
    • feat: add keyValueStore option to Statistics class (#1345)
    • fix: ensure failed req count is correct when using RequestList (#1347)
    • fix: random puppeteer crawler (running in headful mode) failure (#1348)
      > This should help with the "We either navigate top level or have old version of the navigated frame" bug in puppeteer.
    • fix: allow returning falsy values in RequestTransform's return type