Crawlee v3.0.0 Release Notes

Release Date: 2022-07-13
  • This section summarizes most of the breaking changes between Crawlee (v3) and Apify SDK (v2). Crawlee is the spiritual successor to Apify SDK, so we decided to keep the versioning and release Crawlee as v3.

    Crawlee vs Apify SDK

    Up until version 3 of apify, the package contained both scraping-related tools and Apify platform-related helper methods. With v3 we are splitting the whole project into two main parts:

    • Crawlee, the new web-scraping library, available as the crawlee package on NPM
    • Apify SDK, helpers for the Apify platform, available as the apify package on NPM

    Moreover, the Crawlee library is published as several packages under the @crawlee namespace:

    • @crawlee/core: the base for all the crawler implementations, also contains things like Request, RequestQueue, RequestList or Dataset classes
    • @crawlee/basic: exports BasicCrawler
    • @crawlee/cheerio: exports CheerioCrawler
    • @crawlee/browser: exports BrowserCrawler (which is used for creating @crawlee/playwright and @crawlee/puppeteer)
    • @crawlee/playwright: exports PlaywrightCrawler
    • @crawlee/puppeteer: exports PuppeteerCrawler
    • @crawlee/memory-storage: @apify/storage-local alternative
    • @crawlee/browser-pool: previously browser-pool package
    • @crawlee/utils: utility methods
    • @crawlee/types: holds TS interfaces mainly about the StorageClient

    Installing Crawlee

    > As Crawlee is not yet released as latest, we need to install from the next distribution tag!

    Most of the Crawlee packages extend and re-export each other, so it's enough to install just the one you plan on using, e.g. @crawlee/playwright if you plan on using playwright. It already contains everything from the @crawlee/browser package, which includes everything from @crawlee/basic, which includes everything from @crawlee/core.

    npm install crawlee@next
    

    ๐Ÿ‘ Or if all we need is cheerio support, we can install only @crawlee/cheerio

    npm install @crawlee/cheerio@next
    

    When using playwright or puppeteer, we still need to install those dependencies explicitly - this allows the users to be in control of which version will be used.

    npm install crawlee@next playwright
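
To make the install rules above concrete, here is a minimal TypeScript sketch that models the re-export chain and builds the matching install command. It is only an illustration under stated assumptions: the `reexports` and `browserLibraries` maps are written by hand from the text above (they are not read from the published packages), the `@crawlee/puppeteer` entry is assumed by analogy with `@crawlee/playwright`, and `includedPackages` and `installCommand` are hypothetical helper names, not part of Crawlee.

```typescript
// Hand-written model of the re-export chain described in these notes.
// Assumption: @crawlee/puppeteer mirrors @crawlee/playwright.
const reexports: Record<string, string[]> = {
    '@crawlee/core': [],
    '@crawlee/basic': ['@crawlee/core'],
    '@crawlee/browser': ['@crawlee/basic'],
    '@crawlee/playwright': ['@crawlee/browser'],
    '@crawlee/puppeteer': ['@crawlee/browser'],
};

// Browser libraries that must be installed explicitly next to the crawler package.
const browserLibraries: Record<string, string> = {
    '@crawlee/playwright': 'playwright',
    '@crawlee/puppeteer': 'puppeteer',
};

// Hypothetical helper: lists the package itself plus everything it re-exports, transitively.
function includedPackages(pkg: string): string[] {
    const seen: string[] = [];
    const stack = [pkg];
    while (stack.length > 0) {
        const current = stack.pop()!;
        if (seen.includes(current)) continue;
        seen.push(current);
        for (const dep of reexports[current] ?? []) stack.push(dep);
    }
    return seen;
}

// Hypothetical helper: builds the npm command for one package,
// appending the browser library when the package needs one.
function installCommand(pkg: string, tag = 'next'): string {
    const parts = ['npm', 'install', `${pkg}@${tag}`];
    const browserLibrary = browserLibraries[pkg];
    if (browserLibrary !== undefined) parts.push(browserLibrary);
    return parts.join(' ');
}

console.log(includedPackages('@crawlee/playwright'));
console.log(installCommand('@crawlee/cheerio'));
console.log(installCommand('@crawlee/playwright'));
```

The `next` default mirrors the distribution tag mentioned above; once Crawlee is published as latest, the tag would no longer be needed.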