Popularity: 7.9 (Growing)

Activity: 9.8 (Growing)

Stars: 12,044

Watchers: 93

Forks: 502

Last Commit: 2 days ago

Description

Crawlee covers your crawling and scraping end-to-end and helps you build reliable scrapers. Fast.

Your crawlers will appear human-like and fly under the radar of modern bot protections, even with the default configuration. Crawlee gives you the tools to crawl the web for links, scrape data, and store it to disk or cloud while staying configurable to suit your project's needs.

Crawlee is available as the crawlee NPM package.

Programming language: TypeScript

License: Apache License 2.0

Tags: Nodejs Automation TypeScript JavaScript Web Crawling Web Scraping

Latest version: v3.1.2

Crawlee alternatives and similar libraries

Based on the "NodeJS" category.

NectarJS

5.3 0.0 Crawlee VS NectarJS

🔱 Javascript's God Mode. No VM. No Bytecode. No GC. Just native binaries.
Gluon

5.0 6.8 Crawlee VS Gluon

A new framework for creating desktop apps from websites, using system installed browsers and NodeJS

pwa-asset-generator

4.7 3.6 Crawlee VS pwa-asset-generator

Automates PWA asset generation and image declaration. Automatically generates icon and splash screen images, favicons and mstile images. Updates manifest.json and index.html files with the generated images according to Web App Manifest specs and Apple Human Interface guidelines.
rdflib.js

3.1 7.0 Crawlee VS rdflib.js

Linked Data API for JavaScript
rdfstore-js

3.1 0.0 Crawlee VS rdfstore-js

JS RDF store with SPARQL support
teachcode

2.4 4.6 Crawlee VS teachcode

A tool to develop and improve a student’s programming skills by introducing the earliest lessons of coding.
Stylify

2.3 6.5 Crawlee VS Stylify

💎 Monorepository for Stylify packages. Stylify uses CSS-like selectors to generate Extremely optimized utility-first CSS dynamically based on what you write 💎.
Vulcan Next

2.3 0.0 Crawlee VS Vulcan Next

The Next starter for GraphQL developers
Autometrics

1.5 8.0 Crawlee VS Autometrics

Easily add metrics to your system – and actually understand them using automatically customized Prometheus queries
Nebra 🌫️

1.5 8.6 Crawlee VS Nebra 🌫️

Type-safe NoSQL with Node & SQLite. 🌫️💽
DIOD

1.4 3.5 Crawlee VS DIOD

A very opinionated inversion of control (IoC) container and dependency injector for Typescript, Node.js or browser apps.
Brainyduck

1.1 4.2 Crawlee VS Brainyduck

🐥 A micro "no-backend" framework 🤯 Quickly build powerful BaaS using only your graphql schemas
github-star-search

1.0 0.0 Crawlee VS github-star-search

A CLI that searches your GitHub starred repositories offline through README, description and other fields.
jirax

1.0 0.0 Crawlee VS jirax

Simple and flexible CLI tool for your daily JIRA activity (supported on all OSes)
zeit

0.7 0.0 L5 Crawlee VS zeit

Clock and task scheduler for node.js applications, providing extensive control of time and callback scheduling in prod and test code
ts-pojo-error

0.6 0.0 Crawlee VS ts-pojo-error

🔥 Type safe pojo error will help you to easily create typed and serializable error.
PrivMX JS Crypto Lib

0.6 4.7 Crawlee VS PrivMX JS Crypto Lib

Javascript crypto library ...
chef-express

0.4 4.9 Crawlee VS chef-express

Command Line Interface Static Files Server written in TypeScript for Single Page Applications serving in Node with Express
chef-socket

0.3 4.9 Crawlee VS chef-socket

Command Line Interface Static Files Server written in TypeScript for Single Page Applications serving in Node with Socket.IO
struct-compile

0.3 8.5 Crawlee VS struct-compile

Easily parse binary data with C structure syntax
spurtcommerce

0.2 10.0 Crawlee VS spurtcommerce

Spurtcommerce is a complete ecommerce solution for Angular 4, Node.js, MySQL
Be notified of new signups in your app using Firebase Authentication and Google Chat

0.1 10.0 Crawlee VS Be notified of new signups in your app using Firebase Authentication and Google Chat

Be notified of new signups in your app directly in Google Chat

* Code Quality Rankings and insights are calculated and provided by Lumnify.
They vary from L1 to L5 with "L5" being the highest.

README

A web scraping and browser automation library

ℹ️ Crawlee is the successor to Apify SDK. 🎉 Fully rewritten in TypeScript for a better developer experience, and with even more powerful anti-blocking features. The interface is almost the same as Apify SDK so upgrading is a breeze. Read the upgrading guide to learn about the changes. ℹ️

👉 View full documentation, guides and examples on the Crawlee project website 👈

Installation

We recommend visiting the Introduction tutorial in the Crawlee documentation for more information.

Crawlee requires Node.js 16 or higher.

With Crawlee CLI

The fastest way to try Crawlee out is to use the Crawlee CLI and choose the Getting started example. The CLI will install all the necessary dependencies and add boilerplate code for you to play with.

npx crawlee create my-crawler

cd my-crawler
npm start

Manual installation

If you prefer to add Crawlee to your own project, try the example below. Because it uses PlaywrightCrawler, you also need to install Playwright. It is not bundled with Crawlee, which keeps the install size down.

npm install crawlee playwright

import { PlaywrightCrawler, Dataset } from 'crawlee';

// PlaywrightCrawler crawls the web using a headless
// browser controlled by the Playwright library.
const crawler = new PlaywrightCrawler({
    // Use the requestHandler to process each of the crawled pages.
    async requestHandler({ request, page, enqueueLinks, log }) {
        const title = await page.title();
        log.info(`Title of ${request.loadedUrl} is '${title}'`);

        // Save results as JSON to ./storage/datasets/default
        await Dataset.pushData({ title, url: request.loadedUrl });

        // Extract links from the current page
        // and add them to the crawling queue.
        await enqueueLinks();
    },
    // Uncomment this option to see the browser window.
    // headless: false,
});

// Add first URL to the queue and start the crawl.
await crawler.run(['https://crawlee.dev']);

By default, Crawlee stores data in ./storage in the current working directory. You can override this directory via Crawlee configuration. For details, see the Configuration, Request storage and Result storage guides.
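
As a rough illustration of the default layout described above, the sketch below works out where the default dataset would end up on disk. It assumes that the CRAWLEE_STORAGE_DIR environment variable is the override mechanism and that datasets live under datasets/<name>, as the comment in the example suggests. The resolveDatasetDir helper is hypothetical and not part of Crawlee.

```typescript
import * as path from 'node:path';

// Hypothetical helper (not part of Crawlee): returns the directory where
// dataset records would be written for a given environment and working dir.
// Assumption: CRAWLEE_STORAGE_DIR overrides the default ./storage directory.
function resolveDatasetDir(
    env: Record<string, string | undefined>,
    cwd: string,
    datasetName = 'default',
): string {
    const storageDir = env.CRAWLEE_STORAGE_DIR ?? 'storage';
    // path.resolve keeps an absolute override as-is and anchors a
    // relative one to the working directory.
    return path.resolve(cwd, storageDir, 'datasets', datasetName);
}

console.log(resolveDatasetDir(process.env, process.cwd()));
```

With no override set, this points at storage/datasets/default under the current working directory, matching the path mentioned in the example's comments.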

🛠 Features

Single interface for HTTP and headless browser crawling
Persistent queue for URLs to crawl (breadth & depth first)
Pluggable storage of both tabular data and files
Automatic scaling with available system resources
Integrated proxy rotation and session management
Lifecycles customizable with hooks
CLI to bootstrap your projects
Configurable routing, error handling and retries
Dockerfiles ready to deploy
Written in TypeScript with generics
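
To make the "breadth & depth first" item above more concrete, here is a minimal in-memory sketch of how a single URL queue can serve both traversal orders: appending new links gives breadth-first crawling, while pushing them to the front gives depth-first. This is not Crawlee's implementation (its queue is persistent and far more capable). The UrlQueue class, the crawlOrder function and the tiny link graph are hypothetical, and the forefront option name is used here only for illustration.

```typescript
// Illustrative in-memory queue with URL deduplication.
class UrlQueue {
    private readonly pending: string[] = [];
    private readonly seen = new Set<string>();

    // Returns false when the URL was already enqueued.
    add(url: string, options: { forefront?: boolean } = {}): boolean {
        if (this.seen.has(url)) return false;
        this.seen.add(url);
        if (options.forefront) this.pending.unshift(url);
        else this.pending.push(url);
        return true;
    }

    next(): string | undefined {
        return this.pending.shift();
    }
}

// Tiny link graph standing in for real pages.
const links: Record<string, string[]> = {
    '/': ['/a', '/b'],
    '/a': ['/a1'],
    '/b': ['/b1'],
    '/a1': [],
    '/b1': [],
};

function crawlOrder(start: string, depthFirst: boolean): string[] {
    const queue = new UrlQueue();
    queue.add(start);
    const visited: string[] = [];
    for (let url = queue.next(); url !== undefined; url = queue.next()) {
        visited.push(url);
        const found = links[url] ?? [];
        // For depth-first, enqueue in reverse so the first link on the
        // page ends up at the very front of the queue.
        const ordered = depthFirst ? [...found].reverse() : found;
        for (const link of ordered) queue.add(link, { forefront: depthFirst });
    }
    return visited;
}

console.log(crawlOrder('/', false)); // [ '/', '/a', '/b', '/a1', '/b1' ]
console.log(crawlOrder('/', true)); // [ '/', '/a', '/a1', '/b', '/b1' ]
```

The only difference between the two runs is where newly discovered links are placed, which is why one queue abstraction can cover both strategies.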

👾 HTTP crawling

Zero-config HTTP/2 support, even for proxies
Automatic generation of browser-like headers
Replication of browser TLS fingerprints
Integrated fast HTML parsers: Cheerio and JSDOM
Yes, you can scrape JSON APIs as well

💻 Real browser crawling

JavaScript rendering and screenshots
Headless and headful support
Zero-config generation of human-like fingerprints
Automatic browser management
Use Playwright and Puppeteer with the same interface
Chrome, Firefox, WebKit and many others

Usage on the Apify platform

Crawlee is open-source and runs anywhere, but since it's developed by Apify, it's easy to set up on the Apify platform and run in the cloud. Visit the Apify SDK website to learn more about deploying Crawlee to the Apify platform.

Support

If you find a bug or any other issue with Crawlee, please submit an issue on GitHub. For questions, you can ask on Stack Overflow or in GitHub Discussions, or join our Discord server.

Contributing

Your code contributions are welcome, and you'll be praised to eternity! If you have any ideas for improvements, either submit an issue or create a pull request. For contribution guidelines and the code of conduct, see CONTRIBUTING.md.

License

This project is licensed under the Apache License 2.0 - see the LICENSE.md file for details.

*Note that all licence references and agreements mentioned in the Crawlee README section above are relevant to that project's source code only.