
Description

Tesseract.js is a JavaScript library that gets words in almost any language out of images. (Demo)

Tesseract.js works with script tags, webpack/browserify, and Node.js. After you install it, using it takes only a few lines of code.

Code Quality Rank: L3
Monthly Downloads: 0
Programming language: JavaScript
License: Apache License 2.0
Tags: Image Processing     Text     Ocr     Machine Learning     Images    
Latest version: v2.0.0


README


Version 2 beta is now available and under development in the master branch. Read the story behind the v2 beta: "Why I refactor tesseract.js v2?" Check the support/1.x branch for version 1.

Tesseract.js is a JavaScript library that gets words in [almost any language](./docs/tesseract_lang_list.md) out of images. (Demo)

Image Recognition

[fancy demo gif](./docs/images/demo.gif)

Video Real-time Recognition

Tesseract.js wraps an emscripten port of the Tesseract OCR Engine. It works in the browser using webpack or plain script tags with a CDN and on the server with Node.js. After you install it, using it is as simple as:

import Tesseract from 'tesseract.js';

Tesseract.recognize(
  'https://tesseract.projectnaptha.com/img/eng_bw.png',
  'eng',
  { logger: m => console.log(m) }
).then(({ data: { text } }) => {
  console.log(text);
})

Or, in a more imperative style:

import { createWorker } from 'tesseract.js';

const worker = createWorker({
  logger: m => console.log(m)
});

(async () => {
  await worker.load();
  await worker.loadLanguage('eng');
  await worker.initialize('eng');
  const { data: { text } } = await worker.recognize('https://tesseract.projectnaptha.com/img/eng_bw.png');
  console.log(text);
  await worker.terminate();
})();
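The examples above pass a logger callback that prints every message as-is. As a minimal sketch, assuming each message is an object with a string `status` and a numeric `progress` between 0 and 1 (an assumption about the message shape, so check it against the API docs), a hypothetical helper `formatProgress` could turn those messages into readable lines:

```javascript
// Hypothetical helper, not part of Tesseract.js.
// Assumes a message shaped like { status: string, progress: number in [0, 1] }.
function formatProgress(m) {
  if (!m || typeof m.status !== 'string') {
    return 'unknown status';
  }
  if (typeof m.progress !== 'number' || Number.isNaN(m.progress)) {
    return m.status;
  }
  // Clamp so that out-of-range values still yield a sensible percentage.
  const clamped = Math.min(1, Math.max(0, m.progress));
  return `${m.status}: ${Math.round(clamped * 100)}%`;
}

// Could be passed in place of the logger used in the examples above.
const logger = m => console.log(formatProgress(m));
```

Keeping the formatting in a pure function makes it easy to check without running any recognition job.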

Check out the docs for a full explanation of the API.
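The imperative example above recognizes a single image. As a rough sketch of how one worker might be reused for several images, the hypothetical helper `recognizeAll` below takes a worker object as a parameter and relies only on the methods shown in that example (`load`, `loadLanguage`, `initialize`, `recognize`, `terminate`). The helper name, its parameters, and the sequential strategy are illustrative assumptions, not part of the library.

```javascript
// Hypothetical helper, not part of Tesseract.js.
// `worker` is expected to expose the methods used in the example above:
// load, loadLanguage, initialize, recognize and terminate.
async function recognizeAll(worker, images, lang = 'eng') {
  const texts = [];
  try {
    await worker.load();
    await worker.loadLanguage(lang);
    await worker.initialize(lang);
    // Recognize sequentially so the single worker handles one job at a time.
    for (const image of images) {
      const { data: { text } } = await worker.recognize(image);
      texts.push(text);
    }
  } finally {
    // Always release the worker, even if a step above throws.
    await worker.terminate();
  }
  return texts;
}
```

With the library installed, this could presumably be called as `recognizeAll(createWorker(), [firstImage, secondImage])`. Passing the worker in also makes it possible to try the helper with a stub object.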

Major changes in v2 beta

  • Upgrade to Tesseract v4.1 (using emscripten 1.38.45)
  • Support multiple languages at the same time, e.g. eng+chi_tra for English and Traditional Chinese
  • Supported image formats: png, jpg, bmp, pbm
  • Support WebAssembly (falls back to asm.js when the browser doesn't support it)
  • Support TypeScript
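Two items in the list above lend themselves to a small illustration: combining language codes with "+" and the set of supported image formats. The helpers below are hypothetical and not part of Tesseract.js. The extension check is only a file-name heuristic built from the four formats named above, so a name ending in .jpeg, for instance, is reported as unsupported.

```javascript
// Hypothetical helpers, not part of Tesseract.js.

// Formats named in the list above.
const SUPPORTED_EXTENSIONS = ['png', 'jpg', 'bmp', 'pbm'];

// Join language codes with "+", e.g. ['eng', 'chi_tra'] becomes 'eng+chi_tra'.
function joinLanguages(codes) {
  const cleaned = codes.map(c => c.trim()).filter(c => c.length > 0);
  if (cleaned.length === 0) {
    throw new Error('at least one language code is required');
  }
  // Remove duplicates while keeping the original order.
  return [...new Set(cleaned)].join('+');
}

// Check a file name's extension against the list above (case-insensitive).
function hasSupportedExtension(fileName) {
  const dot = fileName.lastIndexOf('.');
  if (dot < 0) return false;
  return SUPPORTED_EXTENSIONS.includes(fileName.slice(dot + 1).toLowerCase());
}
```

The string returned by `joinLanguages` is meant to be used wherever the examples pass 'eng'.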

Installation

Tesseract.js works with a <script> tag via local copy or CDN, with webpack via npm, and on Node.js with npm/yarn.

CDN

<!-- v2 -->
<script src='https://unpkg.com/tesseract.js@v2.0.0-beta.1/dist/tesseract.min.js'></script>

<!-- v1 -->
<script src='https://unpkg.com/tesseract.js@1.0.19/src/index.js'></script>

After including the script, the Tesseract variable will be globally available.

Node.js

Tesseract.js currently requires Node.js v6.8.0 or higher.

# For v2
npm install tesseract.js@next
yarn add tesseract.js@next

# For v1
npm install tesseract.js
yarn add tesseract.js
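To make the Node.js version requirement stated above concrete, here is a minimal sketch of a startup check. The helper `isAtLeast` and the constant `MIN_NODE` are hypothetical names introduced for illustration. The comparison is numeric and part by part, so 6.10.0 correctly counts as newer than 6.8.0.

```javascript
// Hypothetical helper, not part of Tesseract.js.
// Compares dotted version strings such as '6.8.0' numerically, part by part.
function isAtLeast(version, minimum) {
  const a = version.replace(/^v/, '').split('.').map(Number);
  const b = minimum.replace(/^v/, '').split('.').map(Number);
  for (let i = 0; i < Math.max(a.length, b.length); i++) {
    const x = a[i] || 0; // missing or non-numeric parts count as 0
    const y = b[i] || 0;
    if (x !== y) return x > y;
  }
  return true; // equal versions satisfy the minimum
}

// Minimum taken from the requirement stated above.
const MIN_NODE = '6.8.0';
if (typeof process !== 'undefined' && !isAtLeast(process.versions.node, MIN_NODE)) {
  console.warn(`Node.js ${process.versions.node} is older than ${MIN_NODE}`);
}
```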

Documentation

  • [Examples](./docs/examples.md)
  • [Image Format](./docs/image-format.md)
  • [API](./docs/api.md)
  • [Local Installation](./docs/local-installation.md)
  • [FAQ](./docs/faq.md)

Use tesseract.js the way you like!

Contributing

Development

To run a development copy of Tesseract.js do the following:

# First we clone the repository
git clone https://github.com/naptha/tesseract.js.git
cd tesseract.js

# Then we install the dependencies
npm install

# And finally we start the development server
npm start

The development server will be available at http://localhost:3000/examples/browser/demo.html. It automatically rebuilds tesseract.dev.js and worker.dev.js when you change files in the src folder.

You can also run the development server in Gitpod (a free online IDE and dev environment for GitHub that automates your dev setup) with a single click.

Open in Gitpod

Building Static Files

To build the compiled static files, run the following:

npm run build

This will output the files into the dist directory.

Contributors

Code Contributors

This project exists thanks to all the people who contribute. [[Contribute](CONTRIBUTING.md)].

Financial Contributors

Become a financial contributor and help us sustain our community. [Contribute]

Individuals

Organizations

Support this project with your organization. Your logo will show up here with a link to your website. [Contribute]


*Note that all licence references and agreements mentioned in the Tesseract.js README section above are relevant to that project's source code only.