Changelog History
v2.3.0 Changes
April 07, 2022

- feat: accept more social media patterns (#1286)
- feat: add multiple click support to `enqueueLinksByClickingElements` (#1295)
- feat: instance-scoped "global" configuration (#1315)
- feat: `RequestList` accepts `proxyConfiguration` for `requestsFromUrls` (#1317)
- feat: update `playwright` to v1.20.2
- feat: update `puppeteer` to v13.5.2
  > We noticed that with this version of puppeteer, an actor run could crash with a "We either navigate top level or have old version of the navigated frame" error (puppeteer issue here). It should not happen while running the browser in headless mode. In case you need to run the browser in headful mode (`headless: false`), we recommend pinning the puppeteer version to `10.4.0` in the actor's `package.json` file (a sketch follows this list).
- feat: stealth deprecation (#1314)
- feat: allow passing a stream to `KeyValueStore.setRecord` (#1325)
- fix: use correct apify-client instance for snapshotting (#1308)
- fix: automatically reset `RequestQueue` state after 5 minutes of inactivity, closes #997
- fix: improve guessing of chrome executable path on windows (#1294)
- fix: prune CPU snapshots locally (#1313)
- fix: improve browser launcher types (#1318)
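To illustrate the pinning recommendation from the puppeteer note above, here is a minimal `package.json` sketch; the `name`, `version`, and `apify` entries are placeholders, and only the exact `puppeteer` version matters:

```json
{
  "name": "my-actor",
  "version": "0.0.1",
  "dependencies": {
    "apify": "^2.3.0",
    "puppeteer": "10.4.0"
  }
}
```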
0 concurrency mitigation
This release should resolve the 0 concurrency bug by automatically resetting the internal `RequestQueue` state after 5 minutes of inactivity.

We now track the last activity done on a `RequestQueue` instance:

- added a new request
- started processing a request (added to the `inProgress` cache)
- marked a request as handled
- reclaimed a request

If we don't detect one of those actions in the last 5 minutes, and we have some requests in the `inProgress` cache, we try to reset the state. We can override this limit via the `CRAWLEE_INTERNAL_TIMEOUT` env var (see the sketch below).

This should finally resolve the 0 concurrency bug, as it was always about stuck requests in the `inProgress` cache.
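A minimal sketch of overriding that limit from an actor's own code; the SDK reads `CRAWLEE_INTERNAL_TIMEOUT` from the environment, but the unit assumed here (seconds) is an assumption, so verify it against the SDK source before relying on it:

```js
// Sketch: override the internal RequestQueue inactivity limit before
// the SDK reads it. The unit (seconds) is an assumption, not confirmed.
process.env.CRAWLEE_INTERNAL_TIMEOUT = '600';

const Apify = require('apify');

Apify.main(async () => {
    // The queue opened here should use the overridden limit when
    // deciding whether to reset its internal state.
    const requestQueue = await Apify.openRequestQueue();
    // ... set up a crawler with this queue as usual
});
```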
v2.2.2 Changes
February 14, 2022

- fix: ensure `request.headers` is set
- fix: lower `RequestQueue` API timeout to 30 seconds
- improve logging for fetching next request and timeouts
v2.2.1 Changes
January 03, 2022

- fix: ignore requests that are no longer in progress (#1258)
- fix: do not use `tryCancel()` from inside sync callback (#1265)
- fix: revert to puppeteer 10.x (#1276)
- fix: wait when `body` is not available in `infiniteScroll()` from Puppeteer utils (#1238)
- fix: expose logger classes on the `utils.log` instance (#1278)
v2.2.0 Changes
December 17, 2021

Proxy per page

Up until now, browser crawlers used the same session (and therefore the same proxy) for all requests from a single browser. Now, each session gets a new proxy. This means that with incognito pages, each page will get a new proxy, aligning the behaviour with `CheerioCrawler`.

This feature is not enabled by default. To use it, we need to enable the `useIncognitoPages` flag under `launchContext`:

```js
const crawler = new Apify.PlaywrightCrawler({
    launchContext: {
        useIncognitoPages: true,
    },
    // ...
});
```
> Note that currently there is a performance overhead for using `useIncognitoPages`. Use this flag at your own discretion.

We are planning to enable this feature by default in SDK v3.0.
Abortable timeouts
Previously, when a page function timed out, the task still kept running, which could lead to requests being processed multiple times. In v2.2 we now have abortable timeouts that will cancel the task as early as possible.
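A conceptual sketch of an abortable timeout, not the SDK's actual implementation; the helper name and signature here are illustrative assumptions:

```js
// Illustrative only: reject as soon as the timeout fires instead of
// waiting for the task to finish, so the caller can move on early.
function addTimeoutToPromise(taskFn, timeoutMillis, errorMessage) {
    return new Promise((resolve, reject) => {
        const timer = setTimeout(() => reject(new Error(errorMessage)), timeoutMillis);
        taskFn().then(
            (result) => { clearTimeout(timer); resolve(result); },
            (err) => { clearTimeout(timer); reject(err); },
        );
    });
}

// Usage sketch: abort waiting on a page function after 60 seconds.
// await addTimeoutToPromise(() => handlePage(page), 60_000, 'page function timed out');
```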
Mitigation of zero concurrency issue
Several new timeouts were added to the task function, which should help mitigate the zero concurrency bug. Namely, fetching of the next request information and reclaiming failed requests back to the queue are now executed with a timeout, with 3 additional retries before the task fails. The timeout is always at least 300 s (5 minutes), or `requestHandlerTimeoutSecs` if that value is higher (a conceptual sketch follows the list below).

Full list of changes
- fix `RequestError: URI malformed` in cheerio crawler (#1205)
- only provide Cookie header if cookies are present (#1218)
- handle extra cases for `diffCookie` (#1217)
- add timeout for task function (#1234)
- implement proxy per page in browser crawlers (#1228)
- add fingerprinting support (#1243)
- implement abortable timeouts (#1245)
- add timeouts with retries to `runTaskFunction()` (#1250)
- automatically convert google spreadsheet URLs to CSV exports (#1255)
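As referenced in the mitigation section above, a conceptual sketch of the timeout-with-retries pattern, not the SDK's actual code; `runWithRetries` and its parameters are illustrative assumptions:

```js
// Illustrative only: run an operation under a timeout and retry a few
// times before letting the whole task fail.
async function runWithRetries(operation, timeoutMillis, maxRetries = 3) {
    let lastError;
    for (let attempt = 0; attempt <= maxRetries; attempt += 1) {
        try {
            return await Promise.race([
                operation(),
                new Promise((_, reject) => {
                    setTimeout(() => reject(new Error('Operation timed out')), timeoutMillis);
                }),
            ]);
        } catch (err) {
            lastError = err; // remember the failure and try again
        }
    }
    throw lastError;
}
```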
v2.1.0 Changes
October 07, 2021

- automatically convert google docs share urls to csv download ones in request list (#1174)
- use puppeteer emulating scrolls instead of `window.scrollBy` (#1170)
- warn if apify proxy is used in proxyUrls (#1173)
- fix `YOUTUBE_REGEX_STRING` being too greedy (#1171)
- add `purgeLocalStorage` utility method (#1187)
- catch errors inside request interceptors (#1188, #1190)
- add support for cgroups v2 (#1177)
- fix incorrect offset in `fixUrl` function (#1184)
- support channel and user links in YouTube regex (#1178)
- fix: allow passing `requestsFromUrl` to `RequestListOptions` in TS (#1191)
- allow passing `forceCloud` down to the KV store (#1186), closes #752
- merge cookies from session with user provided ones (#1201), closes #1197
- use `ApifyClient` v2 (full rewrite to TS)
v2.0.7 Changes
September 08, 2021

- Fix casting of int/bool environment variables (e.g. `APIFY_LOCAL_STORAGE_ENABLE_WAL_MODE`), closes #956
- Fix incognito pages and user data dir (#1145)
- Add `@ts-ignore` comments to imports of optional peer dependencies (#1152)
- Use config instance in `sdk.openSessionPool()` (#1154)
- Add a breaking callback to `infiniteScroll` (#1140)
v2.0.6 Changes
August 27, 2021

- Fix deprecation messages logged from `ProxyConfiguration` and `CheerioCrawler`.
- Update `got-scraping` to receive multiple improvements.
v2.0.5 Changes
August 24, 2021

- Fix error handling in puppeteer crawler
v2.0.4 Changes
August 23, 2021

- Use `sessionToken` with `got-scraping`
v2.0.3 Changes
August 20, 2021

- BREAKING IN EDGE CASES: We removed `forceUrlEncoding` in `requestAsBrowser` because we found out that recent versions of the underlying HTTP client `got` already encode URLs, and `forceUrlEncoding` could lead to weird behavior. We think of this as fixing a bug, so we're not bumping the major version.
- Limit `handleRequestTimeoutMillis` to max valid value to prevent Node.js fallback to `1`.
- Use `got-scraping@^3.0.1`
- Disable SSL validation on MITM proxies