
Commit

chore: allow adjusting the chromium maximum concurrency through an environment variable
sneko committed Feb 26, 2024
1 parent d759330 commit 6cbdc09
Showing 4 changed files with 8 additions and 4 deletions.
1 change: 1 addition & 0 deletions .env.model
@@ -3,6 +3,7 @@ APP_BASE_URL=
DATABASE_URL=
LLM_MANAGER_MOCK=false
MAINTENANCE_API_KEY=
CHROMIUM_MAXIMUM_CONCURRENCY=1
NEXT_PUBLIC_CRISP_WEBSITE_ID=
NEXT_PUBLIC_SENTRY_DSN=
NEXT_PUBLIC_MATOMO_URL=
1 change: 1 addition & 0 deletions .env.test
@@ -2,6 +2,7 @@ PORT=3000
DATABASE_URL=postgresql://postgres:changeme@localhost:5432/postgres
LLM_MANAGER_MOCK=true
MAINTENANCE_API_KEY=random
CHROMIUM_MAXIMUM_CONCURRENCY=1
NEXT_PUBLIC_CRISP_WEBSITE_ID=random-one-since-cannot-work-without-a-remote-crisp-account
NEXT_PUBLIC_SENTRY_DSN=
NEXT_PUBLIC_MATOMO_URL=
3 changes: 2 additions & 1 deletion README.md
@@ -193,12 +193,13 @@ For each build and runtime (since they are shared), you should have set some env
- `APP_MODE`: `prod` _(can be `dev` in case you would like to deploy a development environment)_
- `DATABASE_URL`: `$POSTGRESQL_ADDON_URI` _(you must copy/paste the value provided by Clever Cloud into `$POSTGRESQL_ADDON_URI`, and note you must add as query parameter `sslmode=prefer`)_
- `MAINTENANCE_API_KEY`: [SECRET] _(random string that can be generated with `openssl rand -base64 32`. Note this is needed to perform maintenance through dedicated API endpoints)_
- `MISTRAL_API_KEY`: [SECRET] _(you can create an API key from your MistralAI "La plateforme" account)_
- `CHROMIUM_MAXIMUM_CONCURRENCY`: [TO_DEFINE] _(defaults to `1`, but at that setting analyzing thousands of websites through headless Chromium takes a long time. After some testing we think `4` is fine on the Clever Cloud `S` plan (and `8` on the `XL` plan when you need a quick speed-up); locally it will depend on your hardware. Consider lowering the value when more than 10% of analyses time out)_
- `NEXT_PUBLIC_APP_BASE_URL`: [TO_DEFINE] _(must be the root URL to access the application, format `https://xxx.yyy.zzz`)_
- `NEXT_PUBLIC_CRISP_WEBSITE_ID`: [TO_DEFINE] _(this ID is defined in your Crisp account and depends on the development or production environment)_
- `NEXT_PUBLIC_SENTRY_DSN`: [SECRET] _(format `https://xxx.yyy.zzz/nn`)_
- `NEXT_PUBLIC_MATOMO_URL`: [PROVIDED] _(format `https://xxx.yyy.zzz/`)_
- `NEXT_PUBLIC_MATOMO_SITE_ID`: [GENERATED] _(the numeric site ID generated by Matomo for this application)_
- `MISTRAL_API_KEY`: [SECRET] _(you can create an API key from your MistralAI "La plateforme" account)_

**IMPORTANT: When building Next.js in standalone mode the frontend `NEXT_PUBLIC_*` environment variables are hardcoded. It means you have to set them in the build environment too. For more information have a look at the build step comment in `.github/workflows/ci.yml`.**

7 changes: 4 additions & 3 deletions src/features/domain.ts
@@ -598,9 +598,10 @@ export async function updateWebsiteDataOnDomains() {
});
try {
// Since the underlying content fetching is based on waiting a timeout on the website "to be sure" single page applications (SPA)
// have rendered their content, it takes some time when iterations are consecutive. Due to that we made the loop batch a few items per iteration
// Note: previously it averaged 6 seconds per website (due to 2 page renderings with a timeout); we tried to keep it short (other long-running jobs are about ~50ms per page loaded)
await eachOfLimit(rawDomains, 15, async function (rawDomain, rawDomainIndex) {
// have rendered their content, it takes some time when iterations are consecutive. Due to that we made the loop concurrency configurable
// Note: for a concurrency of 1 it averaged 6 seconds per website (due to 2 page renderings with an await)
const maxConcurrency = !!process.env.CHROMIUM_MAXIMUM_CONCURRENCY ? parseInt(process.env.CHROMIUM_MAXIMUM_CONCURRENCY, 10) : 1;
await eachOfLimit(rawDomains, maxConcurrency, async function (rawDomain, rawDomainIndex) {
watchGracefulExitInLoop();

console.log(

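The environment-variable lookup in the diff above can be sketched as a small standalone helper. This is a hypothetical `getMaxConcurrency` function (not part of the commit) that adds one guard the inline ternary lacks: falling back to the default when the variable is set to a non-numeric, zero, or negative value, any of which would stall or break `eachOfLimit`:

```typescript
// Hypothetical helper mirroring the pattern used in src/features/domain.ts:
// read a positive-integer concurrency limit from the environment, falling
// back to a safe default of 1 when the variable is unset or invalid.
function getMaxConcurrency(rawValue: string | undefined, defaultValue: number = 1): number {
  if (!rawValue) {
    return defaultValue;
  }

  const parsed = parseInt(rawValue, 10);

  // Guard against NaN, 0, or negative values, which would make
  // eachOfLimit(collection, limit, ...) never process any item
  return Number.isInteger(parsed) && parsed > 0 ? parsed : defaultValue;
}

console.log(getMaxConcurrency(process.env.CHROMIUM_MAXIMUM_CONCURRENCY));
```

The default of `1` keeps the previous sequential behavior when the variable is absent, so existing deployments are unaffected until an operator opts in to higher concurrency.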