Commit

chore: Update Dockerfile to install Google Chrome for Puppeteer
n4ze3m committed Jun 17, 2024
1 parent c1f6c18 commit f78da2a
Showing 2 changed files with 15 additions and 2 deletions.
10 changes: 10 additions & 0 deletions Dockerfile
@@ -31,6 +31,14 @@ RUN yarn config set registry https://registry.npmjs.org/
RUN yarn config set network-timeout 1200000

RUN apt update && apt -y install --no-install-recommends ca-certificates git git-lfs openssh-client curl jq cmake sqlite3 openssl psmisc python3

+RUN apt-get update && apt-get install gnupg wget -y && \
+    wget --quiet --output-document=- https://dl-ssl.google.com/linux/linux_signing_key.pub | gpg --dearmor > /etc/apt/trusted.gpg.d/google-archive.gpg && \
+    sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list' && \
+    apt-get update && \
+    apt-get install google-chrome-stable -y --no-install-recommends && \
+    rm -rf /var/lib/apt/lists/*

RUN apt -y install g++ make
# RUN npm install -g node-gyp
RUN apt-get clean autoclean && apt-get autoremove --yes && rm -rf /var/lib/{apt,dpkg,cache,log}/
@@ -51,4 +59,6 @@ RUN yarn install --production --frozen-lockfile

ENV NODE_ENV=production

+ENV PUPPETEER_EXECUTABLE_PATH=/usr/bin/google-chrome-stable

CMD ["yarn", "start"]
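
Reviewer note: the `PUPPETEER_EXECUTABLE_PATH` variable set above is one that Puppeteer reads from the environment, so the server code may not need any change to pick up the system Chrome. If explicit launch options are preferred, a minimal sketch could look like the following. The helper name `resolveLaunchOptions`, the `LaunchOptions` shape, and the sandbox flags are assumptions for illustration and are not part of this commit.

```typescript
// Hypothetical helper, not part of this commit: builds launch options
// from the environment so the Chrome path set in the Dockerfile is honored.
interface LaunchOptions {
  executablePath?: string;
  headless: boolean;
  args: string[];
}

function resolveLaunchOptions(
  env: Record<string, string | undefined>
): LaunchOptions {
  // Treat an unset or blank variable as "use the default browser".
  const executablePath = env.PUPPETEER_EXECUTABLE_PATH?.trim();

  return {
    ...(executablePath ? { executablePath } : {}),
    headless: true,
    // Assumption: the container runs as root, where Chrome typically
    // refuses to start unless its sandbox is disabled.
    args: ["--no-sandbox", "--disable-setuid-sandbox"],
  };
}
```

The result could then be passed to `puppeteer.launch(resolveLaunchOptions(process.env))`. Omitting `executablePath` when the variable is unset keeps local development on the default browser.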
7 changes: 5 additions & 2 deletions server/src/utils/crawl.ts
@@ -20,7 +20,7 @@ export const crawl = async (

while (queue.length > 0 && visitedLinks.size < maxLinks) {
const batch = queue.splice(0, Math.min(queue.length, maxLinks - visitedLinks.size));

await Promise.all(
batch.map(async ({ url, depth }) => {
if (visitedLinks.has(url) || depth > maxDepth) {
@@ -29,7 +29,10 @@ export const crawl = async (

try {
const response = await axios.get(url, {
-headers: { Accept: "text/html" },
+headers: {
+  Accept: "text/html",
+  "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3",
+},
});

const contentType = response.headers['content-type'];
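
Reviewer note: the hunk above hard-codes a browser-like `User-Agent` inside the `axios.get` call. One possible refactor, sketched here under assumptions, is to build the headers in a small function so the string can be overridden without editing the crawler. The names `DEFAULT_USER_AGENT` and `buildCrawlHeaders` are hypothetical and do not appear in the commit; the default value is copied verbatim from the diff.

```typescript
// Default copied verbatim from the diff above.
const DEFAULT_USER_AGENT =
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3";

// Hypothetical helper, not part of this commit: returns the request
// headers used for crawling, with an optional User-Agent override.
function buildCrawlHeaders(
  userAgent: string = DEFAULT_USER_AGENT
): Record<string, string> {
  return {
    Accept: "text/html",
    "User-Agent": userAgent,
  };
}
```

With this in place the call site would read `axios.get(url, { headers: buildCrawlHeaders() })`, and a deployment could pass its own string, for example one read from configuration.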