Unofficial Node.js client for mugshots.com. Exposes both a Readable Stream and an Async Iterator API for streaming Mugshot objects. 🚔👮
npm i mugshots-client --save
// ES modules / TypeScript
import { MugshotStream, Mugshot } from 'mugshots-client';
// CommonJS
const { MugshotStream } = require('mugshots-client');
import { MugshotStream, Mugshot } from 'mugshots-client';

(async () => {
  // Create the stream; each 'data' event delivers an array of Mugshot objects.
  const mugshotStream = await MugshotStream({ maxChunkSize: 10 });
  console.log('Stream created.');

  mugshotStream.on('error', (error) => {
    console.error(error);
  });

  mugshotStream.on('close', () => {
    console.log('Stream closed.');
  });

  mugshotStream.on('data', (mugshots: Mugshot[]) => {
    console.log('data', mugshots);
  });
})();
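The stream example passes a `maxChunkSize` option, and its `'data'` handler receives an array of mugshots rather than a single one. The sketch below shows one plausible reading of that option, assuming it caps the number of items per chunk. `chunkArray` is a hypothetical helper written for illustration; it is not exported by mugshots-client, and the URLs are placeholders.

```typescript
// Hypothetical helper (not part of mugshots-client): splits a list into
// consecutive chunks of at most `maxChunkSize` items, assuming that is what
// the library's `maxChunkSize` option means.
function chunkArray<T>(items: T[], maxChunkSize: number): T[][] {
  if (!Number.isInteger(maxChunkSize) || maxChunkSize < 1) {
    throw new RangeError('maxChunkSize must be a positive integer');
  }
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += maxChunkSize) {
    // slice() clamps the end index, so the last chunk may be shorter.
    chunks.push(items.slice(i, i + maxChunkSize));
  }
  return chunks;
}

// Example: 25 placeholder URLs split with maxChunkSize 10.
const urls = Array.from({ length: 25 }, (_, i) => `https://example.com/mugshot/${i}`);
const chunks = chunkArray(urls, 10);
console.log(chunks.map((c) => c.length)); // [ 10, 10, 5 ]
```

Fixed-size chunks keep each batch of work bounded, which matters when every item means loading a page in a headless browser.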
import * as puppeteer from 'puppeteer';
import {
  CountyIterable,
  MugshotUrlChunkIterable,
  scrapeMugshots,
  PagePool,
  Mugshot
} from 'mugshots-client';

(async () => {
  const browser = await puppeteer.launch();
  // Pool of browser pages shared by the scraper (max: 10).
  const pagePool = PagePool(browser, { max: 10 });
  const page = await pagePool.acquire();

  // Iterate over every county, then over chunks of mugshot URLs in that county.
  const counties = await CountyIterable(page);
  for await (const county of counties) {
    const mugshotUrls = await MugshotUrlChunkIterable(page, county);
    for await (const chunk of mugshotUrls) {
      // Scrape the mugshots for this chunk of URLs.
      const mugshots = await scrapeMugshots(pagePool, chunk, { maxChunkSize: 20 });
      console.log(mugshots);
    }
  }

  // Close the browser so the Node.js process can exit.
  await browser.close();
})();
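The nested `for await` loops in the async iterator example work with any object that implements the async iteration protocol. As a hedged illustration of that pattern (not the library's actual implementation), here is a hypothetical `chunkAsync` async generator that groups items from any sync or async iterable into arrays, in the spirit of `MugshotUrlChunkIterable`. `fakeUrls` is a hypothetical stand-in for a paginated source of mugshot URLs.

```typescript
// Hypothetical async generator (not part of mugshots-client): groups items from
// any sync or async iterable into arrays of at most `size` items.
async function* chunkAsync<T>(
  source: Iterable<T> | AsyncIterable<T>,
  size: number
): AsyncGenerator<T[]> {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  let buffer: T[] = [];
  for await (const item of source) {
    buffer.push(item);
    if (buffer.length === size) {
      yield buffer;
      buffer = [];
    }
  }
  // Emit any leftover items as a final, shorter chunk.
  if (buffer.length > 0) {
    yield buffer;
  }
}

// Hypothetical async source standing in for a paginated list of mugshot URLs.
async function* fakeUrls(count: number): AsyncGenerator<string> {
  for (let i = 0; i < count; i++) {
    yield `https://example.com/mugshot/${i}`;
  }
}

async function main(): Promise<number[]> {
  const sizes: number[] = [];
  for await (const chunk of chunkAsync(fakeUrls(7), 3)) {
    sizes.push(chunk.length);
  }
  console.log(sizes); // [ 3, 3, 1 ]
  return sizes;
}

main().catch((err) => console.error(err));
```

Because the generator pulls one item at a time, a consumer that stops early (for example with `break`) never triggers work for the remaining items.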
Why'd you make this? Isn't www.mugshots.com immoral?
My goals are to:
- subvert mugshots.com by making the watermarked records they re-publish from the public domain freely available for anyone to use
- bring attention to the moral implications for open records on the internet
  - More on NPR's Planet Money podcast, Episode 878: Mugshots For Sale
- use this library for inequality and social justice research
I chose Puppeteer because driving a real browser offers a path toward obscuring scraping activity, which helps future-proof this software against censorship or Terms of Service changes.
Here is an article on making headless Chrome undetectable. My goal is to provide an API for building an undetectable scraper: if we manipulate the Chrome browser's behavior and properties to mimic a human user's browser, scraping becomes very difficult to detect.
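One commonly cited detection signal is headless Chrome's default user-agent string, which contains `HeadlessChrome` where a regular build reports `Chrome`. The sketch below is a small, hedged example of the kind of property manipulation described above, not this library's API: `normalizeUserAgent` is a hypothetical helper, and the user-agent string is a made-up sample. It addresses only one signal among many.

```typescript
// Hypothetical helper (not part of mugshots-client): rewrites the headless
// Chrome user-agent token so it matches a regular desktop Chrome build.
function normalizeUserAgent(userAgent: string): string {
  return userAgent.replace(/HeadlessChrome\//g, 'Chrome/');
}

// Sample headless user-agent string (version number is illustrative).
const headlessUA =
  'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) ' +
  'HeadlessChrome/120.0.0.0 Safari/537.36';

console.log(normalizeUserAgent(headlessUA));
// Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36
```

With Puppeteer, the same idea might be applied before navigating, e.g. `await page.setUserAgent(normalizeUserAgent(await browser.userAgent()))`.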