Algorithmic Audit: Modeling the New York City Civilian Complaint Review Board and Auditing the Model Fairness
For Full Paper, Click Here.
For Paper Covering the Background/Investigation of Inequity (Paper 1), Click Here.
Please note that this README functions only as an overview of the work done! Please take a look at the final papers (linked above) or their corresponding Python notebooks. Contents:
- Background
- Tools Used
- The Data
- The Logistic Regression Model
- Overview of Parity Measures and Fairness Results
- Conclusion
This paper and analysis were completed during Spring 2022 as part of the course DSC 167: Fairness and Algorithmic Decision Making at UC San Diego's Halıcıoğlu Data Science Institute. The purpose of the assignment, and the aim of this paper, was to find a system or mechanism with a potential inequity, model its decision-making process, and evaluate/audit that model using parity measures and frameworks of distributive justice. I hope that this paper can teach data scientists, and those interested in the field, about how ethics and fairness come into play in data science and the algorithms we data scientists create.
The New York City Civilian Complaint Review Board (CCRB), the system in this context, is an agency independent from the New York Police Department that investigates complaints against NYPD officers regarding allegations of excessive or unnecessary force, abuse of authority, discourtesy, or offensive language. Throughout the investigation process, the review board may obtain further evidence from the NYPD (e.g., body camera footage) and conduct interviews of anyone involved to use alongside any details the complainant submitted in their original complaint. Specific complainant/victim attributes collected include age, gender, and ethnicity. The complainant may also provide details on the officer’s sex, race, and physical appearance.
The potential inequity I investigated involves the "disposition" of a given complaint, or the ruling the CCRB determines. A substantiated complaint is one for which the alleged conduct was found to have happened and to have violated NYPD rules, potentially leading to punishment or repercussions for the accused officer. The data shows that complaints submitted by Black complainants were substantiated less often than those submitted by non-Black complainants. This discrepancy was the launching pad for my investigation of this inequity and audit of the CCRB.
- Python
- Jupyter Notebooks
- Scikit-learn for Modeling
- Pandas for Data Cleaning, Exploration, etc.
- NumPy for Data Cleaning, Exploration, etc.
- Matplotlib for Data Visualizations
The data originally comes from ProPublica and contains information regarding complaints made starting in 1985 up to 2020. The model uses data starting in 2000 because key demographic features like complainant ethnicity were not tracked before then. All complaints include the type of complaint made, the precinct in which the alleged action took place, demographics on the complainant, and demographics on the accused officer. See allegations.csv and allegations_cleaned.csv.
Borough was not a column initially included in the dataset, so the precincts were mapped to their appropriate boroughs using data pulled from the NYPD's website. See precinct_boroughs.csv.
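The mapping itself amounts to a simple join on the precinct number. Below is a minimal sketch, assuming the lookup file uses `precinct` and `borough` column names (the exact names in the repo may differ):

```python
import pandas as pd

# Load the ProPublica complaints and the precinct-to-borough lookup table.
allegations = pd.read_csv("allegations.csv")
precinct_boroughs = pd.read_csv("precinct_boroughs.csv")

# Attach a borough to every complaint via its precinct number.
# Column names ("precinct", "borough") are assumed for illustration.
allegations = allegations.merge(precinct_boroughs, on="precinct", how="left")
```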
In order to understand the different categories of complaints and the potential consequences an officer may face, I referenced the CCRB's rules. This information was used in feature engineering.
In the first, investigatory paper, data on NYC arrests and on stop, question, and frisk patterns were examined. Please visit this link for the arrests data and this link for the stop, question, and frisk data.
A simple logistic regression model with L2 regularization was used to model the CCRB's decision process. To train the model, a simple 75%-25% train-test split was used. To combat the class imbalance present (only about 25% of complaints were deemed substantiated), the `class_weight` hyperparameter, which assigns a weight to each class that the model uses when penalizing misclassifications, was used. To determine the proper decision threshold, different utility functions were compared, ultimately leading to a threshold of 0.527 instead of the default 0.5. This means the model classifies a complaint as substantiated if the resulting regression prediction is at least 0.527. See the paper for more details.
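A minimal sketch of this setup is below. The synthetic data, `class_weight="balanced"`, and the other hyperparameters are illustrative stand-ins; the exact configuration lives in the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Stand-in data with roughly a 25% positive class, mirroring the imbalance described above.
X, y = make_classification(n_samples=5000, n_features=20, weights=[0.75], random_state=0)

# 75%-25% train-test split.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# L2 regularization is scikit-learn's default penalty; class_weight="balanced"
# up-weights the minority (substantiated) class during training.
model = LogisticRegression(penalty="l2", class_weight="balanced", max_iter=1000)
model.fit(X_train, y_train)

# Classify as substantiated when the predicted probability clears the tuned 0.527 threshold.
THRESHOLD = 0.527
probs = model.predict_proba(X_test)[:, 1]
preds = (probs >= THRESHOLD).astype(int)
```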
The features used in the model are:
- `contact_reason` (text indicating why the officer approached the civilian)
- `mos_ethnicity` (officer’s ethnicity)
- `rank_incident` (officer’s rank at time of incident)
- `mos_gender` (officer’s gender)
- `complainant_gender` (complainant’s gender)
- `mos_age_incident` (officer’s age at time of incident)
- `complainant_age_incident` (complainant’s age at time of incident)
- `borough` (the borough in which the incident took place)
- `black` (whether the complainant is Black)
- `allegation` (brief description of the allegation)
- `fado_type` (type of complaint)
- time/date related features (`month_received`, `year_received`, `month_closed`, and `year_closed`)
All categorical features were one-hot encoded except `allegation` and `fado_type`, which were ordinal encoded. The numerical features were scaled.
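Using scikit-learn, that encoding scheme can be expressed with a `ColumnTransformer`. This is a sketch only; the column groupings mirror the description above, and the encoder options and exact lists in the notebook may differ.

```python
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder, StandardScaler

# Illustrative column groupings based on the feature list above.
one_hot_cols = ["contact_reason", "mos_ethnicity", "rank_incident",
                "mos_gender", "complainant_gender", "borough"]
ordinal_cols = ["allegation", "fado_type"]
numeric_cols = ["mos_age_incident", "complainant_age_incident",
                "month_received", "year_received", "month_closed", "year_closed"]

preprocessor = ColumnTransformer(
    [
        ("one_hot", OneHotEncoder(handle_unknown="ignore"), one_hot_cols),
        ("ordinal", OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1), ordinal_cols),
        ("scale", StandardScaler(), numeric_cols),
    ],
    remainder="passthrough",  # e.g., the binary `black` flag passes through unchanged
)

pipeline = Pipeline([
    ("preprocess", preprocessor),
    ("model", LogisticRegression(penalty="l2", class_weight="balanced", max_iter=1000)),
])
```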
The class imbalance makes accuracy ill-suited for this model, so the F1 score was used instead. The test performance metrics across all groups are as follows:
- Accuracy score: 0.624
- Recall score: 0.526
- Precision score: 0.330
- F1 score: 0.406
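Metrics like these come straight from scikit-learn's metric functions; for example, reusing `y_test` and `preds` from the training sketch above:

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# `y_test` and `preds` come from the training sketch above (threshold already applied).
print("Accuracy :", accuracy_score(y_test, preds))
print("Recall   :", recall_score(y_test, preds))
print("Precision:", precision_score(y_test, preds))
print("F1       :", f1_score(y_test, preds))
```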
For a breakdown of different parity measures and fairness in machine learning, take a look at Google's Machine Learning Glossary: Fairness.
With the original model, a single threshold was used. Under this model, both demographic parity and equalized odds were not satisfied, but predictive value parity was met. The original model substantiated complaints from Black complainants far less often than it did for non-Black complainants. The violation of equalized odds means the model gave more complaints submitted by non-Black complainants the “benefit of the doubt,” classifying more CCRB-unsubstantiated complaints as substantiated than it did for Black complainants.
To try to improve fairness along these measures, two different threshold tests were conducted to find better thresholds per group (complainant ethnicity in this case). First, the thresholds that maximize utility for each individual group were used. These thresholds satisfied demographic parity and equalized odds, but not predictive value parity. Second, the thresholds that satisfy all three parity measures (demographic parity, equalized odds, and predictive value parity) were found using the training data and then tested on a separate dataset. See the paper for a more in-depth breakdown of how these tests were conducted.
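For concreteness, here is a simplified sketch of the group-wise audit these checks require: per-group selection rate (demographic parity), true and false positive rates (equalized odds), and positive predictive value (predictive value parity). The function name and layout are illustrative, not the notebook's code.

```python
import numpy as np
import pandas as pd

def parity_report(y_true, y_pred, is_black):
    """Group-wise rates used to check the three parity measures.
    All inputs are 1-D arrays; `is_black` is a boolean group indicator."""
    y_true, y_pred, is_black = map(np.asarray, (y_true, y_pred, is_black))
    rows = {}
    for name, mask in [("Black", is_black.astype(bool)), ("non-Black", ~is_black.astype(bool))]:
        yt, yp = y_true[mask], y_pred[mask]
        rows[name] = {
            "selection rate (demographic parity)": yp.mean(),
            "TPR (equalized odds)": yp[yt == 1].mean(),
            "FPR (equalized odds)": yp[yt == 0].mean(),
            "PPV (predictive value parity)": yt[yp == 1].mean(),
        }
    return pd.DataFrame(rows)

# Usage (hypothetical group indicator for the test rows):
# parity_report(y_test, preds, is_black_test)
```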
Depending on what is most important to the decision-making body (the CCRB), different threshold(s) would be chosen. Maximizing utility overall enforces equality for all complainants, while maximizing utility for each group is more equitable. Satisfying demographic parity ensures that substantiation doesn’t depend on complainant ethnicity. Equality of odds focuses on ensuring that the chance of obtaining the benefit of substantiation is equal across groups, while predictive value parity focuses on ensuring that deserving complainants receive substantiation and undeserving complainants don’t. The table below demonstrates different group thresholds that could be used, depending on the goal (maximizing utility or enforcing a parity measure).
| Threshold for Black Complainants | Threshold for non-Black Complainants | Test | Demographic Parity | Equality of Odds | Predictive Value Parity |
|---|---|---|---|---|---|
| 0.527 | 0.527 | Max utility overall | Not satisfied | Not satisfied | Satisfied |
| 0.522 | 0.546 | Max utility per group | Satisfied | Satisfied | Not satisfied |
| 0.522 | 0.541 | Enforce equality of odds | N/A | Strictly satisfied | N/A |
| 0.522 | 0.535 | Enforce predictive value parity | N/A | N/A | Not satisfied |
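Applying any row of the table amounts to comparing each complaint's predicted probability against its group's cut-off. A small illustrative helper (name hypothetical) might look like:

```python
import numpy as np

def predict_with_group_thresholds(probs, is_black, t_black, t_nonblack):
    """Substantiate a complaint when its predicted probability clears its
    group's threshold, e.g. 0.522 / 0.546 for the max-utility-per-group row."""
    probs = np.asarray(probs)
    is_black = np.asarray(is_black, dtype=bool)
    thresholds = np.where(is_black, t_black, t_nonblack)
    return (probs >= thresholds).astype(int)
```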
Hopefully this paper shows that, oftentimes, an algorithm or model may (unintentionally) be unfair or inequitable if parity measures or metrics of fairness are not considered. As for this specific example of the CCRB, the model misses out on much of the data gathered during the investigation process, yet it still demonstrates evidence of inequity and unfairness. The bottom line is that any model or algorithm absolutely should consider its fairness!