This project builds a credit card fraud detection system using several machine learning and deep learning models. The goal is to identify fraudulent transactions accurately so that financial institutions can reduce losses and strengthen their security measures.
- Analyze and preprocess imbalanced credit card transaction data.
- Explore and visualize patterns in fraudulent and non-fraudulent transactions.
- Build and evaluate classification models to detect fraudulent transactions.
- Compare multiple approaches, including Random Forest, SVM, Neural Networks, and PyTorch-based models.
- Dataset Size: ~284,000 transactions.
- Features: 28 anonymized features (V1-V28) plus Time and Amount (30 input features in total), with Class as the target.
- Target Variable: `Class` (0 = Non-fraudulent, 1 = Fraudulent).
- Imbalance: Only 0.17% of transactions are fraudulent.
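Before modeling, it helps to confirm the class balance directly. The snippet below is a minimal sketch on a tiny synthetic frame; the helper name `class_balance` and the file name in the comment are assumptions, not part of this repository.

```python
import pandas as pd


def class_balance(df: pd.DataFrame, target: str = "Class") -> pd.DataFrame:
    """Return the count and percentage of each class in the target column."""
    counts = df[target].value_counts().sort_index()
    return pd.DataFrame({"count": counts, "percent": 100 * counts / len(df)})


# Synthetic stand-in for the real data. With the actual file you would load it
# first, e.g. df = pd.read_csv("creditcard.csv")  (file name is an assumption).
df = pd.DataFrame({"Class": [0] * 998 + [1] * 2})
summary = class_balance(df)
print(summary)
```

With a fraud rate near 0.17%, a model that always predicts "non-fraudulent" would already be about 99.83% accurate, which is why accuracy alone says little on this data.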
- Class Imbalance: Severe imbalance with a majority of transactions being non-fraudulent.
- Feature Correlation: Analyzed feature relationships using heatmaps and pairwise plots.
- Temporal Analysis: Explored fraud trends across different times of the day.
- Transaction Amounts: Compared fraud vs. non-fraud distributions.
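The temporal and amount comparisons above can be sketched with plain pandas group-bys. This example uses made-up rows and assumes `Time` is measured in seconds since the first transaction; it is an illustration, not the project's notebook code.

```python
import pandas as pd

# Made-up rows for illustration only.
df = pd.DataFrame(
    {
        "Time": [100, 200, 4000, 5000, 90000, 8000],
        "Amount": [10.0, 20.0, 30.0, 40.0, 5.0, 100.0],
        "Class": [0, 0, 0, 0, 1, 1],
    }
)

# Summary statistics of Amount for each class (0 = non-fraud, 1 = fraud).
amount_by_class = df.groupby("Class")["Amount"].agg(["count", "mean", "median", "max"])

# Hour of day, assuming Time is seconds elapsed since the first transaction.
hour = ((df["Time"] // 3600) % 24).astype(int).rename("Hour")

# Share of fraudulent transactions within each hour.
fraud_rate_by_hour = df.groupby(hour)["Class"].mean()

print(amount_by_class)
print(fraud_rate_by_hour)
```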
- Handled missing data by using interpolation techniques.
- Normalized transaction amounts to standardize input features.
- Created additional features, such as `Hour` from the `Time` variable, for temporal insights.
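The preprocessing steps above could look roughly like the following. This is a sketch under assumptions: linear interpolation is one possible interpolation choice, `Time` is taken to be seconds since the first transaction, and `preprocess` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Interpolate gaps, derive Hour from Time, and standardize Amount."""
    out = df.copy()
    # Fill missing numeric values by linear interpolation (assumed method).
    out = out.interpolate(method="linear", limit_direction="both")
    # Assumption: Time is seconds since the first transaction.
    out["Hour"] = ((out["Time"] // 3600) % 24).astype(int)
    # Standardize Amount to zero mean and unit variance.
    out["Amount"] = StandardScaler().fit_transform(out[["Amount"]]).ravel()
    return out


raw = pd.DataFrame(
    {
        "Time": [0.0, 3600.0, 90000.0, 7200.0],
        "Amount": [10.0, np.nan, 30.0, 40.0],
    }
)
clean = preprocess(raw)
print(clean)
```

In a real pipeline the scaler would normally be fitted on the training split only and then applied to the test split, to avoid leaking test statistics into training.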
- Feature importance analysis for model interpretability.
- Achieved [85.3%].
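As an illustration of the Random Forest step, the sketch below trains on synthetic imbalanced data and ranks features by importance. The data, the hyperparameters, and the use of `class_weight="balanced"` are assumptions for the example, not the project's actual configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data standing in for the transaction features.
X, y = make_classification(
    n_samples=2000, n_features=10, n_informative=4,
    weights=[0.97, 0.03], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# class_weight="balanced" reweights classes inversely to their frequency
# (an assumed choice for handling the imbalance).
rf = RandomForestClassifier(n_estimators=100, class_weight="balanced", random_state=0)
rf.fit(X_train, y_train)

# Rank features by impurity-based importance, highest first.
feature_names = [f"f{i}" for i in range(X.shape[1])]  # hypothetical names
importances = sorted(
    zip(feature_names, rf.feature_importances_), key=lambda pair: pair[1], reverse=True
)
for name, score in importances[:4]:
    print(f"{name}: {score:.3f}")
```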
- Evaluated performance with a radial basis kernel.
- Accuracy: [80%].
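An SVM with a radial basis function kernel can be set up as below. This is a sketch on synthetic data; the scaling step, `C`, `gamma`, and class weighting are assumed settings rather than the values used in this project.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic imbalanced data standing in for the transaction features.
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=4,
    weights=[0.95, 0.05], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# RBF-kernel SVMs are sensitive to feature scale, so standardize first.
svm = make_pipeline(
    StandardScaler(),
    SVC(kernel="rbf", C=1.0, gamma="scale", class_weight="balanced"),
)
svm.fit(X_train, y_train)

y_pred = svm.predict(X_test)
scores = svm.decision_function(X_test)  # signed distance to the decision boundary
print("Predicted fraud cases:", int(y_pred.sum()))
```

Kernel SVM training time grows quickly with the number of samples, so on a dataset of roughly 284,000 rows it may be practical to train on a subsample.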
- Developed deep learning models using TensorFlow and PyTorch.
- Included techniques like dropout layers to prevent overfitting.
- Performance: [96%].
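A small PyTorch classifier with dropout might look like the following. The layer sizes, dropout rate, and the class name `FraudNet` are hypothetical; the project's actual architecture is not described here.

```python
import torch
from torch import nn


class FraudNet(nn.Module):
    """Feed-forward binary classifier with dropout (illustrative sizes)."""

    def __init__(self, n_features: int = 30, hidden: int = 32, p_drop: float = 0.3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden),
            nn.ReLU(),
            nn.Dropout(p_drop),  # randomly zeroes activations during training
            nn.Linear(hidden, hidden // 2),
            nn.ReLU(),
            nn.Dropout(p_drop),
            nn.Linear(hidden // 2, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Return raw logits, one per transaction.
        return self.net(x).squeeze(-1)


torch.manual_seed(0)
model = FraudNet()
x = torch.randn(8, 30)  # a batch of 8 random feature vectors

model.eval()  # disables dropout for inference
with torch.no_grad():
    probs = torch.sigmoid(model(x))  # fraud probabilities in [0, 1]
print(probs.shape)
```

For training on imbalanced data, one option is `nn.BCEWithLogitsLoss` with its `pos_weight` argument set above 1 so that missed fraud cases are penalized more heavily.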
- Used metrics such as ROC-AUC Score, Accuracy, and Confusion Matrix to evaluate models.
- Focused on minimizing false negatives, which have a significant business impact.
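The metrics above can be computed with scikit-learn as sketched below. The labels, scores, and the helper name `evaluate` are made up for illustration; the example also shows how lowering the decision threshold trades false negatives for false positives.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score


def evaluate(y_true, y_score, threshold: float = 0.5) -> dict:
    """Compute ROC-AUC, accuracy, and confusion-matrix counts at a threshold."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    y_pred = (y_score >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "roc_auc": roc_auc_score(y_true, y_score),
        "accuracy": accuracy_score(y_true, y_pred),
        "false_negatives": int(fn),
        "false_positives": int(fp),
        "recall": tp / (tp + fn),
    }


# Made-up labels and predicted fraud probabilities.
y_true = [0, 0, 0, 0, 1, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.9, 0.45, 0.2, 0.7]

at_05 = evaluate(y_true, y_score, threshold=0.5)
at_04 = evaluate(y_true, y_score, threshold=0.4)
print(at_05)
print(at_04)
```

ROC-AUC does not depend on the threshold, while the false negative count does, so the threshold can be tuned separately once a model is chosen.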
- Karthik Mahalingam
- Ankitha Dongerkerry Pai
- Programming Language: Python
- Libraries: `pandas`, `numpy`, `matplotlib`, `seaborn`, `scikit-learn`, `TensorFlow`, `PyTorch`, `XGBoost`, `LightGBM`.
- Visualization: `Plotly`, `Matplotlib`, `Seaborn`.
- Random Forest: Identified as the most effective model for this dataset due to its ability to handle class imbalance.
- Neural Networks: Provided high precision but required more computational resources.
- Feature Importance:
  - Key features: `V4`, `V12`, `Amount`, `V17`.
Credit card fraud causes significant financial losses, damages brand trust, and leads to operational inefficiencies in dispute resolution.
- Fraud Mitigation:
  - Reduced false negatives to identify fraud early and prevent unauthorized transactions.
- Cost Efficiency:
  - Decreased operational costs related to manual review of flagged transactions.
- Customer Trust:
  - Enhanced customer trust through proactive fraud detection, leading to improved brand loyalty.
- Scalability:
  - The models are scalable for real-time implementation in transaction monitoring systems.
- Short-Term:
  - Deploy Random Forest for immediate fraud detection.
  - Monitor model performance regularly to ensure accuracy with new data.
- Long-Term:
  - Transition to deep learning models for larger datasets.
  - Integrate the model with real-time systems for live fraud detection.
  - Regularly retrain models to adapt to evolving fraud patterns.
- Clone the repository: `git clone https://github.com/your-username/credit-card-fraud-detection.git`
- Incorporate additional data sources, such as IP addresses and transaction locations.
- Implement ensemble models for enhanced performance.
- Explore explainable AI methods to improve model transparency.