
AI Handwriting Detection of Digit 2 with One Input Image

The idea is to use a single image as input, generate a large dataset from it, and train a model on that generated data. This lets the model learn effectively even though it starts from only one example.

I train the CNN on VAE-generated images rather than directly on augmented images because a VAE:

  1. Captures the underlying data distribution, generating images consistent with the training data.
  2. Produces more diverse and realistic samples by sampling from the latent space.
  3. Enables smooth interpolation between data points, creating meaningful intermediate images.
  4. Improves generalization by generating samples that better represent unseen data.
  5. Handles noise and uncertainty naturally through the probabilistic nature of the latent space.
  6. Provides a compact representation of data, making generation efficient and scalable.
  7. Allows controlled generation by manipulating specific dimensions of the latent space.
  8. Helps address class imbalance by generating additional samples for underrepresented classes.
  9. Learns end-to-end, optimizing both reconstruction and generation for the task.
  10. Scales easily to generate a large number of new images by sampling from the latent space.
  11. Offers an interpretable latent space for understanding and manipulating image features.
  12. Ignores outliers by focusing on the underlying data distribution.
  13. Combines reconstruction and generation capabilities, making it versatile for various tasks.
  14. Performs well in limited data scenarios by generating realistic samples from small datasets.
  15. Enables feature disentanglement, allowing control over specific attributes like rotation or thickness.
  16. Reduces overfitting by generating new samples rather than relying on fixed transformations.
  17. Provides a probabilistic framework for understanding data variability and uncertainty.
  18. Enhances model robustness by exposing it to a wider range of realistic variations.
  19. Supports creative applications like style transfer and image editing through latent space manipulation.
  20. Outperforms traditional augmentation in generating high-quality, diverse, and meaningful samples.

Convolutional Neural Networks (CNNs) are particularly well suited to image recognition because they are specifically designed to process and analyze visual data. Unlike traditional fully connected networks, CNNs use convolutional layers to automatically and efficiently extract spatial features such as edges, textures, and patterns. These layers apply filters that slide over the input, capturing local relationships and reducing the number of parameters needed, which makes CNNs computationally efficient. Pooling layers then downsample the feature maps, further reducing dimensionality and improving translation invariance. CNNs also learn hierarchically: early layers detect simple features (e.g., edges) while deeper layers identify complex structures (e.g., shapes or objects). This architecture, combined with the ability to handle large-scale image datasets, makes CNNs highly effective for tasks like image classification, object detection, and segmentation.
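To make this concrete, here is a minimal Keras sketch of such a network; the layer sizes are illustrative, not necessarily the ones used in this repo.

```python
# Minimal CNN matching the hierarchy described above: convolutions
# extract local features, pooling downsamples, dense layers classify.
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(28, 28, 1)),               # grayscale digit image
    layers.Conv2D(32, (3, 3), activation="relu"),  # simple features: edges, strokes
    layers.MaxPooling2D((2, 2)),                   # downsample, add invariance
    layers.Conv2D(64, (3, 3), activation="relu"),  # more complex structures
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(3, activation="softmax"),         # probabilities for 3 classes
])
model.summary()
```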

Image Augmentation

Goal

The goal of this section is to create variations of an input image (e.g., a handwritten digit) by applying various transformations. These variations are used to train the VAE models.

Steps

  1. Load the Input Image: The input image is loaded and converted to grayscale.
  2. Morphological Operations: binary_erosion and binary_dilation modify the edges of the digit, making it thicker or thinner.
  3. Geometric Transformations: Random rotations and translations introduce diversity into the dataset.
  4. Brightness/Contrast Adjustments: These changes simulate different lighting conditions.
  5. Blurring: Adds noise and smoothness to the images.
  6. Generate 100 Variations: Applying these transformations produces 100 new images (a code sketch follows the figure below).

Image augmentation: 100 variations
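A minimal sketch of the augmentation loop, assuming the input is a 28×28 grayscale NumPy array in [0, 1]; the transformation ranges here are illustrative, and the repo's actual parameters may differ.

```python
# Sketch of the augmentation steps described above.
import numpy as np
from scipy import ndimage

def augment(img, rng):
    """Return one random variation of a grayscale digit image in [0, 1]."""
    out = img.copy()
    # Morphology: randomly thin or thicken the digit's strokes.
    mask = out > 0.5
    if rng.random() < 0.5:
        mask = ndimage.binary_erosion(mask)
    else:
        mask = ndimage.binary_dilation(mask)
    out = mask.astype(float)
    # Geometry: small random rotation and translation.
    out = ndimage.rotate(out, rng.uniform(-15, 15), reshape=False)
    out = ndimage.shift(out, rng.uniform(-2, 2, size=2))
    # Brightness/contrast jitter, then a mild blur for smoothness.
    out = np.clip(out * rng.uniform(0.8, 1.2) + rng.uniform(-0.1, 0.1), 0, 1)
    out = ndimage.gaussian_filter(out, sigma=rng.uniform(0, 1))
    return out

rng = np.random.default_rng(0)
image = np.zeros((28, 28))  # placeholder; load the real input image here
variations = [augment(image, rng) for _ in range(100)]
```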

Variational Autoencoder (VAE)

Goal

The VAE is used to generate new images by sampling from the latent space. It learns a lower-dimensional latent representation of the 100 images produced in the augmentation step, then generates new samples by decoding random points in that latent space.

Steps

  1. Encoder:
  • Maps the input image to a latent space representation (mean and log variance).
  • Uses fully connected layers to compress the input into a lower-dimensional latent space.
  2. Reparameterization Trick:
  • Samples a point in the latent space using the mean and log variance.
  • Allows backpropagation through the sampling process.
  3. Decoder:
  • Maps the sampled latent point back to the original image space.
  • Uses fully connected layers to reconstruct the image.
  4. Loss Function: Combines reconstruction loss (how well the input is reconstructed) and KL divergence (how close the latent space is to a standard Gaussian).
  5. Generate New Images (a code sketch follows the figure below):
  • Sample random points from the latent space (prior distribution, usually a standard Gaussian).
  • Decode these points to generate new images.

100 VAE-generated images
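A minimal Keras sketch of the VAE described above, assuming 28×28 grayscale inputs; the layer sizes and latent dimension are illustrative.

```python
# Minimal fully connected VAE implementing the steps above.
import tensorflow as tf
from tensorflow.keras import layers

LATENT_DIM = 2

class VAE(tf.keras.Model):
    def __init__(self):
        super().__init__()
        # Encoder: image -> mean and log variance of the latent Gaussian.
        self.enc = tf.keras.Sequential([
            layers.Flatten(),
            layers.Dense(256, activation="relu"),
            layers.Dense(2 * LATENT_DIM),          # [mean, log variance]
        ])
        # Decoder: latent point -> reconstructed image.
        self.dec = tf.keras.Sequential([
            layers.Dense(256, activation="relu"),
            layers.Dense(28 * 28, activation="sigmoid"),
            layers.Reshape((28, 28)),
        ])

    def call(self, x):
        mean, logvar = tf.split(self.enc(x), 2, axis=-1)
        # Reparameterization trick: z = mean + sigma * eps keeps the
        # sampling step differentiable with respect to mean and logvar.
        eps = tf.random.normal(tf.shape(mean))
        z = mean + tf.exp(0.5 * logvar) * eps
        recon = self.dec(z)
        # Loss = reconstruction error + KL divergence to a standard Gaussian.
        rec = tf.reduce_sum(tf.square(x - recon), axis=[1, 2])
        kl = -0.5 * tf.reduce_sum(1 + logvar - mean**2 - tf.exp(logvar), axis=-1)
        self.add_loss(tf.reduce_mean(rec + kl))
        return recon

vae = VAE()
vae.compile(optimizer="adam")   # the loss is attached via add_loss
# vae.fit(x_aug, epochs=50)     # x_aug: the 100 augmented images
new_images = vae.dec(tf.random.normal((100, LATENT_DIM)))  # sample & decode
```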

Latent Space Analysis

This latent space visualization uses t-SNE to reduce the high-dimensional latent vectors to a 2D representation while preserving their structure (a sketch of the projection follows the figure below). Here are some key advantages of this latent space:

  1. Clear Cluster Separation:
  • The visualization shows distinct clusters, indicating that the latent space effectively separates different data points.
  • This suggests that the model has learned meaningful representations of the underlying structure in the data.
  2. Balanced Distribution:
  • The points are not overly concentrated in a single region, preventing issues like overfitting or mode collapse.
  • Well-distributed clusters suggest the latent space captures diverse data patterns effectively.

Latent space analysis
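A minimal scikit-learn sketch of how such a plot can be produced; the placeholder `latents` stands in for the encoder means of the generated images.

```python
# Sketch of the t-SNE projection behind the latent space plot.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Placeholder data; in the real pipeline, use the encoder means
# produced by the VAE for the generated images.
latents = np.random.normal(size=(100, 2))

emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(latents)
plt.scatter(emb[:, 0], emb[:, 1], s=10)
plt.title("t-SNE of the VAE latent space")
plt.show()
```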

Convolutional Neural Network (CNN)

Goal

The CNN is trained to classify images into three categories:

  1. My handwriting (digit "2").
  2. MNIST digit "2".
  3. Other digits.

Steps

  1. Prepare Dataset:
  • Combine my handwritten digit "2" samples, MNIST digit "2" samples, and other MNIST digits.
  • Normalize and reshape the data for CNN input.
  2. Build the CNN:
  • Use convolutional layers to extract features from the images.
  • Use max-pooling layers to reduce spatial dimensions.
  • Use fully connected layers for classification.
  3. Convolutional Layers: Extract spatial features from the images (visualized in the CNN Layer 1, CNN Layer 2, and CNN Layer 3 figures).
  4. Max-Pooling Layers: Reduce the spatial dimensions while retaining important features.
  5. Fully Connected Layers: Combine features for classification.
  6. Softmax Activation: Outputs probabilities for the three classes.
  7. Train the CNN (a code sketch follows the figures below):
  • Use categorical cross-entropy as the loss function and the Adam optimizer.
  • Train for 5 epochs.

CNN Architecture
CNN Prediction
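A minimal sketch of the dataset preparation and training, assuming `model` is the CNN from the architecture sketch earlier; sample counts and label order are illustrative.

```python
# Sketch of dataset preparation and training for the three classes.
import numpy as np
import tensorflow as tf
from tensorflow.keras.utils import to_categorical

(x_mnist, y_mnist), _ = tf.keras.datasets.mnist.load_data()
mnist_twos = x_mnist[y_mnist == 2][:100]   # class 1: MNIST digit "2"
others = x_mnist[y_mnist != 2][:100]       # class 2: other digits

# Placeholder; use the VAE-generated handwriting samples here (class 0).
my_twos = np.zeros((100, 28, 28))

x = np.concatenate([my_twos, mnist_twos, others]).astype("float32") / 255.0
x = x.reshape(-1, 28, 28, 1)               # normalize and reshape for the CNN
y = to_categorical([0] * 100 + [1] * 100 + [2] * 100, num_classes=3)

model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x, y, epochs=5, shuffle=True)    # 5 epochs, as described above
```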
