WordDetection-Data-Generator

This Python script generates n pages of words, each word with its bounding box (bbox) and ground truth label. It supports configurable background colors and fonts, and it can export the dataset as a TFRecord file.
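To make the idea of "words with bbox" concrete, here is a minimal sketch of how words could be laid out on a page while recording one box per word. It is not the code in insert_word.py: the function name `layout_words`, the fixed character width, the line height, the margin and the gap are all assumptions made for illustration. A real generator would measure each word with the chosen font instead.

```python
def layout_words(words, width=600, height=800,
                 char_w=10, line_h=24, margin=20, gap=12):
    """Place words left to right, top to bottom.

    Returns a list of (word, (xmin, ymin, xmax, ymax)) tuples.
    char_w, line_h, margin and gap are hypothetical values; a real
    generator would take the word size from the rendered font.
    """
    placed = []
    x, y = margin, margin
    for word in words:
        w = char_w * len(word)
        # Wrap to the next line when the word would cross the right margin.
        if x + w > width - margin:
            x = margin
            y += line_h
        # Stop when the page is full.
        if y + line_h > height - margin:
            break
        placed.append((word, (x, y, x + w, y + line_h)))
        x += w + gap
    return placed


page = layout_words(["word", "detection", "data"], width=200)
for word, box in page:
    print(word, box)
```

Each tuple pairs the ground truth text with its pixel coordinates, which is the kind of record a ground truth file needs per word.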

Compatibility

The code was developed and tested on Ubuntu 20.04 with Python 3.8, but it should run on most configurations. If you run into problems, please open an issue on this repo. All package dependencies are listed in requirements.txt.

Arguments

For Word Generator
------------------
--output_dir: Directory where the dataset images are stored (default: dataset/)
--input_file: Text file containing the words used to generate the dataset pages
--background: Background color (default: white)
--font_dir: Directory of fonts used to generate the dataset (default: fonts/)
--num_pages: Number of dataset images to generate (default: 10)
--width: Width of the image in pixels (default: 600)
--height: Height of the image in pixels (default: 800)
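The word generator flags above map naturally onto `argparse`. The sketch below shows one way such a parser could be declared. The flag names and defaults come from the list above, while the function name `build_parser`, the help strings, the argument types, and the choice to leave `--input_file` without a default are assumptions, since the actual parser in insert_word.py is not shown here.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented word generator flags."""
    parser = argparse.ArgumentParser(
        description="Generate pages of words with bounding boxes")
    parser.add_argument("--output_dir", default="dataset/",
                        help="directory where the dataset images are stored")
    # No default is documented for --input_file, so none is assumed here.
    parser.add_argument("--input_file", default=None,
                        help="text file containing the words to place on pages")
    parser.add_argument("--background", default="white",
                        help="background color")
    parser.add_argument("--font_dir", default="fonts/",
                        help="directory of fonts used for rendering")
    parser.add_argument("--num_pages", type=int, default=10,
                        help="number of images to generate")
    parser.add_argument("--width", type=int, default=600,
                        help="image width in pixels")
    parser.add_argument("--height", type=int, default=800,
                        help="image height in pixels")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(
    ["--input_file", "words.txt", "--num_pages", "100"])
print(args.num_pages, args.width, args.height, args.background)
```

Flags that are not passed fall back to the documented defaults, so only `--input_file` and any overrides need to be supplied.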

For TFRecord Generator
----------------------
--csv_input: Ground truth labels CSV file (default: ground_truth.csv)
--output_path: Path where the TFRecord file is saved (default: dataset.tfrecord)
--dataset_dir: Directory containing the dataset images (default: dataset/)

Get Started

Install Python 3.8, then install the dependencies listed in requirements.txt (for example, pip install -r requirements.txt)
To run the word detection generator: python insert_word.py --input_file words.txt --num_pages 100
The dataset is stored in the dataset folder, and the coordinates and ground truth values are saved in ground_truth.csv
To export the dataset as a TFRecord file: python generate_tfrecord.py
To check the bboxes drawn on an image, use cv_doc.py
Enjoy the dataset
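Once ground_truth.csv exists, a quick sanity check is to group the boxes by image before drawing or exporting them. The sketch below uses only the standard library. The column names (`filename`, `text`, `xmin`, `ymin`, `xmax`, `ymax`) and the sample rows are assumptions for illustration, so check the header of your own ground_truth.csv and adjust them as needed.

```python
import csv
import io
from collections import defaultdict


def load_boxes(csv_file):
    """Group (text, bbox) pairs by image file name.

    The column names used here are hypothetical; the real
    ground_truth.csv may name or order its columns differently.
    """
    boxes = defaultdict(list)
    for row in csv.DictReader(csv_file):
        box = (int(row["xmin"]), int(row["ymin"]),
               int(row["xmax"]), int(row["ymax"]))
        boxes[row["filename"]].append((row["text"], box))
    return dict(boxes)


# In-memory sample standing in for ground_truth.csv (made-up rows).
sample = io.StringIO(
    "filename,text,xmin,ymin,xmax,ymax\n"
    "page_0.png,hello,10,20,70,40\n"
    "page_0.png,world,80,20,150,40\n"
    "page_1.png,data,10,20,60,40\n"
)
boxes_by_image = load_boxes(sample)
for name, items in boxes_by_image.items():
    print(name, len(items), "boxes")
```

To read the real file, pass `open("ground_truth.csv", newline="")` in place of the in-memory sample.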
