This repository accompanies the paper "A Temporal Convolutional Network-Based Approach and a Benchmark Dataset for Colonoscopy Video Temporal Segmentation" [1]. It provides the implementation of ColonTCN, a Temporal Convolutional Network (TCN)-based approach for segmenting colonoscopy videos into anatomical sections and procedural phases. The project builds on a benchmark dataset derived from the annotated REAL-Colon (RC) dataset [2], which comprises 2.7 million frames across 60 full-procedure videos, and proposes two k-fold validation splits (4-fold and 5-fold) and metrics to evaluate model performance.
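A TCN stacks 1D convolutions along the time axis, typically with dilations that double from layer to layer. The temporal context (receptive field) therefore grows exponentially with depth while the parameter count grows only linearly. This matters here because full-procedure videos are long: 2.7 million frames over 60 videos is about 45,000 frames per video on average. The plain-Python sketch below shows a single dilated convolution over a sequence of scalar frame features and the receptive-field calculation for a stack. The kernel size, the dilation schedule, the non-causal zero padding and the function names are illustrative assumptions, not ColonTCN's actual configuration.

```python
from typing import List, Sequence


def dilated_conv1d(x: Sequence[float], weights: Sequence[float], dilation: int) -> List[float]:
    """Non-causal 1D convolution over time with zero padding and 'same' output length.

    Output frame t mixes input frames t + (j - k // 2) * dilation for each of the
    k kernel taps j, so a larger dilation reaches further into the past and future.
    """
    k = len(weights)
    half = k // 2
    out: List[float] = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(weights):
            idx = t + (j - half) * dilation
            if 0 <= idx < len(x):  # frames outside the video count as zero
                acc += w * x[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Number of input frames that can influence one output frame of the stack."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Hypothetical stack: kernel size 3 and dilations doubling over 10 layers.
dilations = [2 ** i for i in range(10)]  # 1, 2, 4, ..., 512
print(receptive_field(3, dilations))  # 2047
print(dilated_conv1d([0, 0, 1, 0, 0], [1.0, 1.0, 1.0], dilation=2))  # [1.0, 0.0, 1.0, 0.0, 1.0]
```

With kernel size 3, ten layers with dilations 1 to 512 already cover 2,047 consecutive frames. Real TCNs apply this per channel on multi-dimensional frame embeddings and add non-linearities and residual connections.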
Clone the repository and set up a virtual environment
git clone
cd temporal_segmentation
python -m venv venv
source venv/bin/activate # On macOS/Linux
venv\Scripts\activate # On Windows
Install the required dependencies listed in requirements.txt:
pip install -r requirements.txt
The benchmark dataset used in this project is derived from the REAL-Colon (RC) dataset. Instructions for automatically downloading and extracting the data, and for preparing the data splits used to benchmark temporal segmentation models, are provided in the documentation of the `data/` directory.
Models are trained on RC in a 4-fold or 5-fold setting, with one configuration file per fold. For example, to train fold 1 of the 4-fold setting:
CUDA_VISIBLE_DEVICES=0 python src/ -parFile ymls/training/colontcn_4fold/training_colontcn_4fold_fold1.yml
All configuration files for training a ColonTCN model in the 4-fold or 5-fold setting are located in the ymls/ folder.
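In a k-fold setting, the videos are partitioned into k disjoint folds and k models are trained, each tested on a different held-out fold, so every video is used for testing exactly once. The official 4-fold and 5-fold splits are provided in data/dataset/RC_lists/, and the sketch below does not reproduce them. It only illustrates the idea under two assumptions: folds are formed from whole videos, so frames of one procedure never appear in both the training and test sets, and videos are assigned round-robin. The function name kfold_video_splits and the video IDs are hypothetical.

```python
from typing import Dict, List, Sequence


def kfold_video_splits(video_ids: Sequence[str], k: int) -> List[Dict[str, List[str]]]:
    """Split whole videos into k disjoint folds; fold i is the test set of split i."""
    if not 2 <= k <= len(video_ids):
        raise ValueError("k must be between 2 and the number of videos")
    ids = list(video_ids)
    # Round-robin assignment keeps fold sizes within one video of each other.
    folds = [ids[i::k] for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        # Training set: every video that is not in the held-out fold.
        train = [v for j, fold in enumerate(folds) if j != i for v in fold]
        splits.append({"train": train, "test": test})
    return splits


# Hypothetical IDs for 60 videos; the real RC video names are not used here.
video_ids = [f"video_{i:02d}" for i in range(60)]
for k in (4, 5):
    sizes = [len(s["test"]) for s in kfold_video_splits(video_ids, k)]
    print(k, sizes)  # prints "4 [15, 15, 15, 15]" then "5 [12, 12, 12, 12, 12]"
```

With 60 videos, each test fold holds 15 videos in the 4-fold setting and 12 in the 5-fold setting.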
To test models in the 4-fold or 5-fold setting on RC, run the multi-fold testing script in src/ with the configuration file for the corresponding setting:
CUDA_VISIBLE_DEVICES=0 python3 src/ -parFile ymls/inference/inference_testing_4fold_colontcn.yml
CUDA_VISIBLE_DEVICES=0 python3 src/ -parFile ymls/inference/inference_testing_5fold_colontcn.yml
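The evaluation metrics are defined in the paper [1] and implemented in the repository's testing scripts. The sketch below is a minimal illustration of common temporal segmentation quantities, not necessarily the exact metrics used here. It computes frame-wise accuracy, collapses per-frame labels into contiguous segments (the representation that segment-level metrics operate on), and summarizes a per-fold score as mean and standard deviation across folds. It assumes one integer class label per frame, and all function names are hypothetical.

```python
import statistics
from typing import List, Sequence, Tuple


def frame_accuracy(pred: Sequence[int], target: Sequence[int]) -> float:
    """Fraction of frames whose predicted class matches the annotation."""
    if not target or len(pred) != len(target):
        raise ValueError("pred and target must be non-empty and of equal length")
    return sum(p == t for p, t in zip(pred, target)) / len(target)


def to_segments(labels: Sequence[int]) -> List[Tuple[int, int, int]]:
    """Collapse frame labels into (label, start, end) runs, with end exclusive."""
    segments = []
    start = 0
    for t in range(1, len(labels) + 1):
        # Close the current run at the end of the video or when the label changes.
        if t == len(labels) or labels[t] != labels[start]:
            segments.append((labels[start], start, t))
            start = t
    return segments


def aggregate_folds(scores: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of one metric over the test folds."""
    return statistics.mean(scores), statistics.stdev(scores)


# Toy labels for illustration only (0, 1, 2 = three hypothetical sections).
target = [0, 0, 1, 1, 2, 2]
pred = [0, 0, 1, 2, 2, 2]
print(frame_accuracy(pred, target))  # 0.8333333333333334
print(to_segments(pred))  # [(0, 0, 2), (1, 2, 3), (2, 3, 6)]
```

Frame-wise accuracy alone does not penalize fragmented predictions, which is why segment-based views of the output are commonly reported alongside it.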
To profile a model's computational efficiency, such as inference time and memory usage, run:
CUDA_VISIBLE_DEVICES=0 python src/ --config ymls/profiling/colontcn_4fold.yml
CUDA_VISIBLE_DEVICES=0 python src/ --config ymls/profiling/colontcn_5fold.yml
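The profiling configurations above are handled by the repository's own script. Independently of it, the sketch below shows one way to measure inference latency and peak memory for a zero-argument callable using only the standard library. Warm-up runs are discarded, latency comes from time.perf_counter, and peak memory from tracemalloc. It is a CPU stand-in under stated assumptions, and profile_call and fake_inference are hypothetical names. Note that tracemalloc only sees Python heap allocations. For a model running on CUDA, one would instead call torch.cuda.synchronize() before reading the clock and query torch.cuda.max_memory_allocated() for peak GPU memory.

```python
import statistics
import time
import tracemalloc
from typing import Callable, Dict, List


def profile_call(fn: Callable[[], object], warmup: int = 2, repeats: int = 10) -> Dict[str, float]:
    """Wall-clock latency and peak Python heap usage of a zero-argument callable."""
    for _ in range(warmup):  # untimed runs absorb one-off costs (lazy init, caches)
        fn()
    times = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn()
        times.append(time.perf_counter() - t0)
    # Memory is measured in a separate run because tracemalloc slows execution.
    tracemalloc.start()
    try:
        fn()
        _, peak_bytes = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return {
        "median_s": statistics.median(times),
        "min_s": min(times),
        "peak_mib": peak_bytes / 2**20,
    }


def fake_inference() -> List[float]:
    """Hypothetical stand-in for a forward pass over 100k feature values."""
    return [x * 0.5 for x in range(100_000)]


print(profile_call(fake_inference))
```

Reporting the median rather than the mean makes the latency estimate less sensitive to occasional slow runs caused by other processes.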
The following is an overview of the repository structure.
Files and directories marked as "(ignored)" are not included in the repository because they are listed in .gitignore.
├── data/
│ ├── # Script to embed RC videos into video latent representations using a frame encoder
│ ├── dataset/
│ │ ├── RC_annotation/ # RC dataset annotations (CSVs) released with this work (ignored)
│ │ ├── RC_dataset/ # Raw RC dataset downloaded from Figshare (ignored)
│ │ ├── RC_embedded_dataset/ # RC dataset videos embedded with a frame encoder (ignored)
│ │ ├── RC_lists/ # Fold-based data splits (4-fold and 5-fold) for model benchmarking
│ ├── images/ # Images used in the repository (e.g., visualizations, results)
│ ├── ymls/ # YAML config files for dataset processing
│ ├── # Documentation for the `data/` directory
├── experiments/
│ ├── outputs/ # Training folders and inference/testing results (ignored)
│ ├── visualizations/ # Output visualizations (ignored)
├── src/ # Main source code directory
│ ├── data_loader/
│ │ ├── # Data loader for embedding-based datasets
│ ├── feature_extraction/
│ │ ├── # Feature extraction module for processing RC videos
│ │ ├── # Frame-wise classification model
│ │ ├── # Handles video file reading and frame extraction
│ │ └── ymls/ # YAML config files for feature extraction
│ │   ├── feature_extraction_1x_RC.yml
│ │   └── feature_extraction_5x_aug_RC.yml
│ ├── # Script for performing inference on the trained model
│ ├── # Script for testing inference across multiple data folds
│ ├── models/
│ │ ├── # Implementation of the ColonTCN model
│ │ ├── # Model factory for loading different architectures
│ │ ├── # Custom model layers
│ ├── optimizers/
│ │ ├── # Optimizer builder functions
│ │ ├── # Loss functions for training
│ ├── # Profiling script to analyze performance
│ ├── # Unit tests for model evaluation
│ ├── # Main training script
│ └── utils/
│   └── # Utility functions for file I/O operations
├── .gitignore # Specifies ignored files for version control
├── # Main project documentation
├── ymls/ # Folder containing training, testing, and profiling config files
[1] A Temporal Convolutional Network-Based Approach and a Benchmark Dataset for Colonoscopy Video Temporal Segmentation - Paper under review.
[2] Biffi, C., Antonelli, G., Bernhofer, S., Hassan, C., Hirata, D., Iwatate, M., Maieron, A., Salvagnini, P., & Cherubini, A. (2024). REAL-Colon: A dataset for developing real-world AI applications in colonoscopy. Scientific Data, 11(1), 539.
For any inquiries, please open an issue in this repository or write at