
GSoC 2024

Boris Sekachev edited this page Feb 5, 2024 · 25 revisions

CVAT Google Summer of Code 2024

GSoC 2024 Homepage

CVAT accepted projects

Date | Description | Comment
February 6, 2024 | Mentoring organization application deadline | 👍


CVAT project ideas list

Mailing list for discussion: cvat-gsoc-2024 mailing list

Index to Ideas Below

  1. Load and visualize 16-bit medical images
  2. Keyboard shortcuts customization
  3. Quality control: consensus
  4. Internationalization and localization
  5. Enhanced multi-object tracking
  6. Annotate everything automatically

Idea Template

All work is in Python and TypeScript unless otherwise noted.


  1. IDEA: Load and visualize 16-bit medical images

    • Description: All digital projection X-ray images in DICOM use more than 8 bits per pixel and are therefore encoded in two bytes, even if not all 16 bits are used. Currently, CVAT converts 16-bit images to 8-bit. For medical images, this loses important information and makes it impossible to annotate such data efficiently. To annotate such images, a doctor needs to adjust the contrast of some regions manually.
    • Expected Outcomes:
      • Upload digital projection X-ray images in DICOM format and convert them to 16-bit PNG.
      • Visualize 16-bit PNG images in the browser using WebGL.
      • Implement brightness, inversion, contrast, and saturation adjustments using WebGL.
      • Import and export datasets in CVAT format.
      • Add functional tests and documentation.
    • Resources:
    • Skills Required: Python, TypeScript, WebGL
    • Possible Mentors: Boris Sekachev
    • Difficulty: Hard
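Contrast adjustment on 16-bit data is commonly done with a window/level transform: only a chosen intensity range is stretched over the display range, so detail inside that range survives instead of being lost by an up-front 16-to-8-bit conversion. In the browser this per-pixel arithmetic would live in a WebGL fragment shader; the Python sketch below only illustrates the math, and the function name and parameters are hypothetical.

```python
def window_level_to_8bit(pixels, center, width, invert=False):
    """Map 16-bit intensities to 8-bit display values with a linear window.

    Values at or below (center - width / 2) become black, values at or above
    (center + width / 2) become white, and values in between are scaled linearly.
    """
    if width <= 0:
        raise ValueError("window width must be positive")
    low = center - width / 2
    high = center + width / 2
    result = []
    for value in pixels:
        if value <= low:
            t = 0.0
        elif value >= high:
            t = 1.0
        else:
            t = (value - low) / width
        if invert:
            t = 1.0 - t  # inversion swaps dark and bright
        result.append(round(t * 255))
    return result


# A narrow window around the intensities of interest stretches them over the
# full 0..255 display range.
print(window_level_to_8bit([0, 1000, 2000, 3000, 4000], center=2000, width=2000))
# [0, 0, 128, 255, 255]
```

Narrowing the window width increases contrast inside the window; moving the center plays the role of a brightness control.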
  2. IDEA: Keyboard shortcuts customization

    • Description: To achieve high annotation speed, users need to use the mouse, keyboard, and other input devices effectively. One way to do this is to customize keyboard shortcuts for a specific use case. For example, if a task has several labels, it can be important to assign a shortcut to each label so that users can switch between them quickly and annotate faster. Other users want to lock and unlock objects quickly.
    • Expected Outcomes:
      • It should be possible to configure shortcuts in settings and save them per user.
      • Add functional tests and documentation.
    • Resources:
    • Skills Required: TypeScript, React
    • Possible Mentors: Maria Khrustaleva, Kirill Lakhov
    • Difficulty: Medium
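One way to model per-user customization is to store only the user's overrides on top of a default keymap, normalize key combinations so that equivalent spellings compare equal, and reject a combination bound to two actions. The sketch below shows that logic in Python for brevity (the real feature would be TypeScript/React); the action names and key-string format are hypothetical, not CVAT's actual keymap.

```python
# Hypothetical default keymap: action name -> key combination.
DEFAULT_SHORTCUTS = {
    "switch_label_1": "1",
    "switch_label_2": "2",
    "toggle_lock": "l",
    "save": "ctrl+s",
}


def normalize(combo):
    """Lowercase and sort modifiers so 'Shift+Ctrl+S' equals 'ctrl+shift+s'."""
    parts = [part.strip().lower() for part in combo.split("+")]
    modifiers = sorted(parts[:-1])
    return "+".join(modifiers + [parts[-1]])


def resolve_shortcuts(defaults, user_overrides):
    """Merge a user's overrides into the defaults and reject conflicting bindings."""
    unknown = set(user_overrides) - set(defaults)
    if unknown:
        raise KeyError(f"unknown actions: {sorted(unknown)}")
    merged = {
        action: normalize(combo)
        for action, combo in {**defaults, **user_overrides}.items()
    }
    seen = {}
    for action, combo in merged.items():
        if combo in seen:
            raise ValueError(f"{combo!r} is bound to both {seen[combo]!r} and {action!r}")
        seen[combo] = action
    return merged


print(resolve_shortcuts(DEFAULT_SHORTCUTS, {"toggle_lock": "Shift+L"}))
```

Persisting only the overrides keeps a user's settings small and lets later changes to the defaults reach users who never customized a given action.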
  3. IDEA: Quality control: consensus

    • Description: If you use crowdsourcing to annotate images, the easiest way to get high-quality annotations for a task is to annotate the same image multiple times. You can then compare the labels from multiple annotators to produce high-quality results. For example, suppose you need to estimate people's ages. The task is very subjective, and an averaged answer from multiple annotators can give a more precise age estimate for a person.
    • Expected Outcomes:
      • It should be possible to create multiple jobs for the same segment of images.
      • Support several built-in algorithms to merge annotations for a segment: voting, averaging, and raw (keep all annotations as is).
      • Add functional tests and documentation.
    • Resources:
    • Skills Required: Python, Django
    • Possible Mentors: Maxim Zhiltsov
    • Difficulty: Medium
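The three merge strategies listed above can be sketched as small functions, assuming that annotations from the different jobs for the same segment have already been matched to the same object (that matching step, e.g. by shape overlap, is out of scope here). Function names are hypothetical.

```python
from collections import Counter
from statistics import mean


def merge_by_voting(labels):
    """Return the label chosen by a strict majority of annotators, or None."""
    if not labels:
        return None
    label, votes = Counter(labels).most_common(1)[0]
    return label if votes * 2 > len(labels) else None


def merge_by_averaging(values):
    """Average numeric answers, e.g. estimated ages, from several annotators."""
    return mean(values)


def merge_raw(annotations_per_job):
    """Keep every annotation from every job unchanged."""
    return [annotation for job in annotations_per_job for annotation in job]


print(merge_by_voting(["cat", "cat", "dog"]))  # cat
print(merge_by_averaging([30, 34, 35]))        # 33
```

Returning None when there is no majority lets such objects be routed to a reviewer instead of silently picking a label.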
  4. IDEA: Internationalization and localization

    • Description: Typical users of CVAT are data annotators from different countries who do not speak English well. It is very difficult for them to work with a tool that cannot show messages and hints in their native language. The goal of internationalization and localization is to allow a single web application to offer its content in languages and formats tailored to its audience.
    • Expected Outcomes:
      • CVAT supports at least one additional language, and a non-technical person can easily add a new one.
      • It should be possible to choose a language in the UI (e.g., en/fr).
      • Add functional tests and documentation.
    • Resources:
    • Skills Required: Python, TypeScript
    • Possible Mentors: Andrey Zhavoronkov
    • Difficulty: Hard
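A common way to make adding a language easy for non-technical contributors is to keep translations as plain key-to-message catalogs, one per language, and fall back to English when a key is missing. In the TypeScript UI this would typically be handled by an existing library (i18next is one example); the Python sketch below only illustrates the lookup-and-fallback idea, and the catalog keys are hypothetical.

```python
# Hypothetical message catalogs: language code -> {message key: template}.
CATALOGS = {
    "en": {"save": "Save", "undo": "Undo", "objects_count": "{count} objects"},
    "fr": {"save": "Enregistrer", "objects_count": "{count} objets"},
}


def translate(key, lang, **params):
    """Look up a message in the chosen language, then English, then return the key."""
    catalog = CATALOGS.get(lang, {})
    template = catalog.get(key, CATALOGS["en"].get(key, key))
    return template.format(**params)


print(translate("save", "fr"))                    # Enregistrer
print(translate("undo", "fr"))                    # Undo (missing in fr, falls back)
print(translate("objects_count", "fr", count=3))  # 3 objets
```

With this layout, adding a language means copying the English catalog and translating its values, without touching code.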
  5. IDEA: Enhanced multi-object tracking

    • Description: Computer Vision Annotation Tool supports tracks, i.e. objects annotated across a range of frames (e.g., a person walking in a video). It would be useful to develop a feature that tracks segmentation masks automatically using modern deep learning approaches. Currently, the tool supports only single-object trackers, so running a tracker for many objects takes a lot of time. Moreover, these trackers support only bounding boxes and cannot be used for more complex shapes (e.g., polygons or binary masks).
    • Expected Outcomes:
      • A user uploads a video to CVAT and starts the automatic tracking process from the user interface (by drawing a bounding box or polygon around the object, or by pressing a dedicated button). A server-side algorithm tracks the objects across multiple frames and returns the results to the client, significantly accelerating labeling.
    • Resources:

    • Skills Required: Python, Computer Vision, Neural Networks, TypeScript
    • Possible Mentors: Boris Sekachev
    • Difficulty: Medium
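Whatever model is chosen, a multi-object tracker must keep object identities consistent from frame to frame. A common baseline for that association step is matching boxes by intersection over union (IoU); SORT-style trackers, for instance, combine IoU with a Kalman filter and Hungarian assignment. The greedy Python sketch below is a simplified, hypothetical illustration of the association step only, not the deep-learning approach this idea asks for.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def associate(tracks, detections, threshold=0.3):
    """Greedily match each track's last box to the best unused detection.

    tracks: {track_id: box} with integer ids; detections: list of boxes.
    Returns {track_id: detection_index}. Unmatched detections could start new tracks.
    """
    candidates = sorted(
        ((iou(box, det), track_id, index)
         for track_id, box in tracks.items()
         for index, det in enumerate(detections)),
        reverse=True,  # highest overlap first
    )
    matches, used = {}, set()
    for score, track_id, index in candidates:
        if score < threshold:
            break  # remaining pairs overlap too little
        if track_id in matches or index in used:
            continue
        matches[track_id] = index
        used.add(index)
    return matches


tracks = {1: (0, 0, 10, 10), 2: (20, 20, 30, 30)}
detections = [(21, 21, 31, 31), (1, 1, 11, 11), (100, 100, 110, 110)]
print(associate(tracks, detections))
```

Running all objects through one association pass per frame, instead of one tracker per object, is what lets a multi-object approach scale to many objects.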
  6. IDEA: Annotate everything automatically

    • Description: This feature aims to produce instance segmentation for an image automatically across a wide range of classes. This may be achieved with state-of-the-art deep learning approaches (e.g., combining Grounding DINO and Segment Anything). These models could be integrated into CVAT to provide a powerful automatic annotation feature, allowing data researchers to accelerate annotation.
    • Expected Outcomes:
      • A user uploads a set of images to CVAT. For a given image, the user can provide a text prompt to the model or simply click a button in the user interface to get automatic predictions. The deep learning model runs on the server on a GPU.
    • Resources:

    • Skills Required: Python, Computer Vision, Neural Networks, TypeScript
    • Possible Mentors: Boris Sekachev
    • Difficulty: Medium
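The Grounding DINO and Segment Anything combination is usually chained: an open-vocabulary detector turns the text prompt into labeled boxes, and a promptable segmenter turns each box into a mask. The Python sketch below shows only that orchestration; `detect` and `segment` are hypothetical callables standing in for the real models, and the score threshold is an arbitrary assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]
Mask = List[List[int]]


@dataclass
class Detection:
    label: str
    score: float
    box: Box


@dataclass
class Annotation:
    label: str
    box: Box
    mask: Mask


def annotate_image(
    image: object,
    prompt: str,
    detect: Callable[[object, str], Sequence[Detection]],  # hypothetical detector
    segment: Callable[[object, Box], Mask],                 # hypothetical segmenter
    min_score: float = 0.35,                                # assumed threshold
) -> List[Annotation]:
    """Text prompt -> labeled boxes -> one mask per sufficiently confident box."""
    annotations = []
    for det in detect(image, prompt):
        if det.score < min_score:
            continue  # drop low-confidence detections
        annotations.append(Annotation(det.label, det.box, segment(image, det.box)))
    return annotations


# Stub models so the sketch runs without any deep learning dependencies.
def fake_detect(image, prompt):
    return [Detection("cat", 0.9, (0, 0, 5, 5)), Detection("dog", 0.1, (5, 5, 9, 9))]


def fake_segment(image, box):
    return [[1]]


print(annotate_image(None, "cat . dog", fake_detect, fake_segment))
```

Keeping the models behind plain callables means the detector or segmenter could be swapped without changing how predictions are turned into annotations.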

Idea Template

1. #### _IDEA:_ <Descriptive Title>
   * ***Description:*** 3-7 sentences describing the task
   * ***Expected Outcomes:***
      * < Short bullet list describing what is to be accomplished >
      * <i.e. create a new module called "bla bla">
      * < Has method to accomplish X >
      * <...>
   * ***Resources:***
         * [For example a paper citation](
         * [For example an existing feature request](
         * [Possibly an existing related module]( that includes OpenCV JavaScript library.
   * ***Skills Required:*** < for example mastery plus experience coding in Python, college course work in vision that covers AI topics, python. Best if you have also worked with deep neural networks. >
   * ***Possible Mentors:*** < your name goes here >
   * ***Difficulty:*** <Easy, Medium, Hard>

Potential mentors list

Nikita Manovich
Boris Sekachev
Maxim Zhiltsov
Roman Donchenko
Andrey Zhavoronkov
Maria Khrustaleva
Kirill Lakhov

