
OpenArt is a Python-based project designed to make open art data accessible and ready for use. It currently supports data from the National Gallery of Art and the Metropolitan Museum of Art, automating the process of downloading, extracting, and converting open data into a SQLite database.

Wartem/open_art

Dependencies

Python, attrs, beautifulsoup4, chromedrivermanager, numpy, pandas, python-dotenv, pywin32, requests, selenium, webdriver-manager, wget, wikipedia

OpenArt

Making open art data ready to use

Currently Supported:

  • National Gallery of Art:
    Open data ZIP files are automatically downloaded, their CSV files are extracted, and the data is converted into a SQLite database file.

  • Metropolitan Museum of Art:
    This module, currently used as a script, processes open data from the Metropolitan Museum of Art (The Met). It downloads and filters the museum's public domain painting data, then enriches it by scraping image URLs from The Met's website. Key features include:

    • Reads and filters The Met's open access CSV file
    • Uses Selenium WebDriver to fetch high-resolution image URLs for each painting
    • Creates a new CSV file with filtered and enriched painting data
    • Implements incremental updates, avoiding duplicate entries
    • Handles errors and continues processing if issues arise with specific entries

    The resulting dataset includes detailed information about public domain paintings from The Met, complete with direct links to high-quality images, ready for integration into the Open Art Viewer.
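
The filtering and incremental-update steps above could look roughly like the pandas sketch below. It is an illustration, not the module's actual code: the column names (`Object ID`, `Is Public Domain`, `Classification`) are assumed from The Met's open access CSV, and the Selenium scraping is replaced by a hypothetical `fetch_image_url` callable so the logic can run without a browser.

```python
from typing import Callable, Optional

import pandas as pd

# Assumed column names from The Met's open access CSV; verify against the file header.
ID_COL = "Object ID"
PUBLIC_DOMAIN_COL = "Is Public Domain"
CLASSIFICATION_COL = "Classification"


def filter_public_domain_paintings(df: pd.DataFrame) -> pd.DataFrame:
    """Keep rows flagged as public domain whose classification mentions paintings."""
    # The flag may be parsed as a bool or kept as the string "True", so compare as text.
    is_public = df[PUBLIC_DOMAIN_COL].astype(str).str.lower() == "true"
    is_painting = df[CLASSIFICATION_COL].astype(str).str.contains(
        "painting", case=False, regex=False
    )
    return df[is_public & is_painting]


def enrich_incrementally(
    paintings: pd.DataFrame,
    existing: pd.DataFrame,
    fetch_image_url: Callable[[int], Optional[str]],
) -> pd.DataFrame:
    """Add an image URL to paintings not yet in `existing`; skip entries that fail."""
    seen = set(existing[ID_COL]) if not existing.empty else set()
    new_rows = []
    for _, row in paintings.iterrows():
        object_id = row[ID_COL]
        if object_id in seen:
            continue  # already processed in an earlier run, so no duplicate entry
        try:
            image_url = fetch_image_url(object_id)  # hypothetical stand-in for the Selenium lookup
        except Exception as exc:  # report and keep going instead of aborting the run
            print(f"Skipping object {object_id}: {exc}")
            continue
        record = row.to_dict()
        record["image_url"] = image_url
        new_rows.append(record)
        seen.add(object_id)
    new_df = pd.DataFrame(new_rows)
    if new_df.empty:
        return existing  # nothing new to add
    if existing.empty:
        return new_df
    return pd.concat([existing, new_df], ignore_index=True)
```

Passing the lookup in as a function keeps the filtering and de-duplication logic testable on its own; in a real run it would wrap the WebDriver call.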

OpenArt Project: Focus on the National Gallery of Art (NGA)

Overview

OpenArt is a Python-based project designed to download, process, and manage art data from the National Gallery of Art (NGA). As mentioned above, the Metropolitan Museum of Art module currently runs as a standalone script. OpenArt automates the retrieval of open data, processes it, and prepares it for further use or analysis.

How It Works

  1. Data Retrieval

    • The NGA class handles downloading data from the National Gallery of Art's GitHub repository.
    • It checks for updates by comparing local file dates with the latest commit date on GitHub.
    • If an update is needed, it downloads a ZIP file containing the latest data.
  2. Data Extraction and Processing

    • The downloaded ZIP file is extracted to a specified directory.
    • CSV files, particularly 'objects.csv' and 'published_images.csv', are processed.
    • The fix_nga_csv_in_folder method prepares these files for merging.
  3. Data Merging and Cleaning

    • The merge method combines data from 'objects.csv' and 'published_images.csv'.
    • Unwanted columns are removed, and image properties are adjusted.
    • The resulting data is saved back to a CSV file, with redundant files removed.
  4. User Interface

    • A menu-driven interface (console) allows users to:
      • Download new data
      • Extract and process existing data
      • View file information
      • Perform database operations (SQLite)
  5. Constants and Configuration

    • The Constants class centralizes important variables and paths used throughout the project.
  6. File Handling

    • The project uses both os and pathlib for robust file and directory management across different operating systems.
  7. Error Handling and Logging

    • The code includes error handling for download issues, file processing errors, and API requests.
  8. Third-party Libraries

    • Utilizes libraries like requests for API calls, pandas for data manipulation, and BeautifulSoup for web scraping.
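
Steps 1 and 2 (update check and extraction) can be sketched with the standard library alone. This is an illustration rather than the NGA class itself: the repository URL (`NationalGalleryOfArt/opendata`), the use of the local file's modification time as its "date", and the helper names `latest_commit_date`, `needs_update`, and `extract_csvs` are all assumptions.

```python
import json
import urllib.request
import zipfile
from datetime import datetime, timezone
from pathlib import Path

# Assumption: the NGA open data repository; adjust if the project uses another source.
COMMITS_API = "https://api.github.com/repos/NationalGalleryOfArt/opendata/commits?per_page=1"
WANTED_CSVS = ("objects.csv", "published_images.csv")


def latest_commit_date(api_url: str = COMMITS_API) -> datetime:
    """Return the committer date of the newest commit, via the GitHub REST API."""
    with urllib.request.urlopen(api_url, timeout=30) as response:
        commits = json.load(response)
    # GitHub returns UTC timestamps ending in "Z"; make them parseable on older Pythons.
    stamp = commits[0]["commit"]["committer"]["date"].replace("Z", "+00:00")
    return datetime.fromisoformat(stamp)


def needs_update(local_file: Path, remote_date: datetime) -> bool:
    """True if the local file is missing or was modified before the latest remote commit."""
    if not local_file.exists():
        return True
    local_date = datetime.fromtimestamp(local_file.stat().st_mtime, tz=timezone.utc)
    return local_date < remote_date


def extract_csvs(zip_path: Path, target_dir: Path, wanted=WANTED_CSVS) -> list[Path]:
    """Extract only the wanted CSV files, flattening any folder structure in the archive."""
    target_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.namelist():
            name = Path(member).name
            if name in wanted:
                out_path = target_dir / name
                out_path.write_bytes(archive.read(member))
                extracted.append(out_path)
    return extracted
```

Unauthenticated GitHub API requests are rate-limited, so asking for a single commit (`per_page=1`) before deciding whether to download the ZIP keeps the check cheap.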
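
The merge in step 3 might be sketched with `pandas.DataFrame.merge`. The join keys (`objectid` in objects.csv, `depictstmsobjectid` in published_images.csv), the `viewtype == "primary"` filter, the kept columns, and the added `aspect_ratio` column are assumptions standing in for the project's own column cleanup and image-property adjustments.

```python
import pandas as pd

# Assumed NGA open data columns; verify against the actual CSV headers.
OBJECT_COLS = ["objectid", "title", "attribution", "displaydate", "medium"]  # hypothetical selection
IMAGE_COLS = ["iiifurl", "iiifthumburl", "width", "height"]


def merge_objects_with_images(objects: pd.DataFrame, images: pd.DataFrame) -> pd.DataFrame:
    """Join each object to one primary image and keep only the columns needed downstream."""
    # Keep one primary image per object so the join does not duplicate objects.
    primary = images[images["viewtype"] == "primary"].drop_duplicates(subset="depictstmsobjectid")
    merged = objects.merge(primary, how="inner", left_on="objectid", right_on="depictstmsobjectid")
    # Drop unwanted columns; tolerate optional columns that a given export may lack.
    keep = [col for col in OBJECT_COLS + IMAGE_COLS if col in merged.columns]
    merged = merged[keep].copy()
    # Example image-property adjustment (an assumption): store the aspect ratio.
    merged["aspect_ratio"] = (merged["width"] / merged["height"]).round(3)
    return merged
```

Writing the result back with `merged.to_csv(path, index=False)` would correspond to the last bullet of step 3; objects without a primary image are dropped by the inner join.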

Code Authorship Declaration

To the best of my knowledge, all or nearly all code in this project was written by me (Wartem), rather than generated by artificial intelligence or automated code generation tools. I utilized AI assistance to help identify and fix bugs in August 2024.

Moving forward, I plan to refactor this codebase with the aid of AI tools. However, this refactoring process will be conducted in a balanced manner, ensuring that the project's core structure and logic remain primarily my own work. The AI will be used as a tool to suggest improvements, optimize code, and help with best practices, but all final decisions and implementations will be made by me. This approach aims to enhance the project while maintaining my role as the primary author and architect of the codebase. This declaration reflects my commitment to transparency about the use of AI in the development process, while also affirming my central role in the project's creation and evolution.

To-Do

Expand Open Data Collection

  • Objective: Extend the project with open data from additional museums beyond the National Gallery of Art (NGA), including fully integrating the Metropolitan Museum of Art module, which currently runs as a standalone script.

Open Art Viewer - Example Usage: GUI with Unity

The SQLite file created by OpenArt can be used directly with the Open Art Viewer.
You can download it here: Open Art Viewer 1.0 (the SQLite file is already included).
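
Producing a SQLite file from a processed CSV can be as short as the sketch below, which uses `pandas.DataFrame.to_sql` with Python's built-in `sqlite3` module. The table name `artworks`, the `title` column, and the helper names are hypothetical; the schema the Open Art Viewer actually expects is not described here.

```python
import sqlite3
from contextlib import closing

import pandas as pd


def csv_to_sqlite(csv_path: str, db_path: str, table: str = "artworks") -> int:
    """Load a processed CSV into a SQLite table (replacing it) and return the row count."""
    df = pd.read_csv(csv_path)
    # closing() makes sure the connection is closed, not just committed.
    with closing(sqlite3.connect(db_path)) as conn:
        df.to_sql(table, conn, if_exists="replace", index=False)
        conn.commit()
    return len(df)


def fetch_titles(db_path: str, table: str = "artworks", limit: int = 5) -> list[str]:
    """Read back a few titles in insertion order; assumes a "title" column exists."""
    with closing(sqlite3.connect(db_path)) as conn:
        # The table name is interpolated, so only pass trusted, internal names.
        rows = conn.execute(
            f'SELECT title FROM "{table}" ORDER BY rowid LIMIT ?', (limit,)
        ).fetchall()
    return [row[0] for row in rows]
```

Using `if_exists="replace"` means re-running the conversion after a data update rebuilds the table instead of appending duplicate rows.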

Open Art Web Viewer

This art viewer is based on Flask and HTML: Open Art Web Viewer Project

Live demo
