feat: introduce user datasets #72

Open
wants to merge 9 commits into base: main

@CuriousDolphin (Member) commented Feb 6, 2025

Add User Cloud Dataset support

CONTEXT

The PR introduces cloud dataset management capabilities to the Focoos SDK, allowing users to create, upload, download, and manage datasets in the cloud. This enhancement enables better data organization and sharing capabilities for training models.

KEY CHANGES

  • Add new RemoteDataset class for managing cloud datasets
    • get_info()
    • delete()
    • delete_data()
    • upload_data()
    • download_data()
  • Add dataset management methods to Focoos class:
    • list_datasets()
    • add_remote_dataset()
    • get_remote_dataset()
  • Refactor HTTP client into dedicated ApiClient class with improved file upload/download capabilities
  • Add dataset preview and spec models to support dataset metadata
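
The operations listed above suggest a create, upload, inspect, download, and delete lifecycle. Below is a toy sketch of that lifecycle only: it stores files in a local temporary directory and does not call the Focoos SDK or any network service. The class name `ToyDataset`, its constructor arguments, the shape of the `get_info()` result, and every method signature are assumptions made for illustration; only the operation names follow the list in this description.

```python
import shutil
import tempfile
from pathlib import Path


class ToyDataset:
    """Hypothetical local stand-in for the dataset operations named in this PR.

    This is NOT the SDK's RemoteDataset; "the cloud" here is a temp directory.
    """

    def __init__(self, ref, name):
        self.ref = ref
        self.name = name
        self._store = Path(tempfile.mkdtemp(prefix="toy-dataset-"))

    def get_info(self):
        # Report the stored file names; an already-deleted dataset has none.
        files = []
        if self._store.exists():
            files = sorted(p.name for p in self._store.iterdir())
        return {"ref": self.ref, "name": self.name, "files": files}

    def upload_data(self, path):
        # Copy a local file into the dataset's storage.
        src = Path(path)
        if not src.is_file():
            raise FileNotFoundError(f"No such file: {src}")
        shutil.copy2(src, self._store / src.name)

    def download_data(self, target_dir):
        # Copy every stored file into target_dir, creating it if needed.
        target = Path(target_dir)
        target.mkdir(parents=True, exist_ok=True)
        for item in self._store.iterdir():
            shutil.copy2(item, target / item.name)
        return target

    def delete_data(self):
        # Remove the stored files but keep the dataset itself.
        for item in self._store.iterdir():
            item.unlink()

    def delete(self):
        # Remove the dataset and its storage entirely.
        shutil.rmtree(self._store, ignore_errors=True)
```

The split between `delete_data()` and `delete()` in this sketch reflects one plausible reading of the list: the first clears uploaded files while keeping the dataset record, the second removes the dataset altogether.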

IMPACT

  • Affects all SDK components that interact with remote datasets
  • Changes how HTTP requests are handled throughout the codebase
  • Introduces new dataset management workflows for users

- Implement `list_datasets()` method to retrieve dataset previews
- Add `add_remote_dataset()` method to create new remote datasets
- Introduce `get_remote_dataset()` method to fetch remote dataset by reference
- Update `ports.py` to include optional dataset specification
- Create new `remote_dataset.py` with RemoteDataset class for dataset operations

github-actions bot commented Feb 7, 2025

Coverage

Coverage Report

| File | Stmts | Miss | Cover | Missing |
|---|---|---|---|---|
| `__init__.py` | 11 | 0 | 100% | |
| `focoos.py` | 135 | 21 | 84% | 302–303, 345–356, 374, 377–381, 393 |
| `ports.py` | 244 | 0 | 100% | |
| `remote_dataset.py` | 66 | 4 | 93% | 71–72, 102, 116 |
| `remote_model.py` | 139 | 36 | 74% | 230–232, 237, 241–242, 244–245, 249–250, 278–286, 288–291, 295–298, 302–307, 309–310, 367 |
| `utils/api_client.py` | 69 | 29 | 57% | 40, 42, 141–144, 175, 191–203, 205–207, 209, 219–223 |
| `utils/system.py` | 60 | 7 | 88% | 11–12, 106–110 |
| TOTAL | 1152 | 195 | 83% | |

| Tests | Skipped | Failures | Errors | Time |
|---|---|---|---|---|
| 118 | 0 💤 | 0 ❌ | 0 🔥 | 15.271s ⏱️ |


github-actions bot commented Feb 7, 2025

Coverage

Coverage Report

| File | Stmts | Miss | Cover | Missing |
|---|---|---|---|---|
| `__init__.py` | 11 | 0 | 100% | |
| `focoos.py` | 139 | 19 | 86% | 362–373, 376, 379–383, 386 |
| `ports.py` | 253 | 0 | 100% | |
| `remote_dataset.py` | 57 | 45 | 21% | 13–15, 18–19, 22–28, 31–32, 34–39, 42–50, 54–59, 61, 67–69, 71–72, 75–78 |
| `remote_model.py` | 139 | 36 | 74% | 230–232, 237, 241–242, 244–245, 249–250, 278–286, 288–291, 295–298, 302–307, 309–310, 367 |
| `utils/api_client.py` | 43 | 7 | 83% | 37, 39, 138–141, 161 |
| `utils/system.py` | 57 | 5 | 91% | 102–106 |
| TOTAL | 1127 | 209 | 81% | |

| Tests | Skipped | Failures | Errors | Time |
|---|---|---|---|---|
| 109 | 0 💤 | 0 ❌ | 0 🔥 | 3.219s ⏱️ |

@fcdl94 fcdl94 marked this pull request as ready for review February 8, 2025 14:58
@fcdl94 fcdl94 marked this pull request as draft February 8, 2025 14:58
- Extract file download logic to ApiClient's new `download_file()` method
- Reduce code duplication in model download process
- Improve error handling and logging for file downloads
- Ensure consistent directory creation and file download behavior
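
A helper of the kind this commit describes might look roughly like the sketch below. It is not the PR's `ApiClient.download_file()`: the standalone function, its parameters (`url`, `dest_dir`, `file_name`), the `.part` temporary file, and the use of `urllib` from the standard library are all assumptions chosen to keep the example self-contained. It shows the three behaviors the commit message names: directory creation, a single shared download path, and error logging on failure.

```python
import logging
import shutil
import urllib.request
from pathlib import Path

logger = logging.getLogger(__name__)


def download_file(url: str, dest_dir: str, file_name: str) -> Path:
    """Download `url` to `dest_dir/file_name` and return the final path.

    Hypothetical signature; the real method in this PR may differ.
    """
    target_dir = Path(dest_dir)
    # Create the destination directory up front so every caller behaves alike.
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / file_name
    partial = target_dir / (file_name + ".part")
    try:
        with urllib.request.urlopen(url) as response, open(partial, "wb") as out:
            shutil.copyfileobj(response, out)
    except OSError as exc:  # urllib.error.URLError is a subclass of OSError
        logger.error("Download failed for %s: %s", url, exc)
        # Do not leave a half-written file behind.
        partial.unlink(missing_ok=True)
        raise
    # Rename only after the full body has been written.
    partial.replace(target)
    logger.info("Downloaded %s to %s", url, target)
    return target
```

Writing to a temporary name and renaming at the end is a design choice of this sketch, so that a failed download never leaves a file at the final path.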
…hods

- Add detailed docstrings for RemoteDataset class and its methods
- Improve logging with more informative messages
- Implement `download_data()` method to retrieve dataset files
- Refactor existing methods with better error handling and logging
- Add context-specific logging for dataset operations
- Enhance error handling in model download process
- Add more precise error logging for download failures
- Ensure metadata is only written after successful model download
- Update test cases to reflect new error handling behavior
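
The ordering described above, with metadata written only after a successful model download, can be illustrated with a short sketch. The function `fetch_model`, the injected `download` callable, and the file names `model.bin` and `metadata.json` are hypothetical and are not taken from the PR's code.

```python
import json
import logging
from pathlib import Path
from typing import Callable

logger = logging.getLogger(__name__)


def fetch_model(download: Callable[[Path], None], model_dir: str, metadata: dict) -> Path:
    """Download a model file, then write its metadata (hypothetical helper).

    `download` is any callable that writes the model to the given path
    and raises on failure.
    """
    target_dir = Path(model_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    model_path = target_dir / "model.bin"
    try:
        download(model_path)
    except Exception as exc:
        # Log precisely what failed and propagate; no metadata is written.
        logger.error("Model download failed, metadata not written: %s", exc)
        raise
    # Reached only when the download did not raise.
    (target_dir / "metadata.json").write_text(json.dumps(metadata))
    return model_path
```

With this ordering, the presence of the metadata file can serve as a marker that the model file next to it was downloaded completely.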
@CuriousDolphin CuriousDolphin changed the title feat: add user dataset support feat: introduce user datasets Feb 12, 2025
… workflows

This commit updates the dataset management notebook with comprehensive functionality:
- Refactored dataset listing to show detailed dataset information
- Added methods for creating and uploading user datasets
- Implemented dataset download and cleanup functionality
- Updated notebook to use environment variables for API authentication
- Removed redundant datasets notebook and consolidated workflows
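
Reading the API key from the environment, as the notebook update describes, could be done along these lines. The variable name `FOCOOS_API_KEY` and the helper `load_api_key` are assumptions for illustration; the notebook may use a different name or mechanism.

```python
import os


def load_api_key(env_var: str = "FOCOOS_API_KEY") -> str:
    """Return the API key from the environment (variable name is assumed)."""
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        # Fail early with a clear message instead of sending an empty key.
        raise RuntimeError(
            f"Set the {env_var} environment variable before running the notebook."
        )
    return api_key
```

Keeping the key out of the notebook source avoids committing credentials along with the example workflows.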
@CuriousDolphin CuriousDolphin marked this pull request as ready for review February 12, 2025 15:23