
Download Images asynchronously in Image Processing #445

Merged: 4 commits merged into develop on Jan 25, 2025

Conversation

@nrjadkry (Member) commented Jan 23, 2025

Enhance Asynchronous S3 Image Downloading

  • Refactor DroneImageProcessor to support efficient async image downloads
  • Implement batched downloading with asyncio and aiohttp
  • Add concurrent download capabilities with configurable batch size
  • Improve error handling and logging during image retrieval

Changes include:

  • Updated download_images_from_s3 method to support async downloads
  • Implemented batch processing of image downloads
  • Added error handling and logging
  • Optimized pre-signed URL generation and download process
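As a rough illustration of the batched approach described above (a minimal sketch only; the helper names, the local file-writing logic, and the batch_size default of 10 are assumptions, not the PR's actual code):

    import asyncio
    from pathlib import Path

    import aiohttp


    async def download_one(session: aiohttp.ClientSession, url: str, dest_dir: Path) -> Path:
        """Fetch a single image over HTTP and write it into dest_dir."""
        # Strip any query string (e.g. pre-signed params) when deriving the filename
        local_path = dest_dir / url.split("/")[-1].split("?")[0]
        async with session.get(url) as resp:
            resp.raise_for_status()
            local_path.write_bytes(await resp.read())
        return local_path


    async def download_in_batches(urls: list[str], dest_dir: Path, batch_size: int = 10) -> list[Path]:
        """Download URLs concurrently in batches of batch_size, logging failures instead of aborting."""
        downloaded: list[Path] = []
        async with aiohttp.ClientSession() as session:
            for i in range(0, len(urls), batch_size):
                batch = urls[i : i + batch_size]
                results = await asyncio.gather(
                    *(download_one(session, url, dest_dir) for url in batch),
                    return_exceptions=True,
                )
                for url, result in zip(batch, results):
                    if isinstance(result, Exception):
                        print(f"Failed to download {url}: {result}")
                    else:
                        downloaded.append(result)
        return downloaded

Raising batch_size above 10 would increase concurrency, within whatever connection limits S3 and the network impose.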

@spwoodcock
One question I'd like to ask: can we support more than 10 parallel uploads? There is an option for parallel uploads in ODM, which defaults to 10.

[screenshot: ODM parallel upload option]

@nrjadkry requested a review from spwoodcock January 23, 2025 05:15
@github-actions bot added the backend (Related to backend code) label Jan 23, 2025
@nrjadkry self-assigned this Jan 23, 2025
@spwoodcock (Member) left a comment:


Nice work!

See comment.

This isn't a perfect solution due to the difficulty of requiring pre-signed URLs.

Is having a private S3 bucket an integral part of the DroneTM platform? It adds significant complications (we can't have NodeODM simply download the files itself - best case we have to relay via our API for presigned URLs)


    # generate pre-signed URL for each object
    object_urls = [
        get_presigned_url(bucket_name, obj.object_name, 12) for obj in objects
    ]
@spwoodcock (Member):

This all looks great, except this is synchronous and may cause a bottleneck.

How did this perform during testing?

Any speed or memory issues as previously?

@nrjadkry (Member, Author):

Downloading looks good with this implementation.
Also, the images are public in S3, so instead of using pre-signed URLs we can use the download URLs directly.
I will update this.

@spwoodcock (Member):

Nice! That makes things much easier and reduces the number of requests by half 😁

You can probably list all the image file names with one S3 call, then pass through the list of constructed URLs to the async downloader
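Something along these lines, for instance (a sketch assuming the MinIO Python client, with illustrative endpoint, bucket, and prefix values rather than the project's actual configuration):

    from minio import Minio

    # Illustrative client and public download root; real values would come from settings.
    client = Minio("s3.example.org", access_key="...", secret_key="...")
    s3_download_url = "https://s3.example.org/dtm-data"

    # One S3 call lists every image under the task prefix ...
    objects = client.list_objects("dtm-data", prefix="projects/123/images/", recursive=True)

    # ... and the URLs are constructed locally, with no per-object request,
    # ready to hand to the async batch downloader.
    object_urls = [f"{s3_download_url}/{obj.object_name}" for obj in objects]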

@nrjadkry merged commit 85e9901 into develop Jan 25, 2025
2 checks passed
@spwoodcock (Member) commented Jan 25, 2025

Do we need presigned URLs for the downloads @nrjadkry ?

It will significantly slow down the download process.

Can't we just construct the URL ourselves and download directly from the public bucket?

The purpose of pre-signing is:

  • Private buckets
  • Access from a frontend that has no credentials for the private bucket
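For reference, a pre-signed URL helper is usually just a thin wrapper along the lines of the sketch below (assuming the MinIO Python client; the project's actual get_presigned_url may differ):

    from datetime import timedelta

    from minio import Minio

    # Illustrative client; real credentials and endpoint would come from settings.
    client = Minio("s3.example.org", access_key="...", secret_key="...")


    def get_presigned_url(bucket_name: str, object_name: str, expire_hours: int = 12) -> str:
        """Return a time-limited URL so clients without bucket credentials can read the object."""
        return client.presigned_get_object(
            bucket_name, object_name, expires=timedelta(hours=expire_hours)
        )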

@nrjadkry (Member, Author) commented Jan 25, 2025

I have kept the pre-signed URLs in case the bucket is private.
Both cases are handled here, on the assumption that users won't have a download URL in the environment if the bucket is private.
In our case, since the bucket is public, it won't generate pre-signed URLs and will instead construct the URLs directly.

        s3_download_url = settings.S3_DOWNLOAD_ROOT
        if s3_download_url:
            object_urls = [f"{s3_download_url}/{obj.object_name}" for obj in objects]
        else:
            # generate pre-signed URL for each object
            object_urls = [
                get_presigned_url(bucket_name, obj.object_name, 12) for obj in objects
            ]

@spwoodcock (Member):

> I have kept the pre-signed URLs in case the bucket is private. […]

Oooh thanks for that snippet - missed it reviewing on my phone!

Makes sense, thanks 😁

@nrjadkry (Member, Author):

@spwoodcock
This is working nicely. Thank you.
Also, can we do anything about the uploads to ODM? Do you have any ideas?

@spwoodcock (Member):

> Also, can we do anything about the uploads to ODM?

So sorry @nrjadkry, it's going to be a good few days until I can look at this. It needs some dedicated time to work out. I believe I had some ideas / possible solutions in threads here somewhere though.

OAM had a second issue today & I'm so far behind with FMTM 😅

Ping me if I forget please! 🙏
