
[BUG] BackingImage does not download URL correctly in some situation #7914

Closed
votdev opened this issue Feb 12, 2024 · 2 comments
Labels
area/backing-image (Backing image related), backport/1.5.4, backport/1.6.1, kind/bug, require/auto-e2e-test (Require adding/updating auto e2e test cases if they can be automated), require/backport (Require backport. Only used when the specific versions to backport have not been defined.), require/qa-review-coverage (Require QA to review coverage)

votdev commented Feb 12, 2024

Describe the bug

This issue refers to harvester/harvester#5126

When a backing image is created from a URL, e.g. an ISO hosted on SourceForge, the ISO is not downloaded; instead, the HTML code of the download page is downloaded.

This is because the Go HTTP client automatically adds a Referer header, set to the URL of the previous request, to every request it issues while following a redirect. That is correct behaviour for documents whose server-side processing assumes they are always retrieved by an interactive web browser and only come out properly when the Referer is set to one of the pages that point to them. However, as explained below, it does not seem to be the default behaviour of well-known tools.

To Reproduce

Create a backing image with Download from URL using the URL https://sourceforge.net/projects/clonezilla/files/clonezilla_live_alternative/20240116-mantic/clonezilla-live-20240116-mantic-amd64.iso. A file is downloaded, but instead of the roughly 500 MiB ISO it contains the HTML code of the download page, which is about 100 KiB.


Expected behavior

The ISO should be downloaded, not the HTML code of the download page.

The default behaviour of the Go HTTP client should be overridden so that the Referer header is not sent.

The well-known tools wget and curl behave differently from the Go HTTP client: in both, the Referer behaviour has to be enabled explicitly. Out of the box, both download the mentioned ISO without problems.

Support bundle for troubleshooting

n/a

Environment

Vagrant-based Harvester cluster (developer environment) using Longhorn 1.6

Additional context

To understand why curl is able to download the ISO image, here is a condensed output of the download:

> GET /projects/clonezilla/files/clonezilla_live_alternative/20240116-mantic/clonezilla-live-20240116-mantic-amd64.iso HTTP/2
> Host: sourceforge.net
> user-agent: curl/7.81.0
> accept: */*

< HTTP/2 301 
< location: https://sourceforge.net/projects/clonezilla/files/clonezilla_live_alternative/20240116-mantic/clonezilla-live-20240116-mantic-amd64.iso/
< set-cookie: VISITOR=e7cd4d94-a480-4cf6-90c2-4c97923c5b28; Max-Age=315360000; Path=/; expires=Thu, 09-Feb-2034 09:03:47 GMT; secure; HttpOnly

> GET /projects/clonezilla/files/clonezilla_live_alternative/20240116-mantic/clonezilla-live-20240116-mantic-amd64.iso/ HTTP/2
> Host: sourceforge.net
> user-agent: curl/7.81.0
> accept: */*

< HTTP/2 301 
< location: https://sourceforge.net/projects/clonezilla/files/clonezilla_live_alternative/20240116-mantic/clonezilla-live-20240116-mantic-amd64.iso/download
< set-cookie: VISITOR=f5da466d-de96-40fe-8a86-7b4376b54de1; Max-Age=315360000; Path=/; expires=Thu, 09-Feb-2034 09:03:48 GMT; secure; HttpOnly

> GET /projects/clonezilla/files/clonezilla_live_alternative/20240116-mantic/clonezilla-live-20240116-mantic-amd64.iso/download HTTP/2
> Host: sourceforge.net
> user-agent: curl/7.81.0
> accept: */*

< HTTP/2 302 
< location: https://downloads.sourceforge.net/project/clonezilla/clonezilla_live_alternative/20240116-mantic/clonezilla-live-20240116-mantic-amd64.iso?ts=gAAAAABlyd700SFHA3vzPtby73BPQXfgWQotboIYp2t7gctCQl46xTTTWdx0oQ-CzPw9cZb4VWIunUAm5ghF0VvWK-X5mbL_Cw%3D%3D&use_mirror=altushost-swe&r=
< set-cookie: VISITOR=e90e54f1-81d0-45b7-9282-c17406d3bc8c; Max-Age=315360000; Path=/; expires=Thu, 09-Feb-2034 09:03:48 GMT; secure; HttpOnly

> GET /project/clonezilla/clonezilla_live_alternative/20240116-mantic/clonezilla-live-20240116-mantic-amd64.iso?ts=gAAAAABlyd700SFHA3vzPtby73BPQXfgWQotboIYp2t7gctCQl46xTTTWdx0oQ-CzPw9cZb4VWIunUAm5ghF0VvWK-X5mbL_Cw%3D%3D&use_mirror=altushost-swe&r= HTTP/2
> Host: downloads.sourceforge.net
> user-agent: curl/7.81.0
> accept: */*

< HTTP/2 302 
< location: https://altushost-swe.dl.sourceforge.net/project/clonezilla/clonezilla_live_alternative/20240116-mantic/clonezilla-live-20240116-mantic-amd64.iso
< set-cookie: sf_mirror_attempt=clonezilla:altushost-swe:clonezilla_live_alternative/20240116-mantic/clonezilla-live-20240116-mantic-amd64.iso; Max-Age=120; Path=/; expires=Mon, 12-Feb-2024 09:05:49 GMT

> GET /project/clonezilla/clonezilla_live_alternative/20240116-mantic/clonezilla-live-20240116-mantic-amd64.iso HTTP/1.1
> Host: altushost-swe.dl.sourceforge.net
> User-Agent: curl/7.81.0
> Accept: */*

As shown, curl does not send a Referer header, even when following the Location header on redirects.
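To get a per-hop view similar to the condensed curl output above from the Go side, one option (an assumption, not something used in this issue) is to wrap the transport in a `RoundTripper` that records each outgoing request's URL and Referer. The types `hop` and `recordingTransport` and the helpers `traceRedirects` and `newChainServer` are hypothetical; the local chain /a to /b to /c stands in for the SourceForge redirects.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// hop records one request the client sent while following redirects.
type hop struct {
	URL     string
	Referer string
}

// recordingTransport wraps another RoundTripper and records the URL and
// Referer header of every outgoing request. It is not safe for concurrent use.
type recordingTransport struct {
	base http.RoundTripper
	hops []hop
}

func (t *recordingTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	t.hops = append(t.hops, hop{URL: req.URL.String(), Referer: req.Header.Get("Referer")})
	return t.base.RoundTrip(req)
}

// traceRedirects GETs url using the default redirect policy and returns every
// hop in order, including the final non-redirect request.
func traceRedirects(url string) ([]hop, error) {
	rt := &recordingTransport{base: http.DefaultTransport}
	client := &http.Client{Transport: rt}
	resp, err := client.Get(url)
	if err != nil {
		return rt.hops, err
	}
	defer resp.Body.Close()
	_, err = io.Copy(io.Discard, resp.Body)
	return rt.hops, err
}

// newChainServer starts a local server with the redirect chain /a -> /b -> /c.
func newChainServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		switch r.URL.Path {
		case "/a":
			http.Redirect(w, r, "/b", http.StatusMovedPermanently)
		case "/b":
			http.Redirect(w, r, "/c", http.StatusFound)
		default:
			fmt.Fprint(w, "payload")
		}
	}))
}

func main() {
	srv := newChainServer()
	defer srv.Close()

	hops, err := traceRedirects(srv.URL + "/a")
	if err != nil {
		panic(err)
	}
	for _, h := range hops {
		fmt.Printf("GET %s referer=%q\n", h.URL, h.Referer)
	}
}
```

Unlike curl, the default Go client reports a Referer on every hop after the first, each one pointing at the previous URL in the chain.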

Cross-check: make curl automatically set the Referer to the previous URL when it follows a Location header.

$ curl --referer ";auto" -L -v -s -o /dev/null -D - https://sourceforge.net/projects/clonezilla/files/clonezilla_live_alternative/20240116-mantic/clonezilla-live-20240116-mantic-amd64.iso

> GET /projects/clonezilla/files/clonezilla_live_alternative/20240116-mantic/clonezilla-live-20240116-mantic-amd64.iso HTTP/2
> Host: sourceforge.net
> user-agent: curl/7.81.0
> accept: */*

< HTTP/2 301 
< location: https://sourceforge.net/projects/clonezilla/files/clonezilla_live_alternative/20240116-mantic/clonezilla-live-20240116-mantic-amd64.iso/

> GET /projects/clonezilla/files/clonezilla_live_alternative/20240116-mantic/clonezilla-live-20240116-mantic-amd64.iso/ HTTP/2
> Host: sourceforge.net
> user-agent: curl/7.81.0
> accept: */*
> referer: https://sourceforge.net/projects/clonezilla/files/clonezilla_live_alternative/20240116-mantic/clonezilla-live-20240116-mantic-amd64.iso

< HTTP/2 301 
< location: https://sourceforge.net/projects/clonezilla/files/clonezilla_live_alternative/20240116-mantic/clonezilla-live-20240116-mantic-amd64.iso/download

> GET /projects/clonezilla/files/clonezilla_live_alternative/20240116-mantic/clonezilla-live-20240116-mantic-amd64.iso/download HTTP/2
> Host: sourceforge.net
> user-agent: curl/7.81.0
> accept: */*
> referer: https://sourceforge.net/projects/clonezilla/files/clonezilla_live_alternative/20240116-mantic/clonezilla-live-20240116-mantic-amd64.iso/

< HTTP/2 200 

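With the Referer set, the request to the /download URL is answered with HTTP 200 (the download page itself) instead of a 302 redirect to a mirror, so only the HTML download page is downloaded, not the ISO file.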

wget also downloads the ISO by default, e.g. with:

$ wget -v -d -O /dev/null https://sourceforge.net/projects/clonezilla/files/clonezilla_live_alternative/20240116-mantic/clonezilla-live-20240116-mantic-amd64.iso

Test case

  1. Go to Setting > Backing Image and press Create Backing Image.
  2. Enter https://sourceforge.net/projects/clonezilla/files/clonezilla_live_alternative/20240116-mantic/clonezilla-live-20240116-mantic-amd64.iso as URL and press OK.
  3. After a while, the ISO image should be downloaded with a size of ~477 MiB.

@longhorn-io-github-bot (Collaborator)

Pre Ready-For-Testing Checklist

  • Where are the reproduce steps/test steps documented?
    The reproduce steps/test steps are at:

  • Is there a workaround for the issue? If so, where is it documented?
    The workaround is at:

  • Does the PR include the explanation for the fix or the feature?

  • Does the PR include deployment change (YAML/Chart)? If so, where are the PRs for both YAML file and Chart?
    The PR for the YAML change is at:
    The PR for the chart change is at:

  • Has the backend code been merged (Manager, Engine, Instance Manager, BackupStore, etc.) (including backport-needed/*)?
    The PR is at

  • Which areas/issues this PR might have potential impacts on?
    Area
    Issues

  • If labeled: require/LEP Has the Longhorn Enhancement Proposal PR submitted?
    The LEP PR is at

  • If labeled: area/ui Has the UI issue filed or ready to be merged (including backport-needed/*)?
    The UI issue/PR is at

  • If labeled: require/doc Has the necessary document PR submitted or merged (including backport-needed/*)?
    The documentation issue/PR is at

  • If labeled: require/automation-e2e Has the end-to-end test plan been merged? Have QAs agreed on the automation test case? If only test case skeleton w/o implementation, have you created an implementation issue (including backport-needed/*)
    The automation skeleton PR is at
    The automation test case PR is at
    The issue of automation test case implementation is at (please create by the template)

  • If labeled: require/automation-engine Has the engine integration test been merged (including backport-needed/*)?
    The engine automation PR is at

  • If labeled: require/manual-test-plan Has the manual test plan been documented?
    The updated manual test plan is at

  • If the fix introduces the code for backward compatibility Has a separate issue been filed with the label release/obsolete-compatibility?
    The compatibility issue is filed at

yangchiu commented Mar 1, 2024

Verified passed on master-head (backing-image-manager 5c9b6e5) following the test plan.
