[YouTube] Add support for poTokens #11955

Open
Stypox wants to merge 11 commits into base branch dev

Stypox
Member

@Stypox Stypox commented Jan 25, 2025

What is it?

  • Bugfix (user facing)

Description of the changes in your PR

General information about poTokens and about this PR structure:

  • YouTube now requires integrity checks to access their clients. The most "vulnerable" client is the WEB client, since they can't enforce integrity checks on all web browsers, so that's the only client (for now) that we have found a way to obtain an integrity token for.
  • In order to obtain a poToken, we need to run BotGuard, an obfuscated virtual machine implemented in JavaScript that performs the integrity checks and gives us an integrity token. In order to make the integrity checks succeed, we need to run this VM in an environment that resembles a browser as much as possible. The integrity token can be used to generate multiple poTokens. Two network requests are needed: Create to obtain the VM code, GenerateIT to obtain the integrity token after running the VM code. See the README here for the detailed steps.
  • PoTokenGenerator is the base class for all poToken generators. It has a factory method that allows asynchronously obtaining a new instance of a PoTokenGenerator, and then two methods to generate a poToken given a specific identifier, and a method to check if the integrity token has expired.
  • PoTokenWebView is currently the only implementation of PoTokenGenerator, but we might want to add other implementations in the future, e.g. ones that do not rely on WebView.
  • PoTokenProviderImpl implements the extractor interface and is meant to manage possibly multiple PoTokenGenerators (although right now there is only one, based on WebView). It retries in case of problems, creates a new PoTokenGenerator if the current one has expired, and finally returns a PoTokenResult. A PoTokenResult contains two poTokens: one for the specific requested video ID (used to fetch the player), and another that is tied to the visitor data and can be generated just once, as the first thing (used in streaming URLs).
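
The division of responsibilities described in the list above can be sketched roughly as follows. This is a simplified, synchronous illustration and not the code from this PR: the real classes are asynchronous (RxJava) and run BotGuard inside a WebView. `SketchPoTokenProvider`, `FakeGenerator`, the `newGenerator` factory parameter, the single-retry policy and the field names of `PoTokenResult` are all assumptions made for the example.

```kotlin
// Simplified stand-in for the generator abstraction described above.
interface PoTokenGenerator {
    fun generatePoToken(identifier: String): String
    fun isExpired(): Boolean
}

// Field names are hypothetical; the text only says a result carries two poTokens.
data class PoTokenResult(
    val visitorData: String,
    val playerPoToken: String,
    val streamingPoToken: String,
)

class SketchPoTokenProvider(
    private val visitorData: String,
    private val newGenerator: () -> PoTokenGenerator, // hypothetical factory
) {
    private var generator: PoTokenGenerator? = null
    private var streamingPoToken: String? = null

    @Synchronized
    fun getPoToken(videoId: String): PoTokenResult =
        try {
            getPoTokenOnce(videoId, forceRecreate = false)
        } catch (e: RuntimeException) {
            // Something went wrong: retry once with a brand new generator.
            getPoTokenOnce(videoId, forceRecreate = true)
        }

    private fun getPoTokenOnce(videoId: String, forceRecreate: Boolean): PoTokenResult {
        val current = generator
        val usable = if (current == null || forceRecreate || current.isExpired()) {
            val fresh = newGenerator()
            // First thing after creation: the token bound to the visitor data.
            streamingPoToken = fresh.generatePoToken(visitorData)
            generator = fresh
            fresh
        } else {
            current
        }
        // The player token is generated again for every requested video ID.
        val playerPoToken = usable.generatePoToken(videoId)
        return PoTokenResult(visitorData, playerPoToken, checkNotNull(streamingPoToken))
    }
}

// Fake generator so that the sketch runs without a WebView.
class FakeGenerator(private val id: Int) : PoTokenGenerator {
    var expired = false
    override fun generatePoToken(identifier: String) = "pot-$id-$identifier"
    override fun isExpired() = expired
}

fun main() {
    var created = 0
    var last: FakeGenerator? = null
    val provider = SketchPoTokenProvider("visitor123") {
        FakeGenerator(++created).also { last = it }
    }
    println(provider.getPoToken("videoA"))
    last?.expired = true // simulate an expired integrity token
    println(provider.getPoToken("videoB")) // served by a second generator
}
```

In this sketch the streaming token is only regenerated when the generator itself is replaced, while the player token is produced for each video ID.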

TODO:

  • The JavaScript poToken implementation comes from https://github.com/LuanRT/BgUtils
  • Obtaining a poToken via WebView
  • Obtaining a poToken with something like HtmlUnit (not doable, unfortunately)
  • Handling devices that don't have a WebView (needs to be tested)
  • Passing the poToken to the extractor when requested
  • Passing the poToken to player network requests (not sure if needed?)
  • Understand whether we need to change user agent everywhere

You can also test whether the generated poTokens work by using the latest yt-dlp commit from their git repo (older commits won't work!), like this (take PLAYER_POT, STREAMING_POT and VISITOR_DATA from logcat):

yt-dlp "https://www.youtube.com/watch?v=i_SsnRdgitA" --extractor-args 'youtube:player_client=web;player-skip=webpage,configs;po_token=web.player+PLAYER_POT,web.gvs+STREAMING_POT;visitor_data=VISITOR_DATA'

Fixes the following issue(s)

Relies on the following changes

APK testing

The APK can be found by going to the "Checks" tab below the title. On the left pane, click on "CI", scroll down to "artifacts" and click "app" to download the zip file which contains the debug APK of this PR. You can find more info and a video demonstration on this wiki page.

Re-upload of the APK built in this CI run: app.zip

Due diligence

@Stypox
Member Author

Stypox commented Jan 26, 2025

The PR now builds fine on top of TeamNewPipe/NewPipeExtractor#1247, so you can download the APK, which uses poTokens! Let us know if you notice any issues.

private val TAG = PoTokenWebView::class.simpleName
private const val GOOGLE_API_KEY = "AIzaSyDyT5W0Jh49F30Pqqtyfdf7pDLFKLJoAnw"
private const val REQUEST_KEY = "O43z0dpjhgX20SCx4KAo"
private const val USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.3"

Could this be Firefox ESR like in DownloaderImpl?

Member Author

No, for some reason it does not work with the Firefox user agent. It would work with the curl user agent though, I don't know why...

@ale5000-git ale5000-git Jan 28, 2025

Perhaps the checks on Firefox are stricter.
This is just a guess at what the cause may be:
Some web servers also check the case of header names (User-Agent vs user-agent; this applies to other headers too), and even the order in which the headers are sent to the server may matter.
The WebView most likely has an implementation similar to Chrome's, so it is more likely to fail with a Firefox User-Agent, since the implementations differ completely.

@gechoto

gechoto commented Jan 27, 2025

Would it be possible to move the po token implementation to a library?

Currently this is in NewPipe (the app repo) which makes it inaccessible by other apps which also have the need for po tokens.

This will lead to a lot of duplicate code because it needs to be implemented over and over again for each YT client app.

Would be cool if this can be maintained in just one place (and multiple apps could benefit like it is already the case with NewPipeExtractor).

@Figim

Figim commented Jan 27, 2025

Would it be possible to move the po token implementation to a library?

Currently this is in NewPipe (the app repo) which makes it inaccessible by other apps which also have the need for po tokens.

This will lead to a lot of duplicate code because it needs to be implemented over and over again for each YT client app.

Would be cool if this can be maintained in just one place (and multiple apps could benefit like it is already the case with NewPipeExtractor).

You can recreate this PR in your own application.

This PR simply connects the app to the extractor so that it can supply poTokens. You will need to do this separately in your application. That is how it is supposed to be.

@gechoto

gechoto commented Jan 27, 2025

You can recreate this PR in your own application.

My point was that this would be inefficient.

If you want to implement this over and over again for each app - sure, go ahead.

Keep in mind that this will likely not be "done" after the initial implementation.
YT will probably try to break this solution every few months.

You will have to update the implementation in many places again. And again. And again...
What a great way to waste time.

If this was implemented in just one place as a library it would be easier for more developers to share efforts.
To me this sounds like a reasonable thing to discuss - if possible.

Comment on lines +48 to +55
// an asynchronous function runs in the background and it will eventually call
// `vmFunctionsCallback`, however we need to manually tell JavaScript to pass
// control to the things running in the background by interrupting this async
// function in any way, e.g. with a delay of 1ms. The loop is most probably not
// needed but is there just because.
for (let i = 0; i < 10000 && !this.vmFunctions.asyncSnapshotFunction; ++i) {
    await new Promise(f => setTimeout(f, 1))
}
Contributor

I … don’t think this is how async works. The timeout is just gonna be scheduled on a new task, but the code before the loop still runs on a microtask on the previous task.

Member Author

Yes, but this.vm.a seems to start a standalone task in the background or something like that, and we need to explicitly pass control back to the event loop by pausing this async execution, for the background task to finish executing.

Member Author

As far as I know, the loop actually executes only once; I still kept it as a loop because you never know.

@Profpatsch
Contributor

Can there be an architecture overview of this somewhere? From a skim of the code I don’t get any idea of what problem this solves or how the solution is structured.

@Stypox
Member Author

Stypox commented Jan 27, 2025

  • YouTube now requires integrity checks to access their clients. The most "vulnerable" client is the WEB client, since they can't enforce integrity checks on all web browsers, so that's the only client (for now) that we have found a way to obtain an integrity token for.
  • In order to obtain a poToken, we need to run BotGuard, an obfuscated virtual machine implemented in JavaScript that performs the integrity checks and gives us an integrity token. In order to make the integrity checks succeed, we need to run this VM in an environment that resembles a browser as much as possible. The integrity token can be used to generate multiple poTokens. Two network requests are needed: Create to obtain the VM code, GenerateIT to obtain the integrity token after running the VM code. See the README here for the detailed steps.
  • PoTokenGenerator is the base class for all poToken generators. It has a factory method that allows asynchronously obtaining a new instance of a PoTokenGenerator, and then two methods to generate a poToken given a specific identifier, and a method to check if the integrity token has expired.
  • PoTokenWebView is currently the only implementation of PoTokenGenerator, but we might want to add other implementations in the future, e.g. ones that do not rely on WebView.
  • PoTokenProviderImpl implements the extractor interface and is meant to manage possibly multiple PoTokenGenerators (although right now there is only one, based on WebView). It retries in case of problems, creates a new PoTokenGenerator if the current one has expired, and finally returns a PoTokenResult. A PoTokenResult contains two poTokens: one for the specific requested video ID (used to fetch the player), and another that is tied to the visitor data and can be generated just once, as the first thing (used in streaming URLs).

Let me know which places are not documented enough.

@Profpatsch
Contributor

@Stypox I think it would be good to include this documentation in the source code somewhere, maybe in the interface module.

@Profpatsch
Contributor

So that people who want to understand the code later don’t have to find this PR and look through lots of issues first.

@AudricV AudricV added the bug (Issue is related to a bug), ASAP (Issue needs to be fixed as soon as possible) and youtube (Service, https://www.youtube.com/) labels Jan 31, 2025
@AudricV AudricV changed the title from "PoToken implementation to solve 403 errors" to "[YouTube] Add support for PoTokens" Jan 31, 2025
@Profpatsch
Contributor

As a general comment, I don’t think it’s wise to add interfaces before even having need of a second implementation, it just makes the code more indirect and harder to read than necessary.

Contributor

@Profpatsch Profpatsch left a comment

I get a “Couldn get HLS manifest” extraction exception, but the stream extraction seems to work just fine, so I’d say LGTM, and we can create a release.

@AudricV AudricV self-assigned this Jan 31, 2025
app/build.gradle Outdated
- implementation 'com.github.TeamNewPipe:NewPipeExtractor:v0.24.4'
+ implementation 'com.github.FireMasterK:NewPipeExtractor:d2cbd09089e8af933738f98b671ad58236a79d6e'
Contributor

Oh, I missed that the NewPipeExtractor dependency still points to an unofficial repo; that needs to be fixed before merging.

@Stypox
Member Author

Stypox commented Jan 31, 2025

As a general comment, I don’t think it’s wise to add interfaces before even having need of a second implementation, it just makes the code more indirect and harder to read than necessary.

Yes, I totally agree with you, but when I started working on this I hoped I would be able to create two implementations.

@github-actions github-actions bot added the size/giant label (PRs with more than 750 changed lines) and removed the size/large label (PRs with less than 750 changed lines) Feb 1, 2025

) : PoTokenGenerator {
    private val webView = WebView(context)
    private val disposables = CompositeDisposable() // used only during initialization
    private val poTokenEmitters = mutableListOf<Pair<String, SingleEmitter<String>>>()
Member

Suggested change
- private val poTokenEmitters = mutableListOf<Pair<String, SingleEmitter<String>>>()
+ private val poTokenEmitters = ConcurrentHashMap<String, SingleEmitter<String>>()

Member Author

No, a list here is totally fine. Small lists are much faster than hash maps and I don't expect this list to grow bigger than a couple of items.

Comment on lines +177 to +179
synchronized(poTokenEmitters) {
    poTokenEmitters.add(Pair(identifier, emitter))
}
Member

Suggested change
- synchronized(poTokenEmitters) {
-     poTokenEmitters.add(Pair(identifier, emitter))
- }
+ poTokenEmitters[identifier] = emitter

Comment on lines +188 to +192
return synchronized(poTokenEmitters) {
    poTokenEmitters.indexOfFirst { it.first == identifier }.takeIf { it >= 0 }?.let {
        poTokenEmitters.removeAt(it).second
    }
}
Member

Suggested change
- return synchronized(poTokenEmitters) {
-     poTokenEmitters.indexOfFirst { it.first == identifier }.takeIf { it >= 0 }?.let {
-         poTokenEmitters.removeAt(it).second
-     }
- }
+ return poTokenEmitters.remove(identifier)
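
For readers comparing the two approaches in this thread (a synchronized list of pairs versus a ConcurrentHashMap), the following is a minimal standalone sketch of both. It is not code from the PR: `ListEmitters` and `MapEmitters` are hypothetical names, and a generic type `E` stands in for RxJava's `SingleEmitter<String>` so that the example runs without extra libraries. Apart from performance, it shows one behavioural difference: the list can hold several pending emitters for the same identifier, while the map keeps only the latest one.

```kotlin
import java.util.concurrent.ConcurrentHashMap

// Variant used in the PR: a small list of (identifier, emitter) pairs,
// with every access wrapped in `synchronized`.
class ListEmitters<E> {
    private val emitters = mutableListOf<Pair<String, E>>()

    fun add(identifier: String, emitter: E) {
        synchronized(emitters) {
            emitters.add(Pair(identifier, emitter))
        }
    }

    // Removes and returns the oldest emitter registered for `identifier`, or null.
    fun pop(identifier: String): E? {
        return synchronized(emitters) {
            emitters.indexOfFirst { it.first == identifier }.takeIf { it >= 0 }?.let {
                emitters.removeAt(it).second
            }
        }
    }
}

// Variant from the review suggestions: at most one emitter per identifier.
class MapEmitters<E : Any> {
    private val emitters = ConcurrentHashMap<String, E>()

    fun add(identifier: String, emitter: E) {
        emitters[identifier] = emitter // replaces any emitter already stored for this key
    }

    fun pop(identifier: String): E? = emitters.remove(identifier)
}

fun main() {
    val list = ListEmitters<String>()
    list.add("videoId", "first request")
    list.add("videoId", "second request")
    println(list.pop("videoId")) // first request
    println(list.pop("videoId")) // second request

    val map = MapEmitters<String>()
    map.add("videoId", "first request")
    map.add("videoId", "second request")
    println(map.pop("videoId")) // second request
    println(map.pop("videoId")) // null
}
```

Whether two requests for the same identifier can actually be pending at the same time in the PR is not stated in this conversation, so treat the difference as something to check rather than as a confirmed problem.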

Comment on lines +43 to +44
val shouldRecreate = webPoTokenGenerator == null || forceRecreate ||
webPoTokenGenerator!!.isExpired()
Member

Suggested change
- val shouldRecreate = webPoTokenGenerator == null || forceRecreate ||
-     webPoTokenGenerator!!.isExpired()
+ val shouldRecreate = forceRecreate || webPoTokenGenerator?.isExpired() == true

Member Author

I know this syntax is possible, but I preferred to keep the null check explicit for clarity.
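
As a side note on the two forms discussed here, the small sketch below compares how they behave when no generator exists yet. It is an illustration only: `Generator` and the three function names are hypothetical, and function parameters replace the class property, so a smart cast takes the place of `!!`. In Kotlin, `null == true` evaluates to `false`, so the suggested one-liner does not request a recreation when the generator is `null`, while comparing with `!= false` matches the explicit null check.

```kotlin
// Hypothetical stand-in for the generator; only isExpired() matters here.
class Generator(private val expired: Boolean) {
    fun isExpired(): Boolean = expired
}

// Form used in the PR (explicit null check).
fun originalCheck(generator: Generator?, forceRecreate: Boolean): Boolean =
    generator == null || forceRecreate || generator.isExpired()

// Form from the review suggestion: a null generator yields `null == true`, i.e. false.
fun suggestedCheck(generator: Generator?, forceRecreate: Boolean): Boolean =
    forceRecreate || generator?.isExpired() == true

// Compact form that treats a missing generator as "must recreate".
fun compactEquivalentCheck(generator: Generator?, forceRecreate: Boolean): Boolean =
    forceRecreate || generator?.isExpired() != false

fun main() {
    println(originalCheck(null, false))          // true
    println(suggestedCheck(null, false))         // false
    println(compactEquivalentCheck(null, false)) // true
}
```

For non-null generators all three forms agree; they differ only in the null case shown in `main`.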

 */
private fun runOnMainThread(
    emitterIfPostFails: SingleEmitter<out Any>,
    runnable: () -> Unit,
Member

Suggested change
- runnable: () -> Unit,
+ runnable: Runnable,

* `webPoSignalOutput` previously obtained in the initialization of [PoTokenWebView]. Can be
* called multiple times.
*/
fun generatePoToken(identifier: String): Single<String>
Member

Couldn't this be written as a suspend function instead? A runBlocking version could also be added to call this from Java.

Member Author

I guess we can change it to coroutines when we port this to the refactor branch. I don't think we use coroutines anywhere in non-refactor NewPipe, and I didn't want to introduce new libraries and new ways of writing code, so I stuck with RxJava.

@AudricV AudricV removed their assignment Feb 2, 2025
Labels
ASAP (Issue needs to be fixed as soon as possible), bug (Issue is related to a bug), size/giant (PRs with more than 750 changed lines), youtube (Service, https://www.youtube.com/)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[YouTube] HTTP error 403 for playback or download
10 participants