-
Notifications
You must be signed in to change notification settings - Fork 79
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix binder exception when restoring a large number of applications #563
Conversation
Awesome work! This looks a lot better than the previous attempt at fixing this. There's still some ktlint style complaints to fix. I'll also run this through manual tests before merging (as this is a rather critical code path). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe a way to test this could be to install looooots of apps on the test device and then test with and without this PR applied.
} | ||
} | ||
// We need to retrieve the restore sets before starting the restore. | ||
// Otherwise, restorePackages() won't work as they need the restore sets cached internally. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you've edited this, I guess you've tested this and even for restorePackages()
, we need to request the RestoreSet before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't tested without requesting the restore set, but I think it's likely to be the same, I can test without it once and see if the tests break to confirm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Confirmed that restore isn't working without this.
app/src/main/java/com/stevesoltys/seedvault/restore/RestoreViewModel.kt
Outdated
Show resolved
Hide resolved
app/src/main/java/com/stevesoltys/seedvault/restore/RestoreViewModel.kt
Outdated
Show resolved
Hide resolved
@@ -341,6 +372,13 @@ internal class RestoreViewModel( | |||
* as a whole failed. | |||
*/ | |||
override fun restoreFinished(result: Int) { | |||
|
|||
// Restore next chunk if successful and there are more packages to restore. | |||
if (result == 0 && packageIndex < packages.size) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the result isn't 0
, should we still try to restore the other packages at least? Btw. is there a constant we can use in place of 0
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to constant, keeps going onto the next chunk if we fail.
1171356
to
6e14477
Compare
59f165f
to
6d7c5d6
Compare
6e14477
to
d8aba14
Compare
6d7c5d6
to
5311daf
Compare
Restore app data in smaller batches when performing a full restoration from a backup set, to prevent a Binder exception that causes the process to fail entirely. Android may encounter this exception when trying to call the transport.startRestore() method if too many packages are involved; in testing, 300 is an example of too many. Instead of using IRestoreSession.restoreAll(), use restorePackages() and provide the package names in batches of 100. This issue reveals itself when using SeedVault with the D2D patch and with an OS, such as stock Pixel OS, that includes an abundance of packages. (Prior to this patch, the call to restoreAll() meant that the framework would request data restoration for all packages installed, even if they were not in the metadata.) In logs, this issue appears as follows: ``` I BackupManagerService: Full restore; asking about 300 apps W BpBinder: Large outgoing transaction of 528540 bytes, interface descriptor <uncached descriptor>, code 14 E JavaBinder: !!! FAILED BINDER TRANSACTION !!! (parcel size = 528540) E BackupManagerService: Unable to contact transport for restore: data parcel size 528540 bytes ``` Change-Id: Ibb5bb4572d9e873beccd6056da5fe3ae4dce71c2
5311daf
to
6c7afd5
Compare
I could reproduce the issue that this MR is fixing by installing more than 500 apps: 2023-11-14 11:17:54.932 1522-2762 BackupManagerService pid-1522 I Full restore; asking about 515 apps
2023-11-14 11:17:54.942 1522-2762 BackupManagerService pid-1522 D Starting restore.
2023-11-14 11:17:54.943 1522-2762 BackupRestoreController pid-1522 I System restore starting for user: 0
2023-11-14 11:17:54.948 10154-10154 Configurab...ortService com.stevesoltys.seedvault D Service created.
2023-11-14 11:17:54.953 10154-10154 Configurab...ortService com.stevesoltys.seedvault D Transport bound.
2023-11-14 11:17:54.956 1522-1522 TransportConnection pid-1522 I ConfigurableBackupTransportService#1453: Notifying [PerformUnifiedRestoreTask.startRestore()] transport = BackupTransportClient
2023-11-14 11:17:54.969 1522-2762 BpBinder pid-1522 W Large outgoing transaction of 826056 bytes, interface descriptor , code 14
2023-11-14 11:17:54.969 1522-2762 JavaBinder pid-1522 E !!! FAILED BINDER TRANSACTION !!! (parcel size = 826056)
2023-11-14 11:17:54.970 1522-2762 BackupManagerService pid-1522 E Unable to contact transport for restore: data parcel size 826056 bytes
2023-11-14 11:17:15.769 2762-2762 binder pid-2762 I 1522:2762 transaction failed 29201/-28, size 826056-8 line 3360
2023-11-14 11:17:54.971 10154-10211 RestoreCoordinator com.stevesoltys.seedvault D finishRestore
2023-11-14 11:17:54.972 1522-2762 BackupRestoreController pid-1522 I systemRestoreFinished for 0
2023-11-14 11:17:54.972 1522-2762 BackupRestoreController pid-1522 I maybeSendWidgetRestoreBroadcasts for 0
2023-11-14 11:17:54.972 1522-2762 BackupManagerService pid-1522 I Restore complete.
2023-11-14 11:17:54.974 1522-2762 BackupManagerService pid-1522 V [UserID:0] Released wakelock:*backup*-0-2762
2023-11-14 11:17:54.975 10154-10154 Configurab...ortService com.stevesoltys.seedvault D Service destroyed.
2023-11-14 11:17:55.054 1522-18904 RestoreSession pid-1522 D endRestoreSession
2023-11-14 11:17:55.054 1522-2762 BackupManagerService pid-1522 V [UserID:0] Clearing restore session and halting timeout |
Something isn't working right here. I got some error in the beginning which showed an error message in the UI and unlocked the FINISH button, however probably due to the new code, the restore continued and was actually doing stuff. Then this pattern keeps repeating over and over: 2023-11-14 12:37:33.459 14334-14349 RestoreCoordinator com.stevesoltys.seedvault I Start restore with [@pm@, com.android.rkpdapp, com.caydey.ffshare, com.gianlu.aria2android, com.android.traceur, org.woheller69.audio_analyzer_for_android]
...
2023-11-14 12:37:40.245 14334-14349 RestoreCoordinator com.stevesoltys.seedvault I Start restore with [@pm@, org.xcsoar, deltazero.amarok.foss, com.android.settings, com.android.htmlviewer] Somehow for each of those it is getting all restore sets, skipping over the one it can't decrypt. Why is it doing a |
The error I got in the beginning may be a bug unrelated to this PR, us starting the restore session twice: 2023-11-14 12:53:35.697 16771-16812 RestoreViewModel com.stevesoltys.seedvault D Start InstallResult Flow
2023-11-14 12:53:35.713 16771-16812 RestoreViewModel com.stevesoltys.seedvault D Starting new restore session to restore backup 1699970940126
2023-11-14 12:53:35.714 1522-1625 BackupManagerService pid-1522 V [UserID:0] beginRestoreSession: pkg=null transport=com.stevesoltys.seedvault.transport.ConfigurableBackupTransport
2023-11-14 12:53:35.716 16771-16813 RestoreViewModel com.stevesoltys.seedvault D Starting new restore session to restore backup 1699970940126
2023-11-14 12:53:35.717 1522-1560 BackupManagerService pid-1522 V [UserID:0] beginRestoreSession: pkg=null transport=com.stevesoltys.seedvault.transport.ConfigurableBackupTransport
2023-11-14 12:53:35.717 16771-16814 RestoreViewModel com.stevesoltys.seedvault D Completed InstallResult Flow
2023-11-14 12:53:35.751 16771-16771 Configurab...ortService com.stevesoltys.seedvault D Service created.
2023-11-14 12:53:35.753 16771-16771 Configurab...ortService com.stevesoltys.seedvault D Transport bound.
2023-11-14 12:53:35.757 1522-1625 BackupManagerService pid-1522 I [UserID:0] Restore session requested but one already active
2023-11-14 12:53:35.760 1522-1560 BackupManagerService pid-1522 V [UserID:0] Acquired wakelock:*backup*-0-2762
2023-11-14 12:53:35.760 16771-16812 RestoreViewModel com.stevesoltys.seedvault E Error starting new session
android.os.RemoteException: beginRestoreSessionForUser returned null
at com.stevesoltys.seedvault.restore.RestoreViewModel.getOrStartSession(RestoreViewModel.kt:132)
at com.stevesoltys.seedvault.restore.RestoreViewModel.startRestore(RestoreViewModel.kt:199)
at com.stevesoltys.seedvault.restore.RestoreViewModel.access$startRestore(RestoreViewModel.kt:74)
at com.stevesoltys.seedvault.restore.RestoreViewModel$onNextClickedAfterInstallingApps$1.invokeSuspend(RestoreViewModel.kt:183)
at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:106)
at kotlinx.coroutines.internal.LimitedDispatcher.run(LimitedDispatcher.kt:42)
at kotlinx.coroutines.scheduling.TaskImpl.run(Tasks.kt:95)
at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:570)
at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:750)
at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:677)
at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:664) |
For some reason, the backup metadata isn't cached, so in Given this and the fact that each chunks also does a |
The optimization for not having to re-fetch the backup metadata was to call @stevesoltys are you still following? So action items I see:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good now. There's still the issue that one chunk error will enable finish button and show error message before end and that restore session gets started twice, but we can resolve these in other MRs.
Copy of #484 with review comment changes.