Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix binder exception when restoring a large number of applications #563

Merged
merged 4 commits into from
Nov 27, 2023

Conversation

stevesoltys
Copy link
Member

@stevesoltys stevesoltys commented Sep 24, 2023

Copy of #484 with review comment changes.

@stevesoltys stevesoltys requested a review from grote September 24, 2023 23:21
@grote
Copy link
Collaborator

grote commented Sep 25, 2023

Awesome work! This looks a lot better than the previous attempt at fixing this. There's still some ktlint style complaints to fix. I'll also run this through manual tests before merging (as this is a rather critical code path).

@grote grote linked an issue Sep 25, 2023 that may be closed by this pull request
3 tasks
Copy link
Collaborator

@grote grote left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a way to test this could be to install looooots of apps on the test device and then test with and without this PR applied.

}
}
// We need to retrieve the restore sets before starting the restore.
// Otherwise, restorePackages() won't work as they need the restore sets cached internally.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you've edited this, I guess you've tested this and even for restorePackages(), we need to request the RestoreSet before?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't tested without requesting the restore set, but I think it's likely to be the same, I can test without it once and see if the tests break to confirm.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirmed that restore isn't working without this.

@@ -341,6 +372,13 @@ internal class RestoreViewModel(
* as a whole failed.
*/
override fun restoreFinished(result: Int) {

// Restore next chunk if successful and there are more packages to restore.
if (result == 0 && packageIndex < packages.size) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the result isn't 0, should we still try to restore the other packages at least? Btw. is there a constant we can use in place of 0?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to constant, keeps going onto the next chunk if we fail.

@stevesoltys stevesoltys force-pushed the bugfix/binder-exception-too-many-packages branch from 59f165f to 6d7c5d6 Compare November 1, 2023 01:14
@stevesoltys stevesoltys force-pushed the bugfix/binder-exception-too-many-packages branch from 6d7c5d6 to 5311daf Compare November 1, 2023 01:25
t-m-w and others added 2 commits October 31, 2023 21:38
Restore app data in smaller batches when performing a full restoration
from a backup set, to prevent a Binder exception that causes the
process to fail entirely.

Android may encounter this exception when trying to call the
transport.startRestore() method if too many packages are involved; in
testing, 300 is an example of too many. Instead of using
IRestoreSession.restoreAll(), use restorePackages() and provide the
package names in batches of 100.

This issue reveals itself when using SeedVault with the D2D patch and
with an OS, such as stock Pixel OS, that includes an abundance of
packages. (Prior to this patch, the call to restoreAll() meant that the
framework would request data restoration for all packages installed,
even if they were not in the metadata.)

In logs, this issue appears as follows:
```
I BackupManagerService: Full restore; asking about 300 apps
W BpBinder: Large outgoing transaction of 528540 bytes, interface descriptor <uncached descriptor>, code 14
E JavaBinder: !!! FAILED BINDER TRANSACTION !!!  (parcel size = 528540)
E BackupManagerService: Unable to contact transport for restore: data parcel size 528540 bytes
```

Change-Id: Ibb5bb4572d9e873beccd6056da5fe3ae4dce71c2
@stevesoltys stevesoltys force-pushed the bugfix/binder-exception-too-many-packages branch from 5311daf to 6c7afd5 Compare November 1, 2023 01:38
@stevesoltys stevesoltys changed the base branch from feature/d2d-transfer to android14 November 1, 2023 01:38
@grote
Copy link
Collaborator

grote commented Nov 14, 2023

I could reproduce the issue that this MR is fixing by installing more than 500 apps:

2023-11-14 11:17:54.932  1522-2762  BackupManagerService    pid-1522                             I  Full restore; asking about 515 apps
2023-11-14 11:17:54.942  1522-2762  BackupManagerService    pid-1522                             D  Starting restore.
2023-11-14 11:17:54.943  1522-2762  BackupRestoreController pid-1522                             I  System restore starting for user: 0
2023-11-14 11:17:54.948 10154-10154 Configurab...ortService com.stevesoltys.seedvault            D  Service created.
2023-11-14 11:17:54.953 10154-10154 Configurab...ortService com.stevesoltys.seedvault            D  Transport bound.
2023-11-14 11:17:54.956  1522-1522  TransportConnection     pid-1522                             I  ConfigurableBackupTransportService#1453: Notifying [PerformUnifiedRestoreTask.startRestore()] transport = BackupTransportClient
2023-11-14 11:17:54.969  1522-2762  BpBinder                pid-1522                             W  Large outgoing transaction of 826056 bytes, interface descriptor , code 14
2023-11-14 11:17:54.969  1522-2762  JavaBinder              pid-1522                             E  !!! FAILED BINDER TRANSACTION !!!  (parcel size = 826056)
2023-11-14 11:17:54.970  1522-2762  BackupManagerService    pid-1522                             E  Unable to contact transport for restore: data parcel size 826056 bytes
2023-11-14 11:17:15.769  2762-2762  binder                  pid-2762                             I  1522:2762 transaction failed 29201/-28, size 826056-8 line 3360
2023-11-14 11:17:54.971 10154-10211 RestoreCoordinator      com.stevesoltys.seedvault            D  finishRestore
2023-11-14 11:17:54.972  1522-2762  BackupRestoreController pid-1522                             I  systemRestoreFinished for 0
2023-11-14 11:17:54.972  1522-2762  BackupRestoreController pid-1522                             I  maybeSendWidgetRestoreBroadcasts for 0
2023-11-14 11:17:54.972  1522-2762  BackupManagerService    pid-1522                             I  Restore complete.
2023-11-14 11:17:54.974  1522-2762  BackupManagerService    pid-1522                             V  [UserID:0] Released wakelock:*backup*-0-2762
2023-11-14 11:17:54.975 10154-10154 Configurab...ortService com.stevesoltys.seedvault            D  Service destroyed.
2023-11-14 11:17:55.054  1522-18904 RestoreSession          pid-1522                             D  endRestoreSession
2023-11-14 11:17:55.054  1522-2762  BackupManagerService    pid-1522                             V  [UserID:0] Clearing restore session and halting timeout

@grote
Copy link
Collaborator

grote commented Nov 14, 2023

Something isn't working right here. I got some error in the beginning which showed an error message in the UI and unlocked the FINISH button, however probably due to the new code, the restore continued and was actually doing stuff.

Then this pattern keeps repeating over and over:

2023-11-14 12:37:33.459 14334-14349 RestoreCoordinator      com.stevesoltys.seedvault            I  Start restore with [@pm@, com.android.rkpdapp, com.caydey.ffshare, com.gianlu.aria2android, com.android.traceur, org.woheller69.audio_analyzer_for_android]
...
2023-11-14 12:37:40.245 14334-14349 RestoreCoordinator      com.stevesoltys.seedvault            I  Start restore with [@pm@, org.xcsoar, deltazero.amarok.foss, com.android.settings, com.android.htmlviewer]

Somehow for each of those it is getting all restore sets, skipping over the one it can't decrypt. Why is it doing a @pm@ restore for each set? Is this needed? What is causing it to re-load restore sets for each chunk?

@grote
Copy link
Collaborator

grote commented Nov 14, 2023

The error I got in the beginning may be a bug unrelated to this PR, us starting the restore session twice:

2023-11-14 12:53:35.697 16771-16812 RestoreViewModel        com.stevesoltys.seedvault            D  Start InstallResult Flow
2023-11-14 12:53:35.713 16771-16812 RestoreViewModel        com.stevesoltys.seedvault            D  Starting new restore session to restore backup 1699970940126
2023-11-14 12:53:35.714  1522-1625  BackupManagerService    pid-1522                             V  [UserID:0] beginRestoreSession: pkg=null transport=com.stevesoltys.seedvault.transport.ConfigurableBackupTransport
2023-11-14 12:53:35.716 16771-16813 RestoreViewModel        com.stevesoltys.seedvault            D  Starting new restore session to restore backup 1699970940126
2023-11-14 12:53:35.717  1522-1560  BackupManagerService    pid-1522                             V  [UserID:0] beginRestoreSession: pkg=null transport=com.stevesoltys.seedvault.transport.ConfigurableBackupTransport
2023-11-14 12:53:35.717 16771-16814 RestoreViewModel        com.stevesoltys.seedvault            D  Completed InstallResult Flow
2023-11-14 12:53:35.751 16771-16771 Configurab...ortService com.stevesoltys.seedvault            D  Service created.
2023-11-14 12:53:35.753 16771-16771 Configurab...ortService com.stevesoltys.seedvault            D  Transport bound.
2023-11-14 12:53:35.757  1522-1625  BackupManagerService    pid-1522                             I  [UserID:0] Restore session requested but one already active
2023-11-14 12:53:35.760  1522-1560  BackupManagerService    pid-1522                             V  [UserID:0] Acquired wakelock:*backup*-0-2762
2023-11-14 12:53:35.760 16771-16812 RestoreViewModel        com.stevesoltys.seedvault            E  Error starting new session
                                                                                                    android.os.RemoteException: beginRestoreSessionForUser returned null
                                                                                                    	at com.stevesoltys.seedvault.restore.RestoreViewModel.getOrStartSession(RestoreViewModel.kt:132)
                                                                                                    	at com.stevesoltys.seedvault.restore.RestoreViewModel.startRestore(RestoreViewModel.kt:199)
                                                                                                    	at com.stevesoltys.seedvault.restore.RestoreViewModel.access$startRestore(RestoreViewModel.kt:74)
                                                                                                    	at com.stevesoltys.seedvault.restore.RestoreViewModel$onNextClickedAfterInstallingApps$1.invokeSuspend(RestoreViewModel.kt:183)
                                                                                                    	at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
                                                                                                    	at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:106)
                                                                                                    	at kotlinx.coroutines.internal.LimitedDispatcher.run(LimitedDispatcher.kt:42)
                                                                                                    	at kotlinx.coroutines.scheduling.TaskImpl.run(Tasks.kt:95)
                                                                                                    	at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:570)
                                                                                                    	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:750)
                                                                                                    	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:677)
                                                                                                    	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:664)

@grote
Copy link
Collaborator

grote commented Nov 14, 2023

For some reason, the backup metadata isn't cached, so in RestoreCoordinator#startRestore(), we end up calling getAvailableMetadata() for all chunks.

Given this and the fact that each chunks also does a @pm@ restore, we should probably increase the chunk size as much as can be considered safe.

@grote
Copy link
Collaborator

grote commented Nov 14, 2023

The optimization for not having to re-fetch the backup metadata was to call restoreCoordinator.beforeStartRestore(restorableBackup.backupMetadata) which is only done once in this PR, not before each start.

@stevesoltys are you still following? So action items I see:

  • increase chunk size to something large, maybe 100?
  • ensure metadata is cached, probably by called beforeStartRestore() before each start
  • figure out how to improve error reporting and enabling finish button fits into this multi-start approach
  • find out why we (sometimes?) call beginRestoreSession() twice.

Copy link
Collaborator

@grote grote left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good now. There's still the issue that one chunk error will enable finish button and show error message before end and that restore session gets started twice, but we can resolve these in other MRs.

@grote grote merged commit 3928092 into android14 Nov 27, 2023
4 checks passed
@grote grote deleted the bugfix/binder-exception-too-many-packages branch November 27, 2023 11:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Option to back up all apps by pretending to be a device to device transport
3 participants