Fix TorchModuleWrapper serialization issue #20869

Open: wants to merge 2 commits into master

Contributor

@Surya2k1 Surya2k1 commented Feb 6, 2025

Currently, a model containing a TorchModuleWrapper fails to save with the following error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 64: invalid start byte

As an experiment, I tried different encodings and found that 'latin-1' works for saving and reloading, together with some code changes to the TorchModuleWrapper class. At least, this change worked for the minimal code snippet in #20860.

May fix #20860
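
A minimal sketch of the failure mode, for illustration only. The payload below is a made-up stand-in for serialized module bytes, not actual TorchModuleWrapper output. It shows why a byte such as 0x80 cannot be decoded as utf-8, and why latin-1 happens to round-trip.

```python
# Hypothetical binary payload: a few ASCII bytes followed by bytes that are
# not valid utf-8 on their own (0x80 is a continuation byte, 0xFF never occurs).
payload = b"PK\x03\x04" + bytes([0x80, 0xFF, 0x00, 0x41])

try:
    payload.decode("utf-8")
    utf8_ok = True
    utf8_error = ""
except UnicodeDecodeError as exc:
    utf8_ok = False
    utf8_error = str(exc)

# latin-1 maps every byte value 0..255 to the code point with the same number,
# so decoding never raises and re-encoding restores the original bytes.
as_text = payload.decode("latin-1")
roundtrip = as_text.encode("latin-1")

print("utf-8 decodes:", utf8_ok)
print("latin-1 round-trips:", roundtrip == payload)
```

The round-trip works only because latin-1 is a one-to-one byte mapping; the resulting string is not meaningful text.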

@codecov-commenter commented Feb 6, 2025

Codecov Report

Attention: Patch coverage is 27.27273% with 8 lines in your changes missing coverage. Please review.

Project coverage is 76.74%. Comparing base (c04cf9d) to head (ba24025).
Report is 2 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| keras/src/utils/torch_utils.py | 0.00% | 6 Missing ⚠️ |
| keras/src/saving/serialization_lib.py | 60.00% | 2 Missing ⚠️ |

❗ There is a different number of reports uploaded between BASE (c04cf9d) and HEAD (ba24025).

HEAD has 2 uploads less than BASE

| Flag | BASE (c04cf9d) | HEAD (ba24025) |
|---|---|---|
| keras | 5 | 4 |
| keras-torch | 1 | 0 |

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #20869      +/-   ##
==========================================
- Coverage   82.24%   76.74%   -5.50%     
==========================================
  Files         561      561              
  Lines       52633    52640       +7     
  Branches     8137     8139       +2     
==========================================
- Hits        43288    40400    -2888     
- Misses       7340    10282    +2942     
+ Partials     2005     1958      -47     

| Flag | Coverage Δ |
|---|---|
| keras | 76.65% <27.27%> (-5.40%) ⬇️ |
| keras-jax | 64.22% <27.27%> (-0.01%) ⬇️ |
| keras-numpy | 59.03% <27.27%> (-0.01%) ⬇️ |
| keras-openvino | 32.52% <0.00%> (-0.01%) ⬇️ |
| keras-tensorflow | 64.84% <27.27%> (-0.01%) ⬇️ |
| keras-torch | ? |


@MicheleCattaneo

It seems like you are trying to replace utf-8 with latin-1. I think that it would not fix the issue as both encodings are meant for text rather than arbitrary data. I wonder, what could have been the reason for using utf-8 to begin with?

@fchollet
Collaborator

Thanks for the PR. I think this issue calls for a different fix -- what's the best way to serialize bytes (to go in a JSON file) in the general case? Most online solutions seem to recommend utf-8, but clearly this isn't fully general.

@MicheleCattaneo

> Thanks for the PR. I think this issue calls for a different fix -- what's the best way to serialize bytes (to go in a JSON file) in the general case? Most online solutions seem to recommend utf-8, but clearly this isn't fully general.

Is it possible that we need to first apply something like base64 in get_config? The value of buffer.getvalue() is raw bytes that are placed in a dictionary; they are later forced into JSON, which I think should only accept strings (sequences of Unicode characters) rather than byte sequences. Once the value is a valid string, writing the JSON string to a file should hopefully work fine with utf-8.
What do you think?
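
A rough sketch of the base64 idea, using only the standard library. The helper names and the "module" key are hypothetical and chosen for illustration; this is not the existing get_config code.

```python
import base64
import json


def bytes_to_json_safe(raw: bytes) -> str:
    """Encode arbitrary bytes as an ASCII-only base64 string."""
    return base64.b64encode(raw).decode("ascii")


def bytes_from_json_safe(text: str) -> bytes:
    """Invert bytes_to_json_safe."""
    return base64.b64decode(text.encode("ascii"))


# Every possible byte value, including ones that are invalid as utf-8.
raw = bytes(range(256))

# Hypothetical config entry holding the encoded payload.
config = {"module": bytes_to_json_safe(raw)}
json_text = json.dumps(config)

# The JSON text is plain ASCII, so writing it as utf-8 cannot fail.
on_disk = json_text.encode("utf-8")

restored = bytes_from_json_safe(json.loads(on_disk.decode("utf-8"))["module"])
print(restored == raw)
```

The trade-off is size: base64 output is roughly a third larger than the raw bytes.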

@fchollet
Collaborator

That sounds like something we can try. If we do that, we should introduce a new kind of object (not __bytes__) since we need to be backwards compatible with already saved files.
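
One way the backward-compatible variant could look, as a sketch under stated assumptions. The `__bytes_b64__` tag and both function names are hypothetical, and the legacy layout (utf-8 text stored under a `__bytes__` class name) is assumed rather than taken from the Keras source.

```python
import base64

LEGACY_TAG = "__bytes__"      # marker named in the discussion above
BASE64_TAG = "__bytes_b64__"  # hypothetical new marker, not an existing Keras name


def serialize_bytes(raw: bytes) -> dict:
    """Always write the new base64 form, which is safe for any byte values."""
    encoded = base64.b64encode(raw).decode("ascii")
    return {"class_name": BASE64_TAG, "config": {"value": encoded}}


def deserialize_bytes(obj: dict) -> bytes:
    """Read both the new form and the assumed legacy form."""
    value = obj["config"]["value"]
    if obj["class_name"] == BASE64_TAG:
        return base64.b64decode(value)
    if obj["class_name"] == LEGACY_TAG:
        # Assumed legacy layout: bytes were stored as utf-8 decoded text.
        return value.encode("utf-8")
    raise ValueError(f"Unknown bytes marker: {obj['class_name']!r}")


new_style = serialize_bytes(b"\x80\xff binary")
old_style = {"class_name": LEGACY_TAG, "config": {"value": "plain text"}}

print(deserialize_bytes(new_style))
print(deserialize_bytes(old_style))
```

Keeping the old tag readable means files saved before the change still load, while new files never depend on the payload being valid utf-8.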


Successfully merging this pull request may close these issues.

TorchModuleWrapper serialization issue
5 participants