Upgraded strings yield wrong results. #36
Thanks! Any chance you could send a pull request? I'm sure @salortiz would accept it.
As your code shows, with LMDB_File what you put in is what you get out, so if you need any particular encoding, your code should use the standard … Only if your code uses a proper UTF-8 environment (via … And in LMDB_File's UTF8 mode a lexical …
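A minimal sketch of the "encode it yourself" approach, assuming the caller wants UTF-8 bytes as keys. It uses only core `utf8::upgrade` and `Encode::encode`; the LMDB_File `put`/`get` calls are deliberately left out, and the variable names are illustrative. Because `encode('UTF-8', ...)` works on code points, it returns the same byte string whether or not the input happens to be upgraded, so the stored key no longer depends on Perl's internal storage.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Two copies of the same one-character string (U+00E1, "á").
my $native   = "\x{e1}";
my $upgraded = "\x{e1}";
utf8::upgrade($upgraded);    # same code point, different internal storage

# Encode explicitly before handing the string to a byte-oriented store.
my $key_native   = encode('UTF-8', $native);
my $key_upgraded = encode('UTF-8', $upgraded);

printf "%vX\n", $key_native;      # C3.A1
printf "%vX\n", $key_upgraded;    # C3.A1
print "same key: ", ($key_native eq $key_upgraded ? "yes" : "no"), "\n";
```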
My code was admittedly contrived; production code doesn’t normally call … Here are a couple more feasible ways of currently arriving at upgraded strings:
Of course, those won’t necessarily be true for any given Perl version, and that’s the point of the abstraction: things that aren’t the Perl interpreter shouldn’t care how Perl internally stores its strings. If two SVs …
Perl’s string handling isn’t all that complex; it’s just a bit buggy and weirdly documented. The idea is that two strings are equal if they store the same code points, regardless of whether Perl stores them with the same internal encoding. Calling SvPVbyte would thus actually be less clever than calling SvPV, since SvPVbyte always yields a consistent result regardless of Perl’s internals.
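The equality rule can be checked from pure Perl with the core `utf8::` functions. This is only an approximation of the XS side: the "SvPV-like" value is rebuilt with `utf8::encode` because an upgraded string's buffer is the UTF-8 encoding of its code points, and the "SvPVbyte-like" value is a downgraded copy. Neither line calls SvPV or SvPVbyte directly.

```perl
use strict;
use warnings;

my $native   = "\x{e1}";    # U+00E1, stored internally as the single byte E1
my $upgraded = $native;
utf8::upgrade($upgraded);   # same code point, now stored internally as C3 A1

# At the Perl level the two strings are indistinguishable.
print "eq: ", ($native eq $upgraded ? "yes" : "no"), "\n";          # eq: yes
print "lengths: ", length($native), " ", length($upgraded), "\n";   # lengths: 1 1

# Only the internal UTF8 flag (and therefore the buffer) differs.
print "flags: ", (utf8::is_utf8($native) ? 1 : 0), " ",
    (utf8::is_utf8($upgraded) ? 1 : 0), "\n";                       # flags: 0 1

# Approximation of what SvPV hands to C code: the raw buffer.
my $raw_upgraded = $upgraded;
utf8::encode($raw_upgraded);       # UTF-8 bytes of the code points: C3 A1

# Approximation of what SvPVbyte hands to C code: one byte per code point.
my $byte_upgraded = $upgraded;
utf8::downgrade($byte_upgraded);   # back to the single byte E1

printf "SvPV-like:     %vX vs %vX\n", $native, $raw_upgraded;    # E1 vs C3.A1
printf "SvPVbyte-like: %vX vs %vX\n", $native, $byte_upgraded;   # E1 vs E1
```

The SvPV-like pair differs even though `eq` says the strings are equal, which is the mismatch this issue describes; the SvPVbyte-like pair does not.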
@salortiz I guess you’re in the same boat as CDB_File … this problem can’t really be fixed without potentially breaking existing applications. We can work around this problem by …
I think that Perl (and SvPVs in particular) violates your expectations. But that doesn't mean that Perl nor … The module assumes that the user knows what its data contains. Your claim that … is plainly false; the results don't even have the same length:

```perl
use v5.12;     # Needed for 'say'
use Encode;
use bytes ();  # Load bytes.pm without enabling it, so bytes::length is available

my $decoded = decode("latin1", my $l1acute = "\x{e1}");
say bytes::length($l1acute);   # says "1"
say bytes::length($decoded);   # says "2"
```

In your original example I don't know what you expect to be in … (You can fix your expectations by adding a … As both … If you think that …
@salortiz You’re using bytes.pm; that module’s own documentation emphatically says to avoid it except for debugging purposes because it breaks Perl’s string encapsulation.
Thus, in theory, it really shouldn’t matter if a module upgrades or downgrades a passed-in scalar. That said, I did follow your precedent of avoiding changing the passed-in scalar in my PR, for what that’s worth.
… prints:
The UTF8 option toggles between SvPV and SvPVutf8; it should actually toggle between SvPVbyte and SvPVutf8 to avoid a spurious dependency on Perl’s internal string storage logic.
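A Perl-level model of the proposed toggle, assuming a hypothetical helper `key_bytes($string, $utf8_mode)` that stands in for what the XS code would receive from SvPVutf8 (UTF8 mode on) or SvPVbyte (UTF8 mode off). This is an emulation for illustration, not LMDB_File's code. It works on a copy so the caller's scalar is never modified, matching the "don't change the passed-in scalar" precedent discussed in the comments.

```perl
use strict;
use warnings;

# Hypothetical helper: the bytes a store would see for $string.
#   $utf8_mode true  -> like SvPVutf8: UTF-8 encoding of the code points
#   $utf8_mode false -> like SvPVbyte: one byte per code point; dies on wide chars
sub key_bytes {
    my ($string, $utf8_mode) = @_;   # lexical copy; the caller's scalar is untouched
    if ($utf8_mode) {
        utf8::encode($string);
    }
    else {
        # With $fail_ok = 1, downgrade returns false instead of dying.
        utf8::downgrade($string, 1) or die "Wide character in key\n";
    }
    return $string;
}

my $native   = "\x{e1}";
my $upgraded = "\x{e1}";
utf8::upgrade($upgraded);

for my $mode (0, 1) {
    printf "mode %d: %vX vs %vX\n", $mode,
        key_bytes($native, $mode), key_bytes($upgraded, $mode);
}
# mode 0: E1 vs E1
# mode 1: C3.A1 vs C3.A1
```

In both modes the result depends only on the string's code points, not on whether Perl happened to upgrade it.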