Read UTF-16 files through Qt decoding to a temporary file #550

Merged
merged 5 commits into from
Jan 6, 2025


ghutchis
Member

Fixes OpenChemistry/avogadrolibs#1689

Developer Certificate of Origin
Version 1.1

Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
1 Letterman Drive
Suite D4700
San Francisco, CA, 94129

Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.

Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
have the right to submit it under the open source license
indicated in the file; or

(b) The contribution is based upon previous work that, to the best
of my knowledge, is covered under an appropriate open source
license and I have the right under that license to submit that
work with modifications, whether created in whole or in part
by me, under the same open source license (unless I am
permitted to submit under a different license), as indicated
in the file; or

(c) The contribution was provided directly to me by some other
person who certified (a), (b) or (c) and I have not modified
it.

(d) I understand and agree that this project and the contribution
are public and that a record of the contribution (including all
personal information I submit with it, including my sign-off) is
maintained indefinitely and may be redistributed consistent with
this project or the open source license(s) involved.

Fixes OpenChemistry/avogadrolibs#1689

Signed-off-by: Geoff Hutchison <geoff.hutchison@gmail.com>
@ghutchis
Member Author

@matterhorn103 - since we haven't yet fixed the Qt6 builds on avogadroapp, could you check that this builds for you on Linux? Seems okay on Qt5 / Qt6 on Mac, but I want to check before merging.

QTextStream in(&file);
QString text;
bool isUTF16 = false;
if (file.open(QIODevice::ReadOnly)) {

Check notice

Code scanning / Flawfinder (reported by Codacy)

Check when opening files - can an attacker redirect it (via symlinks), force the opening of special file type (e.g., device files), move things around to create a race condition, control its ancestors, or change its contents? (CWE-362).
// UTF-16, read the file and let QString handle decoding
isUTF16 = true;
file.close();
file.open(QIODevice::ReadOnly | QIODevice::Text);

Check notice

Code scanning / Flawfinder (reported by Codacy)

Check when opening files - can an attacker redirect it (via symlinks), force the opening of special file type (e.g., device files), move things around to create a race condition, control its ancestors, or change its contents? (CWE-362).
Signed-off-by: Matthew J. Milner <matterhorn103@proton.me>
@matterhorn103
Contributor

The Flatpak build action is set up for avogadroapp though and that's against Qt6, so the success or failure of that will be a good indication.

I'm surprised that it works for you on your Mac because the Flatpak action failed with:

/run/build/avogadro2/avogadroapp/avogadro/backgroundfileformat.cpp:60:12: error: ‘class QTextStream’ has no member named ‘setCodec’
   60 |         in.setCodec("UTF-16");
      |            ^~~~~~~~
/run/build/avogadro2/avogadroapp/avogadro/backgroundfileformat.cpp:82:13: error: ‘class QTextStream’ has no member named ‘setCodec’
   82 |         out.setCodec("UTF-8");
      |             ^~~~~~~~

It also fails for me locally.

QTextStream::setCodec() was removed for Qt6. You need to use QTextStream::setEncoding() instead with the appropriate enum e.g. QStringConverter::Utf8.

QTextStream should be both locale aware and able to detect UTF-8/16/32 by default though, so it seems unnecessary to do it the way you are (which includes a check for the BOM, which Qt should also do for you)? I guess maybe for Qt5 that's the necessary way of doing things, I don't know. For Qt6 I implemented a simpler approach here.

    #if QT_VERSION >= 0x060000
      QFile file(m_fileName);
      if (file.open(QFile::ReadOnly | QIODevice::Text)) {
        QTextStream in(&file);
        QString text;
        text = in.readAll();
        file.close();
        m_success =
          m_format->readString(text.toLocal8Bit().data(), *m_molecule);
      }

That code compiles fine against Qt6 and works fine for XYZ and MOL formats (I don't have any ORCA files lying around on this machine to test).

@ghutchis
Member Author

ghutchis commented Jan 3, 2025

I'll take a look on Monday - this might work with Qt5 on the example file from the bug report.

@ghutchis
Member Author

ghutchis commented Jan 3, 2025

See OpenChemistry/avogadrolibs#1689 for an example ORCA file.

Adopt simpler approach to file reading on Qt6
@matterhorn103
Contributor

First point to report is that your fix seems to work, I can open the UTF-16 file. :)

For reference, the same file converted to UTF-8 (it's easy to do with KDE's Kate):
acetone_opt_utf8.out.zip
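
Kate is not required for this kind of conversion. As a rough, Qt-free sketch of what turning UTF-16 bytes into UTF-8 involves, assuming little-endian input (the byte order that an FF FE byte order mark indicates). The helper names `appendUtf8` and `utf16leToUtf8` are hypothetical and are not part of Avogadro or Qt:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Append one Unicode code point to `out`, encoded as UTF-8.
static void appendUtf8(std::string& out, std::uint32_t cp)
{
  if (cp < 0x80) {
    out += static_cast<char>(cp);
  } else if (cp < 0x800) {
    out += static_cast<char>(0xC0 | (cp >> 6));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else if (cp < 0x10000) {
    out += static_cast<char>(0xE0 | (cp >> 12));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else {
    out += static_cast<char>(0xF0 | (cp >> 18));
    out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  }
}

// Hypothetical helper: convert raw little-endian UTF-16 bytes to UTF-8.
// A leading FF FE byte order mark is skipped; unpaired surrogates are
// replaced with U+FFFD; a trailing odd byte is ignored.
std::string utf16leToUtf8(const std::string& bytes)
{
  // Read the 16-bit code unit that starts at byte offset `pos`.
  auto unitAt = [&bytes](std::size_t pos) -> std::uint32_t {
    const auto lo = static_cast<unsigned char>(bytes[pos]);
    const auto hi = static_cast<unsigned char>(bytes[pos + 1]);
    return static_cast<std::uint32_t>(lo) |
           (static_cast<std::uint32_t>(hi) << 8);
  };

  std::string out;
  std::size_t i = 0;
  if (bytes.size() >= 2 && unitAt(0) == 0xFEFF)
    i = 2; // skip the byte order mark

  while (i + 1 < bytes.size()) {
    std::uint32_t cp = unitAt(i);
    i += 2;
    // Combine a high surrogate with the low surrogate that follows it.
    if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < bytes.size()) {
      const std::uint32_t low = unitAt(i);
      if (low >= 0xDC00 && low <= 0xDFFF) {
        cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
        i += 2;
      }
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
      cp = 0xFFFD; // unpaired surrogate
    appendUtf8(out, cp);
  }
  return out;
}
```

Reading the file in binary mode into a `std::string`, passing it through a function like this, and writing the result back out would give a UTF-8 copy. Big-endian input and error reporting are left out of the sketch.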

Sadly, my code that you merged is less good and when I build against Qt6 I can't open ORCA files at all – neither the one included in the error report, nor the same one after conversion to UTF-8, nor a fairly normal one that opens fine. Not sure what the issue is, nothing informative gets reported.

@ghutchis
Member Author

ghutchis commented Jan 5, 2025

Yes - I had a moment to look through the code last night. I think the main thing is to revert to my previous version, but have #if for Qt5 vs. Qt6 methods. For some reason, while Qt5.x has compatibility for almost everything, the setCodec vs. setEncoding is different and requires #if QT_VERSION bits.

@ghutchis
Member Author

ghutchis commented Jan 5, 2025

The main issue with your version is that it means every file is read twice. My version (checking for the BOM) just checks those two bytes, then reads normal files once. Only the UTF-16 files (which shouldn't be very common) need to be handled separately.

And for various reasons, some formats need an actual file (e.g., ORCA).
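
To illustrate the two-byte check described above without depending on Qt, here is a minimal sketch. The names `Utf16Bom`, `detectUtf16Bom`, and `detectUtf16BomInFile` are hypothetical and are not taken from avogadroapp; the actual code in this PR uses Qt classes instead:

```cpp
#include <fstream>
#include <string>

// Hypothetical result type for the sketch.
enum class Utf16Bom { None, LittleEndian, BigEndian };

// Classify the first two bytes of a buffer.
// Note: a UTF-32 little-endian BOM (FF FE 00 00) also starts with FF FE,
// so this simple check would report it as UTF-16 little-endian.
Utf16Bom detectUtf16Bom(const std::string& head)
{
  if (head.size() < 2)
    return Utf16Bom::None;
  const auto b0 = static_cast<unsigned char>(head[0]);
  const auto b1 = static_cast<unsigned char>(head[1]);
  if (b0 == 0xFF && b1 == 0xFE)
    return Utf16Bom::LittleEndian;
  if (b0 == 0xFE && b1 == 0xFF)
    return Utf16Bom::BigEndian;
  return Utf16Bom::None;
}

// Read only the first two bytes of a file and classify them.
// A file that cannot be opened or is shorter than two bytes yields None.
Utf16Bom detectUtf16BomInFile(const std::string& path)
{
  std::ifstream file(path, std::ios::binary);
  char head[2] = { 0, 0 };
  if (!file.read(head, 2))
    return Utf16Bom::None;
  return detectUtf16Bom(std::string(head, 2));
}
```

With a check like this, ordinary files cost only a two-byte peek before the normal single read, and only files that start with a UTF-16 BOM take the separate decoding path.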

@matterhorn103
Contributor

Oh is that true? My intention was the exact opposite: to open and read all files only once, by using Qt's encoding-aware functionality to get a string and then always pass that string to Avogadro's parser.

This reverts commit 0df0fe8.

Signed-off-by: Geoff Hutchison <geoff.hutchison@gmail.com>
@ghutchis ghutchis merged commit 1e548d5 into OpenChemistry:master Jan 6, 2025
26 of 31 checks passed
@ghutchis ghutchis deleted the read-utf16 branch January 6, 2025 16:04
Development

Successfully merging this pull request may close these issues.

Avogadro does not open an .out file from ORCA (UTF-16 encoding)
2 participants