
xmlns-prefixed attribute on root element is lost #569

Open
jdm opened this issue Jan 12, 2025 · 9 comments

Comments

jdm (Member) commented Jan 12, 2025

```xml
<!-- before -->
<svg xmlns="http://www.w3.org/2000/svg" xmlns:sodipodi="http://inkscape.sourceforge.net/DTD/sodipodi-0.dtd">
    <sodipodi:namedview>
        ...
    </sodipodi:namedview>

    <path d="..." sodipodi:nodetypes="cccc"/>
</svg>

<!-- after: xmlns:sodipodi moved to child -->
<svg xmlns="http://www.w3.org/2000/svg">
    <sodipodi:namedview xmlns:sodipodi="http://inkscape.sourceforge.net/DTD/sodipodi-0.dtd">
        ...
    </sodipodi:namedview>

    <path d="..." sodipodi:nodetypes="cccc"></path>
</svg>
```

```xml
<!-- before -->
<svg xmlns="http://www.w3.org/2000/svg" xmlns:sodipodi="http://inkscape.sourceforge.net/DTD/sodipodi-0.dtd">
    <path d="..." sodipodi:nodetypes="cccc"/>
</svg>

<!-- after: xmlns:sodipodi removed entirely -->
<svg xmlns="http://www.w3.org/2000/svg">
    <path d="..." sodipodi:nodetypes="cccc"></path>
</svg>
```

Originally posted by @noahbald in #538 (comment)

jdm changed the title from "Root xmlns attribute is lost" to "xmlns-prefixed attribute on root element is lost" on Jan 12, 2025

jdm (Member, Author) commented Jan 12, 2025

It looks like the parsing code stores the namespace data for the node in the tree builder's namespace stack, but the information about which tag declared a namespace is thrown away. Instead, we store the namespace URI on the child tags that use it, so the serializer can only re-create the declaration where the prefix is first needed; this is why the xmlns-prefixed attribute jumps to the first child that uses the namespace.
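To make the parsing side concrete, here is a minimal Rust sketch of a tree-builder namespace stack. It is a simplified, hypothetical model (`NamespaceStack`, `ResolvedName`, `push` and `resolve` are illustrative names, not xml5ever's API): `xmlns`/`xmlns:*` attributes are consumed into a per-element scope and are never handed back with the element's attributes, while names are resolved to a namespace URI. Following the serialized output in the first comment, it assumes the prefix text itself is still kept on each name.

```rust
use std::collections::HashMap;

/// Hypothetical resolved name: the prefix text and the URI it resolved to
/// are kept, but not the xmlns declaration that bound them.
#[derive(Debug, Clone, PartialEq)]
pub struct ResolvedName {
    pub prefix: Option<String>,
    pub ns: String, // "" means no namespace
    pub local: String,
}

/// Hypothetical tree-builder namespace stack: one scope per open element,
/// mapping a prefix ("" = default namespace) to its URI.
#[derive(Default)]
pub struct NamespaceStack {
    scopes: Vec<HashMap<String, String>>,
}

impl NamespaceStack {
    /// Opens an element. xmlns / xmlns:* attributes go into a new scope and
    /// are NOT returned, so nothing records which element declared them.
    pub fn push<'a>(&mut self, attrs: &[(&'a str, &'a str)]) -> Vec<(&'a str, &'a str)> {
        let mut scope = HashMap::new();
        let mut kept = Vec::new();
        for &(name, value) in attrs {
            if name == "xmlns" {
                scope.insert(String::new(), value.to_string());
            } else if let Some(prefix) = name.strip_prefix("xmlns:") {
                scope.insert(prefix.to_string(), value.to_string());
            } else {
                kept.push((name, value));
            }
        }
        self.scopes.push(scope);
        kept
    }

    /// Closes an element, discarding its scope.
    pub fn pop(&mut self) {
        self.scopes.pop();
    }

    /// Resolves "prefix:local" or "local" against the innermost scopes.
    /// Returns None for a prefix with no in-scope declaration.
    pub fn resolve(&self, qname: &str, is_attribute: bool) -> Option<ResolvedName> {
        let (prefix, local) = qname.split_once(':').unwrap_or(("", qname));
        // Unprefixed attributes never pick up the default namespace.
        let uri = if prefix.is_empty() && is_attribute {
            None
        } else {
            self.scopes.iter().rev().find_map(|scope| scope.get(prefix))
        };
        match uri {
            Some(uri) => Some(ResolvedName {
                prefix: (!prefix.is_empty()).then(|| prefix.to_string()),
                ns: uri.clone(),
                local: local.to_string(),
            }),
            // Unprefixed and unbound: the name is in no namespace.
            None if prefix.is_empty() => Some(ResolvedName {
                prefix: None,
                ns: String::new(),
                local: local.to_string(),
            }),
            None => None,
        }
    }
}
```

Pushing the root `<svg>` from the example returns an empty attribute list: both declarations now live only in the stack, and they disappear when the element is popped.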

jdm (Member, Author) commented Jan 12, 2025

  • Parsing code that strips out the xmlns attribute: `self.declare_ns(attr);`
  • Parsing code that stores the namespace URI on child tags: `name.ns = ns_uri;`
  • Serialization code that uses the namespace URI: `self.find_or_insert_ns(name);`
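The serialization side can be modelled the same way. This hypothetical sketch mimics the idea behind `find_or_insert_ns`: because the tree no longer records where a prefix was declared, a declaration is written on the first element whose name needs a prefix that is not yet bound. As an assumption consistent with both "after" outputs in the first comment, only element names are checked, not attribute prefixes; under that assumption the model reproduces both symptoms (the declaration moves to `<sodipodi:namedview>`, or vanishes when only `sodipodi:nodetypes` uses it). `Name`, `Element`, `serialize` and the helper functions are illustrative, not xml5ever's API.

```rust
use std::collections::HashMap;

/// Hypothetical resolved name: prefix text and URI survive parsing, but
/// nothing records which element declared the prefix.
pub struct Name {
    pub prefix: Option<&'static str>,
    pub ns: &'static str, // "" means no namespace
    pub local: &'static str,
}

pub struct Element {
    pub name: Name,
    pub attrs: Vec<(Name, &'static str)>,
    pub children: Vec<Element>,
}

/// One map per open element: prefix ("" = default) -> namespace URI.
type Scopes = Vec<HashMap<&'static str, &'static str>>;

fn qname(n: &Name) -> String {
    match n.prefix {
        Some(p) => format!("{}:{}", p, n.local),
        None => n.local.to_string(),
    }
}

/// Simplified analogue of `find_or_insert_ns` (hypothetical): if the name's
/// prefix is not already bound to its URI by an open element, bind it in the
/// innermost scope and return the declaration to write on this element.
fn find_or_insert_ns(scopes: &mut Scopes, name: &Name) -> Option<String> {
    if name.ns.is_empty() {
        return None;
    }
    let prefix = name.prefix.unwrap_or("");
    let bound = scopes.iter().rev().find_map(|s| s.get(prefix).copied());
    if bound == Some(name.ns) {
        return None;
    }
    scopes.last_mut()?.insert(prefix, name.ns);
    Some(match name.prefix {
        Some(p) => format!(" xmlns:{}=\"{}\"", p, name.ns),
        None => format!(" xmlns=\"{}\"", name.ns),
    })
}

fn write_element(el: &Element, scopes: &mut Scopes, out: &mut String) {
    scopes.push(HashMap::new());
    // Assumption: only the element's own name is checked; attribute prefixes
    // such as sodipodi:nodetypes never trigger a declaration.
    let decl = find_or_insert_ns(scopes, &el.name).unwrap_or_default();
    out.push_str(&format!("<{}{}", qname(&el.name), decl));
    for (name, value) in &el.attrs {
        out.push_str(&format!(" {}=\"{}\"", qname(name), value));
    }
    out.push('>');
    for child in &el.children {
        write_element(child, scopes, out);
    }
    out.push_str(&format!("</{}>", qname(&el.name)));
    scopes.pop(); // declarations made here go out of scope with the element
}

pub fn serialize(root: &Element) -> String {
    let mut out = String::new();
    write_element(root, &mut Vec::new(), &mut out);
    out
}
```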

jdm (Member, Author) commented Jan 12, 2025

Test case in rcdom/tests/xml-driver.rs:

```rust
// Round trip: parsing this document and serializing it again should reproduce
// the input, with xmlns:sodipodi still declared on the root <svg>.
#[test]
fn weird() {
    assert_serialization(
        r##"<svg xmlns="http://www.w3.org/2000/svg" xmlns:sodipodi="http://inkscape.sourceforge.net/DTD/sodipodi-0.dtd">
    <sodipodi:namedview>
        ...
    </sodipodi:namedview>

    <path d="..." sodipodi:nodetypes="cccc"/>
</svg>"##,
        driver::parse_document(RcDom::default(), Default::default())
            .from_utf8()
            .one(r##"<svg xmlns="http://www.w3.org/2000/svg" xmlns:sodipodi="http://inkscape.sourceforge.net/DTD/sodipodi-0.dtd">
    <sodipodi:namedview>
        ...
    </sodipodi:namedview>

    <path d="..." sodipodi:nodetypes="cccc"/>
</svg>"##.as_bytes()),
    );
}
```

jdm (Member, Author) commented Jan 12, 2025

Current status: I have changes that allow these test cases to pass. However, they cause a lot of xml5lib namespace tests to fail, and Ygg01/xml5lib-tests@3fb8d2b suggests that the current behaviour is intentional. I'm hoping to get in touch with @Ygg01 to better understand what the right behaviour is here.
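One possible direction is to keep the declarations on the element that carried them and re-emit them there, falling back to find-or-insert only for prefixes that are still unbound. The following is a hypothetical Rust sketch of that idea (the `declared` field, `Element`, `Name` and the helper functions are invented for illustration); it is not the change jdm describes and not xml5ever's API.

```rust
use std::collections::HashMap;

/// Hypothetical resolved name (prefix text kept, URI resolved).
pub struct Name {
    pub prefix: Option<&'static str>,
    pub ns: &'static str, // "" means no namespace
    pub local: &'static str,
}

/// Hypothetical element that also records the xmlns declarations that
/// appeared on it in the source, as (prefix, URI); None = default namespace.
pub struct Element {
    pub name: Name,
    pub declared: Vec<(Option<&'static str>, &'static str)>,
    pub attrs: Vec<(Name, &'static str)>,
    pub children: Vec<Element>,
}

type Scopes = Vec<HashMap<&'static str, &'static str>>;

fn qname(n: &Name) -> String {
    match n.prefix {
        Some(p) => format!("{}:{}", p, n.local),
        None => n.local.to_string(),
    }
}

fn xmlns_attr(prefix: Option<&str>, uri: &str) -> String {
    match prefix {
        Some(p) => format!(" xmlns:{}=\"{}\"", p, uri),
        None => format!(" xmlns=\"{}\"", uri),
    }
}

fn write_element(el: &Element, scopes: &mut Scopes, out: &mut String) {
    // 1. Re-emit this element's own declarations in source order and open a
    //    scope that already contains them.
    let mut scope = HashMap::new();
    let mut decls = String::new();
    for &(prefix, uri) in &el.declared {
        scope.insert(prefix.unwrap_or(""), uri);
        decls.push_str(&xmlns_attr(prefix, uri));
    }
    scopes.push(scope);
    // 2. Fallback: declare the element's prefix here only if nothing in
    //    scope binds it to the right URI yet.
    let prefix = el.name.prefix.unwrap_or("");
    let bound = scopes.iter().rev().find_map(|s| s.get(prefix).copied());
    if !el.name.ns.is_empty() && bound != Some(el.name.ns) {
        scopes.last_mut().expect("scope was just pushed").insert(prefix, el.name.ns);
        decls.push_str(&xmlns_attr(el.name.prefix, el.name.ns));
    }
    out.push_str(&format!("<{}{}", qname(&el.name), decls));
    for (name, value) in &el.attrs {
        out.push_str(&format!(" {}=\"{}\"", qname(name), value));
    }
    out.push('>');
    for child in &el.children {
        write_element(child, scopes, out);
    }
    out.push_str(&format!("</{}>", qname(&el.name)));
    scopes.pop();
}

pub fn serialize(root: &Element) -> String {
    let mut out = String::new();
    write_element(root, &mut Vec::new(), &mut out);
    out
}
```

Because the root's scope is seeded with `xmlns:sodipodi`, neither `<sodipodi:namedview>` nor `sodipodi:nodetypes` on `<path>` triggers a new declaration, so the root keeps it.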

jdm (Member, Author) commented Jan 12, 2025

Ygg01 (Contributor) commented Jan 12, 2025

> Current status: I have changes that allow these testcases to pass. However, they cause a lot of xml5lib namespace tests to fail, and Ygg01/xml5lib-tests@3fb8d2b suggests that the current behaviour is intentional. I'm hoping to get in touch with @Ygg01 to better understand what the right behaviour here is.

Hi Josh. To be honest, I don't quite remember the motivation. It's been about 8 years.

The important question is: does it go against the XML Namespaces spec? If not, just change the tests. I'm open to PRs on xml5lib-tests.

Ygg01 (Contributor) commented Jan 22, 2025

@jdm what are the next steps to close this?

  • Do you want me to fix the xml5lib-tests?
  • Move xml5lib-tests to the Servo repository, so you can take control over it?
  • Make PRs to Ygg01/xml5lib-tests for me to review?

noahbald commented Feb 15, 2025

Hey, if it helps, I've found a few documents that become invalid or broken after being parsed, serialized, and then viewed in Firefox. Perhaps these can help?

  • Arch Linux Logo
  • Blobs
  • Isometric Madness
  • tldr-pages Banner
  • Wikipedia Logo

Ygg01 (Contributor) commented Feb 16, 2025

@jdm I've updated the xml5lib tests (new commit). It might require a submodule update.

Looking back, I think I'm starting to remember why I did it like that, and why I was wrong.

  1. When converting XML ⇾ tree, the prefixes aren't kept. In fact, they are completely arbitrary. They are like temporary variables that are discarded after parsing, e.g. svg:width = "10" becomes {http://www.w3.org/2000/svg}width = "10" in the tree.
  2. When converting namespaces, simplistic libraries could use the namespace declarations to match on prefix (by scanning for them), which is something that should never be done: you search by namespace, not by prefix. I omitted the declarations to prevent such use cases, trying to be correct by design.
  3. I didn't consider that someone would round-trip a document (parse it, then serialize it again) and expect the same or similar results.
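Points 1 and 2 amount to comparing names in expanded form, `{namespace URI}local name`, and never by prefix. The small Rust sketch below illustrates this under stated assumptions: `ExpandedName` and `expand` are hypothetical names, not part of xml5ever, and bindings are passed in explicitly instead of coming from a parser. Two names written with different prefixes bound to the same URI compare equal, and the display form matches the `{http://www.w3.org/2000/svg}width` notation above.

```rust
use std::fmt;

/// Hypothetical expanded name: namespace URI ("" = no namespace) plus local
/// name. The prefix used in the source is deliberately not stored.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct ExpandedName {
    pub ns: String,
    pub local: String,
}

impl fmt::Display for ExpandedName {
    // Clark-style notation: {uri}local, or just local when there is no namespace.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        if self.ns.is_empty() {
            write!(f, "{}", self.local)
        } else {
            write!(f, "{{{}}}{}", self.ns, self.local)
        }
    }
}

/// Expands an attribute-style qualified name using explicit (prefix, URI)
/// bindings. Unprefixed names get no namespace; an unbound prefix gives None.
pub fn expand(qname: &str, bindings: &[(&str, &str)]) -> Option<ExpandedName> {
    match qname.split_once(':') {
        None => Some(ExpandedName { ns: String::new(), local: qname.to_string() }),
        Some((prefix, local)) => bindings
            .iter()
            .find(|(p, _)| *p == prefix)
            .map(|(_, uri)| ExpandedName { ns: uri.to_string(), local: local.to_string() }),
    }
}
```

Matching on `ExpandedName` rather than on the prefix string is what makes the prefix safe to treat as a throwaway; it does not, by itself, say where a serializer should put the declarations.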
