xml5ever drops xml attributes with matching namespace and local names #538

Closed
noahbald opened this issue Jun 2, 2024 · 9 comments · Fixed by #539

noahbald commented Jun 2, 2024

Hey there, I've recently run into an issue where only the first attribute with a given namespace and local name remains in the document. Consider the following test:

#[test]
fn test_weird() -> anyhow::Result<()> {
    use rcdom::SerializableHandle;
    use xml5ever::{
        driver::{parse_document, XmlParseOpts},
        serialize::{serialize, SerializeOpts},
        tendril::TendrilSink,
    };

    let src = r##"<svg xmlns="http://www.w3.org/2000/svg" xmlns:x="http://www.w3.org/1999/xlink">
    <defs>
        <g id="mid-line"/>
        <g id="line-plus">
            <use href="#mid-line"/>
            <use href="#plus"/>
        </g>
        <g id="plus"/>
        <g id="line-circle">
            <use x:href="#mid-line"/>
        </g>
    </defs>
    <path d="M0 0" id="a"/>
    <use x:href="#a" x="50" y="50"/>
    <use x:href="#line-plus"/>
    <use x:href="#line-circle"/>
</svg>"##;
    println!("src:\n{src}");

    let dom: rcdom::RcDom =
        parse_document(rcdom::RcDom::default(), XmlParseOpts::default()).one(src);
    let mut sink: std::io::BufWriter<_> = std::io::BufWriter::new(Vec::new());
    serialize(
        &mut sink,
        &std::convert::Into::<SerializableHandle>::into(dom.document),
        SerializeOpts::default(),
    )?;

    let sink: Vec<_> = sink.into_inner()?;
    println!("\noutput:\n{}", String::from_utf8_lossy(&sink));
    Ok(())
}

The output for me is as follows:

<svg xmlns="http://www.w3.org/2000/svg" xmlns:x="http://www.w3.org/1999/xlink">
    <defs>
        <g id="mid-line"/>
        <g id="line-plus">
            <use x:href="#mid-line"/>
            <use x:href="#plus"/>
        </g>
        <g id="plus"/>
        <g id="line-circle">
            <use x:href="#mid-line"/>
        </g>
    </defs>
    <path d="M0 0" id="a"/>
    <use x:href="#a" x="50" y="50"/>
    <use x:href="#line-plus"/>
    <use x:href="#line-circle"/>
</svg>

output:
<svg xmlns="http://www.w3.org/2000/svg">
    <defs>
        <g id="mid-line"></g>
        <g id="line-plus">
            <use x:href="#mid-line"></use>
            <use></use>
        </g>
        <g id="plus"></g>
        <g id="line-circle">
            <use></use>
        </g>
    </defs>
    <path d="M0 0" id="a"></path>
    <use x="50" y="50"></use>
    <use></use>
    <use></use>
</svg>

As you can see, all the x:href attributes, except for the first, seem to be dropped completely.
This doesn't seem like expected behaviour unless I'm missing something.

If needed, these are the versions I'm using:

rcdom = { package = "markup5ever_rcdom", version = "0.3" }
xml5ever = "0.18.0"
noahbald added a commit to noahbald/oxvg that referenced this issue Jun 3, 2024
Unfortunately, there seems to be a bug with xml5ever that drops namespaced attributes, I've raised a bug servo/html5ever#538

demurgos commented Jun 4, 2024

I've just hit this exact issue when parsing this PECL API response:

<?xml version="1.0" encoding="UTF-8"?>
<r xmlns="http://pear.php.net/dtd/rest.release" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink" xsi:schemaLocation="http://pear.php.net/dtd/rest.release     http://pear.php.net/dtd/rest.release.xsd">
 <p xlink:href="/rest/p/protobuf">protobuf</p>
 <c>pecl.php.net</c>
 <v>4.27.0</v>
 <st>stable</st>
 <l>BSD-3-Clause</l>
 <m>protobufpackages</m>
 <s>Google's language-neutral, platform-neutral, extensible mechanism for serializing structured data.</s>
 <d>https://developers.google.com/protocol-buffers/</d>
 <da>2024-05-23 14:46:50</da>
 <n>* See github.com/protocolbuffers/protobuf/releases/tag/v27.0 for release notes.</n>
 <f>243961</f>
 <g>https://pecl.php.net/get/protobuf-4.27.0</g>
 <x xlink:href="package.4.27.0.xml"/>
</r>

The xlink:href appears in the tree for the <p> node at the start, but not in the <x> node at the end (empty attribute list).

@jdm

This comment was marked as outdated.

Member

jdm commented Jun 5, 2024

https://github.com/servo/html5ever/blob/main/xml5ever/src/tree_builder/mod.rs#L341-L349 is the check that filters out subsequent attributes. This happens because of this duplicate attribute check: https://github.com/servo/html5ever/blob/main/xml5ever/src/tree_builder/mod.rs#L316
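
To illustrate how a duplicate-attribute check can end up filtering attributes on later elements, here is a minimal sketch. It is not the xml5ever source: `Builder`, `Name`, `element_buggy`, `element_fixed`, and `xlink_href` are hypothetical names, and the idea that the set of seen names outlives a single element is an assumption made for the example, not something confirmed in this thread.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a qualified attribute name: (namespace URI, local name).
type Name = (String, String);

/// Hypothetical builder that owns the set used for duplicate detection.
struct Builder {
    seen: HashSet<Name>,
}

impl Builder {
    fn new() -> Self {
        Builder { seen: HashSet::new() }
    }

    /// Keeps only attributes whose name has not been recorded in `seen` yet.
    fn keep_unseen(&mut self, attrs: Vec<Name>) -> Vec<Name> {
        attrs
            .into_iter()
            .filter(|name| self.seen.insert(name.clone()))
            .collect()
    }

    /// Assumed buggy behaviour: `seen` is never reset, so a name accepted on
    /// one element counts as a duplicate on every later element.
    fn element_buggy(&mut self, attrs: Vec<Name>) -> Vec<Name> {
        self.keep_unseen(attrs)
    }

    /// Per-element behaviour: duplicates are only detected within one element.
    fn element_fixed(&mut self, attrs: Vec<Name>) -> Vec<Name> {
        self.seen.clear();
        self.keep_unseen(attrs)
    }
}

/// Builds the qualified name of an `xlink:href`-style attribute.
fn xlink_href() -> Name {
    ("http://www.w3.org/1999/xlink".to_string(), "href".to_string())
}
```

Under these assumptions, calling `element_buggy(vec![xlink_href()])` twice on the same builder keeps the attribute the first time and returns an empty list the second time, which matches the symptom reported above. `element_fixed` keeps it on both elements while still rejecting a true duplicate within a single element.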

Member

jdm commented Jun 5, 2024

This was introduced in baeacb4 but the original repo's related issue is no longer visible: https://github.com/Ygg01/xml5ever/issues/30


demurgos commented Jun 5, 2024

Thank you @jdm, I can confirm that it works. After patching my local dependencies to use the commit in #539, I can properly parse the response, including the last link:

Release {
    package: ReleasePackage {
        name: "protobuf",
        link: "/rest/p/protobuf",
    },
    channel: "pecl.php.net",
    version: "4.27.0",
    status: "stable",
    license: "BSD-3-Clause",
    maintainer: "protobufpackages",
    summary: "Google's language-neutral, platform-neutral, extensible mechanism for serializing structured data.",
    description: "https://developers.google.com/protocol-buffers/",
    time: "2024-05-23 14:46:50",
    release_notes: "* See github.com/protocolbuffers/protobuf/releases/tag/v27.0 for release notes.",
    archive: ReleaseArchive {
        size: 243961,
        link: "https://pecl.php.net/get/protobuf-4.27.0",
    },
    extracted_link: "package.4.27.0.xml",
}

@jdm jdm closed this as completed in #539 Jun 5, 2024
Author

noahbald commented Nov 28, 2024

Hey @jdm, are there any plans to update the dependencies of markup5ever_rcdom to include the fixes?


edit:

Just found there's a version, =0.5.0-unofficial, which works better, but it still seems to be moving or removing the xmlns-prefixed attributes:

<!-- before -->
<svg xmlns="http://www.w3.org/2000/svg" xmlns:sodipodi="http://inkscape.sourceforge.net/DTD/sodipodi-0.dtd">
    <sodipodi:namedview>
        ...
    </sodipodi:namedview>

    <path d="..." sodipodi:nodetypes="cccc"/>
</svg>

<!-- after: xmlns:sodipodi moved to child -->
<svg xmlns="http://www.w3.org/2000/svg">
    <sodipodi:namedview xmlns:sodipodi="http://inkscape.sourceforge.net/DTD/sodipodi-0.dtd">
        ...
    </sodipodi:namedview>

    <path d="..." sodipodi:nodetypes="cccc"></path>
</svg>

<!-- before -->
<svg xmlns="http://www.w3.org/2000/svg" xmlns:sodipodi="http://inkscape.sourceforge.net/DTD/sodipodi-0.dtd">
    <path d="..." sodipodi:nodetypes="cccc"/>
</svg>

<!-- after: xmlns:sodipodi removed entirely -->
<svg xmlns="http://www.w3.org/2000/svg">
    <path d="..." sodipodi:nodetypes="cccc"></path>
</svg>

Member

jdm commented Nov 29, 2024

The published rcdom crates are not owned by the html5ever developers.

Author

noahbald commented Nov 29, 2024

Oh, that's good to know. I think this issue may be coming from the parsing step, which in the example I gave uses xml5ever 0.20.0 and markup5ever 0.14.

The example is closely based on the one in the original comment (#538 (comment)).

@noahbald
Author

@jdm will this issue be re-opened based on my last comment? I don't think the root's xmlns attributes should be removed if the namespace they declare is used as a prefix by descendants.
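
The expectation here can be phrased as a check on the serialized tree: every prefix used on an element or attribute name should have an `xmlns:prefix` declaration on that element or one of its ancestors. Below is a rough sketch of such a check. `Element`, `prefix`, and `undeclared_prefixes` are hypothetical names for this example and are not xml5ever or rcdom types; the tree would have to be built from the real DOM separately.

```rust
use std::collections::HashSet;

/// Hypothetical minimal element, not an xml5ever or rcdom type.
struct Element {
    /// Qualified name as written, e.g. "sodipodi:namedview".
    name: String,
    /// Attributes as (qualified name, value), including xmlns declarations.
    attrs: Vec<(String, String)>,
    children: Vec<Element>,
}

/// Returns the prefix of a qualified name, if it has one.
fn prefix(qname: &str) -> Option<&str> {
    qname.split_once(':').map(|(p, _)| p)
}

/// Collects every prefix used on an element or attribute name that has no
/// `xmlns:prefix` declaration on that element or on an ancestor.
fn undeclared_prefixes(el: &Element, in_scope: &HashSet<String>, out: &mut Vec<String>) {
    // Declarations on this element extend the scope inherited from ancestors.
    let mut scope = in_scope.clone();
    for (name, _) in &el.attrs {
        if let Some(declared) = name.strip_prefix("xmlns:") {
            scope.insert(declared.to_string());
        }
    }
    // Check the element name and each attribute name against the scope.
    let attr_names = el.attrs.iter().map(|(n, _)| n.as_str());
    for qname in std::iter::once(el.name.as_str()).chain(attr_names) {
        if let Some(p) = prefix(qname) {
            // "xmlns" and "xml" are reserved and never need a declaration.
            if p != "xmlns" && p != "xml" && !scope.contains(p) {
                out.push(p.to_string());
            }
        }
    }
    for child in &el.children {
        undeclared_prefixes(child, &scope, out);
    }
}
```

Run against a tree shaped like the second "after" document above, where `sodipodi:nodetypes` remains on `<path>` but `xmlns:sodipodi` is gone from `<svg>`, this sketch would report `sodipodi` as undeclared. With the declaration still on the root, it reports nothing.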
