Update URLs and some formatting #529

Merged 1 commit on Apr 3, 2024
13 changes: 7 additions & 6 deletions README.md
@@ -7,11 +7,11 @@

html5ever is an HTML parser developed as part of the [Servo][] project.

It can parse and serialize HTML according to the [WHATWG](https://whatwg.org/) specs (aka "HTML5"). However, there are some differences in the actual behavior currently, most of which are documented [in the bug tracker][]. html5ever passes all tokenizer tests from [html5lib-tests][], with most tree builder tests outside of the unimplemented features. The goal is to pass all html5lib tests, while also providing all hooks needed by a production web browser, e.g. `document.write`.
It can parse and serialize HTML according to the [WHATWG](https://whatwg.org/) specs (aka "HTML5"). However, there are some differences in the actual behavior currently, most of which are documented [in the bug tracker][]. html5ever passes all tokenizer tests from [html5lib-tests][], with most tree builder tests outside of the unimplemented features. The goal is to pass all html5lib tests, while also providing all hooks needed by a production web browser, e.g. `document.write`.

Note that the HTML syntax is very similar to XML. For correct parsing of XHTML, use an XML parser (That said, many XHTML documents in the wild are serialized in an HTML-compatible form).
Note that the HTML syntax is very similar to XML. For correct parsing of XHTML, use an XML parser (that said, many XHTML documents in the wild are serialized in an HTML-compatible form).

html5ever is written in [Rust][], therefore it avoids the notorious security problems that come along with using C. Being built with Rust also makes the library come with the high-grade performance you would expect from an HTML parser written in C. html5ever is basically a C HTML parser, but without needing a garbage collector or other heavy runtime processes.
html5ever is written in [Rust][], therefore it avoids the notorious security problems that come along with using C. Being built with Rust also makes the library come with the high-grade performance you would expect from an HTML parser written in C. html5ever is basically a C HTML parser, but without needing a garbage collector or other heavy runtime processes.


## Getting started in Rust
@@ -25,6 +25,7 @@ html5ever = "0.27"

You should also take a look at [`examples/html2html.rs`], [`examples/print-rcdom.rs`], and the [API documentation][].


## Getting started in other languages

Bindings for Python and other languages are much desired.
@@ -45,7 +46,7 @@ Run `cargo doc` in the repository root to build local documentation under `targe

html5ever uses callbacks to manipulate the DOM, therefore it does not provide any DOM tree representation.

html5ever exclusively uses UTF-8 to represent strings. In the future it will support other document encodings (and UCS-2 `document.write`) by converting input.
html5ever exclusively uses UTF-8 to represent strings. In the future it will support other document encodings (and UCS-2 `document.write`) by converting input.

The code is cross-referenced with the WHATWG syntax spec, and eventually we will have a way to present code and spec side-by-side.

@@ -56,5 +57,5 @@ html5ever builds against the official stable releases of Rust, though some optim
[Rust]: https://www.rust-lang.org/
[in the bug tracker]: https://github.com/servo/html5ever/issues?q=is%3Aopen+is%3Aissue+label%3Aweb-compat
[html5lib-tests]: https://github.com/html5lib/html5lib-tests
[`examples/html2html.rs`]: https://github.com/servo/html5ever/blob/master/rcdom/examples/html2html.rs
[`examples/print-rcdom.rs`]: https://github.com/servo/html5ever/blob/master/rcdom/examples/print-rcdom.rs
[`examples/html2html.rs`]: https://github.com/servo/html5ever/blob/main/rcdom/examples/html2html.rs
[`examples/print-rcdom.rs`]: https://github.com/servo/html5ever/blob/main/rcdom/examples/print-rcdom.rs
12 changes: 5 additions & 7 deletions xml5ever/Cargo.toml
@@ -1,24 +1,22 @@
[package]

name = "xml5ever"
version = "0.18.0"
authors = ["The xml5ever project developers"]
license = "MIT OR Apache-2.0"
repository = "https://github.com/servo/html5ever"
description = "Push based streaming parser for xml"
documentation = "https://docs.rs/xml5ever/"

homepage = "https://github.com/servo/html5ever/blob/master/xml5ever/README.md"
description = "Push based streaming parser for XML."
documentation = "https://docs.rs/xml5ever"
homepage = "https://github.com/servo/html5ever/blob/main/xml5ever/README.md"
readme = "README.md"
keywords = ["xml", "xml5", "parser", "parsing"]
exclude = ["xml5lib-tests/*"]
categories = [ "parser-implementations", "web-programming" ]
categories = ["parser-implementations", "web-programming"]
edition = "2021"

[dependencies]
log = "0.4"
mac = "0.1"
markup5ever = {version = "0.12", path = "../markup5ever" }
markup5ever = { version = "0.12", path = "../markup5ever" }

[dev-dependencies]
criterion = "0.3"
16 changes: 8 additions & 8 deletions xml5ever/examples/README.md
@@ -22,15 +22,15 @@ First let's define our dependencies:
```

With the dependencies declared, we can now make a simple tokenizer sink. The first step is to
define a [`TokenSink`](https://ygg01.github.io/docs/xml5ever/xml5ever/tokenizer/trait.TokenSink.html). [`TokenSink`](https://ygg01.github.io/docs/xml5ever/xml5ever/tokenizer/trait.TokenSink.html) are traits that received stream of [`Tokens`](https://ygg01.github.io/docs/xml5ever/xml5ever/tokenizer/enum.Token.html).
define a [`TokenSink`](https://docs.rs/xml5ever/latest/xml5ever/tokenizer/trait.TokenSink.html). [`TokenSink`](https://docs.rs/xml5ever/latest/xml5ever/tokenizer/trait.TokenSink.html) are traits that received stream of [`Tokens`](https://docs.rs/xml5ever/latest/xml5ever/tokenizer/enum.Token.html).

In our case we'll define a unit struct (i.e. a struct without any fields).

```rust
struct SimpleTokenPrinter;
```

To make `SimpleTokenPrinter` a [`TokenSink`](https://ygg01.github.io/docs/xml5ever/xml5ever/tokenizer/trait.TokenSink.html), we need to implement [process_token](https://ygg01.github.io/docs/xml5ever/xml5ever/tokenizer/trait.TokenSink.html#tymethod.process_token) method.
To make `SimpleTokenPrinter` a [`TokenSink`](https://docs.rs/xml5ever/latest/xml5ever/tokenizer/trait.TokenSink.html), we need to implement [process_token](https://docs.rs/xml5ever/latest/xml5ever/tokenizer/trait.TokenSink.html#tymethod.process_token) method.

```rust
impl TokenSink for SimpleTokenPrinter {
@@ -64,7 +64,7 @@ To make `SimpleTokenPrinter` a [`TokenSink`](https://ygg01.github.io/docs/xml5ev
```
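
As a rough illustration of the sink pattern described above, here is a minimal sketch that uses only the standard library. `ToyToken`, `ToySink`, `Collector`, and `toy_tokenize` are hypothetical names invented for this sketch and are not part of xml5ever; the real `TokenSink` and `Token` types carry more information, and the real tokenizer handles far more syntax than this naive splitter.

```rust
// Hypothetical, std-only stand-ins for illustration; NOT the xml5ever types.
enum ToyToken {
    Open(String),
    Close(String),
    Text(String),
}

// A sink receives tokens one at a time, in document order.
trait ToySink {
    fn process_token(&mut self, token: ToyToken);
}

// A sink that records a printable description of every token it receives.
struct Collector {
    seen: Vec<String>,
}

impl ToySink for Collector {
    fn process_token(&mut self, token: ToyToken) {
        let line = match token {
            ToyToken::Open(name) => format!("open {}", name),
            ToyToken::Close(name) => format!("close {}", name),
            ToyToken::Text(text) => format!("text {}", text),
        };
        self.seen.push(line);
    }
}

// Deliberately naive: understands only `<name>`, `</name>`, and plain text.
fn toy_tokenize<S: ToySink>(input: &str, sink: &mut S) {
    let mut rest = input;
    while !rest.is_empty() {
        if let Some(after_lt) = rest.strip_prefix('<') {
            // Everything up to the next '>' is treated as the tag name.
            let end = after_lt.find('>').unwrap_or(after_lt.len());
            let name = &after_lt[..end];
            match name.strip_prefix('/') {
                Some(closing) => sink.process_token(ToyToken::Close(closing.to_string())),
                None => sink.process_token(ToyToken::Open(name.to_string())),
            }
            rest = after_lt.get(end + 1..).unwrap_or("");
        } else {
            // Everything up to the next '<' is treated as text.
            let end = rest.find('<').unwrap_or(rest.len());
            sink.process_token(ToyToken::Text(rest[..end].to_string()));
            rest = &rest[end..];
        }
    }
}

fn main() {
    let mut sink = Collector { seen: Vec::new() };
    toy_tokenize("<xml>Text with <b>bold words</b>!</xml>", &mut sink);
    for line in &sink.seen {
        println!("{}", line);
    }
}
```

The point of the design is that the tokenizer knows nothing about what happens to the tokens; any type implementing the sink trait can be plugged in.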

Now, we need some input to process. For input we'll use `stdin`. However, xml5ever `tokenize_to` method only takes `StrTendril`. So we need to construct a
[`ByteTendril`](https://doc.servo.org/tendril/type.ByteTendril.html) using `ByteTendril::new()`, then read the `stdin` using [`read_to_tendril`](https://doc.servo.org/tendril/trait.ReadExt.html#tymethod.read_to_tendril) extension.
[`ByteTendril`](https://docs.rs/tendril/latest/tendril/type.ByteTendril.html) using `ByteTendril::new()`, then read the `stdin` using [`read_to_tendril`](https://docs.rs/tendril/latest/tendril/trait.ReadExt.html#tymethod.read_to_tendril) extension.

Once that is set, to make `SimpleTokenPrinter` parse the input, call,
`tokenize_to` with it as the first parameter, input wrapped in Option for second parameter and XmlToke.
@@ -96,7 +96,7 @@ Once that is set, to make `SimpleTokenPrinter` parse the input, call,

NOTE: `unwrap` panics on failure, so it is only acceptable in simple examples.
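
For code that should not panic, one common alternative is to return a `Result` and propagate errors with `?`. The sketch below uses only the standard library and a plain `String` in place of a tendril; `read_utf8` is a hypothetical helper written for this illustration, not an xml5ever or tendril function.

```rust
use std::io::{self, Read};

// Hypothetical helper: reads all bytes from `reader` and checks they are UTF-8.
// Both I/O failures and invalid UTF-8 are reported as `io::Error` instead of panicking.
fn read_utf8<R: Read>(reader: &mut R) -> io::Result<String> {
    let mut bytes = Vec::new();
    reader.read_to_end(&mut bytes)?;
    String::from_utf8(bytes).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}

fn main() {
    // Handle the error explicitly rather than calling `unwrap`.
    match read_utf8(&mut io::stdin()) {
        Ok(text) => println!("read {} bytes", text.len()),
        Err(err) => eprintln!("could not read input: {}", err),
    }
}
```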

For full source code check out: [`examples/simple_xml_tokenizer.rs`](https://github.com/Ygg01/xml5ever/blob/master/examples/simple_xml_tokenizer.rs)
For full source code check out: [`examples/simple_xml_tokenizer.rs`](https://github.com/servo/html5ever/blob/main/xml5ever/examples/simple_xml_tokenizer.rs)

Once we have successfully compiled the example we run the example with inline
xml
@@ -105,7 +105,7 @@ xml
cargo script simple_xml_tokenizer.rs <<< "<xml>Text with <b>bold words</b>!</xml>"
```

or by sending an [`examples/example.xml`](https://github.com/Ygg01/xml5ever/blob/master/examples/simple_xml_tokenizer.rs) located in same folder as examples.
or by sending an [`examples/example.xml`](https://github.com/servo/html5ever/blob/main/xml5ever/examples/example.xml) located in same folder as examples.

```bash
cargo script simple_xml_tokenizer.rs < example.xml
@@ -153,8 +153,8 @@ First part is similar to making SimpleTokenPrinter:
let input = input.try_reinterpret().unwrap();
```

This time, we need an implementation of [`TreeSink`](https://ygg01.github.io/docs/xml5ever/xml5ever/tree_builder/interface/trait.TreeSink.html). xml5ever comes with a
built-in `TreeSink` implementation called [`RcDom`](https://ygg01.github.io/docs/xml5ever/xml5ever/rcdom/struct.RcDom.html). To process input into
This time, we need an implementation of [`TreeSink`](https://docs.rs/xml5ever/latest/xml5ever/tree_builder/trait.TreeSink.html). xml5ever comes with a
built-in `TreeSink` implementation called [`RcDom`](https://docs.rs/markup5ever_rcdom/latest/markup5ever_rcdom/struct.RcDom.html). To process input into
a `TreeSink` we use the following line:

```rust
@@ -220,4 +220,4 @@ kind of function that will help us traverse it. We shall call that function `wal
}
```

For full source code check out: [`examples/xml_tree_printer.rs`](https://github.com/Ygg01/xml5ever/blob/master/examples/xml_tree_printer.rs)
For full source code check out: [`examples/xml_tree_printer.rs`](https://github.com/servo/html5ever/blob/main/rcdom/examples/xml_tree_printer.rs)
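
The general shape of a recursive traversal like `walk` can be sketched with the standard library alone. `ToyNode` below is a hypothetical tree type made up for this illustration; `RcDom` nodes are shaped differently, so treat this as a picture of the recursion rather than the example's actual code.

```rust
// Hypothetical tree type for illustration; RcDom nodes are shaped differently.
struct ToyNode {
    name: String,
    children: Vec<ToyNode>,
}

// Depth-first walk that records one line per node, indented by its depth.
fn walk(node: &ToyNode, depth: usize, out: &mut Vec<String>) {
    out.push(format!("{}{}", "  ".repeat(depth), node.name));
    for child in &node.children {
        walk(child, depth + 1, out);
    }
}

fn main() {
    let tree = ToyNode {
        name: "xml".to_string(),
        children: vec![ToyNode {
            name: "b".to_string(),
            children: Vec::new(),
        }],
    };
    let mut lines = Vec::new();
    walk(&tree, 0, &mut lines);
    for line in &lines {
        println!("{}", line);
    }
}
```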