🔥 Explore this must-read post from Hacker News 📖
📂 **Category**:
💡 **What You’ll Learn**:
A pure Rust reimplementation of libxml2 — the de facto standard XML/HTML parsing library in the open-source world.
libxml2 became officially unmaintained in December 2025 with known security issues. xmloxide aims to be a memory-safe, high-performance replacement that passes the same conformance test suites.
- Memory-safe — arena-based tree with zero
unsafein the public API - Conformant — 100% pass rate on the W3C XML Conformance Test Suite (1727/1727 applicable tests)
- Error recovery — parse malformed XML and still produce a usable tree, just like libxml2
- Multiple parsing APIs — DOM tree, SAX2 streaming, XmlReader pull, push/incremental
- HTML parser — error-tolerant HTML 4.01 parsing with auto-closing and void elements
- XPath 1.0 — full expression parser and evaluator with all core functions
- Validation — DTD, RelaxNG, and XML Schema (XSD) validation
- Canonical XML — C14N 1.0 and Exclusive C14N serialization
- XInclude — document inclusion processing
- XML Catalogs — OASIS XML Catalogs for URI resolution
xmllintCLI — command-line tool for parsing, validating, and querying XML- Zero-copy where possible — string interning for fast comparisons
- No global state — each
Documentis self-contained andSend + Sync - C/C++ FFI — full C API with header file (
include/xmloxide.h) for embedding in C/C++ projects - Minimal dependencies — only
encoding_rs(library has zero other deps;clapis CLI-only)
use xmloxide::Document;
let doc = Document::parse_str("Hello ").unwrap();
let root = doc.root_element().unwrap();
assert_eq!(doc.node_name(root), Some("root"));
assert_eq!(doc.text_content(root), "Hello");
use xmloxide::Document;
use xmloxide::serial::serialize;
let doc = Document::parse_str("Hello ").unwrap();
let xml = serialize(&doc);
assert_eq!(xml, "Hello ");
use xmloxide::Document;
use xmloxide::xpath::💬;
let doc = Document::parse_str("Rust ").unwrap();
let root = doc.root_element().unwrap();
let result = evaluate(&doc, root, "count(book)").unwrap();
assert_eq!(result.to_number(), 1.0);
use xmloxide::sax::{parse_sax, SaxHandler, DefaultHandler};
use xmloxide::parser::ParseOptions;
struct MyHandler;
impl SaxHandler for MyHandler {
fn start_element(&mut self, name: &str, _: Option<&str>, _: Option<&str>,
_: &[(String, String, Option<String>, Option<String>)]) {
println!("Element: {name}");
}
}
parse_sax(" ", &ParseOptions::default(), &mut MyHandler).unwrap();
use xmloxide::html::parse_html;
let doc = parse_html("Hello
World"
).unwrap();
let root = doc.root_element().unwrap();
assert_eq!(doc.node_name(root), Some("html"));
use xmloxide::parser::{parse_str_with_options, ParseOptions};
let opts = ParseOptions::default().recover(true);
let doc = parse_str_with_options("" , &opts).unwrap();
for diag in &doc.diagnostics {
eprintln!("{}", diag);
}
# Parse and pretty-print
xmllint --format document.xml
# Validate against a schema
xmllint --schema schema.xsd document.xml
xmllint --relaxng schema.rng document.xml
xmllint --dtdvalid schema.dtd document.xml
# XPath query
xmllint --xpath "//title" document.xml
# Canonical XML
xmllint --c14n document.xml
# Parse HTML
xmllint --html page.html
| Module | Description |
|---|---|
tree |
Arena-based DOM tree (Document, NodeId, NodeKind) |
parser |
XML 1.0 recursive descent parser with error recovery |
parser::push |
Push/incremental parser for chunked input |
html |
Error-tolerant HTML 4.01 parser |
sax |
SAX2 streaming event-driven parser |
reader |
XmlReader pull-based parsing API |
serial |
XML serializer and Canonical XML (C14N) |
xpath |
XPath 1.0 expression parser and evaluator |
validation::dtd |
DTD parsing and validation |
validation::relaxng |
RelaxNG schema validation |
validation::xsd |
XML Schema (XSD) validation |
xinclude |
XInclude 1.0 document inclusion |
catalog |
OASIS XML Catalogs for URI resolution |
encoding |
Character encoding detection and transcoding |
ffi |
C/C++ FFI bindings (include/xmloxide.h) |
Parsing throughput is competitive with libxml2 — within 3-4% on most documents, and 12% faster on SVG. Serialization is 1.5-2.4x faster thanks to the arena-based tree design. XPath is 1.1-2.7x faster across all benchmarks.
Parsing:
| Document | Size | xmloxide | libxml2 | Result |
|---|---|---|---|---|
| Atom feed | 4.9 KB | 26.7 µs (176 MiB/s) | 25.5 µs (184 MiB/s) | ~4% slower |
| SVG drawing | 6.3 KB | 58.5 µs (103 MiB/s) | 65.6 µs (92 MiB/s) | 12% faster |
| Maven POM | 11.5 KB | 76.9 µs (142 MiB/s) | 74.2 µs (148 MiB/s) | ~4% slower |
| XHTML page | 10.2 KB | 69.5 µs (139 MiB/s) | 61.5 µs (157 MiB/s) | ~13% slower |
| Large (374 KB) | 374 KB | 2.15 ms (169 MiB/s) | 2.08 ms (175 MiB/s) | ~3% slower |
Serialization:
| Document | Size | xmloxide | libxml2 | Result |
|---|---|---|---|---|
| Atom feed | 4.9 KB | 11.3 µs | 17.5 µs | 1.5x faster |
| Maven POM | 11.5 KB | 20.1 µs | 47.5 µs | 2.4x faster |
| Large (374 KB) | 374 KB | 614 µs | 1397 µs | 2.3x faster |
XPath:
| Expression | xmloxide | libxml2 | Result |
|---|---|---|---|
Simple path (//entry/title) |
1.51 µs | 1.63 µs | 8% faster |
Attribute predicate (//book[@id]) |
5.91 µs | 15.99 µs | 2.7x faster |
count() function |
1.09 µs | 1.67 µs | 1.5x faster |
string() function |
1.32 µs | 1.77 µs | 1.3x faster |
Key optimizations: arena-based tree for fast serialization, byte-level pre-checks for character validation, bulk text scanning, ASCII fast paths for name parsing, zero-copy element name splitting, inline entity resolution, XPath // step fusion with fused axis expansion, inlined tree accessors, and name-test fast paths for child/descendant axes.
# Run benchmarks (requires libxml2 system library)
cargo bench --features bench-libxml2 --bench comparison_bench
- 785 unit tests across all modules
- 112 FFI tests covering the full C API surface (including SAX streaming)
- libxml2 compatibility suite — 119/119 tests passing (100%) covering XML parsing, namespaces, error detection, and HTML parsing
- W3C XML Conformance Test Suite — 1727/1727 applicable tests passing (100%)
- Integration tests covering real-world XML documents, edge cases, and error recovery
cargo test --all-features
xmloxide provides a C-compatible API for embedding in C/C++ projects (like Chromium, game engines, or any codebase that currently uses libxml2).
# Build shared + static libraries (uses the included Makefile)
make
# Or build individually:
make shared # .so / .dylib / .dll
make static # .a / .lib
# Build and run the C example
make example
#include "xmloxide.h"
xmloxide_document *doc = xmloxide_parse_str("Hello ");
uint32_t root = xmloxide_doc_root_element(doc);
char *name = xmloxide_node_name(doc, root); // "root"
char *text = xmloxide_node_text_content(doc, root); // "Hello"
xmloxide_free_string(name);
xmloxide_free_string(text);
xmloxide_free_doc(doc);
The full API — including tree navigation and mutation, XPath evaluation, serialization (plain and pretty-printed), HTML parsing, DTD/RelaxNG/XSD validation, C14N, and XML Catalogs — is declared in include/xmloxide.h.
| libxml2 | xmloxide (Rust) | xmloxide (C FFI) |
|---|---|---|
xmlReadMemory |
Document::parse_str |
xmloxide_parse_str |
xmlReadFile |
Document::parse_file |
xmloxide_parse_file |
xmlParseDoc |
Document::parse_bytes |
xmloxide_parse_bytes |
htmlReadMemory |
html::parse_html |
xmloxide_parse_html |
xmlFreeDoc |
(drop Document) |
xmloxide_free_doc |
xmlDocGetRootElement |
doc.root_element() |
xmloxide_doc_root_element |
xmlNodeGetContent |
doc.text_content(id) |
xmloxide_node_text_content |
xmlNodeSetContent |
doc.set_text_content(id, s) |
xmloxide_set_text_content |
xmlGetProp |
doc.attribute(id, name) |
xmloxide_node_attribute |
xmlSetProp |
doc.set_attribute(...) |
xmloxide_set_attribute |
xmlNewNode |
doc.create_node(...) |
xmloxide_create_element |
xmlNewText |
doc.create_node(Text{..}) |
xmloxide_create_text |
xmlAddChild |
doc.append_child(p, c) |
xmloxide_append_child |
xmlAddPrevSibling |
doc.insert_before(ref, c) |
xmloxide_insert_before |
xmlUnlinkNode |
doc.remove_node(id) |
xmloxide_remove_node |
xmlCopyNode |
doc.clone_node(id, deep) |
xmloxide_clone_node |
xmlGetID |
doc.element_by_id(s) |
xmloxide_element_by_id |
xmlDocDumpMemory |
serial::serialize(&doc) |
xmloxide_serialize |
xmlDocDumpFormatMemory |
serial::serialize_with_options |
xmloxide_serialize_pretty |
htmlDocDumpMemory |
serial::html::serialize_html |
xmloxide_serialize_html |
xmlC14NDocDumpMemory |
serial::c14n::canonicalize |
xmloxide_canonicalize |
xmlXPathEvalExpression |
xpath::evaluate |
xmloxide_xpath_eval |
xmlValidateDtd |
validation::dtd::validate |
xmloxide_validate_dtd |
xmlRelaxNGValidateDoc |
validation::relaxng::validate |
xmloxide_validate_relaxng |
xmlSchemaValidateDoc |
validation::xsd::validate_xsd |
xmloxide_validate_xsd |
xmlXIncludeProcess |
xinclude::process_xincludes |
xmloxide_process_xincludes |
xmlLoadCatalog |
Catalog::parse |
xmloxide_parse_catalog |
xmlSAX2... callbacks |
sax::SaxHandler trait |
xmloxide_sax_parse |
xmlTextReaderRead |
reader::XmlReader |
xmloxide_reader_read |
xmlCreatePushParserCtxt |
parser::PushParser |
xmloxide_push_parser_new |
xmlParseChunk |
PushParser::push |
xmloxide_push_parser_push |
Thread safety: Unlike libxml2, xmloxide has no global state. Each Document is self-contained and Send + Sync. The FFI layer uses thread-local storage for the last error message — each thread has its own error state. No initialization or cleanup functions are needed.
xmloxide includes fuzz targets for security testing:
# Install cargo-fuzz (requires nightly)
cargo install cargo-fuzz
# Run a fuzz target
cargo +nightly fuzz run fuzz_xml_parse
cargo +nightly fuzz run fuzz_html_parse
cargo +nightly fuzz run fuzz_xpath
cargo +nightly fuzz run fuzz_roundtrip
cargo build
cargo test
cargo clippy --all-targets --all-features -- -D warnings
cargo bench
Minimum supported Rust version: 1.81
- No XML 1.1 — xmloxide implements XML 1.0 (Fifth Edition) only. XML 1.1 is rarely used and not planned.
- No XSLT — XSLT is a separate specification (libxslt) and is out of scope.
- No Schematron — Schematron validation is not implemented. DTD, RelaxNG, and XSD are supported.
- HTML 4.01 only — the HTML parser targets HTML 4.01, not the HTML5 parsing algorithm.
- Push parser buffers internally — the push/incremental parser API (
PushParser) currently buffers all pushed data and performs the full parse onfinish(), rather than truly streaming like libxml2’sxmlParseChunk. SAX streaming (parse_sax) is available as an alternative for memory-constrained large-document processing. - XPath
namespace::axis — thenamespace::axis returns the element node when in-scope namespaces match (rather than materializing separate namespace nodes), following the same pattern as the attribute axis.
See CONTRIBUTING.md for development setup and guidelines.
See CHANGELOG.md for version history.
MIT
{💬|⚡|🔥} **What’s your take?**
Share your thoughts in the comments below!
#️⃣ **#jonwigginsxmloxide #pure #Rust #reimplementation #libxml2**
🕒 **Posted on**: 1772341285
🌟 **Want more?** Click here for more info! 🌟
