jonwiggins/xmloxide: A pure Rust reimplementation of libxml2

A pure Rust reimplementation of libxml2 — the de facto standard XML/HTML parsing library in the open-source world.

libxml2 became officially unmaintained in December 2025 with known security issues. xmloxide aims to be a memory-safe, high-performance replacement that passes the same conformance test suites.

Memory-safe — arena-based tree with zero unsafe in the public API
Conformant — 100% pass rate on the W3C XML Conformance Test Suite (1727/1727 applicable tests)
Error recovery — parse malformed XML and still produce a usable tree, just like libxml2
Multiple parsing APIs — DOM tree, SAX2 streaming, XmlReader pull, push/incremental
HTML parser — error-tolerant HTML 4.01 parsing with auto-closing and void elements
XPath 1.0 — full expression parser and evaluator with all core functions
Validation — DTD, RelaxNG, and XML Schema (XSD) validation
Canonical XML — C14N 1.0 and Exclusive C14N serialization
XInclude — document inclusion processing
XML Catalogs — OASIS XML Catalogs for URI resolution
xmllint CLI — command-line tool for parsing, validating, and querying XML
Zero-copy where possible — string interning for fast comparisons
No global state — each Document is self-contained and Send + Sync
C/C++ FFI — full C API with header file (include/xmloxide.h) for embedding in C/C++ projects
Minimal dependencies — only encoding_rs (library has zero other deps; clap is CLI-only)
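
The "string interning for fast comparisons" item can be illustrated with a small standalone sketch. The `Interner` and `Symbol` types below are hypothetical and are not taken from xmloxide's source; they only show the general idea that each distinct name maps to a small integer handle, so comparing two names becomes an integer comparison.

```rust
use std::collections::HashMap;

/// Handle to an interned string. Comparing two handles is an integer compare.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct Symbol(u32);

/// Hypothetical interner: every distinct string maps to exactly one `Symbol`.
/// (This simple version keeps two copies of each string; a tuned
/// implementation would avoid the duplicate allocation.)
#[derive(Default)]
pub struct Interner {
    lookup: HashMap<String, Symbol>,
    strings: Vec<String>,
}

impl Interner {
    /// Returns the existing symbol for `s`, or allocates a new one.
    pub fn intern(&mut self, s: &str) -> Symbol {
        if let Some(&sym) = self.lookup.get(s) {
            return sym;
        }
        let sym = Symbol(self.strings.len() as u32);
        self.strings.push(s.to_owned());
        self.lookup.insert(s.to_owned(), sym);
        sym
    }

    /// Resolves a symbol back to its text.
    pub fn resolve(&self, sym: Symbol) -> &str {
        &self.strings[sym.0 as usize]
    }
}
```

Interning the same element name twice yields equal symbols, so repeated names such as `entry` or `title` in a large feed can be compared without touching the string bytes.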

use xmloxide::Document;

let doc = Document::parse_str("<root>Hello</root>").unwrap();
let root = doc.root_element().unwrap();
assert_eq!(doc.node_name(root), Some("root"));
assert_eq!(doc.text_content(root), "Hello");
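
The calls above pass a node handle (`root`) back to the `Document` to read data, which is the usual shape of an arena-based tree. A minimal sketch of that pattern follows. It assumes nothing about xmloxide's internals: the `Arena`, `Node`, and `NodeId` definitions here are illustrative only.

```rust
/// Index of a node inside the arena (hypothetical stand-in for a node handle).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct NodeId(usize);

struct Node {
    name: String,
    parent: Option<NodeId>,
    children: Vec<NodeId>,
}

/// All nodes live in one `Vec`; links between nodes are plain indices,
/// so no raw pointers or reference counting are needed.
#[derive(Default)]
pub struct Arena {
    nodes: Vec<Node>,
}

impl Arena {
    /// Allocates a detached node and returns its handle.
    pub fn create(&mut self, name: &str) -> NodeId {
        let id = NodeId(self.nodes.len());
        self.nodes.push(Node {
            name: name.to_owned(),
            parent: None,
            children: Vec::new(),
        });
        id
    }

    /// Links `child` as the last child of `parent`.
    pub fn append_child(&mut self, parent: NodeId, child: NodeId) {
        self.nodes[child.0].parent = Some(parent);
        self.nodes[parent.0].children.push(child);
    }

    pub fn name(&self, id: NodeId) -> &str {
        &self.nodes[id.0].name
    }

    pub fn parent(&self, id: NodeId) -> Option<NodeId> {
        self.nodes[id.0].parent
    }

    pub fn children(&self, id: NodeId) -> &[NodeId] {
        &self.nodes[id.0].children
    }
}
```

Because handles are `Copy` integers rather than references, they can be stored and passed around freely without borrowing the tree.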

use xmloxide::Document;
use xmloxide::serial::serialize;

let doc = Document::parse_str("<root><child>Hello</child></root>").unwrap();
let xml = serialize(&doc);
assert_eq!(xml, "<root><child>Hello</child></root>");

use xmloxide::Document;
use xmloxide::xpath::evaluate;

let doc = Document::parse_str("<library><book><title>Rust</title></book></library>").unwrap();
let root = doc.root_element().unwrap();
let result = evaluate(&doc, root, "count(book)").unwrap();
assert_eq!(result.to_number(), 1.0);

use xmloxide::sax::{parse_sax, SaxHandler, DefaultHandler};
use xmloxide::parser::ParseOptions;

struct MyHandler;
impl SaxHandler for MyHandler {
    fn start_element(&mut self, name: &str, _: Option<&str>, _: Option<&str>,
                     _: &[(String, String, Option<String>, Option<String>)]) {
        println!("Element: {name}");
    }
}

parse_sax("<root><child/></root>", &ParseOptions::default(), &mut MyHandler).unwrap();

use xmloxide::html::parse_html;

let doc = parse_html("<p>Hello <br>World").unwrap();
let root = doc.root_element().unwrap();
assert_eq!(doc.node_name(root), Some("html"));
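
To make the "void elements" part of HTML parsing concrete, here is a small standalone sketch. The element list is the set declared `EMPTY` in the HTML 4.01 DTD; the function names are hypothetical and are not part of xmloxide's `html` module.

```rust
/// Elements declared EMPTY in the HTML 4.01 DTD: they never take an end tag.
const VOID_ELEMENTS: [&str; 13] = [
    "area", "base", "basefont", "br", "col", "frame", "hr",
    "img", "input", "isindex", "link", "meta", "param",
];

/// HTML element names are case-insensitive, so compare ignoring ASCII case.
pub fn is_void_element(name: &str) -> bool {
    VOID_ELEMENTS.iter().any(|v| v.eq_ignore_ascii_case(name))
}

/// Simplified model of the open-element stack after a run of start tags
/// (end tags and implied closes are ignored): void elements are never pushed.
pub fn open_elements<'a>(start_tags: &[&'a str]) -> Vec<&'a str> {
    start_tags
        .iter()
        .copied()
        .filter(|tag| !is_void_element(tag))
        .collect()
}
```

With this rule, a `<br>` inside a paragraph does not stay open, so the text that follows it is still a child of the paragraph.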

use xmloxide::parser::{parse_str_with_options, ParseOptions};

let opts = ParseOptions::default().recover(true);
let doc = parse_str_with_options("<root><unclosed>", &opts).unwrap();
for diag in &doc.diagnostics {
    eprintln!("{}", diag);
}

# Parse and pretty-print
xmllint --format document.xml

# Validate against a schema
xmllint --schema schema.xsd document.xml
xmllint --relaxng schema.rng document.xml
xmllint --dtdvalid schema.dtd document.xml

# XPath query
xmllint --xpath "//title" document.xml

# Canonical XML
xmllint --c14n document.xml

# Parse HTML
xmllint --html page.html

Module	Description
`tree`	Arena-based DOM tree (`Document`, `NodeId`, `NodeKind`)
`parser`	XML 1.0 recursive descent parser with error recovery
`parser::push`	Push/incremental parser for chunked input
`html`	Error-tolerant HTML 4.01 parser
`sax`	SAX2 streaming event-driven parser
`reader`	XmlReader pull-based parsing API
`serial`	XML serializer and Canonical XML (C14N)
`xpath`	XPath 1.0 expression parser and evaluator
`validation::dtd`	DTD parsing and validation
`validation::relaxng`	RelaxNG schema validation
`validation::xsd`	XML Schema (XSD) validation
`xinclude`	XInclude 1.0 document inclusion
`catalog`	OASIS XML Catalogs for URI resolution
`encoding`	Character encoding detection and transcoding
`ffi`	C/C++ FFI bindings (`include/xmloxide.h`)
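
As an illustration of what character encoding detection involves, the sketch below checks the start of the input for a byte-order mark (BOM). It is a simplified stand-in, not the `encoding` module's API: the names are hypothetical, UTF-32 is ignored, and a real detector would also consult the `encoding` pseudo-attribute of the XML declaration.

```rust
/// Encodings that can be recognised from a byte-order mark in this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum BomEncoding {
    Utf8,
    Utf16Be,
    Utf16Le,
}

/// Returns the encoding signalled by a leading BOM and the BOM length in bytes,
/// or `None` when the input does not start with one of the known marks.
pub fn sniff_bom(input: &[u8]) -> Option<(BomEncoding, usize)> {
    if input.starts_with(&[0xEF, 0xBB, 0xBF]) {
        Some((BomEncoding::Utf8, 3))
    } else if input.starts_with(&[0xFE, 0xFF]) {
        Some((BomEncoding::Utf16Be, 2))
    } else if input.starts_with(&[0xFF, 0xFE]) {
        // Note: a UTF-32LE BOM (FF FE 00 00) would also land here;
        // this sketch deliberately does not handle UTF-32.
        Some((BomEncoding::Utf16Le, 2))
    } else {
        None
    }
}
```

The returned length tells the caller how many bytes to skip before handing the rest of the input to a decoder.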

Parsing throughput is competitive with libxml2: within 3-4% on most documents, 12% faster on SVG, and about 13% slower on XHTML. Serialization is 1.5-2.4x faster thanks to the arena-based tree design. XPath is 1.1-2.7x faster across all benchmarks.

Parsing:

Document	Size	xmloxide	libxml2	Result
Atom feed	4.9 KB	26.7 µs (176 MiB/s)	25.5 µs (184 MiB/s)	~4% slower
SVG drawing	6.3 KB	58.5 µs (103 MiB/s)	65.6 µs (92 MiB/s)	12% faster
Maven POM	11.5 KB	76.9 µs (142 MiB/s)	74.2 µs (148 MiB/s)	~4% slower
XHTML page	10.2 KB	69.5 µs (139 MiB/s)	61.5 µs (157 MiB/s)	~13% slower
Large (374 KB)	374 KB	2.15 ms (169 MiB/s)	2.08 ms (175 MiB/s)	~3% slower

Serialization:

Document	Size	xmloxide	libxml2	Result
Atom feed	4.9 KB	11.3 µs	17.5 µs	1.5x faster
Maven POM	11.5 KB	20.1 µs	47.5 µs	2.4x faster
Large (374 KB)	374 KB	614 µs	1397 µs	2.3x faster
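
The Result column of the serialization table can be reproduced from the two timing columns: divide the libxml2 time by the xmloxide time and round to one decimal place. The helper names below are made up for this worked example.

```rust
/// How many times faster the candidate is than the baseline.
/// Both timings must be in the same unit (here: microseconds).
pub fn speedup(baseline: f64, candidate: f64) -> f64 {
    baseline / candidate
}

/// Rounds to one decimal place, the precision used in the Result column.
pub fn round1(x: f64) -> f64 {
    (x * 10.0).round() / 10.0
}
```

For the Maven POM row, `speedup(47.5, 20.1)` is roughly 2.36, which rounds to the 2.4x shown in the table; the Atom feed and Large rows give 1.5x and 2.3x in the same way.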

XPath:

Expression	xmloxide	libxml2	Result
Simple path (`//entry/title`)	1.51 µs	1.63 µs	8% faster
Attribute predicate (`//book[@id]`)	5.91 µs	15.99 µs	2.7x faster
`count()` function	1.09 µs	1.67 µs	1.5x faster
`string()` function	1.32 µs	1.77 µs	1.3x faster

Key optimizations: arena-based tree for fast serialization, byte-level pre-checks for character validation, bulk text scanning, ASCII fast paths for name parsing, zero-copy element name splitting, inline entity resolution, XPath // step fusion with fused axis expansion, inlined tree accessors, and name-test fast paths for child/descendant axes.
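
One of the listed optimizations, zero-copy element name splitting, can be sketched in a few lines. This is an illustrative version under the assumption that a qualified name is split at its first colon; `split_qname` is a hypothetical name and the function does no validation of the prefix or local part.

```rust
/// Splits `prefix:local` into two slices that borrow from the input.
/// Nothing is allocated or copied; unprefixed names return `None` for the prefix.
pub fn split_qname(qname: &str) -> (Option<&str>, &str) {
    // Scan raw bytes: ':' is a single ASCII byte, so its index is always
    // a valid UTF-8 boundary and the slicing below cannot panic.
    match qname.as_bytes().iter().position(|&b| b == b':') {
        Some(i) => (Some(&qname[..i]), &qname[i + 1..]),
        None => (None, qname),
    }
}
```

Both returned slices point into the original buffer, so their lifetimes are tied to the input string rather than to a new allocation.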

# Run benchmarks (requires libxml2 system library)
cargo bench --features bench-libxml2 --bench comparison_bench

785 unit tests across all modules
112 FFI tests covering the full C API surface (including SAX streaming)
libxml2 compatibility suite — 119/119 tests passing (100%) covering XML parsing, namespaces, error detection, and HTML parsing
W3C XML Conformance Test Suite — 1727/1727 applicable tests passing (100%)
Integration tests covering real-world XML documents, edge cases, and error recovery

cargo test --all-features

xmloxide provides a C-compatible API for embedding in C/C++ projects (like Chromium, game engines, or any codebase that currently uses libxml2).

# Build shared + static libraries (uses the included Makefile)
make

# Or build individually:
make shared   # .so / .dylib / .dll
make static   # .a / .lib

# Build and run the C example
make example

#include "xmloxide.h"

xmloxide_document *doc = xmloxide_parse_str("Hello");
uint32_t root = xmloxide_doc_root_element(doc);
char *name = xmloxide_node_name(doc, root);   // "root"
char *text = xmloxide_node_text_content(doc, root); // "Hello"

xmloxide_free_string(name);
xmloxide_free_string(text);
xmloxide_free_doc(doc);

The full API — including tree navigation and mutation, XPath evaluation, serialization (plain and pretty-printed), HTML parsing, DTD/RelaxNG/XSD validation, C14N, and XML Catalogs — is declared in include/xmloxide.h.
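
The C example frees every returned string with `xmloxide_free_string`. A common way to implement that kind of ownership hand-off on the Rust side is `CString::into_raw` paired with `CString::from_raw`. The sketch below shows the pattern with hypothetical function names; whether xmloxide's FFI layer is written exactly this way is an assumption.

```rust
use std::ffi::CString;
use std::os::raw::c_char;

/// Hypothetical helper: returns a heap-allocated, NUL-terminated copy of `s`
/// whose ownership passes to the caller, or null if `s` contains an
/// interior NUL byte and therefore cannot be represented as a C string.
pub fn to_c_string(s: &str) -> *mut c_char {
    match CString::new(s) {
        Ok(c) => c.into_raw(),
        Err(_) => std::ptr::null_mut(),
    }
}

/// Hypothetical counterpart of a "free string" export.
///
/// # Safety
/// `ptr` must be null or a pointer previously returned by `to_c_string`
/// that has not been freed yet.
pub unsafe extern "C" fn demo_free_string(ptr: *mut c_char) {
    if !ptr.is_null() {
        // Rebuild the CString so that Rust's allocator releases the memory.
        drop(unsafe { CString::from_raw(ptr) });
    }
}
```

The key design point is that memory allocated by Rust is also released by Rust, which is why the C caller must use the library's own free function instead of `free()`.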

libxml2	xmloxide (Rust)	xmloxide (C FFI)
`xmlReadMemory`	`Document::parse_str`	`xmloxide_parse_str`
`xmlReadFile`	`Document::parse_file`	`xmloxide_parse_file`
`xmlParseDoc`	`Document::parse_bytes`	`xmloxide_parse_bytes`
`htmlReadMemory`	`html::parse_html`	`xmloxide_parse_html`
`xmlFreeDoc`	(drop `Document`)	`xmloxide_free_doc`
`xmlDocGetRootElement`	`doc.root_element()`	`xmloxide_doc_root_element`
`xmlNodeGetContent`	`doc.text_content(id)`	`xmloxide_node_text_content`
`xmlNodeSetContent`	`doc.set_text_content(id, s)`	`xmloxide_set_text_content`
`xmlGetProp`	`doc.attribute(id, name)`	`xmloxide_node_attribute`
`xmlSetProp`	`doc.set_attribute(...)`	`xmloxide_set_attribute`
`xmlNewNode`	`doc.create_node(...)`	`xmloxide_create_element`
`xmlNewText`	`doc.create_node(Text{..})`	`xmloxide_create_text`
`xmlAddChild`	`doc.append_child(p, c)`	`xmloxide_append_child`
`xmlAddPrevSibling`	`doc.insert_before(ref, c)`	`xmloxide_insert_before`
`xmlUnlinkNode`	`doc.remove_node(id)`	`xmloxide_remove_node`
`xmlCopyNode`	`doc.clone_node(id, deep)`	`xmloxide_clone_node`
`xmlGetID`	`doc.element_by_id(s)`	`xmloxide_element_by_id`
`xmlDocDumpMemory`	`serial::serialize(&doc)`	`xmloxide_serialize`
`xmlDocDumpFormatMemory`	`serial::serialize_with_options`	`xmloxide_serialize_pretty`
`htmlDocDumpMemory`	`serial::html::serialize_html`	`xmloxide_serialize_html`
`xmlC14NDocDumpMemory`	`serial::c14n::canonicalize`	`xmloxide_canonicalize`
`xmlXPathEvalExpression`	`xpath::evaluate`	`xmloxide_xpath_eval`
`xmlValidateDtd`	`validation::dtd::validate`	`xmloxide_validate_dtd`
`xmlRelaxNGValidateDoc`	`validation::relaxng::validate`	`xmloxide_validate_relaxng`
`xmlSchemaValidateDoc`	`validation::xsd::validate_xsd`	`xmloxide_validate_xsd`
`xmlXIncludeProcess`	`xinclude::process_xincludes`	`xmloxide_process_xincludes`
`xmlLoadCatalog`	`Catalog::parse`	`xmloxide_parse_catalog`
`xmlSAX2...` callbacks	`sax::SaxHandler` trait	`xmloxide_sax_parse`
`xmlTextReaderRead`	`reader::XmlReader`	`xmloxide_reader_read`
`xmlCreatePushParserCtxt`	`parser::PushParser`	`xmloxide_push_parser_new`
`xmlParseChunk`	`PushParser::push`	`xmloxide_push_parser_push`

Thread safety: Unlike libxml2, xmloxide has no global state. Each Document is self-contained and Send + Sync. The FFI layer uses thread-local storage for the last error message — each thread has its own error state. No initialization or cleanup functions are needed.
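
The per-thread error state described above can be modelled with the standard `thread_local!` macro. This is a minimal sketch of the technique, not xmloxide's FFI code; `LAST_ERROR`, `set_last_error`, and `last_error` are hypothetical names.

```rust
use std::cell::RefCell;

thread_local! {
    /// Each thread gets its own slot, so no locking is required.
    static LAST_ERROR: RefCell<Option<String>> = RefCell::new(None);
}

/// Records an error message for the calling thread only.
pub fn set_last_error(msg: &str) {
    LAST_ERROR.with(|slot| *slot.borrow_mut() = Some(msg.to_owned()));
}

/// Returns a copy of the calling thread's last error, if any.
pub fn last_error() -> Option<String> {
    LAST_ERROR.with(|slot| slot.borrow().clone())
}
```

An error recorded on one thread is invisible to every other thread, so concurrent parses cannot overwrite each other's messages.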

xmloxide includes fuzz targets for security testing:

# Install cargo-fuzz (requires nightly)
cargo install cargo-fuzz

# Run a fuzz target
cargo +nightly fuzz run fuzz_xml_parse
cargo +nightly fuzz run fuzz_html_parse
cargo +nightly fuzz run fuzz_xpath
cargo +nightly fuzz run fuzz_roundtrip

cargo build
cargo test
cargo clippy --all-targets --all-features -- -D warnings
cargo bench

Minimum supported Rust version: 1.81

No XML 1.1 — xmloxide implements XML 1.0 (Fifth Edition) only. XML 1.1 is rarely used and not planned.
No XSLT — XSLT is a separate specification (libxslt) and is out of scope.
No Schematron — Schematron validation is not implemented. DTD, RelaxNG, and XSD are supported.
HTML 4.01 only — the HTML parser targets HTML 4.01, not the HTML5 parsing algorithm.
Push parser buffers internally — the push/incremental parser API (PushParser) currently buffers all pushed data and performs the full parse on finish(), rather than truly streaming like libxml2’s xmlParseChunk. SAX streaming (parse_sax) is available as an alternative for memory-constrained large-document processing.
XPath namespace:: axis — the namespace:: axis returns the element node when in-scope namespaces match (rather than materializing separate namespace nodes), following the same pattern as the attribute axis.
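
The push parser limitation is easier to picture with a toy model. In the sketch below, `push` only appends bytes and all work is deferred to `finish`, so memory use grows with the total input. `BufferingPushParser` is a hypothetical type, and counting start tags is just a stand-in for a real parse.

```rust
/// Hypothetical buffering push parser: `push` stores bytes,
/// and all real work happens in `finish`.
#[derive(Default)]
pub struct BufferingPushParser {
    buf: Vec<u8>,
}

impl BufferingPushParser {
    /// Appends a chunk; nothing is parsed yet.
    pub fn push(&mut self, chunk: &[u8]) {
        self.buf.extend_from_slice(chunk);
    }

    /// Number of bytes currently held in memory.
    pub fn buffered_len(&self) -> usize {
        self.buf.len()
    }

    /// Stand-in for the full parse: decode the whole buffer as UTF-8
    /// and count start tags.
    pub fn finish(self) -> Result<usize, std::str::Utf8Error> {
        let text = std::str::from_utf8(&self.buf)?;
        Ok(count_start_tags(text))
    }
}

/// Counts `<` bytes that are immediately followed by an ASCII letter.
fn count_start_tags(text: &str) -> usize {
    text.as_bytes()
        .windows(2)
        .filter(|w| w[0] == b'<' && w[1].is_ascii_alphabetic())
        .count()
}
```

A benefit of this design is that chunk boundaries can fall anywhere, even inside a tag or a multi-byte character, because decoding only happens once the input is complete. The cost is that the whole document is held in memory, which is why a streaming SAX API suits very large inputs better.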

See CONTRIBUTING.md for development setup and guidelines.

See CHANGELOG.md for version history.

License: MIT

Posted on: 2026-03-01 (Unix timestamp 1772341285)