prawn

prawn is an extremely fast Rust web scraper that downloads a webpage's HTML and all linked CSS and JS resources, saving them into a local folder for offline use.

Features

  • High-performance: Uses reqwest (with connection pooling), tokio, and rayon for parallelism.
  • CLI tool: Accepts a URL as an argument.
  • Downloads the page's HTML and parses it with scraper.
  • Extracts and concurrently downloads all <link rel="stylesheet"> and <script src="..."> resources.
  • Rewrites HTML to point to local files and saves it as saved_site/index.html.
  • All CSS and JS files are saved into saved_site/css/ and saved_site/js/ respectively.
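
The css/ and js/ layout above implies some mapping from a remote URL to a local file name. The following is a minimal sketch of one way to do that using only the standard library. The names `ResourceKind` and `local_path` and the `resource_N` fallback are hypothetical and are not taken from prawn's source.

```rust
/// Kind of linked resource; it decides the target subfolder.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ResourceKind {
    Css,
    Js,
}

/// Hypothetical helper: map a resource URL to a path relative to saved_site/.
/// `index` is only used to build a fallback name when the URL has no file name.
fn local_path(url: &str, kind: ResourceKind, index: usize) -> String {
    // Drop the query string and fragment, then keep the last path segment.
    let without_suffix = url
        .split(|c: char| c == '?' || c == '#')
        .next()
        .unwrap_or("");
    let name = without_suffix.rsplit('/').next().unwrap_or("");

    let dir = match kind {
        ResourceKind::Css => "css",
        ResourceKind::Js => "js",
    };

    if name.is_empty() {
        // URL ended with '/': fall back to a numbered file name (an assumption).
        format!("{}/resource_{}.{}", dir, index, dir)
    } else {
        format!("{}/{}", dir, name)
    }
}
```

Under these assumptions, `https://example.com/assets/site.css?v=3` maps to `css/site.css`, which is the value the rewritten index.html would reference. Two URLs that share a file name would collide in this sketch; a real tool would need to handle that case.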

Usage

cargo run -- https://example.com

This downloads the HTML, then fetches the linked CSS and JS concurrently, and saves everything to ./saved_site/, typically within seconds.
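
As an illustration of the single-URL command line, here is a rough sketch of argument handling with the standard library only. The function name `url_from_args` and the error messages are hypothetical; prawn's actual argument handling may differ.

```rust
/// Hypothetical helper: extract exactly one http(s) URL from the argument list.
/// The first item is the program name, as with `std::env::args()`.
fn url_from_args<I: Iterator<Item = String>>(mut args: I) -> Result<String, String> {
    let program = args.next().unwrap_or_else(|| "prawn".to_string());
    match (args.next(), args.next()) {
        // Exactly one argument that looks like an http(s) URL.
        (Some(url), None) if url.starts_with("http://") || url.starts_with("https://") => Ok(url),
        // Exactly one argument, but not an http(s) URL.
        (Some(url), None) => Err(format!("not an http(s) URL: {}", url)),
        // Zero arguments or more than one.
        _ => Err(format!("usage: {} <URL>", program)),
    }
}
```

In a binary, this could be called as `url_from_args(std::env::args())`, with the error string printed before exiting.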

Constraints

  • Uses async Rust (tokio) for HTTP I/O and rayon or futures for concurrent downloads.
  • Uses scraper for fast DOM-like parsing.
  • No GUI dependencies or headless browsers (pure HTTP and HTML/CSS/JS).
  • Avoids unsafe code unless absolutely justified and documented.
  • Minimizes unnecessary allocations or cloning.
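
To make the allocation constraint concrete, the sketch below uses `std::borrow::Cow` so that a reference is only copied when it actually has to be rewritten. The helper name `localize` and its parameters are hypothetical and only illustrate the idea; this is not prawn's implementation.

```rust
use std::borrow::Cow;

/// Hypothetical helper: rewrite a reference that starts with `remote_prefix`
/// so it points into `local_dir`. References that do not match are returned
/// as a borrowed slice, so no new String is allocated for them.
fn localize<'a>(reference: &'a str, remote_prefix: &str, local_dir: &str) -> Cow<'a, str> {
    match reference.strip_prefix(remote_prefix) {
        // Matching reference: build the local path (this allocates).
        Some(rest) => Cow::Owned(format!("{}/{}", local_dir, rest.trim_start_matches('/'))),
        // Anything else is passed through untouched (no allocation).
        None => Cow::Borrowed(reference),
    }
}
```

Callers can treat the result as a `&str` in both cases, and only the rewritten references pay for a new `String`.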

License

MIT