this works fine for now
README.md

# prawn

prawncrawler

<!--
Logo Image
Sadly I can't do my cool Styling for the Div :C
-->
<div
  style = "
    display: flex;
    justify-content: center;
  ">
  <img
    src = "assets/logo.png"
    alt = "logo"
    style = "width:50%"
  />
</div>

prawn is an extremely fast Rust web scraper that downloads a webpage's HTML and all linked CSS and JS resources, saving them into a local folder for offline use.

## Features
- High-performance: Uses `reqwest` (with connection pooling), `tokio`, and `rayon` for parallelism.
- CLI tool: Accepts a URL as an argument.
- Downloads and parses HTML as fast as possible.
- Extracts and concurrently downloads all `<link rel="stylesheet">` and `<script src="...">` resources (the extraction step is sketched after this list).
- Rewrites HTML to point to local files and saves it as `saved_site/index.html`.
- All CSS and JS files are saved into `saved_site/css/` and `saved_site/js/` respectively.
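
The extraction step above can be pictured with a short `scraper` sketch. It is illustrative rather than the actual prawn source: the function name `extract_assets` is made up, and the selectors are `unwrap`ped only because they are static strings.

```rust
use scraper::{Html, Selector};

/// Illustrative helper (not the real prawn API): collect stylesheet `href`s
/// and script `src`s from a page.
fn extract_assets(html: &str) -> (Vec<String>, Vec<String>) {
    let doc = Html::parse_document(html);
    // Static selector strings, so these `unwrap`s cannot fail.
    let css_sel = Selector::parse(r#"link[rel="stylesheet"]"#).unwrap();
    let js_sel = Selector::parse("script[src]").unwrap();

    let css = doc
        .select(&css_sel)
        .filter_map(|el| el.value().attr("href"))
        .map(String::from)
        .collect();
    let js = doc
        .select(&js_sel)
        .filter_map(|el| el.value().attr("src"))
        .map(String::from)
        .collect();
    (css, js)
}
```

The two URL lists are what the concurrent downloader works through before the HTML is rewritten to point at the local copies.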
## Usage

```
cargo run -- https://example.com
```

This will download the HTML, CSS, and JS concurrently and save them to `./saved_site/` within seconds.
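
For orientation, here is a minimal sketch of what the entry point behind that command could look like. It is not the actual prawn source: the bare `std::env::args` handling, the `anyhow` error type, and the single-page write are assumptions, and the extraction, concurrent downloads, and HTML rewriting would hook in where noted.

```rust
use std::path::Path;

use reqwest::Client;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // The target URL is the single positional argument (`cargo run -- <url>`).
    let url = std::env::args()
        .nth(1)
        .ok_or_else(|| anyhow::anyhow!("usage: prawn <url>"))?;

    // One shared Client for the whole run so HTTP connections are pooled and reused.
    let client = Client::new();

    // Fetch the page and persist it; extracting, downloading, and rewriting
    // the linked CSS/JS would plug in here.
    let html = client.get(url.as_str()).send().await?.text().await?;
    let out = Path::new("saved_site");
    tokio::fs::create_dir_all(out).await?;
    tokio::fs::write(out.join("index.html"), &html).await?;

    println!("saved {url} to {}", out.join("index.html").display());
    Ok(())
}
```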
## Constraints
- Uses async Rust (`tokio`) for HTTP I/O and `rayon` or `futures` for concurrent downloads (a download sketch follows this list).
- Uses `scraper` for fast DOM-like parsing.
- No GUI dependencies or headless browsers (pure HTTP and HTML/CSS/JS).
- Avoids unsafe code unless absolutely justified and documented.
- Minimizes unnecessary allocations or cloning.
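
The `futures` route for those concurrent downloads could look roughly like the sketch below. Again this is illustrative, not the real implementation: `download_all`, `MAX_CONCURRENT`, and the `anyhow` error handling are assumptions, and real code would resolve relative URLs against the page URL and sanitize file names.

```rust
use std::path::Path;

use futures::stream::{self, StreamExt};
use reqwest::Client;

/// Illustrative cap on in-flight requests.
const MAX_CONCURRENT: usize = 16;

/// Download every URL into `out_dir`, at most `MAX_CONCURRENT` at a time.
async fn download_all(client: &Client, urls: Vec<String>, out_dir: &Path) -> anyhow::Result<()> {
    tokio::fs::create_dir_all(out_dir).await?;

    let downloads = urls.into_iter().map(|url| async move {
        // A single shared Client keeps HTTP connections pooled across requests.
        let body = client.get(url.as_str()).send().await?.bytes().await?;
        // Naive local file name: the last URL segment.
        let name = url.rsplit('/').next().unwrap_or("asset").to_owned();
        tokio::fs::write(out_dir.join(&name), &body).await?;
        anyhow::Ok(())
    });

    // Drive the downloads concurrently with bounded parallelism.
    let results: Vec<_> = stream::iter(downloads)
        .buffer_unordered(MAX_CONCURRENT)
        .collect()
        .await;

    for res in results {
        if let Err(e) = res {
            eprintln!("download failed: {e}");
        }
    }
    Ok(())
}
```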
## License
MIT