Webarchive
Lightweight self-hosted _wayback machine_ that creates HTML and PDF files from your bookmarks.
# Own Webarchive
Aims to be a simple, fast, and easy-to-use web archive for personal or home-network use.
## Supported store formats
* **headers**: save all headers from the response
* **pdf**: save the page as a PDF
* **single_file**: save the HTML and all its resources (CSS, JS, images) into one HTML file
## Requirements
* Go 1.19 or higher
* `wkhtmltopdf` binary in `$PATH` (required to save pages as PDF)
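A quick preflight check can confirm that both tools are installed before starting the server. This is a minimal sketch; `require_cmd` is a hypothetical helper name, not part of the project, and the check does not verify the Go version.

```shell
# Hypothetical helper: succeed if the command is on $PATH, otherwise
# print a message to stderr and return non-zero.
require_cmd() {
  if ! command -v "$1" >/dev/null 2>&1; then
    echo "missing required command: $1" >&2
    return 1
  fi
}

# Check both requirements and remember whether anything is missing.
missing=0
for cmd in go wkhtmltopdf; do
  require_cmd "$cmd" || missing=1
done
```

If `missing` ends up as `1`, install the reported tool first; `go version` prints the installed Go release for a manual check against 1.19.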
## Configuration
The service is configured through environment variables. The available variables
are:
* **DB**
  * **DB_PATH**: path for the database files (default `./db`)
* **LOGGING**
  * **LOGGING_DEBUG**: enable debug logs (default `false`)
* **API**
  * **API_ADDRESS**: address the API server listens on (default `0.0.0.0:5001`)
* **UI**
  * **UI_ENABLED**: enable the built-in web UI (default `true`)
  * **UI_PREFIX**: path prefix for the web UI (default `/`)
  * **UI_THEME**: UI theme name (default `basic`); no other values are available yet
* **PDF**
  * **PDF_LANDSCAPE**: use landscape page orientation instead of portrait (default `false`)
  * **PDF_GRAYSCALE**: apply a grayscale filter to the output PDF (default `false`)
  * **PDF_MEDIA_PRINT**: use media type `print` for the request (default `true`)
  * **PDF_ZOOM**: page zoom factor (default `1.0`, i.e. no zoom)
  * **PDF_VIEWPORT**: viewport size to use (default `1280x720`)
  * **PDF_DPI**: DPI value for the output PDF (default `150`)
  * **PDF_FILENAME**: name of the output PDF file (default `page.pdf`)
*Note*: Any variable name can be prefixed with **WEBARCHIVE_** to avoid
conflicts with other environment variables.
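For example, the prefixed names might be exported before starting the server. The values below are illustrative assumptions, not recommended settings.

```shell
# Illustrative values only; each name is a variable from the list above
# with the optional WEBARCHIVE_ prefix.
export WEBARCHIVE_DB_PATH="$HOME/webarchive/db"
export WEBARCHIVE_API_ADDRESS="127.0.0.1:5001"
export WEBARCHIVE_PDF_LANDSCAPE="true"
export WEBARCHIVE_PDF_DPI="300"
```

Variables exported this way are inherited by a server started from the same shell.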
## ⚡ One-Click Deploy
| Cloud Provider | Deploy Button |
|----------------|---------------|
| AWS | <a href="https://deploystack.io/deploy/derfenix-webarchive?provider=aws&language=cfn"><img src="https://raw.githubusercontent.com/deploystackio/deploy-templates/refs/heads/main/.assets/img/aws.svg" height="38"></a> |
| DigitalOcean | <a href="https://deploystack.io/deploy/derfenix-webarchive?provider=do&language=dop"><img src="https://raw.githubusercontent.com/deploystackio/deploy-templates/refs/heads/main/.assets/img/do.svg" height="38"></a> |
| Render | <a href="https://deploystack.io/deploy/derfenix-webarchive?provider=rnd&language=rnd"><img src="https://raw.githubusercontent.com/deploystackio/deploy-templates/refs/heads/main/.assets/img/rnd.svg" height="38"></a> |
<sub>Generated by <a href="https://deploystack.io/c/derfenix-webarchive" target="_blank">DeployStack.io</a></sub>
## Usage
### 1. Start the server
#### Start without docker
```shell
go run ./cmd/server/main.go
```
#### Change API address
```shell
API_ADDRESS=127.0.0.1:3001 go run ./cmd/server/main.go
```
#### Start in docker
```shell
docker compose up -d webarchive
```
### 2. Add a page
```shell
curl -X POST --location "http://localhost:5001/api/v1/pages" \
-H "Content-Type: application/json" \
-d "{
\"url\": \"https://github.com/wkhtmltopdf/wkhtmltopdf/issues/1937\",
\"formats\": [
\"pdf\",
\"headers\"
]
}" | jq .
```
Alternatively, pass the parameters in the query string:
```shell
curl -X POST --location \
"http://localhost:5001/api/v1/pages?url=https%3A%2F%2Fgithub.com%2Fwkhtmltopdf%2Fwkhtmltopdf%2Fissues%2F1937&formats=pdf%2Cheaders&description=Foo+Bar"
```
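Later steps need the `id` of the new page, so it can help to capture it in a shell variable. The sketch below assumes `jq` is installed and that the response is a JSON object with a top-level `id` field; `page_payload` and `add_page` are hypothetical helper names, not part of the project.

```shell
# Build the JSON body with jq so the URL is escaped correctly.
# Usage: page_payload URL FORMAT...
page_payload() {
  url=$1
  shift
  jq -cn --arg url "$url" '{url: $url, formats: $ARGS.positional}' --args "$@"
}

# POST the page and print the id taken from the response.
# Usage: add_page URL FORMAT...
add_page() {
  curl -sS -X POST "http://localhost:5001/api/v1/pages" \
    -H "Content-Type: application/json" \
    -d "$(page_payload "$@")" | jq -r '.id'
}

# Example (requires a running server):
#   page_id=$(add_page "https://github.com/wkhtmltopdf/wkhtmltopdf/issues/1937" pdf headers)
```

Building the body with `jq` avoids the hand-escaped quotes of the inline JSON and keeps the list of formats easy to change.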
### 3. Get the page's info
```shell
curl -X GET --location "http://localhost:5001/api/v1/pages/$page_id" | jq .
```
where `$page_id` is the value of the `id` field from the previous command's response.
If the `status` field in the response is `success` (or `with_errors`), the `results` field
contains all processed formats with the ids of the stored files.
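The wording above suggests that processing may still be running when the page is first fetched, so a script may need to poll until it finishes. A minimal sketch, assuming `jq` is installed and `status` is a top-level string field. Only `success` and `with_errors` are documented here, so every other value is treated as "not finished yet". `is_finished` and `wait_for_page` are hypothetical helper names, and the 60-attempt limit is arbitrary.

```shell
# Succeed only for the two finished statuses named above.
is_finished() {
  case "$1" in
    success|with_errors) return 0 ;;
    *) return 1 ;;
  esac
}

# Poll once per second, up to 60 times; print the final JSON when done.
# Usage: wait_for_page PAGE_ID
wait_for_page() {
  attempts=0
  while [ "$attempts" -lt 60 ]; do
    page=$(curl -sS "http://localhost:5001/api/v1/pages/$1")
    if is_finished "$(printf '%s' "$page" | jq -r '.status')"; then
      printf '%s\n' "$page"
      return 0
    fi
    attempts=$((attempts + 1))
    sleep 1
  done
  echo "page $1 did not finish in time" >&2
  return 1
}
```

With a running server, `wait_for_page "$page_id" | jq '.results'` would print the stored files once processing is done.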
### 4. Open file in browser
```shell
xdg-open "http://localhost:5001/api/v1/pages/$page_id/file/$file_id"
```
where `$page_id` is the value of the `id` field from the previous command's response, and
`$file_id` is the id of the file you want to open, taken from the `results` field.
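On systems without `xdg-open`, or to keep a local copy, the same URL can be fetched with `curl`. A sketch only: `file_url` and `download_file` are hypothetical helper names, and `page.pdf` is just an example output name.

```shell
# Build the file URL from a page id and a file id.
file_url() {
  printf 'http://localhost:5001/api/v1/pages/%s/file/%s\n' "$1" "$2"
}

# Save a stored file to a local path; -f makes curl fail on HTTP errors.
# Usage: download_file PAGE_ID FILE_ID OUTPUT_PATH
download_file() {
  curl -fsS -o "$3" "$(file_url "$1" "$2")"
}

# Example (requires a running server and real ids):
#   download_file "$page_id" "$file_id" page.pdf
```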
### 5. List all stored pages
```shell
curl -X GET --location "http://localhost:5001/api/v1/pages" | jq .
```
## Roadmap
- [x] Save page to pdf
- [x] Save URL headers
- [x] Save page to the single-page html
- [ ] Save page to html with separate resource files (?)
- [ ] Basic web UI
- [ ] Optional authentication
- [ ] Multi-user access
- [ ] Support SQL database with or without separate files storage
- [ ] Tags/Categories
- [ ] Save page to markdown