A data / AI project that captures and analyses how the front pages of major media sites evolve over time.
A live version is available here: http://18.171.236.162:8000/
(Please forgive the ugly UI 🥹 ! You'd better use it on desktop, though there is an almost readable mobile version)
* Hosted on an AWS free-tier EC2 instance + "12 months free" RDS database, managed with Terraform
This project aims to observe which subjects news media put forward on their websites.
The basic process consists of :
A basic web UI is available to display the results.
At the moment, 6 sites are supported (see them there) but the list will expand over time.
None of this would be possible without the incredible Wayback Machine and the volunteers who have helped set up the snapshotting of all those sites for decades.
First you need to set up a PostgreSQL server and create a database. Its path and credentials are stored in a file named .secrets.toml under the key database_url.
database_url="postgresql://user:password@yourdomain.com:port/database_name"
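As a quick sanity check before running anything, the URL can be parsed to confirm that each part is present. The following is a minimal sketch, not part of the project: the function names `check_database_url` and `load_database_url` are hypothetical, the port `5432` in the usage note is only the usual PostgreSQL default, and reading `database_url` as a top-level key of `.secrets.toml` is an assumption based on the example line above.

```python
from urllib.parse import urlsplit


def check_database_url(url: str) -> dict:
    """Split a PostgreSQL URL into its parts and fail early if one is missing."""
    parts = urlsplit(url)
    if parts.scheme not in ("postgresql", "postgres"):
        raise ValueError(f"unexpected scheme: {parts.scheme!r}")
    database = parts.path.lstrip("/")
    if not parts.hostname or not database:
        raise ValueError("host and database name are both required")
    return {
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port,  # None when the URL carries no port
        "database": database,
    }


def load_database_url(path: str = ".secrets.toml") -> str:
    """Read database_url from the secrets file (assumed to be a top-level key)."""
    import tomllib  # standard library from Python 3.11 onwards

    with open(path, "rb") as fh:
        return tomllib.load(fh)["database_url"]
```

For instance, `check_database_url("postgresql://user:password@yourdomain.com:5432/database_name")` returns the user, host, port and database name as a dictionary, while a URL with another scheme raises `ValueError`.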
rye sync --no-lock --no-dev --all-features
Set up your preferences by updating the configuration file
rye run snapshots
rye run embeddings
rye run similarity_index
rye run web_server
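If you prefer to launch the four commands above from a single script, a small wrapper along these lines could work. This is only a sketch under assumptions: `run_pipeline` and `PIPELINE_STEPS` are hypothetical names, and running the steps in the order listed, stopping at the first failure, is an assumption rather than documented behaviour of the project.

```python
import subprocess

# Script names copied from the commands above; the order is an assumption.
PIPELINE_STEPS = ["snapshots", "embeddings", "similarity_index", "web_server"]


def run_pipeline(steps=PIPELINE_STEPS, runner=subprocess.run):
    """Run each `rye run <step>` in turn, stopping at the first failure."""
    executed = []
    for step in steps:
        command = ["rye", "run", step]
        # check=True makes subprocess.run raise CalledProcessError on a non-zero exit
        runner(command, check=True)
        executed.append(command)
    return executed


if __name__ == "__main__":
    run_pipeline()
```

Since the last step starts the web server, it presumably keeps running until interrupted, so the call only returns once the server stops.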