A data and AI project that captures and analyses how the front pages of major media sites evolve over time.
The basic process consists of:
A basic web UI is available to display the results.
Six sites are currently supported (see them there), but the list will expand over time.
None of this would be possible without the incredible Wayback Machine and the volunteers who have helped set up the snapshotting of all those sites for decades.
First, you need to set up a PostgreSQL server and create a database. Store its path and credentials in a file named .secrets.toml under the key database_url.
database_url="postgresql://user:password@yourdomain.com:port/database_name"

rye sync --no-lock --no-dev --all-features

Set up your preferences by updating the configuration file.

rye run snapshots
rye run embeddings
rye run similarity_index
rye run web_server
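
The four `rye run` commands can also be chained from a small script. This sketch assumes they are meant to run in the order listed and that each must succeed before the next starts. Neither assumption is confirmed by the project, and `run_pipeline` and `PIPELINE_STEPS` are hypothetical names.

```python
import subprocess
from typing import Callable, Sequence

# Script names taken from the commands listed above, in the same order
# (assumed, not confirmed, to be the intended execution order).
PIPELINE_STEPS = ("snapshots", "embeddings", "similarity_index", "web_server")


def run_pipeline(
    steps: Sequence[str] = PIPELINE_STEPS,
    runner: Callable = subprocess.run,
) -> list[str]:
    """Run `rye run <step>` for each step, stopping at the first failure.

    `runner` defaults to subprocess.run and can be swapped for a stub
    when testing without rye installed.
    """
    completed = []
    for step in steps:
        result = runner(["rye", "run", step], check=False)
        if result.returncode != 0:
            raise RuntimeError(
                f"step {step!r} failed with exit code {result.returncode}"
            )
        completed.append(step)
    return completed
```

Calling `run_pipeline()` launches the steps one after another. Because `web_server` presumably keeps running until interrupted, it is placed last. Pass a shorter `steps` sequence to refresh the data without starting the UI.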