de_quoi_parle_le_monde

Bugs

Non-uniqueness of featured_article_snapshot_id in snapshot_apparitions view

In the snapshot_apparitions view, the field featured_article_snapshot_id is treated as if it were unique per row, but it is not.

This can be checked with the following query:

SELECT * FROM (
    SELECT featured_article_snapshot_id, json_group_array(snapshot_id), COUNT(*) as count
    FROM snapshot_apparitions
    WHERE is_main -- Not required
    GROUP BY featured_article_snapshot_id
)
WHERE count > 1

Among other things, this leads to dead ends while browsing the UI, likely because the timestamp search and the time diff rely on this false assumption.

2024-05-23: This is likely no longer relevant now that the URLs include the timestamp instead of the snapshot_id.
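
The uniqueness check can also be reproduced on a toy database. The following is a minimal sketch, assuming an in-memory SQLite table that stands in for the real snapshot_apparitions view; the column types and the sample rows are invented for illustration, and `find_shared_featured_ids` is a hypothetical helper, not part of the project.

```python
import sqlite3


def find_shared_featured_ids(conn):
    """Return (featured_article_snapshot_id, row count) for ids seen on several rows."""
    return conn.execute(
        """
        SELECT featured_article_snapshot_id, COUNT(*) AS n
        FROM snapshot_apparitions
        GROUP BY featured_article_snapshot_id
        HAVING COUNT(*) > 1
        ORDER BY featured_article_snapshot_id
        """
    ).fetchall()


# Toy stand-in for the real view (assumption: a plain table with these columns).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE snapshot_apparitions ("
    "snapshot_id INTEGER, featured_article_snapshot_id INTEGER, is_main INTEGER)"
)
conn.executemany(
    "INSERT INTO snapshot_apparitions VALUES (?, ?, ?)",
    [(1, 10, 1), (2, 10, 1), (3, 11, 1)],  # id 10 appears on two rows
)

duplicates = find_shared_featured_ids(conn)
print(duplicates)
```

Any row returned by the helper is a featured_article_snapshot_id that cannot safely be used as a per-row key.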

Different virtual timestamp, same timestamp

The snapshot process ends up choosing the same snapshot for different virtual timestamps.

This can be checked with the following query:

SELECT
    sv.id, sv.site_id, sv2.id, sv2.site_id, sv.timestamp_virtual, sv2.timestamp_virtual, sv2.timestamp
FROM snapshots_view sv
CROSS JOIN snapshots_view sv2
WHERE
    sv.id != sv2.id
    AND sv.timestamp = sv2.timestamp
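
The query above returns every colliding pair twice, once as (a, b) and once as (b, a). The sketch below shows a variant that lists each pair once by requiring `sv.id < sv2.id`. It runs against a toy in-memory table standing in for snapshots_view. Two things are assumptions made for the example and are not taken from the project: timestamps are stored as integers, and only snapshots of the same site are compared.

```python
import sqlite3

# Toy stand-in for snapshots_view (assumption: integer timestamps).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE snapshots_view ("
    "id INTEGER, site_id INTEGER, timestamp INTEGER, timestamp_virtual INTEGER)"
)
conn.executemany(
    "INSERT INTO snapshots_view VALUES (?, ?, ?, ?)",
    [
        (1, 1, 1000, 900),   # virtual timestamp 900 resolved to snapshot 1000
        (2, 1, 1000, 1800),  # virtual timestamp 1800 resolved to the same snapshot
        (3, 1, 2000, 2100),
    ],
)

# Self-join that keeps each colliding pair once (sv.id < sv2.id) and only
# compares snapshots of the same site (assumption).
pairs = conn.execute(
    """
    SELECT sv.id, sv2.id, sv.timestamp
    FROM snapshots_view sv
    JOIN snapshots_view sv2
      ON sv.site_id = sv2.site_id
     AND sv.timestamp = sv2.timestamp
     AND sv.id < sv2.id
    """
).fetchall()
print(pairs)
```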

Weird choices of snapshot

Some snapshots are chosen even though they are up to 5 or 6 hours too early or too late. The difference between the actual and the virtual timestamp can be listed with this query:

SELECT timestamp-timestamp_virtual AS difference, * FROM snapshots_view
ORDER BY difference
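
The README does not describe how a snapshot is selected for a virtual timestamp. As a purely illustrative sketch, the hypothetical helper below picks the available snapshot closest to the requested time and reports whether it falls within a tolerance, so that choices several hours off can be flagged instead of silently accepted. The function name, the tolerance value and the sample times are all assumptions.

```python
from datetime import datetime, timedelta, timezone


def closest_snapshot(available, target, tolerance):
    """Return (closest timestamp, whether it lies within `tolerance` of `target`)."""
    best = min(available, key=lambda ts: abs(ts - target))
    return best, abs(best - target) <= tolerance


def utc(hour, minute):
    """Build a tz-aware UTC datetime on an arbitrary sample day."""
    return datetime(2024, 5, 23, hour, minute, tzinfo=timezone.utc)


available = [utc(6, 10), utc(8, 5), utc(13, 40)]
one_hour = timedelta(hours=1)

good = closest_snapshot(available, utc(8, 0), one_hour)   # 5 minutes away
bad = closest_snapshot(available, utc(18, 0), one_hour)   # more than 4 hours away
print(good)
print(bad)
```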

Time zones

We want snapshots taken according to a given local time, but at the moment the requested time is interpreted as UTC.
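
A minimal sketch of the intended conversion, using the standard `zoneinfo` module: a naive local wall-clock time is attached to a named time zone and then converted to a tz-aware UTC datetime. The helper name `local_to_utc` and the choice of "Europe/Paris" are assumptions made for the example; the project's actual code may handle this differently.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def local_to_utc(naive_local, tz_name):
    """Interpret a naive datetime as wall-clock time in `tz_name`; return it in UTC."""
    if naive_local.tzinfo is not None:
        raise ValueError("expected a naive datetime")
    return naive_local.replace(tzinfo=ZoneInfo(tz_name)).astimezone(timezone.utc)


# The same local time, 08:00, maps to different UTC times depending on the
# season because of daylight saving time.
utc_summer = local_to_utc(datetime(2024, 5, 23, 8, 0), "Europe/Paris")
utc_winter = local_to_utc(datetime(2024, 1, 23, 8, 0), "Europe/Paris")
print(utc_summer.isoformat())  # 2024-05-23T06:00:00+00:00
print(utc_winter.isoformat())  # 2024-01-23T07:00:00+00:00
```

Treating 08:00 as UTC instead would request a snapshot one or two hours later than intended for a site in that time zone.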