No description

jherve 91fa1e33ef Bump to version 0.2.0 1 year ago
src 52b7d4b4de Change the project name 1 year ago
static 3501a02749 Add logo information and display it in the UI 1 year ago
templates 70d9705643 [fix] URLs now use the virtual_timestamp instead of an ID 1 year ago
tests 1c91c4bfe6 Initial commit 1 year ago
.gitignore d51ff03bb2 Switch to Annoy for vector search/indexing 1 year ago
CHANGELOG.md 91fa1e33ef Bump to version 0.2.0 1 year ago
README.md 52b7d4b4de Change the project name 1 year ago
config.py caa854cba8 Format some file 1 year ago
install-git-hooks.sh d120dfd089 Add a script to install commit git hooks 1 year ago
pyproject.toml 91fa1e33ef Bump to version 0.2.0 1 year ago
requirements-dev.lock cf34257719 Extend duration display to similar articles 1 year ago
requirements-embeddings.lock 5d907e27a8 Add a requirements file for embeddings 1 year ago
requirements.lock cf34257719 Extend duration display to similar articles 1 year ago
settings.toml 207cc110db Move database URL to secrets 1 year ago

README.md

media_observer

Bugs

Non-uniqueness of featured_article_snapshot_id in snapshot_apparitions view

In the snapshot_apparitions view, the field featured_article_snapshot_id is treated as if it were unique per row, but it is not.

This can be easily checked with this query:

SELECT * FROM (
    SELECT featured_article_snapshot_id, json_group_array(snapshot_id), COUNT(*) as count
    FROM snapshot_apparitions
    WHERE is_main -- Not required
    GROUP BY featured_article_snapshot_id
)
WHERE count > 1

Among other things, it leads to dead ends while browsing the UI, likely because the timestamp search and time diff rely on this false assumption.

2024-05-23: This is likely not relevant anymore now that the URLs include the timestamp and not the snapshot_id.
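The uniqueness check can also be scripted, for instance as a regression test. The following is a minimal sketch using Python's built-in sqlite3 module, assuming a SQLite database. The table created here is a toy stand-in for the real snapshot_apparitions view: its columns are limited to the ones named above, and the function name find_non_unique_ids is hypothetical.

```python
import sqlite3


def find_non_unique_ids(conn):
    """Return (featured_article_snapshot_id, row_count) for ids seen more than once."""
    query = """
        SELECT featured_article_snapshot_id, COUNT(*) AS row_count
        FROM snapshot_apparitions
        GROUP BY featured_article_snapshot_id
        HAVING COUNT(*) > 1
        ORDER BY featured_article_snapshot_id
    """
    return conn.execute(query).fetchall()


# Toy data (assumption): the real view has more columns and far more rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE snapshot_apparitions ("
    "snapshot_id INTEGER, featured_article_snapshot_id INTEGER, is_main INTEGER)"
)
conn.executemany(
    "INSERT INTO snapshot_apparitions VALUES (?, ?, ?)",
    [(1, 10, 1), (2, 10, 1), (3, 11, 1)],
)

duplicates = find_non_unique_ids(conn)
print(duplicates)  # featured_article_snapshot_id 10 appears in two rows
```

An empty result would mean the uniqueness assumption holds for the data at hand.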

Different virtual timestamp, same timestamp

The snapshot process ends up choosing the same snapshot for different virtual timestamps.

This can be checked with this query:

SELECT
    sv.id, sv.site_id, sv2.id, sv2.site_id, sv.timestamp_virtual, sv2.timestamp_virtual, sv2.timestamp
FROM snapshots_view sv
CROSS JOIN snapshots_view sv2
WHERE
    sv.id != sv2.id
    AND sv.timestamp = sv2.timestamp
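Because the query above compares ids with !=, every offending pair shows up twice, once in each order. A possible variant, sketched below with Python's sqlite3 module, joins on sv.id < sv2.id so that each pair is reported once. The toy snapshots_view table, its ISO-8601 text timestamps and the function name pairs_sharing_timestamp are assumptions made for illustration, not the project's actual schema.

```python
import sqlite3


def pairs_sharing_timestamp(conn):
    """Return each pair of snapshots with the same real timestamp exactly once."""
    query = """
        SELECT sv.id, sv2.id, sv.timestamp_virtual, sv2.timestamp_virtual, sv.timestamp
        FROM snapshots_view sv
        JOIN snapshots_view sv2
          ON sv.id < sv2.id
         AND sv.timestamp = sv2.timestamp
        ORDER BY sv.id, sv2.id
    """
    return conn.execute(query).fetchall()


# Toy data (assumption): timestamps stored as ISO-8601 text.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE snapshots_view ("
    "id INTEGER, site_id INTEGER, timestamp TEXT, timestamp_virtual TEXT)"
)
conn.executemany(
    "INSERT INTO snapshots_view VALUES (?, ?, ?, ?)",
    [
        (1, 1, "2024-05-01T08:05:00", "2024-05-01T08:00:00"),
        (2, 1, "2024-05-01T08:05:00", "2024-05-01T12:00:00"),  # same snapshot reused
        (3, 1, "2024-05-01T18:02:00", "2024-05-01T18:00:00"),
    ],
)

pairs = pairs_sharing_timestamp(conn)
print(pairs)
```

If snapshots from different sites can legitimately share a timestamp, adding sv.site_id = sv2.site_id to the join condition would narrow the result to pairs within one site.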

Weird choices of snapshot

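Some snapshots are chosen even though they are up to 5 or 6 hours too early or too late. Sorting by the difference between the actual and virtual timestamps puts the worst cases at both ends of the result: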

SELECT timestamp-timestamp_virtual AS difference, * FROM snapshots_view
ORDER BY difference
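To single out only the problematic snapshots, the difference can be expressed in hours and filtered against a tolerance. The sketch below uses Python's sqlite3 module and SQLite's julianday() function, assuming timestamps are stored as ISO-8601 text; if they are stored as numbers, the plain subtraction used above is enough. The toy table, the max_hours threshold and the function name snapshots_far_from_target are hypothetical.

```python
import sqlite3


def snapshots_far_from_target(conn, max_hours=1.0):
    """Return (id, offset_hours) for snapshots further than max_hours from their virtual timestamp."""
    query = """
        SELECT id, offset_hours
        FROM (
            SELECT id,
                   ROUND((julianday(timestamp) - julianday(timestamp_virtual)) * 24, 2)
                       AS offset_hours
            FROM snapshots_view
        )
        WHERE ABS(offset_hours) > ?
        ORDER BY ABS(offset_hours) DESC
    """
    return conn.execute(query, (max_hours,)).fetchall()


# Toy data (assumption): timestamps stored as ISO-8601 text.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE snapshots_view ("
    "id INTEGER, site_id INTEGER, timestamp TEXT, timestamp_virtual TEXT)"
)
conn.executemany(
    "INSERT INTO snapshots_view VALUES (?, ?, ?, ?)",
    [
        (1, 1, "2024-05-01 08:05:00", "2024-05-01 08:00:00"),  # 5 minutes late
        (2, 1, "2024-05-01 13:30:00", "2024-05-01 08:00:00"),  # 5.5 hours late
        (3, 1, "2024-05-01 12:00:00", "2024-05-01 18:00:00"),  # 6 hours early
    ],
)

outliers = snapshots_far_from_target(conn, max_hours=1.0)
print(outliers)
```

A positive offset means the chosen snapshot was taken after the virtual timestamp, a negative one means it was taken before.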