jherve 91fa1e33ef Bump to version 0.2.0 1 year ago
src 52b7d4b4de Change the project name 1 year ago
static 3501a02749 Add logo information and display it in the UI 1 year ago
templates 70d9705643 [fix] URLs now use the virtual_timestamp instead of an ID 1 year ago
tests 1c91c4bfe6 Initial commit 1 year ago
.gitignore d51ff03bb2 Switch to Annoy for vector search/indexing 1 year ago
CHANGELOG.md 91fa1e33ef Bump to version 0.2.0 1 year ago
README.md 52b7d4b4de Change the project name 1 year ago
config.py caa854cba8 Format some file 1 year ago
install-git-hooks.sh d120dfd089 Add a script to install commit git hooks 1 year ago
pyproject.toml 91fa1e33ef Bump to version 0.2.0 1 year ago
requirements-dev.lock cf34257719 Extend duration display to similar articles 1 year ago
requirements-embeddings.lock 5d907e27a8 Add a requirements file for embeddings 1 year ago
requirements.lock cf34257719 Extend duration display to similar articles 1 year ago
settings.toml 207cc110db Move database URL to secrets 1 year ago

README.md

media_observer

Bugs

Non-uniqueness of featured_article_snapshot_id in snapshot_apparitions view

In the snapshot_apparitions view, the field featured_article_snapshot_id is treated as if it were unique per row, but it is not.

This can be checked with the following query:

SELECT * FROM (
    SELECT featured_article_snapshot_id, json_group_array(snapshot_id), COUNT(*) as count
    FROM snapshot_apparitions
    WHERE is_main -- Not required
    GROUP BY featured_article_snapshot_id
)
WHERE count > 1
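
As an illustration, the same check can be run from Python with the standard sqlite3 module. This is only a sketch: the in-memory table stands in for the real snapshot_apparitions view, its three columns and the helper name `find_non_unique` are assumptions, and the query uses `HAVING` instead of a subquery, which should be equivalent for counting purposes.

```python
import sqlite3


def find_non_unique(conn: sqlite3.Connection) -> list:
    """Return (featured_article_snapshot_id, row_count) for ids seen more than once."""
    query = """
        SELECT featured_article_snapshot_id, COUNT(*) AS n
        FROM snapshot_apparitions
        GROUP BY featured_article_snapshot_id
        HAVING COUNT(*) > 1
        ORDER BY featured_article_snapshot_id
    """
    return conn.execute(query).fetchall()


# Toy stand-in for the real view; the schema below is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE snapshot_apparitions ("
    "snapshot_id INTEGER, featured_article_snapshot_id INTEGER, is_main INTEGER)"
)
conn.executemany(
    "INSERT INTO snapshot_apparitions VALUES (?, ?, ?)",
    [(1, 10, 1), (2, 10, 1), (3, 11, 1)],
)

duplicates = find_non_unique(conn)
print(duplicates)  # id 10 appears in two rows, id 11 in one
```

An empty result would mean the uniqueness assumption holds for the data at hand.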

Among other things, it leads to dead ends while browsing the UI, likely because the timestamp search and the time diff rely on this false assumption.

2024-05-23: This is likely no longer relevant now that the URLs include the timestamp rather than the snapshot_id.

Different virtual timestamp, same timestamp

The snapshot process ends up choosing the same snapshot for different virtual timestamps.

This can be checked with the following query:

SELECT
    sv.id, sv.site_id, sv2.id, sv2.site_id, sv.timestamp_virtual, sv2.timestamp_virtual, sv2.timestamp
FROM snapshots_view sv
CROSS JOIN snapshots_view sv2
WHERE
    sv.id != sv2.id
    AND sv.timestamp = sv2.timestamp
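
A possible variation of this check, sketched in Python with the standard sqlite3 module, is shown here. The table is a toy stand-in for snapshots_view with integer timestamps, and the helper name `same_timestamp_pairs` is hypothetical. Unlike the query above, it joins on `sv.id < sv2.id` so that each pair is reported only once rather than in both orders.

```python
import sqlite3


def same_timestamp_pairs(conn: sqlite3.Connection) -> list:
    """Return pairs of snapshots that share a real timestamp, each pair once."""
    query = """
        SELECT sv.id, sv2.id, sv.timestamp_virtual, sv2.timestamp_virtual, sv.timestamp
        FROM snapshots_view sv
        JOIN snapshots_view sv2
          ON sv.id < sv2.id AND sv.timestamp = sv2.timestamp
        ORDER BY sv.id, sv2.id
    """
    return conn.execute(query).fetchall()


# Toy stand-in for the real view; columns and integer timestamps are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE snapshots_view ("
    "id INTEGER, site_id INTEGER, timestamp_virtual INTEGER, timestamp INTEGER)"
)
conn.executemany(
    "INSERT INTO snapshots_view VALUES (?, ?, ?, ?)",
    [
        (1, 1, 1000, 990),   # virtual timestamp 1000 resolved to snapshot at 990
        (2, 1, 2000, 990),   # virtual timestamp 2000 resolved to the same snapshot
        (3, 1, 3000, 3010),  # unrelated snapshot
    ],
)

pairs = same_timestamp_pairs(conn)
print(pairs)
```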

Weird choices of snapshot

Some snapshots are chosen even though they are up to 5 or 6 hours too early or too late relative to their virtual timestamp.

SELECT timestamp-timestamp_virtual AS difference, * FROM snapshots_view
ORDER BY difference
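
To focus on the worst offenders, the difference can be converted to hours and filtered against a threshold. The sketch below assumes that both timestamp columns hold Unix epoch seconds, which the README does not state. The `max_hours` parameter and the helper name `far_snapshots` are hypothetical, and the table is a toy stand-in for snapshots_view.

```python
import sqlite3


def far_snapshots(conn: sqlite3.Connection, max_hours: float = 1.0) -> list:
    """Return (id, diff_hours) for snapshots further than max_hours from their virtual timestamp."""
    query = """
        SELECT id, (timestamp - timestamp_virtual) / 3600.0 AS diff_hours
        FROM snapshots_view
        WHERE ABS(timestamp - timestamp_virtual) > ? * 3600
        ORDER BY ABS(timestamp - timestamp_virtual) DESC
    """
    return conn.execute(query, (max_hours,)).fetchall()


# Toy stand-in for the real view; epoch-second timestamps are an assumption.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE snapshots_view ("
    "id INTEGER, site_id INTEGER, timestamp_virtual INTEGER, timestamp INTEGER)"
)
conn.executemany(
    "INSERT INTO snapshots_view VALUES (?, ?, ?, ?)",
    [
        (1, 1, 0, 600),           # 10 minutes late: within tolerance
        (2, 1, 86400, 68400),     # 5 hours early
        (3, 1, 172800, 194400),   # 6 hours late
    ],
)

outliers = far_snapshots(conn, max_hours=1.0)
print(outliers)
```

Negative values of diff_hours indicate a snapshot taken before the virtual timestamp, positive values one taken after it.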