jherve d120dfd089 Add a script to install commit git hooks 1 year ago
src 33981d114d Remove useless "run" method from jobs 1 year ago
static 3501a02749 Add logo information and display it in the UI 1 year ago
templates 70d9705643 [fix] URLs now use the virtual_timestamp instead of an ID 1 year ago
tests 1c91c4bfe6 Initial commit 1 year ago
.gitignore d51ff03bb2 Switch to Annoy for vector search/indexing 1 year ago
README.md f1f96c934c Setup timezone in "web" part so that timestamps are properly displayed 1 year ago
config.py caa854cba8 Format some file 1 year ago
install-git-hooks.sh d120dfd089 Add a script to install commit git hooks 1 year ago
pyproject.toml cf34257719 Extend duration display to similar articles 1 year ago
requirements-dev.lock cf34257719 Extend duration display to similar articles 1 year ago
requirements-embeddings.lock 5d907e27a8 Add a requirements file for embeddings 1 year ago
requirements.lock cf34257719 Extend duration display to similar articles 1 year ago
settings.toml 207cc110db Move database URL to secrets 1 year ago

README.md

de_quoi_parle_le_monde

Bugs

Non-uniqueness of featured_article_snapshot_id in snapshot_apparitions view

In the snapshot_apparitions view, the field featured_article_snapshot_id is treated as if it were unique per row, but it is not.

This can easily be checked with this query:

SELECT * FROM (
    SELECT featured_article_snapshot_id, json_group_array(snapshot_id), COUNT(*) as count
    FROM snapshot_apparitions
    WHERE is_main -- Not required
    GROUP BY featured_article_snapshot_id
)
WHERE count > 1

Among other things, this non-uniqueness leads to dead ends while browsing the UI, likely because the timestamp search and the time diff both rely on this false assumption.

2024-05-23: This is likely no longer relevant now that the URLs include the timestamp rather than the snapshot_id.
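To try the uniqueness check without touching the real database, here is a minimal sketch that runs it against an in-memory SQLite table. The three-column table is a hypothetical, simplified stand-in for the snapshot_apparitions view, the sample rows are made up, and group_concat is swapped in for json_group_array so the sketch does not depend on SQLite's JSON functions being available.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical, simplified stand-in for the snapshot_apparitions view.
conn.execute(
    "CREATE TABLE snapshot_apparitions ("
    "snapshot_id INTEGER, featured_article_snapshot_id INTEGER, is_main INTEGER)"
)
conn.executemany(
    "INSERT INTO snapshot_apparitions VALUES (?, ?, ?)",
    [
        (1, 100, 1),
        (2, 100, 1),  # same featured article snapshot, seen in a second snapshot
        (3, 101, 1),
    ],
)

# Same idea as the query above, written with HAVING instead of a subquery.
DUPLICATES_QUERY = """
SELECT featured_article_snapshot_id,
       group_concat(snapshot_id) AS snapshot_ids,
       COUNT(*) AS apparition_count
FROM snapshot_apparitions
GROUP BY featured_article_snapshot_id
HAVING COUNT(*) > 1
"""

duplicates = conn.execute(DUPLICATES_QUERY).fetchall()
for featured_id, snapshot_ids, apparition_count in duplicates:
    print(featured_id, snapshot_ids, apparition_count)
```

Any row returned here is a featured_article_snapshot_id that appears in more than one snapshot, which is exactly the case the view does not account for.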

Different virtual timestamp, same timestamp

The snapshot process ends up choosing the same snapshot for different virtual timestamps.

This can be checked with this query:

SELECT
    sv.id, sv.site_id, sv2.id, sv2.site_id, sv.timestamp_virtual, sv2.timestamp_virtual, sv2.timestamp
FROM snapshots_view sv
CROSS JOIN snapshots_view sv2
WHERE
    sv.id != sv2.id
    AND sv.timestamp = sv2.timestamp
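The query above returns every pair twice (once in each direction) and also matches rows from different sites that happen to share a timestamp. A possible variant, sketched here against a hypothetical in-memory stand-in for snapshots_view with made-up rows, lists each pair once and only within a site. The same-site restriction is an assumption about what counts as "the same snapshot", not something the project states.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical, simplified stand-in for snapshots_view.
conn.execute(
    "CREATE TABLE snapshots_view ("
    "id INTEGER, site_id INTEGER, timestamp INTEGER, timestamp_virtual INTEGER)"
)
conn.executemany(
    "INSERT INTO snapshots_view VALUES (?, ?, ?, ?)",
    [
        (1, 10, 1000, 900),
        (2, 10, 1000, 1800),  # same site, same real timestamp, other virtual timestamp
        (3, 11, 1000, 900),   # other site: same timestamp, but a different snapshot
        (4, 10, 2700, 2700),
    ],
)

# sv.id < sv2.id keeps one row per pair; the site_id condition is an assumption.
PAIRS_QUERY = """
SELECT sv.id, sv2.id, sv.site_id,
       sv.timestamp_virtual, sv2.timestamp_virtual, sv.timestamp
FROM snapshots_view sv
JOIN snapshots_view sv2
  ON sv.site_id = sv2.site_id
 AND sv.timestamp = sv2.timestamp
 AND sv.id < sv2.id
ORDER BY sv.id, sv2.id
"""

pairs = conn.execute(PAIRS_QUERY).fetchall()
for pair in pairs:
    print(pair)
```

With the sample rows, only snapshots 1 and 2 are reported, since they share both a site and a real timestamp while covering different virtual timestamps.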

Weird choices of snapshot

Some snapshots are chosen even though they are up to 5 or 6 hours earlier or later than their virtual timestamp.

SELECT timestamp-timestamp_virtual AS difference, * FROM snapshots_view
ORDER BY difference
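To make the outliers easier to spot, the difference can be expressed in hours and filtered against a tolerance. The sketch below assumes that timestamp and timestamp_virtual are stored as Unix timestamps in seconds, which is not confirmed by the schema shown here. The table is a hypothetical stand-in for snapshots_view with made-up rows, and MAX_DRIFT_SECONDS is an illustrative threshold rather than a project setting.

```python
import sqlite3

MAX_DRIFT_SECONDS = 3600  # hypothetical tolerance, not a value from the project
BASE = 1_700_000_000      # arbitrary Unix timestamp used for the sample rows

conn = sqlite3.connect(":memory:")

# Hypothetical, simplified stand-in for snapshots_view (timestamps in seconds).
conn.execute(
    "CREATE TABLE snapshots_view ("
    "id INTEGER, site_id INTEGER, timestamp INTEGER, timestamp_virtual INTEGER)"
)
conn.executemany(
    "INSERT INTO snapshots_view VALUES (?, ?, ?, ?)",
    [
        (1, 10, BASE, BASE),             # exact match
        (2, 10, BASE - 5 * 3600, BASE),  # 5 hours too early
        (3, 11, BASE + 6 * 3600, BASE),  # 6 hours too late
        (4, 11, BASE + 600, BASE),       # 10 minutes late, within tolerance
    ],
)

# Keep only snapshots whose drift exceeds the tolerance, largest drift first.
OUTLIERS_QUERY = """
SELECT id, site_id, (timestamp - timestamp_virtual) / 3600.0 AS drift_hours
FROM snapshots_view
WHERE ABS(timestamp - timestamp_virtual) > ?
ORDER BY ABS(timestamp - timestamp_virtual) DESC
"""

outliers = conn.execute(OUTLIERS_QUERY, (MAX_DRIFT_SECONDS,)).fetchall()
for snapshot_id, site_id, drift_hours in outliers:
    print(snapshot_id, site_id, drift_hours)
```

A negative drift_hours means the chosen snapshot is earlier than its virtual timestamp, a positive one means it is later.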