
Remove DISTINCT from game queries, refix #901 (#1314)

Merged: dfabulich merged 1 commit into main from search-new-distinct-performance on Aug 16, 2025

@dfabulich commented on Aug 16, 2025

In #902 I added DISTINCT to the game search query to fix a bug where searches by competition could show duplicates if the same game appeared in multiple divisions.

It turns out that adding DISTINCT prevents the query planner from doing fast index scans. For example, when simply querying for the newest games, it would scan every row in the index rather than just the N newest rows.

So I've removed DISTINCT in this commit and re-fixed #901 by joining against a subquery on compgames that has its own DISTINCT clause.

Fixes #1287
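
To illustrate the re-fix, here is a minimal sketch that reproduces the duplicate-row problem and the derived-table fix on toy data. It uses Python's built-in sqlite3 module rather than MariaDB so it can run anywhere. The two-column games table, the division column, and all ids are assumptions made for illustration; they are not the real IFDB schema.

```python
import sqlite3

# In-memory SQLite database standing in for MariaDB (assumption for portability).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table games (id text primary key, title text);
    -- 'division' is a hypothetical column; the real compgames schema is not shown here
    create table compgames (compid text, gameid text, division text);
    insert into games values ('g1', 'Game One'), ('g2', 'Game Two');
    -- g1 is entered in two divisions of the same competition
    insert into compgames values
        ('comp1', 'g1', 'main'),
        ('comp1', 'g1', 'other'),
        ('comp1', 'g2', 'main');
""")

# Plain join: one output row per matching compgames row, so g1 appears twice.
plain_join = conn.execute("""
    select games.id from games
    inner join compgames
        on compgames.gameid = games.id and compgames.compid = 'comp1'
    order by games.id
""").fetchall()

# Join against a derived table that deduplicates gameid first,
# so the outer query needs no DISTINCT of its own.
dedup_join = conn.execute("""
    select games.id from games
    inner join (select distinct gameid from compgames
                where compid = 'comp1') as compgames
        on compgames.gameid = games.id
    order by games.id
""").fetchall()

print(plain_join)  # [('g1',), ('g1',), ('g2',)]
print(dedup_join)  # [('g1',), ('g2',)]
```

Because the deduplication is confined to the compgames subquery, queries that never join compgames (such as the plain "newest games" listing) carry no DISTINCT at all.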

Before

Simple query for new games, full scan

http://localhost:8080/search?browse&sortby=lnew

MariaDB [ifdb]> analyze select
    ->     distinct games.id as id,
    ->             games.title as title,
    ->             games.author as author,
    ->             games.desc as description,
    ->             games.tags as tags,
    ->             games.created as createdate,
    ->             games.moddate as moddate,
    ->             games.system as devsys,
    ->             if (time(games.published) = '00:00:00',
    ->                 date_format(games.published, '%Y'),
    ->                 date_format(games.published, '%M %e, %Y'))
    ->                 as pubfmt,
    ->             if (time(games.published) = '00:00:00',
    ->                 date_format(games.published, '%Y'),
    ->                 date_format(games.published, '%Y-%m-%d'))
    ->                 as published,
    ->             date_format(games.published, '%Y') as pubyear,
    ->             (games.coverart is not null) as hasart,
    ->             avgRating as avgrating,
    ->             numRatingsInAvg as ratingcnt,
    ->             stdDevRating as ratingdev,
    ->             numRatingsTotal,
    ->             numMemberReviews,
    ->             starsort,
    ->             games.sort_title as sort_title,
    ->             games.sort_author as sort_author,
    ->             ifnull(games.published, '9999-12-31') as sort_pub,
    ->             games.pagevsn,
    ->             games.flags
    ->
    -> from
    ->     games
    ->             join gameRatingsSandbox0_mv on games.id = gameid
    ->
    -> where
    ->     1
    ->
    ->
    ->
    -> order by
    ->     games.created desc
    ->
    -> limit 0, 100;
+------+-------------+------------------------+--------+---------------+---------+---------+---------------+-------+----------+----------+------------+-----------------+
| id   | select_type | table                  | type   | possible_keys | key     | key_len | ref           | rows  | r_rows   | filtered | r_filtered | Extra           |
+------+-------------+------------------------+--------+---------------+---------+---------+---------------+-------+----------+----------+------------+-----------------+
|    1 | SIMPLE      | games                  | index  | PRIMARY       | created | 5       | NULL          | 14371 | 14371.00 |   100.00 |     100.00 | Using temporary |
|    1 | SIMPLE      | gameRatingsSandbox0_mv | eq_ref | PRIMARY       | PRIMARY | 130     | ifdb.games.id | 1     | 1.00     |   100.00 |     100.00 |                 |
+------+-------------+------------------------+--------+---------------+---------+---------+---------------+-------+----------+----------+------------+-----------------+

Query for IFComp 2024 games

http://localhost:8080/search?searchbar=competitionid:4lxgmwam7owmbb9h

MariaDB [ifdb]> analyze select sql_calc_found_rows
    ->     distinct games.id as id,
    ->             games.title as title,
    ->             games.author as author,
    ->             games.desc as description,
    ->             games.tags as tags,
    ->             games.created as createdate,
    ->             games.moddate as moddate,
    ->             games.system as devsys,
    ->             if (time(games.published) = '00:00:00',
    ->                 date_format(games.published, '%Y'),
    ->                 date_format(games.published, '%M %e, %Y'))
    ->                 as pubfmt,
    ->             if (time(games.published) = '00:00:00',
    ->                 date_format(games.published, '%Y'),
    ->                 date_format(games.published, '%Y-%m-%d'))
    ->                 as published,
    ->             date_format(games.published, '%Y') as pubyear,
    ->             (games.coverart is not null) as hasart,
    ->             avgRating as avgrating,
    ->             numRatingsInAvg as ratingcnt,
    ->             stdDevRating as ratingdev,
    ->             numRatingsTotal,
    ->             numMemberReviews,
    ->             starsort,
    ->             games.sort_title as sort_title,
    ->             games.sort_author as sort_author,
    ->             ifnull(games.published, '9999-12-31') as sort_pub,
    ->             games.pagevsn,
    ->             games.flags
    ->
    -> from
    ->     games
    ->             join gameRatingsSandbox0_mv on games.id = gameid inner join compgames on compgames.gameid = games.id and compgames.compid = '4lxgmwam7owmbb9h'
    ->
    -> where
    ->     1
    ->
    ->
    ->
    -> order by
    ->     starsort desc
    ->
    -> limit 0, 100;
+------+-------------+------------------------+--------+---------------+---------+---------+-----------------------+------+--------+----------+------------+--------------------------------------------------------+
| id   | select_type | table                  | type   | possible_keys | key     | key_len | ref                   | rows | r_rows | filtered | r_filtered | Extra                                                  |
+------+-------------+------------------------+--------+---------------+---------+---------+-----------------------+------+--------+----------+------------+--------------------------------------------------------+
|    1 | SIMPLE      | compgames              | ref    | compid,gameid | compid  | 130     | const                 | 82   | 71.00  |   100.00 |     100.00 | Using index condition; Using temporary; Using filesort |
|    1 | SIMPLE      | games                  | eq_ref | PRIMARY       | PRIMARY | 130     | ifdb.compgames.gameid | 1    | 1.00   |   100.00 |     100.00 |                                                        |
|    1 | SIMPLE      | gameRatingsSandbox0_mv | eq_ref | PRIMARY       | PRIMARY | 130     | ifdb.compgames.gameid | 1    | 1.00   |   100.00 |     100.00 |                                                        |
+------+-------------+------------------------+--------+---------------+---------+---------+-----------------------+------+--------+----------+------------+--------------------------------------------------------+
3 rows in set (0.007 sec)

After

Simple query for new games, fast index scan

MariaDB [ifdb]> analyze select
    ->     games.id as id,
    ->             games.title as title,
    ->             games.author as author,
    ->             games.desc as description,
    ->             games.tags as tags,
    ->             games.created as createdate,
    ->             games.moddate as moddate,
    ->             games.system as devsys,
    ->             if (time(games.published) = '00:00:00',
    ->                 date_format(games.published, '%Y'),
    ->                 date_format(games.published, '%M %e, %Y'))
    ->                 as pubfmt,
    ->             if (time(games.published) = '00:00:00',
    ->                 date_format(games.published, '%Y'),
    ->                 date_format(games.published, '%Y-%m-%d'))
    ->                 as published,
    ->             date_format(games.published, '%Y') as pubyear,
    ->             (games.coverart is not null) as hasart,
    ->             avgRating as avgrating,
    ->             numRatingsInAvg as ratingcnt,
    ->             stdDevRating as ratingdev,
    ->             numRatingsTotal,
    ->             numMemberReviews,
    ->             starsort,
    ->             games.sort_title as sort_title,
    ->             games.sort_author as sort_author,
    ->             ifnull(games.published, '9999-12-31') as sort_pub,
    ->             games.pagevsn,
    ->             games.flags
    ->
    -> from
    ->     games
    ->             join gameRatingsSandbox0_mv on games.id = gameid
    ->
    -> where
    ->     1
    ->
    ->
    ->
    -> order by
    ->     games.created desc
    ->
    -> limit 0, 100;
+------+-------------+------------------------+--------+---------------+---------+---------+---------------+-------+--------+----------+------------+-------+
| id   | select_type | table                  | type   | possible_keys | key     | key_len | ref           | rows  | r_rows | filtered | r_filtered | Extra |
+------+-------------+------------------------+--------+---------------+---------+---------+---------------+-------+--------+----------+------------+-------+
|    1 | SIMPLE      | games                  | index  | PRIMARY       | created | 5       | NULL          | 14371 | 100.00 |   100.00 |     100.00 |       |
|    1 | SIMPLE      | gameRatingsSandbox0_mv | eq_ref | PRIMARY       | PRIMARY | 130     | ifdb.games.id | 1     | 1.00   |   100.00 |     100.00 |       |
+------+-------------+------------------------+--------+---------------+---------+---------+---------------+-------+--------+----------+------------+-------+

In particular, note r_rows before and after: before this PR it was 14371, and after it is 100. (rows is the planner's estimate of how many rows it would have to scan; r_rows is the number of rows actually read.)
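
As a rough mental model of that r_rows difference, the following toy simulation counts how many index entries get read in each case. It is not how MariaDB is implemented; it only assumes that deduplicating through a temporary structure has to consume the whole scan before LIMIT applies, while a plain ordered scan can stop early. The function names are made up for this sketch; the row count and limit come from the ANALYZE output above.

```python
from itertools import islice

TOTAL_ROWS = 14371  # rows in the games index, per the ANALYZE output
LIMIT = 100         # "limit 0, 100"

def index_scan(counter):
    """Yield row ids in index order, counting every row that is read."""
    for row_id in range(TOTAL_ROWS):
        counter["read"] += 1
        yield row_id

def newest_with_distinct():
    counter = {"read": 0}
    # Deduplicate via a temporary structure: the whole scan is consumed
    # before the first LIMIT rows can be returned.
    unique = list(dict.fromkeys(index_scan(counter)))
    return unique[:LIMIT], counter["read"]

def newest_without_distinct():
    counter = {"read": 0}
    # Rows already arrive in the right order, so stop after LIMIT of them.
    rows = list(islice(index_scan(counter), LIMIT))
    return rows, counter["read"]

rows_a, read_a = newest_with_distinct()
rows_b, read_b = newest_without_distinct()
print(read_a, read_b)    # 14371 100
print(rows_a == rows_b)  # True
```

Both variants return the same 100 rows; only the amount of work differs, which is what the r_rows column reports.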

Query for IFComp 2024 games

MariaDB [ifdb]> analyze select sql_calc_found_rows
    ->     games.id as id,
    ->             games.title as title,
    ->             games.author as author,
    ->             games.desc as description,
    ->             games.tags as tags,
    ->             games.created as createdate,
    ->             games.moddate as moddate,
    ->             games.system as devsys,
    ->             if (time(games.published) = '00:00:00',
    ->                 date_format(games.published, '%Y'),
    ->                 date_format(games.published, '%M %e, %Y'))
    ->                 as pubfmt,
    ->             if (time(games.published) = '00:00:00',
    ->                 date_format(games.published, '%Y'),
    ->                 date_format(games.published, '%Y-%m-%d'))
    ->                 as published,
    ->             date_format(games.published, '%Y') as pubyear,
    ->             (games.coverart is not null) as hasart,
    ->             avgRating as avgrating,
    ->             numRatingsInAvg as ratingcnt,
    ->             stdDevRating as ratingdev,
    ->             numRatingsTotal,
    ->             numMemberReviews,
    ->             starsort,
    ->             games.sort_title as sort_title,
    ->             games.sort_author as sort_author,
    ->             ifnull(games.published, '9999-12-31') as sort_pub,
    ->             games.pagevsn,
    ->             games.flags
    ->
    -> from
    ->     games
    ->             join gameRatingsSandbox0_mv on games.id = gameid inner join (select distinct gameid from compgames where compid = '4lxgmwam7owmbb9h') as compgames on compgames.gameid = games.id
    ->
    -> where
    ->     1
    ->
    ->
    ->
    -> order by
    ->     starsort desc
    ->
    -> limit 0, 100;
+------+-------------+------------------------+--------+---------------+---------+---------+------------------+------+--------+----------+------------+-----------------------------------------------------+
| id   | select_type | table                  | type   | possible_keys | key     | key_len | ref              | rows | r_rows | filtered | r_filtered | Extra                                               |
+------+-------------+------------------------+--------+---------------+---------+---------+------------------+------+--------+----------+------------+-----------------------------------------------------+
|    1 | PRIMARY     | <derived2>             | ALL    | NULL          | NULL    | NULL    | NULL             | 82   | 67.00  |   100.00 |     100.00 | Using temporary; Using filesort                     |
|    1 | PRIMARY     | games                  | eq_ref | PRIMARY       | PRIMARY | 130     | compgames.gameid | 1    | 1.00   |   100.00 |     100.00 |                                                     |
|    1 | PRIMARY     | gameRatingsSandbox0_mv | eq_ref | PRIMARY       | PRIMARY | 130     | compgames.gameid | 1    | 1.00   |   100.00 |     100.00 |                                                     |
|    2 | DERIVED     | compgames              | ref    | compid        | compid  | 130     | const            | 82   | 71.00  |   100.00 |     100.00 | Using index condition; Using where; Using temporary |
+------+-------------+------------------------+--------+---------------+---------+---------+------------------+------+--------+----------+------------+-----------------------------------------------------+

@dfabulich dfabulich requested a review from salty-horse August 16, 2025 18:35

Development

Successfully merging this pull request may close these issues.

The home page query for new games was flagged as a slow query
Search for IFComp 2024 games returns duplicated results
