[Solved] Optimizing big sorted join (nested loop) based on value frequency

DavidP Asks: Optimizing big sorted join (nested loop) based on value frequency
I have a query involving two tables, let’s call them timeline and events, defined as follows:

Code:
create table timeline (
   event_id int,
   ts timestamptz
);

create table events (
   id int primary key,
   foo int,
   bar int,
   baz int
);

Then I have these indexes:

Code:
create index timeline_sorted_idx on timeline (ts desc, event_id desc);
create index events_foo_idx on events (foo);
create index events_bar_idx on events (bar);
create index events_baz_idx on events (baz);

Both tables contain circa 10 million rows.

Now I want to execute this type of query (to scroll through timeline in mobile app):

Code:
select *
from events e
join timeline t on e.id = t.event_id
where (t.ts >= '2020-01-01' and t.ts <= '2021-01-01') and (e.foo = 1 or e.bar = 1 or e.baz = 1)
order by t.ts desc, t.event_id desc
offset 0 limit 10;

which yields this execution plan for foo/bar/baz = 1 (there are lots of 1s in the events table):

Code:
Limit  (cost=0.87..129.74 rows=10 width=328) (actual time=0.792..7.086 rows=10 loops=1)
  ->  Nested Loop  (cost=0.87..1432982.57 rows=111200 width=328) (actual time=0.790..7.080 rows=10 loops=1)
        ->  Index Only Scan using timeline_sorted_idx on timeline t  (cost=0.43..90018.01 rows=1158326 width=12) (actual time=0.030..1.358 rows=827 loops=1)
              Index Cond: ((ts >= '2020-01-01 00:00:00'::timestamp without time zone) AND (ts <= '2021-01-01 00:00:00'::timestamp without time zone))
              Heap Fetches: 827
        ->  Index Scan using events_pkey on events e  (cost=0.43..1.16 rows=1 width=304) (actual time=0.006..0.006 rows=0 loops=827)
              Index Cond: (id = t.event_id)
              Filter: (foo = 1 OR bar = 1 OR baz = 1)
              Rows Removed by Filter: 1
Planning time: 0.990 ms
Execution time: 7.248 ms

but this plan for foo/bar/baz = 2 (2 has zero occurrences in the events table):

Code:
Limit  (cost=1000.89..1843.67 rows=10 width=328) (actual time=2737.428..2737.428 rows=0 loops=1)
  ->  Gather Merge  (cost=1000.89..644711.51 rows=7638 width=328) (actual time=2737.426..2737.426 rows=0 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        ->  Nested Loop  (cost=0.87..642829.87 rows=3182 width=316) (actual time=2691.361..2691.361 rows=0 loops=3)
              ->  Parallel Index Only Scan using timeline_sorted_idx on timeline t  (cost=0.43..83261.11 rows=482636 width=12) (actual time=0.097..484.201 rows=390318 loops=3)
                    Index Cond: ((ts >= '2020-01-01 00:00:00'::timestamp without time zone) AND (ts <= '2021-01-01 00:00:00'::timestamp without time zone))
                    Heap Fetches: 438622
              ->  Index Scan using events_pkey on events e (cost=0.43..1.16 rows=1 width=304) (actual time=0.005..0.005 rows=0 loops=1170954)
                    Index Cond: (id = t.event_id)
                    Filter: (foo = 1 OR bar = 1 OR baz = 1)
                    Rows Removed by Filter: 1
Planning time: 0.743 ms
Execution time: 3002.328 ms

OK, I understand: Postgres chose a nested loop because there is a sorted index on timeline, and all it has to do is find matching records in events for the first N requested rows. That works well when there are lots of matching records, because it finds all N matches within a few hundred or thousand iterations. But if there is no match, it takes a serious amount of time to iterate over a million records in timeline and find nothing in events.
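As a diagnostic (not a production fix), one can check what plan Postgres falls back to when nested loops are ruled out; `enable_nestloop` is a standard planner setting that can be toggled per session:

Code:
-- discourage nested loops for this session only, then inspect the plan
set enable_nestloop = off;
explain (analyze, buffers)
select *
from events e
join timeline t on e.id = t.event_id
where (t.ts >= '2020-01-01' and t.ts <= '2021-01-01') and (e.foo = 2 or e.bar = 2 or e.baz = 2)
order by t.ts desc, t.event_id desc
offset 0 limit 10;
-- restore the default afterwards
reset enable_nestloop;

If the resulting plan (typically a hash or merge join) is fast for the rare value, that confirms the problem is a row-count misestimate rather than missing indexes.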

The question is: is there a way to convince Postgres to choose a different plan based on the frequency of the foo/bar/baz value? In my tests, materializing events first with a CTE is much faster than this plan for low-frequency values, but much slower for high-frequency ones…
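For reference, the CTE variant I tested looks roughly like this (a sketch; the explicit materialized keyword requires Postgres 12+, while older versions materialize CTEs unconditionally). Materializing forces Postgres to collect the matching events first, which is cheap when the value is rare:

Code:
with matching as materialized (
    select id
    from events
    where foo = 2 or bar = 2 or baz = 2   -- uses events_foo/bar/baz_idx
)
select *
from matching m
join timeline t on t.event_id = m.id
where t.ts >= '2020-01-01' and t.ts <= '2021-01-01'
order by t.ts desc, t.event_id desc
offset 0 limit 10;

For a rare value this finishes almost immediately, but for a frequent value like 1 it materializes millions of ids before the limit can apply, which is why neither form wins in both cases.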

— EDIT — Also, it is important to note that in reality there are many “events” tables, all sharing one common timeline table. I cannot change that schema, because I am not the owner of that system and this is not its primary use case…
