Query-generation

Main Idea

We want to integrate the = and IN operators into our workflow.

Currently we generate queries in the following manner:

down
circle "start" fit
arrow
box "choose fact" "table randomly" fit
arrow
box "generate subgraph" fit
arrow
LS: circle "Begin" "predicate" "generation" fit
arrow
box "Pick random column." fit
arrow
box "Generate a range" "predicate according to" "row retention probability" fit
arrow
LE: circle "End" "predicate" "generation" fit
arrow
box "Query Builder" "makes the SQL" "statement" fit
arrow
circle "End"


L1: line from LE.e to (LE + (2,0))
L2: line from (LS + (2,0)) to L1.e "Repeats " aligned "Up to n times" aligned
L3: arrow from L2.n to LS.e

The new structure adds a step that chooses, with a given probability, which operator (range, = or IN) to apply to the picked column:

down
circle "start" fit
arrow
box "choose fact" "table randomly" fit
arrow
box "generate subgraph" fit
arrow
LS: circle "Begin" "predicate" "generation" fit
arrow
box "Pick random column." fit
arrow color red
D: diamond color red "Choose" "Operator" fit
arrow "Range" aligned above
box "Generate a range" "predicate according to" "row retention probability" fit
arrow
LE: circle "End" "predicate" "generation" fit
arrow
box "Query Builder" "makes the SQL" "statement" fit
arrow
circle "End"


L1: line from LE.e to (LE + (3,0))
L2: line from (LS + (3,0)) to L1.e "Repeats " aligned "Up to n times" aligned
L3: arrow from L2.n to LS.e


E: box at (D + (1.5,-1.2)) "Generate Equal" "With one MCV" fit color red
arrow from D.e to E.n "Equal" aligned above color red
arrow from E.s to LE.n color red

IN: box at (D + (-1.9,-1.2)) "Generate IN" "With one MCV" "and random elements" fit color red
line from IN.n to D.w  "IN" aligned above color red
arrow from D.w to IN.n color red
arrow from IN.s to LE.n color red
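
The two red branches of the diagram can be sketched in Python as follows. This is a minimal illustration, not the project's code: the function names, the `mcvs` and `domain` parameters, and the sample column names are all assumptions made for the example. It assumes the most common values (MCVs) of a column and its candidate values are already available as lists.

```python
import random
from typing import Sequence


def sql_literal(value) -> str:
    """Render a value as a SQL literal (strings are quoted, quotes doubled)."""
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return str(value)


def equal_predicate(column: str, mcvs: Sequence, rng: random.Random) -> str:
    """Build `column = <value>` using one randomly picked MCV."""
    return f"{column} = {sql_literal(rng.choice(mcvs))}"


def in_predicate(column: str, mcvs: Sequence, domain: Sequence,
                 n_elements: int, rng: random.Random) -> str:
    """Build `column IN (...)` with one MCV plus random other values."""
    mcv = rng.choice(mcvs)
    # Distinct candidates other than the chosen MCV, in a stable order.
    others = [v for v in dict.fromkeys(domain) if v != mcv]
    # Never ask for more values than exist, and never for a negative count.
    k = max(0, min(n_elements - 1, len(others)))
    values = [mcv] + rng.sample(others, k)
    return f"{column} IN ({', '.join(sql_literal(v) for v in values)})"


if __name__ == "__main__":
    rng = random.Random(7)
    print(equal_predicate("ss_store_sk", [4, 8], rng))
    print(in_predicate("i_color", ["red"], ["red", "blue", "green"], 3, rng))
```

Putting the MCV first in the IN list is only a choice made for readability here; the order does not change the meaning of the predicate.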

How to pick operators

We based the operator probabilities on the frequencies we observed in TPCDS:

  1. = appears 100% of the time.
  2. IN appears 38% of the time.
  3. Ranges appear 50% of the time as < and another 50% of the time as BETWEEN.

So we will make IN appear less often than =. For now we will go with the probabilities = (3/7), range (3/7), IN (1/7).
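
As a rough sketch of how the "Choose Operator" step could apply these probabilities, the snippet below draws an operator with weights 3, 3 and 1. The names `OPERATOR_WEIGHTS` and `choose_operator` are hypothetical and not taken from the code base; only the weights come from the text above.

```python
import random
from collections import Counter

# Weights from the text: = (3/7), range (3/7), IN (1/7).
OPERATOR_WEIGHTS = {"=": 3, "range": 3, "IN": 1}


def choose_operator(rng: random.Random) -> str:
    """Pick one operator with probability proportional to its weight."""
    operators = list(OPERATOR_WEIGHTS)
    weights = list(OPERATOR_WEIGHTS.values())
    return rng.choices(operators, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(42)
    # Over many draws, IN should show up roughly one time in seven.
    print(Counter(choose_operator(rng) for _ in range(7000)))
```

Keeping integer weights rather than fractions means the values do not need to sum to 1, which makes them easier to edit by hand.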

Changing the input of the CLI

We are now going to use a TOML file to pass the input, because there are too many inputs to manage on the command line. We are also adding inputs for:

  1. The number of elements in the IN list.
  2. The minimum selectivity an MCV must have to be accepted.

13 check-ins related to "new-predicates"

2025-05-28
09:27
Merges IN and = predicate check-in: 0a0518ab14 user: mathos tags: trunk
09:24
Adds equality lower bound as an array. Leaf check-in: 4162607284 user: mathos tags: new-predicates
00:05
Finally. An stable version check-in: 48e17f1cde user: mathos tags: new-predicates
2025-05-27
23:42
Half of the tests are not working. It is 2 am. But alas. I have a fucking IN and an = check-in: 84054ef5eb user: mathos tags: new-predicates
23:12
Added = and in to the predicate, but not yet to the builder. Half of the tests are failing check-in: e4eae2a3c5 user: mathos tags: new-predicates
12:17
Adds the probabilities to the snowflake and search_params endpoints check-in: d8b4a049cf user: mathos tags: new-predicates
11:41
Fixes tests to new format of input check-in: 0869b3ce5b user: mathos tags: new-predicates
2025-05-26
22:06
Adds config file to snowflake endpoint check-in: 284272895e user: mathos tags: new-predicates
21:57
Adds the new toml config file for the search_params check-in: 6a92d2de07 user: mathos tags: new-predicates
11:57
Adds the new classes for predicate. The enum idea was stupid check-in: 8161d98cc7 user: mathos tags: new-predicates
11:35
Adds new type of predicates (non functional) check-in: 705dea2165 user: mathos tags: new-predicates
11:22
Minor refactor to predicate class. Ticket [1e726428f6e719fb] check-in: c93b2b766c user: mathos tags: new-predicates
11:12
Fix to be able to save the CSV check-in: 0f0856db5a user: mathos tags: trunk