Main Idea
We want to integrate the = and IN operators into our workflow.
Currently we generate queries in the following manner:
down
circle "start" fit
arrow
box "choose fact" "table randomly" fit
arrow
box "generate subgraph" fit
arrow
LS: circle "Begin" "predicate" "generation" fit
arrow
box "Pick random column." fit
arrow
box "Generate a range" "predicate according to" "row retention probability" fit
arrow
LE: circle "End" "predicate" "generation" fit
arrow
box "Query Builder" "makes the SQL" "statement" fit
arrow
circle "End"
L1: line from LE.e to (LE + (2,0))
L2: line from (LS + (2,0)) to L1.e "Repeats " aligned "Up to n times" aligned
L3: arrow from L2.n to LS.e
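The new structure adds a step after the column is picked: we choose the operator (range, = or IN) with a given probability and generate the matching predicate. The new parts are shown in red.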
down
circle "start" fit
arrow
box "choose fact" "table randomly" fit
arrow
box "generate subgraph" fit
arrow
LS: circle "Begin" "predicate" "generation" fit
arrow
box "Pick random column." fit
arrow color red
D: diamond color red "Choose" "Operator" fit
arrow "Range" aligned above
box "Generate a range" "predicate according to" "row retention probability" fit
arrow
LE: circle "End" "predicate" "generation" fit
arrow
box "Query Builder" "makes the SQL" "statement" fit
arrow
circle "End"
L1: line from LE.e to (LE + (3,0))
L2: line from (LS + (3,0)) to L1.e "Repeats " aligned "Up to n times" aligned
L3: arrow from L2.n to LS.e
E: box at (D + (1.5,-1.2)) "Generate Equal" "With one MCV" fit color red
arrow from D.e to E.n "Equal" aligned above color red
arrow from E.s to LE.n color red
IN: box at (D + (-1.9,-1.2)) "Generate IN" "With one MCV" "and random elements" fit color red
line from IN.n to D.w "IN" aligned above color red
arrow from D.w to IN.n color red
arrow from IN.s to LE.n color red
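As a rough illustration of the two red boxes in the diagram, the sketch below builds an = predicate from one MCV and an IN predicate from one MCV plus random other values. The function names (`sql_literal`, `equality_predicate`, `in_predicate`), their parameters and the example column are hypothetical and only serve the illustration; they are not the project's actual classes.

```python
import random
from typing import Sequence


def sql_literal(value) -> str:
    """Render a Python value as a SQL literal (strings are single-quoted)."""
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return str(value)


def equality_predicate(column: str, mcvs: Sequence, rng: random.Random) -> str:
    """Build `column = <value>` using one most-common value (MCV)."""
    return f"{column} = {sql_literal(rng.choice(list(mcvs)))}"


def in_predicate(column: str, mcvs: Sequence, distinct_values: Sequence,
                 n_elements: int, rng: random.Random) -> str:
    """Build `column IN (...)` with one MCV plus random other values."""
    mcv = rng.choice(list(mcvs))
    # Candidates for the remaining slots: every known value except the MCV.
    others = [v for v in distinct_values if v != mcv]
    n_extra = max(0, min(n_elements - 1, len(others)))
    values = [mcv] + rng.sample(others, k=n_extra)
    return f"{column} IN ({', '.join(sql_literal(v) for v in values)})"


rng = random.Random(0)
colors = ["red", "blue", "green", "teal", "pink"]
print(equality_predicate("i_color", ["red", "blue"], rng))
print(in_predicate("i_color", ["red", "blue"], colors, 3, rng))
```

Passing the random generator in explicitly keeps the output reproducible in tests, which may or may not match how the real generator is seeded.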
How to pick operators
We based the probabilities on how often each operator appears in the TPC-DS queries:
- = appears 100% of the time.
- IN appears 38% of the time.
- Ranges appear 50% of the time with < and another 50% of the time with BETWEEN.
So we will make IN appear less often than =. For now we will go with the probabilities = (3/7), range (3/7), IN (1/7).
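A minimal sketch of how the operator could be drawn with these probabilities, using Python's standard `random.choices`. The names `OPERATOR_WEIGHTS` and `choose_operator` are made up for the example, and the weights 3, 3, 1 are simply the 3/7, 3/7, 1/7 split expressed as integers.

```python
import random
from collections import Counter

# Relative weights: = (3/7), range (3/7), IN (1/7).
OPERATOR_WEIGHTS = {"=": 3, "range": 3, "IN": 1}


def choose_operator(rng: random.Random) -> str:
    """Pick one operator with probability proportional to its weight."""
    operators = list(OPERATOR_WEIGHTS)
    weights = list(OPERATOR_WEIGHTS.values())
    return rng.choices(operators, weights=weights, k=1)[0]


# Draw many operators to check that the observed split is close to 3:3:1.
rng = random.Random(42)
counts = Counter(choose_operator(rng) for _ in range(7000))
print(counts)
```

With 7000 draws we would expect roughly 3000 for =, 3000 for range and 1000 for IN, give or take sampling noise.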
Changing the input of the CLI
We are now going to use a TOML file to pass the input, because there are too many inputs to manage on the command line. We are also adding inputs for:
- The number of elements in the IN list.
- The minimum selectivity an MCV must have to be accepted.
13 check-ins related to "new-predicates"
2025-05-28
09:27 | Merges IN and = predicate check-in: 0a0518ab14 user: mathos tags: trunk | |
09:24 | Adds equality lower bound as an array. Leaf check-in: 4162607284 user: mathos tags: new-predicates | |
00:05 | Finally. An stable version check-in: 48e17f1cde user: mathos tags: new-predicates | |
2025-05-27
23:42 | Half of the tests are not working. It is 2 am. But alas. I have a fucking IN and an = check-in: 84054ef5eb user: mathos tags: new-predicates | |
23:12 | Added = and in to the predicate, but not yet to the builder. Half of the tests are failing check-in: e4eae2a3c5 user: mathos tags: new-predicates | |
12:17 | Adds the probabilities to the snowflake and search_params endpoints check-in: d8b4a049cf user: mathos tags: new-predicates | |
11:41 | Fixes tests to new format of input check-in: 0869b3ce5b user: mathos tags: new-predicates | |
2025-05-26
22:06 | Adds config file to snowflake endpoint check-in: 284272895e user: mathos tags: new-predicates | |
21:57 | Adds the new toml config file for the search_params check-in: 6a92d2de07 user: mathos tags: new-predicates | |
11:57 | Adds the new classes for predicate. The enum idea was stupid check-in: 8161d98cc7 user: mathos tags: new-predicates | |
11:35 | Adds new type of predicates (non functional) check-in: 705dea2165 user: mathos tags: new-predicates | |
11:22 | Minor refactor to predicate class. Ticket [1e726428f6e719fb] check-in: c93b2b766c user: mathos tags: new-predicates | |
11:12 | Fix to be able to save the CSV check-in: 0f0856db5a user: mathos tags: trunk | |