Query-generation

data_structure.md at docs
File docs/miscellaneous/data_structure.md artifact 442a2571ca on branch docs


File structure

The structure of the files is similar to:

📦query-generator
 ┣ 📂data
 ┃ ┣ 📂duckdb
 ┃ ┃ ┗ 📂TPCDS
 ┃ ┃ ┃ ┗ 📜0.1.db
 ┃ ┣ 📂generated_queries
 ┃ ┃ ┗ 📂SNOWFLAKE_SEARCH_PARAMS
 ┃ ┃ ┃ ┗ 📂TPCDS
 ┃ ┗ 📂histograms
 ┃ ┃ ┣ 📜histogram_job.parquet
 ┃ ┃ ┣ 📜histogram_tpcds.parquet
 ┃ ┃ ┣ 📜histogram_tpch.parquet
 ┣ 📂docs
 ┣ 📂params_config
 ┃ ┣ 📂complex_queries
 ┃ ┃ ┣ 📜tpcds.toml
 ┃ ┃ ┣ 📜tpcds_dev.toml
 ┃ ┣ 📂search_params
 ┃ ┃ ┣ 📜job.toml
 ┃ ┃ ┣ 📜job_dev.toml
 ┃ ┃ ┣ 📜tpcds.toml
 ┃ ┃ ┗ 📜tpcds_dev.toml
 ┃ ┗ 📂snowflake
 ┃ ┃ ┗ 📜tpcds.toml
 ┣ 📂src
 ┣ 📂tests
 ┣ 📜CONTRIBUTING.md
 ┣ 📜README.md
 ┗ 📜pyproject.toml

Data folder

It includes:

  1. The generated databases, stored under duckdb.
  2. The generated queries, which are written to generated_queries by default.
  3. The precomputed histograms for popular databases (JOB, TPC-DS, and TPC-H), stored under histograms.
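
As a rough illustration of this layout, the sketch below builds the expected paths inside the data folder with `pathlib`. The helper name `data_paths` and its parameters are hypothetical and not part of the project; the folder and file names are taken from the tree above, and the assumption that histogram files follow the `histogram_<benchmark>.parquet` pattern is inferred from the three files listed there.

```python
from pathlib import Path


def data_paths(repo_root, benchmark="TPCDS", db_file="0.1.db"):
    """Return the expected locations inside the data folder.

    Hypothetical helper: the names mirror the tree shown above, nothing more.
    """
    data = Path(repo_root) / "data"
    return {
        # Generated DuckDB database, e.g. data/duckdb/TPCDS/0.1.db
        "database": data / "duckdb" / benchmark / db_file,
        # Default output folder for generated queries
        "generated_queries": data / "generated_queries",
        # Precomputed histogram, assuming the histogram_<benchmark>.parquet pattern
        "histogram": data / "histograms" / f"histogram_{benchmark.lower()}.parquet",
    }


paths = data_paths("query-generator")
for name, path in paths.items():
    # Only reports whether each path is present; nothing is created or read.
    print(f"{name}: {path} (exists: {path.exists()})")
```

The function only composes paths, so it can be run from anywhere; pass the actual checkout location as `repo_root` to check which of the files are already present.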

Params Config folder

It contains the input files for the most relevant query-generation runs, such as TPC-DS 100. When a file name contains _dev, it means that the files