# File structure
The structure of the files is similar to:
```
📦query-generator
┣ 📂data
┃ ┣ 📂duckdb
┃ ┃ ┗ 📂TPCDS
┃ ┃ ┃ ┗ 📜0.1.db
┃ ┣ 📂generated_queries
┃ ┃ ┗ 📂SNOWFLAKE_SEARCH_PARAMS
┃ ┃ ┃ ┗ 📂TPCDS
┃ ┗ 📂histograms
┃ ┃ ┣ 📜histogram_job.parquet
┃ ┃ ┣ 📜histogram_tpcds.parquet
┃ ┃ ┣ 📜histogram_tpch.parquet
┣ 📂docs
┣ 📂params_config
┃ ┣ 📂complex_queries
┃ ┃ ┣ 📜tpcds.toml
┃ ┃ ┣ 📜tpcds_dev.toml
┃ ┣ 📂search_params
┃ ┃ ┣ 📜job.toml
┃ ┃ ┣ 📜job_dev.toml
┃ ┃ ┣ 📜tpcds.toml
┃ ┃ ┗ 📜tpcds_dev.toml
┃ ┗ 📂snowflake
┃ ┃ ┗ 📜tpcds.toml
┣ 📂src
┣ 📂tests
┣ 📜CONTRIBUTING.md
┣ 📜README.md
┗ 📜pyproject.toml
```
- The `docs` folder contains the documentation files.
- The `src` folder contains the source code for the generator.
- The `tests` folder contains the tests that check the code for quality assurance.
- `pyproject.toml` contains the information pixi needs to install the libraries and run the project.
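If the layout changes, a tree in the same style can be regenerated instead of being edited by hand. The sketch below is a minimal illustration: `tree_lines` is a hypothetical helper, not part of the project. It assumes hidden entries (such as `.git`) should be omitted and that folders are listed before files, as in the tree above. Its continuation bars follow the usual convention, so the output is similar to, but not character-for-character identical with, the hand-drawn tree.

```python
from pathlib import Path


def tree_lines(root: Path) -> list[str]:
    """Render `root` as a text tree using the 📦/📂/📜 markers shown above."""
    lines = [f"📦{root.name}"]

    def walk(directory: Path, prefix: str) -> None:
        # Skip hidden entries (e.g. .git), then list folders before files,
        # each group sorted by name.
        entries = sorted(
            (p for p in directory.iterdir() if not p.name.startswith(".")),
            key=lambda p: (p.is_file(), p.name),
        )
        for index, entry in enumerate(entries):
            is_last = index == len(entries) - 1
            branch = "┗ " if is_last else "┣ "
            icon = "📂" if entry.is_dir() else "📜"
            lines.append(f"{prefix}{branch}{icon}{entry.name}")
            if entry.is_dir():
                # Continue the vertical bar only while more siblings follow.
                walk(entry, prefix + ("  " if is_last else "┃ "))

    walk(root, "")
    return lines
```

Running `print("\n".join(tree_lines(Path("query-generator"))))` from the directory that contains the repository prints the current layout.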
## Data folder
It includes:
- The generated databases, stored under `duckdb`.
- The generated queries, which are written to `generated_queries` by default.
- The precomputed histograms for popular databases (JOB, TPC-DS and TPC-H), stored under `histograms`.
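The naming pattern visible in the tree can be captured in small path helpers. This is a minimal sketch under stated assumptions, not how `src` necessarily builds these paths. The helper names are hypothetical, as are the case conventions: upper-case dataset folders and lower-case histogram file names. The sketch also assumes that the `0.1` in `0.1.db` is a scale factor and that the folder between `generated_queries` and the dataset (such as `SNOWFLAKE_SEARCH_PARAMS`) names a generation run.

```python
from pathlib import Path

# Hypothetical helpers. The layout is inferred from the tree above:
#   data/duckdb/<DATASET>/<scale>.db
#   data/histograms/histogram_<dataset>.parquet
#   data/generated_queries/<RUN_NAME>/<DATASET>/


def duckdb_path(root: Path, dataset: str, scale: str) -> Path:
    """Generated DuckDB file, e.g. data/duckdb/TPCDS/0.1.db (scale passed as text)."""
    # Assumption: dataset folders are upper case, as with TPCDS.
    return root / "data" / "duckdb" / dataset.upper() / f"{scale}.db"


def histogram_path(root: Path, dataset: str) -> Path:
    """Precomputed histogram, e.g. data/histograms/histogram_tpcds.parquet."""
    # Assumption: histogram file names use the lower-case dataset name.
    return root / "data" / "histograms" / f"histogram_{dataset.lower()}.parquet"


def generated_queries_dir(root: Path, run_name: str, dataset: str) -> Path:
    """Default output folder, e.g. data/generated_queries/SNOWFLAKE_SEARCH_PARAMS/TPCDS."""
    return root / "data" / "generated_queries" / run_name / dataset.upper()
```

The scale is taken as a string so that `"0.1"` maps to `0.1.db` exactly, without float formatting surprises such as `1.0.db`.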
## Params Config folder
It contains the input files for the most relevant query-generation setups, such as
TPC-DS 100. When a file name contains `_dev`, it means that the files