DuckDB is incredibly fast if you have enough memory

Parviz Deyhim
1 min read · Jun 16, 2023


DuckDB is incredibly fast as long as the dataset fits in the available memory. However, when the dataset grows beyond the available memory (in this case about 2x), the query crashes with an out-of-memory error.

We ran several TPCH benchmarks on DuckDB hosted on an n1-standard-1 VM with 3.7GB of memory. Each benchmark was executed with a different dataset size.
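For reference, a run like this can be reproduced with DuckDB's Python API and its bundled tpch extension. The sketch below is only illustrative: the scale factors, file names, and timing loop are assumptions standing in for the actual benchmark harness.

```python
import time
import duckdb

# Assumption: one on-disk database per scale factor, roughly matching
# the 1GB-6GB datasets described in this post (sf=1 is ~1GB of TPCH data).
for sf in [1, 2, 3, 4, 5, 6]:
    con = duckdb.connect(f"tpch_sf{sf}.duckdb")
    con.execute("INSTALL tpch")
    con.execute("LOAD tpch")
    con.execute(f"CALL dbgen(sf={sf})")  # generate the TPCH tables at this scale factor

    start = time.time()
    for q in range(1, 23):  # the 22 standard TPCH queries
        con.execute(f"PRAGMA tpch({q})").fetchall()
    print(f"sf={sf}: {time.time() - start:.1f}s")
    con.close()
```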

https://explorer.benchops.io/report.html?id=e27c9694-7448-46a5-912f-1848cfd710fa
https://explorer.benchops.io/report?id=cbad9e64-ead2-4ddd-a77d-2c7ebc635cfb

For each benchmark, we generated TPCH data starting at 1GB and going up to 6GB. The results are shown on the right-side panel of the report pages linked above (login required). DuckDB is incredibly fast: the TPCH benchmark with 4GB of data took only 30 seconds. However, the benchmark slowed down roughly linearly as the dataset size grew. This is mostly expected behavior, given that DuckDB is a single-node, scale-up architecture.

What was unexpected was the out-of-memory failure when we ran the benchmark with 6GB of data, which is approximately 2x the total memory DuckDB is allowed to use (75% of the VM's total memory):

duckdb.OutOfMemoryException: Out of Memory Error: failed to pin block of size 262KB (1.9GB/2.0GB used)
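For completeness, this is roughly what that failure mode looks like from the Python API. The memory limit and the choice of query below are illustrative assumptions (the error message above suggests an effective cap of about 2GB on this VM), not the exact benchmark configuration:

```python
import duckdb

con = duckdb.connect("tpch_sf6.duckdb")
# Assumption: make the memory cap explicit rather than relying on the default.
con.execute("SET memory_limit = '2GB'")
con.execute("INSTALL tpch")
con.execute("LOAD tpch")

try:
    # TPCH query 18 builds a large hash aggregate, a typical place for a
    # memory-bound run to fall over (illustrative choice of query).
    con.execute("PRAGMA tpch(18)").fetchall()
except duckdb.OutOfMemoryException as e:
    # e.g. "Out of Memory Error: failed to pin block of size 262KB (1.9GB/2.0GB used)"
    print(f"query failed: {e}")
```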

