Redshift Serverless: RPU and impact on query latency

Parviz Deyhim
3 min read · May 10, 2023

I’ve decided to take Redshift Serverless for a test drive. My main goal? To understand how it scales and to get a grasp on this thing called an RPU (Redshift Processing Unit), especially how it affects cost and performance. I’m going to dive into a 1GB TPC-H dataset, stored on S3 in Delta format, and let Redshift Serverless run the 22 TPC-H queries against it. To gauge the speed, I’ll measure query latency.

To make this happen, I’m going to rely on Benchops’ managed TPC-H benchmark. I’ll be juggling a mix of metrics from Benchops to unpack the cost-performance dynamics. In this blog post I’m focusing on understanding what happens to overall performance as the RPU setting changes.

I’ll kick things off with a benchmark using the default RPU setting of 128. The results? You can check them out on the right-hand panel of this page (you’ll need to log in to see them).

Next, I’ll tweak the Redshift RPU down to 64 and run the benchmark all over again. I’ll keep this process going until I hit 16 RPUs. If you glance at the graph here (again, login required), you’ll see that the query latency (as expected) gets better with more RPUs.
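For reference, stepping the RPU down between runs can be scripted. Here’s a minimal sketch using boto3’s redshift-serverless client; the workgroup name is hypothetical, and the post doesn’t say whether the setting was changed via the console or the API:

```python
def update_workgroup_params(workgroup_name, rpu):
    # Request parameters for the redshift-serverless UpdateWorkgroup call;
    # baseCapacity is the RPU setting being swept here (128 -> 64 -> 32 -> 16)
    return {'workgroupName': workgroup_name, 'baseCapacity': rpu}

def set_base_capacity(workgroup_name, rpu):
    # boto3 imported lazily so the params helper above works without AWS creds
    import boto3
    client = boto3.client('redshift-serverless')
    return client.update_workgroup(**update_workgroup_params(workgroup_name, rpu))

# e.g. for rpu in (128, 64, 32, 16): set_base_capacity('tpch-test', rpu)
```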

It’s no surprise that more RPUs (which means more resources) lead to better query execution times. Yet, the improvement in latency from upping the RPU from 16 to 32 outshines the gains made when increasing from 32 to 64 and 128.

To dig deeper, I’m going to pull up the metrics collected by Benchops and analyze this intriguing data. What I found was that as RPU increased from 32 to 64 and 128, the query compile time increased significantly.

Helper function to get data from Benchops APIs

import json
import requests

def _call_api(endpoint, auth_header):
    headers = {"Authorization": auth_header}
    req = requests.get(endpoint, verify=False, headers=headers)
    if req.status_code == 200:
        return json.loads(req.content)
    print(req.status_code)
    return None
def _flatten_list(list_of_lists):
    return [item for sublist in list_of_lists for item in sublist]

auth = '****'
benchmarks = ['dd463a7b-8f76-49f0-aa3a-c118411db215',
              'cbe4f267-7b03-427a-8309-582d3c854855',
              '14ff8835-5e63-41e1-b18e-0ac24ad8dc1c',
              'a3e85cac-86fd-41ba-87d8-c65244c29852']
benchmarks_data = list(map(lambda x: _get_benchmark(x, auth), benchmarks))

Transforming the data into a shape we can analyze

import re
import numpy as np
from pandas import json_normalize

benchmarks_data_flat = list(map(lambda x: json.loads(x)['metrics'], benchmarks_data))
df = json_normalize(_flatten_list(benchmarks_data_flat))
df_q = df[df['metric'] == 'query_stats']

# Extract the RPU setting and the TPC-H query id from each run's query_text
df_q.loc[:, 'RPU'] = df_q['metadata.query_text'].apply(
    lambda x: int(re.search(r'name:(\w+)?', x).group(1).replace('RPU', '')))
df_q.loc[:, 'query_id'] = df_q['metadata.query_text'].apply(
    lambda x: re.search(r'TPC-H (\w+)', x).group(1))
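To make the two regexes concrete, here’s what they pull out of a sample query_text string (the exact format Benchops records is an assumption on my part):

```python
import re

# Made-up query_text in the shape the regexes above expect
sample = 'TPC-H Q3 name:RPU64'

# 'name:(\w+)?' captures 'RPU64'; stripping the 'RPU' prefix leaves the number
rpu = int(re.search(r'name:(\w+)?', sample).group(1).replace('RPU', ''))
# 'TPC-H (\w+)' captures the query id, e.g. 'Q3'
query_id = re.search(r'TPC-H (\w+)', sample).group(1)
```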
g = df_q[[
    ## 'metadata.save', 'metadata.sortlimit', 'metadata.hashjoin',
    ## 'metadata.hash', 'metadata.scan', 'metadata.sort', 'metadata.queue_time',
    ## 'metadata.compile_time',
    'metadata.elapsed_time',
    ## 'metadata.lock_wait_time',
    'query_id',
    'RPU']].groupby(by=['RPU', 'query_id']).agg(np.mean).reset_index()

# Per query: elapsed time at the highest RPU ('last') vs the lowest ('first'),
# converted from microseconds to seconds; a positive diff means the query
# got faster as RPU increased
g_diff = g.groupby(by=['query_id'])['metadata.elapsed_time'].agg(['last', 'first']) / 1000000
g_diff.loc[:, 'diff'] = g_diff['first'] - g_diff['last']
g_diff = g_diff['diff'].apply(lambda x: 'improved' if x > 0 else 'slowed')
df_joined = df_q.join(g_diff, on=['query_id'])
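The improved/slowed labeling hinges on 'first' and 'last' picking the lowest- and highest-RPU rows after the groupby sort. A toy example with made-up numbers shows the mechanics:

```python
import pandas as pd

# Two fake queries, each measured at RPU 16 and 128 (elapsed in microseconds);
# rows are sorted by RPU ascending, as they are after the groupby above
toy = pd.DataFrame({
    'query_id': ['Q1', 'Q2', 'Q1', 'Q2'],
    'RPU':      [16,   16,   128,  128],
    'elapsed':  [4_000_000, 2_000_000, 1_000_000, 3_000_000],
})
g = toy.groupby('query_id')['elapsed'].agg(['last', 'first']) / 1_000_000
g['diff'] = g['first'] - g['last']   # positive => faster at the higher RPU
labels = g['diff'].apply(lambda x: 'improved' if x > 0 else 'slowed')
```

Q1 drops from 4s to 1s and is labeled 'improved'; Q2 climbs from 2s to 3s and is labeled 'slowed'.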

Taking a look at the different query execution components to see if there has been any slowdown as the RPU changes

df_joined_filtered = df_joined.loc[:, [
    'metadata.save', 'metadata.sortlimit', 'metadata.hashjoin',
    'metadata.hash', 'metadata.scan', 'metadata.sort', 'metadata.queue_time',
    'metadata.compile_time', 'diff', 'RPU',
    'metadata.elapsed_time',
    'query_id'
]]

# Median time in each execution component, by RPU, for the queries that slowed
df_joined_filtered[df_joined_filtered['diff'] == 'slowed'].groupby(by=['RPU']).median().plot(kind='line', stacked=True)

[figure: median execution-component times for the slowed queries, by RPU]

# Mean execution-component breakdown for a single query (Q3), by RPU
df_joined_filtered[df_joined_filtered['query_id'] == 'Q3'].groupby(by='RPU').mean().plot(kind='bar', stacked=True)

[figure: mean execution-component breakdown for Q3, stacked by RPU]


Parviz Deyhim

Data lover and cloud architect @databricks (ex-google, ex-aws)