Published by CloudForecast
Amazon Redshift is a widely used cloud data warehouse that is used by many businesses, like Nasdaq, GE, and Zynga, to process analytical queries and analyze exabytes of data across databases, data lakes, data warehouses, and third-party data sets.
There are multiple use cases for Redshift, including enhancing business intelligence capabilities, increasing developer and analyst productivity,
and building machine learning models for predictive insights, like demand forecasting.
Amazon Redshift can be leveraged by modern data-driven organizations to vastly improve their data warehousing and analytics capabilities. However, the pricing for Redshift services can be challenging to understand, with multiple criteria that define the total cost.
In this article, you’ll learn about Amazon Redshift and its pricing structure, with suggestions for how to optimize costs.
What Is Amazon Redshift?
Essentially, Amazon Redshift provides analytics over multiple databases and offers high scalability in a secure and compliant fashion.
Additionally, there is a serverless option called Amazon Redshift Serverless that makes it even easier to rapidly scale analytics setup without requiring a managed data warehouse infrastructure. It helps with data democratization and assists various data stakeholders to extract data insights by simply loading and querying data in the warehouse.
Amazon Redshift Pricing
In this section, you’ll learn about Amazon Redshift’s capabilities as it pertains to usage and pricing.
For new enterprise users, the AWS Free Tier provides a free two-month trial of the DC2.Large node. This free service includes 750 hours per month, which is sufficient to run a single DC2.Large node with 160GB of compressed solid-state drives (SSD).
When you launch an Amazon Redshift cluster, you select a number of nodes in a specific region as well as their instance type to run your data warehouse. In on-demand pricing, a simple hourly rate applies based on the previous configuration and is billed as long as the cluster is live. The typical hourly rate for a DC2.Large node is $0.25 USD per hour.
Redshift Serverless Pricing
With Amazon Redshift Serverless, costs accrue only when the data warehouse is active and is measured in units of Redshift Processing Units (RPUs). You’re charged in terms of RPU-hours on a per-second basis. The serverless configuration also includes concurrency scaling and Amazon Redshift Spectrum, and the cost for these services is already included.
Managed Storage Pricing
Amazon Redshift charges for the data stored in a managed storage at a specific rate per GB-month. Its usage is calculated on an hourly basis as a function of the total amount of data and starts as low as $0.024 USD per GB with the RA3 node. The cost of a managed storage also varies according to the particular AWS region in which the data is stored.
For example, consider the cost of a managed storage pricing where 100TB of data is stored with an RA3 node type for thirty days in the US East region, where the cost is $0.024 USD per GB-month.
The total usage for thirty days in GB-hours is as follows:
100TB × 1024GB/TB (converting TB to GB) × 30 days × 24 hours/day = 73,728,000 GB-hours
Then you can convert GB-hours to GB-months:
73,728,000 GB-hours / (24 × 30) hours per month = 102,400 GB-months
Finally, you can calculate the total cost of 102,400 GB-months at $0.024 USD/GB-month in the US East region:
102,400 GB-months × $0.024 USD = $2,457.60 USD
With Amazon Redshift Spectrum, users can run SQL queries directly on the data in the S3 buckets. Here, the cost is based on the number of bytes scanned by the Spectrum utility.
The pricing of Redshift Spectrum is $5 USD per terabyte of data scanned.
Concurrency Scaling Pricing
With Concurrency Scaling, Amazon Redshift can be scaled to multiple concurrent users and queries. For every twenty-four hours that your main cluster is live, you accrue a one-hour credit. Any additional usage is charged on a per-second, on-demand rate that depends on the number of types of nodes in the main cluster.
Reserved Instance Pricing
Reserved instances are designated for stable production workloads and are less expensive than clusters run on an on-demand basis. Significant cost savings can be achieved through long-term usage and commitment to Amazon Redshift in the span of a few years.
Pricing for reserved instances can either be paid all up front, partially up front, or monthly over the course of a year with no up-front charges.
Amazon Redshift Cost Optimization Considerations
Before you begin using Amazon Redshift, you need to be aware of your current costs.
AWS Cost ExplorerThe AWS Pricing Calculator provides a configurable tool to estimate the cost of using Amazon Redshift.
For instance, the annual cost of one node of the DC2.8xlarge instance in the US East (Ohio) region on an on-demand basis is as follows:
1 instance × $4.80 USD hourly × 730 hours in a month × 12 months = $42,048 USD
The cost for the same Amazon Redshift configuration for a reserved instance for a one-year term paid up front is $27,640 USD.
Using AWS cost allocation tags can help you decode and manage your AWS costs. Tagsenable AWS resources to be labeled in the form of key-value pairs and can include various types, like technical, business, security, and automation. Once the tags are activated in the Billing and Cost Management console, a cost allocation report can be generated based on the specific resources tagged. Tags can be user-defined or AWS-generated.
Amazon Redshift Cost Optimization
Optimizing Amazon Redshift costs comes down to effective planning, prudent usage and allocation of resources, and regular monitoring of the usage and associated costs.
The analytical queries made on the data stored in Amazon Redshift can be optimized to run more efficiently. Queries can be compute-intensive, can be storage-intensive, or can take a long time to execute.
There are a number of query tuning techniques that can be used to optimize your queries. Tables with skewed data or missing statistics, and queries with nested loops and long wait times, typically affect query performance and can be improved as illustrated in this AWS developer guide.
Here is a commonly used weak query that selects all the columns in a table:
SELECT * FROM USERS
The previous query can be very inefficient and slow if the table consists of thousands of columns, especially if only a few columns are relevant for the necessary analysis. This query can be optimized by specifying and retrieving the exact column names like the following:
SELECT Firstname, Lastname, DOB FROM USERS
Cluster Limits and Quotas
Usage limits on Amazon Redshift clusters can be programmed using the AWS Command Line Interface (CLI) tool. Limits can be imposed on concurrency scaling in terms of time and spectrum in terms of data scanned. Daily, weekly, or monthly periods can be used.
A number of limits and quotas are defined for Redshift resources that can also be applied to constrain the overall costs associated with Redshift.
Amazon Redshift costs can also be managed by storing data in a compressed, partitioned, and columnar data format, like Apache Parquet, since fewer data is scanned.
Amazon Redshift is a powerful and cost-effective cloud-native data warehouse that provides scalable and performant data analytics and processing capabilities. It also comes with a serverless configuration that allows any data stakeholder to run data queries without the need to provision and manage the data warehouse infrastructure.
Amazon Redshift has multiple aspects affecting its pricing, including on-demand or reserved capabilities, serverless, managed storage pricing, Redshift Spectrum pricing, concurrency scaling pricing, and reserved instance pricing. Keeping on top of the various Amazon Redshift costs is not straightforward but can be made easier by AWS cost monitoring tools, like CloudForecast.
CloudForecast helps manage AWS costs through daily cost management reports, monthly financial reports, untagged AWS resources discovery, and idle and underutilized resources visibility for cost-saving opportunities.
Copyright © 2022, Sundeep Teki
All rights reserved. No part of these articles may be reproduced, distributed, or transmitted in any form or by any means, including electronic or mechanical methods, without the prior written permission of the author.
This is a personal blog. Any views or opinions represented in this blog are personal and belong solely to the blog owner and do not represent those of people, institutions or organizations that the owner may or may not be associated with in professional or personal capacity, unless explicitly stated.