Amazon Redshift

What is Redshift

Amazon Redshift is a cloud data warehouse service from Amazon Web Services (AWS). For an overview of Redshift’s general architecture, please read my answer “What is the architecture of Amazon Redshift?”

But 7 years is an eternity in tech, and Redshift has advanced a lot since then, so it’s time for an update. Redshift has the advantage that AWS is by far the largest cloud provider, which makes it the natural choice when you’re running your analytical workloads on AWS.

Despite strong growth by other cloud warehouse products like BigQuery, Azure SQL Data Warehouse and Snowflake, Amazon Redshift is still the market leader for cloud warehouses.

So how does Redshift work, and what’s been driving its adoption? There are two basic components you need to understand about Amazon Redshift:

  1. The technology, which is a combination of two things:
    1. Columnar storage
    2. Massively Parallel Processing (MPP)
  2. The economics, again a combination of two things:
    1. Data lake integration (via Redshift Spectrum)
    2. Pricing

It’s the combination of these factors that has driven adoption.

Technology: Columnar Storage & MPP

Columnar storage and MPP allow for distributed processing of large, complex analytical queries across terabytes of data.
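To make the two ideas concrete, here is a toy sketch in plain Python. It is an illustration under simplifying assumptions, not how Redshift is implemented: the table, the helper names (`to_columns`, `split_into_slices`, `parallel_sum`) and the round-robin distribution are all made up for this example. It shows that a columnar layout lets an aggregate read only the one column it needs, and that an MPP-style engine can compute partial results per slice and then combine them.

```python
# Toy illustration only: names and layout are hypothetical, not Redshift internals.

# Row-oriented layout: every record keeps all of its fields together.
rows = [
    {"order_id": 1, "region": "EU", "amount": 120.0},
    {"order_id": 2, "region": "US", "amount": 80.0},
    {"order_id": 3, "region": "EU", "amount": 50.0},
    {"order_id": 4, "region": "US", "amount": 200.0},
]


def to_columns(records):
    """Pivot row records into one list per column (columnar layout)."""
    return {name: [r[name] for r in records] for name in records[0]}


def split_into_slices(columns, n_slices):
    """Distribute rows round-robin across n_slices pretend compute slices."""
    return [
        {name: values[i::n_slices] for name, values in columns.items()}
        for i in range(n_slices)
    ]


def parallel_sum(columns, column, n_slices=2):
    """Sum one column by combining per-slice partial sums.

    The partial sums run one after another here; in a real MPP system
    they would run at the same time on different nodes.
    """
    slices = split_into_slices(columns, n_slices)
    partials = [sum(s[column]) for s in slices]  # touches only one column
    return sum(partials)


columns = to_columns(rows)
total = parallel_sum(columns, "amount", n_slices=2)
print(total)
```

The point of the sketch is the access pattern: the sum never looks at `order_id` or `region`, and each slice only needs its own share of the `amount` column.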

That’s nothing new; other (on-premises) warehouses offer that too. The difference is that they require hardware, installation and maintenance, and it can take weeks or months to get them running.

Redshift’s innovation was to deliver the database as a service, with a cluster up and running in less than 15 minutes.

Economics: Data Lakes and Pricing

Your lowest starting price with Redshift is $1,380 per year for one node of dc2 with a 1-year commitment.

If you want a quick rule of thumb, for a 3-year commitment:

  • Dense Storage nodes cost ~$1,000/TB/Year and scale to over a Petabyte of compressed data.
  • Dense Compute nodes cost ~$5,500/TB/Year and scale up to hundreds of Terabytes of compressed data.

On top of that, you can move data from Redshift into S3 and query it with Redshift Spectrum. That’s your “data lake” strategy. Storing data in S3 costs about $200/TB/Year.
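As a rough back-of-the-envelope check, the rule-of-thumb figures above can be plugged into a small calculator. This is only a sketch: the function name `yearly_cost` and the 10 TB scenario are made up for illustration, the rates are the approximate numbers quoted in this post, and any per-query charges for Redshift Spectrum are deliberately left out.

```python
# Approximate rule-of-thumb rates from this post, in USD per TB per year.
DENSE_STORAGE_PER_TB = 1_000
DENSE_COMPUTE_PER_TB = 5_500
S3_PER_TB = 200


def yearly_cost(hot_tb, cold_tb, node_rate=DENSE_COMPUTE_PER_TB):
    """Estimate yearly cost with hot data in Redshift and cold data in S3.

    Hypothetical helper: ignores Spectrum query charges and data transfer.
    """
    return hot_tb * node_rate + cold_tb * S3_PER_TB


# Hypothetical scenario: 10 TB in total, of which 2 TB is queried frequently.
everything_in_redshift = yearly_cost(hot_tb=10, cold_tb=0)
hot_cold_split = yearly_cost(hot_tb=2, cold_tb=8)

print(everything_in_redshift)  # 55000
print(hot_cold_split)          # 12600
```

Under these assumptions, keeping only the hot 2 TB on Dense Compute nodes and the rest in S3 brings the estimate from $55,000 down to $12,600 per year.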

At such a low price point, it no longer makes sense to aggregate your data in an external processing layer like Hadoop. Just store your data in your cloud warehouse and run your transformations in SQL, which is a much less complex proposition.
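Here is a minimal sketch of what “run your transformations in SQL” looks like in practice. It uses Python’s built-in `sqlite3` module purely as a local stand-in so the example runs anywhere; it is not Redshift, and the table names `raw_orders` and `revenue_by_region` are made up. The idea is the same, though: raw data is loaded as-is, and the aggregation happens inside the database with a single SQL statement instead of in an external processing layer.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the warehouse (assumption).
conn = sqlite3.connect(":memory:")

# Load the raw data untransformed.
conn.execute("CREATE TABLE raw_orders (order_id INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [(1, "EU", 120.0), (2, "US", 80.0), (3, "EU", 50.0), (4, "US", 200.0)],
)

# Transform inside the database: build an aggregate table with plain SQL.
conn.execute(
    """
    CREATE TABLE revenue_by_region AS
    SELECT region, SUM(amount) AS revenue
    FROM raw_orders
    GROUP BY region
    """
)

result = conn.execute(
    "SELECT region, revenue FROM revenue_by_region ORDER BY region"
).fetchall()
print(result)
conn.close()
```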

Summary

In short, Redshift is simple, fast and cheap. It’s no wonder that it has found broad adoption in the SMB market and the enterprise alike.

Author: Aditya Bhuyan

I am an IT Professional with close to two decades of experience. I mostly work in open source application development and cloud technologies. I have expertise in Java, Spring and Cloud Foundry.
