What is Redshift?
Amazon Redshift is a cloud warehouse solution by Amazon Web Services (AWS). For an understanding of Redshift’s general architecture, please read my answer
But seven years is an eternity in tech, and Redshift has advanced a lot, so it’s time for an update. Redshift has the advantage that AWS is by far the largest cloud provider, which makes it the natural choice when you’re running your analytical workloads on AWS.
Despite strong growth from other cloud warehouse products like BigQuery, Azure SQL Data Warehouse and Snowflake, Amazon Redshift is still the market leader among cloud warehouses.
So how does Redshift work, and what’s been driving its adoption? There are two basic components you need to understand about Amazon Redshift:
- The technology, which is a combination of two things:
  - Columnar storage
  - Massively Parallel Processing (MPP)
- The economics, again a combination of two things:
  - Data lake integration (via Redshift Spectrum)
  - Pricing
It’s the combination of these factors that has driven adoption.
Technology: Columnar Storage & MPP
Columnar storage and MPP allow for distributed processing of large, complex analytical queries across terabytes of data.
That’s nothing new; other (on-premise) warehouses offer that too. The difference is that they require hardware, installation and maintenance, and it can take weeks or months to get them running.
Redshift’s innovation was to deliver the database as a service, with a cluster up and running in less than 15 minutes.
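To make the columnar storage and MPP ideas above a bit more concrete, here is a minimal Python sketch. It is a toy model, not how Redshift is implemented: the `rows` data, the function names and the round-robin distribution are all assumptions made for illustration, and threads stand in for separate nodes.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy row-oriented table: each record keeps all of its fields together.
rows = [
    {"order_id": 1, "region": "EU", "amount": 120.0},
    {"order_id": 2, "region": "US", "amount": 80.0},
    {"order_id": 3, "region": "EU", "amount": 200.0},
    {"order_id": 4, "region": "US", "amount": 50.0},
]


def to_columns(records):
    """Pivot row records into one list per column (a columnar layout)."""
    return {name: [r[name] for r in records] for name in records[0]}


def split_into_slices(records, n_slices):
    """Distribute rows round-robin across n_slices hypothetical nodes."""
    return [records[i::n_slices] for i in range(n_slices)]


def partial_sum(records):
    """Each 'node' reads only the one column the query needs."""
    if not records:
        return 0.0
    return sum(to_columns(records)["amount"])


def parallel_total(records, n_slices=2):
    """MPP-style aggregate: partial sums per slice, combined at the end."""
    slices = split_into_slices(records, n_slices)
    with ThreadPoolExecutor(max_workers=n_slices) as pool:
        partials = list(pool.map(partial_sum, slices))
    return sum(partials)


total = parallel_total(rows)
```

The point of the sketch is the shape of the work: a query like "sum of `amount`" touches a single column instead of every field of every row, and each slice can be scanned independently before a small final combine step.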
Economics: Data Lakes and Pricing
Your lowest starting price with Redshift is $1,380 per year for one node of dc2 with a 1-year commitment.
If you want a quick rule of thumb, for a 3-year commitment:
- Dense Storage nodes cost ~$1,000/TB/Year and scale to over a Petabyte of compressed data.
- Dense Compute nodes cost ~$5,500/TB/Year and scale up to hundreds of compressed Terabytes.
Then add that you can shift data from Redshift into S3 and query it with Redshift Spectrum. That’s your “data lake” strategy. Storing data in S3 costs about $200/TB/Year.
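As a rough back-of-the-envelope check of these rules of thumb, the arithmetic can be sketched in Python. The per-TB prices are the approximate figures quoted above; the 10 TB workload and the 2 TB / 8 TB hot/cold split are made-up numbers for illustration, and the sketch ignores any per-query charges for Redshift Spectrum.

```python
# Rule-of-thumb yearly prices per TB, taken from the figures quoted above.
PRICE_PER_TB_YEAR = {
    "dense_storage": 1000,
    "dense_compute": 5500,
    "s3": 200,
}


def yearly_cost(tb_by_tier):
    """Sum the yearly storage cost for a mapping of tier -> terabytes."""
    return sum(PRICE_PER_TB_YEAR[tier] * tb for tier, tb in tb_by_tier.items())


# Hypothetical 10 TB workload kept entirely on Dense Compute nodes.
all_in_cluster = yearly_cost({"dense_compute": 10})

# Same 10 TB, with 8 TB of rarely queried data moved to S3.
hot_cold_split = yearly_cost({"dense_compute": 2, "s3": 8})

print(all_in_cluster)   # 55000
print(hot_cold_split)   # 12600
```

Under these assumptions, moving the cold data to S3 cuts the yearly storage bill from about $55,000 to about $12,600, which is the economic argument for the data lake strategy.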
At such a low price point, it no longer makes sense to aggregate your data in some external processing layer like Hadoop. Just store your data in your cloud warehouse, and run your transformations in SQL, a much less complex proposition.
In short, Redshift is simple, fast and cheap. It’s no wonder that it has found broad adoption in the SMB market and the enterprise alike.
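To illustrate what "run your transformations in SQL" can look like, here is a small sketch that loads raw events and builds an aggregate table with a single SQL statement. It uses Python’s built-in `sqlite3` module purely as a local stand-in for a warehouse; the table names `raw_events` and `daily_revenue` and the sample rows are hypothetical, and Redshift’s own SQL dialect and loading mechanisms differ.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the warehouse (assumption).
conn = sqlite3.connect(":memory:")

# Store the raw, unaggregated data directly in the warehouse.
conn.execute(
    "CREATE TABLE raw_events (user_id INTEGER, event_date TEXT, revenue REAL)"
)
conn.executemany(
    "INSERT INTO raw_events VALUES (?, ?, ?)",
    [
        (1, "2024-01-01", 10.0),
        (1, "2024-01-01", 5.0),
        (2, "2024-01-01", 7.5),
        (2, "2024-01-02", 2.5),
    ],
)

# The transformation itself is plain SQL: aggregate raw rows per day.
conn.execute(
    """
    CREATE TABLE daily_revenue AS
    SELECT event_date,
           COUNT(DISTINCT user_id) AS users,
           SUM(revenue) AS revenue
    FROM raw_events
    GROUP BY event_date
    """
)

daily = conn.execute(
    "SELECT event_date, users, revenue FROM daily_revenue ORDER BY event_date"
).fetchall()
```

The aggregation that would otherwise be a separate processing job becomes one `CREATE TABLE ... AS SELECT` statement that runs where the data already lives.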