Amazon Redshift: Limitations

Amazon Redshift is a fantastic product that cloud developers have been using for a long time. As a Redshift user, I want to list below the limitations of the product I have observed.

  • SQL database: your data must be structured as tables with basic types only. This is usually not a problem, but forget about storing JSON natively.
  • You still have a cluster to size. Although adding or removing nodes is just a click away and a safe, reasonably fast procedure (a couple of hours), you have to do it manually.
  • You pay for the cluster by uptime, not by the number of queries or requests or the amount of data returned (unlike a SaaS service).
  • As with any MPP database (or NoSQL database, for that matter), the choice of distribution key and sort key for a table drives performance. You have to choose carefully, and sometimes store the same data in more than one table if performance requires it (in my case, this has not been needed yet).
  • The cluster has a leader node through which all queries and returned data pass. This is a potential bottleneck compared to other distributed databases, although not a critical one.
  • Amazon Redshift is an OLAP database, not an OLTP one. As a result, it is not built for low query latency. It might sound trivial, but you will find many people complaining that it is difficult to achieve sub-second query times. This is not an Amazon Redshift limitation per se, but rather a trade-off of the architecture.
  • Loading data into Amazon Redshift happens in most (if not all) cases through Amazon S3. This adds complexity, and it might be considered a slower process, but again it is a trade-off for the ability to load extreme amounts of data into your data warehouse.
  • Being an MPP system, Amazon Redshift does not support features like indexes. Instead, you need to consider the right table distribution strategy, the best sort keys for your tables, and the optimal compression settings for your table columns.
  • With systems of the scale of Amazon Redshift, maintenance is also of great importance, so you need a disciplined maintenance schedule to get the most out of your cluster. You will need to periodically VACUUM your tables to free disk space, as deleted rows are not automatically reclaimed (behavior inherited from PostgreSQL). You will also have to actively monitor your queries and cluster performance, and adjust your table distribution strategies and sort keys as your data and queries change.
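To make the distribution-key and sort-key point concrete, here is a minimal sketch of Redshift table DDL. The table and column names are hypothetical; the right keys depend entirely on your join and filter patterns:

```sql
-- Hypothetical sales fact table; names are illustrative only.
CREATE TABLE sales (
    sale_id     BIGINT,
    customer_id BIGINT,
    sale_date   DATE,
    amount      DECIMAL(12, 2)
)
DISTSTYLE KEY
DISTKEY (customer_id)  -- co-locates each customer's rows on one slice,
                       -- so joins on customer_id avoid redistribution
SORTKEY (sale_date);   -- date-range filters can skip unsorted blocks
```

If the same table is also joined heavily on a different column, this is where the "store the same data in more than one table" trade-off mentioned above comes in.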
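The S3-based loading path typically uses the COPY command. This is a sketch; the bucket path and IAM role ARN are placeholders you would replace with your own:

```sql
-- Hypothetical S3 prefix and IAM role; substitute your own values.
-- COPY loads files in parallel across the cluster's slices.
COPY sales
FROM 's3://my-bucket/sales/2019/'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
CSV
GZIP;
```

Splitting the input into multiple compressed files lets every slice load in parallel, which is why this route scales to very large volumes despite the extra staging step.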
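The maintenance routine described above boils down to two recurring commands, shown here against the hypothetical table from the earlier example:

```sql
-- Reclaim space from deleted rows and re-sort the table
-- (deleted rows are not reclaimed automatically).
VACUUM FULL sales;

-- Refresh the planner's statistics after significant data changes.
ANALYZE sales;
```

Running these on a schedule (for example, after large nightly loads) keeps scan performance and query plans from degrading as data churns.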

Author: Aditya Bhuyan

I am an IT Professional with close to two decades of experience. I mostly work in open source application development and cloud technologies. I have expertise in Java, Spring and Cloud Foundry.
