Amazon Redshift is a fanatastic product. It is being used by cloud developers since long time. Being a user of Redshift I want to put the below limitations of the product I have observed.
- SQL database: so your data must be structured as table with basic types only. Usually not a problem but forget json
- Still a cluster to dimension. Although adding or removing nodes is just a click away and a secure procedure quite fast (couple of hours), you have to manually do it.
- You pay for the cluster by the time up, not by number of queries/requests/data returned or other model (vs a SAAS service)
- As any MPP database (or noSQL database for that matter), distribution key and sort key choice for a table will drive the performance. So you have to choose carefully and some time store the same data in more than table if performance requires it (for my case, still not needed)
- The cluster has a master node on which all queries and data returned pass thru. This is a potential bottleneck compared to other distributed database, although not a critical one.
- Amazon Redshift is an OLAP and not an OLTP database. Because of that, it is not built with query speed in mind. It might sound trivial but you will find many people complaining that it is difficult to achieve sub second query times. This is not an Amazon Redshift limitation per se, but more a trade off of the architecture.
- happens in most (if not all) cases through Amazon S3. This adds complexity and it might be considered a slower process but again it is a trade off for having the ability to load extreme amounts of data into your data warehouse.
- By being an MPP system Amazon Redshift does not support things like indices. Instead you will need to consider things like , deciding about the best and the .
- With systems of the scale of Amazon Redshift also maintenance is of great importance. So you will need to have a disciplined maintenance schedule if you want to get the most out of your cluster. You will need to , as deleted rows are not automatically reclaimed, this is inherited from PostgreSQL. Also you will have to and make the appropriate changes to your table distribution strategies and sort keys as your data and queries change.