AWS Redshift: Amazon Redshift is a data warehouse service that simplifies data analysis and business intelligence. On the networking side, the Virtual Private Cloud (VPC) service gives administrators tight control over an isolated part of the AWS cloud that serves as their own virtual network.
Amazon Web Services provisions resources instantly within the VPC. Administrators can stay on top of network traffic with Network Load Balancer, Application Load Balancer, and other load-balancing services from Amazon Web Services.
Amazon Redshift Features
Scalable
In the Amazon Redshift data warehouse, you can scale the number of nodes with a few clicks. This lets you grow beyond the current storage capacity without a loss in performance.
Supports VPC
Users can launch Redshift within a VPC and control access to the cluster through the virtual network environment.
SSL
Connections between clients and Redshift can be encrypted with SSL.
Encryption
Data stored in Amazon Redshift can be encrypted at rest. Encryption is configured at the cluster level, typically when the cluster is created.
Cost-effective
Amazon Redshift is more cost-effective than a traditional data warehouse. It has no long-term commitments or upfront costs, and it offers an on-demand pricing structure.
Amazon Redshift Advantages and Disadvantages
Advantages of Amazon Redshift
High Performance
Amazon Redshift delivers high performance through massive parallelism, data distribution, efficient data compression, and query optimization. Massively Parallel Processing (MPP) also lets Redshift parallelize data loading, backup, and restore operations.
AWS ecosystem
Many companies run their infrastructure on AWS, using EC2 for servers, RDS for databases, and S3 for long-term storage. If your infrastructure is already on AWS, Redshift fits in well: the data is close by, so the cost of moving it is comparatively low. Data can also be moved into Redshift very quickly with the help of Massively Parallel Processing.
SQL Interface
The Amazon Redshift query engine is based on ParAccel, which exposes the same SQL interface as PostgreSQL. If you are already familiar with SQL, you do not need to learn a new technology to query Redshift. Redshift also works with existing Postgres JDBC/ODBC drivers, so Business Intelligence tools can connect to it readily.
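As a rough illustration of connecting through a Postgres-compatible driver, the sketch below builds a PostgreSQL-style connection URI with SSL required. The endpoint, database name, and credentials are hypothetical, and the helper name `build_redshift_dsn` is an assumption made for this example. A libpq-based driver such as psycopg2 would typically accept a URI of this shape, but the sketch only builds the string and does not open a connection.

```python
from urllib.parse import quote

REDSHIFT_DEFAULT_PORT = 5439  # Redshift's default port (PostgreSQL's is 5432)


def build_redshift_dsn(host, database, user, password,
                       port=REDSHIFT_DEFAULT_PORT, sslmode="require"):
    """Return a PostgreSQL-style connection URI for a Redshift cluster."""
    # Percent-encode the parts so characters such as '@' or '/' stay valid in a URI.
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(database, safe='')}?sslmode={sslmode}"
    )


# Hypothetical endpoint and credentials, for illustration only.
dsn = build_redshift_dsn(
    host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
    database="analytics",
    user="report_user",
    password="p@ss/word",
)
print(dsn)
```

Setting `sslmode=require` asks the driver to refuse unencrypted connections, which matches the SSL feature described earlier.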
Security
Amazon Redshift provides a range of security features, including data encryption, VPC-based network isolation, and several mechanisms for access control. Encryption can be applied at several points. For example, when loading data from S3, SSL encrypts the data in transit.
You can launch Amazon Redshift clusters inside a VPC and specify VPC security groups to restrict inbound and outbound access to those clusters.
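To make the security group idea concrete, here is a minimal sketch that builds one inbound rule limited to a single IPv4 range on the Redshift port. The CIDR range, the description text, and the helper name `redshift_ingress_rule` are assumptions for illustration. The dictionary follows the `IpPermissions` shape used by the EC2 API, but nothing is sent to AWS here.

```python
import ipaddress

REDSHIFT_PORT = 5439  # default Redshift port; adjust if the cluster uses another


def redshift_ingress_rule(cidr, port=REDSHIFT_PORT):
    """Build one inbound TCP rule restricted to a single IPv4 CIDR range."""
    # Raises ValueError for malformed ranges before anything reaches AWS.
    network = ipaddress.IPv4Network(cidr, strict=True)
    if network.prefixlen == 0:
        raise ValueError("refusing to open the cluster to every address")
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "IpRanges": [{"CidrIp": str(network), "Description": "BI clients"}],
    }


rule = redshift_ingress_rule("10.0.0.0/16")  # hypothetical internal range
print(rule)
# With boto3 installed and credentials configured, a rule like this could be
# passed as IpPermissions=[rule] to the EC2 client's
# authorize_security_group_ingress call for the cluster's security group.
```

Validating the range locally is a design choice of this sketch: it catches typos and an accidental `0.0.0.0/0` before the rule is ever applied.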
Disadvantages and Limitations of Redshift
Requires a good understanding of Sort and Distribution Keys
In Amazon Redshift, sort keys and distribution keys determine how data is stored and how it is spread across the Redshift nodes. A table can have only one distribution key, and it is not easily changed afterwards, so you need to study your workloads carefully before choosing it.
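The effect of a distribution key choice can be sketched with a small simulation. The table definition, column names, and synthetic rows below are hypothetical, and SHA-256 is only a stand-in for whatever hash Redshift uses internally. The point is the general behaviour of hash distribution: a key with many distinct values spreads rows evenly, while a key with few distinct values piles rows onto a few slices.

```python
import hashlib

# Redshift DDL declaring one distribution key and one sort key (hypothetical table).
CREATE_EVENTS = """
CREATE TABLE events (
    user_id    BIGINT,
    country    VARCHAR(2),
    event_time TIMESTAMP
)
DISTKEY (user_id)
SORTKEY (event_time);
"""


def rows_per_slice(rows, key, num_slices):
    """Count how many rows land on each slice when rows are hashed on `key`."""
    counts = [0] * num_slices
    for row in rows:
        # SHA-256 is an assumption for the demo, not Redshift's actual hash.
        digest = hashlib.sha256(str(row[key]).encode()).digest()
        counts[int.from_bytes(digest[:8], "big") % num_slices] += 1
    return counts


# 10,000 synthetic rows: many distinct users, but only two countries.
rows = [{"user_id": i, "country": "US" if i % 10 else "DE"} for i in range(10_000)]

by_user = rows_per_slice(rows, "user_id", num_slices=4)
by_country = rows_per_slice(rows, "country", num_slices=4)
print("distributed on user_id:", by_user)
print("distributed on country:", by_country)
```

With only two country values, at most two of the four slices can receive any rows, so most of the simulated cluster would sit idle for work on this table. That is the kind of skew to check for before committing to a key.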
Doesn’t enforce uniqueness
Amazon Redshift does not enforce uniqueness on inserted data. If a distributed system writes data directly into Redshift, you have to handle uniqueness yourself, either with a data de-duplication step or in the application layer.
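One way to handle uniqueness in the application layer is to de-duplicate each batch before it is loaded. The sketch below keeps one row per key, preferring the most recent version. The field names (`order_id`, `updated_at`) and the sample batch are made up for illustration, and this is only one possible approach.

```python
def deduplicate_latest(rows, key_fields, version_field):
    """Keep one row per key: the one with the highest `version_field`."""
    latest = {}
    for row in rows:
        key = tuple(row[f] for f in key_fields)
        current = latest.get(key)
        # Replace only on a strictly newer version, so exact duplicates are dropped.
        if current is None or row[version_field] > current[version_field]:
            latest[key] = row
    return list(latest.values())


batch = [
    {"order_id": 1, "status": "created", "updated_at": 100},
    {"order_id": 2, "status": "created", "updated_at": 101},
    {"order_id": 1, "status": "shipped", "updated_at": 105},  # later update
    {"order_id": 2, "status": "created", "updated_at": 101},  # exact duplicate
]
clean = deduplicate_latest(batch, key_fields=["order_id"], version_field="updated_at")
print(clean)
```

This only removes duplicates within a single batch. Rows that already exist in the warehouse still need a separate check, for example by loading into a staging table and merging from there.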
Only DynamoDB, S3, and EMR support parallel upload
If your data is in Amazon DynamoDB, Amazon S3, or Amazon EMR, Redshift can load it quickly using MPP (Massively Parallel Processing). For other sources this is not the case, and you need JDBC inserts or scripts to load the data. You can also use an ETL solution such as Hevo, which loads data from hundreds of sources into Amazon Redshift in parallel.
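For the S3 case, parallel loading goes through Redshift's COPY command, which can read all files under a prefix. The sketch below only assembles such a statement as text. The table name, bucket, and IAM role ARN are hypothetical, and `build_copy_statement` is a helper invented for this example.

```python
def build_copy_statement(table, s3_prefix, iam_role_arn, gzip=True):
    """Return a Redshift COPY statement that loads CSV files under an S3 prefix."""
    if not s3_prefix.startswith("s3://"):
        raise ValueError("s3_prefix must look like s3://bucket/prefix")
    # Plain string formatting: only use this with trusted, hard-coded values.
    lines = [
        f"COPY {table}",
        f"FROM '{s3_prefix}'",
        f"IAM_ROLE '{iam_role_arn}'",
        "FORMAT AS CSV",
    ]
    if gzip:
        lines.append("GZIP")
    return "\n".join(lines) + ";"


# Hypothetical table, bucket, and role, for illustration only.
statement = build_copy_statement(
    table="events",
    s3_prefix="s3://example-bucket/events/2024-01-01/",
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleRedshiftLoadRole",
)
print(statement)
```

Splitting the input into several files under the prefix gives the cluster more to work on in parallel, whereas row-by-row JDBC inserts pass through a single connection.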
Data on cloud
Redshift keeps your data in the cloud, which is generally a good thing, but you may have concerns about the privacy of data stored there. The data may be sensitive, or you may simply prefer not to load it into the cloud.
Can’t be used as a live app DB
Redshift is very fast at running queries, analytics, and reports over large amounts of data, but it cannot be used as a live application database. To serve data to web applications, you have to load it into a caching layer or a vanilla Postgres instance.
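A caching layer in front of the warehouse can be as simple as storing query results for a fixed time. The sketch below is an in-memory illustration of that idea, not a production design. The `TTLCache` class, the report function, and its sample result are all hypothetical, and a real deployment would more likely use a shared cache or a Postgres table.

```python
import time


class TTLCache:
    """Tiny in-memory cache that serves stored results until they expire."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # fresh hit: the warehouse is not queried
        value = compute()
        self._entries[key] = (now + self._ttl, value)
        return value


warehouse_calls = []


def run_daily_revenue_report():
    # Stand-in for a slow analytical query against Redshift (made-up result).
    warehouse_calls.append("daily_revenue")
    return {"2024-01-01": 1250.0}


cache = TTLCache(ttl_seconds=300)
first = cache.get_or_compute("daily_revenue", run_daily_revenue_report)
second = cache.get_or_compute("daily_revenue", run_daily_revenue_report)
print("warehouse queries issued:", len(warehouse_calls))
```

Only the first request reaches the stand-in query. The second is answered from memory, which is the behaviour a web application needs when many users request the same report.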