|Amazon Redshift Updates|
|Written by Kay Ewbank|
|Thursday, 05 December 2019|
Amazon has announced a number of updates to Redshift, its cloud-based data warehouse service.
Data in Redshift can be analyzed using standard SQL-based tools and business intelligence applications. The service is designed to be easy to set up and manage: clusters can be created with a few clicks in the AWS Management Console, and queries can be distributed and parallelized across multiple nodes. To make Redshift easier to administer, Amazon has automated most of the common tasks associated with provisioning, configuring, monitoring, backing up, and securing a data warehouse. Redshift is based on technology from ParAccel, a company that was acquired by Actian (formerly known as Ingres) in 2013.
The updates announced at Amazon's re:Invent conference start with support for data lake export in Apache Parquet format. You can now unload the result of an Amazon Redshift query to your Amazon S3 data lake as Apache Parquet. According to Amazon, the Parquet format is up to twice as fast to unload and uses up to six times less storage in Amazon S3 compared to text formats.
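As a rough illustration of what such an export looks like, the Python sketch below assembles a Redshift UNLOAD statement that requests Parquet output. The helper name, table, bucket prefix and IAM role ARN are all made up for the example, and the statement would still need to be run against a cluster with whatever SQL client you normally use.

```python
def build_unload_parquet(query: str, s3_prefix: str, iam_role_arn: str) -> str:
    """Return an UNLOAD statement that writes a query result to S3 as Parquet.

    Hypothetical helper for illustration; it only builds the SQL text.
    """
    # UNLOAD takes the query as a quoted string literal, so any single
    # quotes inside the query have to be doubled.
    escaped_query = query.replace("'", "''")
    return (
        f"UNLOAD ('{escaped_query}')\n"
        f"TO '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS PARQUET;"
    )


# Placeholder table, bucket and role names (assumptions, not real resources).
statement = build_unload_parquet(
    "SELECT order_id, total FROM sales WHERE region = 'EU'",
    "s3://example-data-lake/sales/eu_",
    "arn:aws:iam::123456789012:role/ExampleRedshiftRole",
)
print(statement)
```

The only difference from a text-format unload is the final FORMAT AS PARQUET clause.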
The next improvement to be announced is a preview of support for federated querying. The Amazon Redshift Federated Query feature lets you query and analyze data across operational databases, data warehouses, and data lakes. With Federated Query, you can now integrate queries on live data in Amazon RDS for PostgreSQL and Amazon Aurora PostgreSQL with queries across your Amazon Redshift and Amazon S3 environments.
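To give a feel for how this might be used, the sketch below builds two SQL statements in Python: one that maps a PostgreSQL schema into Redshift as an external schema, and one that joins a live PostgreSQL table with a local Redshift table. The CREATE EXTERNAL SCHEMA ... FROM POSTGRES form follows Amazon's documented syntax for the released feature, so the preview may differ. Every schema, table, endpoint and ARN shown is a placeholder invented for the example.

```python
def build_federated_schema(local_schema: str, pg_database: str, pg_schema: str,
                           endpoint: str, iam_role_arn: str, secret_arn: str,
                           port: int = 5432) -> str:
    """Return DDL that exposes a PostgreSQL schema inside Redshift.

    Hypothetical helper for illustration; it only builds the SQL text.
    """
    return (
        f"CREATE EXTERNAL SCHEMA {local_schema}\n"
        "FROM POSTGRES\n"
        f"DATABASE '{pg_database}' SCHEMA '{pg_schema}'\n"
        f"URI '{endpoint}' PORT {port}\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"SECRET_ARN '{secret_arn}';"
    )


# All identifiers below are placeholders, not real resources.
schema_ddl = build_federated_schema(
    local_schema="live_pg",
    pg_database="shop",
    pg_schema="public",
    endpoint="example-db.cluster-abc123.us-east-1.rds.amazonaws.com",
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleRedshiftRole",
    secret_arn="arn:aws:secretsmanager:us-east-1:123456789012:secret:example",
)

# Join live operational rows (live_pg.orders) with a warehouse table.
federated_query = (
    "SELECT c.customer_name, COUNT(*) AS open_orders\n"
    "FROM live_pg.orders AS o\n"
    "JOIN analytics.customers AS c ON c.customer_id = o.customer_id\n"
    "WHERE o.status = 'open'\n"
    "GROUP BY c.customer_name;"
)

print(schema_ddl)
print(federated_query)
```

Once the external schema exists, the remote tables are addressed like any other schema-qualified table, which is what allows them to be mixed with local tables in one query.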
Another improvement to queries in Redshift is the preview of the Advanced Query Accelerator (AQUA) for Amazon Redshift. This is a new distributed, hardware-accelerated cache that Amazon says lets Redshift run up to ten times faster than any other cloud data warehouse. AQUA aims to avoid having to move data from centralized storage to compute clusters for processing, where the network bandwidth needed to move the data can become the bottleneck. Instead, AQUA does a substantial share of data processing in place on its hardware-accelerated cache. Data-intensive tasks such as filtering and aggregation are carried out closer to the storage layer, minimizing data movement between where the data is stored and the compute clusters.
The final improvement to Redshift is support for materialized views - again, this is in preview. Materialized views can speed up query performance for repeated and predictable analytical workloads. They store pre-computed results of queries and maintain them by incrementally processing the latest changes made to the source tables. Any query that uses the materialized views gets the pre-computed results much faster. Materialized views can be created based on one or more source tables using filters, projections, inner joins, aggregations, grouping, functions and other SQL constructs.
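The idea of maintaining pre-computed results incrementally can be shown with a small, self-contained Python model. This is only a conceptual sketch and not how Redshift implements the feature: it assumes an append-only source table held in a list, and the class and column names are invented for the example.

```python
from collections import defaultdict


class SalesByRegionView:
    """Toy model of a materialized view: SUM(amount) GROUP BY region."""

    def __init__(self):
        self.totals = defaultdict(int)  # the stored, pre-computed result
        self.applied = 0                # source rows already folded in

    def refresh(self, source_rows):
        """Fold in only the rows added since the last refresh.

        Assumes the source is append-only; returns the number of rows
        processed by this refresh.
        """
        new_rows = source_rows[self.applied:]
        for region, amount in new_rows:
            self.totals[region] += amount
        self.applied = len(source_rows)
        return len(new_rows)


sales = [("EU", 100), ("US", 50), ("EU", 40)]
view = SalesByRegionView()
first_refresh = view.refresh(sales)    # processes all three rows

sales.append(("US", 25))               # a new row arrives in the source
second_refresh = view.refresh(sales)   # processes only the new row

print(dict(view.totals), first_refresh, second_refresh)
```

Queries read the stored totals directly instead of rescanning the source, and each refresh touches only the rows that changed. In Redshift itself a view is defined with CREATE MATERIALIZED VIEW ... AS SELECT ... and brought up to date with REFRESH MATERIALIZED VIEW.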
More details of all the new features can be found on the Redshift website.
|Last Updated ( Friday, 06 December 2019 )|