AWS Glue 4 Adds Pandas Support
Written by Kay Ewbank   
Thursday, 01 December 2022

AWS Glue has been updated with newer engines and support for Pandas. AWS Glue is a serverless data integration service that Amazon says makes it easier to discover, prepare, move, and integrate data from multiple sources for analytics, machine learning, and application development.

Glue includes a collection of libraries, engines, and tools developed by the open source community. AWS Glue consists of a Data Catalog, which is a central metadata repository; an ETL engine that can automatically generate Scala or Python code; a flexible scheduler that handles dependency resolution, job monitoring, and retries; and AWS Glue DataBrew, which provides a visual interface for cleaning and normalizing data.


Glue 4 includes AWS Glue Studio, a new graphical interface that makes it easy to create, run, and monitor extract, transform, and load (ETL) jobs. The Studio can be used to visually compose data transformation workflows that run on AWS Glue’s Apache Spark-based serverless ETL engine.


The Pandas support means Python developers can use the data analysis and manipulation facilities of Pandas in their Glue jobs. The new version of Glue also has updated Python and Spark engines: Python 3.10 and Apache Spark 3.3.0. Both engines include bug fixes and performance enhancements; Spark also adds new features such as row-level runtime filtering and additional built-in functions. Glue and Amazon EMR make use of the same Spark runtime, which the Glue team says has been optimized to run in the AWS cloud and can be two to three times faster than the standard open source version.
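As a rough illustration of the kind of Pandas code that could sit inside a Python job, here is a minimal sketch. The sample data, the column names and the `total_per_customer` function are made up for this example and are not taken from the Glue documentation; the code uses only the standard pandas API and runs anywhere pandas is installed.

```python
import pandas as pd

# Hypothetical sample records; a real job would read these from a data source.
orders = pd.DataFrame(
    {
        "customer": ["ann", "bob", "ann", "cy"],
        "amount": [10.0, None, 5.5, 20.0],
    }
)


def total_per_customer(df: pd.DataFrame) -> pd.DataFrame:
    """Drop rows with a missing amount, then sum the amounts per customer."""
    cleaned = df.dropna(subset=["amount"])
    return (
        cleaned.groupby("customer", as_index=False)["amount"]
        .sum()
        .sort_values("customer")
        .reset_index(drop=True)
    )


totals = total_per_customer(orders)
print(totals)
```

The row for "bob" has no amount, so it is dropped before the totals are calculated.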

Glue 4.0 also adds native support for the Cloud Shuffle Service Plugin for Spark, which helps scale disk usage, and for Adaptive Query Execution, which dynamically optimizes queries as they run.
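Adaptive Query Execution is controlled by ordinary Spark SQL settings, and the sketch below shows one way those settings might be collected into a single job argument. The three `spark.sql.adaptive.*` keys are standard Spark 3.x options; the `build_conf_argument` helper and the idea of chaining several settings into one `--conf` value are assumptions made for this example, so check the Glue job parameter documentation before relying on it. Open source Spark has enabled AQE by default since version 3.2, so setting it explicitly mostly serves to document intent.

```python
# Standard Spark SQL settings for Adaptive Query Execution (Spark 3.x).
AQE_SETTINGS = {
    "spark.sql.adaptive.enabled": "true",
    "spark.sql.adaptive.coalescePartitions.enabled": "true",
    "spark.sql.adaptive.skewJoin.enabled": "true",
}


def build_conf_argument(settings):
    """Join Spark settings into one string for a job's `--conf` argument.

    Hypothetical helper: the first pair carries no prefix because the
    argument key is already `--conf`; every later pair repeats the flag.
    """
    pairs = [f"{key}={value}" for key, value in settings.items()]
    return " --conf ".join(pairs)


# Assumed shape of the job arguments: a plain key/value mapping.
job_arguments = {"--conf": build_conf_argument(AQE_SETTINGS)}
print(job_arguments["--conf"])
```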

Another improvement in the new release is support for more data formats. Glue now supports the Apache Hudi, Apache Iceberg, and Delta Lake table formats. It also includes the Parquet vectorized reader, with support for additional data types and encodings. In addition, Glue has been upgraded to use log4j 2 and no longer depends on log4j 1.
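To give an idea of how a job might opt in to these table formats, the following sketch builds a job definition as a plain Python dictionary. It assumes the `--datalake-formats` job parameter with the values `hudi`, `iceberg` and `delta`, and the `Name`, `GlueVersion` and `DefaultArguments` fields of a Glue job definition; the job name and the `datalake_arguments` helper are hypothetical, and nothing here calls AWS.

```python
# Values assumed to be accepted by the `--datalake-formats` job parameter.
SUPPORTED_FORMATS = ("hudi", "iceberg", "delta")


def datalake_arguments(formats):
    """Return job arguments requesting the given data lake frameworks."""
    unknown = [name for name in formats if name not in SUPPORTED_FORMATS]
    if unknown:
        raise ValueError(f"unsupported data lake format(s): {unknown}")
    return {"--datalake-formats": ",".join(formats)}


# Hypothetical job definition; only the dictionary is built, no API is called.
job_definition = {
    "Name": "example-lakehouse-job",  # made-up job name
    "GlueVersion": "4.0",
    "DefaultArguments": datalake_arguments(["hudi", "iceberg"]),
}
print(job_definition)
```

Validating the names up front means a typo such as `deltalake` fails when the definition is built rather than when the job runs.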


More Information

Amazon Glue Webpage

Related Articles

Amazon Announces AWS Visual Embedding

Amazon Launches AWS Workflow Studio

Amazon Releases Data IDE, Meet EMR Studio

Amazon AWS Invests In Rust

Last Updated ( Thursday, 01 December 2022 )