AWS Re:Invent 2022 Digest

Key announcement digest

Xin Cheng
14 min read · Mar 8, 2023

AWS re:Invent 2022 ran from Nov 28 to Dec 2, 2022. I just got time to digest it. There were lots of announcements on AI/ML/Data, but on the Kubernetes side there were not many new things.

AI

New — Process PDFs, Word Documents, and Images with Amazon Comprehend for IDP
This feature allows you to classify and extract entities from PDF documents, Microsoft Word files, and images directly from Amazon Comprehend without you needing to extract the text first. It seems to do OCR if the documents are scanned.

New for Amazon SageMaker — Perform Shadow Tests to Compare Inference Performance Between ML Model Variants
Deploying a model in shadow mode lets you conduct a more holistic test by routing a copy of the live inference requests for a production model to the new (shadow) model.
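
To make the shadow-mode idea concrete, here is a minimal sketch in plain Python. It does not use the SageMaker API; `shadow_route`, `production_model`, and `shadow_model` are hypothetical names, and the shadow call is synchronous only to keep the example short. The point is that the caller only ever sees the production response, while a copy of each request is also scored by the shadow variant and logged for later comparison.

```python
from typing import Callable, Dict, List

Model = Callable[[float], float]


def shadow_route(request: float, production: Model, shadow: Model,
                 comparison_log: List[Dict[str, float]]) -> float:
    """Serve the request from production and send a copy to the shadow model."""
    prod_response = production(request)
    try:
        shadow_response = shadow(request)
        comparison_log.append({
            "request": request,
            "production": prod_response,
            "shadow": shadow_response,
        })
    except Exception:
        # A failing shadow model must never affect live traffic.
        pass
    return prod_response


def production_model(x: float) -> float:
    """Hypothetical current production variant."""
    return 2.0 * x


def shadow_model(x: float) -> float:
    """Hypothetical candidate variant under test."""
    return 2.0 * x + 0.5


log: List[Dict[str, float]] = []
responses = [shadow_route(x, production_model, shadow_model, log)
             for x in (1.0, 2.0, 3.0)]
```

Comparing the `production` and `shadow` fields in `log` afterwards is the offline analogue of the side-by-side metrics a shadow test is meant to give you.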

Next Generation SageMaker Notebooks — Now with Built-in Data Preparation, Real-Time Collaboration, and Notebook Automation
You can now improve data quality in minutes with the built-in data preparation capability, edit the same notebooks with your teams in real time, and automatically convert notebook code to production-ready jobs.

New — Share ML Models and Notebooks More Easily Within Your Organization with Amazon SageMaker JumpStart
Easily share your models and notebooks to collaborate and increase productivity, or to put your models into production, using SageMaker JumpStart, a machine learning (ML) hub that provides access to built-in algorithms with pre-trained models from popular model hubs.

AWS Machine Learning University New Educator Enablement Program to Build Diverse Talent for ML/AI Jobs
The new program offers year-round lesson planning, course playbooks, and access to free compute resources.

New — Introducing Support for Real-Time and Batch Inference in Amazon SageMaker Data Wrangler
This feature allows you to reuse the data transformation flow which you created in SageMaker Data Wrangler as a step in Amazon SageMaker inference pipelines.

New — Amazon SageMaker Data Wrangler Supports SaaS Applications as Data Sources
With this feature, you can use more than 40 SaaS applications as data sources via Amazon AppFlow, a SaaS integration service, and have the data available in Amazon SageMaker Data Wrangler.

New ML Governance Tools for Amazon SageMaker — Simplify Access Control and Enhance Transparency Over Your ML Projects
New tools let you define custom permissions for SageMaker users in minutes (Amazon SageMaker Role Manager), document model information from conception to deployment (Amazon SageMaker Model Cards), and monitor all your deployed models through a unified dashboard (Amazon SageMaker Model Dashboard).

Preview: Use Amazon SageMaker to Build, Train, and Deploy ML Models Using Geospatial Data
This collection of features offers pre-trained deep neural network (DNN) models and geospatial operators (open-source geospatial libraries such as NumPy, GDAL, GeoPandas, and Rasterio) that make it easy to access and prepare large geospatial datasets.

New — Redesigned UI for Amazon SageMaker Studio
The redesigned UI makes it easier for you to discover and get started with the ML tools in SageMaker Studio.

Classifying and Extracting Mortgage Loan Data with Amazon Textract
The new API was created in response to requests from major lenders in the industry to help them process applications faster and reduce errors, which improves the end-customer experience and lowers operating costs.

Amazon CodeWhisperer Adds Enterprise Administrative Controls, Simple Sign-up, and Support for New Languages (Preview)
Administrators can now easily integrate CodeWhisperer with their existing workforce identity solutions, provide access to users and groups, and configure organization-wide settings. Amazon CodeWhisperer is a machine learning (ML)–powered service that helps improve developer productivity by generating code recommendations based on their comments in natural language and code in the integrated development environment (IDE).

Analytics

New for Amazon Redshift — Simplify Data Ingestion and Make Your Data Warehouse More Secure and Reliable
This year at re:Invent, Amazon Redshift has announced a number of features to help you simplify data ingestion and get to insights easily and quickly, within a secure, reliable environment, including auto-copy from Amazon S3, Amazon Aurora zero-ETL integration with Amazon Redshift, and Multi-AZ deployments.

Announcing Additional Data Connectors for Amazon AppFlow
We’ve added 22 new data connectors for Amazon AppFlow, including connectors for marketing, customer service and engagement, and business operations.

Join the Preview — AWS Glue Data Quality
AWS Glue Data Quality can analyze your tables and recommend a set of rules automatically based on what it finds.
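
As a rough illustration of "analyze the table, then recommend rules", the sketch below profiles a few in-memory rows and emits rule strings written in the style of Glue's Data Quality Definition Language (`IsComplete`, `IsUnique`). The heuristic in `recommend_rules` and the sample data are my own assumptions; this is not how the service actually derives its recommendations.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[object]]


def recommend_rules(rows: List[Row]) -> List[str]:
    """Suggest completeness and uniqueness rules the sample already satisfies."""
    if not rows:
        return []
    rules: List[str] = []
    for column in rows[0]:
        values = [row.get(column) for row in rows]
        if all(value is not None for value in values):
            rules.append(f'IsComplete "{column}"')
            # Only propose uniqueness for columns with no missing values.
            if len(set(values)) == len(values):
                rules.append(f'IsUnique "{column}"')
    return rules


# Hypothetical sample table.
sample = [
    {"order_id": 1, "customer": "a", "coupon": None},
    {"order_id": 2, "customer": "a", "coupon": "X1"},
    {"order_id": 3, "customer": "b", "coupon": None},
]
rules = recommend_rules(sample)
```

Here `order_id` gets both rules, `customer` only completeness, and `coupon` none, because it contains missing values.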

New — Amazon Athena for Apache Spark
With this feature, we can run Apache Spark workloads, use Jupyter Notebook as the interface to perform data processing on Athena, and programmatically interact with Spark applications using Athena APIs.

Athena Spark uses the AWS Glue Data Catalog; however, the computing infrastructure is provided by Amazon Athena. Does Amazon now recommend Amazon Athena for Apache Spark over Glue Spark?

New — Announcing Automated Data Preparation for Amazon QuickSight Q
Automated data preparation utilizes machine learning to infer semantic information about data and adds it to datasets as metadata about the columns (fields), making it faster for you to prepare data in order to support natural language questions.

New for Amazon Redshift — General Availability of Streaming Ingestion for Kinesis Data Streams and Managed Streaming for Apache Kafka
With this new capability, Amazon Redshift can natively ingest hundreds of megabytes of data per second from Amazon Kinesis Data Streams and Amazon MSK into an Amazon Redshift materialized view and query it in seconds.

Preview: Amazon Security Lake — A Purpose-Built Customer-Owned Data Lake Service
This new service automatically centralizes your organization’s security data from cloud and on-premises sources into a purpose-built data lake stored in your account.

New — Amazon Redshift Integration with Apache Spark
This new release makes it easy to build and run Spark applications on Amazon Redshift and Redshift Serverless, enabling customers to open up the data warehouse for a broader set of AWS analytics and machine learning (ML) solutions.

Preview: Amazon OpenSearch Serverless — Run Search and Analytics Workloads without Managing Clusters
This new release provisions and scales resources to deliver fast data ingestion and query responses for even the most demanding and unpredictable workloads, eliminating the need to configure and optimize clusters.

Amazon DataZone (preview): With DataZone, users can safely catalog, discover, share, and govern data across their organization. Redshift, Athena, and QuickSight will integrate with this to do data analysis.

New — Create and Share Operational Reports at Scale with Amazon QuickSight Paginated Reports
This feature allows customers to create and share highly formatted, personalized reports containing business-critical data to hundreds of thousands of end-users — without any infrastructure setup or maintenance, up-front licensing, or long-term commitments.

New Amazon QuickSight API Capabilities to Accelerate Your BI Transformation
New QuickSight API capabilities allow programmatic creation and management of dashboards, analyses, and templates.

New AWS Glue 4.0 — New and Updated Engines, More Data Formats, and More
This version of Glue includes Python 3.10 and Apache Spark 3.3.0, plus native support for the Cloud Shuffle Service Plugin for Spark. It also includes Pandas support, and more.

Announcing AWS Glue for Ray (Preview)
Data engineers can use AWS Glue for Ray to process large datasets with Python and popular Python libraries.

New for Amazon Transcribe — Real-Time Analytics During Live Calls
Real-time call analytics provides APIs for developers to accurately transcribe live calls and at the same time identify customer experience issues and sentiment in real time.

Data sharing

This feature enables data subscribers to access third-party data files directly from data providers’ Amazon Simple Storage Service (Amazon S3) buckets.

This feature enables a customer who currently stores analytics data in Snowflake to offer that data, via AWS Data Exchange, to clients who use Amazon Redshift.

Data

Database

New — Trusted Language Extensions for PostgreSQL on Amazon Aurora and Amazon RDS
Trusted Language Extensions for PostgreSQL provides database administrators control over who can install extensions and a permissions model for running them, letting application developers deliver new functionality as soon as they determine an extension meets their needs.

The open source repo contains samples for SQL, PL/pgSQL, JavaScript, and Perl.

Announcing Amazon DocumentDB Elastic Clusters
Elastic Clusters simplifies how customers interact with Amazon DocumentDB by automatically managing the underlying infrastructure and removing the need to create, remove, upgrade, or scale instances.

New — Amazon RDS Optimized Reads and Optimized Writes
These two new features will accelerate your Amazon RDS for MySQL workloads.

New — Fully Managed Blue/Green Deployments in Amazon Aurora and Amazon RDS
This new feature for Amazon Aurora with MySQL compatibility, Amazon RDS for MySQL, and Amazon RDS for MariaDB, enables you to make database updates safer, simpler, and faster. In as fast as a minute, you can promote the staging environment to be the new production environment with no data loss. During switchover, Blue/Green Deployments blocks writes on blue and green environments so that the green catches up with the blue, ensuring no data loss. Then, Blue/Green Deployments redirects production traffic to the newly promoted staging environment, all without any code changes to your application.
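
The switchover sequence described above (block writes, let green catch up, redirect traffic) can be sketched as a toy model. Everything here is hypothetical, including the class names `Environment` and `BlueGreenDeployment` and the list-based "replication"; it only mirrors the ordering of the steps, not how Aurora or RDS implement them.

```python
from typing import List


class Environment:
    """A toy database environment holding an ordered list of committed writes."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.records: List[str] = []


class BlueGreenDeployment:
    def __init__(self) -> None:
        self.blue = Environment("blue")
        self.green = Environment("green")
        self.production = self.blue
        self.writes_blocked = False

    def write(self, record: str) -> None:
        if self.writes_blocked:
            raise RuntimeError("writes are blocked during switchover")
        self.production.records.append(record)

    def replicate(self, max_records: int) -> None:
        """Copy up to max_records pending writes from blue to green."""
        start = len(self.green.records)
        self.green.records.extend(self.blue.records[start:start + max_records])

    def switchover(self) -> None:
        self.writes_blocked = True                       # 1. block writes
        self.replicate(len(self.blue.records))           # 2. green catches up
        assert self.green.records == self.blue.records   # nothing was lost
        self.production = self.green                     # 3. redirect traffic
        self.writes_blocked = False


deployment = BlueGreenDeployment()
for record in ("order-1", "order-2", "order-3"):
    deployment.write(record)
deployment.replicate(max_records=1)  # green now lags blue by two writes
deployment.switchover()
deployment.write("order-4")          # lands on the promoted green environment
```

The application keeps calling `write` the same way before and after the switchover, which is the "no code changes" part of the announcement.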

Storage

New — Failover Controls for Amazon S3 Multi-Region Access Points
These controls let you shift S3 data access request traffic routed through an Amazon S3 Multi-Region Access Point to an alternate AWS Region within minutes to test and build highly available applications for business continuity.

The existing Multi-Region Access Point model treats all of the Regions as active and can send traffic to any of them. The model that we are introducing today lets you designate Regions as either active or passive. Buckets in active Regions receive traffic (GET, PUT, and other requests) from the Multi-Region Access Point; buckets in passive Regions don’t. Does this suggest that Multi-Region active-active is not ready today? (The “write to primary Region” model is what Amazon architects always suggest.)
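
A minimal sketch of the active/passive routing model, assuming a plain dictionary of Region to status. The function names and the `"active"`/`"passive"` strings are hypothetical and do not correspond to the S3 control-plane API; the sketch only shows which buckets would receive traffic before and after a failover.

```python
from typing import Dict, List

ACTIVE, PASSIVE = "active", "passive"


def regions_receiving_traffic(routes: Dict[str, str]) -> List[str]:
    """Only buckets in active Regions receive requests from the access point."""
    return sorted(region for region, status in routes.items() if status == ACTIVE)


def fail_over(routes: Dict[str, str], failed_region: str,
              standby_region: str) -> Dict[str, str]:
    """Return a new routing table with the failed Region made passive."""
    updated = dict(routes)
    updated[failed_region] = PASSIVE
    updated[standby_region] = ACTIVE
    return updated


routes = {"us-east-1": ACTIVE, "us-west-2": PASSIVE}
after_failover = fail_over(routes, failed_region="us-east-1",
                           standby_region="us-west-2")
```

Marking both Regions as active in `routes` would correspond to the original active-active behavior of Multi-Region Access Points.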

New — Announcing Amazon EFS Elastic Throughput
This new throughput mode is designed to provide your applications with as much throughput as they need with pay-as-you-use pricing.

New for AWS Backup — Protect and Restore Your CloudFormation Stacks
You now have an automated solution to create and restore your applications with a simplified experience, eliminating the need to manage custom scripts.

New — Amazon Redshift Support in AWS Backup
AWS Backup allows you to define a central backup policy to manage data protection of your applications and can now also protect your Amazon Redshift clusters.

Announcing Automated in-AWS Failback for AWS Elastic Disaster Recovery
The new automated support provides a simplified and expedited experience to fail back Amazon Elastic Compute Cloud (Amazon EC2) instances to the original Region, and both failover and failback processes (for on-premises or in-AWS recovery) can be conveniently started from the AWS Management Console. AWS Elastic Disaster Recovery (DRS) continuously replicates server-hosted applications and server-hosted databases from any source into AWS using block-level replication of the underlying server.

Infrastructure

Compute

New AWS SimSpace Weaver – Run Large-Scale Spatial Simulations in the Cloud
With SimSpace Weaver, you can run simulations at scale across multiple Amazon EC2 instances. It supports simulating upwards of a million independent and dynamic entities. A SimSpace Weaver app SDK is provided, and SimSpace Weaver manages the partitions of the simulation state.

New — Accelerate Your Lambda Functions with Lambda SnapStart
Enabling Lambda SnapStart for Java functions can make them start up to 10x faster, at no extra cost.

New — ENA Express: Improved Network Latency and Per-Flow Performance on EC2
Jeff Barr shares how ENA/Elastic Network Adapter Express gives you a lot more per-flow bandwidth with a lot less variability. ENA Express reduces P99 latency of traffic flows by up to 50% and P99.9 latency by up to 85% (in comparison to TCP), while also increasing the maximum single-flow bandwidth from 5 Gbps to 25 Gbps.

New General Purpose, Compute Optimized, and Memory-Optimized Amazon EC2 Instances with Higher Packet-Processing Performance
The new instance families are designed to support your data-intensive workloads with the highest EBS performance in EC2, and the ability to handle up to twice as many packets per second (PPS) as earlier instances.

New Amazon EC2 Instance Types In the Works — C7gn, R7iz, and Hpc7g
Jeff Barr provides a look at three upcoming and exciting new instance types: C7gn Instances are designed for your most demanding network-intensive workloads; Hpc7g Instances, powered by AWS Graviton3E processors, are designed to give you the best price/performance for tightly coupled compute-intensive HPC and distributed computing workloads; R7iz Instances, with high performance and DDR5 memory, are for Electronic Design Automation (EDA), financial, actuarial, and simulation workloads.

New — Amazon ECS Service Connect Enables Easy Communication Between Microservices
This new capability simplifies building and operating resilient distributed applications. You can add a layer of resilience to your ECS service communication and get traffic insights with no changes to your application code.

AWS Announces Amazon EC2 Inf2 Instances (Preview)
These new instances are designed to deliver high performance at the lowest cost in Amazon EC2 for the most demanding deep learning (DL) inference applications.

Announcing the availability of Microsoft Office Amazon Machine Images (AMIs) on Amazon EC2 with AWS provided licenses
With this offering, customers have the flexibility to run Microsoft Office dependent applications on EC2.

Containers

New — AWS Marketplace for Containers Now Supports Direct Deployment to Amazon EKS Clusters
This new launch makes it easier for you to find third-party Kubernetes operations software from the Amazon EKS console and deploy it to your EKS clusters using the same commands used to deploy EKS add-ons.

Management Tools

New — AWS Config Rules Now Support Proactive Compliance
This release extends AWS Config rules to support proactive mode so that they can be run at any time before provisioning and save time spent to implement custom pre-deployment validations.

New for AWS Control Tower — Comprehensive Controls Management (Preview)
You can use the new capability to apply managed preventative, detective, and proactive controls to accounts and organizational units by service, control objective, or compliance framework.

Protect Sensitive Data with Amazon CloudWatch Logs
This new set of capabilities for Amazon CloudWatch Logs leverages pattern matching and machine learning (ML) to detect and protect sensitive log data in transit.
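
The pattern-matching half of this can be illustrated with a few lines of Python. The two regular expressions and the `mask_sensitive` helper are hypothetical stand-ins of my own; the actual service relies on its own managed data identifiers and ML, which this sketch does not reproduce.

```python
import re

# Hypothetical patterns; real detectors are far more thorough.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "card_number": re.compile(r"\b\d{4}(?:[ -]?\d{4}){3}\b"),
}


def mask_sensitive(message: str) -> str:
    """Replace every match of a known pattern with asterisks of equal length."""
    for pattern in PATTERNS.values():
        message = pattern.sub(lambda match: "*" * len(match.group()), message)
    return message


original = "user=jane@example.com card=4111 1111 1111 1111 status=ok"
masked = mask_sensitive(original)
```

Masking in place keeps the log line the same length and leaves non-sensitive fields such as `status=ok` readable.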

New — Amazon CloudWatch Cross-Account Observability
This new capability lets you search, analyze, and correlate cross-account telemetry data stored in CloudWatch such as metrics, logs, and traces.

Amazon CloudWatch Internet Monitor Provides End-to-End Visibility into Internet Performance for your Applications (Preview)
This new capability gives visibility into how an internet issue might impact the performance and availability of your applications. It allows you to reduce the time it takes to diagnose internet issues from days to minutes.

AWS Local Zones Now Available in Four New Metro Areas

AWS Local Zones, like Azure Edge Zones, bring AWS-managed data centers closer to customers in metro areas.

Developer tools/DevOps

Developer tools

Introducing AWS Application Composer (Preview)
AWS Application Composer helps developers simplify and accelerate architecting, configuring, and building serverless applications. You can drag, drop, and connect AWS services into an application architecture by using AWS Application Composer’s browser-based visual canvas.

Announcing Amazon CodeCatalyst, a Unified Software Development Service (Preview)
Amazon CodeCatalyst enables software development teams to quickly and easily plan, develop, collaborate on, build, and deliver applications on AWS, reducing friction throughout the development lifecycle. You can create a Dev Environment with Cloud9, similar to Azure DevTest Labs.

Application Integration

New — Create Point-to-Point Integrations Between Event Producers and Consumers with Amazon EventBridge Pipes
With Amazon EventBridge Pipes, you can integrate supported AWS and self-managed services as event producers and event consumers into your application in a simple, reliable, consistent, and cost-effective way.

Step Functions Distributed Map — A Serverless Solution for Large-Scale Parallel Data Processing
The new distributed map state allows you to write Step Functions to coordinate large-scale parallel workloads within your serverless applications. You can now iterate over millions of objects such as logs, images, or .csv files stored in Amazon Simple Storage Service (Amazon S3). It can launch up to ten thousand parallel workflows to process data.
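
For a feel of what such a workflow looks like, here is a sketch that builds an Amazon States Language definition as a Python dictionary and serializes it to JSON. The bucket name, prefix, Lambda ARN, and state names are hypothetical placeholders, and the concurrency value is an arbitrary choice below the ten-thousand ceiling mentioned above; treat it as a starting point to check against the Step Functions documentation rather than a tested deployment.

```python
import json

# Hypothetical placeholders; substitute your own resources.
BUCKET = "example-log-bucket"
PROCESS_FN = "arn:aws:lambda:us-east-1:123456789012:function:process-object"

definition = {
    "Comment": "Sketch: fan out over S3 objects with a distributed map",
    "StartAt": "ProcessAllObjects",
    "States": {
        "ProcessAllObjects": {
            "Type": "Map",
            # Read the items to iterate over by listing objects in S3.
            "ItemReader": {
                "Resource": "arn:aws:states:::s3:listObjectsV2",
                "Parameters": {"Bucket": BUCKET, "Prefix": "logs/"},
            },
            # DISTRIBUTED mode runs each iteration as a child workflow.
            "ItemProcessor": {
                "ProcessorConfig": {
                    "Mode": "DISTRIBUTED",
                    "ExecutionType": "STANDARD",
                },
                "StartAt": "ProcessObject",
                "States": {
                    "ProcessObject": {
                        "Type": "Task",
                        "Resource": "arn:aws:states:::lambda:invoke",
                        "Parameters": {
                            "FunctionName": PROCESS_FN,
                            "Payload.$": "$",
                        },
                        "End": True,
                    }
                },
            },
            "MaxConcurrency": 1000,
            "End": True,
        }
    },
}

definition_json = json.dumps(definition, indent=2)
```

The resulting `definition_json` string is what you would supply as the state machine definition when creating the workflow.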

Security, Identity & Compliance

Announcing AWS KMS External Key Store (XKS)
This new capability allows you to store AWS KMS customer managed keys on a hardware security module (HSM) that you operate on premises or at any location of your choice.

Amazon Inspector Now Scans AWS Lambda Functions for Vulnerabilities
Until now, customers who wanted to analyze their mixed workloads (including EC2 instances, container images, and Lambda functions) against common vulnerabilities needed to use AWS and third-party tools.

Automated Data Discovery for Amazon Macie
This new capability allows you to gain visibility into where your sensitive data resides on Amazon Simple Storage Service (Amazon S3) at a fraction of the cost of running a full data inspection across all your S3 buckets.

AWS announces Amazon Verified Permissions (Preview)
This central fine-grained permissions management system simplifies changing and updating permission rules in a single place without needing to change the code. It serves RBAC and ABAC authorization use cases, with a policy language called Cedar. I would be interested in a comparison with OPA (Open Policy Agent).

permit(
    principal == User::"John",
    action == Action::"view",
    resource
)
when {
    resource in Folder::"John's Stuff" &&
    context.authenticated == true
};
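
To read the policy above in more familiar terms, here is a hand-written Python mirror of its logic. It is not the Cedar engine or any Amazon Verified Permissions API: `is_permitted` and the `folder_contents` mapping are hypothetical, and the folder hierarchy that Cedar's `in` operator walks is flattened into a simple set lookup.

```python
from typing import Dict, Set


def is_permitted(principal: str, action: str, resource: str,
                 folder_contents: Dict[str, Set[str]],
                 context: Dict[str, bool]) -> bool:
    """Mirror of the sample policy: John may view items in "John's Stuff"
    when the request is authenticated. Anything else is denied."""
    return (
        principal == "John"
        and action == "view"
        and resource in folder_contents.get("John's Stuff", set())
        and context.get("authenticated") is True
    )


# Hypothetical entity data.
folders = {"John's Stuff": {"notes.txt", "photo.png"}}

allowed = is_permitted("John", "view", "notes.txt", folders,
                       {"authenticated": True})
denied_other_user = is_permitted("Jane", "view", "notes.txt", folders,
                                 {"authenticated": True})
denied_unauthenticated = is_permitted("John", "view", "notes.txt", folders,
                                      {"authenticated": False})
```

The value of a policy language is that this logic lives outside the application, so changing who can view what does not require redeploying code like the function above.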

Industry

Amazon Connect — New ML-Powered Capabilities for Forecasting, Capacity Planning, Scheduling, and Agent Empowerment

Amazon Connect is a contact center as a service (CCaaS) solution. With Amazon Connect forecasting, capacity planning, and scheduling, clients can reliably hit service-level targets and gracefully navigate fluctuations in customer demand.

Introducing Amazon Omics — A Purpose-Built Service to Store, Query, and Analyze Genomic and Biological Data at Scale

Part of the Health AI service portfolio (Amazon HealthLake, Amazon Comprehend Medical, Amazon Transcribe Medical), it contains:

  • Omics-optimized object storage that helps customers store and share their data efficiently and at low cost, and also converts non-optimized VCF files to an optimized format (Apache Parquet).
  • Managed compute for bioinformatics workflows that allows customers to run the exact analysis they specify, without worrying about provisioning underlying infrastructure. It supports WDL and Nextflow, workflow languages commonly used in the genomics industry.
  • Optimized data stores for population-scale variant analysis.

Announcing AWS Supply Chain (Preview)

AWS Supply Chain is a cloud application that unifies data and provides machine learning (ML)–powered actionable insights, built-in contextual collaboration, and demand planning. AWS Supply Chain connects to your existing enterprise resource planning (ERP) and supply chain management systems, without replatforming, up-front licensing fees, or long-term contracts.

I am not sure how it compares with Microsoft Dynamics supply chain insights, announced at Ignite 2021. With Microsoft's investment in OpenAI, we should see more AI-empowered services.

Written by Xin Cheng

Multi/Hybrid-cloud, Kubernetes, cloud-native, big data, machine learning, IoT developer/architect, 3x Azure-certified, 3x AWS-certified, 2x GCP-certified
