
AWS Inferentia2

Inferentia · Inferentia 2nd Gen Architecture

AWS Inferentia2 is Amazon's second-generation inference chip, delivering up to 3x the compute performance and up to 4x the throughput of the original Inferentia. With 32GB of HBM per accelerator and 190 TFLOPS of FP16/BF16 performance, it supports large language models and generative AI workloads. Inferentia2 powers EC2 Inf2 instances, which AWS positions as its lowest cost-per-inference option for deep learning on EC2.
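As a rough illustration of what "supports large language models" means for a 32GB accelerator, here is a back-of-envelope memory check (a sketch, not vendor guidance: it assumes 2 bytes per parameter for BF16 weights and ignores KV cache, activations, and runtime overhead, so real headroom is smaller; the model sizes are hypothetical examples):

```python
# Rough check: which model sizes fit in one Inferentia2 accelerator's 32 GB HBM?
# Assumption: 2 bytes/parameter (BF16 weights only); KV cache, activations,
# and runtime overhead are ignored, so actual headroom is smaller.

HBM_BYTES = 32 * 1024**3          # 32 GB per accelerator
BYTES_PER_PARAM_BF16 = 2

def fits_in_one_accelerator(n_params_billions: float) -> bool:
    """Return True if BF16 weights alone fit within a single 32 GB accelerator."""
    weight_bytes = n_params_billions * 1e9 * BYTES_PER_PARAM_BF16
    return weight_bytes <= HBM_BYTES

for size in (7, 13, 70):  # hypothetical model sizes, in billions of parameters
    verdict = "fits" if fits_in_one_accelerator(size) else "needs sharding across chips"
    print(f"{size}B params: {verdict}")
```

By this estimate a 13B-parameter model fits on one accelerator, while a 70B-parameter model must be sharded across multiple chips in an Inf2 instance.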

Key Features

2 NeuronCores per chip
32GB HBM per accelerator
190 TFLOPS throughput
EC2 Inf2 instances
Dynamic batching support

Full Specifications

Compute

Architecture Inferentia2
NeuronCores 2
FP16 Performance 190 TFLOPS
BF16 Performance 190 TFLOPS

Memory

Memory Size 32 GB
Memory Type HBM
Memory Bandwidth 820 GB/s
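Because single-stream LLM decoding is typically memory-bandwidth bound, the 820 GB/s figure gives a crude upper bound on per-chip token throughput. The sketch below is an assumption-laden estimate, not a benchmark: it assumes batch size 1, BF16 weights (2 bytes/parameter), and that every weight byte is streamed from HBM once per generated token:

```python
# Crude bandwidth-bound decode estimate: with batch size 1 and no weight
# caching, each generated token must stream all BF16 weights from HBM once,
# so tokens/s <= memory_bandwidth / weight_bytes.

BANDWIDTH_GB_PER_S = 820  # from the spec table

def max_tokens_per_second(n_params_billions: float) -> float:
    """Upper-bound decode rate for a BF16 model, in tokens/s per chip."""
    weight_gb = n_params_billions * 2  # 2 bytes/parameter at BF16
    return BANDWIDTH_GB_PER_S / weight_gb

# Example: a hypothetical 13B-parameter model (26 GB of BF16 weights)
print(f"13B model: ~{max_tokens_per_second(13):.0f} tokens/s upper bound per chip")
```

Batching amortizes the weight reads across requests, which is why the dynamic batching support listed above matters for serving throughput.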

Power & Physical

Form Factor Custom ASIC

Features & Connectivity

NVLink Support No
Multi-GPU Support Yes

Availability

MSRP (USD) Contact for pricing
Release Date Apr 2023
Status Available

Use Cases

LLM Inference
Real-time Inference
Natural Language Processing
Computer Vision Inference
Generative AI Serving


Related GPUs

AWS Trainium2 (Amazon data center): Available
AWS Trainium (Amazon data center): 512GB HBM (per node), 190 TFLOPS FP16, Available
AWS Inferentia (Amazon data center): 8GB DDR4, Available