Azure OpenAI Deployment Types and Resiliency - Understanding Models, Capacity, and High Availability
John Savill's Technical Training via YouTube
Syllabus
- Introduction
- Generative API is stateless
- Regional Azure OpenAI resource
- Capacity pools
- Responsible AI
- Model deployment types
- Standard
- Global
- Network vs inference latency
- Intelligent routing
- Quota vs available capacity
- Data zone and data residency
- Availability benefits?
- Resource is regional
- Multiple regional resources
- Enabling in the application
- API Management
- Prompt caching impact
- Provisioned service
- PayGo features
- PTU features
- Azure reservations
- Batch service
- Summary
- Close
Taught by
John Savill's Technical Training