Medium: Serverless Inference
A team's SageMaker Serverless Endpoint has MemorySizeInMB=2048 and MaxConcurrency=10. Under a sustained load of 8 concurrent requests, latency spikes from 200 ms to 1,800 ms. CloudWatch shows ConcurrentExecutions=8 (well below MaxConcurrency=10) and no throttling errors. What is the actual bottleneck?
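
For reference, the configuration described in the question could be expressed roughly as follows. This is a minimal sketch that only builds the request arguments for boto3's `create_endpoint_config`; the model name, config name, and variant name are hypothetical placeholders, and no AWS call is made.

```python
def serverless_endpoint_config(model_name, config_name):
    """Build kwargs for SageMaker's create_endpoint_config (no AWS call here)."""
    return {
        "EndpointConfigName": config_name,  # hypothetical name
        "ProductionVariants": [
            {
                "ModelName": model_name,      # hypothetical name
                "VariantName": "AllTraffic",  # hypothetical name
                "ServerlessConfig": {
                    "MemorySizeInMB": 2048,  # value from the question
                    "MaxConcurrency": 10,    # value from the question
                },
            }
        ],
    }


request = serverless_endpoint_config("demo-model", "demo-serverless-config")
serverless = request["ProductionVariants"][0]["ServerlessConfig"]

# Observed load from the question: 8 concurrent requests.
observed_concurrency = 8
headroom = serverless["MaxConcurrency"] - observed_concurrency
print(f"Concurrency headroom: {headroom} of {serverless['MaxConcurrency']}")

# With credentials and a real model, the request would be sent like this:
# import boto3
# boto3.client("sagemaker").create_endpoint_config(**request)
```

The concurrency limit still has headroom at this load, which is why the question asks for a bottleneck other than MaxConcurrency.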