
Standardizing AWS Bedrock with a LiteLLM Gateway

Khalid Rizvi

This architecture is designed for organizations that have standardized on AWS Bedrock as their primary source of foundation models but require a more robust control plane than what AWS provides natively.

[Figure: LiteLLM gateway fronting AWS Bedrock]

The Problem this Solves:

  1. Non-Standard API: The native AWS Bedrock API is not compliant with the OpenAI API, which has become the industry standard. This forces every development team to learn and implement the specific Bedrock SDK, increasing onboarding time and creating code that is not portable.
  2. Coarse-Grained Cost Control: While AWS provides cost tracking, it is often difficult to attribute LLM spending to a specific team, project, or user within a shared AWS account. Enforcing hard budget limits per-team is not a built-in Bedrock feature.
  3. Lack of a Central Governance Point: Without a gateway, each application is responsible for its own logic for things like request caching, retries, and guardrails, leading to duplicated effort and inconsistent policy enforcement.
  4. Future Vendor Lock-in: Committing all applications directly to the Bedrock SDK makes it difficult and expensive to introduce models from other providers (like a new state-of-the-art model from OpenAI or Google) in the future.

This architecture positions LiteLLM as the single point of entry for all LLM consumption, providing a standardized interface and centralized control while leveraging the security and scalability of AWS Bedrock as the backend.

The Fine Print: Architectural and Configuration Details

Implementing this solution correctly requires attention to several key areas, particularly IAM security and LiteLLM configuration.

1. EKS Deployment & Networking

  • Deployment: LiteLLM is deployed as a standard Kubernetes Deployment within your EKS cluster. A HorizontalPodAutoscaler (HPA) should be used to automatically scale the LiteLLM pods based on CPU or memory utilization.
  • Exposure: The LiteLLM service is exposed to other applications within your VPC via a Kubernetes Service of type ClusterIP. For external access, the AWS Load Balancer Controller (via an Ingress) is the standard practice; it provisions an Application Load Balancer (ALB) to route traffic to the LiteLLM service. A sketch of these manifests follows this list.
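
As a concrete sketch of those two bullets, the manifests below wire up a ClusterIP Service, an ALB-backed Ingress, and an HPA for a Deployment assumed to be named litellm in an llm-gateway namespace. The names, hostname, and scaling thresholds are illustrative assumptions, not LiteLLM requirements; the annotations are the standard ones consumed by the AWS Load Balancer Controller.

apiVersion: v1
kind: Service
metadata:
  name: litellm                    # hypothetical name; selector must match your Deployment labels
  namespace: llm-gateway           # illustrative namespace
spec:
  type: ClusterIP
  selector:
    app: litellm
  ports:
    - port: 80
      targetPort: 4000             # the LiteLLM proxy listens on 4000 by default
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: litellm
  namespace: llm-gateway
  annotations:
    alb.ingress.kubernetes.io/scheme: internal       # use internet-facing only if truly required
    alb.ingress.kubernetes.io/target-type: ip
spec:
  ingressClassName: alb            # handled by the AWS Load Balancer Controller
  rules:
    - host: litellm.your-company.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: litellm
                port:
                  number: 80
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: litellm
  namespace: llm-gateway
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: litellm
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70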

2. IAM Security: The Crucial Component

You must never hardcode AWS credentials in your LiteLLM configuration or container images. The correct and secure method is to use IAM Roles for Service Accounts (IRSA).

  • How it Works: You create an IAM Role with a trust policy that allows a specific Kubernetes Service Account from your EKS cluster to assume it.
  • Permissions: This IAM Role is granted a single, least-privilege policy: bedrock:InvokeModel on the specific model ARNs you intend to use.
  • Execution: When a LiteLLM pod starts, it runs under this Kubernetes Service Account. The AWS SDK within the pod automatically acquires temporary credentials by assuming the associated IAM Role. This is seamless and highly secure; a minimal ServiceAccount sketch follows this list.
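
A minimal sketch of the Kubernetes half of this wiring is below. The account ID, role name, and namespace are placeholders: the eks.amazonaws.com/role-arn annotation tells EKS which IAM Role the pod's AWS SDK should assume via its projected web-identity token, and the role's trust policy must reference exactly this namespace and service account name.

apiVersion: v1
kind: ServiceAccount
metadata:
  name: litellm                    # the IAM role's trust policy must allow
  namespace: llm-gateway           # system:serviceaccount:llm-gateway:litellm
  annotations:
    # Placeholder ARN for the role that carries the least-privilege bedrock:InvokeModel policy.
    # If you enable streaming responses, that policy typically also needs
    # bedrock:InvokeModelWithResponseStream on the same model ARNs.
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/litellm-bedrock-invoke

The LiteLLM pods then only need serviceAccountName: litellm in their pod spec (shown in the Deployment sketch in section 3); no AWS keys are stored anywhere in the cluster.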

3. LiteLLM Configuration (config.yaml)

Your LiteLLM configuration file defines the Bedrock models you want to expose. The critical detail is instructing LiteLLM to use the AWS SDK's default credential provider chain (which will find the IRSA credentials) instead of static keys. A sketch of how this file is mounted into the Deployment follows the example.

# litellm/config.yaml

model_list:
  - model_name: claude-3-sonnet          # The name your developers will use in their API calls.
    litellm_params:
      model: bedrock/anthropic.claude-3-sonnet-20240229-v1:0  # The specific Bedrock model identifier.
      # Because we use IRSA, we omit aws_access_key_id and aws_secret_access_key entirely.
      # LiteLLM then falls back to the AWS SDK's default credential provider chain,
      # which picks up the temporary credentials issued to the pod's IAM role.
      aws_region_name: "us-east-1"       # Specify your Bedrock region.

  - model_name: llama-3-70b
    litellm_params:
      model: bedrock/meta.llama3-70b-instruct-v1:0
      aws_region_name: "us-east-1"

# General settings for the proxy
general_settings:
  # A static key is shown here for illustration only. In production, load it from a
  # secret, e.g. master_key: os.environ/LITELLM_MASTER_KEY
  master_key: "sk-1234"
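
To tie sections 1 through 3 together, a Deployment along these lines runs the proxy under the IRSA service account and mounts the config.yaml above from a ConfigMap. The image tag, ConfigMap name, and replica count are assumptions to adapt to your environment; the --config flag follows LiteLLM's own Docker examples.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: litellm
  namespace: llm-gateway
spec:
  replicas: 2
  selector:
    matchLabels:
      app: litellm
  template:
    metadata:
      labels:
        app: litellm
    spec:
      serviceAccountName: litellm                    # IRSA-annotated service account from section 2
      containers:
        - name: litellm
          image: ghcr.io/berriai/litellm:main-stable # pin an exact release in production
          args: ["--config", "/etc/litellm/config.yaml"]
          ports:
            - containerPort: 4000
          volumeMounts:
            - name: config
              mountPath: /etc/litellm
              readOnly: true
      volumes:
        - name: config
          configMap:
            name: litellm-config                     # holds the config.yaml shown above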

4. The Client Request Flow

  • Your client applications are configured with the DNS address of the ALB Ingress.
  • They are issued a LiteLLM Virtual API Key (e.g., sk-lite-team-alpha-key) created in the LiteLLM UI. This key is used for budgeting and access control within LiteLLM.
  • All requests are formatted as standard OpenAI API calls.

Here is an example curl command from a developer’s machine:

curl https://litellm.your-company.com/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-lite-team-alpha-key" \
  -d '{
    "model": "claude-3-sonnet",
    "messages": [
      {
        "role": "user",
        "content": "Explain the significance of the Strunk and White writing style."
      }
    ]
  }'

Notice the developer uses the simple model name claude-3-sonnet and standard OpenAI formatting. They have no awareness that the backend is AWS Bedrock; LiteLLM handles the entire translation and authentication process.

Architectural Diagram

This Mermaid diagram illustrates the complete architecture, showing the flow of requests and the critical IAM security integration.

flowchart TD
  subgraph External_User
    Client[Client Application]
  end

  subgraph AWS_Cloud
    DNS[Route 53 DNS]
    ALB[Application Load Balancer]
    subgraph EKS_Cluster
      Ingress[Kubernetes Ingress]
      K8sService[LiteLLM Service]
      PodGroup[LiteLLM Proxy Pods]
      KSA[Kubernetes Service Account]
    end
    subgraph IAM
      BedrockRole["IAM Role (bedrock:InvokeModel)"]
    end
    Bedrock[AWS Bedrock]
  end

  Client <--> DNS
  DNS <--> ALB
  ALB <--> Ingress
  Ingress <--> K8sService
  K8sService <--> PodGroup
  PodGroup <--> KSA
  KSA <--> BedrockRole
  BedrockRole <--> PodGroup
  PodGroup <--> Bedrock

Complete Explanation of the Use Case and Architecture

The diagram above illustrates how LiteLLM can act as an enterprise governance and standardization layer for AWS Bedrock. This pattern is particularly valuable for organizations that commit to AWS Bedrock as their foundation model source but require additional control, standardized interfaces, and granular cost management.

The Strategic Advantage:

By introducing LiteLLM, an organization achieves several critical objectives:

  1. OpenAI API Standardization: Developers interact solely with the industry-standard OpenAI API format, regardless of the underlying Bedrock model (Claude, Llama, Titan). This simplifies development, reduces onboarding time, and makes applications inherently more portable if a different LLM provider is ever needed.
  2. Granular Governance & Cost Control: LiteLLM’s proxy provides a centralized point to apply per-team or per-user budgets, rate limits, and virtual API keys. This enables precise cost attribution and prevents unauthorized or excessive LLM usage, which is often challenging with native cloud billing.
  3. Centralized Observability & Features: All LLM calls flow through LiteLLM, allowing for unified logging, metrics collection, caching, and custom guardrails before requests hit Bedrock. This consistency reduces boilerplate code across applications and ensures policies are uniformly applied.
  4. Enhanced Security with IRSA: Leveraging IAM Roles for Service Accounts (IRSA) provides a highly secure mechanism for LiteLLM to authenticate with Bedrock without hardcoding sensitive AWS credentials, adhering to the principle of least privilege.

Detailed Architectural Flow:

  1. Client Request (OpenAI Format): A developer or client application initiates an HTTPS request to an endpoint (e.g., litellm.your-company.com). This request adheres to the familiar OpenAI API specification and includes a LiteLLM virtual API key for authentication and billing.
  2. DNS Resolution: Amazon Route 53 resolves the custom domain (litellm.your-company.com) to the Application Load Balancer.
  3. Load Balancing: The Application Load Balancer (ALB), managed by the AWS Load Balancer Controller in EKS, receives the request and forwards it to the Kubernetes Ingress.
  4. Kubernetes Ingress: The Ingress resource within EKS routes the incoming traffic to the appropriate Kubernetes Service for LiteLLM.
  5. LiteLLM Kubernetes Service: This ClusterIP service distributes requests across the healthy LiteLLM Proxy pods.
  6. LiteLLM Proxy Pods (EKS Deployment):
    • Authentication & Authorization: The LiteLLM proxy validates the incoming LiteLLM virtual API key. Based on its internal configuration (set via the LiteLLM UI or configuration files), it enforces budgets, rate limits, and access policies for the specific model requested.
    • IAM Authentication (IRSA): The LiteLLM pod, running under its Kubernetes Service Account (KSA), assumes the IAM Role dedicated to Bedrock access. That role carries a policy granting bedrock:InvokeModel permissions, and AWS issues short-lived credentials to the pod.
    • OpenAI to Bedrock Translation: Using these temporary credentials, LiteLLM takes the incoming OpenAI-formatted request and translates it into the specific AWS Bedrock API format required by the chosen foundation model (e.g., anthropic.claude-3-sonnet-20240229-v1:0).
    • Call to AWS Bedrock: LiteLLM makes the actual API call to the AWS Bedrock service.
  7. AWS Bedrock Service: Bedrock processes the request using the specified foundation model and returns its response.
  8. Bedrock to OpenAI Translation: LiteLLM receives the Bedrock-specific response and translates it back into the standardized OpenAI API format.
  9. Response Back to Client: The OpenAI-formatted response travels back through the Kubernetes Service, Ingress, ALB, and DNS to the original client application.

This architecture ensures that developers benefit from a consistent API, while the organization maintains robust control, security, and observability over all its LLM interactions, entirely within its AWS ecosystem.