Data Connections

Data Connections in Red Hat OpenShift AI provide secure access to external data sources and model registries. These connections enable workbenches, model serving, and pipelines to access S3-compatible object storage, model files on persistent volumes, and container registries without embedding credentials directly in your code.

Overview

A data connection is a Kubernetes Secret with the label opendatahub.io/dashboard: 'true' and specific annotations that define the connection type. OpenShift AI recognizes these secrets and makes them available through the dashboard for use in:

  • Jupyter workbenches (as environment variables)
  • Model serving deployments
  • Data science pipelines
  • Direct API access for custom applications

Connection Types

OpenShift AI supports two primary connection types:

  1. S3 Connections (opendatahub.io/connection-type: s3)
    • For S3-compatible object storage (AWS S3, MinIO, OpenShift Data Foundation)
    • Used for datasets, model artifacts, pipeline storage, and model serving
    • Provides AWS-style credentials
  2. URI Connections (opendatahub.io/connection-type-ref: uri-v1)
    • For storing model URIs that can be copied to InferenceService configurations
    • Supports PVC paths (pvc://), OCI registries (oci://), HTTP/HTTPS URLs
    • Provides a convenient way to manage and organize model locations
    • URI values are manually copied to storageUri in InferenceServices

Required Labels and Annotations

All data connections must include:

  • Label: opendatahub.io/dashboard: 'true' - Makes the connection visible in the dashboard
  • Annotation: openshift.io/display-name - Human-readable name shown in the UI
  • Connection type annotation - Defines how the connection is interpreted

Creating Data Connections

Method 1: Declarative (Using YAML)

The declarative approach is recommended for version control, automation, and reproducible deployments.

Basic S3 Connection

# data-connection-s3-basic.yaml
apiVersion: v1
kind: Secret
metadata:
  name: my-s3-storage
  namespace: my-project
  labels:
    # Required: Makes connection available in OpenShift AI
    opendatahub.io/dashboard: 'true'
    # Optional: Indicates this is managed by the dashboard
    opendatahub.io/managed: 'true'
  annotations:
    # Required: Specifies this is an S3 connection
    opendatahub.io/connection-type: s3
    # Required: Display name in the dashboard
    openshift.io/display-name: My S3 Storage
type: Opaque
stringData:
  # Required: S3 access key ID
  AWS_ACCESS_KEY_ID: my-access-key
  # Required: S3 secret access key  
  AWS_SECRET_ACCESS_KEY: my-secret-key
  # Required: S3 endpoint URL
  AWS_S3_ENDPOINT: https://s3.amazonaws.com
  # Required: S3 bucket name
  AWS_S3_BUCKET: my-bucket
  # Optional: AWS region (defaults to us-east-1)
  AWS_DEFAULT_REGION: us-west-2

Apply the connection:

kubectl apply -f data-connection-s3-basic.yaml
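
A quick sanity check that the secret was created and carries the dashboard label (name and namespace taken from the example above):

kubectl get secret my-s3-storage -n my-project --show-labels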

Standard S3 Connection with Custom Endpoint

# data-connection-s3-minio.yaml
apiVersion: v1
kind: Secret
metadata:
  name: minio-storage
  namespace: my-project
  labels:
    opendatahub.io/dashboard: 'true'
    opendatahub.io/managed: 'true'
  annotations:
    opendatahub.io/connection-type: s3
    openshift.io/display-name: MinIO Storage
    # Optional: Additional description
    openshift.io/description: 'Local MinIO instance for development'
type: Opaque
stringData:
  AWS_ACCESS_KEY_ID: minio
  AWS_SECRET_ACCESS_KEY: minio123
  # Custom endpoint for MinIO
  AWS_S3_ENDPOINT: https://minio-service.minio-namespace.svc.cluster.local:9000
  AWS_S3_BUCKET: ml-datasets
  AWS_DEFAULT_REGION: us-east-1

Basic URI Connection for PVC

# data-connection-uri-pvc.yaml
apiVersion: v1
kind: Secret
metadata:
  name: model-pvc-connection
  namespace: my-project
  labels:
    opendatahub.io/dashboard: 'true'
  annotations:
    # Required: Specifies URI connection type
    opendatahub.io/connection-type-ref: uri-v1
    openshift.io/display-name: Model PVC Storage
    openshift.io/description: 'Points to models stored on PVC'
type: Opaque
stringData:
  # PVC URI format: pvc://<pvc-name>/<path-to-model>
  URI: pvc://models/llama-2-7b-chat
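
For serving, the stored value is later copied by hand into an InferenceService. A minimal sketch of what that looks like (names here are illustrative and assume the model files live on a PVC named models):

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: llama-2-7b-chat
  namespace: my-project
spec:
  predictor:
    model:
      modelFormat:
        name: vLLM
      # Value copied manually from the URI field of model-pvc-connection
      storageUri: pvc://models/llama-2-7b-chat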

URI Connection for OCI Registry

# data-connection-uri-oci.yaml
apiVersion: v1
kind: Secret
metadata:
  name: model-registry-connection
  namespace: my-project
  labels:
    opendatahub.io/dashboard: 'true'
  annotations:
    opendatahub.io/connection-type-ref: uri-v1
    openshift.io/display-name: Red Hat Model Registry
type: Opaque
stringData:
  # OCI URI format for container registries
  URI: oci://registry.redhat.io/rhelai1/modelcar-granite-3-1-8b-instruct:1.5

Method 2: Imperative (Using Commands)

Create S3 Connection

# Create secret with S3 credentials
kubectl create secret generic my-s3-connection \
  --from-literal=AWS_ACCESS_KEY_ID=my-key \
  --from-literal=AWS_SECRET_ACCESS_KEY=my-secret \
  --from-literal=AWS_S3_ENDPOINT=https://s3.amazonaws.com \
  --from-literal=AWS_S3_BUCKET=my-bucket \
  --from-literal=AWS_DEFAULT_REGION=us-east-1 \
  -n my-project

# Add required labels and annotations
kubectl label secret my-s3-connection \
  opendatahub.io/dashboard=true \
  opendatahub.io/managed=true \
  -n my-project

kubectl annotate secret my-s3-connection \
  opendatahub.io/connection-type=s3 \
  openshift.io/display-name="My S3 Connection" \
  -n my-project
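
To confirm the label and annotations landed as expected (same connection name as above):

kubectl get secret my-s3-connection -n my-project \
  -o jsonpath='{.metadata.labels}{"\n"}{.metadata.annotations}{"\n"}'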

Create URI Connection

# Create secret with URI
kubectl create secret generic model-uri-connection \
  --from-literal=URI=pvc://models/my-model \
  -n my-project

# Add required labels and annotations
kubectl label secret model-uri-connection \
  opendatahub.io/dashboard=true \
  -n my-project

kubectl annotate secret model-uri-connection \
  opendatahub.io/connection-type-ref=uri-v1 \
  openshift.io/display-name="Model URI Connection" \
  -n my-project

Listing and Viewing Data Connections

List All Data Connections

# List all data connections in current namespace
kubectl get secrets -l opendatahub.io/dashboard=true

# List with more details
kubectl get secrets -l opendatahub.io/dashboard=true \
  -o custom-columns=NAME:.metadata.name,TYPE:.metadata.annotations.opendatahub\.io/connection-type,DISPLAY:.metadata.annotations.openshift\.io/display-name

# List across all namespaces
kubectl get secrets --all-namespaces -l opendatahub.io/dashboard=true

View Specific Connection

# View connection details (without showing secret data)
kubectl describe secret my-s3-connection

# View connection with decoded data (be careful with credentials)
kubectl get secret my-s3-connection -o jsonpath='{.data.AWS_ACCESS_KEY_ID}' | base64 -d

# View as YAML
kubectl get secret my-s3-connection -o yaml

Filter Connections by Type

# List only S3 connections
kubectl get secrets -l opendatahub.io/dashboard=true \
  -o json | jq -r '.items[] | select(.metadata.annotations."opendatahub.io/connection-type" == "s3") | .metadata.name'

# List only URI connections
kubectl get secrets -l opendatahub.io/dashboard=true \
  -o json | jq -r '.items[] | select(.metadata.annotations."opendatahub.io/connection-type-ref" == "uri-v1") | .metadata.name'

Updating Data Connections

Using kubectl apply (Declarative)

Update your YAML file and reapply:

kubectl apply -f data-connection-s3-basic.yaml

Using kubectl edit (Interactive)

# Edit connection interactively
kubectl edit secret my-s3-connection

Using kubectl patch

Update S3 Credentials

# Update S3 credentials (base64 encode the values)
kubectl patch secret my-s3-connection --type='json' -p='[
  {"op": "replace", "path": "/data/AWS_ACCESS_KEY_ID", "value": "'$(echo -n "new-key" | base64)'"},
  {"op": "replace", "path": "/data/AWS_SECRET_ACCESS_KEY", "value": "'$(echo -n "new-secret" | base64)'"}
]'

Update Display Name

# Update display name annotation
kubectl annotate secret my-s3-connection \
  openshift.io/display-name="Production S3 Storage" \
  --overwrite

Add or Update S3 Endpoint

# Add new field or update existing
kubectl patch secret my-s3-connection --type='json' -p='[
  {"op": "add", "path": "/data/AWS_S3_ENDPOINT", "value": "'$(echo -n "https://new-endpoint.com" | base64)'"}
]'

Deleting Data Connections

Basic Deletion

# Delete a specific connection
kubectl delete secret my-s3-connection

# Delete multiple connections
kubectl delete secret my-s3-connection another-connection

# Delete by label selector
kubectl delete secrets -l opendatahub.io/dashboard=true,environment=dev

Important Notes on Deletion

WARNING: Before deleting a data connection:

  1. Ensure no workbenches are using the connection
  2. Check that no model deployments reference the connection
  3. Verify no pipelines depend on the connection
  4. Consider backing up the connection configuration (a quick export is shown below)
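
For point 4, the manifest can be exported before deletion (the file contains credentials, so keep it out of version control):

kubectl get secret my-s3-connection -o yaml > my-s3-connection-backup.yaml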

Check Connection Usage

# Check if any pods are using the S3 connection as environment variables
kubectl get pods -o json | jq -r '.items[] | select(.spec.containers[].envFrom[]?.secretRef.name == "my-s3-connection") | .metadata.name'

# Check if any InferenceServices reference S3 connections
kubectl get inferenceservices -o json | jq -r '.items[] | select(.spec.predictor.model.storage.key == "my-s3-connection") | .metadata.name'

# Note: URI connections are not directly referenced by InferenceServices
# Their values are manually copied to storageUri fields
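
Workbenches in OpenShift AI are backed by Notebook custom resources; assuming the standard Kubeflow Notebook layout (pod spec under spec.template.spec), a similar check can be run against them:

# Check if any workbench Notebook CRs inject the connection via envFrom
kubectl get notebooks -o json | jq -r '.items[] | select(.spec.template.spec.containers[].envFrom[]?.secretRef.name == "my-s3-connection") | .metadata.name'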

Practical Examples

Example 1: Create S3 Connection for Dataset Storage

# dataset-storage-connection.yaml
apiVersion: v1
kind: Secret
metadata:
  name: dataset-storage
  namespace: fraud-detection-project
  labels:
    opendatahub.io/dashboard: 'true'
    opendatahub.io/managed: 'true'
    purpose: dataset-storage
  annotations:
    opendatahub.io/connection-type: s3
    openshift.io/display-name: Fraud Detection Datasets
    openshift.io/description: 'S3 bucket containing fraud detection training data'
type: Opaque
stringData:
  AWS_ACCESS_KEY_ID: fraud-detection-key
  AWS_SECRET_ACCESS_KEY: fraud-detection-secret
  AWS_S3_ENDPOINT: https://s3.us-west-2.amazonaws.com
  AWS_S3_BUCKET: fraud-detection-datasets
  AWS_DEFAULT_REGION: us-west-2
---
# Usage in a workbench pod
apiVersion: v1
kind: Pod
metadata:
  name: data-prep-workbench
  namespace: fraud-detection-project
spec:
  containers:
  - name: jupyter
    image: quay.io/modh/odh-minimal-notebook-container:v2-2023a
    envFrom:
    # Inject all S3 credentials as environment variables
    - secretRef:
        name: dataset-storage
    env:
    # Override specific values if needed
    - name: DATA_BUCKET
      value: fraud-detection-datasets/processed
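
Once the workbench pod is running, the injected values can be confirmed from a terminal inside it:

# List the S3-related variables provided by the data connection
env | grep AWS_
echo $DATA_BUCKET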

Example 2: Create URI Connection for Model Serving

# model-uri-connection.yaml
apiVersion: v1
kind: Secret
metadata:
  name: granite-model-uri
  namespace: nlp-project
  labels:
    opendatahub.io/dashboard: 'true'
    model-type: llm
    model-family: granite
  annotations:
    opendatahub.io/connection-type-ref: uri-v1
    openshift.io/display-name: Granite 3.1 8B Model
    openshift.io/description: 'Red Hat Granite model from registry'
type: Opaque
stringData:
  URI: oci://registry.redhat.io/rhelai1/modelcar-granite-3-1-8b-instruct:1.5
---
# Use in InferenceService - copy URI value to storageUri
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: granite-deployment
  namespace: nlp-project
spec:
  predictor:
    model:
      modelFormat:
        name: vLLM
      runtime: granite-runtime
      # Copy the URI value directly (not a reference)
      storageUri: oci://registry.redhat.io/rhelai1/modelcar-granite-3-1-8b-instruct:1.5

Example 3: Pipeline Artifacts Connection

# pipeline-artifacts-connection.yaml
apiVersion: v1
kind: Secret
metadata:
  name: pipeline-artifacts
  namespace: ml-pipelines
  labels:
    opendatahub.io/dashboard: 'true'
    opendatahub.io/managed: 'true'
    component: data-science-pipelines
  annotations:
    opendatahub.io/connection-type: s3
    openshift.io/display-name: ML Pipeline Artifacts
    openshift.io/description: 'Storage for Kubeflow pipeline artifacts and metadata'
type: Opaque
stringData:
  AWS_ACCESS_KEY_ID: pipeline-access
  AWS_SECRET_ACCESS_KEY: pipeline-secret-123
  AWS_S3_ENDPOINT: https://minio-service.minio.svc.cluster.local:9000
  AWS_S3_BUCKET: mlpipeline
  AWS_DEFAULT_REGION: us-east-1
  # Additional fields for pipeline configuration
  AWS_S3_USE_PATH_STYLE: 'true'
  AWS_S3_VERIFY_SSL: 'false'  # For self-signed certificates

Verification and Troubleshooting

Verify Connection Visibility

# Check if connection appears in dashboard list
kubectl get secret my-s3-connection -o jsonpath='{.metadata.labels.opendatahub\.io/dashboard}'

# Verify required annotations
kubectl get secret my-s3-connection -o jsonpath='{.metadata.annotations}' | jq

Test S3 Connection

# Create a test pod with AWS CLI and the connection injected as environment variables
# (kubectl run has no --env-from flag, so envFrom is supplied via --overrides)
kubectl run s3-test --rm -i --tty --restart=Never --image=amazon/aws-cli:latest \
  --overrides='{"apiVersion":"v1","spec":{"containers":[{"name":"s3-test","image":"amazon/aws-cli:latest","stdin":true,"tty":true,"command":["/bin/bash"],"envFrom":[{"secretRef":{"name":"my-s3-connection"}}]}]}}'

# Inside the pod, test the connection
aws s3 ls s3://$AWS_S3_BUCKET --endpoint-url $AWS_S3_ENDPOINT
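
# To also verify write access, a quick round trip (the object name is arbitrary)
echo "probe" > /tmp/probe.txt
aws s3 cp /tmp/probe.txt s3://$AWS_S3_BUCKET/connection-probe.txt --endpoint-url $AWS_S3_ENDPOINT
aws s3 rm s3://$AWS_S3_BUCKET/connection-probe.txt --endpoint-url $AWS_S3_ENDPOINT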

Common Issues and Solutions

Connection Not Visible in Dashboard

# Check label is exactly 'true' (string, not boolean)
kubectl get secret my-s3-connection -o yaml | grep "opendatahub.io/dashboard"

# Fix incorrect label
kubectl label secret my-s3-connection opendatahub.io/dashboard=true --overwrite

Invalid Credentials Error

# Verify credentials are base64 encoded correctly
kubectl get secret my-s3-connection -o jsonpath='{.data.AWS_ACCESS_KEY_ID}' | base64 -d

# Re-encode and update if needed
echo -n "correct-access-key" | base64
# Then update using kubectl patch or edit

Connection Type Not Recognized

# Ensure annotation uses correct key and value
kubectl annotate secret my-s3-connection \
  opendatahub.io/connection-type=s3 --overwrite

# For URI connections, use different annotation
kubectl annotate secret my-uri-connection \
  opendatahub.io/connection-type-ref=uri-v1 --overwrite

Debugging Connection Usage

# Find all resources using a specific connection
kubectl get all -o json | jq -r '.. | objects | select(.secretRef?.name == "my-s3-connection" or .valueFrom?.secretKeyRef?.name == "my-s3-connection") | "\(.kind)/\(.metadata.name)"' 2>/dev/null | sort -u

Best Practices

Secure Credential Management

  1. Never commit credentials to version control (see the substitution sketch after this list)
    # Use environment variables or external secret management
    stringData:
      AWS_ACCESS_KEY_ID: ${S3_ACCESS_KEY}  # Set via CI/CD
      AWS_SECRET_ACCESS_KEY: ${S3_SECRET_KEY}
    
  2. Use separate connections per environment
    # Development
    kubectl apply -f connections/dev-s3-connection.yaml -n dev-project
       
    # Production  
    kubectl apply -f connections/prod-s3-connection.yaml -n prod-project
    
  3. Implement least-privilege access
    • Create S3 policies that only grant necessary permissions
    • Use separate credentials for different purposes
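
One way to implement the first practice is to keep a template with placeholders and render it at deploy time; a sketch using envsubst (the template file name is illustrative, and S3_ACCESS_KEY / S3_SECRET_KEY are assumed to be exported by the CI/CD pipeline):

# Substitute ${S3_ACCESS_KEY} and ${S3_SECRET_KEY} from the environment, then apply
envsubst < data-connection-template.yaml | kubectl apply -f -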

Naming Conventions

  1. Use descriptive names indicating purpose
    • Good: training-data-s3, model-artifacts-storage
    • Avoid: s3-connection-1, my-connection
  2. Include environment in name
    • dev-dataset-storage
    • prod-model-registry
  3. Use consistent prefixes for grouping
    • dataset- (for training/test datasets)
    • model- (for model storage and serving)
    • experiment- (for experiment tracking)

When to Use Each Connection Type

Use S3 Connections for:

  • Training datasets
  • Model artifacts and checkpoints
  • Model serving (loading models from S3 buckets)
  • Pipeline intermediate results
  • Experiment tracking data
  • General file storage

Use URI Connections for:

  • Storing and organizing model URIs in the OpenShift AI dashboard
  • Managing references to models in PVCs (pvc://)
  • Managing references to models in container registries (oci://)
  • Creating a catalog of available models for easy reference

Using URI Connections

Unlike S3 connections, which provide credentials, URI connections simply store URI values that can be:

  1. Viewed in the OpenShift AI dashboard for reference
  2. Manually copied into InferenceService storageUri fields
  3. Used as documentation for available model locations

Note: URI connections are not automatically injected or referenced by InferenceServices. The URI value must be manually copied to the appropriate configuration.
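
For example, the stored URI can be read back when it is time to copy it into a storageUri field (connection name from the PVC example above):

kubectl get secret model-pvc-connection -o jsonpath='{.data.URI}' | base64 -d; echo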

Integration with Workbenches

  1. Environment variable injection
    # Workbench automatically receives all secret fields as env vars
    envFrom:
    - secretRef:
        name: my-s3-connection
  2. Custom environment mapping
    # Map to specific variable names
    env:
    - name: TRAINING_DATA_BUCKET
      valueFrom:
        secretKeyRef:
          name: dataset-storage
          key: AWS_S3_BUCKET

Field Reference

S3 Connection Fields

| Field | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| AWS_ACCESS_KEY_ID | string | Yes | S3 access key identifier | AKIAIOSFODNN7EXAMPLE |
| AWS_SECRET_ACCESS_KEY | string | Yes | S3 secret access key | wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY |
| AWS_S3_ENDPOINT | string | Yes | S3 endpoint URL | https://s3.amazonaws.com |
| AWS_S3_BUCKET | string | Yes | S3 bucket name | my-ml-datasets |
| AWS_DEFAULT_REGION | string | No | AWS region (default: us-east-1) | us-west-2 |
| AWS_S3_USE_PATH_STYLE | string | No | Use path-style URLs | true or false |
| AWS_S3_VERIFY_SSL | string | No | Verify SSL certificates | true or false |

URI Connection Fields

| Field | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| URI | string | Yes | Model or resource URI | pvc://models/llama-2-7b |

Common Annotations

| Annotation | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| opendatahub.io/connection-type | string | For S3 | Specifies an S3 connection | s3 |
| opendatahub.io/connection-type-ref | string | For URI | Specifies a URI connection | uri-v1 |
| openshift.io/display-name | string | Yes | Human-readable name | Production Dataset Storage |
| openshift.io/description | string | No | Extended description | S3 bucket for production ML datasets |

Using with Kubernetes MCP Server

The MCP Kubernetes server provides tools for managing secrets but has limitations for data connections.

MCP Tool Mapping

| Operation | MCP Tool | Notes |
| --- | --- | --- |
| Create Secret | resources_create_or_update | Requires base64 encoding |
| List Secrets | resources_list | Can filter by labels |
| Get Secret | resources_get | Returns base64-encoded data |
| Update Secret | resources_create_or_update | Full replacement |
| Delete Secret | resources_delete | Standard deletion |

Creating Data Connections with MCP

# Note: stringData is not supported, must use base64 encoded data
apiVersion: v1
kind: Secret
metadata:
  name: mcp-s3-connection
  namespace: my-project
  labels:
    opendatahub.io/dashboard: 'true'
  annotations:
    opendatahub.io/connection-type: s3
    openshift.io/display-name: MCP S3 Connection
type: Opaque
data:
  # Values must be base64 encoded
  AWS_ACCESS_KEY_ID: bXktYWNjZXNzLWtleQ==
  AWS_SECRET_ACCESS_KEY: bXktc2VjcmV0LWtleQ==
  AWS_S3_ENDPOINT: aHR0cHM6Ly9zMy5hbWF6b25hd3MuY29t
  AWS_S3_BUCKET: bXktYnVja2V0
  AWS_DEFAULT_REGION: dXMtZWFzdC0x

Listing Data Connections with MCP

Use resources_list with label selector:

apiVersion: v1
kind: Secret
labelSelector: opendatahub.io/dashboard=true
namespace: my-project

Getting a Specific Connection

Use resources_get:

apiVersion: v1
kind: Secret
name: my-s3-connection
namespace: my-project

Updating Connections with MCP

MCP requires full resource replacement:

  1. Get current connection with resources_get
  2. Modify the YAML
  3. Apply with resources_create_or_update

Deleting Connections with MCP

Use resources_delete:

apiVersion: v1
kind: Secret
name: my-s3-connection
namespace: my-project

MCP Limitations

  1. No stringData support - Must base64 encode all values manually
  2. No patch operations - Must replace entire resource
  3. No imperative commands - Cannot use simple create/label/annotate workflow
  4. Limited filtering - Basic label selection only

Best Practices for MCP

  1. Prepare base64 values externally
    echo -n "my-access-key" | base64
    
  2. Use templates for consistency
    • Create template YAML files
    • Replace only the base64 encoded values
  3. Verify with native kubectl
    • After MCP operations, verify with kubectl (see the example below)
    • Check dashboard visibility
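
A sketch of that verification step, assuming the MCP-created connection from the earlier example:

# Confirm the dashboard label and connection-type annotation are present
kubectl get secret mcp-s3-connection -n my-project --show-labels
kubectl get secret mcp-s3-connection -n my-project -o jsonpath='{.metadata.annotations}' | jq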