# Data Connections
Data Connections in Red Hat OpenShift AI provide secure access to external data sources and model registries. These connections enable workbenches, model serving, and pipelines to access S3-compatible object storage, model files on persistent volumes, and container registries without embedding credentials directly in your code.
## Overview

A data connection is a Kubernetes Secret with the label `opendatahub.io/dashboard: 'true'` and specific annotations that define the connection type. OpenShift AI recognizes these secrets and makes them available through the dashboard for use in:
- Jupyter workbenches (as environment variables)
- Model serving deployments
- Data science pipelines
- Direct API access for custom applications
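When a connection is attached to a workbench, for example, each key in the Secret surfaces as an environment variable; a quick check from the workbench terminal:

```bash
# Inside a workbench terminal: list the variables injected from the connection
env | grep '^AWS_'

# Spot-check a single value
echo "$AWS_S3_ENDPOINT"
```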
## Connection Types
OpenShift AI supports two primary connection types:
- **S3 Connections** (`opendatahub.io/connection-type: s3`)
  - For S3-compatible object storage (AWS S3, MinIO, OpenShift Data Foundation)
  - Used for datasets, model artifacts, pipeline storage, and model serving
  - Provides AWS-style credentials
- **URI Connections** (`opendatahub.io/connection-type-ref: uri-v1`)
  - For storing model URIs that can be copied to InferenceService configurations
  - Supports PVC paths (`pvc://`), OCI registries (`oci://`), and HTTP/HTTPS URLs
  - Provides a convenient way to manage and organize model locations
  - URI values are manually copied to `storageUri` in InferenceServices
## Required Labels and Annotations
All data connections must include:
- Label: `opendatahub.io/dashboard: 'true'` - Makes the connection visible in the dashboard
- Annotation: `openshift.io/display-name` - Human-readable name shown in the UI
- Connection type annotation - Defines how the connection is interpreted
## Creating Data Connections

### Method 1: Declarative (Using YAML)

The declarative approach is recommended for version control, automation, and reproducible deployments.

#### Basic S3 Connection
```yaml
# data-connection-s3-basic.yaml
apiVersion: v1
kind: Secret
metadata:
  name: my-s3-storage
  namespace: my-project
  labels:
    # Required: Makes connection available in OpenShift AI
    opendatahub.io/dashboard: 'true'
    # Optional: Indicates this is managed by the dashboard
    opendatahub.io/managed: 'true'
  annotations:
    # Required: Specifies this is an S3 connection
    opendatahub.io/connection-type: s3
    # Required: Display name in the dashboard
    openshift.io/display-name: My S3 Storage
type: Opaque
stringData:
  # Required: S3 access key ID
  AWS_ACCESS_KEY_ID: my-access-key
  # Required: S3 secret access key
  AWS_SECRET_ACCESS_KEY: my-secret-key
  # Required: S3 endpoint URL
  AWS_S3_ENDPOINT: https://s3.amazonaws.com
  # Required: S3 bucket name
  AWS_S3_BUCKET: my-bucket
  # Optional: AWS region (defaults to us-east-1)
  AWS_DEFAULT_REGION: us-west-2
```
Apply the connection:

```bash
kubectl apply -f data-connection-s3-basic.yaml
```
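To confirm the Secret exists and carries the dashboard label:

```bash
kubectl get secret my-s3-storage -n my-project --show-labels
```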
#### Standard S3 Connection with Custom Endpoint
```yaml
# data-connection-s3-minio.yaml
apiVersion: v1
kind: Secret
metadata:
  name: minio-storage
  namespace: my-project
  labels:
    opendatahub.io/dashboard: 'true'
    opendatahub.io/managed: 'true'
  annotations:
    opendatahub.io/connection-type: s3
    openshift.io/display-name: MinIO Storage
    # Optional: Additional description
    openshift.io/description: 'Local MinIO instance for development'
type: Opaque
stringData:
  AWS_ACCESS_KEY_ID: minio
  AWS_SECRET_ACCESS_KEY: minio123
  # Custom endpoint for MinIO
  AWS_S3_ENDPOINT: https://minio-service.minio-namespace.svc.cluster.local:9000
  AWS_S3_BUCKET: ml-datasets
  AWS_DEFAULT_REGION: us-east-1
```
#### Basic URI Connection for PVC
```yaml
# data-connection-uri-pvc.yaml
apiVersion: v1
kind: Secret
metadata:
  name: model-pvc-connection
  namespace: my-project
  labels:
    opendatahub.io/dashboard: 'true'
  annotations:
    # Required: Specifies URI connection type
    opendatahub.io/connection-type-ref: uri-v1
    openshift.io/display-name: Model PVC Storage
    openshift.io/description: 'Points to models stored on PVC'
type: Opaque
stringData:
  # PVC URI format: pvc://<pvc-name>/<model-path>
  URI: pvc://models/llama-2-7b-chat
```
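For reference, KServe accepts a `pvc://` URI directly as a `storageUri`; a minimal sketch of that copy step (the InferenceService name and runtime are illustrative, and the `models` PVC must exist in the same namespace):

```bash
kubectl apply -n my-project -f - <<'EOF'
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: llama-2-7b-chat            # illustrative name
spec:
  predictor:
    model:
      modelFormat:
        name: vLLM                 # assumes a matching serving runtime exists
      # Copied manually from the URI connection above
      storageUri: pvc://models/llama-2-7b-chat
EOF
```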
#### URI Connection for OCI Registry
```yaml
# data-connection-uri-oci.yaml
apiVersion: v1
kind: Secret
metadata:
  name: model-registry-connection
  namespace: my-project
  labels:
    opendatahub.io/dashboard: 'true'
  annotations:
    opendatahub.io/connection-type-ref: uri-v1
    openshift.io/display-name: Red Hat Model Registry
type: Opaque
stringData:
  # OCI URI format for container registries
  URI: oci://registry.redhat.io/rhelai1/modelcar-granite-3-1-8b-instruct:1.5
```
### Method 2: Imperative (Using Commands)

#### Create S3 Connection
```bash
# Create secret with S3 credentials
kubectl create secret generic my-s3-connection \
  --from-literal=AWS_ACCESS_KEY_ID=my-key \
  --from-literal=AWS_SECRET_ACCESS_KEY=my-secret \
  --from-literal=AWS_S3_ENDPOINT=https://s3.amazonaws.com \
  --from-literal=AWS_S3_BUCKET=my-bucket \
  --from-literal=AWS_DEFAULT_REGION=us-east-1 \
  -n my-project

# Add required labels and annotations
kubectl label secret my-s3-connection \
  opendatahub.io/dashboard=true \
  opendatahub.io/managed=true \
  -n my-project

kubectl annotate secret my-s3-connection \
  opendatahub.io/connection-type=s3 \
  openshift.io/display-name="My S3 Connection" \
  -n my-project
```
#### Create URI Connection
```bash
# Create secret with URI
kubectl create secret generic model-uri-connection \
  --from-literal=URI=pvc://models/my-model \
  -n my-project

# Add required labels and annotations
kubectl label secret model-uri-connection \
  opendatahub.io/dashboard=true \
  -n my-project

kubectl annotate secret model-uri-connection \
  opendatahub.io/connection-type-ref=uri-v1 \
  openshift.io/display-name="Model URI Connection" \
  -n my-project
```
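If you prefer to review or version the manifest instead of creating the Secret directly, the same command can emit YAML with `--dry-run` (the labels and annotations then need to be added to the file by hand):

```bash
# Generate a manifest without creating the object, then edit and commit it
kubectl create secret generic model-uri-connection \
  --from-literal=URI=pvc://models/my-model \
  -n my-project --dry-run=client -o yaml > model-uri-connection.yaml
```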
## Listing and Viewing Data Connections

### List All Data Connections
```bash
# List all data connections in the current namespace
kubectl get secrets -l opendatahub.io/dashboard=true

# List with more details
kubectl get secrets -l opendatahub.io/dashboard=true \
  -o custom-columns=NAME:.metadata.name,TYPE:.metadata.annotations.opendatahub\.io/connection-type,DISPLAY:.metadata.annotations.openshift\.io/display-name

# List across all namespaces
kubectl get secrets --all-namespaces -l opendatahub.io/dashboard=true
```
### View Specific Connection
```bash
# View connection details (without showing secret data)
kubectl describe secret my-s3-connection

# View connection with decoded data (be careful with credentials)
kubectl get secret my-s3-connection -o jsonpath='{.data.AWS_ACCESS_KEY_ID}' | base64 -d

# View as YAML
kubectl get secret my-s3-connection -o yaml
```
### Filter Connections by Type
```bash
# List only S3 connections
kubectl get secrets -l opendatahub.io/dashboard=true \
  -o json | jq -r '.items[] | select(.metadata.annotations."opendatahub.io/connection-type" == "s3") | .metadata.name'

# List only URI connections
kubectl get secrets -l opendatahub.io/dashboard=true \
  -o json | jq -r '.items[] | select(.metadata.annotations."opendatahub.io/connection-type-ref" == "uri-v1") | .metadata.name'
```
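A quick tally by type can also help when auditing a namespace (requires `jq`):

```bash
# Count data connections per connection type
kubectl get secrets -l opendatahub.io/dashboard=true -o json | jq -r '
  .items[]
  | .metadata.annotations["opendatahub.io/connection-type"]
    // .metadata.annotations["opendatahub.io/connection-type-ref"]
    // "unknown"' | sort | uniq -c
```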
## Updating Data Connections

### Using `kubectl apply` (Declarative)

Update your YAML file and reapply:

```bash
kubectl apply -f data-connection-s3-basic.yaml
```
### Using `kubectl edit` (Interactive)

```bash
# Edit the connection interactively
kubectl edit secret my-s3-connection
```
### Using `kubectl patch`

#### Update S3 Credentials
```bash
# Update S3 credentials (the values must be base64-encoded)
kubectl patch secret my-s3-connection --type='json' -p='[
  {"op": "replace", "path": "/data/AWS_ACCESS_KEY_ID", "value": "'$(echo -n "new-key" | base64)'"},
  {"op": "replace", "path": "/data/AWS_SECRET_ACCESS_KEY", "value": "'$(echo -n "new-secret" | base64)'"}
]'
```
#### Update Display Name

```bash
# Update the display-name annotation
kubectl annotate secret my-s3-connection \
  openshift.io/display-name="Production S3 Storage" \
  --overwrite
```
#### Add or Update S3 Endpoint

```bash
# Add a new field or update an existing one
kubectl patch secret my-s3-connection --type='json' -p='[
  {"op": "add", "path": "/data/AWS_S3_ENDPOINT", "value": "'$(echo -n "https://new-endpoint.com" | base64)'"}
]'
```
## Deleting Data Connections

### Basic Deletion
```bash
# Delete a specific connection
kubectl delete secret my-s3-connection

# Delete multiple connections
kubectl delete secret my-s3-connection another-connection

# Delete by label selector
kubectl delete secrets -l opendatahub.io/dashboard=true,environment=dev
```
### Important Notes on Deletion
**Warning:** Before deleting a data connection:

- Ensure no workbenches are using the connection
- Check that no model deployments reference the connection
- Verify no pipelines depend on the connection
- Consider backing up the connection configuration (see the command below)
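A simple way to back up a connection before removing it:

```bash
# Save the manifest before deleting; strip metadata.resourceVersion and
# metadata.uid from the file before re-applying it later
kubectl get secret my-s3-connection -o yaml > my-s3-connection-backup.yaml
```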
### Check Connection Usage
```bash
# Check if any pods use the S3 connection as environment variables
kubectl get pods -o json | jq -r '.items[] | select(.spec.containers[].envFrom[]?.secretRef.name == "my-s3-connection") | .metadata.name'

# Check if any InferenceServices reference S3 connections
kubectl get inferenceservices -o json | jq -r '.items[] | select(.spec.predictor.model.storage.key == "my-s3-connection") | .metadata.name'

# Note: URI connections are not directly referenced by InferenceServices;
# their values are manually copied to storageUri fields
```
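Workbenches created through the dashboard are `Notebook` custom resources; if that CRD is installed in your cluster, you can scan those specs as well (a sketch using `jq`):

```bash
# Find workbenches (Notebook CRs) whose pod spec references the connection
kubectl get notebooks -o json 2>/dev/null | jq -r '
  .items[]
  | select([.. | objects | select(.secretRef?.name == "my-s3-connection")] | length > 0)
  | .metadata.name'
```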
## Practical Examples

### Example 1: Create S3 Connection for Dataset Storage
```yaml
# dataset-storage-connection.yaml
apiVersion: v1
kind: Secret
metadata:
  name: dataset-storage
  namespace: fraud-detection-project
  labels:
    opendatahub.io/dashboard: 'true'
    opendatahub.io/managed: 'true'
    purpose: dataset-storage
  annotations:
    opendatahub.io/connection-type: s3
    openshift.io/display-name: Fraud Detection Datasets
    openshift.io/description: 'S3 bucket containing fraud detection training data'
type: Opaque
stringData:
  AWS_ACCESS_KEY_ID: fraud-detection-key
  AWS_SECRET_ACCESS_KEY: fraud-detection-secret
  AWS_S3_ENDPOINT: https://s3.us-west-2.amazonaws.com
  AWS_S3_BUCKET: fraud-detection-datasets
  AWS_DEFAULT_REGION: us-west-2
---
# Usage in a workbench pod
apiVersion: v1
kind: Pod
metadata:
  name: data-prep-workbench
  namespace: fraud-detection-project
spec:
  containers:
    - name: jupyter
      image: quay.io/modh/odh-minimal-notebook-container:v2-2023a
      envFrom:
        # Inject all S3 credentials as environment variables
        - secretRef:
            name: dataset-storage
      env:
        # Override specific values if needed
        - name: DATA_BUCKET
          value: fraud-detection-datasets/processed
```
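Once the workbench is running, the injected variables can be used directly; for example, if the AWS CLI is installed in the notebook image (not all images ship it):

```bash
# From the workbench terminal: list objects with the injected credentials
aws s3 ls "s3://$AWS_S3_BUCKET/" --endpoint-url "$AWS_S3_ENDPOINT"
```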
### Example 2: Create URI Connection for Model Serving
```yaml
# model-uri-connection.yaml
apiVersion: v1
kind: Secret
metadata:
  name: granite-model-uri
  namespace: nlp-project
  labels:
    opendatahub.io/dashboard: 'true'
    model-type: llm
    model-family: granite
  annotations:
    opendatahub.io/connection-type-ref: uri-v1
    openshift.io/display-name: Granite 3.1 8B Model
    openshift.io/description: 'Red Hat Granite model from registry'
type: Opaque
stringData:
  URI: oci://registry.redhat.io/rhelai1/modelcar-granite-3-1-8b-instruct:1.5
---
# Use in an InferenceService - copy the URI value to storageUri
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: granite-deployment
  namespace: nlp-project
spec:
  predictor:
    model:
      modelFormat:
        name: vLLM
      runtime: granite-runtime
      # Copy the URI value directly (not a reference)
      storageUri: oci://registry.redhat.io/rhelai1/modelcar-granite-3-1-8b-instruct:1.5
```
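After applying, you can watch the deployment become ready and retrieve its endpoint:

```bash
# Wait for the predictor to come up, then print the service URL
kubectl get inferenceservice granite-deployment -n nlp-project -w
kubectl get inferenceservice granite-deployment -n nlp-project \
  -o jsonpath='{.status.url}'
```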
### Example 3: Pipeline Artifacts Connection
```yaml
# pipeline-artifacts-connection.yaml
apiVersion: v1
kind: Secret
metadata:
  name: pipeline-artifacts
  namespace: ml-pipelines
  labels:
    opendatahub.io/dashboard: 'true'
    opendatahub.io/managed: 'true'
    component: data-science-pipelines
  annotations:
    opendatahub.io/connection-type: s3
    openshift.io/display-name: ML Pipeline Artifacts
    openshift.io/description: 'Storage for Kubeflow pipeline artifacts and metadata'
type: Opaque
stringData:
  AWS_ACCESS_KEY_ID: pipeline-access
  AWS_SECRET_ACCESS_KEY: pipeline-secret-123
  AWS_S3_ENDPOINT: https://minio-service.minio.svc.cluster.local:9000
  AWS_S3_BUCKET: mlpipeline
  AWS_DEFAULT_REGION: us-east-1
  # Additional fields for pipeline configuration
  AWS_S3_USE_PATH_STYLE: 'true'
  AWS_S3_VERIFY_SSL: 'false'  # For self-signed certificates
```
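Because this endpoint uses path-style addressing and a self-signed certificate, a connectivity check needs matching flags; for example, with the AWS CLI run from a pod inside the cluster:

```bash
# In-cluster check that mirrors the secret's settings
aws s3 ls s3://mlpipeline \
  --endpoint-url https://minio-service.minio.svc.cluster.local:9000 \
  --no-verify-ssl
```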
## Verification and Troubleshooting

### Verify Connection Visibility
```bash
# Check if the connection carries the dashboard label
kubectl get secret my-s3-connection -o jsonpath='{.metadata.labels.opendatahub\.io/dashboard}'

# Verify required annotations
kubectl get secret my-s3-connection -o jsonpath='{.metadata.annotations}' | jq
```
### Test S3 Connection
```bash
# Create a test pod with the AWS CLI. Note: kubectl run has no --env-from
# flag, so the secret is injected through a pod-spec override.
kubectl run s3-test --rm -it --restart=Never --image=amazon/aws-cli:latest \
  --overrides='{
    "apiVersion": "v1",
    "spec": {"containers": [{
      "name": "s3-test",
      "image": "amazon/aws-cli:latest",
      "stdin": true, "tty": true,
      "command": ["/bin/bash"],
      "envFrom": [{"secretRef": {"name": "my-s3-connection"}}]
    }]}
  }'

# Inside the pod, test the connection
aws s3 ls s3://$AWS_S3_BUCKET --endpoint-url $AWS_S3_ENDPOINT
```
### Common Issues and Solutions

#### Connection Not Visible in Dashboard
```bash
# Check the label value is exactly 'true' (a string, not a boolean)
kubectl get secret my-s3-connection -o yaml | grep "opendatahub.io/dashboard"

# Fix an incorrect label
kubectl label secret my-s3-connection opendatahub.io/dashboard=true --overwrite
```
#### Invalid Credentials Error
```bash
# Verify credentials decode to the expected values
kubectl get secret my-s3-connection -o jsonpath='{.data.AWS_ACCESS_KEY_ID}' | base64 -d

# Re-encode and update if needed
echo -n "correct-access-key" | base64
# Then update using kubectl patch or kubectl edit
```
#### Connection Type Not Recognized
```bash
# Ensure the annotation uses the correct key and value
kubectl annotate secret my-s3-connection \
  opendatahub.io/connection-type=s3 --overwrite

# URI connections use a different annotation
kubectl annotate secret my-uri-connection \
  opendatahub.io/connection-type-ref=uri-v1 --overwrite
```
#### Debugging Connection Usage
```bash
# Find all resources whose pod specs reference a specific connection
kubectl get all -o json | jq -r '
  .items[]
  | select([.. | objects
            | select(.secretRef?.name == "my-s3-connection"
                     or .secretKeyRef?.name == "my-s3-connection")]
           | length > 0)
  | "\(.kind)/\(.metadata.name)"' | sort -u
```
## Best Practices

### Secure Credential Management
- **Never commit credentials to version control.** Inject values at deploy time instead (see the envsubst sketch after this list):

  ```yaml
  # Use environment variables or external secret management
  stringData:
    AWS_ACCESS_KEY_ID: ${S3_ACCESS_KEY}        # Set via CI/CD
    AWS_SECRET_ACCESS_KEY: ${S3_SECRET_KEY}
  ```

- **Use separate connections per environment**

  ```bash
  # Development
  kubectl apply -f connections/dev-s3-connection.yaml -n dev-project

  # Production
  kubectl apply -f connections/prod-s3-connection.yaml -n prod-project
  ```

- **Implement least-privilege access**
  - Create S3 policies that grant only the necessary permissions
  - Use separate credentials for different purposes
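One way to perform that substitution at deploy time is `envsubst` from GNU gettext; a minimal sketch, assuming the template file name below is your own and that the CI/CD job exports `S3_ACCESS_KEY` and `S3_SECRET_KEY`:

```bash
# Render the template with pipeline-provided values, then apply
envsubst < connections/s3-connection.tpl.yaml | kubectl apply -f -
```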
### Naming Conventions
- Use descriptive names indicating purpose
  - Good: `training-data-s3`, `model-artifacts-storage`
  - Avoid: `s3-connection-1`, `my-connection`
- Include environment in the name
  - `dev-dataset-storage`
  - `prod-model-registry`
- Use consistent prefixes for grouping
  - `dataset-` (for training/test datasets)
  - `model-` (for model storage and serving)
  - `experiment-` (for experiment tracking)
### When to Use Each Connection Type

**Use S3 Connections for:**
- Training datasets
- Model artifacts and checkpoints
- Model serving (loading models from S3 buckets)
- Pipeline intermediate results
- Experiment tracking data
- General file storage
**Use URI Connections for:**

- Storing and organizing model URIs in the OpenShift AI dashboard
- Managing references to models in PVCs (`pvc://`)
- Managing references to models in container registries (`oci://`)
- Creating a catalog of available models for easy reference
### Using URI Connections

Unlike S3 connections, which provide credentials, URI connections simply store URI values that can be:

- Viewed in the OpenShift AI dashboard for reference
- Manually copied into InferenceService `storageUri` fields
- Used as documentation for available model locations
Note: URI connections are not automatically injected or referenced by InferenceServices. The URI value must be manually copied to the appropriate configuration.
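If you script that copy step rather than pasting by hand, one sketch (the connection and InferenceService names are illustrative, reusing Example 2 above):

```bash
# Read the URI from the connection and patch it into an InferenceService
URI=$(kubectl get secret granite-model-uri -n nlp-project \
        -o jsonpath='{.data.URI}' | base64 -d)
kubectl patch inferenceservice granite-deployment -n nlp-project --type merge \
  -p "{\"spec\":{\"predictor\":{\"model\":{\"storageUri\":\"$URI\"}}}}"
```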
## Integration with Workbenches
- **Environment variable injection**

  ```yaml
  # Workbench automatically receives all secret fields as env vars
  envFrom:
    - secretRef:
        name: my-s3-connection
  ```

- **Custom environment mapping**

  ```yaml
  # Map to specific variable names
  env:
    - name: TRAINING_DATA_BUCKET
      valueFrom:
        secretKeyRef:
          name: dataset-storage
          key: AWS_S3_BUCKET
  ```
## Field Reference

### S3 Connection Fields
| Field | Type | Required | Description | Example |
|---|---|---|---|---|
| `AWS_ACCESS_KEY_ID` | string | Yes | S3 access key identifier | `AKIAIOSFODNN7EXAMPLE` |
| `AWS_SECRET_ACCESS_KEY` | string | Yes | S3 secret access key | `wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY` |
| `AWS_S3_ENDPOINT` | string | Yes | S3 endpoint URL | `https://s3.amazonaws.com` |
| `AWS_S3_BUCKET` | string | Yes | S3 bucket name | `my-ml-datasets` |
| `AWS_DEFAULT_REGION` | string | No | AWS region (default: `us-east-1`) | `us-west-2` |
| `AWS_S3_USE_PATH_STYLE` | string | No | Use path-style URLs | `true` or `false` |
| `AWS_S3_VERIFY_SSL` | string | No | Verify SSL certificates | `true` or `false` |
### URI Connection Fields
| Field | Type | Required | Description | Example |
|---|---|---|---|---|
| `URI` | string | Yes | Model or resource URI | `pvc://models/llama-2-7b` |
### Common Annotations
| Annotation | Type | Required | Description | Example |
|---|---|---|---|---|
| `opendatahub.io/connection-type` | string | For S3 | Specifies an S3 connection | `s3` |
| `opendatahub.io/connection-type-ref` | string | For URI | Specifies a URI connection | `uri-v1` |
| `openshift.io/display-name` | string | Yes | Human-readable name | Production Dataset Storage |
| `openshift.io/description` | string | No | Extended description | S3 bucket for production ML datasets |
## Using with Kubernetes MCP Server
The MCP Kubernetes server provides tools for managing secrets but has limitations for data connections.
### MCP Tool Mapping
| Operation | MCP Tool | Notes |
|---|---|---|
| Create Secret | `resources_create_or_update` | Requires base64 encoding |
| List Secrets | `resources_list` | Can filter by labels |
| Get Secret | `resources_get` | Returns base64-encoded data |
| Update Secret | `resources_create_or_update` | Full replacement |
| Delete Secret | `resources_delete` | Standard deletion |
### Creating Data Connections with MCP
```yaml
# Note: stringData is not supported; values must be base64-encoded under data
apiVersion: v1
kind: Secret
metadata:
  name: mcp-s3-connection
  namespace: my-project
  labels:
    opendatahub.io/dashboard: 'true'
  annotations:
    opendatahub.io/connection-type: s3
    openshift.io/display-name: MCP S3 Connection
type: Opaque
data:
  # Values must be base64-encoded
  AWS_ACCESS_KEY_ID: bXktYWNjZXNzLWtleQ==
  AWS_SECRET_ACCESS_KEY: bXktc2VjcmV0LWtleQ==
  AWS_S3_ENDPOINT: aHR0cHM6Ly9zMy5hbWF6b25hd3MuY29t
  AWS_S3_BUCKET: bXktYnVja2V0
  AWS_DEFAULT_REGION: dXMtZWFzdC0x
```
### Listing Data Connections with MCP

Use `resources_list` with a label selector:

```yaml
apiVersion: v1
kind: Secret
labelSelector: opendatahub.io/dashboard=true
namespace: my-project
```
### Getting a Specific Connection

Use `resources_get`:

```yaml
apiVersion: v1
kind: Secret
name: my-s3-connection
namespace: my-project
```
### Updating Connections with MCP

MCP requires full resource replacement (see the kubectl equivalent below):

1. Get the current connection with `resources_get`
2. Modify the YAML
3. Apply it with `resources_create_or_update`
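For reference, the same round trip expressed with `kubectl`, which is also a convenient way to verify what an MCP edit produced:

```bash
# Get, edit, and replace the full resource (mirrors the MCP workflow)
kubectl get secret my-s3-connection -n my-project -o yaml > conn.yaml
# ...edit conn.yaml...
kubectl replace -f conn.yaml
```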
### Deleting Connections with MCP

Use `resources_delete`:

```yaml
apiVersion: v1
kind: Secret
name: my-s3-connection
namespace: my-project
```
### MCP Limitations

- **No `stringData` support** - all values must be base64-encoded manually
- **No patch operations** - the entire resource must be replaced
- **No imperative commands** - the simple create/label/annotate workflow is unavailable
- **Limited filtering** - basic label selection only
### Best Practices for MCP

- **Prepare base64 values externally**

  ```bash
  echo -n "my-access-key" | base64
  ```

- **Use templates for consistency**
  - Create template YAML files
  - Replace only the base64-encoded values
- **Verify with native kubectl**
  - After MCP operations, verify the result with kubectl
  - Check dashboard visibility