Data Science Projects

Data Science Projects in Red Hat OpenShift AI provide isolated environments for organizing your machine learning work. These projects are OpenShift projects (Kubernetes namespaces) with specific labels and annotations that enable integration with the OpenShift AI dashboard and its features.

Overview

A data science project is essentially an OpenShift project with the label opendatahub.io/dashboard: 'true'. This label makes the project visible in the OpenShift AI dashboard and enables AI/ML-specific features like:

  • Workbench creation (Jupyter notebooks)
  • Data connections
  • Model serving
  • Pipeline management
  • Persistent storage
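
A quick way to confirm which projects the dashboard will pick up is to run the same label filter yourself (assuming you have cluster access and the kubectl or oc CLI):

# Projects the OpenShift AI dashboard will display
kubectl get projects -l opendatahub.io/dashboard=true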

Creating Projects

Method 1: Declarative (Using YAML)

The declarative approach uses YAML files to define the desired state of your project. This method is recommended for:

  • Version control and GitOps workflows
  • Reproducible deployments
  • Automated provisioning

Basic Project

# project-basic.yaml
apiVersion: project.openshift.io/v1
kind: Project
metadata:
  name: my-ds-project
  labels:
    # Required: Makes project visible in OpenShift AI dashboard
    opendatahub.io/dashboard: 'true'
    # Set automatically by Kubernetes to match the project name (shown here for completeness)
    kubernetes.io/metadata.name: my-ds-project
  annotations:
    # Human-readable display name shown in the dashboard
    openshift.io/display-name: My Data Science Project
    # Optional: Project description
    openshift.io/description: 'Project for machine learning experiments'
spec: {}

Apply the project:

kubectl apply -f project-basic.yaml

Standard Project with Common Annotations

# project-standard.yaml
apiVersion: project.openshift.io/v1
kind: Project
metadata:
  name: ml-fraud-detection
  labels:
    # Required for OpenShift AI
    opendatahub.io/dashboard: 'true'
    kubernetes.io/metadata.name: ml-fraud-detection
    # Optional: Custom labels for organization
    team: data-science
    environment: development
    project-type: ml-experiment
  annotations:
    openshift.io/display-name: Fraud Detection ML
    openshift.io/description: 'Machine learning models for credit card fraud detection'
    # Optional: Who requested/owns the project
    openshift.io/requester: john.doe@example.com
    # Optional: Project documentation link
    project.docs.url: 'https://wiki.example.com/fraud-detection'
spec: {}

Advanced Project with Resource Quotas

# project-advanced.yaml
apiVersion: v1
kind: List
items:
  # The Project
  - apiVersion: project.openshift.io/v1
    kind: Project
    metadata:
      name: production-ml-models
      labels:
        opendatahub.io/dashboard: 'true'
        kubernetes.io/metadata.name: production-ml-models
        environment: production
        compliance: pci-dss
      annotations:
        openshift.io/display-name: Production ML Models
        openshift.io/description: 'Production-ready ML models with resource limits'
        openshift.io/requester: ml-ops-team@example.com
    spec: {}
  
  # Resource Quota (applied after project creation)
  - apiVersion: v1
    kind: ResourceQuota
    metadata:
      name: compute-quota
      namespace: production-ml-models
    spec:
      hard:
        requests.cpu: "100"
        requests.memory: 200Gi
        requests.storage: 1Ti
        persistentvolumeclaims: "10"
        pods: "50"
        requests.nvidia.com/gpu: "4"
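
As with the basic example, the list can be applied in one command; the quota can then be inspected to confirm the limits are in place (file and namespace names as defined above):

# Apply the project and its resource quota together
kubectl apply -f project-advanced.yaml

# Confirm the quota exists and view current usage against the limits
kubectl describe resourcequota compute-quota -n production-ml-models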

Method 2: Imperative (Using Commands)

The imperative approach uses oc or kubectl commands directly. Note that while kubectl can only create plain namespaces, the oc new-project command provides OpenShift-specific behavior: it goes through the project request flow (applying the cluster's project template) and switches your current context to the new project.

Using oc (OpenShift CLI)

# Basic project creation
oc new-project my-ds-project

# With display name and description
oc new-project fraud-detection \
  --display-name="Fraud Detection ML" \
  --description="Machine learning models for fraud detection"

# Add the required label to make it visible in OpenShift AI
oc label project fraud-detection opendatahub.io/dashboard=true

Using kubectl

# Create a namespace (project)
kubectl create namespace my-ds-project

# Add required labels
kubectl label namespace my-ds-project opendatahub.io/dashboard=true

# Add annotations
kubectl annotate namespace my-ds-project \
  openshift.io/display-name="My Data Science Project" \
  openshift.io/description="Project for ML experiments"

Listing and Viewing Projects

List All Projects

# List all projects
kubectl get projects

# List with additional information
kubectl get projects -o wide

# List only data science projects
kubectl get projects -l opendatahub.io/dashboard=true

# Custom output showing key fields
kubectl get projects -o custom-columns=\
NAME:.metadata.name,\
DISPLAY:.metadata.annotations.openshift\\.io/display-name,\
STATUS:.status.phase

View Specific Project

# Get project details
kubectl get project my-ds-project

# Get detailed description
kubectl describe project my-ds-project

# Get project in YAML format
kubectl get project my-ds-project -o yaml

# Get project in JSON format (useful for parsing)
kubectl get project my-ds-project -o json

Filter Projects

# List projects by label
kubectl get projects -l team=data-science

# List projects by multiple labels
kubectl get projects -l opendatahub.io/dashboard=true,environment=production

# List projects with specific annotation (using jsonpath)
kubectl get projects -o jsonpath='{.items[?(@.metadata.annotations.openshift\.io/requester=="john.doe@example.com")].metadata.name}'

Updating Projects

Using kubectl apply (Declarative)

Modify your YAML file and reapply:

kubectl apply -f project-updated.yaml
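
To preview what would change before reapplying, kubectl diff compares the file against the live object without modifying anything:

# Dry-run comparison against the cluster state
kubectl diff -f project-updated.yaml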

Using kubectl edit (Interactive)

# Opens project in your default editor
kubectl edit project my-ds-project

Using kubectl annotate and kubectl label

Update Annotations

# Add or update single annotation
kubectl annotate project my-ds-project \
  openshift.io/description="Updated ML project description" \
  --overwrite

# Add multiple annotations
kubectl annotate project my-ds-project \
  project.version="2.0" \
  project.owner="ml-team@example.com" \
  --overwrite

# Remove an annotation
kubectl annotate project my-ds-project project.version-

Update Labels

# Add or update labels
kubectl label project my-ds-project \
  environment=staging \
  compliance=hipaa \
  --overwrite

# Remove a label
kubectl label project my-ds-project compliance-

Using kubectl patch

# Update display name using JSON patch
kubectl patch project my-ds-project --type='json' \
  -p='[{"op": "replace", "path": "/metadata/annotations/openshift.io~1display-name", "value": "New Display Name"}]'

# Add multiple labels using merge patch
kubectl patch project my-ds-project --type='merge' \
  -p='{"metadata":{"labels":{"tier":"gpu-compute","cost-center":"ml-research"}}}'

Deleting Projects

Basic Deletion

# Delete a specific project
kubectl delete project my-ds-project

# Delete using YAML file
kubectl delete -f project.yaml

# Force deletion (use with caution)
kubectl delete project my-ds-project --force --grace-period=0

Important Notes on Deletion

  1. Project deletion is irreversible - All resources within the project will be deleted
  2. Terminating state - Projects enter a “Terminating” state before complete removal
  3. Finalizers - Some resources may have finalizers that prevent immediate deletion
  4. PVCs - PersistentVolumeClaims might retain data depending on reclaim policy (see the storage check below)
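
Because of point 4, a quick pre-deletion storage check helps avoid surprises. This sketch lists the project's PVCs and the reclaim policy of their backing volumes:

# List PVCs that will be deleted along with the project
kubectl get pvc -n my-ds-project

# Show the reclaim policy of the backing PersistentVolumes
# (Retain keeps the underlying data, Delete removes it)
kubectl get pv -o custom-columns=NAME:.metadata.name,RECLAIM:.spec.persistentVolumeReclaimPolicy,CLAIM:.spec.claimRef.name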

Check Deletion Status

# Monitor project deletion
kubectl get project my-ds-project -w

# Check for resources preventing deletion
kubectl api-resources --verbs=list --namespaced -o name \
  | xargs -n 1 kubectl get --show-kind --ignore-not-found -n my-ds-project

Practical Examples

Example 1: Create a Complete Data Science Project

# Create project YAML
cat <<EOF > datascience-project.yaml
apiVersion: project.openshift.io/v1
kind: Project
metadata:
  name: customer-churn-analysis
  labels:
    opendatahub.io/dashboard: 'true'
    kubernetes.io/metadata.name: customer-churn-analysis
    project-type: ml-classification
    team: customer-analytics
    cost-center: marketing
  annotations:
    openshift.io/display-name: Customer Churn Analysis
    openshift.io/description: 'ML models to predict customer churn using historical data'
    openshift.io/requester: sarah.chen@example.com
    project.start-date: '2024-01-15'
    project.ml-framework: 'pytorch,scikit-learn'
spec: {}
EOF

# Apply the project
kubectl apply -f datascience-project.yaml

# Verify creation
kubectl get project customer-churn-analysis

Example 2: Migrate Existing Project to Data Science

# Add data science label to existing project
kubectl label project existing-project opendatahub.io/dashboard=true

# Update annotations for better organization
kubectl annotate project existing-project \
  openshift.io/display-name="Migrated ML Project" \
  openshift.io/description="Legacy project now enabled for OpenShift AI" \
  migration.date="$(date +%Y-%m-%d)" \
  --overwrite

Example 3: Bulk Operations on Projects

# Add cost tracking label to all data science projects
kubectl get projects -l opendatahub.io/dashboard=true -o name | \
  xargs -I {} kubectl label {} cost-tracking=enabled --overwrite

# Export all data science projects
kubectl get projects -l opendatahub.io/dashboard=true -o yaml > all-ds-projects.yaml

# List projects with their descriptions
kubectl get projects -l opendatahub.io/dashboard=true \
  -o custom-columns=NAME:.metadata.name,DESCRIPTION:.metadata.annotations.openshift\\.io/description

Verification and Troubleshooting

Verify Project in OpenShift AI Dashboard

  1. Check the label is present:
    kubectl get project my-ds-project -o jsonpath='{.metadata.labels.opendatahub\.io/dashboard}'
    
  2. Verify project appears in dashboard (via API):
    # List all projects visible to OpenShift AI
    kubectl get projects -l opendatahub.io/dashboard=true
    

Common Issues and Solutions

Project Not Visible in Dashboard

# Check if label exists
kubectl get project my-ds-project --show-labels

# Add missing label
kubectl label project my-ds-project opendatahub.io/dashboard=true --overwrite

Permission Denied

# Check your permissions
kubectl auth can-i create projects

# Check specific project access
kubectl auth can-i get project my-ds-project

Project Stuck in Terminating

# Check what's preventing deletion
kubectl get all -n my-ds-project

# Check for finalizers
kubectl get project my-ds-project -o jsonpath='{.metadata.finalizers}'

# Remove finalizers if needed (use with caution)
kubectl patch project my-ds-project -p '{"metadata":{"finalizers":[]}}' --type=merge

Best Practices

Naming Conventions

  1. Use lowercase letters, numbers, and hyphens only (a validation sketch follows this list)
    • Good: ml-fraud-detection, customer-churn-v2
    • Bad: ML_Fraud_Detection, Customer.Churn
  2. Include purpose in the name
    • Good: image-classification-prod, nlp-sentiment-dev
    • Bad: project1, test
  3. Avoid generic names
    • Use specific, descriptive names that indicate the project’s purpose
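
Project names must be valid DNS-1123 labels, the same rule Kubernetes applies to namespace names. The following is a small optional shell check you can run before creating a project; the NAME value is only an illustration:

# Validate a proposed project name against the DNS-1123 label rule:
# lowercase alphanumerics and hyphens, starts/ends alphanumeric, at most 63 characters
NAME="ml-fraud-detection"
if [[ ${#NAME} -le 63 && "$NAME" =~ ^[a-z0-9]([-a-z0-9]*[a-z0-9])?$ ]]; then
  echo "valid project name: $NAME"
else
  echo "invalid project name: $NAME"
fi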

Label and Annotation Strategy

  1. Required Labels
    labels:
      opendatahub.io/dashboard: 'true'  # Required for OpenShift AI
    
  2. Recommended Labels
    labels:
      team: data-science              # Team ownership
      environment: development        # dev/staging/production
      project-type: ml-training      # Project category
      cost-center: ml-research       # Cost tracking
    
  3. Useful Annotations
    annotations:
      openshift.io/display-name: "Human Readable Name"
      openshift.io/description: "Detailed project description"
      openshift.io/requester: "email@example.com"
      project.docs.url: "https://docs.example.com/project"
      project.git.url: "https://github.com/org/repo"
    

Security Considerations

  1. Limit project creation to authorized users
  2. Use ResourceQuotas to prevent resource exhaustion
  3. Apply NetworkPolicies for network isolation (a minimal example follows this list)
  4. Regular cleanup of unused projects
  5. Audit project access periodically
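
For point 3, a minimal default-deny sketch is shown below. It blocks all ingress traffic to pods in the project (namespace name reused from earlier examples); you would typically add further policies afterwards to re-allow the traffic your workloads actually need:

# networkpolicy-default-deny.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: my-ds-project
spec:
  # An empty podSelector matches every pod in the namespace
  podSelector: {}
  policyTypes:
    - Ingress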

When to Use Declarative vs Imperative

Use Declarative (YAML) when:

  • Creating projects in production
  • Need version control
  • Automating with CI/CD
  • Creating multiple related resources
  • Need reproducible deployments

Use Imperative (Commands) when:

  • Quick testing or development
  • One-time operations
  • Interactive troubleshooting
  • Simple label/annotation updates

Field Reference

| Field Path | Type | Required | Description | Example |
|---|---|---|---|---|
| apiVersion | string | Yes | API version for the Project resource | project.openshift.io/v1 |
| kind | string | Yes | Resource type | Project |
| metadata.name | string | Yes | Project name (lowercase, hyphens) | my-ds-project |
| metadata.labels | object | No* | Key-value pairs for organization | team: data-science |
| metadata.labels."opendatahub.io/dashboard" | string | Yes** | Enables OpenShift AI integration | 'true' |
| metadata.labels."kubernetes.io/metadata.name" | string | Auto | Automatically set to match the name | my-ds-project |
| metadata.annotations | object | No | Non-identifying metadata | See below |
| metadata.annotations."openshift.io/display-name" | string | No | Human-readable name | My Data Science Project |
| metadata.annotations."openshift.io/description" | string | No | Project description | ML experiments for customer analysis |
| metadata.annotations."openshift.io/requester" | string | No | Project creator/owner | john.doe@example.com |
| spec | object | Yes | Project specification (usually empty) | {} |
| status | object | Read-only | Project status (set by system) | N/A |

* Labels are optional but opendatahub.io/dashboard is required for OpenShift AI integration
** Required only for data science projects to appear in OpenShift AI dashboard

Common Custom Annotations

| Annotation | Description | Example |
|---|---|---|
| project.version | Project version tracking | '1.2.0' |
| project.owner | Project owner/team | ml-ops-team |
| project.docs.url | Documentation link | https://wiki.example.com/project |
| project.git.url | Source code repository | https://github.com/org/repo |
| project.jira.key | Issue tracking reference | MLOPS-123 |
| project.start-date | Project start date | '2024-01-15' |
| project.ml-framework | ML frameworks used | tensorflow,pytorch |
| project.compliance | Compliance requirements | hipaa,pci-dss |

Using with Kubernetes MCP Server

If you’re using the Kubernetes MCP server for AI-assisted operations, you’ll need to adapt some of these commands, since MCP tools work differently from direct kubectl commands.

MCP Tool Mapping

| kubectl Command | MCP Tool | Parameters |
|---|---|---|
| kubectl apply -f project.yaml | resources_create_or_update | Pass YAML content as resource |
| kubectl get projects | projects_list | No parameters needed |
| kubectl get project <name> | resources_get | apiVersion, kind, name |
| kubectl get projects -l <label> | resources_list | apiVersion, kind, labelSelector |
| kubectl delete project <name> | resources_delete | apiVersion, kind, name |

Creating Projects with MCP

Use the resources_create_or_update tool with the YAML content:

# Pass this YAML to the resources_create_or_update tool
apiVersion: project.openshift.io/v1
kind: Project
metadata:
  name: my-ds-project
  labels:
    opendatahub.io/dashboard: 'true'
    kubernetes.io/metadata.name: my-ds-project
  annotations:
    openshift.io/display-name: My Data Science Project
    openshift.io/description: 'Project for ML experiments'
spec: {}

Listing Projects with MCP

# List all OpenShift projects
# Use: projects_list (no parameters)

# List projects with specific labels
# Use: resources_list with parameters:
apiVersion: project.openshift.io/v1
kind: Project
labelSelector: opendatahub.io/dashboard=true

Getting a Specific Project

# Use: resources_get with parameters:
apiVersion: project.openshift.io/v1
kind: Project
name: my-ds-project

Updating Projects with MCP

Since MCP doesn’t support kubectl patch or kubectl label directly:

  1. Get the current project using resources_get
  2. Modify the YAML (add/update labels or annotations)
  3. Apply the updated YAML using resources_create_or_update

Example workflow:

# 1. Get current project state
# 2. Modify the returned YAML to add a label:
metadata:
  labels:
    opendatahub.io/dashboard: 'true'
    environment: production  # New label
# 3. Pass modified YAML to resources_create_or_update

Deleting Projects with MCP

# Use: resources_delete with parameters:
apiVersion: project.openshift.io/v1
kind: Project
name: my-ds-project

MCP Limitations

The following operations from our documentation are not directly supported by MCP:

  1. Interactive editing (kubectl edit) - Use get, modify, and update workflow instead
  2. Direct label/annotation commands (kubectl label, kubectl annotate) - Update full resource
  3. JSONPath queries - MCP returns full resources; filtering happens client-side
  4. Watch operations (-w flag) - Not supported
  5. Custom output columns - MCP returns standard formats
  6. Imperative namespace creation - Use declarative YAML approach

Best Practices for MCP

  1. Use declarative YAML - This aligns perfectly with MCP’s design
  2. Batch operations - Get all resources and process them programmatically
  3. Full resource updates - Always work with complete resource definitions
  4. Leverage projects_list - Use the dedicated tool for listing OpenShift projects