MLflow Service
Tracks experiments and integrates with GitLab Model Registry.
Provides a unified interface for experiment tracking and model registry using GitLab’s MLflow-compatible API.
GitLab acts as both the experiment tracking server and the model registry, storing runs, metrics, parameters, and model artifacts.
- Usage:
    import mlflow
    from ai4drpm.services.shared.mlflow_service import get_mlflow_service

    service = get_mlflow_service()
    if service.enabled:
        with service.start_run("ai4drpm-classifiers", "data-1.0.1"):
            mlflow.log_param("C", 10)
            mlflow.log_metric("f1_score", 0.847)
            mlflow.sklearn.log_model(model, artifact_path="")
Service for GitLab MLflow experiment tracking and model registry.
Uses GitLab as the MLflow backend via its API v4 compatibility layer. Supports both experiment tracking and model registry operations.
Features:
- Experiment tracking with runs, parameters, metrics
- Model registry with semantic versioning
- Automatic CI/CD job linking when running in GitLab CI
- Graceful degradation when MLflow is not configured
- EXPERIMENT_CLASSIFIERS = 'ai4drpm-classifiers'
  Experiment name for classifier training
- EXPERIMENT_LLM_PIPELINES = 'ai4drpm-llm-pipelines'
  Experiment name for LLM pipeline tracking
- EXPERIMENT_ML_INFERENCE = 'ai4drpm-ml-inference'
  Experiment name for ML inference tracking
Initialize with environment configuration.
Reads MLFLOW_TRACKING_URI and MLFLOW_TRACKING_TOKEN from config. If both are set, configures MLflow to use GitLab backend.
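The enable check described above can be sketched roughly as follows. This is only an illustration, not the service's actual code: the helper name `read_mlflow_config` and the returned dictionary shape are hypothetical, and it assumes both settings arrive as environment variables.

```python
import os
from typing import Mapping, Optional


def read_mlflow_config(env: Optional[Mapping[str, str]] = None) -> dict:
    """Return tracking settings plus a flag saying whether tracking is usable.

    Hypothetical helper: the real service may read its config differently.
    """
    env = os.environ if env is None else env
    uri = env.get("MLFLOW_TRACKING_URI", "").strip()
    token = env.get("MLFLOW_TRACKING_TOKEN", "").strip()
    # Tracking is considered enabled only when both values are present.
    return {"tracking_uri": uri, "token": token, "enabled": bool(uri and token)}
```

Passing an explicit mapping instead of relying on `os.environ` keeps the check easy to exercise in tests.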
Check if MLflow tracking is enabled and configured.
- Returns:
True if MLflow is configured and ready to use
Get the mlflow module.
- Returns:
The mlflow module for direct API access
- Raises:
RuntimeError – If MLflow is not configured
Get MLflow client for registry operations.
- Returns:
MlflowClient instance for model registry operations
- Raises:
RuntimeError – If MLflow is not configured
Context manager for MLflow run with GitLab CI integration.
Automatically creates the experiment if it doesn’t exist. Tags the run with CI job information when running in GitLab CI.
- Parameters:
- Yields:
MLflow Run object if enabled, None otherwise
- Return type:
Example
    with service.start_run("ai4drpm-classifiers", "data-1.0.1") as run:
        if run:
            mlflow.log_param("C", 10)
            mlflow.log_metric("f1_score", 0.847)
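A context manager that yields a run when tracking is enabled and None otherwise could look roughly like the sketch below. The names `ci_tags` and `start_run_or_none` and the tag keys are hypothetical; `CI_JOB_ID`, `CI_JOB_URL`, and `CI_PIPELINE_ID` are GitLab CI predefined variables, but which of them the service actually records is an assumption.

```python
import os
from contextlib import contextmanager
from typing import Callable, ContextManager, Dict, Iterator, Mapping, Optional


def ci_tags(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Collect run tags from GitLab CI variables that are set (tag keys are illustrative)."""
    env = os.environ if env is None else env
    names = {
        "CI_JOB_ID": "gitlab.ci.job_id",
        "CI_JOB_URL": "gitlab.ci.job_url",
        "CI_PIPELINE_ID": "gitlab.ci.pipeline_id",
    }
    return {tag: env[var] for var, tag in names.items() if env.get(var)}


@contextmanager
def start_run_or_none(
    enabled: bool, open_run: Optional[Callable[[], ContextManager[object]]]
) -> Iterator[Optional[object]]:
    """Yield the run from open_run() when enabled, otherwise yield None."""
    if not enabled or open_run is None:
        # Graceful degradation: the caller's `with` block still executes.
        yield None
        return
    with open_run() as run:
        yield run
```

In real use, `open_run` might be something like `lambda: mlflow.start_run(run_name=name, tags=ci_tags())`; callers then guard their logging calls with `if run:` as in the example above.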
List all registered models from GitLab Model Registry.
Supports two discovery methods controlled by MLFLOW_REGISTRY_DISCOVERY_METHOD:
- "http": Direct REST API call (default, works with GitLab)
- "mlflow_client": Uses MLflow Python client (for when GitLab fixes compatibility)
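Selecting the discovery method might be sketched as below. The function name `discovery_method` is hypothetical, and two details are assumptions: that the setting is read from the environment, and that unrecognised values fall back to the default.

```python
import os
from typing import Mapping, Optional

VALID_METHODS = ("http", "mlflow_client")


def discovery_method(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the configured discovery method, defaulting to "http"."""
    env = os.environ if env is None else env
    value = env.get("MLFLOW_REGISTRY_DISCOVERY_METHOD", "http").strip().lower()
    # Assumption: anything unrecognised is treated as the default.
    return value if value in VALID_METHODS else "http"
```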
Register a new model version in GitLab Model Registry.
Creates the model if it doesn’t exist, then creates a new version.
IMPORTANT: On GitLab, each model version IS its own run. The run_id parameter is ignored by GitLab’s create_model_version. To log artifacts to a model version, use the run_id returned by get_model_version() after calling this method.
- Parameters:
- Return type:
- Returns:
ModelVersion object if successful, None if MLflow disabled. Use model_version.run_id to log artifacts to this version.
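The two-step pattern from the IMPORTANT note (create the version, then re-read it to obtain the run ID) can be illustrated with a small helper. This is a sketch only: `run_id_for_new_version` is a hypothetical name, `client` is assumed to behave like MLflow's `MlflowClient`, and the registered model is assumed to exist already.

```python
from typing import Any, Optional


def run_id_for_new_version(client: Any, name: str, source: str) -> Optional[str]:
    """Create a model version, then re-fetch it to obtain its run_id.

    `client` is duck-typed; it only needs create_model_version() and
    get_model_version(), as provided by MlflowClient.
    """
    created = client.create_model_version(name=name, source=source)
    # Re-fetch the version: the run ID to log artifacts against is the one
    # reported by get_model_version(), not a run_id supplied at creation.
    resolved = client.get_model_version(name=name, version=created.version)
    return resolved.run_id
```

The returned run ID can then be passed to artifact-logging calls so that files end up attached to that model version.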
Get latest version of a model from registry.
Uses get_registered_model() which is reliably supported by GitLab’s MLflow API (search_model_versions returns 404 on some GitLab instances).
Prefers versions with a gitlab.version tag and a run_id (indicating artifacts are available). Falls back to any version with a gitlab.version tag, then to the MLflow version.
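The three-tier preference described above can be sketched as a pure function. `VersionInfo` and `pick_version` are hypothetical stand-ins for the real objects, and the sketch assumes the input list is already ordered with the most recent version first.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class VersionInfo:
    """Minimal stand-in for a registry model version."""
    version: str
    run_id: Optional[str] = None
    tags: Dict[str, str] = field(default_factory=dict)


def pick_version(versions: List[VersionInfo]) -> Optional[str]:
    """Pick a version string using the tiered preference (input assumed newest-first)."""
    tagged = [v for v in versions if v.tags.get("gitlab.version")]
    with_artifacts = [v for v in tagged if v.run_id]
    if with_artifacts:
        # Tier 1: tagged and has a run_id, so artifacts should be available.
        return with_artifacts[0].tags["gitlab.version"]
    if tagged:
        # Tier 2: tagged but without a run_id.
        return tagged[0].tags["gitlab.version"]
    # Tier 3: fall back to the plain MLflow version, if any.
    return versions[0].version if versions else None
```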
Load model from GitLab Model Registry.
Downloads model artifacts from GitLab Package Registry and loads using the appropriate MLflow flavor.
- Parameters:
- Return type:
- Returns:
Loaded model object
- Raises:
RuntimeError – If MLflow is not configured
Exception – If model cannot be loaded
Log an sklearn model to the current run and optionally register it.
Convenience method that combines logging and registration.
- Parameters:
- Return type:
- Returns:
Version string if registered, None otherwise
Increment a semantic version string.
- Parameters:
- Return type:
- Returns:
Incremented version string, or “1.0.0” if version is invalid
Example
    increment_version("1.0.0", "patch") → "1.0.1"
    increment_version("1.0.1", "minor") → "1.1.0"
    increment_version("1.1.0", "major") → "2.0.0"
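A minimal sketch of this behaviour is shown below. It is not the service's implementation: the parameter name `bump`, the strict `MAJOR.MINOR.PATCH` pattern, and treating any unrecognised bump type as a patch bump are all assumptions.

```python
import re

_SEMVER = re.compile(r"(\d+)\.(\d+)\.(\d+)", re.ASCII)


def increment_version(version: str, bump: str = "patch") -> str:
    """Bump a MAJOR.MINOR.PATCH string; return "1.0.0" if it cannot be parsed."""
    match = _SEMVER.fullmatch(version)
    if match is None:
        return "1.0.0"
    major, minor, patch = (int(part) for part in match.groups())
    if bump == "major":
        return f"{major + 1}.0.0"
    if bump == "minor":
        return f"{major}.{minor + 1}.0"
    # Assumption: any other bump type is treated as a patch bump.
    return f"{major}.{minor}.{patch + 1}"
```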
Determine next version for a model based on latest in registry.
Get (run_id, resolved_version) for a model version.
On GitLab, each model version IS its own run. The run_id is available via get_model_version() — not via get_registered_model() which returns run_id=None in latest_versions.
Load sklearn model from a specific MLflow run.
Downloads artifacts and loads the model.joblib file. Falls back to mlflow.sklearn.load_model for backward compatibility with models logged via mlflow.sklearn.log_model.
- Parameters:
run_id (str) – The MLflow run ID containing the model artifact
- Return type:
- Returns:
Loaded sklearn model object
- Raises:
RuntimeError – If MLflow is not configured
Download and load the vectorizer joblib artifact from a run.
Looks for vectorizer.joblib first (new format), then falls back to any .joblib file that isn’t model.joblib (backward compat).
- Parameters:
run_id (str) – The MLflow run ID containing the vectorizer artifact
- Return type:
- Returns:
Loaded vectorizer object
- Raises:
RuntimeError – If MLflow is not configured
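The artifact lookup order for the vectorizer (vectorizer.joblib first, then any other .joblib file except model.joblib) can be sketched as a function over artifact paths. The name `pick_vectorizer_artifact` is hypothetical, and the sketch assumes the run's artifact paths have already been listed.

```python
import posixpath
from typing import Iterable, Optional


def pick_vectorizer_artifact(paths: Iterable[str]) -> Optional[str]:
    """Return the artifact path to load as the vectorizer, or None if none fits."""
    candidates = list(paths)
    # New format: an artifact named exactly vectorizer.joblib.
    for path in candidates:
        if posixpath.basename(path) == "vectorizer.joblib":
            return path
    # Backward compatibility: any other .joblib file that is not the model.
    for path in candidates:
        name = posixpath.basename(path)
        if name.endswith(".joblib") and name != "model.joblib":
            return path
    return None
```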
Get or create the MLflow service singleton.
- Return type:
- Returns:
GitLabMLflowService instance (same instance on repeated calls)
Example
    service = get_mlflow_service()
    if service.enabled:
        with service.start_run("experiment", "run-name"):
            ...
Reset the MLflow service singleton (for testing).
Forces re-initialization on next get_mlflow_service() call.
- Return type:
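The get-or-create and reset behaviour of the singleton can be illustrated with a module-level instance. This is a generic sketch: `_Service`, `get_service`, and `reset_service` are hypothetical stand-ins for GitLabMLflowService and the two functions documented above.

```python
from typing import Optional


class _Service:
    """Stand-in for the real service class; construction would read configuration."""


_instance: Optional[_Service] = None


def get_service() -> _Service:
    """Return the shared instance, creating it on first use."""
    global _instance
    if _instance is None:
        _instance = _Service()
    return _instance


def reset_service() -> None:
    """Drop the shared instance so the next get_service() call rebuilds it."""
    global _instance
    _instance = None
```

In tests, calling the reset function after changing configuration ensures the next lookup builds a fresh instance instead of reusing stale settings.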