System Architecture Overview

Introduction

AI4DRPM follows a layered architecture pattern with clear separation of concerns.

Architectural Layers

    graph TB
    subgraph "Web Layer"
        A["FastAPI REST API fa:fa-rocket"]
        C["Middleware fa:fa-shield-alt"]
        B["API Routers fa:fa-router"]
    end

    subgraph "Service Layer"
        D["Resource Services fa:fa-book"]
        E["Engine Services fa:fa-cog"]
        F["Shared Services fa:fa-tools"]
    end

    subgraph "Asynchronous Tasks"
        N[("Redis Broker fa:fa-database")]
        M["Celery Workers fa:fa-tachometer-alt"]
    end

    subgraph "AI Services"
        H["Haystack Pipelines fa:fa-project-diagram"]
    end

    subgraph "Data Layer"
        L["Repositories fa:fa-folder"]
        K["SQLAlchemy ORM fa:fa-link"]
        O[("PostgreSQL fa:fa-database")]
    end

    Q["LLM APIs fa:fa-robot"]
    MR["Model Registry fa:fa-archive"]
    CL["EU Cellar fa:fa-globe"]

    %% Web Layer Flow
    A -->|"Request"| C
    C -->|"OIDC/JWT Auth"| B
    B -->|"Resource Ops"| D
    B -->|"Pipeline Ops"| E
    B -->|"Shared Ops"| F

    %% Engine dispatches pipeline as task
    E -->|"Execute Pipeline"| N

    %% Broker delivers to workers
    N <-->|"Broker"| M

    %% Worker executes locally via Haystack
    M -->|"Execute Locally"| H

    %% LLM calls only via Haystack
    H -->|"LLM Calls"| Q

    %% Model Registry — bidirectional with Haystack
    H <-->|"Register/Retrieve"| MR

    %% Cellar — read only via SPARQL and REST
    D -->|"SPARQL/REST"| CL

    %% Data Access
    D -->|"CRUD"| L
    E -->|"CRUD"| L
    F -->|"CRUD"| L
    L -->|"ORM"| K
    K -->|"Query"| O

    %% Styling
    style O fill:#e1f5ff,stroke:#333
    style K fill:#fff4e1,stroke:#333
    style Q fill:#f0f8ff,stroke:#333
    style N fill:#ffe0b2,stroke:#333
    style H fill:#e8f5e9,stroke:#333
    style MR fill:#f3e5f5,stroke:#333
    style CL fill:#f0fff0,stroke:#333
    

Target Architectural Diagram

    graph TB
    subgraph "Web Layer"
        A["FastAPI REST API fa:fa-rocket"]
        C["Middleware fa:fa-shield-alt"]
        B["API Routers fa:fa-router"]
    end

    subgraph "Service Layer"
        F["Shared Services fa:fa-tools"]
        D["Resource Services fa:fa-book"]
        E["Engine Services fa:fa-cog"]
    end

    subgraph "Data Layer"
        RR["Resource Repositories fa:fa-folder"]
        VTS[("Vector-Capable Triple Store fa:fa-database")]
        SR["Repositories fa:fa-folder"]
        K["SQLAlchemy ORM fa:fa-link"]
        O[("PostgreSQL fa:fa-database")]
    end

    subgraph "Asynchronous Tasks"
        N[("Redis Broker fa:fa-database")]
        M["Celery Workers fa:fa-tachometer-alt"]
    end

    subgraph "AI Services"
        H["Haystack Pipelines fa:fa-project-diagram"]
        DE["Deepset Enterprise fa:fa-cloud"]
    end

    CL["EU Cellar fa:fa-globe"]
    Q["LLM APIs fa:fa-robot"]
    MR["Model Registry fa:fa-archive"]

    %% Web Layer Flow
    A -->|"Request"| C
    C -->|"OIDC/JWT Auth"| B
    B -->|"Shared Ops"| F
    B -->|"Resource Ops"| D
    B -->|"Pipeline Ops"| E

    %% Data Access — Resource Services → Resource Repositories → Vector Triple Store → Cellar
    D -->|"CRUD"| RR
    RR -->|"SPARQL"| VTS
    VTS <-->|"REST/SPARQL"| CL

    %% Data Access — Engine & Shared → Repositories → PostgreSQL
    F -->|"CRUD"| SR
    E -->|"CRUD"| SR
    SR -->|"ORM"| K
    K -->|"Query"| O

    %% Engine dispatches pipeline as task
    E -->|"Execute Pipeline"| N

    %% Broker delivers to workers
    N <-->|"Broker"| M

    %% Worker decides execution path
    M -->|"Execute Locally"| H
    M -->|"Deploy"| DE

    %% LLM calls only via Haystack or Deepset Enterprise
    H -->|"LLM Calls"| Q
    DE -->|"LLM Calls"| Q

    %% Model Registry — bidirectional with both Haystack and Deepset Enterprise
    H <-->|"Register/Retrieve"| MR
    DE <-->|"Register/Retrieve"| MR

    %% AI Services → Vector-capable triple store
    H -->|"SPARQL/Vector Search"| VTS
    DE -->|"SPARQL/Vector Search"| VTS

    %% Styling
    style O fill:#e1f5ff,stroke:#333
    style VTS fill:#e8f5e9,stroke:#333
    style K fill:#fff4e1,stroke:#333
    style Q fill:#f0f8ff,stroke:#333
    style DE fill:#bbdefb,stroke:#333
    style N fill:#ffe0b2,stroke:#333
    style H fill:#e8f5e9,stroke:#333
    style MR fill:#f3e5f5,stroke:#333
    style CL fill:#f0fff0,stroke:#333
    style RR fill:#fce4ec,stroke:#333
    style SR fill:#ede7f6,stroke:#333

    

Layer Descriptions

1. Web Layer

Responsibility: HTTP request handling, response formatting, authentication

Components:

  • api.py - FastAPI application initialization

  • routers/ - Endpoint definitions organized by domain

  • dependencies.py - Dependency injection (auth, database sessions)

  • schemas/ - Domain-organized Pydantic request/response models

  • Middleware - CORS, security headers, logging

2. Service Layer

Responsibility: Business logic, orchestration, data validation

Components:

Resource Services (services/resources/)

  • document_collection_service.py - Document collection from EU Cellar

  • document_parsing_service.py - Document parsing with tulit

  • document_metadata_service.py - SPARQL-based document discovery, metadata enrichment, and CELEX metadata extraction

  • legal_resource_service.py - Legal resource CRUD

  • provision_service.py - Legal provision CRUD

  • classification_service.py - Legal Provision Classification CRUD

  • analysis_service.py - Analysis CRUD

  • category_service.py - Category CRUD

  • statement_service.py - Statement generation

Engine Services (services/engine/)

  • pipeline_service.py - Pipeline lookup, orchestration, and execution

  • training_service.py - Model training and evaluation orchestration

  • token_usage_service.py - LLM token usage tracking

Haystack Integration (services/engine/haystack/)

  • components/ - Haystack components used in pipelines (retrievers, classifiers, parsers, custom processors)

    • base.py - Base component class

    • dependency_parsing_classifier.py - Dependency parsing classifier

    • gitlab_model_persister.py - GitLab model persister

    • json_extractor.py - JSON extractor

    • llm_analysis_parser.py - LLM analysis parser

    • llm_classification_parser.py - LLM classification parser

    • model_loader.py - Model loader

    • multi_classifier.py - Multi classifier

    • text_preprocessor.py - Text preprocessor

    • training/ - Training components

      • data_splitter.py - Data splitter

      • model_persister.py - Model persister

      • svm_trainer.py - SVM trainer

      • tfidf_trainer.py - TF-IDF trainer

      • training_data_loader.py - Training data loader

  • streaming.py - Utilities for streaming Haystack pipeline results

All workflows are implemented as Haystack pipelines composed of the above components and orchestrated by pipeline_service. This consolidates Haystack-specific logic under services/engine/haystack/ while keeping orchestration and execution concerns in services/engine.
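
The composition pattern can be illustrated without Haystack itself. The sketch below mimics the idea of chaining one component's run() output into the next component's inputs; it is not the actual Haystack Pipeline API, and the classifier rule is a toy stand-in for a trained model:

```python
class TextPreprocessor:
    def run(self, text: str) -> dict:
        # Normalize whitespace and case.
        return {"text": " ".join(text.split()).lower()}

class MultiClassifier:
    def run(self, text: str) -> dict:
        # Toy rule standing in for the trained classifier.
        label = "obligation" if "shall" in text else "other"
        return {"text": text, "label": label}

class Pipeline:
    """Minimal linear pipeline: each component's output feeds the next."""
    def __init__(self, components):
        self.components = components

    def run(self, **inputs) -> dict:
        data = inputs
        for component in self.components:
            data = component.run(**data)
        return data

pipeline = Pipeline([TextPreprocessor(), MultiClassifier()])
result = pipeline.run(text="Member  States SHALL ensure ...")
```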

Shared Services (services/shared/)

  • user_service.py - User CRUD

  • refresh_token_service.py - Token lifecycle management

  • statistics_service.py - System statistics and dashboards

  • token_usage_service.py - LLM token usage tracking

  • task_service.py - Task lifecycle management

  • mlflow_service.py - MLflow tracking service

3. Data Layer

Responsibility: Data access, ORM mapping, database operations

Components:

  • db/models/ - SQLAlchemy models organized by domain

  • db/repositories/ - Data access patterns

  • db/migrations/ - Alembic migration scripts

  • db/database.py - Database connection and session management

4. Task Queue & Workers

Responsibility: Asynchronous job processing, background tasks

Components:

  • tasks/celery_worker.py - Celery application configuration

  • tasks/tasks.py - Task definitions

  • tasks/handler.py - Task execution handlers

  • tasks/factory.py - Task factory pattern

  • tasks/utils.py - Task utilities

  • tasks/types.py - Task status and record types
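
The factory/handler split suggested by tasks/factory.py and tasks/handler.py can be sketched in plain Python, without Celery; all names here are hypothetical:

```python
_HANDLERS: dict[str, callable] = {}

def register_task(name: str):
    """Decorator that registers a handler under a task name."""
    def decorator(func):
        _HANDLERS[name] = func
        return func
    return decorator

def create_task(name: str, payload: dict) -> dict:
    """Looks up the handler and runs it, recording a status field
    the way a Celery task result would."""
    handler = _HANDLERS.get(name)
    if handler is None:
        return {"status": "FAILURE", "error": f"unknown task {name!r}"}
    return {"status": "SUCCESS", "result": handler(payload)}

@register_task("execute_pipeline")
def execute_pipeline(payload: dict) -> str:
    return f"pipeline {payload['pipeline_id']} executed"

record = create_task("execute_pipeline", {"pipeline_id": 7})
```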

5. Authentication & Security

Responsibility: User authentication, authorization, security

Components:

  • auth/security.py - JWT token generation/validation

6. Utilities Layer

Responsibility: Cross-cutting concerns, helper functions

Components:

  • utils/sparql_utils.py - SPARQL query execution

  • utils/refresh_token_utils.py - Token utilities

  • utils/serialization.py - Serialization helpers

  • utils/utils.py - General utilities
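
A sketch of the kind of helper sparql_utils.py might expose: building a parameterized Cellar query. The predicate names follow the Cellar CDM ontology but are illustrative here, and execution against the endpoint is omitted:

```python
from string import Template

# Hypothetical query template for looking up a work by CELEX identifier.
CELEX_QUERY = Template("""\
PREFIX cdm: <http://publications.europa.eu/ontology/cdm#>
SELECT ?work ?title WHERE {
  ?work cdm:resource_legal_id_celex "$celex"^^<http://www.w3.org/2001/XMLSchema#string> .
  ?work cdm:work_title ?title .
} LIMIT $limit
""")

def build_celex_query(celex: str, limit: int = 10) -> str:
    # Reject anything that could smuggle SPARQL syntax into the literal.
    if not celex.isalnum():
        raise ValueError("unexpected characters in CELEX identifier")
    return CELEX_QUERY.substitute(celex=celex, limit=limit)

query = build_celex_query("32016R0679")
```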

Technology Integration Points

External APIs

  • OpenAI-compatible API: LLM-based text analysis and annotation

  • SPARQL Endpoints: Knowledge graph queries (e.g., Cellar)

Databases

  • PostgreSQL: Primary data store

Message Queues

  • Celery + Redis: Asynchronous task processing

Configuration Management

Configuration is managed through:

  1. Environment Variables (.env file)

  2. Config JSON (config.json for paths)

  3. Database Configuration (Alembic for migrations)
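
The precedence between these sources can be sketched as environment variables overriding config.json values; the AI4DRPM_ prefix and the keys below are assumptions for illustration:

```python
import json
import os
import tempfile

def load_config(env: dict, config_path: str) -> dict:
    """Merge JSON file settings with environment overrides (env wins),
    mirroring the .env + config.json split described above."""
    with open(config_path) as fh:
        config = json.load(fh)
    for key in config:
        env_key = f"AI4DRPM_{key.upper()}"
        if env_key in env:
            config[key] = env[env_key]
    return config

# Demo with a temporary config.json and a fake environment mapping.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as fh:
    json.dump({"log_dir": "logs", "redis_url": "redis://localhost:6379/0"}, fh)
    path = fh.name

settings = load_config({"AI4DRPM_REDIS_URL": "redis://redis:6379/0"}, path)
os.unlink(path)
```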

Logging & Monitoring

Logging

  • Structured logging to logs/ai4drpm.log

  • Log rotation (via logrotate.conf)

  • Log levels: DEBUG, INFO, WARNING, ERROR, CRITICAL
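
A minimal stdlib setup matching the description; the sketch uses size-based rotation, whereas the deployment rotates via logrotate.conf:

```python
import logging
import logging.handlers
import os
import tempfile

def configure_logging(log_path: str, level: int = logging.INFO) -> logging.Logger:
    """Structured logging to a file with size-based rotation."""
    logger = logging.getLogger("ai4drpm")
    logger.setLevel(level)
    handler = logging.handlers.RotatingFileHandler(
        log_path, maxBytes=1_000_000, backupCount=3
    )
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
    )
    logger.addHandler(handler)
    return logger

log_dir = tempfile.mkdtemp()
log_file = os.path.join(log_dir, "ai4drpm.log")
logger = configure_logging(log_file)
logger.info("worker started")
logging.shutdown()  # flush handlers so the line is on disk
```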

Deployment Architecture

Docker Compose Deployment

  • Multi-container setup

  • Separate containers for: API, Worker, PostgreSQL, Redis

  • Volume mounts for persistence

  • Network isolation