System Architecture Overview
Introduction
AI4DRPM follows a layered architecture pattern with clear separation of concerns.
Architectural Layers
graph TB
subgraph "Web Layer"
A["FastAPI REST API fa:fa-rocket"]
C["Middleware fa:fa-shield-alt"]
B["API Routers fa:fa-router"]
end
subgraph "Service Layer"
D["Resource Services fa:fa-book"]
E["Engine Services fa:fa-cog"]
F["Shared Services fa:fa-tools"]
end
subgraph "Asynchronous Tasks"
N[("Redis Broker fa:fa-database")]
M["Celery Workers fa:fa-tachometer-alt"]
end
subgraph "AI Services"
H["Haystack Pipelines fa:fa-project-diagram"]
end
subgraph "Data Layer"
L["Repositories fa:fa-folder"]
K["SQLAlchemy ORM fa:fa-link"]
O[("PostgreSQL fa:fa-database")]
end
Q["LLM APIs fa:fa-robot"]
MR["Model Registry fa:fa-archive"]
CL["EU Cellar fa:fa-globe"]
%% Web Layer Flow
A -->|"Request"| C
C -->|"OIDC/JWT Auth"| B
B -->|"Resource Ops"| D
B -->|"Pipeline Ops"| E
B -->|"Shared Ops"| F
%% Engine dispatches pipeline as task
E -->|"Execute Pipeline"| N
%% Broker delivers to workers
N <-->|"Broker"| M
%% Worker executes locally via Haystack
M -->|"Execute Locally"| H
%% LLM calls only via Haystack
H -->|"LLM Calls"| Q
%% Model Registry — bidirectional with Haystack
H <-->|"Register/Retrieve"| MR
%% Cellar — read only via SPARQL and REST
D -->|"SPARQL/REST"| CL
%% Data Access
D -->|"CRUD"| L
E -->|"CRUD"| L
F -->|"CRUD"| L
L -->|"ORM"| K
K -->|"Query"| O
%% Styling
style O fill:#e1f5ff,stroke:#333
style K fill:#fff4e1,stroke:#333
style Q fill:#f0f8ff,stroke:#333
style N fill:#ffe0b2,stroke:#333
style H fill:#e8f5e9,stroke:#333
style MR fill:#f3e5f5,stroke:#333
style CL fill:#f0fff0,stroke:#333
Target architectural diagram
graph TB
subgraph "Web Layer"
A["FastAPI REST API fa:fa-rocket"]
C["Middleware fa:fa-shield-alt"]
B["API Routers fa:fa-router"]
end
subgraph "Service Layer"
F["Shared Services fa:fa-tools"]
D["Resource Services fa:fa-book"]
E["Engine Services fa:fa-cog"]
end
subgraph "Data Layer"
RR["Resource Repositories fa:fa-folder"]
VTS[("Vector-Capable Triple Store fa:fa-database")]
SR["Repositories fa:fa-folder"]
K["SQLAlchemy ORM fa:fa-link"]
O[("PostgreSQL fa:fa-database")]
end
subgraph "Asynchronous Tasks"
N[("Redis Broker fa:fa-database")]
M["Celery Workers fa:fa-tachometer-alt"]
end
subgraph "AI Services"
H["Haystack Pipelines fa:fa-project-diagram"]
DE["Deepset Enterprise fa:fa-cloud"]
end
CL["EU Cellar fa:fa-globe"]
Q["LLM APIs fa:fa-robot"]
MR["Model Registry fa:fa-archive"]
%% Web Layer Flow
A -->|"Request"| C
C -->|"OIDC/JWT Auth"| B
B -->|"Shared Ops"| F
B -->|"Resource Ops"| D
B -->|"Pipeline Ops"| E
%% Data Access — Resource Services → Resource Repositories → Vector Triple Store → Cellar
D -->|"CRUD"| RR
RR -->|"SPARQL"| VTS
VTS <-->|"REST/SPARQL"| CL
%% Data Access — Engine & Shared → Repositories → PostgreSQL
F -->|"CRUD"| SR
E -->|"CRUD"| SR
SR -->|"ORM"| K
K -->|"Query"| O
%% Engine dispatches pipeline as task
E -->|"Execute Pipeline"| N
%% Broker delivers to workers
N <-->|"Broker"| M
%% Worker decides execution path
M -->|"Execute Locally"| H
M -->|"Deploy"| DE
%% LLM calls only via Haystack or Deepset Enterprise
H -->|"LLM Calls"| Q
DE -->|"LLM Calls"| Q
%% Model Registry — bidirectional with both Haystack and Deepset Enterprise
H <-->|"Register/Retrieve"| MR
DE <-->|"Register/Retrieve"| MR
%% AI Services → Vector-capable triple store
H -->|"SPARQL/Vector Search"| VTS
DE -->|"SPARQL/Vector Search"| VTS
%% Styling
style O fill:#e1f5ff,stroke:#333
style VTS fill:#e8f5e9,stroke:#333
style K fill:#fff4e1,stroke:#333
style Q fill:#f0f8ff,stroke:#333
style DE fill:#bbdefb,stroke:#333
style N fill:#ffe0b2,stroke:#333
style H fill:#e8f5e9,stroke:#333
style MR fill:#f3e5f5,stroke:#333
style CL fill:#f0fff0,stroke:#333
style RR fill:#fce4ec,stroke:#333
style SR fill:#ede7f6,stroke:#333
Layer Descriptions
1. Web Layer
Responsibility: HTTP request handling, response formatting, authentication
Components:
api.py - FastAPI application initialization
routers/ - Endpoint definitions organized by domain
dependencies.py - Dependency injection (auth, database sessions)
schemas/ - Domain-organized Pydantic request/response models
Middleware - CORS, security headers, logging
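The Web Layer flow above (Request → Middleware → Router, as in the diagram) can be sketched framework-agnostically. This is a minimal stdlib illustration of the pattern only; the project itself uses FastAPI routers and dependency injection, and all names here are hypothetical.

```python
# Minimal sketch of the Web Layer flow: Request -> Middleware (auth) -> Router.
# Names are illustrative; the real project uses FastAPI, not this stand-in.
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Request:
    path: str
    headers: Dict[str, str] = field(default_factory=dict)


class Router:
    """Maps paths to handlers, as routers/ does per domain."""

    def __init__(self):
        self._routes: Dict[str, Callable[[Request], dict]] = {}

    def get(self, path: str):
        def register(handler):
            self._routes[path] = handler
            return handler
        return register

    def dispatch(self, request: Request) -> dict:
        return self._routes[request.path](request)


def auth_middleware(router: Router) -> Callable[[Request], dict]:
    """Rejects requests without a bearer token before they reach a router."""
    def handle(request: Request) -> dict:
        token = request.headers.get("Authorization", "")
        if not token.startswith("Bearer "):
            return {"status": 401, "detail": "missing token"}
        return router.dispatch(request)
    return handle


router = Router()


@router.get("/resources")
def list_resources(request: Request) -> dict:
    return {"status": 200, "items": []}


app = auth_middleware(router)
```

In the real application the middleware also validates the OIDC/JWT token contents rather than only checking for its presence.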
2. Service Layer
Responsibility: Business logic, orchestration, data validation
Components:
Resource Services (services/resources/)
document_collection_service.py - Document collection from EU Cellar
document_parsing_service.py - Document parsing with tulit
document_metadata_service.py - SPARQL-based document discovery, metadata enrichment, and CELEX metadata extraction
legal_resource_service.py - Legal resource CRUD
provision_service.py - Legal provision CRUD
classification_service.py - Legal provision classification CRUD
analysis_service.py - Analysis CRUD
category_service.py - Category CRUD
statement_service.py - Statement generation
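A resource service keeps validation and business logic in the service layer and delegates persistence to a repository, matching the "Resource Ops → CRUD" flow in the diagram. The following is a hypothetical sketch with an in-memory repository; class and method names are illustrative, not the project's API.

```python
# Hypothetical sketch of a resource service (in the spirit of
# legal_resource_service.py) delegating CRUD to a repository.
from typing import Dict, Optional


class InMemoryLegalResourceRepository:
    """Stand-in for a db/repositories/ class; the real one wraps SQLAlchemy."""

    def __init__(self):
        self._rows: Dict[int, dict] = {}
        self._next_id = 1

    def add(self, data: dict) -> dict:
        row = {"id": self._next_id, **data}
        self._rows[self._next_id] = row
        self._next_id += 1
        return row

    def get(self, resource_id: int) -> Optional[dict]:
        return self._rows.get(resource_id)


class LegalResourceService:
    def __init__(self, repository):
        self._repository = repository

    def create(self, celex_id: str, title: str) -> dict:
        if not celex_id:
            raise ValueError("celex_id is required")  # validation lives here
        return self._repository.add({"celex_id": celex_id, "title": title})

    def get(self, resource_id: int) -> Optional[dict]:
        return self._repository.get(resource_id)
```

Keeping the repository behind a constructor argument lets the service be unit-tested without a database.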
Engine Services (services/engine/)
pipeline_service.py - Pipeline lookup, orchestration, and execution
training_service.py - Model training and evaluation orchestration
token_usage_service.py - LLM token usage tracking
Haystack Integration (services/engine/haystack/)
components/ - Haystack components used in pipelines (retrievers, classifiers, parsers, custom processors)
  base.py - Base component class
  dependency_parsing_classifier.py - Dependency parsing classifier
  gitlab_model_persister.py - GitLab model persister
  json_extractor.py - JSON extractor
  llm_analysis_parser.py - LLM analysis parser
  llm_classification_parser.py - LLM classification parser
  model_loader.py - Model loader
  multi_classifier.py - Multi-classifier
  text_preprocessor.py - Text preprocessor
training/ - Training components
  data_splitter.py - Data splitter
  model_persister.py - Model persister
  svm_trainer.py - SVM trainer
  tfidf_trainer.py - TF-IDF trainer
  training_data_loader.py - Training data loader
streaming.py - Utilities for streaming Haystack pipeline results
All workflows are implemented as Haystack pipelines composed from the components above and orchestrated by pipeline_service. This consolidates Haystack-specific logic under services/engine/haystack/ while keeping orchestration and execution concerns in services/engine.
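The composition pattern described above can be illustrated with a simplified, stdlib-only analogue: each component exposes run(), and the pipeline feeds one component's output into the next. The project itself uses Haystack pipelines; the component classes and classification rule here are purely illustrative.

```python
# Stdlib-only analogue of composing pipeline components, in the spirit of
# services/engine/haystack/. Not Haystack itself; it only shows the pattern.
from typing import Any, List


class Component:
    """Analogue of the base component class in components/base.py."""

    def run(self, data: Any) -> Any:
        raise NotImplementedError


class TextPreprocessor(Component):
    def run(self, data: str) -> str:
        # Normalize whitespace and case before classification.
        return " ".join(data.split()).lower()


class KeywordClassifier(Component):
    def run(self, data: str) -> dict:
        # Toy rule standing in for a trained classifier.
        label = "obligation" if "shall" in data else "other"
        return {"text": data, "label": label}


class Pipeline:
    def __init__(self, components: List[Component]):
        self._components = components

    def run(self, data: Any) -> Any:
        for component in self._components:
            data = component.run(data)
        return data


pipeline = Pipeline([TextPreprocessor(), KeywordClassifier()])
```

In Haystack the connections between components are explicit (named outputs wired to named inputs) rather than a linear chain, but the single-responsibility component idea is the same.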
3. Data Layer
Responsibility: Data access, ORM mapping, database operations
Components:
db/models/ - SQLAlchemy models organized by domain
db/repositories/ - Data access patterns
db/migrations/ - Alembic migration scripts
db/database.py - Database connection and session management
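The repository pattern used in db/repositories/ encapsulates all data access for one aggregate behind a small class. This sketch uses stdlib sqlite3 purely for illustration; the project goes through SQLAlchemy models and Alembic migrations against PostgreSQL, and the table and column names here are hypothetical.

```python
# Repository-pattern sketch using stdlib sqlite3 (illustration only; the
# project uses SQLAlchemy + PostgreSQL).
import sqlite3


class ProvisionRepository:
    """Encapsulates all SQL for one aggregate, as db/repositories/ does."""

    def __init__(self, connection: sqlite3.Connection):
        self._conn = connection
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS provisions ("
            "id INTEGER PRIMARY KEY, celex_id TEXT NOT NULL, text TEXT)"
        )

    def add(self, celex_id: str, text: str) -> int:
        cursor = self._conn.execute(
            "INSERT INTO provisions (celex_id, text) VALUES (?, ?)",
            (celex_id, text),
        )
        return cursor.lastrowid

    def find_by_celex(self, celex_id: str) -> list:
        rows = self._conn.execute(
            "SELECT id, text FROM provisions WHERE celex_id = ?",
            (celex_id,),
        ).fetchall()
        return [{"id": r[0], "text": r[1]} for r in rows]


repo = ProvisionRepository(sqlite3.connect(":memory:"))
```

Services never see SQL: they call repository methods, which keeps the ORM and query details replaceable.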
4. Task Queue & Workers
Responsibility: Asynchronous job processing, background tasks
Components:
tasks/celery_worker.py - Celery application configuration
tasks/tasks.py - Task definitions
tasks/handler.py - Task execution handlers
tasks/factory.py - Task factory pattern
tasks/utils.py - Task utilities
tasks/types.py - Task status and record types
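The dispatch flow from the diagram (Engine → Redis broker → Celery worker) can be mimicked in-memory: tasks register under a name, dispatch puts a message on a queue, and a worker pulls the message and invokes the registered handler. This is a stdlib analogue, not Celery; the task name and payload are illustrative.

```python
# In-memory analogue of the task flow (registration -> broker -> worker).
# The project uses Celery with Redis as the broker.
import queue
from typing import Callable, Dict

task_registry: Dict[str, Callable[..., dict]] = {}


def register_task(name: str):
    """Analogue of a Celery @app.task registration in tasks/tasks.py."""
    def decorator(fn):
        task_registry[name] = fn
        return fn
    return decorator


@register_task("execute_pipeline")
def execute_pipeline(pipeline_id: int) -> dict:
    return {"pipeline_id": pipeline_id, "status": "SUCCESS"}


broker: "queue.Queue[dict]" = queue.Queue()  # stand-in for Redis


def dispatch(name: str, **kwargs) -> None:
    # Roughly what Celery's .delay() does: serialize and enqueue a message.
    broker.put({"task": name, "kwargs": kwargs})


def worker_step() -> dict:
    """One iteration of a worker loop: pull a message, run its handler."""
    message = broker.get()
    handler = task_registry[message["task"]]
    return handler(**message["kwargs"])
```

With Celery, the worker process and the API process share only the broker, which is what allows pipeline execution to run outside the request cycle.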
5. Authentication & Security
Responsibility: User authentication, authorization, security
Components:
auth/security.py - JWT token generation/validation
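To make the generation/validation responsibility concrete, here is a minimal HS256 JWT sketch using only the stdlib. It is an illustration of the mechanics, not the project's implementation; production code should use a maintained JWT library and, for OIDC, validate against the provider's published keys.

```python
# Minimal HS256 JWT encode/decode using only the stdlib (illustration only).
import base64
import hashlib
import hmac
import json


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def encode_jwt(payload: dict, secret: str) -> str:
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = _b64url(json.dumps(payload).encode())
    signing_input = f"{header}.{body}".encode()
    sig = hmac.new(secret.encode(), signing_input, hashlib.sha256).digest()
    return f"{header}.{body}.{_b64url(sig)}"


def decode_jwt(token: str, secret: str) -> dict:
    header, body, sig = token.split(".")
    signing_input = f"{header}.{body}".encode()
    expected = hmac.new(secret.encode(), signing_input, hashlib.sha256).digest()
    # Constant-time comparison avoids timing side channels.
    if not hmac.compare_digest(_b64url(expected), sig):
        raise ValueError("invalid signature")
    padded = body + "=" * (-len(body) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))
```

A real validator would also check standard claims such as `exp`, `iss`, and `aud` before trusting the payload.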
6. Utilities Layer
Responsibility: Cross-cutting concerns, helper functions
Components:
utils/sparql_utils.py - SPARQL query execution
utils/refresh_token_utils.py - Token utilities
utils/serialization.py - Serialization helpers
utils/utils.py - General utilities
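As a sketch of what SPARQL query execution against an endpoint like Cellar involves, the snippet below only constructs the HTTP request (no network call is made). The endpoint URL and the query are illustrative assumptions, not taken from the project's code.

```python
# Sketch of preparing a SPARQL-over-HTTP request (construction only, no I/O).
from urllib.parse import urlencode
from urllib.request import Request

# Illustrative endpoint; verify against the actual Cellar documentation.
CELLAR_ENDPOINT = "https://publications.europa.eu/webapi/rdf/sparql"


def build_sparql_request(query: str) -> Request:
    # SPARQL Protocol allows sending the query as a form-encoded POST body.
    data = urlencode({"query": query})
    return Request(
        CELLAR_ENDPOINT,
        data=data.encode(),
        headers={"Accept": "application/sparql-results+json"},
        method="POST",
    )


query = "SELECT ?work WHERE { ?work ?p ?o } LIMIT 5"
request = build_sparql_request(query)
```

Executing the request (e.g. with urllib or httpx) returns a JSON results document with `head.vars` and `results.bindings`.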
Technology Integration Points
External APIs
OpenAI-compatible API: LLM-based text analysis and annotation
SPARQL Endpoints: Knowledge graph queries (e.g., Cellar)
Databases
PostgreSQL: Primary data store
Message Queues
Celery + Redis: Asynchronous task processing
Configuration Management
Configuration is managed through:
Environment Variables (.env file)
Config JSON (config.json for paths)
Database Configuration (Alembic for migrations)
Logging & Monitoring
Logging
Structured logging to logs/ai4drpm.log
Log rotation (via logrotate.conf)
Log levels: DEBUG, INFO, WARNING, ERROR, CRITICAL
Deployment Architecture
Docker Compose Deployment
Multi-container setup
Separate containers for: API, Worker, PostgreSQL, Redis
Volume mounts for persistence
Network isolation
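The four-container layout described above could be expressed as a compose file along these lines. This is an illustrative sketch: service names, image tags, commands, and volumes are assumptions, not the project's actual compose file.

```yaml
# Illustrative docker-compose sketch of the described multi-container setup.
services:
  api:
    build: .
    depends_on: [postgres, redis]
    env_file: .env
  worker:
    build: .
    command: celery -A tasks.celery_worker worker
    depends_on: [postgres, redis]
    env_file: .env
  postgres:
    image: postgres:16
    volumes:
      - pgdata:/var/lib/postgresql/data   # persistence across restarts
  redis:
    image: redis:7
volumes:
  pgdata:
```

Placing all services on the default compose network gives the network isolation noted above while letting containers reach each other by service name.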