System Architecture Overview
Introduction
Architectural Diagram (AS-IS)
graph TB
subgraph "Web Layer"
A["FastAPI REST API"]
C["Middleware"]
B["API Routers"]
end
subgraph "Service Layer"
D["Resource Services"]
E["Engine Services"]
F["Shared Services"]
G["Bulk Services"]
end
subgraph "Asynchronous Tasks"
N[("Redis Broker")]
M["Celery Workers"]
end
subgraph "AI Services"
H["Haystack Pipelines"]
HE["Haystack Enterprise"]
end
subgraph "Data Layer"
L["Repositories"]
K["SQLAlchemy ORM"]
O[("PostgreSQL")]
end
Q["LLM APIs"]
MR["Model Registry"]
CL["EU Cellar"]
%% Web Layer Flow
A -->|"Request"| C
C -->|"OIDC/JWT Auth"| B
B -->|"Resource Ops"| D
B -->|"Pipeline Ops"| E
B -->|"Shared Ops"| F
B -->|"Bulk Ops"| G
%% Engine dispatches pipeline as task
E -->|"Execute Pipeline"| N
%% Broker delivers to workers
N <-->|"Broker"| M
%% Worker decides execution path
M -->|"Execute Locally"| H
M -->|"Deploy"| HE
%% LLM calls via Haystack or Haystack Enterprise
H -->|"LLM Calls"| Q
HE -->|"LLM Calls"| Q
%% Model Registry - bidirectional with both Haystack and Haystack Enterprise
H <-->|"Register/Retrieve"| MR
HE <-->|"Register/Retrieve"| MR
%% Cellar - read only via SPARQL and REST
D -->|"SPARQL/REST"| CL
%% Data Access
D -->|"CRUD"| L
E -->|"CRUD"| L
F -->|"CRUD"| L
G -->|"CRUD"| L
L -->|"ORM"| K
K -->|"Query"| O
%% Styling
style O fill:#e1f5ff,stroke:#333
style K fill:#fff4e1,stroke:#333
style Q fill:#f0f8ff,stroke:#333
style N fill:#ffe0b2,stroke:#333
style H fill:#e8f5e9,stroke:#333
style HE fill:#bbdefb,stroke:#333
style MR fill:#f3e5f5,stroke:#333
style CL fill:#f0fff0,stroke:#333
Target Architectural Diagram (TO-BE)
graph TB
subgraph "Web Layer"
A["FastAPI REST API"]
C["Middleware"]
B["API Routers"]
end
subgraph "Service Layer"
D["Resource Services"]
E["Engine Services"]
F["Shared Services"]
G["Bulk Services"]
end
subgraph "Data Layer"
RR["Resource Repositories"]
VTS[("Vector-Capable Triple Store")]
SR["Repositories"]
K["SQLAlchemy ORM"]
O[("PostgreSQL")]
end
subgraph "Asynchronous Tasks"
N[("Redis Broker")]
M["Celery Workers"]
end
subgraph "AI Services"
H["Haystack Pipelines"]
HE["Haystack Enterprise"]
end
CL["EU Cellar"]
Q["LLM APIs"]
MR["Model Registry"]
%% Web Layer Flow
A -->|"Request"| C
C -->|"OIDC/JWT Auth"| B
B -->|"Resource Ops"| D
B -->|"Pipeline Ops"| E
B -->|"Shared Ops"| F
B -->|"Bulk Ops"| G
%% Data Access - Resource Services -> Resource Repositories -> Vector Triple Store -> Cellar
D -->|"CRUD"| RR
RR -->|"SPARQL"| VTS
VTS <-->|"REST/SPARQL"| CL
%% Data Access - Engine, Shared & Bulk -> Repositories -> PostgreSQL
E -->|"CRUD"| SR
F -->|"CRUD"| SR
G -->|"CRUD"| SR
SR -->|"ORM"| K
K -->|"Query"| O
%% Engine dispatches pipeline as task
E -->|"Execute Pipeline"| N
%% Broker delivers to workers
N <-->|"Broker"| M
%% Worker decides execution path
M -->|"Execute Locally"| H
M -->|"Deploy"| HE
%% LLM calls via Haystack or Haystack Enterprise
H -->|"LLM Calls"| Q
HE -->|"LLM Calls"| Q
%% Model Registry - bidirectional with both Haystack and Haystack Enterprise
H <-->|"Register/Retrieve"| MR
HE <-->|"Register/Retrieve"| MR
%% AI Services -> Vector-capable triple store
H -->|"SPARQL/Vector Search"| VTS
HE -->|"SPARQL/Vector Search"| VTS
%% Styling
style O fill:#e1f5ff,stroke:#333
style VTS fill:#e8f5e9,stroke:#333
style K fill:#fff4e1,stroke:#333
style Q fill:#f0f8ff,stroke:#333
style HE fill:#bbdefb,stroke:#333
style N fill:#ffe0b2,stroke:#333
style H fill:#e8f5e9,stroke:#333
style MR fill:#f3e5f5,stroke:#333
style CL fill:#f0fff0,stroke:#333
style RR fill:#fce4ec,stroke:#333
style SR fill:#ede7f6,stroke:#333
Layer Descriptions
Web Layer
Responsibility: HTTP request handling, response formatting, authentication
Components:
api.py - FastAPI application initialization
routers/ - Endpoint definitions organized by domain
dependencies.py - Dependency injection (auth, database sessions)
schemas/ - Domain-organized Pydantic request/response models
Middleware - CORS, security headers, logging
Service Layer
Responsibility: Business logic, orchestration, data validation
Components:
Resource Services (services/resources/)
document_collection_service.py - Document collection from EU Cellar
document_parsing_service.py - Document parsing with tulit
document_metadata_service.py - SPARQL-based document discovery, metadata enrichment, and CELEX metadata extraction
document_conversion.py - Document format conversion
document_utils.py - Document processing utilities
legal_resource_service.py - Legal resource CRUD
provision_service.py - Legal provision CRUD
classification_service.py - Legal provision classification CRUD
analysis_service.py - Analysis CRUD
category_service.py - Category CRUD
statement_service.py - Statement generation
exceptions.py - Resource service exceptions
Engine Services (services/engine/)
pipeline_service.py - Pipeline lookup, orchestration, and execution
token_usage_service.py - LLM token usage tracking
embedding_service.py - Document embedding generation
haystack_enterprise_service.py - Haystack Enterprise API integration
exceptions.py - Engine service exceptions
Haystack Integration (services/engine/haystack/)
components/ - Haystack components used in pipelines (retrievers, classifiers, parsers, custom processors)
  base.py - Base component class
  dependency_parsing_classifier.py - Dependency parsing classifier
  json_extractor.py - JSON extractor
  llm_analysis_parser.py - LLM analysis parser
  llm_classification_parser.py - LLM classification parser
  model_loader.py - Model loader
  single_classifier.py - Single classifier
streaming.py - Utilities for streaming Haystack pipeline results
All workflows are implemented as Haystack pipelines composed from the components above and orchestrated by pipeline_service. Haystack-specific logic is therefore consolidated under services/engine/haystack/, while orchestration and execution concerns stay in services/engine/.
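The split between components and orchestration can be illustrated with a small, library-free sketch. It deliberately does not use the Haystack API: `Pipeline`, `PipelineService`, `normalize`, and `classify` are hypothetical names chosen for illustration, and the keyword rule in `classify` is a placeholder rather than the project's classifier.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# Hypothetical stand-in for a pipeline component: a callable that takes
# a payload dict and returns an updated payload dict.
Component = Callable[[Dict[str, Any]], Dict[str, Any]]


@dataclass
class Pipeline:
    """Ordered chain of components (not the Haystack Pipeline class)."""
    name: str
    components: List[Component] = field(default_factory=list)

    def run(self, payload: Dict[str, Any]) -> Dict[str, Any]:
        for component in self.components:
            payload = component(payload)
        return payload


class PipelineService:
    """Looks pipelines up by name and executes them."""

    def __init__(self) -> None:
        self._registry: Dict[str, Pipeline] = {}

    def register(self, pipeline: Pipeline) -> None:
        self._registry[pipeline.name] = pipeline

    def execute(self, name: str, payload: Dict[str, Any]) -> Dict[str, Any]:
        if name not in self._registry:
            raise KeyError(f"unknown pipeline: {name}")
        # Copy the payload so the caller's dict is never mutated.
        return self._registry[name].run(dict(payload))


def normalize(payload: Dict[str, Any]) -> Dict[str, Any]:
    return {**payload, "text": payload["text"].strip()}


def classify(payload: Dict[str, Any]) -> Dict[str, Any]:
    # Placeholder rule, assumed for illustration only.
    label = "obligation" if "shall" in payload["text"].lower() else "other"
    return {**payload, "label": label}


service = PipelineService()
service.register(Pipeline("classification", [normalize, classify]))
result = service.execute(
    "classification", {"text": "  Member States shall notify the Commission. "}
)
```

Keeping the registry and execution in one service means callers only need a pipeline name; the component chain can change without touching them.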
Bulk Services (services/bulk/)
orchestrator.py - Bulk operation orchestration (create, execute, update, delete)
types.py - Bulk operation type definitions
resource_handlers/ - Resource-specific bulk handlers
  analysis.py - Analysis bulk operations
  base.py - Base bulk handler class
  classification.py - Classification bulk operations
  legal_resource.py - Legal resource bulk operations
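A minimal sketch of how an orchestrator might dispatch the four bulk operations to resource-specific handlers built on a shared base class. All class and method names here (`BulkOperation`, `BaseBulkHandler`, `InMemoryAnalysisHandler`, `BulkOrchestrator`) are assumptions made for illustration; the in-memory dict stands in for the real persistence layer.

```python
from abc import ABC, abstractmethod
from enum import Enum
from typing import Any, Dict, List, Union


class BulkOperation(str, Enum):
    CREATE = "create"
    EXECUTE = "execute"
    UPDATE = "update"
    DELETE = "delete"


class BaseBulkHandler(ABC):
    """Hypothetical base class; one subclass per resource type."""

    @abstractmethod
    def handle(self, operation: BulkOperation, items: List[Dict[str, Any]]) -> int:
        """Apply the operation to every item and return the number processed."""


class InMemoryAnalysisHandler(BaseBulkHandler):
    """Keeps items in a dict keyed by id instead of a database."""

    def __init__(self) -> None:
        self.store: Dict[int, Dict[str, Any]] = {}

    def handle(self, operation: BulkOperation, items: List[Dict[str, Any]]) -> int:
        for item in items:
            if operation in (BulkOperation.CREATE, BulkOperation.UPDATE):
                self.store[item["id"]] = dict(item)
            elif operation is BulkOperation.DELETE:
                self.store.pop(item["id"], None)
            else:  # EXECUTE: flag the stored item as executed
                self.store.setdefault(item["id"], dict(item))["executed"] = True
        return len(items)


class BulkOrchestrator:
    """Routes a bulk request to the handler registered for its resource type."""

    def __init__(self) -> None:
        self._handlers: Dict[str, BaseBulkHandler] = {}

    def register(self, resource_type: str, handler: BaseBulkHandler) -> None:
        self._handlers[resource_type] = handler

    def run(
        self,
        resource_type: str,
        operation: Union[str, BulkOperation],
        items: List[Dict[str, Any]],
    ) -> int:
        handler = self._handlers.get(resource_type)
        if handler is None:
            raise ValueError(f"no bulk handler for {resource_type!r}")
        # BulkOperation(...) accepts either the enum member or its string value.
        return handler.handle(BulkOperation(operation), items)


analysis_handler = InMemoryAnalysisHandler()
orchestrator = BulkOrchestrator()
orchestrator.register("analysis", analysis_handler)
created = orchestrator.run("analysis", "create", [{"id": 1}, {"id": 2}])
```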
Data Layer
Responsibility: Data access, ORM mapping, database operations
Components:
db/models/ - SQLAlchemy models organized by domain
db/repositories/ - Data access patterns
db/migrations/ - Alembic migration scripts
db/database.py - Database connection and session management
db/base.py - SQLAlchemy base model and type exports
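The repository idea behind db/repositories/ can be sketched as follows. To stay self-contained, the example uses the standard-library `sqlite3` module as a stand-in for SQLAlchemy and PostgreSQL; the `Category` dataclass, the `category` table, and the `CategoryRepository` methods are hypothetical and do not reflect the project's actual models.

```python
import sqlite3
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Category:
    id: Optional[int]
    name: str


class CategoryRepository:
    """Hypothetical repository; sqlite3 stands in for SQLAlchemy + PostgreSQL."""

    def __init__(self, connection: sqlite3.Connection) -> None:
        self._conn = connection
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS category ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT NOT NULL UNIQUE)"
        )

    def add(self, category: Category) -> Category:
        cursor = self._conn.execute(
            "INSERT INTO category (name) VALUES (?)", (category.name,)
        )
        self._conn.commit()
        # Return a new object carrying the database-assigned id.
        return Category(id=cursor.lastrowid, name=category.name)

    def get(self, category_id: int) -> Optional[Category]:
        row = self._conn.execute(
            "SELECT id, name FROM category WHERE id = ?", (category_id,)
        ).fetchone()
        return Category(*row) if row else None

    def list_all(self) -> List[Category]:
        rows = self._conn.execute(
            "SELECT id, name FROM category ORDER BY id"
        ).fetchall()
        return [Category(*row) for row in rows]


repository = CategoryRepository(sqlite3.connect(":memory:"))
saved = repository.add(Category(id=None, name="obligation"))
```

Services depend only on the repository's methods, so the storage backend can be swapped without changing business logic.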
Task Queue & Workers
Responsibility: Asynchronous job processing, background tasks
Components:
tasks/celery_worker.py - Celery application configuration
tasks/tasks.py - Task definitions
tasks/handler.py - Task execution handlers
tasks/factory.py - Task factory pattern
tasks/types.py - Task status and record types
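How status types, task records, and a factory could fit together is sketched below. This is plain Python, not the Celery API: in the real system a Celery worker would pick the task up from the Redis broker. `TaskStatus`, `TaskRecord`, and `TaskFactory`, as well as the status values, are assumed names for illustration.

```python
import uuid
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Callable, Dict


class TaskStatus(str, Enum):
    PENDING = "pending"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


@dataclass
class TaskRecord:
    task_type: str
    payload: Dict[str, Any]
    task_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    status: TaskStatus = TaskStatus.PENDING
    result: Any = None
    error: str = ""


class TaskFactory:
    """Creates task records and runs them with the handler for their type."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[[Dict[str, Any]], Any]] = {}

    def register(self, task_type: str, handler: Callable[[Dict[str, Any]], Any]) -> None:
        self._handlers[task_type] = handler

    def create(self, task_type: str, payload: Dict[str, Any]) -> TaskRecord:
        if task_type not in self._handlers:
            raise ValueError(f"unknown task type: {task_type}")
        return TaskRecord(task_type=task_type, payload=payload)

    def run(self, record: TaskRecord) -> TaskRecord:
        record.status = TaskStatus.RUNNING
        try:
            record.result = self._handlers[record.task_type](record.payload)
            record.status = TaskStatus.SUCCEEDED
        except Exception as exc:  # record the failure instead of crashing the worker
            record.error = str(exc)
            record.status = TaskStatus.FAILED
        return record


factory = TaskFactory()
factory.register("word_count", lambda payload: len(payload["text"].split()))
finished = factory.run(factory.create("word_count", {"text": "shall be notified"}))
```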
Authentication & Security
Responsibility: User authentication, authorization, security
Components:
auth/security.py - JWT token generation/validation
auth/backends/ - Authentication backends (JWT, OIDC)
auth/providers/ - Identity providers (standard, EULogin)
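JWT generation and validation can be sketched with the standard library alone. This document does not state which algorithm or library auth/security.py uses, so the example assumes HS256 with a shared secret; `create_token` and `validate_token` are hypothetical names, and production code would normally rely on a maintained JWT library rather than hand-rolled signing.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Any, Dict


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    # Restore the padding that _b64url stripped.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: str) -> bytes:
    return hmac.new(
        secret.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256
    ).digest()


def create_token(claims: Dict[str, Any], secret: str, ttl_seconds: int = 900) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    body = {**claims, "exp": int(time.time()) + ttl_seconds}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, body)
    )
    return f"{signing_input}.{_b64url(_sign(signing_input, secret))}"


def validate_token(token: str, secret: str) -> Dict[str, Any]:
    """Return the claims, or raise ValueError if the token is not acceptable."""
    try:
        header_b64, body_b64, signature_b64 = token.split(".")
    except ValueError:
        raise ValueError("malformed token") from None
    expected = _sign(f"{header_b64}.{body_b64}", secret)
    # Constant-time comparison avoids leaking signature bytes via timing.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("invalid signature")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unsupported algorithm")
    claims = json.loads(_b64url_decode(body_b64))
    if claims.get("exp", 0) < time.time():
        raise ValueError("token expired")
    return claims


token = create_token({"sub": "alice"}, secret="change-me")
claims = validate_token(token, secret="change-me")
```

The signature is checked before any claim is trusted, and the algorithm is pinned so a token cannot choose its own verification method.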
Utilities Layer
Responsibility: Cross-cutting concerns, helper functions
Components:
utils/sparql_utils.py - SPARQL query execution
utils/refresh_token_utils.py - Token utilities
utils/serialization.py - Serialization helpers
utils/utils.py - General utilities
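As an illustration of what a SPARQL helper might do, the sketch below builds a query request in the style of the SPARQL 1.1 Protocol and flattens a response in the standard SPARQL JSON results format. It never touches the network. The function names, the `https://example.org/sparql` endpoint, and the sample data are placeholders, not the contents of utils/sparql_utils.py.

```python
import json
from typing import Dict, List
from urllib.parse import urlencode
from urllib.request import Request


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build (but do not send) a GET request carrying the query string."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def flatten_bindings(raw_json: str) -> List[Dict[str, str]]:
    """Turn SPARQL JSON results into one plain dict per result row."""
    document = json.loads(raw_json)
    variables = document["head"]["vars"]
    return [
        # Unbound variables are simply absent from a binding, so skip them.
        {name: binding[name]["value"] for name in variables if name in binding}
        for binding in document["results"]["bindings"]
    ]


request = build_sparql_request(
    "https://example.org/sparql",  # placeholder endpoint
    "SELECT ?s WHERE { ?s ?p ?o } LIMIT 1",
)

sample_response = json.dumps(
    {
        "head": {"vars": ["s", "title"]},
        "results": {
            "bindings": [
                {"s": {"type": "uri", "value": "http://example.org/doc/1"}}
            ]
        },
    }
)
rows = flatten_bindings(sample_response)
```

Sending the request would be a matter of passing it to `urllib.request.urlopen` and feeding the decoded body to `flatten_bindings`.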
Technology Integration Points
External APIs
OpenAI-compatible API: LLM-based text analysis and annotation
SPARQL Endpoints: Knowledge graph queries (e.g., Cellar)
Databases
PostgreSQL: Primary data store
Message Queues
Celery + Redis: Asynchronous task processing
Configuration Management
Configuration is managed through:
Environment Variables (.env file)
Config JSON (config.json for paths)
Database Configuration (Alembic for migrations)
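A minimal sketch of combining the first two sources, assuming that real process variables take precedence over values in the .env file. The setting names `DATABASE_URL` and `BROKER_URL`, the default broker URL, and the `Settings` class are hypothetical; the functions take text rather than file paths so they can be exercised without touching the disk.

```python
import json
from dataclasses import dataclass
from typing import Dict, Mapping


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


@dataclass(frozen=True)
class Settings:
    database_url: str  # hypothetical setting name
    broker_url: str  # hypothetical setting name
    paths: Dict[str, str]


def load_settings(
    env_text: str, config_json: str, environ: Mapping[str, str]
) -> Settings:
    # Assumption: process environment overrides the .env file.
    merged = {**parse_env_text(env_text), **environ}
    return Settings(
        database_url=merged["DATABASE_URL"],
        broker_url=merged.get("BROKER_URL", "redis://localhost:6379/0"),
        paths=json.loads(config_json),
    )


settings = load_settings(
    env_text='# local development\nDATABASE_URL="postgresql://db/app"\n',
    config_json='{"logs": "logs/"}',
    environ={"BROKER_URL": "redis://redis:6379/1"},
)
```

In an application this would typically be called with the contents of `.env` and `config.json` and with `os.environ` as the third argument.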
Logging & Monitoring
Logging
Structured logging to logs/ai4drpm.log
Log rotation (via logrotate.conf)
Log levels: DEBUG, INFO, WARNING, ERROR, CRITICAL
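One way such a setup could look with the standard `logging` module is sketched below. The JSON line format, the `JsonFormatter` class, the `configure_logging` function, and the `ai4drpm` logger name are assumptions. `WatchedFileHandler` is used because it reopens the file after an external tool such as logrotate moves it; it is intended for Unix-like systems.

```python
import json
import logging
import logging.handlers
from pathlib import Path


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line (assumed format)."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            entry["exception"] = self.formatException(record.exc_info)
        return json.dumps(entry)


def configure_logging(
    log_file: str = "logs/ai4drpm.log", level: str = "INFO"
) -> logging.Logger:
    """Attach a JSON file handler to the (hypothetically named) app logger."""
    Path(log_file).parent.mkdir(parents=True, exist_ok=True)
    handler = logging.handlers.WatchedFileHandler(log_file, encoding="utf-8")
    handler.setFormatter(JsonFormatter())

    logger = logging.getLogger("ai4drpm")
    logger.setLevel(getattr(logging, level.upper()))
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    logger.addHandler(handler)
    logger.propagate = False
    return logger
```

Calling `configure_logging(level="DEBUG")` once at startup and then `logging.getLogger("ai4drpm").info("started")` would append one JSON line to the log file.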
Deployment Architecture
Docker Compose Deployment
Multi-container setup
Separate containers for: API, Worker, PostgreSQL, Redis
Volume mounts for persistence
Network isolation