System Architecture Overview
Introduction
AI4DRPM follows a layered architecture pattern with clear separation of concerns.
Architectural Layers
graph TB
subgraph "Web Layer"
A[FastAPI REST API]
B[API Routers]
C[Middleware]
end
subgraph "Service Layer"
D[Resource Services]
E[Engine Services]
F[Shared Services]
end
subgraph "Asynchronous Tasks"
M[Celery Workers]
N[Task Queue<br/>Redis]
end
subgraph "Data Layer"
L[Repositories]
K[SQLAlchemy ORM]
O[(PostgreSQL)]
end
subgraph "External Systems"
Q[LLM APIs]
R[EU Cellar SPARQL]
end
C --> A
A --> B
B --> D
B --> E
B --> F
D --> L
F --> L
L --> K
K --> O
B --> M
M --> E
N --> M
E --> Q
D --> R
style O fill:#e1f5ff
style K fill:#fff4e1
Layer Descriptions
1. Web Layer
Responsibility: HTTP request handling, response formatting, authentication
Components:
api.py - FastAPI application initialization
routers/ - Endpoint definitions organized by domain
dependencies.py - Dependency injection (auth, database sessions)
schemas/ - Domain-organized Pydantic request/response models
Middleware - CORS, security headers, logging
2. Service Layer
Responsibility: Business logic, orchestration, data validation
Components:
Resource Services (services/resources/)
document_collection_service.py - Document collection from EU Cellar
document_parsing_service.py - Document parsing with tulit
document_metadata_service.py - SPARQL-based document discovery, metadata enrichment, and CELEX metadata extraction
legal_resource_service.py - Legal resource CRUD
provision_service.py - Legal provision CRUD
classification_service.py - Legal provision classification CRUD
analysis_service.py - Analysis CRUD
category_service.py - Category CRUD
statement_service.py - Statement generation
token_usage_service.py - LLM token usage tracking
Engine Services (services/engine/)
pipeline_service.py - Pipeline lookup, orchestration and execution repository
prompt_service.py - Prompt CRUD
training_service.py - Model training and evaluation orchestration
Haystack Integration (services/haystack/)
components/ - Haystack components used in pipelines (retrievers, classifiers, parsers, custom processors)
component_registry.py - Registry for available Haystack components and factories
config.py - Haystack configuration management
document_contract.py - Utilities for converting between Haystack Document objects and internal data models
pipeline_validator.py - Pipeline definition validation and graph checks
streaming.py - Utilities for streaming Haystack pipeline results
utils.py - Haystack-related utilities
All workflows are implemented as Haystack pipelines composed from the components above and orchestrated by pipeline_service. This keeps Haystack-specific logic under services/haystack/ and leaves orchestration and execution concerns in services/engine/.
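To make the registry and validator roles concrete, here is a minimal, framework-free sketch. It does not use Haystack and is not the project's implementation: `COMPONENT_REGISTRY`, `register_component`, `validate_pipeline`, and the dictionary shape of a pipeline definition are all hypothetical. The cycle check is one plausible "graph check"; the source does not say whether the real validator rejects cycles.

```python
from collections.abc import Callable
from graphlib import CycleError, TopologicalSorter

# Hypothetical registry: maps a component type name to a factory callable.
COMPONENT_REGISTRY: dict[str, Callable[..., object]] = {}


def register_component(name: str):
    """Decorator that records a factory under `name`."""
    def decorator(factory):
        COMPONENT_REGISTRY[name] = factory
        return factory
    return decorator


@register_component("parser")
class Parser:
    """Placeholder component; a real one would wrap Haystack logic."""


@register_component("classifier")
class Classifier:
    """Placeholder component."""


def validate_pipeline(definition: dict) -> list[str]:
    """Return a list of problems; an empty list means the definition passed."""
    errors: list[str] = []
    components = definition.get("components", {})
    for node, type_name in components.items():
        if type_name not in COMPONENT_REGISTRY:
            errors.append(f"unknown component type: {type_name!r} (node {node!r})")

    # Map each node to the set of nodes that feed into it.
    graph: dict[str, set[str]] = {node: set() for node in components}
    for sender, receiver in definition.get("connections", []):
        for endpoint in (sender, receiver):
            if endpoint not in components:
                errors.append(f"connection references undefined node: {endpoint!r}")
        if sender in components and receiver in components:
            graph[receiver].add(sender)

    try:
        # Consuming static_order() raises CycleError if the graph is not a DAG.
        tuple(TopologicalSorter(graph).static_order())
    except CycleError:
        errors.append("pipeline graph contains a cycle")
    return errors
```

Returning a list of problems rather than raising on the first one lets a caller report every issue in a submitted definition at once.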
3. Data Layer
Responsibility: Data access, ORM mapping, database operations
Components:
db/models/ - SQLAlchemy models organized by domain
db/repositories/ - Data access patterns
db/migrations/ - Alembic migration scripts
db/database.py - Database connection and session management
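The split between the service layer and the repositories can be illustrated without a database. The sketch below is an assumption-laden stand-in: `LegalResource`, its fields, `LegalResourceRepository`, and `LegalResourceService` are hypothetical names, and an in-memory dictionary replaces the SQLAlchemy-backed repository the project would use.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass
class LegalResource:
    celex_id: str  # hypothetical field names, chosen for illustration
    title: str


class LegalResourceRepository(Protocol):
    """The only data-access surface the service is allowed to see."""

    def get(self, celex_id: str) -> Optional[LegalResource]: ...

    def add(self, resource: LegalResource) -> None: ...


class InMemoryLegalResourceRepository:
    """Stand-in for a SQLAlchemy-backed repository."""

    def __init__(self) -> None:
        self._rows: dict[str, LegalResource] = {}

    def get(self, celex_id: str) -> Optional[LegalResource]:
        return self._rows.get(celex_id)

    def add(self, resource: LegalResource) -> None:
        self._rows[resource.celex_id] = resource


class LegalResourceService:
    """Business rules live here; persistence details stay in the repository."""

    def __init__(self, repository: LegalResourceRepository) -> None:
        self._repository = repository

    def create(self, celex_id: str, title: str) -> LegalResource:
        if not title.strip():
            raise ValueError("title must not be empty")
        if self._repository.get(celex_id) is not None:
            raise ValueError(f"resource {celex_id} already exists")
        resource = LegalResource(celex_id=celex_id, title=title.strip())
        self._repository.add(resource)
        return resource
```

Because the service depends only on the protocol, a test can pass the in-memory repository while production code passes one bound to a database session.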
4. Task Queue & Workers
Responsibility: Asynchronous job processing, background tasks
Components:
tasks/celery_worker.py - Celery application configuration
tasks/tasks.py - Task definitions
tasks/handler.py - Task execution handlers
tasks/factory.py - Task factory pattern
tasks/utils.py - Task utilities
tasks/types.py - Task status and record types
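How status types, handlers, and a factory might fit together is sketched below in plain Python. This is illustrative only: it does not use Celery, and `TaskStatus`, `TaskRecord`, `HANDLERS`, `create_task`, `run_task`, and the `parse_document` task type are hypothetical. In the real system a Celery worker, not a direct function call, would execute the handler.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Callable, Optional
from uuid import uuid4


class TaskStatus(str, Enum):
    PENDING = "pending"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


@dataclass
class TaskRecord:
    task_type: str
    payload: dict
    task_id: str = field(default_factory=lambda: uuid4().hex)
    status: TaskStatus = TaskStatus.PENDING
    result: Any = None
    error: Optional[str] = None


# Hypothetical handler table: task type name -> callable that does the work.
HANDLERS: dict[str, Callable[[dict], Any]] = {}


def handler(task_type: str):
    """Decorator that registers a handler for a task type."""
    def decorator(func):
        HANDLERS[task_type] = func
        return func
    return decorator


@handler("parse_document")
def parse_document(payload: dict) -> dict:
    return {"document_id": payload["document_id"], "parsed": True}


def create_task(task_type: str, payload: dict) -> TaskRecord:
    """Factory: refuse unknown task types before anything is queued."""
    if task_type not in HANDLERS:
        raise KeyError(f"no handler registered for {task_type!r}")
    return TaskRecord(task_type=task_type, payload=payload)


def run_task(record: TaskRecord) -> TaskRecord:
    """Execute the handler and capture the outcome on the record."""
    record.status = TaskStatus.RUNNING
    try:
        record.result = HANDLERS[record.task_type](record.payload)
        record.status = TaskStatus.SUCCEEDED
    except Exception as exc:  # a failed job is recorded, not re-raised
        record.error = str(exc)
        record.status = TaskStatus.FAILED
    return record
```

Recording failures on the task record instead of re-raising lets the API report job state to a client that polls for it.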
5. Authentication & Security
Responsibility: User authentication, authorization, security
Components:
auth/security.py - JWT token generation/validation
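For readers unfamiliar with what JWT generation and validation involve, the following standard-library sketch signs and verifies an HS256 token. It is a teaching aid under stated assumptions, not the contents of auth/security.py: the algorithm, the claim set, and the function names `create_token` and `verify_token` are hypothetical, and a production service would normally rely on a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url_encode(data: bytes) -> str:
    # JWTs use URL-safe base64 without padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    # Restore the padding that was stripped during encoding.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: str) -> bytes:
    return hmac.new(secret.encode(), signing_input.encode("ascii"), hashlib.sha256).digest()


def create_token(subject: str, secret: str, ttl_seconds: int = 900, now=None) -> str:
    issued_at = int(time.time() if now is None else now)
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {"sub": subject, "iat": issued_at, "exp": issued_at + ttl_seconds}
    signing_input = ".".join(
        _b64url_encode(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, claims)
    )
    return f"{signing_input}.{_b64url_encode(_sign(signing_input, secret))}"


def verify_token(token: str, secret: str, now=None) -> dict:
    """Return the claims, or raise ValueError if the token is not acceptable."""
    try:
        header_b64, claims_b64, signature_b64 = token.split(".")
    except ValueError:
        raise ValueError("malformed token") from None
    expected = _sign(f"{header_b64}.{claims_b64}", secret)
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("invalid signature")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unsupported algorithm")
    claims = json.loads(_b64url_decode(claims_b64))
    current = time.time() if now is None else now
    if claims["exp"] <= current:
        raise ValueError("token expired")
    return claims
```

The signature is checked before any claim is trusted, so an expired but forged token is rejected as forged rather than as expired.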
6. Utilities Layer
Responsibility: Cross-cutting concerns, helper functions
Components:
utils/sparql_utils.py - SPARQL query execution
utils/refresh_token_utils.py - Token utilities
utils/serialization.py - Serialization helpers
utils/utils.py - General utilities
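As a rough idea of what SPARQL query execution entails, the sketch below sends a SELECT query over HTTP GET and flattens the standard SPARQL JSON results format. It uses only the standard library; `build_request`, `parse_bindings`, and `run_select` are hypothetical names, the endpoint URL is left as a caller-supplied argument, and the actual helper in utils/sparql_utils.py may work differently.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_request(endpoint: str, query: str) -> Request:
    """Encode the query as a GET parameter and ask for JSON results."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def parse_bindings(payload: dict) -> list[dict[str, str]]:
    """Flatten SPARQL JSON results into one {variable: value} dict per row."""
    return [
        {name: cell["value"] for name, cell in row.items()}
        for row in payload["results"]["bindings"]
    ]


def run_select(endpoint: str, query: str, timeout: float = 30.0) -> list[dict[str, str]]:
    """Execute the query against the endpoint (performs a network call)."""
    with urlopen(build_request(endpoint, query), timeout=timeout) as response:
        return parse_bindings(json.load(response))
```

Keeping request construction and result parsing separate from the network call makes both halves testable offline.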
Technology Integration Points
External APIs
OpenAI-compatible API: LLM-based text analysis and annotation
SPARQL Endpoints: Knowledge graph queries (e.g., Cellar)
Databases
PostgreSQL: Primary data store
Message Queues
Celery + Redis: Asynchronous task processing
Configuration Management
Configuration is managed through:
Environment Variables (.env file)
Config JSON (config.json for paths)
Database Configuration (Alembic for migrations)
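One way these sources could be combined at startup is sketched below. The variable names `DATABASE_URL` and `REDIS_URL`, the `Settings` class, and `load_settings` are assumptions for illustration, and the sketch expects the values from the .env file to have been exported into the process environment already (for example by the container runtime).

```python
import json
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    database_url: str
    redis_url: str
    paths: dict


def load_settings(
    config_path: str = "config.json",
    env: Optional[Mapping[str, str]] = None,
) -> Settings:
    """Read secrets and URLs from the environment, paths from the JSON file."""
    env = os.environ if env is None else env
    # Hypothetical variable names; fail fast so a misconfigured service never starts.
    missing = [name for name in ("DATABASE_URL", "REDIS_URL") if name not in env]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")

    config_file = Path(config_path)
    # A missing config file is treated as "no path overrides".
    paths = json.loads(config_file.read_text(encoding="utf-8")) if config_file.exists() else {}
    return Settings(
        database_url=env["DATABASE_URL"],
        redis_url=env["REDIS_URL"],
        paths=paths,
    )
```

Accepting `env` as a parameter keeps the loader testable without mutating `os.environ`.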
Logging & Monitoring
Logging
Structured logging to logs/ai4drpm.log
Log rotation (via logrotate.conf)
Log levels: DEBUG, INFO, WARNING, ERROR, CRITICAL
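A logging setup consistent with this description might look like the sketch below. The JSON line format, the `ai4drpm` logger name, and `configure_logging` are assumptions; the source only states the file path, the rotation mechanism, and the levels. `WatchedFileHandler` is chosen because it reopens the file after an external tool such as logrotate moves it (it is intended for Unix-like systems).

```python
import json
import logging
from logging.handlers import WatchedFileHandler
from pathlib import Path


class JsonFormatter(logging.Formatter):
    """Emit one JSON object per line (an assumed 'structured' format)."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps(
            {
                "time": self.formatTime(record),
                "level": record.levelname,
                "logger": record.name,
                "message": record.getMessage(),
            }
        )


def configure_logging(log_file: str = "logs/ai4drpm.log", level: str = "INFO") -> logging.Logger:
    path = Path(log_file)
    path.parent.mkdir(parents=True, exist_ok=True)

    # Reopens the file if logrotate renames or removes it underneath us.
    handler = WatchedFileHandler(path, encoding="utf-8")
    handler.setFormatter(JsonFormatter())

    logger = logging.getLogger("ai4drpm")
    logger.setLevel(level)
    logger.handlers.clear()  # avoid duplicate lines if called twice
    logger.addHandler(handler)
    return logger
```

Calling `configure_logging(level="DEBUG")` once at application startup would then route every record at DEBUG and above to the log file.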
Deployment Architecture
Docker Compose Deployment
Multi-container setup
Separate containers for: API, Worker, PostgreSQL, Redis
Volume mounts for persistence
Network isolation