Overview
========

The Data Layer implements the persistence and data access patterns for the
AI4DRPM system using the SQLAlchemy ORM.

Architecture
------------

Layer Organization
~~~~~~~~~~~~~~~~~~

The data layer follows clean architecture principles:

.. code-block:: text

   ┌─────────────────────────────────────┐
   │   Service Layer (Business Logic)    │
   └─────────────────────────────────────┘
                     │
                     ▼
   ┌─────────────────────────────────────┐
   │   Repository Layer (Data Access)    │
   │   - BaseRepository                  │
   │   - Domain Repositories             │
   └─────────────────────────────────────┘
                     │
                     ▼
   ┌─────────────────────────────────────┐
   │   Model Layer (ORM Entities)        │
   │   - SQLAlchemy Models               │
   │   - Relationships                   │
   └─────────────────────────────────────┘
                     │
                     ▼
   ┌─────────────────────────────────────┐
   │        PostgreSQL Database          │
   └─────────────────────────────────────┘

Domain Organization
~~~~~~~~~~~~~~~~~~~

Models are organized by business domain:

* **auth** - User authentication and authorization
* **resources** - Legal documents and provisions
* **analyses** - Various analysis types (description, data, solutions, interoperability)
* **categories** - Hierarchical category taxonomy
* **classifiers** - ML model metadata and training data
* **prompts** - LLM prompt templates
* **queries** - SPARQL query templates
* **tasks** - Celery task tracking
* **token_usage** - LLM token consumption tracking

Entity Relationship Diagram
~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. mermaid::

   erDiagram
       %% Authentication Domain
       User ||--o{ RefreshToken : "has"
       User ||--o{ Prompt : "creates"
       User ||--o{ Pipeline : "creates"
       User ||--o{ SparqlQuery : "creates"
       User ||--o{ TokenUsage : "incurs"

       %% Pipeline Domain
       Pipeline ||--o{ PipelineExecution : "has"
       Pipeline ||--o{ TokenUsage : "tracks"
       PipelineExecution ||--o{ TokenUsage : "generates"

       %% Category Domain (self-referencing and relationships)
       Category ||--o{ Category : "parent/subcategories"
       Category ||--o{ ClassifierModel : "has"
       Category ||--o{ ClassifierTrainingResult : "has"

       %% Many-to-many: Category with classifications
       Category }o--o{ LegalProvisionClassification : "via legal_provision_classification_categories"

       %% Legal Resources Domain
       LegalResource ||--o{ LegalProvision : "contains"
       LegalProvision ||--o{ LegalProvisionClassification : "classified_by"

       %% Many-to-many: LegalProvisionClassification with all 5 analysis types
       LegalProvisionClassification }o--o{ DescriptionAnalysis : "via description_analysis_provisions"
       LegalProvisionClassification }o--o{ DataAnalysis : "via data_analysis_provisions"
       LegalProvisionClassification }o--o{ SolutionsAnalysis : "via solutions_analysis_provisions"
       LegalProvisionClassification }o--o{ InteroperabilityAnalysis : "via interoperability_analysis_provisions"
       LegalProvisionClassification }o--o{ DataFlowsAnalysis : "via data_flows_analysis_provisions"

       %% Classifier/ML Model Domain
       ClassifierModel ||--o{ ClassifierTrainingResult : "has training results"

       %% Key Entity Definitions
       User {
           int id PK
           string username UK
           string email UK
           boolean is_admin
       }
       RefreshToken {
           int id PK
           string token UK
           int user_id FK
           datetime expires_at
           boolean is_revoked
       }
       Category {
           int id PK
           string name UK
           string category_type
           int parent_id FK
           boolean is_active
       }
       LegalResource {
           string celex PK
           string eli
           string preface
           int year
           string act_type
       }
       LegalProvision {
           int id PK
           string eId
           string article_heading
           string text
           string legal_resource_celex FK
       }
       LegalProvisionClassification {
           int id PK
           string pipeline
           string legal_provision_eId FK
           string legal_resource_celex FK
           string explanation
           boolean validated
       }
       DescriptionAnalysis {
           int id PK
           json eId
           text description
           json high_level_process
           json actors
           boolean validated
       }
       DataAnalysis {
           int id PK
           json eId
           string explanation
           string type_of_data
           string description
           json standard
           boolean validated
       }
       SolutionsAnalysis {
           int id PK
           json eId
           string explanation
           string digital_solution
           json functionalities
           json responsible_actor
           boolean validated
       }
       InteroperabilityAnalysis {
           int id PK
           json eId
           string digital_public_service
           string description
           boolean cross_border_interaction
           json legal_interoperability_measures
           json organisational_interoperability_measures
           json semantic_interoperability_measures
           json technical_interoperability_measures
           boolean validated
       }
       DataFlowsAnalysis {
           int id PK
           json eId
           string addresser
           string action
           string action_result
           string addressee
           boolean validated
       }
       ClassifierModel {
           int id PK
           string name UK
           int category_id FK
           boolean is_active
       }
       ClassifierTrainingResult {
           int id PK
           string classifier_name
           int category_id FK
           float f1_score
           datetime training_date
       }
       Prompt {
           int id PK
           string name UK
           text content
           text system_prompt
           string version
           int created_by FK
       }
       SparqlQuery {
           int id PK
           string name UK
           text query_template
           string endpoint_url
           string category
           json parameters_schema
           json result_mapping
           int timeout_seconds
           boolean is_active
           int created_by FK
       }
       Pipeline {
           int id PK
           string name UK
           text description
           json definition
           int version
           boolean is_active
           int created_by FK
       }
       PipelineExecution {
           uuid id PK
           int pipeline_id FK
           int pipeline_version
           string status
           json input_data
           json output_data
           text error_message
           string webhook_url
           int document_count
           datetime started_at
           datetime completed_at
           int created_by FK
       }
       TokenUsage {
           int id PK
           uuid execution_id FK
           int user_id FK
           int pipeline_id FK
           int prompt_tokens
           int completion_tokens
           int total_tokens
           string model
           string provider
           float estimated_cost_usd
           string component_name
       }
       Task {
           string id PK
           string name
           string status
           json args
           json kwargs
           json result
           string error_message
           int progress
       }

**Database Statistics:**

* **Core Tables**: 19
* **Association Tables**: 6
* **Total Relations**: 25

For detailed field descriptions, see the individual model documentation.

Migrations
~~~~~~~~~~

Schema migrations are managed with `Alembic <https://alembic.sqlalchemy.org/>`_.
Migration files are in ``ai4drpm/db/migrations/versions/``.

.. code-block:: bash

   alembic upgrade head                        # Apply migrations
   alembic revision --autogenerate -m "desc"   # Create migration

Session Management
~~~~~~~~~~~~~~~~~~

Sessions are managed through FastAPI dependency injection:

.. code-block:: python

   from fastapi import Depends
   from sqlalchemy.orm import Session

   from ai4drpm.db.database import get_db

   @app.get("/resources")
   def get_resources(db: Session = Depends(get_db)):
       # The session is created and closed automatically per request
       repo = LegalResourceRepository(db)
       return repo.get_multi()
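A dependency such as ``get_db`` is conventionally written as a generator that yields a session and guarantees it is closed once the request finishes. The sketch below illustrates that pattern only; the engine URL and the ``SessionLocal`` name are placeholders, not taken from ``ai4drpm.db.database``:

.. code-block:: python

   from collections.abc import Iterator

   from sqlalchemy import create_engine
   from sqlalchemy.orm import Session, sessionmaker

   # Hypothetical engine/session factory; the real configuration
   # (PostgreSQL URL, pooling, etc.) lives in ai4drpm.db.database.
   engine = create_engine("sqlite://")
   SessionLocal = sessionmaker(bind=engine)


   def get_db() -> Iterator[Session]:
       """Yield a session; close it when the caller is done."""
       db = SessionLocal()
       try:
           yield db
       finally:
           db.close()


   # Exercise the generator the way FastAPI's dependency system would.
   gen = get_db()
   db = next(gen)
   gen.close()  # runs the finally block, closing the session

FastAPI drives this generator for you: it calls ``next()`` to obtain the session before the endpoint runs, and resumes the generator afterwards so the ``finally`` block releases the connection.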
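The repository layer named in the architecture diagram can be illustrated with a generic base class. This is a hypothetical sketch of what ``BaseRepository`` and its ``get_multi`` method might look like, using SQLAlchemy 2.0-style declarative mapping; the in-memory SQLite engine and the trimmed-down ``LegalResource`` model (only ``celex`` and ``year``) are assumptions made for the demo:

.. code-block:: python

   from typing import Generic, List, Optional, Type, TypeVar

   from sqlalchemy import Integer, String, create_engine
   from sqlalchemy.orm import DeclarativeBase, Mapped, Session, mapped_column

   ModelT = TypeVar("ModelT")


   class BaseRepository(Generic[ModelT]):
       """Generic data-access wrapper around a SQLAlchemy session."""

       def __init__(self, model: Type[ModelT], db: Session) -> None:
           self.model = model
           self.db = db

       def get(self, id_: object) -> Optional[ModelT]:
           # Primary-key lookup via the session's identity map
           return self.db.get(self.model, id_)

       def get_multi(self, skip: int = 0, limit: int = 100) -> List[ModelT]:
           return self.db.query(self.model).offset(skip).limit(limit).all()

       def create(self, obj: ModelT) -> ModelT:
           self.db.add(obj)
           self.db.commit()
           self.db.refresh(obj)
           return obj


   class Base(DeclarativeBase):
       pass


   # Hypothetical, heavily simplified stand-in for the real LegalResource model.
   class LegalResource(Base):
       __tablename__ = "legal_resources"

       celex: Mapped[str] = mapped_column(String, primary_key=True)
       year: Mapped[int] = mapped_column(Integer)


   engine = create_engine("sqlite://")  # in-memory database for the demo
   Base.metadata.create_all(engine)

   with Session(engine) as session:
       repo = BaseRepository(LegalResource, session)
       repo.create(LegalResource(celex="32016R0679", year=2016))
       celex_values = [r.celex for r in repo.get_multi()]
       fetched_year = repo.get("32016R0679").year

Domain repositories would then subclass ``BaseRepository`` and add model-specific queries on top of these generic operations.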
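The self-referencing ``Category`` relation in the ER diagram ("parent/subcategories") maps onto SQLAlchemy's adjacency-list pattern. A minimal sketch under stated assumptions: 2.0-style declarative mapping against in-memory SQLite, modeling only the ``name`` and ``parent_id`` columns from the diagram:

.. code-block:: python

   from typing import List, Optional

   from sqlalchemy import ForeignKey, String, create_engine
   from sqlalchemy.orm import (
       DeclarativeBase,
       Mapped,
       Session,
       mapped_column,
       relationship,
   )


   class Base(DeclarativeBase):
       pass


   class Category(Base):
       """Self-referencing hierarchy: each category may have a parent."""

       __tablename__ = "categories"

       id: Mapped[int] = mapped_column(primary_key=True)
       name: Mapped[str] = mapped_column(String, unique=True)
       parent_id: Mapped[Optional[int]] = mapped_column(ForeignKey("categories.id"))

       # remote_side tells SQLAlchemy which side of the self-join is "one"
       parent: Mapped[Optional["Category"]] = relationship(
           back_populates="subcategories", remote_side=[id]
       )
       subcategories: Mapped[List["Category"]] = relationship(back_populates="parent")


   engine = create_engine("sqlite://")  # in-memory database for the demo
   Base.metadata.create_all(engine)

   with Session(engine) as session:
       root = Category(name="Interoperability")
       child = Category(name="Semantic", parent=root)
       session.add_all([root, child])
       session.commit()

       fetched = session.get(Category, child.id)
       parent_name = fetched.parent.name
       child_names = [c.name for c in fetched.parent.subcategories]

The same adjacency-list shape supports arbitrarily deep taxonomies; walking ``subcategories`` recursively yields the whole subtree.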