Technology Stack

Core Technologies

Backend Framework

FastAPI (v0.115+)

  • Purpose: Web framework for building APIs

  • Why: High performance, automatic API documentation, validation driven by type hints, native async support

  • Features Used:

    • Automatic OpenAPI schema generation

    • Pydantic data validation

    • Dependency injection

    • Background tasks

    • Security utilities (OAuth2 flows with JWT bearer tokens)

Programming Language

Python 3.12+

  • Why: Rich ecosystem, excellent AI/ML libraries, readability

  • Key Language Features:

    • Type hints for better code quality

    • Async/await for concurrency

    • Dataclasses for structured data
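
As a rough illustration of how these three features combine, the sketch below uses only the standard library. The `Provision` record and the `annotate` coroutine are hypothetical, and the keyword check merely stands in for a real I/O-bound annotation call.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Provision:
    """Hypothetical structured record for one provision of a legal text."""
    identifier: str
    text: str
    labels: list[str] = field(default_factory=list)


async def annotate(provision: Provision) -> Provision:
    """Stand-in for an I/O-bound call such as a remote annotation service."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    label = "obligation" if "shall" in provision.text else "other"
    return Provision(provision.identifier, provision.text, [label])


async def annotate_all(provisions: list[Provision]) -> list[Provision]:
    # gather() runs the coroutines concurrently and preserves input order
    return await asyncio.gather(*(annotate(p) for p in provisions))


results = asyncio.run(annotate_all([
    Provision("art-1", "Member States shall notify the Commission."),
    Provision("art-2", "This Regulation enters into force in 2025."),
]))
```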

Database & Persistence

Relational Database

PostgreSQL 12+

  • Purpose: Primary data store

  • Why: ACID compliance, advanced features, JSON support, full-text search

  • Features Used:

    • Complex queries and joins

    • JSON/JSONB columns

    • Foreign key constraints

    • Database triggers and events

ORM

SQLAlchemy 2.0+

  • Purpose: Object-Relational Mapping

  • Why: Powerful query API, relationship handling, migration support through Alembic

  • Features Used:

    • Declarative models

    • Relationship configurations

    • Session management

    • Query optimization

Database Migrations

Alembic 1.16+

  • Purpose: Database schema versioning

  • Why: Track changes, rollback capability, team collaboration

  • Usage:

    • Auto-generate migrations from model changes

    • Version control for database schema

    • Upgrade/downgrade paths

Caching & Message Broker

Redis (Python client redis 6.4+)

  • Purpose: Caching, session storage, message broker

  • Why: In-memory speed, pub/sub messaging, data structures

  • Use Cases:

    • Celery task queue broker

    • Celery result backend

    • Session caching

    • Application-level caching

Asynchronous Task Processing

Celery 5.5+

  • Purpose: Distributed task queue

  • Why: Async job processing, scheduling, retry mechanisms

  • Features Used:

    • Task queuing and execution

    • Task chaining and grouping

    • Periodic tasks

    • Result tracking

    • Retry logic

AI & Machine Learning

Large Language Models

OpenAI-compatible API (openai Python SDK v1.75+)

  • Purpose: LLM-based text analysis and annotation

  • Why: Uniform interface for multiple LLM providers

  • Use Cases:

    • Legal text annotation

    • Provision classification

    • Entity extraction

    • Text generation

Machine Learning

scikit-learn 1.6+

  • Purpose: Traditional ML classification

  • Why: Proven algorithms, easy training, lightweight

  • Features Used:

    • Text vectorization (TF-IDF)

    • Classification algorithms (SVM, Random Forest, Logistic Regression)

    • Model persistence (joblib)

    • Cross-validation

joblib 1.5+

  • Purpose: Model serialization

  • Why: Efficient storage of trained models

  • Usage: Saving/loading trained classifiers
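
The training and persistence workflow described above might look like the sketch below. The six training sentences and the two labels are made up for illustration, and Logistic Regression is picked arbitrarily from the listed algorithms.

```python
import tempfile
from pathlib import Path

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy, made-up training data; a real training set would be far larger
texts = [
    "Member States shall ensure compliance",
    "The operator shall submit a report",
    "Authorities shall notify the Commission",
    "For the purposes of this Regulation, 'operator' means",
    "'Provider' means a natural or legal person",
    "'Placing on the market' means the first making available",
]
labels = ["obligation"] * 3 + ["definition"] * 3

# TF-IDF vectorization and the classifier are bundled into one estimator,
# so a single file holds everything needed for prediction
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "classifier.joblib"
    joblib.dump(model, path)       # persist the fitted pipeline
    restored = joblib.load(path)   # reload it, e.g. in a worker process

sample = ["The provider shall keep records"]
original_prediction = model.predict(sample)
restored_prediction = restored.predict(sample)
```

Persisting the whole pipeline rather than the classifier alone keeps the fitted vocabulary and the model weights in sync.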

Natural Language Processing

spaCy 3.8+

  • Purpose: NLP preprocessing

  • Why: Fast, production-ready, pre-trained models

  • Models Used:

    • en_core_web_sm - English language model

  • Use Cases:

    • Text tokenization

    • Named entity recognition

    • Part-of-speech tagging

    • Dependency parsing

Semantic Web & RDF

rdflib

  • Purpose: RDF graph manipulation

  • Why: Python-native, comprehensive RDF support

  • Features Used:

    • Graph creation and manipulation

    • Turtle/JSON-LD serialization

    • SPARQL query execution

    • Namespace management

SPARQL

  • Purpose: RDF query language

  • Why: Standard for querying knowledge graphs

  • Usage: Querying EUR-Lex and other SPARQL endpoints

SaxonC (saxonche) 12.6+

  • Purpose: XSLT transformations

  • Why: Industry-standard XML processing

  • Usage: Transforming legal documents from Formex (FMX) to the LEOS profile of Akoma Ntoso (AKN-LEOS)

Authentication & Security

JWT (PyJWT) 2.10+

  • Purpose: JSON Web Token handling

  • Why: Stateless authentication, standard-compliant

  • Features Used:

    • Token generation

    • Token validation

    • Expiration handling

    • Custom claims
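
The four features above can be sketched with PyJWT as follows. The secret, the `role` claim, and the helper names `create_token` and `verify_token` are assumptions for illustration; a real deployment would load the key from configuration.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

import jwt  # PyJWT

# Placeholder secret for the sketch only; never hard-code a real key
SECRET = "change-me-to-a-long-random-secret-value"
ALGORITHM = "HS256"


def create_token(subject: str, role: str, ttl: timedelta) -> str:
    """Generate a signed token with an expiry and a custom 'role' claim."""
    now = datetime.now(timezone.utc)
    payload = {"sub": subject, "role": role, "iat": now, "exp": now + ttl}
    return jwt.encode(payload, SECRET, algorithm=ALGORITHM)


def verify_token(token: str) -> Optional[dict]:
    """Return the claims if the token is valid, otherwise None."""
    try:
        # decode() checks the signature and the 'exp' claim
        return jwt.decode(token, SECRET, algorithms=[ALGORITHM])
    except jwt.InvalidTokenError:
        # Covers expired tokens (ExpiredSignatureError) and bad signatures
        return None
```

Passing an explicit `algorithms` list to `decode` pins the accepted signing algorithm, so a token cannot select a weaker one itself.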

bcrypt 4.1+

  • Purpose: Password hashing

  • Why: Industry-standard; per-password salts defeat rainbow tables, and an adjustable cost factor slows brute-force attacks

  • Usage: Secure password storage

HTTP & Networking

httpx

  • Purpose: HTTP client

  • Why: Modern, async-capable, HTTP/2 support

  • Usage: External API calls

python-dotenv 1.1+

  • Purpose: Environment variable management

  • Why: Easy configuration management

  • Usage: Loading .env files

Document Processing

python-docx 1.1+

  • Purpose: Microsoft Word document processing

  • Why: Read/write DOCX files programmatically

  • Usage: Extracting text from legal documents

Development Tools

Testing

pytest 8.3+

  • Purpose: Testing framework

  • Why: Simple syntax, powerful fixtures, extensive plugins

  • Plugins Used:

    • pytest-asyncio - Async test support

    • pytest-cov - Coverage reporting

    • requests-mock - HTTP mocking

coverage 7.9+

  • Purpose: Code coverage analysis

  • Why: Identify untested code

  • Usage: HTML and terminal coverage reports

Documentation

Sphinx 8.2+

  • Purpose: Documentation generation

  • Why: De facto standard for Python projects, extensible, multiple output formats

  • Extensions:

    • sphinx.ext.autodoc - Auto-generate from docstrings

    • sphinx.ext.napoleon - Google/NumPy docstring support

    • sphinx.ext.viewcode - Link to source code

    • sphinx.ext.intersphinx - Cross-project linking

MyST Parser 4.0+

  • Purpose: Markdown support in Sphinx

  • Why: Write docs in Markdown

  • Features: CommonMark + extensions

Sphinx RTD Theme 3.0+

  • Purpose: Documentation theme

  • Why: Clean, responsive, professional

Dependency Management

Poetry

  • Purpose: Dependency and environment management

  • Why: Deterministic builds, lock files, virtual env management

  • Usage:

    • pyproject.toml - Dependency specification

    • poetry.lock - Lock file for reproducibility

Data Validation

Pydantic

  • Purpose: Data validation using Python type hints

  • Why: Runtime validation, JSON schema generation, editor support

  • Usage:

    • API request/response models

    • Configuration validation

    • Data serialization
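
A brief sketch of validation and serialization, assuming Pydantic v2 (`model_validate`, `model_dump`); this document does not pin a Pydantic version. The `AnnotationRequest` model and its fields are hypothetical.

```python
from pydantic import BaseModel, Field, ValidationError


class AnnotationRequest(BaseModel):
    """Hypothetical request body; field names are illustrative."""
    document_id: int
    text: str = Field(min_length=1)
    labels: list[str] = []


# Valid input: the numeric string is coerced to int in the default lax mode
valid = AnnotationRequest.model_validate({"document_id": "42", "text": "Article 1"})

# Invalid input: both fields fail, and every failure is reported at once
try:
    AnnotationRequest.model_validate({"document_id": "not-a-number", "text": ""})
    errors = []
except ValidationError as exc:
    errors = exc.errors()

# Serialization back to plain Python types
serialized = valid.model_dump()
```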

jsonschema 4.25+

  • Purpose: JSON schema validation

  • Why: Standard-compliant validation

  • Usage: Validating complex JSON structures

Containerization

Docker

  • Purpose: Application containerization

  • Why: Consistent environments, easy deployment

  • Usage:

    • Dockerfile for backend image

    • Multi-stage builds for optimization

Docker Compose

  • Purpose: Multi-container orchestration

  • Why: Local development, integration testing

  • Services:

    • Backend API

    • Celery Worker

    • PostgreSQL

    • Redis

Web Server (Production)

Uvicorn

  • Purpose: ASGI server

  • Why: High performance, WebSocket support

  • Usage: Running FastAPI application

Gunicorn (Optional)

  • Purpose: Process manager

  • Why: Multi-worker management, automatic restarts

  • Usage: Production deployment with Uvicorn workers

Monitoring & Logging

Python logging

  • Purpose: Application logging

  • Why: Built into the standard library, configurable, pluggable handlers

  • Configuration: Custom formatting, log file rotation
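
One possible configuration is sketched below with the standard library only. The format string, the 1 MB size limit, the backup count, and the logger name `demo.app` are assumptions, not the project's actual settings.

```python
import logging
import tempfile
from logging.handlers import RotatingFileHandler
from pathlib import Path


def build_logger(name: str, log_file: Path) -> logging.Logger:
    """Create a logger that writes formatted records to a rotating file."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # Roll over at about 1 MB and keep three old files (app.log.1 to app.log.3)
    handler = RotatingFileHandler(
        log_file, maxBytes=1_000_000, backupCount=3, encoding="utf-8"
    )
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s [%(name)s] %(message)s")
    )
    logger.addHandler(handler)
    return logger


# Usage: a temporary directory keeps the sketch free of side effects
log_dir = Path(tempfile.mkdtemp())
logger = build_logger("demo.app", log_dir / "app.log")
logger.info("service started")
```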