Technology Stack
Core Technologies
Backend Framework
FastAPI (v0.115+)
Purpose: Web framework for building APIs
Why: High performance, automatic API documentation, type hints, async support
Features Used:
Automatic OpenAPI schema generation
Pydantic data validation
Dependency injection
Background tasks
Security utilities (OAuth2, JWT)
Programming Language
Python 3.12+
Why: Rich ecosystem, excellent AI/ML libraries, readability
Key Language Features:
Type hints for better code quality
Async/await for concurrency
Dataclasses for structured data
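As a small illustration of how these three language features combine, the sketch below uses only the standard library. The `Provision` dataclass and the `word_count` coroutine are hypothetical and stand in for whatever structured data and I/O-bound work the application really performs.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class Provision:
    # Hypothetical structure used only for this example.
    identifier: str
    text: str


async def word_count(provision: Provision) -> tuple[str, int]:
    # asyncio.sleep(0) stands in for real I/O such as a database or HTTP call.
    await asyncio.sleep(0)
    return provision.identifier, len(provision.text.split())


async def count_all(provisions: list[Provision]) -> dict[str, int]:
    # gather() schedules all coroutines concurrently and preserves input order.
    results = await asyncio.gather(*(word_count(p) for p in provisions))
    return dict(results)


counts = asyncio.run(
    count_all(
        [
            Provision("art-1", "Member States shall comply"),
            Provision("art-2", "This Regulation enters into force"),
        ]
    )
)
```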
Database & Persistence
Relational Database
PostgreSQL 12+
Purpose: Primary data store
Why: ACID compliance, advanced features, JSON support, full-text search
Features Used:
Complex queries and joins
JSON/JSONB columns
Foreign key constraints
Database triggers and events
ORM
SQLAlchemy 2.0+
Purpose: Object-Relational Mapping
Why: Powerful query API, relationship handling, migration support through Alembic
Features Used:
Declarative models
Relationship configurations
Session management
Query optimization
Database Migrations
Alembic 1.16+
Purpose: Database schema versioning
Why: Track changes, rollback capability, team collaboration
Usage:
Auto-generate migrations from model changes
Version control for database schema
Upgrade/downgrade paths
Caching & Message Broker
Redis 6.4+
Purpose: Caching, session storage, message broker
Why: In-memory speed, pub/sub messaging, data structures
Use Cases:
Celery task queue broker
Celery result backend
Session caching
Application-level caching
Asynchronous Task Processing
Celery 5.5+
Purpose: Distributed task queue
Why: Async job processing, scheduling, retry mechanisms
Features Used:
Task queuing and execution
Task chaining and grouping
Periodic tasks
Result tracking
Retry logic
AI & Machine Learning
Large Language Models
OpenAI-compatible API (openai Python client v1.75+)
Purpose: LLM-based text analysis and annotation
Why: Uniform interface for multiple LLM providers
Use Cases:
Legal text annotation
Provision classification
Entity extraction
Text generation
Machine Learning
scikit-learn 1.6+
Purpose: Traditional ML classification
Why: Proven algorithms, easy training, lightweight
Features Used:
Text vectorization (TF-IDF)
Classification algorithms (SVM, Random Forest, Logistic Regression)
Model persistence (joblib)
Cross-validation
joblib 1.5+
Purpose: Model serialization
Why: Efficient storage of trained models
Usage: Saving/loading trained classifiers
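The pieces above (TF-IDF vectorization, a linear classifier, cross-validation, and joblib persistence) fit together roughly as follows. The training sentences and the two labels are invented for illustration and are far too few for a meaningful model; the choice of `LogisticRegression` is likewise just one of the listed algorithms.

```python
import tempfile
from pathlib import Path

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Hypothetical toy data: four examples per class.
TEXTS = [
    "Member States shall ensure compliance.",
    "Operators shall keep records for five years.",
    "The authority shall publish the report.",
    "Providers shall notify the competent authority.",
    "'Operator' means any natural or legal person.",
    "'Placing on the market' means the first supply.",
    "'Authority' means the body designated by a Member State.",
    "'Provider' means a person offering a service.",
]
LABELS = ["obligation"] * 4 + ["definition"] * 4


def build_model():
    # Vectorizer and classifier in one pipeline, so both are persisted together.
    return make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))


# Stratified 2-fold cross-validation on the toy data.
scores = cross_val_score(build_model(), TEXTS, LABELS, cv=2)

model = build_model().fit(TEXTS, LABELS)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "classifier.joblib"
    joblib.dump(model, path)       # save the trained pipeline
    restored = joblib.load(path)   # load it back, e.g. in a worker process

prediction = restored.predict(["Operators shall notify the authority."])[0]
```

Only load joblib files from trusted sources, since loading executes pickled code.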
Natural Language Processing
spaCy 3.8+
Purpose: NLP preprocessing
Why: Fast, production-ready, pre-trained models
Models Used:
en_core_web_sm - English language model
Use Cases:
Text tokenization
Named entity recognition
Part-of-speech tagging
Dependency parsing
Semantic Web & RDF
rdflib
Purpose: RDF graph manipulation
Why: Python-native, comprehensive RDF support
Features Used:
Graph creation and manipulation
Turtle/JSON-LD serialization
SPARQL query execution
Namespace management
SPARQL
Purpose: RDF query language
Why: Standard for querying knowledge graphs
Usage: Querying EUR-Lex and other SPARQL endpoints
SaxonC (saxonche) 12.6+
Purpose: XSLT transformations
Why: Industry-standard XML processing
Usage: Transforming legal documents (FMX to AKN-LEOS)
Authentication & Security
JWT (PyJWT) 2.10+
Purpose: JSON Web Token handling
Why: Stateless authentication, standard-compliant
Features Used:
Token generation
Token validation
Expiration handling
Custom claims
bcrypt 4.1+
Purpose: Password hashing
Why: Industry-standard; per-password salts defeat rainbow tables, and an adjustable work factor slows brute-force attacks
Usage: Secure password storage
HTTP & Networking
httpx
Purpose: HTTP client
Why: Modern, async-capable, HTTP/2 support
Usage: External API calls
python-dotenv 1.1+
Purpose: Environment variable management
Why: Easy configuration management
Usage: Loading .env files
Document Processing
python-docx 1.1+
Purpose: Microsoft Word document processing
Why: Read/write DOCX files programmatically
Usage: Extracting text from legal documents
Development Tools
Testing
pytest 8.3+
Purpose: Testing framework
Why: Simple syntax, powerful fixtures, extensive plugins
Plugins Used:
pytest-asyncio - Async test support
pytest-cov - Coverage reporting
requests-mock - HTTP mocking
coverage 7.9+
Purpose: Code coverage analysis
Why: Identify untested code
Usage: HTML and terminal coverage reports
Documentation
Sphinx 8.2+
Purpose: Documentation generation
Why: De facto standard for Python projects, extensible, multiple output formats
Extensions:
sphinx.ext.autodoc - Auto-generate from docstrings
sphinx.ext.napoleon - Google/NumPy docstring support
sphinx.ext.viewcode - Link to source code
sphinx.ext.intersphinx - Cross-project linking
MyST Parser 4.0+
Purpose: Markdown support in Sphinx
Why: Write docs in Markdown
Features: CommonMark + extensions
Sphinx RTD Theme 3.0+
Purpose: Documentation theme
Why: Clean, responsive, professional
Dependency Management
Poetry
Purpose: Dependency and environment management
Why: Deterministic builds, lock files, virtual env management
Usage:
pyproject.toml - Dependency specification
poetry.lock - Lock file for reproducibility
Data Validation
Pydantic
Purpose: Data validation using Python type hints
Why: Runtime validation, JSON schema generation, editor support
Usage:
API request/response models
Configuration validation
Data serialization
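As an illustration of the configuration-validation use case, the sketch below validates a settings dictionary and reports which fields failed. `WorkerSettings` and its fields are hypothetical and do not reflect the project's real configuration.

```python
from pydantic import BaseModel, Field, ValidationError


class WorkerSettings(BaseModel):
    # Hypothetical settings; names and bounds are assumptions.
    redis_url: str
    concurrency: int = Field(default=4, ge=1, le=64)


def load_settings(raw: dict) -> WorkerSettings:
    # Raises ValidationError when a field is missing or out of range.
    return WorkerSettings(**raw)


def invalid_fields(raw: dict) -> list[str]:
    """Return the names of fields that fail validation, sorted."""
    try:
        load_settings(raw)
    except ValidationError as exc:
        return sorted(".".join(str(p) for p in err["loc"]) for err in exc.errors())
    return []


settings = load_settings({"redis_url": "redis://localhost:6379/0"})
problems = invalid_fields({"concurrency": 0})
```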
jsonschema 4.25+
Purpose: JSON schema validation
Why: Standard-compliant validation
Usage: Validating complex JSON structures
Containerization
Docker
Purpose: Application containerization
Why: Consistent environments, easy deployment
Usage:
Dockerfile for backend image
Multi-stage builds for optimization
Docker Compose
Purpose: Multi-container orchestration
Why: Local development, integration testing
Services:
Backend API
Celery Worker
PostgreSQL
Redis
Web Server (Production)
Uvicorn
Purpose: ASGI server
Why: High performance, WebSocket support
Usage: Running FastAPI application
Gunicorn (Optional)
Purpose: Process manager
Why: Multi-worker management, automatic restarts
Usage: Production deployment with Uvicorn workers
Monitoring & Logging
Python logging
Purpose: Application logging
Why: Built into the standard library, configurable, pluggable handlers
Configuration: Custom formatting, rotation
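Custom formatting and rotation can be configured with the standard library alone. In this sketch the logger name `app`, the format string, and the size and backup limits are assumptions, and a temporary directory is used so the example leaves no files behind.

```python
import logging
import tempfile
from logging.handlers import RotatingFileHandler
from pathlib import Path


def configure_logging(log_path: str, level: int = logging.INFO) -> logging.Logger:
    logger = logging.getLogger("app")  # hypothetical logger name
    logger.setLevel(level)
    logger.propagate = False
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls

    formatter = logging.Formatter("%(asctime)s %(levelname)s [%(name)s] %(message)s")

    # Rotate at roughly 1 MB and keep three old files (app.log.1 to app.log.3).
    handler = RotatingFileHandler(
        log_path, maxBytes=1_000_000, backupCount=3, encoding="utf-8"
    )
    handler.setFormatter(formatter)
    logger.addHandler(handler)
    return logger


with tempfile.TemporaryDirectory() as tmp:
    log_file = Path(tmp) / "app.log"
    logger = configure_logging(str(log_file))
    logger.info("service started")
    logger.debug("hidden at INFO level")

    # Close handlers so the file is flushed and can be removed with the directory.
    for handler in list(logger.handlers):
        handler.close()
        logger.removeHandler(handler)

    logged = log_file.read_text(encoding="utf-8")
```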