Technology Stack
================

Core Technologies
-----------------

Backend Framework
~~~~~~~~~~~~~~~~~

**FastAPI (v0.115+)**

- **Purpose**: Web framework for building APIs
- **Why**: High performance, automatic API documentation, type hints, async support
- **Features Used**:

  - Automatic OpenAPI schema generation
  - Pydantic data validation
  - Dependency injection
  - Background tasks
  - Security utilities (OAuth2, JWT)

Programming Language
~~~~~~~~~~~~~~~~~~~~

**Python 3.12+**

- **Why**: Rich ecosystem, excellent AI/ML libraries, readability
- **Key Language Features**:

  - Type hints for better code quality
  - Async/await for concurrency
  - Dataclasses for structured data

Database & Persistence
----------------------

Relational Database
~~~~~~~~~~~~~~~~~~~

**PostgreSQL 12+**

- **Purpose**: Primary data store
- **Why**: ACID compliance, advanced features, JSON support, full-text search
- **Features Used**:

  - Complex queries and joins
  - JSON/JSONB columns
  - Foreign key constraints
  - Database triggers and events

ORM
~~~

**SQLAlchemy 2.0+**

- **Purpose**: Object-Relational Mapping
- **Why**: Powerful query API, relationship handling, migration support
- **Features Used**:

  - Declarative models
  - Relationship configurations
  - Session management
  - Query optimization

Database Migrations
~~~~~~~~~~~~~~~~~~~

**Alembic 1.16+**

- **Purpose**: Database schema versioning
- **Why**: Track changes, rollback capability, team collaboration
- **Usage**:

  - Auto-generate migrations from model changes
  - Version control for database schema
  - Upgrade/downgrade paths

Caching & Message Broker
------------------------

**Redis 6.4+**

- **Purpose**: Caching, session storage, message broker
- **Why**: In-memory speed, pub/sub messaging, data structures
- **Use Cases**:

  - Celery task queue broker
  - Celery result backend
  - Session caching
  - Application-level caching

Asynchronous Task Processing
----------------------------

**Celery 5.5+**

- **Purpose**: Distributed task queue
- **Why**: Async job processing, scheduling, retry mechanisms
- **Features Used**:

  - Task queuing and execution
  - Task chaining and grouping
  - Periodic tasks
  - Result tracking
  - Retry logic

AI & Machine Learning
---------------------

Large Language Models
~~~~~~~~~~~~~~~~~~~~~

**OpenAI-compatible API (v1.75+)**

- **Purpose**: LLM-based text analysis and annotation
- **Why**: Uniform interface for multiple LLM providers
- **Use Cases**:

  - Legal text annotation
  - Provision classification
  - Entity extraction
  - Text generation

Machine Learning
~~~~~~~~~~~~~~~~

**scikit-learn 1.6+**

- **Purpose**: Traditional ML classification
- **Why**: Proven algorithms, easy training, lightweight
- **Features Used**:

  - Text vectorization (TF-IDF)
  - Classification algorithms (SVM, Random Forest, Logistic Regression)
  - Model persistence (joblib)
  - Cross-validation

**joblib 1.5+**

- **Purpose**: Model serialization
- **Why**: Efficient storage of trained models
- **Usage**: Saving/loading trained classifiers

Natural Language Processing
~~~~~~~~~~~~~~~~~~~~~~~~~~~

**spaCy 3.8+**

- **Purpose**: NLP preprocessing
- **Why**: Fast, production-ready, pre-trained models
- **Models Used**:

  - ``en_core_web_sm`` - English language model

- **Use Cases**:

  - Text tokenization
  - Named entity recognition
  - Part-of-speech tagging
  - Dependency parsing

Semantic Web & RDF
------------------

**SPARQL**

- **Purpose**: RDF query language
- **Why**: Standard for querying knowledge graphs
- **Usage**: Querying EUR-Lex and other SPARQL endpoints via HTTP clients

Authentication & Security
-------------------------

**JWT (PyJWT) 2.10+**

- **Purpose**: JSON Web Token handling
- **Why**: Stateless authentication, standard-compliant
- **Features Used**:

  - Token generation
  - Token validation
  - Expiration handling
  - Custom claims

HTTP & Networking
-----------------

**httpx**

- **Purpose**: HTTP client
- **Why**: Modern, async-capable, HTTP/2 support
- **Usage**: External API calls

**python-dotenv 1.1+**

- **Purpose**: Environment variable management
- **Why**: Easy configuration management
- **Usage**: Loading ``.env`` files

Document Processing
-------------------

**python-docx 1.1+**

- **Purpose**: Microsoft Word document processing
- **Why**: Read/write DOCX files programmatically
- **Usage**: Extracting text from legal documents

Development Tools
-----------------

Testing
~~~~~~~

**pytest 8.3+**

- **Purpose**: Testing framework
- **Why**: Simple syntax, powerful fixtures, extensive plugins
- **Plugins Used**:

  - ``pytest-asyncio`` - Async test support
  - ``pytest-cov`` - Coverage reporting
  - ``requests-mock`` - HTTP mocking

**coverage 7.9+**

- **Purpose**: Code coverage analysis
- **Why**: Identify untested code
- **Usage**: HTML and terminal coverage reports

Documentation
~~~~~~~~~~~~~

**Sphinx 8.2+**

- **Purpose**: Documentation generation
- **Why**: Python standard, extensible, multiple output formats
- **Extensions**:

  - ``sphinx.ext.autodoc`` - Auto-generate from docstrings
  - ``sphinx.ext.napoleon`` - Google/NumPy docstring support
  - ``sphinx.ext.viewcode`` - Link to source code
  - ``sphinx.ext.intersphinx`` - Cross-project linking

**MyST Parser 4.0+**

- **Purpose**: Markdown support in Sphinx
- **Why**: Write docs in Markdown
- **Features**: CommonMark + extensions

**Sphinx RTD Theme 3.0+**

- **Purpose**: Documentation theme
- **Why**: Clean, responsive, professional

Dependency Management
~~~~~~~~~~~~~~~~~~~~~

**Poetry**

- **Purpose**: Dependency and environment management
- **Why**: Deterministic builds, lock files, virtual env management
- **Usage**:

  - ``pyproject.toml`` - Dependency specification
  - ``poetry.lock`` - Lock file for reproducibility

Data Validation
---------------

**Pydantic**

- **Purpose**: Data validation using Python type hints
- **Why**: Runtime validation, JSON schema generation, editor support
- **Usage**:

  - API request/response models
  - Configuration validation
  - Data serialization

**jsonschema 4.25+**

- **Purpose**: JSON schema validation
- **Why**: Standard-compliant validation
- **Usage**: Validating complex JSON structures

Containerization
----------------

**Docker**

- **Purpose**: Application containerization
- **Why**: Consistent environments, easy deployment
- **Usage**:

  - ``Dockerfile`` for backend image
  - Multi-stage builds for optimization

**Docker Compose**

- **Purpose**: Multi-container orchestration
- **Why**: Local development, integration testing
- **Services**:

  - Backend API
  - Celery Worker
  - PostgreSQL
  - Redis

Web Server (Production)
-----------------------

**Uvicorn**

- **Purpose**: ASGI server
- **Why**: High performance, WebSocket support
- **Usage**: Running FastAPI application

**Gunicorn (Optional)**

- **Purpose**: Process manager
- **Why**: Multi-worker management, automatic restarts
- **Usage**: Production deployment with Uvicorn workers

Monitoring & Logging
--------------------

**Python logging**

- **Purpose**: Application logging
- **Why**: Built-in, configurable, extensible through handlers
- **Configuration**: Custom formatting, rotation
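
The logging configuration described above (custom formatting plus rotation) could look roughly like the following. This is a minimal sketch using only the standard library, not the project's actual setup: the function name ``configure_logging``, the logger name ``app``, the format string, and the rotation limits are all assumptions chosen for illustration.

```python
import logging
from logging.handlers import RotatingFileHandler
from pathlib import Path


def configure_logging(log_path: str, level: int = logging.INFO) -> logging.Logger:
    """Attach a console handler and a rotating file handler to one logger."""
    logger = logging.getLogger("app")  # hypothetical logger name
    logger.setLevel(level)

    # Remove and close handlers from earlier calls so records are not duplicated.
    for handler in list(logger.handlers):
        logger.removeHandler(handler)
        handler.close()

    # Custom formatting: timestamp, padded level, logger name, message.
    formatter = logging.Formatter(
        fmt="%(asctime)s | %(levelname)-8s | %(name)s | %(message)s",
        datefmt="%Y-%m-%d %H:%M:%S",
    )

    # Rotation: start a new file at about 1 MB and keep three old files
    # (both limits are assumed values, not taken from the project).
    Path(log_path).parent.mkdir(parents=True, exist_ok=True)
    file_handler = RotatingFileHandler(
        log_path, maxBytes=1_000_000, backupCount=3, encoding="utf-8"
    )
    file_handler.setFormatter(formatter)

    console_handler = logging.StreamHandler()
    console_handler.setFormatter(formatter)

    logger.addHandler(file_handler)
    logger.addHandler(console_handler)
    logger.propagate = False  # keep records away from the root logger
    return logger
```

Calling ``configure_logging("logs/app.log")`` once at application startup and then using ``logging.getLogger("app")`` elsewhere would route all records through both handlers. Size-based rotation is used here for simplicity; ``TimedRotatingFileHandler`` is the standard-library alternative when rotation by time is preferred.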