Experience Summary
Data Engineer with 5–10 years of experience, specializing in building document- and knowledge-oriented data pipelines for regulatory/compliance domains, with strong capabilities in structured transformations, knowledge graphs, and containerized platform integration.
Core Responsibilities / Focus
Build and operate data ingestion and transformation pipelines for legal/regulatory content
Normalize and transform heterogeneous source formats (e.g., XML/HTML/structured exports) using tools such as XSLT (see the first sketch after this list)
Implement pipelines for embedding generation, indexing, and enrichment for downstream AI/RAG systems (see the embedding sketch after this list)
Design and manage RDF-based knowledge representations and SPARQL-accessible datasets (see the RDF/SPARQL sketch after this list)
Integrate storage and processing components across containerized/cloud environments
Support event-driven or integration-heavy workflows (e.g., via Apache Camel, message brokers)
Ensure reproducibility, maintainability, and operational handover of data pipelines
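Illustrative sketch for the XSLT item above: a minimal normalization step in Python using lxml. The file names (source.xml, normalize.xslt) are hypothetical placeholders, and the role does not mandate this particular library.

```python
from lxml import etree

def apply_xslt(source_path: str, stylesheet_path: str) -> str:
    # Parse the source document and the XSLT stylesheet
    source = etree.parse(source_path)
    transform = etree.XSLT(etree.parse(stylesheet_path))
    # Run the transformation and serialize the result to text
    return str(transform(source))

print(apply_xslt("source.xml", "normalize.xslt"))  # hypothetical file names
```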
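Illustrative sketch for the embedding-pipeline item above: chunking a document and generating vectors for downstream indexing. The library choice (sentence-transformers), model name, and input file are assumptions, not requirements of the role.

```python
from sentence_transformers import SentenceTransformer  # assumed library choice

def chunk(text: str, size: int = 500) -> list[str]:
    # Naive fixed-size chunking; real pipelines usually split on document structure
    return [text[i:i + size] for i in range(0, len(text), size)]

model = SentenceTransformer("all-MiniLM-L6-v2")  # hypothetical model name
with open("regulation.txt", encoding="utf-8") as f:  # hypothetical input file
    chunks = chunk(f.read())
vectors = model.encode(chunks)  # numpy array of shape (n_chunks, embedding_dim)
# vectors would then be written to a vector index/database for retrieval
```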
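Illustrative sketch for the RDF/SPARQL item above: building a small in-memory graph and querying it with rdflib. The namespace and triples are made-up examples.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS

EX = Namespace("http://example.org/reg/")  # made-up namespace

g = Graph()
# Two example triples describing a (made-up) regulation resource
g.add((EX.gdpr, RDF.type, EX.Regulation))
g.add((EX.gdpr, RDFS.label, Literal("General Data Protection Regulation")))

# SPARQL query over the in-memory graph; the rdfs: prefix is bound by default
results = g.query("""
    SELECT ?reg ?label WHERE {
        ?reg a <http://example.org/reg/Regulation> ;
             rdfs:label ?label .
    }
""")
for reg, label in results:
    print(reg, label)
```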
Core Skills (Must-Have)
Python / Java
Docker / Docker Compose
Kubernetes
Knowledge Graphs (RDF)
SPARQL
XSLT
Embeddings pipelines / vector preparation
Azure Storage (or equivalent cloud storage services)
Apache Camel
Git
Preferred / Nice-to-Have
Docling (or similar document-conversion tooling)
CloudEvents
Kafka (or other message brokers)
Event-based systems / event-driven architecture (see the CloudEvents sketch after this list)
Dev Containers
GitOps
Documentation practices
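Illustrative sketch for the CloudEvents / event-driven items above: constructing and serializing an event with the official CloudEvents Python SDK. The event type, source, and payload are invented placeholders; the broker producer (e.g., Kafka) is omitted.

```python
from cloudevents.http import CloudEvent, to_structured  # official Python SDK

# Event attributes: type and source are invented placeholders
attributes = {
    "type": "org.example.document.processed",
    "source": "pipeline/ingest",
}
data = {"document_id": "doc-123", "status": "indexed"}  # hypothetical payload

event = CloudEvent(attributes, data)
headers, body = to_structured(event)  # structured-mode serialization
# headers and body would be handed to a broker producer (e.g., Kafka) here
print(headers, body)
```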
Domain Advantage
Experience processing legal/regulatory source documents while preserving semantic structure and provenance
Familiarity with content domains such as EU regulation, privacy, ESG, and compliance frameworks