Project Description:
Digital Business (DBIZ) is the innovation and engineering unit within Deutsche Telekom Technik. We develop scalable GenAI and agentic AI solutions for Telekom’s technical domains, ranging from fiber rollout, construction supervision, and regulatory processes to multi-agent systems with access to complex network data.
With T-GAIA, our internal GenAI platform, we provide shared enablers, including LLM endpoints, RAG chatbots, agentic frameworks (LangGraph, LangChain, MCP), as well as end-to-end applications such as TextmAIner, AI@Behördenkommunikation, FiberChatbot, and the multi-agent system “00Site”.
Our mission: to enable AI-driven transformation within technology and to scale productive AI securely, reliably, and in close collaboration with domain experts.
About the Role:
We are looking for a motivated DevOps Engineer to join our team and support the operation and further development of our T-GAIA Chatbot Framework and the chatbots hosted on it. This role combines administrative and technical responsibilities, with a focus on ensuring smooth operations, monitoring system performance, and implementing improvements to optimize application reliability and accuracy.
Your Responsibilities:
- Set up, debug, and deploy T-GAIA chatbots as a cloud service, along with other GenAI products and enablers
- Maintain and monitor the chatbots and the underlying framework and pipelines
- Monitor and ensure the SLA levels of the chatbots, especially capacity and performance
- Monitor and ensure the SLA levels of the underlying T-GAIA platform
- Keep all documentation up to date
- Ensure stable operations by supporting incident, problem, and change management processes
- Continuously improve the monitoring and alerting capabilities of the T-GAIA Framework and the hosted chatbots
- Support the resolution of Service Desk tickets from our internal customers
- Deploy and manage use cases through GitLab pipelines
- Perform administrative activities such as user management, and technical activities such as network configuration, IaC deployment, and service monitoring
- Set up, maintain, and administer Jira Service Desk and internal Wiki/Documentation systems
- Manage internal customer tickets (own organization + CCOE)
- Handle internal customer communication: updates, roadmap communication, etc.
- Assist with MS Azure Cloud tooling and pipelines (deployment, monitoring, logging, security, IAM)
- Develop baseline operational standards for hosting/support (e.g., SLAs, alerts, support coverage)
- Perform quality assurance checks for billing accuracy, security posture, and compliance
- Collect internal requirements for platform and chatbot operations, and coordinate discussions with CCOE