Infrastructure & Deployment Architect (DevOps Engineer)

As Infrastructure & Deployment Architect at Bildup AI, you'll build the backbone that powers AI education at scale. You'll design, deploy, and optimize the cloud infrastructure and deployment systems that keep our platform running reliably, securely, and efficiently for millions of learners across Africa. This role is for someone who sees infrastructure as code, treats reliability as a feature, and understands that when students depend on the platform to learn, downtime isn't just an inconvenience: it's a barrier to education.

Location (On-site)

Enugu

Key Responsibilities

  • Design and manage cloud infrastructure: architecting scalable, resilient systems on AWS, Google Cloud, or Azure that can grow from thousands to millions of learners without breaking.
  • Manage infrastructure as code: using Terraform, Ansible, or CloudFormation to define, version, and deploy infrastructure in a repeatable, auditable way.
  • Build and maintain CI/CD pipelines: automating deployments, testing, and releases to enable rapid, reliable delivery of new features and improvements to production.
  • Implement containerization and orchestration: managing Docker containers and Kubernetes clusters that enable efficient resource utilization and seamless scaling of AI services.
  • Ensure platform security and compliance: implementing security best practices, managing access controls, conducting security audits, and ensuring compliance with data protection regulations.
  • Monitor system health and performance: setting up comprehensive monitoring, logging, and alerting systems (Prometheus, Grafana, ELK Stack).
  • Optimize infrastructure costs: analyzing resource utilization, right-sizing instances, and implementing auto-scaling to get maximum value without wasting resources.
  • Ensure high availability and disaster recovery: implementing backup strategies, failover systems, and recovery procedures that minimize downtime and protect learner data.
  • Respond to incidents and conduct post-mortems: troubleshooting production issues, leading incident response, conducting root cause analysis, and implementing preventive measures.
  • Automate operational workflows: scripting routine tasks (Python, Bash), eliminating manual processes, and building self-service tools that empower development teams to ship faster.

Ideal Candidate

  • 3+ years of experience in DevOps, Site Reliability Engineering (SRE), or cloud infrastructure roles, with proven expertise in at least one major cloud platform (AWS, Google Cloud Platform, or Azure) and its core services.
  • Strong knowledge of CI/CD pipelines, containerization and orchestration (Docker, Kubernetes), and automation tools, with proficiency in scripting languages (Python, Bash) and infrastructure-as-code tools (Terraform, Ansible, CloudFormation).
  • Experience with monitoring, logging, and alerting systems, and with performance optimization.
  • Deep understanding of security best practices, networking, and system administration.
  • Problem-solver who thrives on optimizing systems, improving reliability, and preventing incidents.
  • Experience supporting AI/ML infrastructure and workloads is a strong plus.
  • Passion for building scalable, secure infrastructure that empowers learners across Africa.
  • Commitment to a full-time, on-site role and to long-term growth with the organization.

What we offer

  • Growth Opportunities: Invest in your career with us and grow alongside a team of top innovators
  • Creative Freedom: Work in an environment that values creativity and gives you the autonomy to bring bold ideas to life
  • Exciting Work Environment: Thrive in a dynamic, collaborative, and inspiring environment where innovation is encouraged
  • Competitive Compensation: A competitive package that includes relocation and accommodation support if needed
BildUp - Personalized Learning Platform