Job Description
We are looking for a software engineer specializing in infrastructure and platforms to design, build and maintain the control plane that manages hundreds of sandboxes for developers. The candidate will be responsible for improving the developer experience, optimizing resource usage, and ensuring the reliability of large-scale platforms.
Responsibilities:
Design, build and maintain the control plane.
Develop backend tools and services that automate sandbox creation.
Productize and offer infrastructure as a platform, including databases, caches, object storage and queuing systems.
Scale and tune the cluster to provide capacity for volatile workloads, working with Karpenter/VPA/KEDA policies, sizing, pod density, bin-packing and scheduling strategies.
Continuously monitor and optimize costs; at this scale every decision matters, from compute to storage and data transfer.
Manage network configuration to replicate the production environment, improving the developer experience at scale across hundreds of sandboxes, including traffic routing, interception, ingress tuning, and resource allocation.
Implement and evolve observability within sandboxes.
Collaborate with the SRE team to ensure availability and reliability.
Work with Deel engineering to improve the developer experience and turn day-to-day needs into a self-service platform.
Requirements:
More than 8 years of experience in Software Engineering, Infrastructure or Platform Engineering.
Backend engineering skills: API design, Postgres, Kafka/NATS.
Experience with Node.js, Go or Python.
Experience in AWS, GCP or Azure.
Advanced experience with Kubernetes, including creating tools, controllers, or operators that extend its capabilities.
Experience with standard Kubernetes tools:
Networking: ingress controllers, CoreDNS, external-dns, AWS LBC, oauth2-proxy.
Secrets management: Vault, External Secrets.
Cluster autoscaling and resource tuning: Karpenter, VPA, Goldilocks, KEDA.
Storage provisioning.
Experience with Helm charts and GitOps.
Experience maintaining large Kubernetes clusters (1000 nodes, 30k+ pods).
System design and problem-solving skills.
Ability to communicate and collaborate across multiple teams.
Salary
To be agreed