Job Description
Coderslab.io is a global company dedicated to transforming and growing our customers' businesses through innovative technology solutions. We have more than 3,000 employees around the world and offices in Latin America and the United States. Our team is made up of the top 1% of tech talent, working on challenging and innovative projects that will boost your career.
We are currently looking for an SRE Architect responsible for defining, controlling and implementing advanced solutions to improve architecture and incident management. The role formalizes lessons learned from incidents and ensures measurement, observability and effective monitoring of service levels through dashboards and technical indicators. The aim is to guarantee the continuous availability, stability and optimal performance of services and technology platforms, based on a proactive and systematic Site Reliability Engineering (SRE) approach aligned with the Technology Architecture team.
Job opportunity published on getonbrd.com.
Develop and refine incident management and problem management processes,
establishing clear procedures for identification, root cause analysis and resolution,
minimizing impacts on the business.
Formalize post-incident learning through documentation, indicators and
dashboards that enable continuous improvement and reduce recurrence.
Analyze incidents and propose improvements or changes for implementation, coordinating them
with the application areas.
Degree in Systems Engineering, Industrial Engineering or a similar field.
A master's degree in Technology or Administration is desirable.
Desirable certifications in SRE, DevOps, Cloud, Observability (Grafana, Elastic),
TOGAF or other related areas.
3-4 years of experience in high-impact technology projects.
3-4 years of experience in the design and evolution of technology architectures.
Proven experience in building observability and monitoring dashboards
using:
Grafana
Kibana
Fluent Bit
Elastic Stack / Observability
Desirable experience in SRE, DevOps, ITIL and operational improvement practices.
Knowledge of:
Site Reliability Engineering (SRE).
Observability: logs, metrics, alerting and tracing.
Grafana, Kibana, Fluent Bit and Elastic Stack.
Cloud Computing (AWS, Azure, GCP).
DevOps / DevSecOps.
Incident Management, ITIL and Agile Methodologies.
Contract type:
Provision of services