Engineering

Senior Site Reliability Engineer

Coimbatore
Work Type: Full Time

Essential Responsibilities:


  • Partner with product owners and business SMEs to analyze business needs and improve the supportability, scalability and recoverability of the engineered solution.
  • Ensure that the overall technical solution is aligned with the business needs and the operational teams' methodologies.
  • Drive the improvement of service availability to reduce the mean time to recovery using automation.
  • Develop methods for autonomous recovery and self-repairing systems. Ensure the solution is consistent with RFPIO architecture, design and development standards
  • Coordinate and plan system releases and hotfixes.
  • Develop methods that allow simplified triage following a set of checklists, run books and standard operating procedures.
  • Make adjustments to adopt new methodologies that provide the business with increased flexibility and agility
  • Support software development by providing operational improvements to non-functional requirements.
  • Develop enhancements to improve service levels by leveraging key performance indicators consisting of monitoring, non-functional testing and availability reports.
  • Provide a service-focused approach leveraging continuous process improvement.
  • Participate in chaos testing to improve system resiliency.
  • Mentor other engineers.
  • Provide overall technical leadership to smaller working teams as needed.
  • Stay current with the latest development tools, technology ideas, patterns and methodologies; share knowledge by clearly articulating results and ideas to key stakeholders.

Experience:


  • 3 to 5 years of experience in a Site Reliability Engineering, DevOps, or infrastructure-focused role
  • Experience supporting internet-facing production services and distributed systems
  • Ability to implement and coordinate telemetry using monitoring and observability tools such as Splunk, Grafana or Prometheus
  • Coding experience in a high-level programming language such as Java or Python
  • Automation advocate - you truly believe in removing operational load via software
  • A strong sense of ownership.
  • Experience managing, scaling, and troubleshooting Java applications
  • Familiarity with cloud infrastructure concepts (zones, regions, VPCs, etc.)
  • An understanding of a variety of packaging formats, deployment strategies, and tooling for software services
  • Working understanding of common authentication schemes, certificates, and securely managing secrets
  • Capable of designing and implementing automated configuration management processes for repeatable and consistent service deployment

Education:


  • BS or MS in Computer Science or equivalent industry experience

Knowledge, Skills & Abilities:


  • Prior experience as an SRE, software engineer, DevOps Engineer, or system administrator
  • Experience in system automation technology, such as Ansible
  • Experience in container technologies
  • Experience using cloud services.
