Site Reliability Engineer (SRE)
Summary
| Title: | Site Reliability Engineer (SRE) |
|---|---|
| ID: | 10294 |
| Department: | Information Technology |
| Location: | Vienna, VA |
Description
Clearance: Minimum Active Clearance
Job Details:
The AWS Site Reliability Engineer (SRE) is responsible for the operational health, availability, and performance of the AWS and Databricks environments built by the Platform Engineering team. You will prepare for and take ownership of "day two" operations, focusing on observability, incident response, and capacity planning. You will design and implement comprehensive monitoring solutions, using tools such as AWS CloudWatch, to track the health of Databricks clusters, job performance, and the underlying AWS resources.
Your goal is to minimize downtime and inefficiency (manual, repetitive work) by automating operational tasks and recovery procedures. You will define and track Service Level Objectives (SLOs) that balance reliability against the pace of innovation, and you will create the Standard Operating Procedures (SOPs) for operations.
Skills:
CloudWatch, performance tuning in cloud environments, Infrastructure as Code (IaC) tools, Databricks management and performance instrumentation

