Senior Manager, Site Reliability Engineering



Software Engineering
United States
Posted on Tuesday, May 2, 2023

Aware is a collaboration intelligence platform that identifies and reduces risk, maintains compliance, and uncovers new business insights from conversations at scale. Consolidate, enrich, search, and manage data across tools like Slack, WorkJam, Teams and Zoom for immediate visibility across the organization. Aware’s comprehensive platform solves common challenges that legal, compliance, information security and IT departments face when rolling out collaboration, including archiving, monitoring, organization insights, DLP, eDiscovery, retention and legal holds. Aware is a Microsoft Gold Partner, Slack eDiscovery and compliance partner and a Workplace from Meta integration partner.

The Senior Manager, Site Reliability Engineering & Operations executes the vision and oversees the staff responsible for ensuring that Aware’s critical production systems meet or exceed their performance and reliability goals. The right candidate will serve as a champion of service reliability and availability, automation, capacity management and monitoring. They will leverage quantitative and operations engineering techniques to measure success and will function as the catalyst for promoting change and process improvements across the various platforms. The Senior Manager, SRE & Operations is also responsible for production security monitoring and remediation. Collaboration is key, as you will work directly with the Platform Engineering team to partner and prioritize initiatives while focusing your attention on production systems.


  • Lead, manage and grow a small group of SRE / Infrastructure engineers;
  • Work cross-functionally with Development teams, Product and Customer Success to provide service metrics;
  • Compile KPIs and evangelize the adoption of best practices in relation to performance and reliability across the organization;
  • Embed into SRE projects and on-call rotations to keep your skills sharp and stay close to the operational workflows and issues;
  • Promote a healthy and functional work environment;
  • Maintain project and operational workload statistics;
  • Provide a solid foundation for building and maintaining successful SRE teams.


  • Exposure to Cloud, SaaS, and virtualization concepts and performance concerns;
  • Experience with open-source stream-processing frameworks/systems, e.g. Kafka, Spark, etc.;
  • Extensive experience deploying, running and troubleshooting production SaaS applications on Azure, AWS or GCP (Azure strongly preferred);
  • Experience with Cloud and operational cost management is a plus;
  • Knowledge of defining and monitoring system quality measures, including SLO and SLA;
  • Experience deploying observability platforms like Grafana, Datadog, etc.;
  • Experience building tooling to improve the reliability of systems, automate remediation of issues, or improve scalability;
  • Hands-on experience collecting performance data, analyzing, troubleshooting, and tuning;
  • Experience delivering software designed for high concurrency, scalability, or availability;
  • Experience leading high performing engineering teams;
  • Ability to be hands-on, helping the team manage issues and tickets when needed;
  • Experience with containers and container orchestration tools (Docker, Kubernetes, etc.);
  • Proven track record of designing, building, optimizing, and maintaining infrastructure on a large scale;
  • Experience with Cloud infrastructure like Kafka, MySQL, InfluxDB, Elasticsearch, Redis, and/or Memcached.

Aware serves some of the largest enterprises in the world, and in doing so we can provide them with insights into their diversity and inclusion efforts. Because of this, Aware strives to cultivate its own diverse culture so we can better understand those we serve. If you share our values and enthusiasm for making companies better, you’ll find a home at Aware.

Disclaimer: The duties and responsibilities described are not a comprehensive list, and additional tasks may be assigned to the employee from time to time.

• Company Equity
• 100% paid monthly health insurance for you and your family
• 401(k) match
• Tuition Reimbursement
• Open vacation policy
• Fully stocked kitchen with drinks, goodies and balanced snacks at HQ
• Flexible/Remote working options
• Cross-functional, open learning environment