Site Reliability Engineer

Redmond, Washington, United States 🇺🇸

Full-time, Hybrid

$98.3K - $193.2K / Year

Site Reliability 🛠️ Cloud ☁️

Overview

Microsoft is looking for a Site Reliability Engineer to support and expand Viva Engage. Viva Engage (formerly Yammer) is the industry-defining social network for the enterprise. We provide a platform for millions of employees, including those from 85% of Fortune 500 companies, to build community and culture, share knowledge, and connect with their leaders and each other.

The user base for Viva Engage is growing quickly. The Site Reliability team is responsible for keeping the services reliable as we scale and modernize our tech stack. We need an SRE with a demonstrated ability to balance the competing priorities of keeping things running today and building the architecture we need for the future.

Acquired by Microsoft in 2012, Viva Engage combines the benefits of a startup - rapid innovation, cutting-edge technology, outsized individual impact - with the advantages of working for one of the most successful software companies in the world. We believe in mission-driven work, and in this post-COVID world our platform has become more indispensable than ever as it fosters connection and a sense of belonging among remote teams.

Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.

Qualifications

Required Qualifications:

4+ years technical experience in software engineering, network engineering, or systems administration
- OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 1+ year(s) technical experience in software engineering, network engineering, or systems administration
- OR Master's Degree in Computer Science, Information Technology, or related field.

Other requirements:

Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.
Citizenship Verification:
- This position requires verification of citizenship due to citizenship-based legal restrictions. Specifically, this position supports United States federal, state, and/or local government agency customers and is subject to certain citizenship-based restrictions where required or permitted by applicable law. To meet this legal requirement, and as a condition of employment, the successful candidate’s citizenship will be verified with a valid passport.
This position requires passing a background check conducted through the CJIS criminal justice information system by authorized local, state, and/or federal agencies and across multiple states. This role requires candidates to maintain CJIS screening eligibility. This position is required to work in GCC-M, GCC-H, and DoD environments.

Preferred Qualifications:

5+ years technical experience in software engineering, network engineering, or systems administration
- OR Bachelor's Degree in computer science, information technology, or related field AND 2+ years technical experience in software engineering, network engineering, or systems administration
- OR Master's Degree in computer science, information technology, or related field AND 1+ years technical experience in software engineering, network engineering, or systems administration
Experience applying SRE principles in a large production environment.
Demonstrated proficiency in cloud computing platforms (e.g., AWS, Azure, GCP) and related services (e.g., EC2, S3, VPC, IAM, Lambda).
Expertise in automation tools and frameworks (e.g., Terraform, Ansible, Chef, Puppet) and scripting languages (e.g., Python, Bash).
Deep understanding of containerization and orchestration technologies (e.g., Docker, Kubernetes).
Experience with monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack) and incident response processes.
Demonstrated problem-solving skills and the ability to troubleshoot complex issues in distributed systems.
Strong communication and collaboration skills, with the ability to work effectively in a cross-functional team environment.

Site Reliability Engineering IC3 - The typical base pay range for this role across the U.S. is USD $98,300 - $193,200 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $127,200 - $208,800 per year.

Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here: https://careers.microsoft.com/us/en/us-corporate-pay

Microsoft will accept applications for the role until July 2, 2024.

Responsibilities

Participate in on-call rotations and incident response throughout product development and operations cycles. On-call duty requires responding to support requests after normal business hours, including weekends and/or holidays, in a designated Microsoft office.
Monitor system performance and proactively identify and resolve issues to ensure high availability and performance.
Develop and maintain automation tools and processes for deployment, monitoring, and configuration management.
Apply troubleshooting skills and debugging tools, and examine logs, telemetry, and other sources to verify assumptions and customer impact. Proactively and reactively address findings with customers and/or service engineering efficiently via written and verbal communication.
Lead blameless postmortems to identify root causes and improve production resiliency.
Consult with developers to design services that scale in Azure.
Stay current with industry trends, emerging technologies, and best practices in site reliability engineering and cloud computing.
