Job Description
Role:
Site Reliability Engineer
Employment Type:
Work location:
San Jose, CA
Work mode:
Hybrid: 2 days onsite and 3 days remote per week
About The Role:
We seek a highly skilled and dynamic Site Reliability Engineer. In this role, you will:
- Maintain and improve the reliability, performance, and availability of software systems.
- Act as a bridge between traditional IT operations and software development, bringing a software engineering approach to system administration.
Job Responsibilities:
- Creating and supporting automation scripts (Shell/Ansible/Python) for infrastructure deployments, validations, and monitoring to streamline operational tasks.
- Scheduling monitoring scripts using cron and Airflow.
- Monitoring systems with tools such as Dynatrace, Apica, and Grafana.
- Database handling.
- Building CI/CD pipelines.
- Incident handling and problem management.
Mandatory Skills:
- Experience with Ansible and Python.
- Monitoring Tools: Dynatrace/Apica/Grafana.
Required Education:
Bachelor's degree in computer science or a related field.
Required Experience:
- 14+ years of IT infrastructure experience.
- Extensive experience working with Linux distributions such as RHEL/CentOS, including shells, filesystems, and utilities.
- Experience with Python programming and Ansible automation.
- Knowledge of distributed computing and experience working with container orchestration frameworks including on-prem and Rancher Kubernetes, with good knowledge of Kubernetes objects.
- Experience working with storage (ONTAP preferred): volumes, aggregates, backups, and DR planning.
- Experience scheduling monitoring scripts using cron and Airflow.
- Experience with monitoring tools such as Dynatrace, Apica, and Grafana.
- Database knowledge including SQL and NoSQL databases.
- Experience building CI/CD pipelines (preferred).
- Cloud platform knowledge (specifically AWS) is required.
Travel Requirement (%):
NA