Senior Site Reliability Engineer Cloud Platform
Company: Zilliz
Location: Redwood City
Posted on: February 15, 2026
Job Description:
Zilliz is a fast-growing startup developing the industry’s leading
vector database for enterprise-grade AI. Founded by the engineers
behind Milvus, the world’s most popular open-source vector database,
the company builds next-generation database technologies to help
organizations quickly create AI applications. On a mission to
democratize AI, Zilliz is committed to simplifying data management
for AI applications and making vector databases accessible to every
organization.

What you will do:
- Work at the intersection of development and site reliability,
  creating SRE tools and systems and supporting existing
  infrastructure and platforms.
- Ensure the reliability, availability, and performance of Zilliz’s
  distributed database systems.
- Develop and implement strategies for monitoring, incident
  management, and disaster recovery.
- Automate system operations and maintenance tasks to improve
  efficiency and reduce manual intervention.
- Design and build tools to manage and monitor infrastructure,
  ensuring scalability and robustness.
- Collaborate with software engineers to enhance system reliability,
  scalability, and performance.
- Maintain and improve the CI/CD pipeline to ensure smooth and rapid
  deployment of changes.
- Actively contribute to the Milvus Vector Database open-source
  community, focusing on improving reliability and operational
  efficiency.

What we are looking for:
- 4 years of experience in site reliability engineering or similar
  roles with a focus on cloud-native systems.
- Proficiency in programming or scripting languages such as Python,
  Go, or Java.
- Strong knowledge of container and orchestration technologies like
  Docker and Kubernetes.
- Expertise with cloud platforms such as AWS, GCP, or Azure, and
  their respective monitoring and management tools.
- Experience with infrastructure as code tools such as Terraform or
  Ansible.
- Familiarity with CI/CD tools such as Jenkins, GitLab CI, or Argo.
- Proven ability to troubleshoot complex distributed systems and
  resolve issues promptly.
- Bachelor’s degree or above in computer science, software
  engineering, or other relevant disciplines.
- Ability to thrive in a fast-paced startup environment and handle
  multiple projects simultaneously.
- Experience with the open-source Milvus vector database is nice to
  have.

Zilliz is an Equal Opportunity Employer and welcomes people from all
backgrounds, experiences, abilities, and perspectives. All qualified
applicants will receive consideration for employment regardless of
race, color, national origin, religion, sexual orientation, gender,
gender identity, age, physical disability, or length of time spent
unemployed.

We may use artificial intelligence (AI) tools to support parts of
the hiring process, such as reviewing applications, analyzing
resumes, or assessing responses. These tools assist our recruitment
team but do not replace human judgment. Final hiring decisions are
ultimately made by humans. If you would like more information about
how your data is processed, please contact us.
Keywords: Zilliz, Castro Valley, Senior Site Reliability Engineer Cloud Platform, IT / Software / Systems, Redwood City, California