Lead Site Reliability Engineer
Zeta · Hyderabad · 6–10 yrs experience · Posted 2026-02-13
Tech stack: Linux, MySQL, SQL, Unix, PostgreSQL, Kubernetes, Security
About the role
We build product lines around key outcomes by addressing real customer pain points, modernizing legacy systems, and strengthening core fundamentals. As a result, our systems and platforms support a wide range of banking and payments capabilities, including:
1. Tachyon, our cloud-native banking stack built for population-scale systems
2. Cipher, our unified authentication platform for secure, high-volume banking environments
3. Digital Credit as a Service, enabling banks to launch credit lines on UPI
4. Elena, our intelligent and conversational AI platform for banking
5. Pixel, India’s first digital-native credit card, launched in partnership with HDFC Bank, for whom we also revamped their PayZapp mobile app: Winner of the Celent Model Bank Award for Payments Innovation 2024.
6. Sparrow, the leading card experience for non-prime cardholders in the US
…and more across cards, payments, lending, and core banking.
We are an engineering-first organization that values ownership, bias for action, and long-term thinking. Together, we solve some of the hardest problems in banking tech. Our culture is built around trust, collaboration, and creating the conditions f
Responsibilities:
- Establish an SRE site and help build an effective, inclusive SRE team.
- Provide technical leadership for the local team and work closely with partner team technical leads and cloud leadership.
- Provide guidance to other team members on managing the availability and performance of mission-critical services, on building automation to prevent problem recurrence, and on building automated responses for non-exceptional service conditions.
- Manage execution of project priorities, deadlines, and deliverables.
- Lead incident management during incidents.
- Drive MTTR in line with the incident SLA.
- Maintain 100% alert coverage across application, infrastructure, security, flows, etc.
Qualifications:
- 6–10 years of experience in distributed systems, storage systems, or databases; algorithms and data structures; and/or Unix/Linux systems internals (e.g., filesystems, system calls) and administration.
- Experience designing, analyzing, and troubleshooting large-scale distributed systems.
- Experience with MySQL or PostgreSQL databases.
- Hands-on experience operating Kubernetes (k8s) and at least one cloud platform.
- Excellent communication skills and a sense of ownership, with a systematic problem-solving approach.