Job Description
Role Overview
Own the reliability, availability, performance, and security of our databases (PostgreSQL, MongoDB, and MySQL) running container-native and highly available on Kubernetes. You'll design the architecture, automate operations, guide domain teams on optimal schema and query design, and lead backup/restore and disaster recovery (DR) with clear RPO/RTO objectives.

What You'll Do

Architecture & HA
Design and operate HA clusters on Kubernetes (StatefulSets, PDBs, anti-affinity, topology spread, CSI storage, auto-failover).
Choose and operate DB operators/distributions (e.g., Zalando/Crunchy for Postgres, Percona for MongoDB/MySQL, or equivalents); maintain a versioning/patching strategy.
Implement multi-AZ replicas, read replicas, and connection pooling/proxying (e.g., PgBouncer/HAProxy/Envoy) for scale.

Performance & Scalability
Lead capacity planning, sizing, and benchmarking; own p95/p99 latency and throughput KPIs.
Tune queries and indexes, VACUUM/ANALYZE/auto-analyze strategies (Postgres), sharding and replica sets (MongoDB), and InnoDB/Group Replication (MySQL).

Data Modeling & Reviews
Partner with domain teams on schema/design reviews, migration strategies (Flyway/Liquibase/Alembic), and query best practices to keep services fast and cost-efficient.

Backup, DR & Compliance
Own backup/restore (base backups, PITR/WAL/GTID, snapshotting), tested DR runbooks, and RPO/RTO targets; run quarterly game days.
Implement data retention, encryption (at rest and in transit), auditing, and access controls (RBAC/KMS/Secrets).

Automation & CI/CD
Manage everything as code: Helm/Terraform/Ansible for DB infrastructure; GitOps pipelines for changes, safe rollout/rollback, and automated smoke checks.

Observability & Ops
Provide metrics, logs, and tracing with Prometheus/Grafana/ELK/Datadog; alert on golden signals (availability, saturation, errors, latency, replication lag, disk/IO).
Participate in on-call; drive postmortems and permanent fixes.

Cost & FinOps
Optimize storage classes, compression, caching, and query plans to balance cost vs. performance.

Minimum Qualifications
5-8 years as a DBA/DBRE with production experience in PostgreSQL plus MongoDB and MySQL (at least two in depth).
Strong Kubernetes fundamentals for stateful workloads (StatefulSets, PV/PVC, CSI, network policies, PDBs).
Deep skills in SQL, indexing, query plans, transactions/locking, replication, and failover.
Proven backup/restore and DR ownership (PITR, cross-AZ/region replicas) with documented RPO/RTO.
Hands-on experience with automation/IaC (Helm/Terraform/Ansible) and CI/CD for DB changes.
Observability in practice (exporters, dashboards, SLOs, alerting) and rigorous incident management.
Excellent collaboration and communication; confident running schema/design reviews with product teams.

Preferred (Nice to Have)
Experience with PgBouncer, Patroni/Crunchy/Zalando operators (Postgres), and Percona Operators (MongoDB/MySQL).