Candescent is the largest non-core digital banking provider. We bring together the transformative technologies that power and connect account opening, digital banking and branch solutions for banks and credit unions of all sizes on any core. Our Candescent solutions power the top three U.S. mobile banking apps and are trusted by banks and credit unions of all sizes.
We offer an extensive portfolio of industry-leading products and services with an extensible ecosystem of out-of-the-box and integrated partner solutions. In addition, our API-first architecture and developer tools enable financial institutions to optimize and expand upon their existing capabilities by seamlessly integrating custom-built or third-party solutions. And our connected in-person, remote and digital experiences reinvent customer service across all channels.
Self-service configuration and marketing tools give financial institutions greater control of their branding, targeted messaging and overall user experience. And data-driven analytics and reporting tools provide valuable insights to help drive continued growth and profitability. From conversions and implementations to custom development and customer care, our clients get expert, end-to-end support at every step.
TITLE: Site Reliability Engineer
Exp: 3-6 Years
Job Role
We are looking for a Site Reliability Engineer (SRE) who will be part of our SRE team and help build scalable systems, using best practices around automation, that improve reliability and velocity and enable monitoring of the operational health of stacks throughout their life cycle, including metrics collection, aggregation, and visualization.
As a member of the SRE team you will support Candescent Financial Services, product and technology teams to improve the design and operation of systems, focusing on making them scalable, reliable, and efficient while ensuring performance and high availability of products/services primarily residing in the cloud. You will influence the development and implementation of reliable production systems and services to address emerging business needs (such as Cloud-based SaaS). SREs pride themselves on the resiliency and stability of production systems, yet at the same time are committed to innovation and operational improvement through the application of software engineering practices to operations.
You will make our products easier to adopt and use by making improvements to the product, tools, processes and documentation. You are someone who strives for six nines (99.9999%) or better for service availability!
- You will be responsible for maintaining and scaling production services and servers for complex and high throughput cloud services.
- You will bridge and own the union between development, quality, security and operations.
- You will improve scalability, service reliability, capacity, and performance.
- You will write automation code for provisioning and operating infrastructure at massive scale.
- You are not an operator; you are an experienced software engineer focused on operations.
- You will initiate and contribute to continuous improvement of our software delivery processes and practices in a multi-location, multidisciplinary team to empower and accelerate product development.
- You will use automation extensively to design, configure, manage, and monitor systems in support of our product development teams.
- You will participate in disaster recovery planning and execution.
- You will be responsible for maintaining and patching servers supporting SaaS products. This includes Windows and Linux servers running in in-house data centers and/or on cloud PaaS providers (primarily GCP and Azure).
- You will work hand-in-hand with all teams to ship our code to production using Continuous Integration / Continuous Deployment (CI/CD) and AppSec tooling.
- You will collaborate with development teams and use intuition, experience and understanding to create SLIs, SLOs, and SLAs.
- You will provide timely assistance and remediation solutions during critical situations and production incidents to help resolve service problems. (You will be on-call for periods of time.)
- You will develop monitoring architecture, implement monitoring agents, build dashboards, and manage escalations and alerts.
- You will participate in incident management and drive root cause analysis (RCA) and risk management processes.
EEO Statement
Integrated into our shared values is Candescent’s commitment to diversity and equal employment opportunity. All qualified applicants will receive consideration for employment without regard to sex, age, race, color, creed, religion, national origin, disability, sexual orientation, gender identity, veteran status, military service, genetic information, or any other characteristic or conduct protected by law. Candescent is committed to being a globally inclusive company where all people are treated fairly, recognized for their individuality, promoted based on performance and encouraged to strive to reach their full potential. We believe in understanding and respecting differences among all people. Every individual at Candescent has an ongoing responsibility to respect and support a globally diverse environment.
Statement to Third Party Agencies
To ALL recruitment agencies: Candescent only accepts resumes from agencies on the preferred supplier list. Please do not forward resumes to our applicant tracking system, Candescent employees, or any Candescent facility. Candescent is not responsible for any fees or charges associated with unsolicited resumes.