ViralMoment is an AI social listening platform that analyzes social videos to identify trending topics and provide insights to brands and agencies. Our mission is to help our clients stay ahead of the curve by leveraging cutting-edge AI technology.
About the Role:
We are seeking a Site Reliability Engineer to join our dynamic team at ViralMoment. In this critical role, you will be responsible for optimizing our cloud infrastructure for scale and ensuring the reliability, performance, and availability of our AI-driven platform. Reporting directly to the CTO, you will have the opportunity to influence architectural decisions and lead initiatives for a multi-cloud environment.
Key Responsibilities:
Optimize and manage our cloud infrastructure, focusing on scalability, performance, and reliability
Develop and enhance observability systems for monitoring and alerting
Ensure the stability and efficiency of large-scale systems through effective DevOps practices
Handle multi-cloud environments, primarily AWS, with potential implementations on GCP and Azure
Collaborate with engineering teams to integrate and optimize backend processes
Research and implement systematic solutions for large model applications
Maintain and improve system performance through proactive monitoring and troubleshooting
Qualifications:
Bachelor’s or Master’s degree in Computer Science, Engineering, or related fields from an accredited institution
At least 5 years of experience in a similar role, focusing on cloud infrastructure and site reliability
Proficiency in cloud-native technologies and strong understanding of the relevant technology stack
Expertise in AWS, with additional knowledge of GCP and Azure preferred
Strong programming skills in Python, with the ability to apply them proficiently in a professional setting
Familiarity with infrastructure as code, particularly Terraform
Experience with large-scale cluster management and cloud-native technologies for log collection, monitoring, and alerting
Preferred Qualifications:
Prior experience in constructing and maintaining stability systems for large-scale infrastructures
Experience with infrastructure as code, especially Terraform
Proven track record in operating and maintaining large-scale systems
What We Offer:
A pivotal role in a rapidly growing startup at the forefront of AI technology
Direct impact on the platform's performance and scalability that supports major global brands
Remote work flexibility with a supportive and dynamic team environment
Competitive salary and opportunities for advancement and leadership
How to Apply:
If you are passionate about optimizing cloud infrastructure and ensuring system reliability, we encourage you to apply. Please submit your resume highlighting your experience with cloud platforms, programming languages, and system reliability.
Seniority level: Mid-Senior level
Employment type: Full-time
Job function: Engineering and Information Technology
Industries: Internet Publishing