DevOps Reliability Engineer

Full Time
Remote
Job description
Fyusion is a leading machine learning and computer vision company focused on automotive inspections and related applications. Our patented 3D format enables anyone to capture and display interactive 3D images using their smartphone, and enables significant added functionality with deep visual understanding and machine learning-driven analysis.

Founded in 2014, Fyusion is now part of the Cox Automotive family. Our team includes some of the world's top researchers and developers in light field imaging and AI, continuing to push boundaries and innovate at the highest level from our San Francisco research center.

Fyusion is seeking an awesome DevOps Reliability Engineer (intersection of DevOps & SRE) to join our Web and Cloud Infrastructure team. We are a close-knit team that enjoys challenges and solving real world problems. You will have a key role in solving those problems, helping to shape our core automation, data processing, and deployment practices. You will leverage deep knowledge of Amazon Web Services, as well as automated build and orchestration tools such as Terraform and Kubernetes, to develop and maintain a wide range of infrastructure components—including web stacks, database systems, security tools, and networking/cloud environment configurations.

Further, you will proactively seek out system weaknesses and fix them before they cause production issues, using monitoring data, trend analysis, and chaos engineering.

We understand this is a complex role, and do not expect you to be an expert in every tool we use. However, we do expect you to be motivated and open to continual self-improvement, adapting to new tools and overcoming new challenges as they come. If you are looking to be challenged, enjoy wearing multiple hats, and thrive in a fast-paced, agile environment, we think you’ll love this role!

Here's what you will be doing:

  • Actively troubleshoot any issues that arise during testing and production, catching and solving issues before launch
  • Automate work including infrastructure needs, testing, failover solutions, failure mitigation, and much more
  • Monitor and troubleshoot highly scalable and distributed server clusters that perform various functions, from web-servers to machine learning processing
  • Participate in SRE activities (chaos engineering gamedays, disaster recovery scenarios, etc.)
  • Manage code deployments, fixes, updates, and related processes
  • Work with a close-knit team and brainstorm on the best ways to tackle complex problems in infrastructure, security and monitoring
  • Provide technical guidance and educate team members and coworkers on monitoring and logging. (Have an interesting idea or solution? Present it!)
  • Automate any software maintenance processes that previously required a manual procedure

Here's what we are looking for:

  • Bachelor’s Degree or equivalent experience required.
  • 3+ years of experience in software engineering, software development, or system operations in highly available, high-traffic environments
  • Strong experience with Linux-based infrastructures, Linux/Unix administration, and AWS
  • Experience with databases such as MySQL (or other SQL-based databases), Elasticsearch, and Redis
  • Experience administering Linux servers as well as Docker-based infrastructure (such as Kubernetes, EKS, etc.) in a highly available environment
  • Experience with scripting languages such as Python and Bash
  • Experience with message broker/queue technologies like RabbitMQ
  • Experience with modern monitoring, logging, and observability tools for complex distributed systems, such as Grafana, New Relic, Splunk, Elastic Stack, Datadog, and Prometheus
  • Practical experience with infrastructure-as-code (with tools like Terraform, Chef, Ansible, etc.).
  • Good understanding of cybersecurity fundamentals and best practices.
  • Experience with containerization and clustering (Dockerfiles, docker-compose, Helm, Kubernetes, etc.)
  • Stellar problem-solving and troubleshooting skills with the ability to spot issues before they become problems.
  • Excellent oral and written communication skills.
  • Process-oriented with great documentation skills
  • Solid team player!

Here's what we can offer you:

Competitive compensation; health, vision, and dental benefits with premiums paid by Fyusion; an unlimited PTO plan; company holidays (including your birthday); and the chance to be part of a pioneering technology team!

We offer some amazing perks for those working from our SF HQ: commuter benefits, company-catered lunches, a fully stocked snack pantry, tons of company off-sites, and a pup-friendly workplace.

If you read this job description and saw your name all over this, apply! If you read this, and think that you might need some help hitting all of the points, please apply! We have an entire team who is happy to help and share our knowledge with you.

The benefits do not apply to contract or internship positions.

PLEASE ONLY RESPOND TO EMAILS FROM OUR TEAM MEMBERS WITH A fyusion.com DOMAIN. BEWARE OF SCAMMERS AND PHISHING SCAMS RELATED TO EMPLOYMENT AND IDENTITY THEFT.
