Private Cloud Infrastructure Engineer - (Remote, Part-time)
Job description
Artificial Intelligence Labs at AAFIE focuses on the research and development of digital transformation solutions for manufacturing and other industries. Our founding members include AI professors from UC Berkeley and senior engineers from Silicon Valley. Accomplishing our mission requires building really big things for our customers. We are currently seeking engineers to build the private cloud infrastructure that supports our AI research and development.
Responsibilities
- Work in a closely integrated team for HPC system design, deployment, configuration, monitoring, and alerting.
- Configure, maintain, and build upon deployments of industry-standard tools (e.g. Slurm, Kubernetes, Docker, GitLab, Jira).
- Respond to and document submitted support tickets relating to the functionality of various clusters, storage systems, and software solutions.
- Help develop automated tools to collect diagnostic information about issues in job submissions.
Requirements
- Bachelor’s degree in computer science, electrical engineering, or a related field with 3 years of additional equivalent experience, or evidence of exceptional ability related to the position.
- Experience with cluster deployment and operations.
- Experience with GPU and CPU virtualization and workload managers.
- Experience with systems monitoring and alerting (Ganglia, Telegraf, Splunk, etc.)
- Experience with administering job schedulers (Slurm, LSF, etc.)
- Experience with container-based deployment using Docker, Singularity, and Kubernetes.
- Experience with machine learning frameworks and libraries such as H2O, TensorFlow, PyTorch, and scikit-learn is a plus.
- Experience with batch and streaming data processing technologies and pipelines such as Spark, Flink, Google Dataflow, and Dataproc is a plus.
- Experience with leading industry machine learning tools and operations frameworks such as MLflow, Kubeflow, Airflow, Seldon Core, and TensorFlow Serving is a plus.
- Experience in monitoring and performance analysis of machine learning, AI, and GPU server platforms using tool stacks such as Splunk/SignalFx, ELK, Grafana, and Prometheus is a plus.
Job Types: Part-time, Contract
Pay: $60.00 - $80.00 per hour
Benefits:
- Flexible schedule
Experience level:
- 3 years
Schedule:
- Choose your own hours
Education:
- Master's (Required)
Experience:
- Cloud architecture: 3 years (Required)
Work Location: Remote