Senior Software Engineer, Platform
ROLE SUMMARY
Firmus Technologies is seeking a Senior Software Engineer focussing on Platform Engineering to join our Engineering and Technology team. You will drive improvements to our observability capabilities with the goal of achieving ClusterMAX Platinum tier recognition from SemiAnalysis. You will also enhance internal tooling to improve developer and operations productivity. This role is ideal for a self-starter with a passion for building things from first principles: someone who naturally breaks complex problems down into their fundamental truths to uncover novel and elegant solutions, rather than relying on conventional patterns.
KEY RESPONSIBILITIES
- Drive, in collaboration with AI/ML engineers, the development and integration of AI/ML application-level monitoring from the ground up, including model accuracy tracking and performance observability.
- Develop purpose-built Prometheus exporters that provide the granularity needed for robust monitoring of low-level components and the interconnect fabric.
- Build and enhance internal tooling to automate workflows, improve developer and operations productivity, and streamline platform operations (e.g., dashboards, CLI tools, automation scripts, self-service portals).
- Continuously improve automated test coverage and effectiveness by adopting new testing frameworks, tools, and best practices.
- Own net-new product experiments (e.g., VR with Meta Quest), driving innovation from concept through production deployment to broad adoption.
- Contribute to the adoption and integration of AI-augmented development tools and workflows.
SKILLS AND EXPERIENCE
- Bachelor's degree in computer science or a related technical field.
- 7+ years of experience as a Software Engineer, including at least 3 years in a dedicated Platform/Observability engineering role.
- Demonstrated strong proficiency in the following areas:
  - Modern application development frameworks and languages (e.g., Go, Python, Node.js).
  - Advanced querying and query optimization using SQL, PromQL, LogQL and GraphQL.
  - Observability stacks (e.g., Loki, Grafana, Tempo, Prometheus, Thanos, ClickHouse).
  - Data streaming (e.g., Kafka, Pulsar).
  - Automated unit, integration, security, load and end-to-end testing frameworks (e.g., Pytest, JUnit, k6, Go test, Cypress), and integrating tests into CI/CD pipelines.
  - Cloud platforms (e.g., AWS, Azure or GCP).
  - Containerization technologies (e.g., Docker).
- Experience with AI-augmented development tools and workflows.
- Working knowledge of configuration management and CI/CD tooling (e.g., Ansible, GitHub Actions, Jenkins, ArgoCD).
- Clear and effective English communication, written and spoken.
- Bonus points:
  - Familiarity with Linux internals, networking stacks, distributed storage and high-performance computing.
  - Experience in high-growth startups or in regulated industries with robust security and data privacy requirements (e.g., SOC 2 Type 2, ISO 27001).
Firmus Technologies is a global leader pioneering the solution to AI’s energy challenge, founded in Australia in 2019 by a visionary team of entrepreneurs and engineers passionate about sustainable computing infrastructure.
Firmus builds and operates AI infrastructure across Asia-Pacific, utilising its proprietary AI Factory platform to deliver transformative, cost-effective GPU clusters and AI cloud services for developer, enterprise, education and government users.
We are committed to building a diverse and inclusive workplace. We encourage applications from candidates of all backgrounds who are passionate about creating a more sustainable future through innovative engineering solutions.
Join us in our mission to revolutionise the AI industry through sustainable practices and cutting-edge engineering.