H-1B Job Board

Software Engineer, Machine Learning

Fireworks AI

Software Engineering
New York, NY, USA · San Francisco, CA, USA
Posted on Mar 13, 2025

About Us:

Here at Fireworks, we’re building the future of generative AI infrastructure. Fireworks offers the generative AI platform with the highest-quality models and the fastest, most scalable inference. We’ve been independently benchmarked to have the fastest LLM inference and have been getting great traction with innovative research projects, like our own function calling and multi-modal models. Fireworks is funded by top investors, like Benchmark and Sequoia, and we’re an ambitious, fun team composed primarily of veterans from PyTorch and Google Vertex AI.

The Role:

We’re looking for a Machine Learning Software Engineer to develop the core systems and APIs that power LLM inference and fine-tuning at scale. You will work on high-performance, distributed systems that optimize model execution, ensuring low-latency inference and efficient fine-tuning pipelines. As part of our engineering team, you’ll collaborate with researchers and infrastructure engineers to bring cutting-edge AI capabilities to production, delivering robust and scalable solutions for real-world applications.

Key Responsibilities:

  • Develop and optimize ML inference systems that enable fast, scalable model execution.

  • Build APIs for LLM inference and fine-tuning, ensuring reliability, efficiency, and performance.

  • Design distributed systems for serving large-scale AI workloads, ensuring fault tolerance and scalability.

  • Analyze and improve the efficiency, scalability, and stability of system resources.

  • Troubleshoot performance bottlenecks and optimize inference runtime for diverse hardware environments.

Minimum Qualifications:

  • Bachelor’s degree in Computer Science or equivalent industry experience.

  • 5+ years of experience writing high-performance, production-grade code.

  • Hands-on experience with LLM inference frameworks (e.g., vLLM, TensorRT, SGLang, Triton).

  • Deep understanding of distributed systems, including storage, search, and large-scale computation.

  • Strong background in systems programming, performance optimization, and parallel computing.

Preferred Qualifications:

  • Master’s or PhD in Computer Science, Machine Learning, or a related field.

  • Experience in fine-tuning large models and optimizing training pipelines.

  • Proficiency in hardware acceleration technologies such as CUDA, TensorRT, and TPUs.

Why Fireworks AI?

  • Solve Hard Problems: Tackle challenges at the forefront of AI infrastructure, from low-latency inference to scalable model serving.

  • Build What’s Next: Work with bleeding-edge technology that impacts how businesses and developers harness AI globally.

  • Ownership & Impact: Join a fast-growing, passionate team where your work directly shapes the future of AI—no bureaucracy, just results.

  • Learn from the Best: Collaborate with world-class engineers and AI researchers who thrive on curiosity and innovation.

Fireworks AI is an equal-opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all innovators.