PHANTAFIELD

Full-stack GPU developer

RTL to training loops — own every layer of the 2D TMD compute stack

Full-time · Santa Clara, CA · Senior / Staff

About PhantaField

PhantaField is advancing humanity with atomic-scale technology. We are building the next generation of AI compute using 2D transition metal dichalcogenide (TMD) semiconductors — a material platform recognized by Intel and TSMC as the leading candidate to replace silicon beyond the 1 nm node. Through our proprietary low-temperature MOCVD process and 3D monolithic integration of DRAM on logic, we're delivering over 10× the memory bandwidth of HBM, solving the core bottleneck in AI acceleration. Founded in Silicon Valley and backed by Professor Chenming Hu — inventor of the FinFET and recipient of the U.S. National Medal of Technology and Innovation — PhantaField uniquely masters both material growth and device fabrication under one roof.

The role

We're looking for a rare engineer who can think in gates and tensors with equal fluency. You'll work across the entire GPU stack — writing RTL for 2D TMD-based compute architectures, defining GPU micro-architecture, building compiler passes, integrating with PyTorch, and training and optimizing transformer models on our hardware. You'll also leverage modern AI agent tooling to accelerate your own workflow and the team's. This is not a "pick one specialty" role. At PhantaField, the person who writes the RTL also watches the training loss curve — because when you're building a new class of semiconductor from the atom up, every layer of the stack is one system.

What you'll do

RTL & HARDWARE

Design and verify GPU compute units, memory controllers, and on-chip interconnects in SystemVerilog/Verilog targeting PhantaField's 2D TMD process. Write cycle-accurate RTL, build testbenches, and work closely with the materials and fabrication teams on device-level constraints unique to TMD-based CMOS. Prototype and validate designs on FPGA before tapeout.
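
For a flavor of the verification side, here is a minimal testbench sketch in cocotb, a Python testbench framework usable in FPGA bring-up; the DUT (a toy multiply-accumulate unit) and its signal names are hypothetical, not a PhantaField design:

    # Hypothetical cocotb testbench for a toy multiply-accumulate unit.
    # Signal names (clk, rst, a, b, acc) are illustrative only.
    import cocotb
    from cocotb.clock import Clock
    from cocotb.triggers import RisingEdge

    @cocotb.test()
    async def mac_accumulates(dut):
        """Drive operand pairs into the MAC and check the running sum."""
        cocotb.start_soon(Clock(dut.clk, 10, units="ns").start())
        dut.rst.value = 1
        await RisingEdge(dut.clk)
        dut.rst.value = 0

        expected = 0
        for a, b in [(2, 3), (4, 5), (1, 7)]:
            dut.a.value = a
            dut.b.value = b
            await RisingEdge(dut.clk)
            expected += a * b

        await RisingEdge(dut.clk)  # let the final product reach the accumulator
        assert dut.acc.value == expected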

ARCHITECTURE

Define GPU micro-architecture trade-offs for AI workloads — tensor cores, sparsity engines, mixed-precision datapaths, and memory hierarchy — optimized to exploit PhantaField's 3D monolithic DRAM-on-logic integration and ultra-high bandwidth memory subsystem. Co-design the ISA with compiler and software teams.
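
One way to reason about those trade-offs is a quick roofline estimate; the sketch below uses placeholder machine numbers, not PhantaField specs:

    # Back-of-envelope roofline check: is a kernel compute- or bandwidth-bound?
    # Both machine numbers are placeholders, not PhantaField specifications.
    peak_flops = 200e12   # sustained FP16 FLOP/s of a hypothetical compute array
    mem_bw = 20e12        # bytes/s from 3D-stacked DRAM (illustrative)

    def limiting_resource(flops, bytes_moved):
        """Classify a kernel by arithmetic intensity against the ridge point."""
        intensity = flops / bytes_moved   # FLOPs per byte moved to/from DRAM
        ridge = peak_flops / mem_bw       # intensity where both limits meet
        return "compute-bound" if intensity > ridge else "bandwidth-bound"

    # Square FP16 GEMM: C[M,N] += A[M,K] @ B[K,N], 2 bytes per element
    M = N = K = 4096
    flops = 2 * M * N * K
    bytes_moved = 2 * (M * K + K * N + M * N)
    print(limiting_resource(flops, bytes_moved))  # "compute-bound" here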

COMPILER

Build and optimize compiler infrastructure that maps high-level compute graphs down to PhantaField's instruction set. Work on kernel scheduling, memory planning, operator fusion, and auto-tuning using LLVM/MLIR, Triton, or equivalent. Bridge the gap between what the architecture exposes and what the framework needs.
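
As a small, concrete instance of operator fusion, here is a sketch of a fused bias-add and ReLU kernel written against Triton's public API (shapes and block size are arbitrary; this is not PhantaField's ISA):

    # Fused bias-add + ReLU in Triton: the intermediate tensor never
    # round-trips through DRAM.
    import torch
    import triton
    import triton.language as tl

    @triton.jit
    def bias_relu_kernel(x_ptr, b_ptr, out_ptr, n, BLOCK: tl.constexpr):
        offs = tl.program_id(0) * BLOCK + tl.arange(0, BLOCK)
        mask = offs < n
        x = tl.load(x_ptr + offs, mask=mask)
        b = tl.load(b_ptr + offs, mask=mask)
        y = tl.maximum(x + b, 0.0)  # both ops in one pass over memory
        tl.store(out_ptr + offs, y, mask=mask)

    def bias_relu(x: torch.Tensor, bias: torch.Tensor) -> torch.Tensor:
        out = torch.empty_like(x)
        n = x.numel()
        grid = (triton.cdiv(n, 1024),)
        bias_relu_kernel[grid](x, bias, out, n, BLOCK=1024)
        return out

On a memory-bound chain like this, fusion cuts DRAM traffic from five tensor-sized passes (two kernels with a materialized intermediate) to three.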

FRAMEWORK

Integrate PhantaField's hardware backend into PyTorch — custom ops, autograd extensions, device plugins. Make training and inference on TMD-based GPUs a first-class experience for ML researchers, with seamless interop with HuggingFace, vLLM, and the broader ecosystem.
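
The autograd-extension part of this follows a standard PyTorch pattern; in the sketch below, torch.ops.phanta_ops.matmul is a hypothetical custom op standing in for a real compiled backend:

    # Standard torch.autograd.Function wiring around a backend matmul.
    # torch.ops.phanta_ops.matmul is hypothetical; the surrounding
    # pattern is stock PyTorch.
    import torch

    class HardwareMatmul(torch.autograd.Function):
        @staticmethod
        def forward(ctx, a, b):
            ctx.save_for_backward(a, b)
            return torch.ops.phanta_ops.matmul(a, b)

        @staticmethod
        def backward(ctx, grad_out):
            a, b = ctx.saved_tensors
            # Textbook matmul gradients, routed through the same backend op.
            grad_a = torch.ops.phanta_ops.matmul(grad_out, b.t())
            grad_b = torch.ops.phanta_ops.matmul(a.t(), grad_out)
            return grad_a, grad_b

    def hw_matmul(a, b):
        return HardwareMatmul.apply(a, b)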

MODELS

Train, fine-tune, and benchmark transformer architectures end-to-end on PhantaField silicon. Profile attention mechanisms, KV caches, quantization strategies, and distributed training at scale. Feed results back into hardware and compiler decisions — closing the loop from training loss to transistor layout.
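
A taste of the budgeting involved: a worked KV-cache size estimate for an illustrative 7B-class model (not a PhantaField target):

    # KV-cache budgeting for a decoder-only transformer. The model shape is
    # illustrative (roughly 7B-class, no grouped-query attention).
    layers, kv_heads, head_dim = 32, 32, 128
    batch, seq_len, dtype_bytes = 8, 4096, 2  # FP16

    # K and V are each cached per layer: layers * batch * seq * heads * dim
    kv_bytes = 2 * layers * batch * seq_len * kv_heads * head_dim * dtype_bytes
    print(f"KV cache: {kv_bytes / 2**30:.1f} GiB")  # 16.0 GiB at these settings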

AI TOOLING

Use and build on state-of-the-art AI agent tools — Claude Code, Cursor, Codex, MCP integrations, agentic workflows — to amplify engineering velocity across the entire stack. Help the team adopt agent-driven development for RTL verification, code generation, design-space exploration, and debugging.
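
As one example, internal flows can be exposed to agents as MCP tools; the sketch below uses the mcp Python SDK's FastMCP helper, with the lint flow itself left as a placeholder:

    # Minimal MCP server exposing an internal tool to coding agents, using
    # the mcp Python SDK's FastMCP helper. The lint logic is a placeholder.
    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP("rtl-tools")

    @mcp.tool()
    def lint_rtl(path: str) -> str:
        """Run the RTL lint flow on a source file and return the report."""
        # Placeholder: shell out to the team's real lint flow here.
        return f"lint report for {path}: clean"

    if __name__ == "__main__":
        mcp.run()  # stdio transport by default, so agents can attach directly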

What we're looking for

Deep fluency in at least two of these layers, and working familiarity with the rest:

Silicon: RTL design in Verilog / SystemVerilog. Experience with synthesis, timing closure, or FPGA prototyping. Understanding of GPU or accelerator micro-architecture. Familiarity with novel semiconductor processes (2D materials, advanced nodes) is a strong plus.

Compiler: LLVM, MLIR, TVM, Triton, or equivalent. Experience lowering compute graphs to hardware-specific instruction sets and optimizing for memory-bound workloads.

Framework: PyTorch internals — custom C++/CUDA extensions, torch.compile, or device backend integration. Familiarity with ONNX, TensorRT, or similar inference runtimes.

Models: Hands-on transformer training at non-trivial scale. Understanding of attention variants, parallelism strategies (TP/PP/DP/EP), and inference optimization (speculative decoding, continuous batching, KV cache management).

AI agents: Practical experience using LLM-based coding agents and building agentic workflows. Comfort with MCP servers, prompt engineering for technical tasks, and evaluating AI-assisted development tools.

5+ years of relevant experience preferred, but we care more about depth of understanding across the stack than years on a resume.

Why PhantaField

You'll be joining on the ground floor of a company building an entirely new class of semiconductor: not iterating on silicon, but replacing it. Work alongside world-class advisors, including the inventor of the FinFET. Ship hardware that runs transformers, and train the transformers that prove the hardware works. Competitive salary, meaningful equity, a research-grade compute budget, and the kind of cross-stack ownership that simply doesn't exist at larger companies. If you want to touch every layer from atoms to attention heads, this is the role.

Apply for this role

PhantaField is an equal opportunity employer. We consider all qualified applicants regardless of race, color, religion, sex, national origin, disability, veteran status, sexual orientation, or gender identity.