SoC Performance Modeling & Architecture Engineer
AMD · Bangalore · 5+ yrs experience · Posted 2026-06-19
Tech stack: C, C++, Python
About the role
We are seeking a highly skilled and motivated SoC Performance Modeling & Architecture Engineer to drive the definition, analysis, and optimization of our next-generation Systems-on-Chip (SoCs). In this role, you will be responsible for the end-to-end performance lifecycle, from workload analysis and microarchitectural modeling to post-silicon correlation. You will evaluate complex architectural trade-offs across CPU, GPU, NPU/ML accelerator, memory, and Network-on-Chip (NoC) subsystems to maximize performance-per-watt and shape the roadmap for future silicon.
Responsibilities:
- 1. Performance Modeling & Simulation
- Leverage and Maintain Cycle-Approximate Models: Use and maintain execution-driven and trace-driven SoC performance simulators (e.g., written in C++, SystemC, or Python).
- Explore Architectural Space: Construct scalable models to project the performance of future SoC configurations and validate architectural concepts before RTL freeze.
- Hardware/Software Co-Design: Model the interaction between hardware subsystems and low-level software stacks to identify system-wide bottlenecks.
- 2. Workload Analysis & Benchmarking
- Characterize Workloads: Profile and analyze industry-standard benchmarks and real-world use cases across CPU (e.g., SPEC CPU), GPU (e.g., 3DMark, GFXBench), and Machine Learning/AI (e.g., MLPerf, LLM inference, CNNs).
- Trace Generation: Capture and manipulate instruction and memory traces from both pre-silicon environments and post-silicon hardware.
- 3. Subsystem & Interconnect Analysis
- NoC & Fabric Optimization: Evaluate Network-on-Chip (NoC) topologies, routing algorithms, arbitration schemes, and bandwidth/latency characteristics under heavy multi-master workloads.
- Memory Hierarchy Evaluation: Analyze the impact of cache hierarchies (L1/L2/System Cache), memory controllers, and various memory technologies (e.g., LPDDR5/6, HBM) on system performance.
- 4. PPA (Power, Performance, Area) Trade-off Studies
- Multi-Domain Trade-offs: Evaluate complex trade-offs when scaling or configuring different combinations of CPU cores, GPU compute units, and ML accelerators.
Required Qualifications & Skills:
- BS/MS in Computer Engineering, Electrical Engineering, Computer Science, or a related field.
- Experience: 5+ years of industry experience in SoC/Processor architecture and performance modeling.
- Programming: Strong proficiency in C/C++ and Python.
- Architecture Knowledge: Deep understanding of modern computer architecture, including pipelining, out-of-order execution, cache coherence protocols, and virtual memory.
- Modeling Frameworks: Direct experience with performance modeling frameworks (e.g., gem5, SystemC, or proprietary cycle-accurate/cycle-approximate simulators).
- Hands-on experience with post-silicon debug tools and reading hardware performance counters (e.g., ARM PMU, Intel VTune, or internal test chips).
- Familiarity with ML frameworks (PyTorch, TensorFlow) and compiling/mapping models to specialized hardware accelerators.
- Experience utilizing data visualization tools to present complex performance metrics cleanly to cross-functional stakeholders.