This talk presents practical lessons from deploying energy-efficient applications across heterogeneous platforms spanning edge to cloud. We explore hardware/software co-design strategies that balance performance, power, and adaptability under real-world constraints. Through case studies on CPUs, GPUs, and FPGAs, we show how workload characteristics drive architecture selection and mapping decisions. Key insights highlight the role of memory hierarchy, data movement, and precision scaling in achieving high energy efficiency. We present results from an energy-efficient LLM inference serving framework and analyze how dynamic data types, the memory hierarchy, and DSP utilization interact on FPGA platforms. Empirical findings demonstrate substantial improvements in performance per watt. The talk concludes with design guidelines for scalable, energy-aware systems.
