This paper presents an optimized hardware architecture for the inverse quantization and inverse transform (IQ/IT) stage of a High Efficiency Video Coding (HEVC) decoder. The highly parallel, pipelined architecture supports all HEVC Transform Unit (TU) sizes: 4 × 4, 8 × 8, 16 × 16, and 32 × 32. The IQ/IT was described in the VHSIC Hardware Description Language (VHDL) and synthesized for a Xilinx XC7Z020 field-programmable gate array (FPGA) and for a TSMC 180 nm standard-cell library. In the worst case, the architecture sustains a processing rate of 1080p at 33 fps at 146 MHz on the FPGA and 1080p at 25 fps at 110 MHz in standard cells. The architecture was validated on the ZC702 platform in a software/hardware (SW/HW) environment in order to compare different implementation methods (SW and SW/HW) in terms of power consumption and run-time. The experimental results show that the SW/HW designs improve run-time speed by more than 70% relative to the SW solution. In addition, the SW/HW designs reduce power consumption by nearly 60% compared with the SW case.
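As a rough sanity check on the worst-case throughput figures, the sketch below converts each clock frequency and frame rate into a per-frame cycle budget. The function names are hypothetical, and the per-sample figure assumes 4:2:0 chroma subsampling (1.5 samples per pixel) and a 1920 × 1080 frame, neither of which is stated in the text above.

```python
# Back-of-the-envelope cycle budget implied by the reported throughput.
# Assumption (not from the paper): 4:2:0 video, i.e. 1.5 samples per pixel,
# and a frame of exactly 1920 x 1080 pixels.

def cycles_per_frame(clock_hz: float, fps: float) -> float:
    """Clock cycles available to process one frame at the given frame rate."""
    return clock_hz / fps


def cycles_per_sample(clock_hz: float, fps: float,
                      width: int = 1920, height: int = 1080,
                      samples_per_pixel: float = 1.5) -> float:
    """Clock cycles available per luma/chroma sample under the assumptions above."""
    samples = width * height * samples_per_pixel
    return cycles_per_frame(clock_hz, fps) / samples


if __name__ == "__main__":
    for name, clock_hz, fps in [("FPGA", 146e6, 33), ("standard cells", 110e6, 25)]:
        print(f"{name}: {cycles_per_frame(clock_hz, fps):,.0f} cycles/frame, "
              f"{cycles_per_sample(clock_hz, fps):.2f} cycles/sample")
```

Both targets work out to roughly 4.4 million cycles per frame, which is consistent with the same worst-case cycle count being scaled by the achievable clock frequency on each technology.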