NHWC, also known as "channels_last",
NCHW stores each channel as a separate plane, so the data of all channels must be read before a pixel can be computed, and more intermediate storage is needed during the computation. This suits GPU operations: it takes advantage of the GPU's large memory bandwidth and strong parallelism, and the memory-access and control logic stays relatively simple. NHWC stores the channel values of each pixel next to each other, so for an RGB image every three values read yield one complete color pixel, which can be computed immediately. This is more suitable for multi-core CPU operations, where memory bandwidth is relatively small: the computation latency per pixel is low and little temporary space is needed. If reads and computation are overlapped asynchronously to reduce memory-access time, the control logic becomes more complicated, which is also a better fit for the CPU.
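The difference in access pattern can be seen from where the values of a single pixel land in a flat buffer under each layout. The following is a minimal sketch, assuming contiguous row-major storage; the helper names `nchw_offset` and `nhwc_offset` are hypothetical and only serve the illustration.

```python
def nchw_offset(n, c, h, w, C, H, W):
    """Flat index of element (n, c, h, w) in a contiguous NCHW buffer."""
    return ((n * C + c) * H + h) * W + w


def nhwc_offset(n, h, w, c, C, H, W):
    """Flat index of element (n, h, w, c) in a contiguous NHWC buffer."""
    return ((n * H + h) * W + w) * C + c


# A tiny 3-channel image of height 2 and width 2.
C, H, W = 3, 2, 2

# Offsets of the three channel values of the pixel at (h=0, w=1) in image n=0.
nchw_pixel = [nchw_offset(0, c, 0, 1, C, H, W) for c in range(C)]
nhwc_pixel = [nhwc_offset(0, 0, 1, c, C, H, W) for c in range(C)]

print("NCHW offsets:", nchw_pixel)  # one value per channel plane, H*W apart
print("NHWC offsets:", nhwc_pixel)  # three adjacent values
```

In NCHW the three values of one pixel are `H * W` elements apart, one in each channel plane, while in NHWC they sit next to each other, which is why one pixel becomes available after every three values read.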
Conclusion: when training a model on the GPU, the NCHW format is suitable; when doing inference on the CPU, the NHWC format is suitable.
The format to use is determined by the characteristics of the computing hardware. OpenCV is designed to operate on the CPU, so it defaults to the HWC format.
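When an HWC image has to be handed to code that expects NCHW, the axes can be reordered. The sketch below assumes NumPy is available and uses a synthetic array as a stand-in for an image loaded with OpenCV (which returns a NumPy array of shape height x width x channels); the variable names are illustrative only.

```python
import numpy as np

# Stand-in for an OpenCV image: shape (H, W, C), dtype uint8.
H, W, C = 2, 3, 3
hwc = np.arange(H * W * C, dtype=np.uint8).reshape(H, W, C)

# Move the channel axis to the front (HWC -> CHW). transpose() only returns
# a view, so ascontiguousarray() is used to actually rearrange the memory.
chw = np.ascontiguousarray(hwc.transpose(2, 0, 1))

# Add a leading batch axis of size 1 to obtain NCHW.
nchw = chw[np.newaxis, ...]

print(hwc.shape, "->", nchw.shape)
```

The copy made by `np.ascontiguousarray` is what changes the physical layout; without it only the index order changes and the underlying buffer stays in HWC order.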