
How to measure FastCV HAL vs OpenCV performance

Compile the build with either of the following options:
  • To enable FastCV acceleration, use the -DWITH_FASTCV=ON option
  • To enable default OpenCV on CPU, use the -DWITH_FASTCV=OFF option
Note: By default, OpenCV acceleration with FastCV is enabled.
Once compilation is done, flash the build and boot the device. All libraries are present in the /usr/lib/ directory.
  1. Copy the test bins to /usr/bin to run the tests.
    scp -r opencv_perf_core root@[IP-address]:/usr/bin/
    
  2. Copy the test data to the device if it has not been copied already. To obtain the test data, clone the project at https://github.com/opencv/opencv_extra/tree/4.13.0. Use the scp command to push the test data to the desired location on the device. For example:
    scp -r [file] root@[IP-ADDR]:/tmp
    
  3. To start the test on the target device, run the following commands.
    cd /usr/bin/
    chmod 777 opencv_perf*
    export OPENCV_OPENCL_RUNTIME=disabled && export OPENCV_TEST_DATA_PATH=/tmp && /usr/bin/opencv_perf_core --gtest_filter=ArithmMixedTest.subtract/2 --perf_min_samples=100 --perf_force_samples=100 >> results_Arithm.txt
    
The above commands run different test cases for the subtract API, with each test case looped 100 times. The results are collected in the results_Arithm.txt text file, which includes details for each test case: the test name, number of samples, resolution, mean time, and pass/fail status.

For the following test case with FastCV acceleration enabled, the total time taken was 16 ms.
[Figure: FastCV performance results]

With the default OpenCV, for the same test case, the total time taken was 23 ms.
[Figure: OpenCV performance results]

Run other test cases and compare the latency numbers between default OpenCV and FastCV-accelerated OpenCV.

Supported OpenCV APIs and corresponding FastCV APIs

Supported OpenCV APIs
OpenCV module | OpenCV API | Underlying FastCV API for OpenCV acceleration
IMGPROC | medianBlur | fcvFilterMedian3x3u8_v3
IMGPROC | sobel | fcvFilterSobel3x3u8s16
IMGPROC | sobel | fcvFilterSobel5x5u8s16
IMGPROC | sobel | fcvFilterSobel7x7u8s16
IMGPROC | boxFilter | fcvBoxFilter3x3u8_v3
IMGPROC | boxFilter | fcvBoxFilter5x5u8_v2
IMGPROC | adaptiveThreshold | fcvAdaptiveThresholdGaussian3x3u8_v2
IMGPROC | adaptiveThreshold | fcvAdaptiveThresholdGaussian5x5u8_v2
IMGPROC | adaptiveThreshold | fcvAdaptiveThresholdMean3x3u8_v2
IMGPROC | adaptiveThreshold | fcvAdaptiveThresholdMean5x5u8_v2
IMGPROC | subtract | fcvImageDiffu8f32_v2
IMGPROC | pyrDown | fcvPyramidCreateu8_v4
IMGPROC | cvtColor | fcvColorRGB888toYCrCbu8_v3
IMGPROC | cvtColor | fcvColorRGB888ToHSV888u8
IMGPROC | GaussianBlur | fcvFilterGaussian5x5u8_v3
IMGPROC | GaussianBlur | fcvFilterGaussian3x3u8_v4
IMGPROC | cvWarpPerspective | fcvWarpPerspectiveu8_v5
IMGPROC | Canny | fcvFilterCannyu8
IMGPROC | boxFilter | fcvBoxFilterNxNf32
CORE | lut | fcvTableLookupu8
CORE | norm | fcvHammingDistanceu8
CORE | multiply | fcvElementMultiplyu8u16_v2
CORE | multiply | fcvElementMultiplyu8
CORE | multiply | fcvElementMultiplys16
CORE | multiply | fcvElementMultiplyf32
CORE | transpose | fcvTransposeu8_v2
CORE | transpose | fcvTransposeu16_v2
CORE | transpose | fcvTransposef32_v2
CORE | meanStdDev | fcvImageIntensityStats_v2
CORE | flip | fcvFlipu8
CORE | flip | fcvFlipu16
CORE | flip | fcvFlipRGB888u8
CORE | rotate | fcvRotateImageu8
CORE | rotate | fcvRotateImageInterleavedu8
CORE | addWeighted | fcvAddWeightedu8_v2
CORE | SVD | fcvSVDf32_v2
CORE | Gemm | fcvMatrixMultiplyf32_v2
CORE | Gemm | fcvMultiplyScalarf32
CORE | Gemm | fcvAddf32_v2

FastCV CPU extension
OpenCV extension API | FastCV API used | Description
arithmetic_op | fcvAddu8 | Matrix addition of two uint8_t type matrices to one uint8_t matrix.
arithmetic_op | fcvAddf32_v2 | Matrix addition of two float32_t type matrices.
arithmetic_op | fcvAdds16_v2 | Matrix addition of two int16_t type matrices, which allows in-place operation.
arithmetic_op | fcvSubtracts16 | Matrix subtraction of two int16_t type matrices.
arithmetic_op | fcvSubtractu8 | Matrix subtraction of two uint8_t type matrices.
bilateralFilter | fcvBilateralFilter5x5u8_v3 | Bilateral smoothing with 5x5 bilateral kernel.
bilateralFilter | fcvBilateralFilter7x7u8_v3 | Bilateral smoothing with 7x7 bilateral kernel.
bilateralFilter | fcvBilateralFilter9x9u8_v3 | Bilateral smoothing with 9x9 bilateral kernel.
bilateralRecursive | fcvBilateralFilterRecursiveu8 | Recursive bilateral filtering; the smoothing is performed in the gradient domain.
buildPyramid | fcvPyramidAllocate_v3 | Allocates memory for an image pyramid. This API can be removed without notice and should only be used for testing.
buildPyramid | fcvPyramidCreateu8_v4 | Creates an image pyramid from an 8-bit unsigned (grayscale) source image.
buildPyramid | fcvPyramidDelete_v2 | Deallocates an array of fcvPyramidLevel. Can be used for any type (f32/s8/u8).
calcHist | fcvImageIntensityHistogram | Creates a histogram of intensities for a rectangular region of a grayscale image.
clusterEuclidean | fcvClusterEuclideanu8 | General function for computing cluster centers and cluster bindings.
cvtColor | fcvColorYCbCr420PseudoPlanarToYCbCr444PseudoPlanaru8 | Color conversion from pseudo-planar YCbCr420 to pseudo-planar YCbCr444.
cvtColor | fcvColorYCbCr420PseudoPlanarToYCbCr422PseudoPlanaru8 | Color conversion from pseudo-planar YCbCr420 to pseudo-planar YCbCr422.
cvtColor | fcvColorYCbCr422PseudoPlanarToYCbCr444PseudoPlanaru8 | Color conversion from pseudo-planar YCbCr422 to pseudo-planar YCbCr444.
cvtColor | fcvColorYCbCr422PseudoPlanarToYCbCr420PseudoPlanaru8 | Color conversion from pseudo-planar YCbCr422 to pseudo-planar YCbCr420.
cvtColor | fcvColorYCbCr444PseudoPlanarToYCbCr422PseudoPlanaru8 | Color conversion from pseudo-planar YCbCr444 to pseudo-planar YCbCr422.
cvtColor | fcvColorYCbCr444PseudoPlanarToYCbCr420PseudoPlanaru8 | Color conversion from pseudo-planar YCbCr444 to pseudo-planar YCbCr420.
cvtColor | fcvColorRGB565ToYCbCr444PseudoPlanaru8 | Color conversion from RGB565 to pseudo-planar YCbCr444.
cvtColor | fcvColorRGB565ToYCbCr422PseudoPlanaru8 | Color conversion from RGB565 to pseudo-planar YCbCr422.
cvtColor | fcvColorRGB565ToYCbCr420PseudoPlanaru8 | Color conversion from RGB565 to pseudo-planar YCbCr420.
cvtColor | fcvColorRGB888ToYCbCr444PseudoPlanaru8 | Color conversion from RGB888 to pseudo-planar YCbCr444.
cvtColor | fcvColorRGB888ToYCbCr422PseudoPlanaru8 | Color conversion from RGB888 to pseudo-planar YCbCr422.
cvtColor | fcvColorRGB888ToYCbCr420PseudoPlanaru8 | Color conversion from RGB888 to pseudo-planar YCbCr420.
cvtColor | fcvColorYCbCr420PseudoPlanarToRGB565u8 | Color conversion from pseudo-planar YCbCr420 to RGB565.
cvtColor | fcvColorYCbCr422PseudoPlanarToRGB565u8 | Color conversion from pseudo-planar YCbCr422 to RGB565.
cvtColor | fcvColorYCbCr422PseudoPlanarToRGB888u8 | Color conversion from pseudo-planar YCbCr422 to RGB888.
cvtColor | fcvColorYCbCr422PseudoPlanarToRGBA8888u8 | Color conversion from pseudo-planar YCbCr422 to RGBA8888.
cvtColor | fcvColorYCbCr444PseudoPlanarToRGB565u8 | Color conversion from pseudo-planar YCbCr444 to RGB565.
cvtColor | fcvColorYCbCr444PseudoPlanarToRGB888u8 | Color conversion from pseudo-planar YCbCr444 to RGB888.
DCT | fcvDCTu8 | Performs forward discrete cosine transform on uint8_t pixels.
FAST10 | fcvCornerFast10InMaskScoreu8 | Extracts FAST corners and scores from the image based on the mask.
FAST10 | fcvCornerFast10InMasku8 | Extracts FAST corners from the image.
FAST10 | fcvCornerFast10Scoreu8 | Extracts FAST corners and scores from the image.
FAST10 | fcvCornerFast10u8 | Extracts FAST corners from the image.
FFT | fcvFFTu8 | Computes the 1D or 2D Fast Fourier Transform of a real-valued matrix.
fillConvexPoly | fcvFillConvexPolyu8 | Fills the interior of a convex polygon with the specified color.
filter2D | fcvFilterCorrNxNu8 | NxN correlation with non-separable kernel. Border values are ignored in this function.
filter2D | fcvFilterCorrNxNu8s16 | NxN correlation with non-separable kernel. Border values are ignored in this function.
filter2D | fcvFilterCorrNxNu8f32 | NxN correlation with non-separable kernel. Border values are ignored in this function.
gaussianBlur | fcvFilterGaussian3x3u8_v4 | Blurs an image with 3x3 Gaussian filter, with the border handling scheme specified by the user.
gaussianBlur | fcvFilterGaussian5x5u8_v3 | Blurs an image with 5x5 Gaussian filter.
gaussianBlur | fcvFilterGaussian5x5s16_v3 | Blurs an image with 5x5 Gaussian filter.
gaussianBlur | fcvFilterGaussian5x5s32_v3 | Blurs an image with 5x5 Gaussian filter.
gaussianBlur | fcvFilterGaussian11x11u8_v2 | Blurs an image with 11x11 Gaussian filter.
houghLines | fcvHoughLineu8 | Performs Hough line detection.
iDCT | fcvIDCTs16 | Performs inverse discrete cosine transform on int16_t coefficients.
IFFT | fcvIFFTf32 | Computes the 1D or 2D Inverse Fast Fourier Transform of a complex-valued matrix.
integrateImageYUV | fcvIntegrateImageYCbCr420PseudoPlanaru8 | Calculates the integral images of a YCbCr420 image, where the input YCbCr420 has UV interleaved.
matmuls8s32 | fcvMatrixMultiplys8s32 | Matrix multiplication of two int8_t type matrices.
meanShift | fcvMeanShiftu8 | Applies the meanshift procedure and obtains the final converged position. Source image must be an 8-bit grayscale image.
meanShift | fcvMeanShifts32 | Applies the meanshift procedure and obtains the final converged position. Source image must be a 32-bit integer grayscale image.
meanShift | fcvMeanShiftf32 | Applies the meanshift procedure and obtains the final converged position. Source image must be a 32-bit float grayscale image.
Merge | fcvChannelCombine2Planesu8 | Combines two channels in an interleaved fashion.
Merge | fcvChannelCombine3Planesu8 | Combines three channels in an interleaved fashion.
Merge | fcvChannelCombine4Planesu8 | Combines four channels in an interleaved fashion.
moments | fcvImageMomentsu8 | Computes weighted average (moment) of the image pixels’ intensities. Input must be an 8-bit image.
moments | fcvImageMomentss32 | Computes weighted average (moment) of the image pixels’ intensities. Input must be of data type int32_t.
moments | fcvImageMomentsf32 | Computes weighted average (moment) of the image pixels’ intensities. Input must be of data type float32_t.
NormalizeLocalBox | fcvNormalizeLocalBoxu8 | Calculates the local subtractive and contrastive normalization of the image.
NormalizeLocalBox | fcvNormalizeLocalBoxf32 | Calculates the local subtractive and contrastive normalization of the image.
remap | fcvRemapu8_v2 | Applies a generic geometrical transformation to a grayscale CV_8UC1 image.
remapRGBA | fcvRemapRGBA8888BLu8 | Applies a generic geometrical transformation to a 4-channel CV_8UC4 image with bilinear interpolation.
remapRGBA | fcvRemapRGBA8888NNu8 | Applies a generic geometrical transformation to a 4-channel CV_8UC4 image with nearest-neighbor interpolation.
resizeDownBy2 | fcvScaleDownBy2u8_v2 | Downscales the image by averaging each 2x2 pixel block.
resizeDownBy4 | fcvScaleDownBy4u8_v2 | Downscales the image by averaging each 4x4 pixel block.
ResizeDown | fcvScaleDownMNu8 | Image downscaling using the MN method.
ResizeDown | fcvScaleDownMNInterleaveu8 | Interleaved image downscaling using the MN method.
runMSER | fcvMserInit | Initializes MSER.
runMSER | fcvMserNN8Init | Initializes 8-neighbor MSER.
runMSER | fcvMserExtu8_v3 | Invokes MSER with a smaller memory footprint, the (optional) output of contour bounding boxes, and additional information.
runMSER | fcvMserExtNN8u8 | Invokes 8-neighbor MSER, with additional outputs for each contour.
runMSER | fcvMserNN8u8 | Invokes 8-neighbor MSER.
runMSER | fcvMserRelease | Releases MSER resources.
sepFilter2D | fcvFilterCorrSepMxNu8 | MxN correlation with separable kernel.
sepFilter2D | fcvFilterCorrSep9x9s16_v2 | 9x9 FIR filter (convolution) with separable kernel.
sepFilter2D | fcvFilterCorrSep11x11s16_v2 | 11x11 FIR filter (convolution) with separable kernel.
sepFilter2D | fcvFilterCorrSep13x13s16_v2 | 13x13 correlation with separable kernel.
sepFilter2D | fcvFilterCorrSep15x15s16_v2 | 15x15 correlation with separable kernel.
sepFilter2D | fcvFilterCorrSep17x17s16_v2 | 17x17 correlation with separable kernel.
sepFilter2D | fcvFilterCorrSepNxNs16 | NxN correlation with separable kernel.
sobel | fcvFilterSobel3x3u8_v2 | 3x3 Sobel edge filter.
sobel | fcvFilterSobel3x3u8s16 | Creates a 2D gradient image from source luminance data without normalization. Convolution with the 3x3 Sobel kernel.
sobel | fcvFilterSobel5x5u8s16 | Creates a 2D gradient image from source luminance data without normalization. Convolution with the 5x5 Sobel kernel.
sobel | fcvFilterSobel7x7u8s16 | Creates a 2D gradient image from source luminance data without normalization. Convolution with the 7x7 Sobel kernel.
sobelPyramid | fcvPyramidAllocate | Allocates memory for a pyramid.
sobelPyramid | fcvPyramidAllocate_v2 | Allocates memory for a pyramid.
sobelPyramid | fcvPyramidAllocate_v3 | Allocates memory for a pyramid.
sobelPyramid | fcvPyramidSobelGradientCreatei8 | Creates a gradient pyramid of int8_t from an image pyramid of uint8_t.
sobelPyramid | fcvPyramidSobelGradientCreatei16 | Creates a gradient pyramid of int16_t from an image pyramid of uint8_t.
sobelPyramid | fcvPyramidSobelGradientCreatef32 | Creates a gradient pyramid of float32_t from an image pyramid of uint8_t.
sobelPyramid | fcvPyramidDelete | Deallocates an array of fcvPyramidLevel. Can be used for any type (f32/s8/u8).
sobelPyramid | fcvPyramidDelete_v2 | Deallocates an array of fcvPyramidLevel. Can be used for any type (f32/s8/u8).
sobelPyramid | fcvPyramidCreatef32_v2 | Builds an image pyramid (with stride). Memory should be deallocated using fcvPyramidDelete_v2.
sobelPyramid | fcvPyramidCreateu8_v4 | Builds a Gaussian image pyramid.
sobel3x3u8 | fcvImageGradientSobelPlanars8_v2 | Creates a 2D gradient image from source luminance data using a 3x3 neighborhood with the Sobel kernel.
sobel3x3u9 | fcvImageGradientSobelPlanars16_v2 | Creates a 2D gradient image from source luminance data using a 3x3 neighborhood with the Sobel kernel.
sobel3x3u10 | fcvImageGradientSobelPlanars16_v3 | Creates a 2D gradient image from source luminance data using a 3x3 neighborhood with the Sobel kernel.
sobel3x3u11 | fcvImageGradientSobelPlanarf32_v2 | Creates a 2D gradient image from source luminance data using a 3x3 neighborhood with the Sobel kernel.
sobel3x3u12 | fcvImageGradientSobelPlanarf32_v3 | Creates a 2D gradient image from source luminance data using a 3x3 neighborhood with the Sobel kernel.
split | fcvDeinterleaveu8 | Performs image deinterleave for unsigned byte data.
split | fcvChannelExtractu8 | Extracts a channel as a single uint8_t type plane from an interleaved or multi-planar image format.
thresholdRange | fcvFilterThresholdRangeu8_v2 | Binarizes a grayscale image based on a pair of threshold values.
trackOpticalFlowLK | fcvTrackLKOpticalFlowu8_v3 | Optical flow (with stride so ROI can be supported).
trackOpticalFlowLK | fcvTrackLKOpticalFlowu8 | Optical flow. Bitwidth-optimized implementation.
warpAffine | fcvTransformAffineClippedu8_v3 | Applies an affine transformation on a grayscale image using a 2x3 matrix.
warpAffine3Plane | fcv3ChannelTransformAffineClippedBCu8 | Applies an affine transformation on a 3-color-channel image using a 2x3 matrix with bicubic interpolation.
warpPatchAffine | fcvTransformAffineu8_v2 | Warps the patch centered at nPos in the input image using the affine transform in nAffine.
warpPerspective | fcvWarpPerspectiveu8_v5 | Warps a grayscale image using a perspective projection transformation matrix (also known as a homography).
warpPerspective2Plane | fcv2PlaneWarpPerspectiveu8 | Perspective warp of two images using the same transformation.

FastCV QDSP extensions
OpenCV extension API | FastCV API used | Description
Canny | fcvFilterCannyu8Q | Canny edge detection with more algorithm configuration controls.
fcvdspinit | fcvQ6Init | Initializes the FastCV DSP environment.
fcvdspdeinit | fcvQ6DeInit | Deinitializes the FastCV DSP environment.
FFT | fcvFFTu8Q | Computes the 1D or 2D Fast Fourier Transform of a real-valued matrix.
filter2D | fcvFilterCorr3x3s8_v2Q | 3x3 correlation with non-separable kernel.
filter2D | fcvFilterCorrNxNu8Q | NxN correlation with non-separable kernel. Border values are ignored in this function.
filter2D | fcvFilterCorrNxNu8s16Q | NxN correlation with non-separable kernel. Border values are ignored in this function.
filter2D | fcvFilterCorrNxNu8f32Q | NxN correlation with non-separable kernel. Border values are ignored in this function.
IFFT | fcvIFFTf32Q | Computes the 1D or 2D Inverse Fast Fourier Transform of a complex-valued matrix.
sumOfAbsoluteDiffs | fcvSumOfAbsoluteDiffs8x8u8_v2Q | Sum of absolute differences of an image against an 8x8 template.
thresholdOtsu | fcvFilterThresholdOtsuu8Q | Binarizes a grayscale image using Otsu’s method.
For FastCV extension details, see the extension’s documentation.

Enable or disable FastCV acceleration

Enable: To enable FastCV HAL acceleration, include -DWITH_FASTCV=ON in the EXTRA_OECMAKE options of the OpenCV BitBake file, as shown below. This flag compiles the OpenCV APIs with the FastCV HAL.
DEPENDS:qcom-custom-bsp += "qcom-fastcv-binaries"

EXTRA_OECMAKE += "-DOPENCV_ALLOW_DOWNLOADS=ON"
EXTRA_OECMAKE:append:qcom-custom-bsp = " -DWITH_FASTCV=ON "
#python () {
Disable: To disable FastCV HAL acceleration, include -DWITH_FASTCV=OFF in the EXTRA_OECMAKE options of the OpenCV BitBake file, as shown below, and then recompile the OpenCV recipe using the devtool method.
DEPENDS:qcom-custom-bsp += "qcom-fastcv-binaries"

EXTRA_OECMAKE:append:qcom-custom-bsp = " -DWITH_FASTCV=OFF "
#python () {
#    bsp_type = d.getVar('BSP_TYPE')
The following shows how this flag is handled in the CMakeLists file (opencv/3rdparty/fastcv/CMakeLists.txt):
if(NOT WITH_FASTCV OR NOT FASTCV_DIR)
   message(STATUS "FastCV is not available, disabling related HAL and stuff")
   return()
endif()

if(NOT ANDROID AND NOT UNIX)
   message(FATAL_ERROR "FastCV HAL supports Android and UNIX only!")
endif()

set(OPENCV_3P_FASTCV_DIR ${CMAKE_CURRENT_SOURCE_DIR})
add_subdirectory(hal)
The following sample, from opencv/3rdparty/fastcv/src/fastcv_hal_core.cpp, is one of the FastCV HAL API implementations built on FastCV APIs:
int fastcv_hal_sub8u32f(
    const uchar*    src1_data,
    size_t          src1_step,
    const uchar*    src2_data,
    size_t          src2_step,
    float*          dst_data,
    size_t          dst_step,
    int             width,
    int             height)
{
    INITIALIZATION_CHECK;

    fcvStatus status = FASTCV_SUCCESS;

    if (src1_step < width && src2_step < width)
    {
       src1_step = width*sizeof(uchar);
       src2_step = width*sizeof(uchar);
       dst_step  = width*sizeof(float);
    }

    status = fcvImageDiffu8f32_v2(src1_data, src2_data, width, height, src1_step,
                                  src2_step, dst_data, dst_step);

    CV_HAL_RETURN(status,hal_subtract);
}