Evaluate FastCV acceleration - Qualcomm Dragonwing Documentation

How to measure FastCV HAL vs OpenCV performance

Compile the build with either of the following options

To enable FastCV acceleration, use the -DWITH_FASTCV=ON option
To enable default OpenCV on CPU, use the -DWITH_FASTCV=OFF option

Note: By default, OpenCV acceleration with FastCV is enabled.

Once compilation is done, flash the build and boot the device. All libraries are present in the /usr/lib/ directory.

Copy the test binaries to /usr/bin to run the tests.

scp -r opencv_perf_core root@[IP-address]:/usr/bin/

Copy the test data to the device if it has not been copied already. To obtain the test data, clone the opencv_extra repository at https://github.com/opencv/opencv_extra/tree/4.13.0. Use the scp command to push the test data to the desired location on the target device. For example:
```
scp -r [file] root@[IP-ADDR]:/tmp
```
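As a rough sketch of the step above, the following script fetches the test data and copies it to the device. It assumes the 4.13.0 tag referenced in the URL above, a placeholder device address (192.0.2.10), and /tmp as the destination. The run and fetch_and_push_testdata helper names are hypothetical. By default the script only prints the commands; set DRY_RUN=0 to execute them.

```shell
# Echo commands instead of executing them unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Clone opencv_extra at the 4.13.0 tag and copy its testdata directory
# to the device. The device address and destination are placeholders.
fetch_and_push_testdata() {
    device_ip=$1
    dest=${2:-/tmp}
    run git clone --depth 1 --branch 4.13.0 https://github.com/opencv/opencv_extra.git
    run scp -r opencv_extra/testdata "root@${device_ip}:${dest}"
}

fetch_and_push_testdata 192.0.2.10
```

With this layout the data lands in a testdata directory under the destination, so OPENCV_TEST_DATA_PATH would need to point at that directory.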

To start the test on the target device, run the following commands.

cd /usr/bin/
chmod 777 opencv_perf*
export OPENCV_OPENCL_RUNTIME=disabled && export OPENCV_TEST_DATA_PATH=/tmp && /usr/bin/opencv_perf_core --gtest_filter=ArithmMixedTest.subtract/2 --perf_min_samples=100 --perf_force_samples=100 >> results_Arithm.txt

The above commands run test cases for the subtract API, collecting 100 samples for each test case, and append the output to the results_Arithm.txt text file. For each test case, results_Arithm.txt records the test name, number of samples, resolution, mean time, and pass/fail status. For this test case with FastCV acceleration enabled, the total time taken was 16 ms.

With the default OpenCV, for the same test case, the total time taken was 23 ms.
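To put the two numbers side by side, a small helper can compute the speedup and the percentage reduction in latency. This is a minimal sketch that assumes awk is available where the results are analyzed; the compare_latency name is hypothetical.

```shell
# Print speedup (baseline / accelerated) and percent latency reduction.
# Arguments: baseline time, accelerated time (same unit, for example ms).
compare_latency() {
    awk -v base="$1" -v accel="$2" 'BEGIN {
        printf "speedup=%.2fx reduction=%.1f%%\n",
               base / accel, (base - accel) / base * 100
    }'
}

compare_latency 23 16   # prints: speedup=1.44x reduction=30.4%
```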

Run other test cases and compare the latency numbers between default OpenCV and FastCV accelerated OpenCV.
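One way to repeat the measurement for other filters is sketched below. The helper names (run_perf_filter, perfstat_mean) and the PERF_BIN and RESULTS_DIR overrides are hypothetical. The parser also assumes that the perf binary prints one "[ PERFSTAT ]" line per test case containing a mean= field; check an actual results file and adjust the pattern if the output differs.

```shell
# Run one perf filter and append its output to a per-filter results file.
# PERF_BIN and RESULTS_DIR are hypothetical overrides; the defaults mirror
# the paths used in the commands above.
run_perf_filter() {
    filter=$1
    samples=${2:-100}
    bin=${PERF_BIN:-/usr/bin/opencv_perf_core}
    out_dir=${RESULTS_DIR:-.}
    # "ArithmMixedTest.subtract/2" -> "results_ArithmMixedTest.subtract_2.txt"
    out_file="$out_dir/results_$(printf '%s' "$filter" | tr '/*:' '___').txt"
    OPENCV_OPENCL_RUNTIME=disabled \
    OPENCV_TEST_DATA_PATH="${OPENCV_TEST_DATA_PATH:-/tmp}" \
        "$bin" "--gtest_filter=$filter" \
               "--perf_min_samples=$samples" \
               "--perf_force_samples=$samples" >> "$out_file"
    echo "$out_file"
}

# Average the mean= values found on PERFSTAT lines of a results file.
# Assumed line shape (not taken from the original document):
#   [ PERFSTAT ]    (samples=100   mean=16.00   median=15.90   ...)
perfstat_mean() {
    awk '/PERFSTAT/ {
             for (i = 1; i <= NF; i++)
                 if ($i ~ /^mean=/) { sum += substr($i, 6); n++ }
         }
         END { if (n) printf "%.2f\n", sum / n; else print "n/a" }' "$1"
}
```

On the device, a call such as run_perf_filter 'ArithmMixedTest.subtract/2' reproduces the earlier command and prints the name of the results file. Running opencv_perf_core with the standard GoogleTest flag --gtest_list_tests lists other test names that can be passed as filters. Running the same filter once per build and passing each results file to perfstat_mean gives two numbers to compare.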

Supported OpenCV APIs and corresponding FastCV APIs

Supported OpenCV APIs

OpenCV module	OpenCV API	Underlying FastCV API for OpenCV acceleration
IMGPROC	medianBlur	fcvFilterMedian3x3u8_v3
	Sobel	fcvFilterSobel3x3u8s16
	Sobel	fcvFilterSobel5x5u8s16
	Sobel	fcvFilterSobel7x7u8s16
	boxFilter	fcvBoxFilter3x3u8_v3
	boxFilter	fcvBoxFilter5x5u8_v2
	adaptiveThreshold	fcvAdaptiveThresholdGaussian3x3u8_v2
	adaptiveThreshold	fcvAdaptiveThresholdGaussian5x5u8_v2
	adaptiveThreshold	fcvAdaptiveThresholdMean3x3u8_v2
	adaptiveThreshold	fcvAdaptiveThresholdMean5x5u8_v2
	subtract	fcvImageDiffu8f32_v2
	pyrDown	fcvPyramidCreateu8_v4
	cvtColor	fcvColorRGB888toYCrCbu8_v3
	cvtColor	fcvColorRGB888ToHSV888u8
	GaussianBlur	fcvFilterGaussian5x5u8_v3
	GaussianBlur	fcvFilterGaussian3x3u8_v4
	warpPerspective	fcvWarpPerspectiveu8_v5
	Canny	fcvFilterCannyu8
	boxFilter	fcvBoxFilterNxNf32
CORE	LUT	fcvTableLookupu8
	norm	fcvHammingDistanceu8
	multiply	fcvElementMultiplyu8u16_v2
	multiply	fcvElementMultiplyu8
	multiply	fcvElementMultiplys16
	multiply	fcvElementMultiplyf32
	transpose	fcvTransposeu8_v2
	transpose	fcvTransposeu16_v2
	transpose	fcvTransposef32_v2
	meanStdDev	fcvImageIntensityStats_v2
	flip	fcvFlipu8
	flip	fcvFlipu16
	flip	fcvFlipRGB888u8
	rotate	fcvRotateImageu8
	rotate	fcvRotateImageInterleavedu8
	addWeighted	fcvAddWeightedu8_v2
	SVD	fcvSVDf32_v2
	gemm	fcvMatrixMultiplyf32_v2
	gemm	fcvMultiplyScalarf32
	gemm	fcvAddf32_v2

FastCV CPU extension

OpenCV extension APIs	FastCV APIs used	Description
arithmetic_op	fcvAddu8	Matrix addition of two uint8_t matrices into one uint8_t matrix
	fcvAddf32_v2	Matrix addition of two float32_t matrices
	fcvAdds16_v2	Matrix addition of two int16_t matrices; allows in-place operation
	fcvSubtracts16	Matrix subtraction of two int16_t matrices
	fcvSubtractu8	Matrix subtraction of two uint8_t matrices
bilateralFilter	fcvBilateralFilter5x5u8_v3	Bilateral smoothing with 5x5 bilateral kernel
	fcvBilateralFilter7x7u8_v3	Bilateral smoothing with 7x7 bilateral kernel
	fcvBilateralFilter9x9u8_v3	Bilateral smoothing with 9x9 bilateral kernel
bilateralRecursive	fcvBilateralFilterRecursiveu8	Recursive bilateral filtering; the smoothing is performed in the gradient domain.
buildPyramid	fcvPyramidAllocate_v3	Allocates memory for an image pyramid. This API can be removed without notice and should only be used for testing.
	fcvPyramidCreateu8_v4	Creates an image pyramid from an 8-bit unsigned (grayscale) source image.
	fcvPyramidDelete_v2	Deallocates an array of fcvPyramidLevel. Can be used for any type (f32/s8/u8).
calcHist	fcvImageIntensityHistogram	Creates a histogram of intensities for a rectangular region of a grayscale image.
clusterEuclidean	fcvClusterEuclideanu8	General function for computing cluster centers and cluster bindings
cvtColor	fcvColorYCbCr420PseudoPlanarToYCbCr444PseudoPlanaru8	Color conversion from pseudo planar YCbCr420 to pseudo planar YCbCr444.
	fcvColorYCbCr420PseudoPlanarToYCbCr422PseudoPlanaru8	Color conversion from pseudo planar YCbCr420 to pseudo planar YCbCr422.
	fcvColorYCbCr422PseudoPlanarToYCbCr444PseudoPlanaru8	Color conversion from pseudo planar YCbCr422 to pseudo planar YCbCr444.
	fcvColorYCbCr422PseudoPlanarToYCbCr420PseudoPlanaru8	Color conversion from pseudo planar YCbCr422 to pseudo planar YCbCr420.
	fcvColorYCbCr444PseudoPlanarToYCbCr422PseudoPlanaru8	Color conversion from pseudo planar YCbCr444 to pseudo planar YCbCr422.
	fcvColorYCbCr444PseudoPlanarToYCbCr420PseudoPlanaru8	Color conversion from pseudo planar YCbCr444 to pseudo planar YCbCr420.
	fcvColorRGB565ToYCbCr444PseudoPlanaru8	Color conversion from RGB565 to pseudo-planar YCbCr444.
	fcvColorRGB565ToYCbCr422PseudoPlanaru8	Color conversion from RGB565 to pseudo-planar YCbCr422.
	fcvColorRGB565ToYCbCr420PseudoPlanaru8	Color conversion from RGB565 to pseudo-planar YCbCr420.
	fcvColorRGB888ToYCbCr444PseudoPlanaru8	Color conversion from RGB888 to pseudo-planar YCbCr444.
	fcvColorRGB888ToYCbCr422PseudoPlanaru8	Color conversion from RGB888 to pseudo-planar YCbCr422.
	fcvColorRGB888ToYCbCr420PseudoPlanaru8	Color conversion from RGB888 to pseudo-planar YCbCr420.
	fcvColorYCbCr420PseudoPlanarToRGB565u8	Color conversion from pseudo-planar YCbCr420 to RGB565.
	fcvColorYCbCr422PseudoPlanarToRGB565u8	Color conversion from pseudo-planar YCbCr422 to RGB565.
	fcvColorYCbCr422PseudoPlanarToRGB888u8	Color conversion from pseudo-planar YCbCr422 to RGB888.
	fcvColorYCbCr422PseudoPlanarToRGBA8888u8	Color conversion from pseudo-planar YCbCr422 to RGBA8888.
	fcvColorYCbCr444PseudoPlanarToRGB565u8	Color conversion from pseudo-planar YCbCr444 to RGB565.
	fcvColorYCbCr444PseudoPlanarToRGB888u8	Color conversion from pseudo-planar YCbCr444 to RGB888.
DCT	fcvDCTu8	Performs forward discrete Cosine transform on uint8_t pixels
FAST10	fcvCornerFast10InMaskScoreu8	Extracts FAST corners and scores from the image based on the mask.
	fcvCornerFast10InMasku8	Extracts FAST corners from the image.
	fcvCornerFast10Scoreu8	Extracts FAST corners and scores from the image
	fcvCornerFast10u8	Extracts FAST corners from the image.
FFT	fcvFFTu8	Computes the 1D or 2D Fast Fourier Transform of a real valued matrix.
fillConvexPoly	fcvFillConvexPolyu8	This function fills the interior of a convex polygon with the specified color.
filter2D	fcvFilterCorrNxNu8	NxN correlation with non-separable kernel. Border values are ignored in this function.
	fcvFilterCorrNxNu8s16	NxN correlation with non-separable kernel. Border values are ignored in this function.
	fcvFilterCorrNxNu8f32	NxN correlation with non-separable kernel. Border values are ignored in this function.
gaussianBlur	fcvFilterGaussian3x3u8_v4	Blurs an image with 3x3 Gaussian filter with border handling scheme specified by user
	fcvFilterGaussian5x5u8_v3	Blurs an image with 5x5 Gaussian filter
	fcvFilterGaussian5x5s16_v3	Blurs an image with 5x5 Gaussian filter
	fcvFilterGaussian5x5s32_v3	Blurs an image with 5x5 Gaussian filter
	fcvFilterGaussian11x11u8_v2	Blurs an image with 11x11 Gaussian filter
houghLines	fcvHoughLineu8	Performs Hough Line detection
iDCT	fcvIDCTs16	Performs inverse discrete cosine transform on int16_t coefficients
IFFT	fcvIFFTf32	Computes the 1D or 2D Inverse Fast Fourier Transform of a complex valued matrix.
integrateImageYUV	fcvIntegrateImageYCbCr420PseudoPlanaru8	This function calculates the integral images of a YCbCr420 image, where the input YCbCr420 has UV interleaved.
matmuls8s32	fcvMatrixMultiplys8s32	Matrix multiplication of two int8_t type matrices
meanShift	fcvMeanShiftu8	Applies the meanshift procedure and obtains the final converged position. Source image must be 8 bit grayscale image.
	fcvMeanShifts32	Applies the meanshift procedure and obtains the final converged position. Source image must be int 32bit grayscale image.
	fcvMeanShiftf32	Applies the meanshift procedure and obtains the final converged position. Source image must be float 32bit grayscale image.
Merge	fcvChannelCombine2Planesu8	Combine two channels in an interleaved fashion
	fcvChannelCombine3Planesu8	Combine three channels in an interleaved fashion
	fcvChannelCombine4Planesu8	Combine four channels in an interleaved fashion
moments	fcvImageMomentsu8	Computes weighted average (moment) of the image pixels’ intensities. Input must be an 8-bit image.
	fcvImageMomentss32	Computes weighted average (moment) of the image pixels’ intensities. Input must be of data type int32_t.
	fcvImageMomentsf32	Computes weighted average (moment) of the image pixels’ intensities. Input must be of data type float32_t.
NormalizeLocalBox	fcvNormalizeLocalBoxu8	Calculate the local subtractive and contrastive normalization of the image.
NormalizeLocalBox	fcvNormalizeLocalBoxf32	Calculate the local subtractive and contrastive normalization of the image.
remap	fcvRemapu8_v2	Applies a generic geometrical transformation to a greyscale CV_8UC1 image.
remapRGBA	fcvRemapRGBA8888BLu8	Applies a generic geometrical transformation to a 4-channel CV_8UC4 image with bilinear interpolation
remapRGBA	fcvRemapRGBA8888NNu8	Applies a generic geometrical transformation to a 4-channel CV_8UC4 image with nearest neighbor interpolation
resizeDownBy2	fcvScaleDownBy2u8_v2	Down-scale the image by averaging each 2x2 pixel block
resizeDownBy4	fcvScaleDownBy4u8_v2	Down-scale the image by averaging each 4x4 pixel block
ResizeDown	fcvScaleDownMNu8	Image downscaling using MN method
ResizeDown	fcvScaleDownMNInterleaveu8	Interleaved image downscaling using MN method
runMSER	fcvMserInit	Function to initialize MSER.
	fcvMserNN8Init	Function to initialize 8-neighbor MSER
	fcvMserExtu8_v3	Function to invoke MSER with a smaller memory footprint, the (optional) output of contour bound boxes, and additional information.
	fcvMserExtNN8u8	Function to invoke 8-neighbor MSER, with additional outputs for each contour.
	fcvMserNN8u8	Function to invoke 8-neighbor MSER.
	fcvMserRelease	Function to release MSER resources.
sepFilter2D	fcvFilterCorrSepMxNu8	MxN correlation with separable kernel.
	fcvFilterCorrSep9x9s16_v2	9x9 FIR filter (convolution) with separable kernel.
	fcvFilterCorrSep11x11s16_v2	11x11 FIR filter (convolution) with separable kernel.
	fcvFilterCorrSep13x13s16_v2	13x13 correlation with separable kernel.
	fcvFilterCorrSep15x15s16_v2	15x15 correlation with separable kernel.
	fcvFilterCorrSep17x17s16_v2	17x17 correlation with separable kernel.
	fcvFilterCorrSepNxNs16	NxN correlation with separable kernel.
sobel	fcvFilterSobel3x3u8_v2	3x3 Sobel edge filter
	fcvFilterSobel3x3u8s16	Creates a 2D gradient image from source luminance data without normalization. Convolution with the 3x3 Sobel kernel.
	fcvFilterSobel5x5u8s16	Creates a 2D gradient image from source luminance data without normalization. Convolution with the 5x5 Sobel kernel.
	fcvFilterSobel7x7u8s16	Creates a 2D gradient image from source luminance data without normalization. Convolution with the 7x7 Sobel kernel.
sobelPyramid	fcvPyramidAllocate	Allocates memory for Pyramid
	fcvPyramidAllocate_v2	Allocates memory for Pyramid
	fcvPyramidAllocate_v3	Allocates memory for Pyramid
	fcvPyramidSobelGradientCreatei8	Creates a gradient pyramid of integer8 from an image pyramid of uint8_t
	fcvPyramidSobelGradientCreatei16	Creates a gradient pyramid of int16_t from an image pyramid of uint8_t
	fcvPyramidSobelGradientCreatef32	Creates a gradient pyramid of float32 from an image pyramid of uint8_t
	fcvPyramidDelete	Deallocates an array of fcvPyramidLevel. Can be used for any type(f32/s8/u8).
	fcvPyramidDelete_v2	Deallocates an array of fcvPyramidLevel. Can be used for any type(f32/s8/u8).
	fcvPyramidCreatef32_v2	Builds an image pyramid (with stride). Memory should be deallocated using fcvPyramidDelete_v2
	fcvPyramidCreateu8_v4	Builds a Gaussian image pyramid.
sobel3x3u8	fcvImageGradientSobelPlanars8_v2	Creates a 2D gradient image from source luminance data using 3x3 neighborhood with Sobel kernel
sobel3x3u9	fcvImageGradientSobelPlanars16_v2	Creates a 2D gradient image from source luminance data using 3x3 neighborhood with Sobel kernel
sobel3x3u10	fcvImageGradientSobelPlanars16_v3	Creates a 2D gradient image from source luminance data using 3x3 neighborhood with Sobel kernel
sobel3x3u11	fcvImageGradientSobelPlanarf32_v2	Creates a 2D gradient image from source luminance data using 3x3 neighborhood with Sobel kernel
sobel3x3u12	fcvImageGradientSobelPlanarf32_v3	Creates a 2D gradient image from source luminance data using 3x3 neighborhood with Sobel kernel
split	fcvDeinterleaveu8	Performs image deinterleave for unsigned byte data.
split	fcvChannelExtractu8	Extract channel as a single uint8_t type plane from an interleaved or multi-planar image format
thresholdRange	fcvFilterThresholdRangeu8_v2	Binarizes a grayscale image based on a pair of threshold values.
trackOpticalFlowLK	fcvTrackLKOpticalFlowu8_v3	Optical flow (with stride so ROI can be supported)
trackOpticalFlowLK	fcvTrackLKOpticalFlowu8	Optical flow. Bitwidth optimized implementation
warpAffine	fcvTransformAffineClippedu8_v3	Applies an affine transformation on a grayscale image using a 2x3 matrix.
warpAffine3Plane	fcv3ChannelTransformAffineClippedBCu8	Applies an affine transformation on a 3-color channel image using a 2x3 matrix using bicubic interpolation.
warpPatchAffine	fcvTransformAffineu8_v2	Warps the patch centered at nPos in the input image using the affine transform in nAffine
warpPerspective	fcvWarpPerspectiveu8_v5	Warps a grayscale image using a perspective projection transformation matrix (also known as a homography).
warpPerspective2Plane	fcv2PlaneWarpPerspectiveu8	Perspective warp two images using the same transformation.

FastCV QDSP extensions

OpenCV extension APIs	FastCV APIs used	Description
Canny	fcvFilterCannyu8Q	Canny edge detection with more algorithm configuration controls.
fcvdspinit	fcvQ6Init	Initializes the FastCV DSP environment.
fcvdspdeinit	fcvQ6DeInit	Deinitializes the FastCV DSP environment.
FFT	fcvFFTu8Q	Computes the 1D or 2D Fast Fourier Transform of a real valued matrix.
filter2D	fcvFilterCorr3x3s8_v2Q	3x3 correlation with non-separable kernel.
	fcvFilterCorrNxNu8Q	NxN correlation with non-separable kernel. Border values are ignored in this function.
	fcvFilterCorrNxNu8s16Q	NxN correlation with non-separable kernel. Border values are ignored in this function.
	fcvFilterCorrNxNu8f32Q	NxN correlation with non-separable kernel. Border values are ignored in this function.
IFFT	fcvIFFTf32Q	Computes the 1D or 2D Inverse Fast Fourier Transform of a complex valued matrix.
sumOfAbsoluteDiffs	fcvSumOfAbsoluteDiffs8x8u8_v2Q	Sum of absolute differences of an image against an 8x8 template.
thresholdOtsu	fcvFilterThresholdOtsuu8Q	Binarizes a grayscale image using Otsu’s method.

For FastCV extension details, see the extension’s documentation.

Enable or disable FastCV acceleration

Enable: To enable FastCV HAL acceleration, include -DWITH_FASTCV=ON in the EXTRA_OECMAKE options of the OpenCV BitBake file, as shown below. This flag compiles the OpenCV APIs with the FastCV HAL.

DEPENDS:qcom-custom-bsp += "qcom-fastcv-binaries"

EXTRA_OECMAKE += "-DOPENCV_ALLOW_DOWNLOADS=ON"
EXTRA_OECMAKE:append:qcom-custom-bsp = " -DWITH_FASTCV=ON "
#python () {

Disable: To disable FastCV HAL acceleration, include -DWITH_FASTCV=OFF in the EXTRA_OECMAKE options of the OpenCV BitBake file, as shown below, and then recompile the OpenCV recipe using the devtool method.

DEPENDS:qcom-custom-bsp += "qcom-fastcv-binaries"

EXTRA_OECMAKE:append:qcom-custom-bsp = " -DWITH_FASTCV=OFF "
#python () {
#    bsp_type = d.getVar('BSP_TYPE')
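Before rebuilding, it can help to confirm which value a recipe file actually sets. The following sketch prints the last uncommented WITH_FASTCV value found in a file. The fastcv_setting helper name is hypothetical, and the sketch assumes a grep that supports the -o option. The demo writes a temporary file rather than touching a real recipe.

```shell
# Print the last uncommented WITH_FASTCV value in a recipe file (ON or OFF).
# Prints "unset" when the flag does not appear outside comments.
fastcv_setting() {
    value=$(grep -v '^[[:space:]]*#' "$1" \
            | grep -o 'DWITH_FASTCV=[A-Z]*' \
            | tail -n 1 \
            | cut -d= -f2)
    echo "${value:-unset}"
}

# Demo on a throwaway file containing the line from the snippet above.
recipe=$(mktemp)
printf '%s\n' 'EXTRA_OECMAKE:append:qcom-custom-bsp = " -DWITH_FASTCV=OFF "' > "$recipe"
fastcv_setting "$recipe"   # prints: OFF
rm -f "$recipe"
```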

The following shows how this flag is used in the CMakeLists.txt file (opencv/3rdparty/fastcv/CMakeLists.txt):

if(NOT WITH_FASTCV OR NOT FASTCV_DIR)
   message(STATUS "FastCV is not available, disabling related HAL and stuff")
   return()
endif()

if(NOT ANDROID AND NOT UNIX)
   message(FATAL_ERROR "FastCV HAL supports Android and UNIX only!")
endif()

set(OPENCV_3P_FASTCV_DIR ${CMAKE_CURRENT_SOURCE_DIR})
add_subdirectory(hal)

The following sample, from opencv/3rdparty/fastcv/src/fastcv_hal_core.cpp, shows one of the FastCV HAL APIs implemented with a FastCV API:

int fastcv_hal_sub8u32f(
    const uchar*    src1_data,
    size_t          src1_step,
    const uchar*    src2_data,
    size_t          src2_step,
    float*          dst_data,
    size_t          dst_step,
    int             width,
    int             height)
{
    INITIALIZATION_CHECK;

    fcvStatus status = FASTCV_SUCCESS;

    if (src1_step < width && src2_step < width)
    {
       src1_step = width*sizeof(uchar);
       src2_step = width*sizeof(uchar);
       dst_step  = width*sizeof(float);
    }

    status = fcvImageDiffu8f32_v2(src1_data, src2_data, width, height, src1_step,
                                  src2_step, dst_data, dst_step);

    CV_HAL_RETURN(status,hal_subtract);
}
