TK1 OpenCV Setup – 03
3 – CUDA Installation
Before installing OpenCV, make sure CUDA is working.
You have two options for developing CUDA applications for the Jetson TK1: native compilation and cross-compilation.
Native compilation means compiling code directly on the target board; for the TK1, that means building on the TK1 itself.
Cross-compilation, the more common approach in embedded development, means compiling on a desktop machine and then deploying the binaries to run on the target board.
For TK1 development, native compilation is recommended.
1) Native compilation (compiling code onboard the Jetson TK1) is generally the easiest option, but takes longer to compile; 2) cross-compilation (compiling code on an x86 desktop in a special way so it can execute on the Jetson TK1 target device) is typically more complex to configure and debug, but for large projects it will be noticeably faster at compiling.
The CUDA Toolkit currently only supports cross-compilation from an Ubuntu 12.04 or 14.04 Linux desktop. In comparison, native compilation happens onboard the Jetson device and thus is the same no matter which OS or desktop you have.
This section covers installing the CUDA Toolkit onto your device for native CUDA development. Make sure you download the Toolkit for L4T and not the Toolkit for Ubuntu, since the latter is for cross-compilation instead of native compilation.
3.1 Install the CUDA 6.5 Toolkit for L4T
Download:
http://developer.download.nvidia.com/embedded/L4T/r21_Release_v3.0/cuda-repo-l4t-r21.3-6-5-prod_6.5-42_armhf.deb
Install the repository metadata:
$ cd ~/Downloads
# Install the CUDA repo metadata that you downloaded manually for L4T
$ sudo dpkg -i ./cuda-repo-l4t-r21.3-6-5-prod_6.5-42_armhf.deb
Update apt-get:
$ sudo apt-get update
3.2 Install the toolkit
# Download & install the actual CUDA Toolkit, including the OpenGL toolkit from NVIDIA. (It only downloads around 15MB)
$ sudo apt-get install cuda-toolkit-6-5
# Add yourself to the "video" group to allow the current user access to the GPU
$ sudo usermod -a -G video $USER
Add the 32-bit CUDA paths to your .bashrc login script,
and start using it in your current console:
$ echo "" >> ~/.bashrc
$ echo "# Add CUDA bin & library paths:" >> ~/.bashrc
$ echo "export PATH=/usr/local/cuda/bin:\$PATH" >> ~/.bashrc
$ echo "export LD_LIBRARY_PATH=/usr/local/cuda/lib:\$LD_LIBRARY_PATH" >> ~/.bashrc
$ source ~/.bashrc
(Optionally, reboot so the group change takes effect: $ sudo reboot)
3.3 Verify the CUDA installation
Check that the build environment was installed successfully:
$ cd /usr/local/cuda      # this directory should now exist
$ nvcc -V
— nvcc: NVIDIA (R) Cuda compiler driver
— Cuda compilation tools, release 6.5, V6.5.35
3.3.1 Install the CUDA samples via apt (a good choice)
Installing & running the CUDA samples (optional):
$ sudo apt-get install cuda-samples-6-5
Next, build and run deviceQuery:
$ cd /usr/local/cuda
$ sudo chmod o+w samples/ -R
$ cd samples/1_Utilities/deviceQuery
$ make
Check the CUDA compute capability:
$ ../../bin/armv7l/linux/release/gnueabihf/deviceQuery
— Detected 1 CUDA Capable device(s)
— CUDA Driver Version / Runtime Version 6.5 / 6.5
— CUDA Capability Major/Minor version number: 3.2
— Result = PASS
The last few lines should report CUDA version 6.5 and Result = PASS.
If the reported compute capability is below 3.0, skip the GPU-accelerated steps that follow.
or,
3.3.2 Install the CUDA samples from source
If you think you will write your own CUDA code or you want to see what CUDA can do,
then follow this section to build & run all of the CUDA samples.
Install writeable copies of the CUDA samples to your device's home directory (it will create a "NVIDIA_CUDA-6.5_Samples" folder):
$ cuda-install-samples-6.5.sh /home/ubuntu
Build the CUDA samples (takes around 15 minutes on Jetson TK1):
$ cd ~/NVIDIA_CUDA-6.5_Samples
$ make
Then test CUDA by running some of the samples:
$ ./1_Utilities/deviceQuery/deviceQuery
$ ./1_Utilities/bandwidthTest/bandwidthTest
$ cd 0_Simple/matrixMul
$ ./matrixMulCUBLAS
$ cd ../..
$ cd 0_Simple/simpleTexture
$ ./simpleTexture
$ cd ../..
$ cd 3_Imaging/convolutionSeparable
$ ./convolutionSeparable
$ cd ../..
$ cd 3_Imaging/convolutionTexture
$ ./convolutionTexture
$ cd ../..
3.4 Remote testing
Note:
Many of the CUDA samples use OpenGL GLX and open graphical windows.
If you are running these programs through an SSH remote terminal, you can remotely display the windows on your desktop by typing "export DISPLAY=:0" and then executing the program.
(This will only work if you are using a Linux/Unix machine or you run an X server such as the free “Xming” for Windows).
eg:
$ export DISPLAY=:0
$ cd ~/NVIDIA_CUDA-6.5_Samples/2_Graphics/simpleGL
$ ./simpleGL
$ cd ~/NVIDIA_CUDA-6.5_Samples/3_Imaging/bicubicTexture
$ ./bicubicTexture
$ cd ~/NVIDIA_CUDA-6.5_Samples/3_Imaging/bilateralFilter
$ ./bilateralFilter
If you hit:
#error — unsupported GNU version! gcc 4.9 and up are not supported!
then point the default compiler back at gcc 4.8 (CUDA 6.5 does not support gcc 4.9 or newer):
$ cd /usr/bin
$ sudo rm gcc g++ cpp
$ sudo ln -s gcc-4.8 gcc
$ sudo ln -s g++-4.8 g++
$ sudo ln -s cpp-4.8 cpp
3.5 Notes
The Optical Flow sample (HSOpticalFlow) and the 3D stereo sample (stereoDisparity) take roughly 1 minute each to execute, since they compare results with CPU code.
Some of the CUDA samples use other libraries such as OpenMP or MPI or OpenGL.
If you want to compile those samples, then you'll need to install those toolkits, like this:
(to be added)
4 – OpenCV Installation
The Tegra platform provides GPU acceleration for OpenCV.
Besides OpenCV's GPU module, a range of commonly used OpenCV modules (such as the core module) can also benefit from GPU acceleration.
The prerequisites are that CUDA is already installed and that OpenCV is compiled for the device with nvcc.
There are two ways to install and configure OpenCV on the Jetson:
Method 1: use the official JetPack installer, which installs the latest OpenCV4Tegra. This requires a PC running Ubuntu 14.04 LTS x64 and putting the Jetson TK1 into recovery mode.
Advantages of using the NVIDIA package directly: it is simple and adds CPU NEON optimizations. Disadvantage: the version is not the newest (probably 2.4), so code written against OpenCV 3.0.0 may need modification.
Method 2: build from source. You must strictly follow the order CUDA -> OpenCV4Tegra -> OpenCV; do not reverse it. This lets you use the latest OpenCV source; the downside is a more involved process.
Recommended: -> the manual source build.
=== Manual source build ===
Installing OpenCV consists of two main parts, installing OpenCV4Tegra and then building the OpenCV source:
*0. Prerequisites.
*1. Install OpenCV4Tegra.
*2. Install OpenCV.
4.0 Prerequisites
Enable the Universe repository:
$ sudo add-apt-repository universe
$ sudo apt-get update
Install a few required libraries:
# Some general development libraries (the basic g++ compiler, make, and cmake)
$ sudo apt-get install build-essential make cmake cmake-curses-gui g++
# libav video input/output development libraries
$ sudo apt-get install libavformat-dev libavutil-dev libswscale-dev
# Video4Linux camera development libraries
$ sudo apt-get install libv4l-dev
# Eigen3 math development libraries
$ sudo apt-get install libeigen3-dev
# OpenGL development libraries, to allow creating graphical windows (not all of OpenGL)
$ sudo apt-get install libglew1.6-dev
# GTK development libraries (to allow creating graphical windows)
$ sudo apt-get install libgtk2.0-dev
4.1 Install OpenCV4Tegra
4.1.1 Download: libopencv4tegra-repo_l4t-r21_2.4.10.1_armhf.deb
**(Mind the version.)** Download the OpenCV .deb package.
Only an r21.2 build is available, not r21.3:
https://developer.nvidia.com/embedded/downloads (browse to find what you want)
OR go directly to:
http://developer.download.nvidia.com/embedded/OpenCV/L4T_21.2/libopencv4tegra-repo_l4t-r21_2.4.10.1_armhf.deb
4.1.2 Install the OpenCV repository package:
$ sudo dpkg -i libopencv4tegra-repo_l4t-r21_2.4.10.1_armhf.deb
$ sudo apt-get update
4.1.3 Install OpenCV4Tegra:
$ sudo apt-get install libopencv4tegra
$ sudo apt-get install libopencv4tegra-dev
This step may fail repeatedly due to network problems (e.g. the firewall); if so, rerun apt-get update and try again after all updates have been installed.
4.1.4 Verify that OpenCV was installed successfully:
$ cd /usr/lib (or cd /usr/include)
and check whether the OpenCV libraries or headers are present.
4.2 Build OpenCV from source
4.2.1 Download OpenCV 2.4.10
Download from: http://opencv.org/
Or download directly on the TK1:
$ wget http://downloads.sourceforge.net/project/opencvlibrary/opencv-unix/2.4.10/opencv-2.4.10.zip
OR: from the command-line you can run this on the device:
$ wget https://github.com/Itseez/opencv/archive/2.4.10.zip
4.2.2 Configure OpenCV 2.4.10
$ cd ~/Downloads
$ unzip opencv-2.4.10.zip
$ cd opencv-2.4.10/
$ mkdir build
$ cd build
$ cmake -D WITH_CUDA=ON -D CUDA_ARCH_BIN="3.2" -D CUDA_ARCH_PTX="" -D BUILD_TESTS=OFF -D BUILD_PERF_TESTS=OFF -D WITH_OPENGL=ON -D WITH_QT=ON ..
or,
$ cmake -D CMAKE_BUILD_TYPE=RELEASE -D CMAKE_INSTALL_PREFIX=/usr/local -D WITH_TBB=ON -D BUILD_NEW_PYTHON_SUPPORT=ON -D WITH_V4L=ON -D INSTALL_C_EXAMPLES=ON -D INSTALL_PYTHON_EXAMPLES=ON -D BUILD_EXAMPLES=ON -D WITH_QT=ON -D WITH_OPENGL=ON -D ENABLE_FAST_MATH=1 -D CUDA_FAST_MATH=1 -D WITH_CUBLAS=1 ..
Note: without -D WITH_QT=ON, the -D WITH_OPENGL=ON flag has no effect, so if you enable OpenGL you must first set up the Qt environment.
The trailing ".." means the CMakeLists.txt is in the parent directory; if CMake reports that it cannot find it, replace ".." with the path that contains the OpenCV CMakeLists.txt.
The value 3.2 depends on your device's compute capability; it may be 3.2, 3.5, and so on.
...
— Configuring done
— Generating done
Once this step finishes with "Configuring done" and "Generating done", the configuration check has succeeded and you can compile.
4.2.3 install OpenCV 2.4.10
To install the built OpenCV library by copying the OpenCV library to “/usr/local/include” and “/usr/local/lib”:
$ sudo make -j4 install
This takes around 20 minutes. If it finishes without errors, OpenCV has been installed successfully. Then refresh the linker cache:
$ sudo ldconfig
4.2.4 Uninstall 2.4.10
If you want to install another version of OpenCV, such as 2.4.13, first manually remove the files that the OpenCV install created:
$ sudo rm -r /usr/local/include/opencv2 /usr/local/include/opencv /usr/include/opencv2 /usr/include/opencv /usr/local/share/opencv /usr/local/share/OpenCV /usr/share/opencv /usr/share/OpenCV /usr/local/bin/opencv* /usr/local/lib/libopencv*
Then build as before:
$ cd opencv-2.4.13
$ mkdir build
$ cd build
$ cmake -D WITH_CUDA=ON -D CUDA_ARCH_BIN="3.2" -D CUDA_ARCH_PTX="" -D BUILD_TESTS=OFF -D WITH_OPENGL=ON -D WITH_QT=ON -D BUILD_PERF_TESTS=OFF ..
$ sudo make -j4 install
$ sudo ldconfig
4.2.7 Configure environment variables (2.4.13)
Finally, make sure your system searches the “/usr/local/lib” folder for libraries:
$ echo "" >> ~/.bashrc
$ echo "# Use OpenCV and other custom-built libraries." >> ~/.bashrc
$ echo "export LD_LIBRARY_PATH=\$LD_LIBRARY_PATH:/usr/local/lib/" >> ~/.bashrc
$ source ~/.bashrc
OpenCV is now configured, and the GPU-accelerated OpenCV library can be used on the TK1.
4.2.8 Test a few OpenCV samples
First:
# Test a simple OpenCV program. It creates a graphical window, so you should plug in an HDMI monitor or use a remote viewer such as X tunneling, VNC, or TeamViewer on your desktop.
$ cd ~/opencv-2.4.10/samples/cpp
$ g++ edge.cpp -lopencv_core -lopencv_imgproc -lopencv_highgui -o edge
$ ./edge
Second:
# If you have a USB webcam plugged into your board, then test one of the live camera programs.
$ g++ laplace.cpp -lopencv_core -lopencv_imgproc -lopencv_highgui -lopencv_calib3d -lopencv_contrib -lopencv_features2d -lopencv_flann -lopencv_gpu -lopencv_legacy -lopencv_ml -lopencv_objdetect -lopencv_photo -lopencv_stitching -lopencv_superres -lopencv_video -lopencv_videostab -o laplace
$ ./laplace
Third:
# Test a GPU accelerated OpenCV sample.
$ cd ../gpu
$ g++ houghlines.cpp -lopencv_core -lopencv_imgproc -lopencv_highgui -lopencv_calib3d -lopencv_contrib -lopencv_features2d -lopencv_flann -lopencv_gpu -lopencv_legacy -lopencv_ml -lopencv_objdetect -lopencv_photo -lopencv_stitching -lopencv_superres -lopencv_video -lopencv_videostab -o houghlines
$ ./houghlines ../cpp/logo_in_clutter.png
REF: http://blog.csdn.net/zyazky/article/details/52388605
Q1
if ERROR:
In file included from /home/ubuntu/opencv-2.4.10/modules/core/src/opengl_interop.cpp:52:0:
/usr/local/cuda/include/cuda_gl_interop.h:64:2: error: #error Please include the appropriate gl headers before including cuda_gl_interop.h
#error Please include the appropriate gl headers before including cuda_gl_interop.h
^
make[2]: *** [modules/core/CMakeFiles/opencv_core.dir/src/opengl_interop.cpp.o] Error 1
make[2]: *** Waiting for unfinished jobs….
make[1]: *** [modules/core/CMakeFiles/opencv_core.dir/all] Error 2
then:
$ sudo vi /usr/local/cuda/include/cuda_gl_interop.h
comment out the following lines:
#if defined(__arm__) || defined(__aarch64__)
//#ifndef GL_VERSION
//#error Please include the appropriate gl headers before including cuda_gl_interop.h
//#endif
//#else
REFs:
https://github.com/opencv/opencv/issues/5205
https://devtalk.nvidia.com/default/topic/1007290/building-opencv-with-opengl-support-/
Q2
IF error:
/home/ubuntu/Downloads/opencv-2.4.10/build/modules/highgui/src/window_QT.cpp:3111:12: error: 'GL_PERSPECTIVE_CORRECTION_HINT' was not declared in this scope
then:
$ vi /home/ubuntu/Downloads/opencv-2.4.10/modules/highgui/src/window_QT.cpp
add :
#define GL_PERSPECTIVE_CORRECTION_HINT 0x0C50
========================================================================
The OpenCV GPU module is a set of classes and functions to utilize GPU computational capabilities. It is implemented using the NVIDIA CUDA Runtime API and supports only NVIDIA GPUs.
The OpenCV GPU module includes utility functions, low-level vision primitives, and high-level algorithms. The utility functions and low-level primitives provide a powerful infrastructure for developing fast vision algorithms that take advantage of the GPU, whereas the high-level functionality includes some state-of-the-art algorithms (such as stereo correspondence, face and people detectors, and others) ready to be used by application developers.
The GPU module is designed as a host-level API. This means that if you have pre-compiled OpenCV GPU binaries, you are not required to have the CUDA Toolkit installed or write any extra code to make use of the GPU.
The OpenCV GPU module is designed for ease of use and does not require any knowledge of CUDA. However, such knowledge will certainly be useful for handling non-trivial cases or achieving the highest performance. It is helpful to understand the cost of various operations, what the GPU does, what the preferred data formats are, and so on.
The GPU module is an effective instrument for quick implementation of GPU-accelerated computer vision algorithms. However, if your algorithm involves many simple operations, then, for the best possible performance, you may still need to write your own kernels to avoid extra write and read operations on the intermediate results.
To enable CUDA support, configure OpenCV using CMake with WITH_CUDA=ON . When the flag is set and if CUDA is installed, the full-featured OpenCV GPU module is built.
Otherwise, the module is still built but at runtime all functions from the module throw Exception with CV_GpuNotSupported error code, except for gpu::getCudaEnabledDeviceCount(). The latter function returns zero GPU count in this case.
Building OpenCV without CUDA support does not perform device code compilation, so it does not require the CUDA Toolkit installed. Therefore, using the gpu::getCudaEnabledDeviceCount() function, you can implement a high-level algorithm that will detect GPU presence at runtime and choose an appropriate implementation (CPU or GPU) accordingly.
Compilation for Different NVIDIA Platforms:
The NVIDIA compiler can generate binary code (cubin and fatbin) and intermediate code (PTX). Binary code often implies a specific GPU architecture and generation, so compatibility with other GPUs is not guaranteed. PTX targets a virtual platform that is defined entirely by a set of capabilities or features. Depending on the selected virtual platform, some of the instructions are emulated or disabled, even if the real hardware supports all the features.
At the first call, the PTX code is compiled to binary code for the particular GPU using a JIT compiler. When the target GPU has a compute capability (CC) lower than the PTX code, JIT fails. By default, the OpenCV GPU module includes:
Binaries for compute capabilities 1.3 and 2.0 (controlled by CUDA_ARCH_BIN in CMake)
PTX code for compute capabilities 1.1 and 1.3 (controlled by CUDA_ARCH_PTX in CMake)
This means that for devices with CC 1.3 and 2.0 binary images are ready to run. For all newer platforms, the PTX code for 1.3 is JIT’ed to a binary image. For devices with CC 1.1 and 1.2, the PTX for 1.1 is JIT’ed. For devices with CC 1.0, no code is available and the functions throw Exception. For platforms where JIT compilation is performed first, the run is slow.
On a GPU with CC 1.0, you can still compile the GPU module and most of the functions will run flawlessly. To achieve this, add "1.0" to the list of binaries, for example, CUDA_ARCH_BIN="1.0 1.3 2.0". The functions that cannot be run on CC 1.0 GPUs throw an exception.
You can always determine at runtime whether the OpenCV GPU-built binaries (or PTX code) are compatible with your GPU. The function gpu::DeviceInfo::isCompatible() returns the compatibility status (true/false).
Utilizing Multiple GPUs:
In the current version, each of the OpenCV GPU algorithms can use only a single GPU. So, to utilize multiple GPUs, you have to manually distribute the work between GPUs. The active device can be switched with the gpu::setDevice() function. For more details, please read the CUDA C Programming Guide.
While developing algorithms for multiple GPUs, note the data-passing overhead. For primitive functions and small images, it can be significant, which may eliminate all the advantages of having multiple GPUs. But for high-level algorithms, consider using multi-GPU acceleration. For example, the Stereo Block Matching algorithm has been successfully parallelized using the following scheme:
Split each image of the stereo pair into two horizontal overlapping stripes.
Process each pair of stripes (from the left and right images) on a separate Fermi GPU.
Merge the results into a single disparity map.
With this algorithm, a dual GPU gave a 180% performance increase compared to a single Fermi GPU. For a source code example, see https://github.com/opencv/opencv/tree/master/samples/gpu/.