TK1 Caffe configuration -05

5 – cuDNN installation
5.1 Install the older cuDNN 6.5 v2 package (recommended)
5.2 Uninstall the old cuDNN
5.3 Install the newer cuDNN 7.0 v4 package (not recommended)
5.4 Fall back to cuDNN 6.5 v2 to match CUDA 6.5
6 – Caffe installation
6.1 Prepare the Caffe environment
6.2 Download
6.3 config
6.4 build
6.5 CUDNN_STATUS_NOT_INITIALIZED error during runtest
6.6 LMDB_MAP_SIZE error
6.7 Benchmarking
7 – Python and/or MATLAB
7.1 The Ubuntu way (ok)
7.2 The pip way (not tested)
7.3 Build the Python interface
7.4 Environment variables

5 – cuDNN installation

To install Caffe, you first need to install cuDNN.

For the 32-bit ARM TK1, the workable combination is CUDA 6.5 + cuDNN 2.0 + Caffe 0.13; the 1.0 master branch from the BVLC site does not work.
For a 32-bit mini-PC, you can try CUDA 6.5 + cuDNN 4.0 + Caffe 1.0; CUDA 6.5 is the only release that supports 32-bit.

5.1 Install the older cuDNN 6.5 v2 package (recommended)

Download v2:
https://developer.nvidia.com/rdp/cudnn-archive
$ tar -zxvf cudnn-6.5-linux-ARMv7-V2.tgz
$ cd cudnn-6.5-linux-ARMv7-V2

Copy the files:
$ sudo cp cudnn.h /usr/local/cuda/include
$ sudo cp libcudnn* /usr/local/cuda/lib

Reload the library cache:
$ sudo ldconfig -v
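
Before moving on, it is worth confirming that the loader actually picked the library up. A minimal check (it assumes /usr/local/cuda/lib is already on the loader search path, which the CUDA packages normally configure):
$ ls -l /usr/local/cuda/lib/libcudnn*
$ ldconfig -p | grep libcudnn   # should list libcudnn.so with its path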

make failed while building Caffe with this version, so the newer cuDNN 4.0 was tried next. The likely cause: the Caffe checkout is newer and expects a newer cuDNN, since some function names changed between the R1/v2 API and later releases.

5.2 Uninstall the old cuDNN

$ sudo rm /usr/local/cuda/include/cudnn.h
$ sudo rm /usr/local/cuda/lib/libcudnn*

5.3 Install the newer cuDNN 7.0 v4 package (not recommended)

Download: https://developer.nvidia.com/rdp/cudnn-archive
The newer 4.0: "cuDNN v4 Library for L4T (ARMv7)", file cudnn-7.0-Linux-ARMv7-v4.0-prod.tgz
$ tar -zxvf cudnn-7.0-linux-ARMv7-v4.0-prod.tgz

This unpacks into a cuda folder:
$ cd ~/Downloads/cuda/

Copy the cuDNN files:
$ sudo cp include/cudnn.h /usr/local/cuda/include/
$ sudo cp lib/libcudnn* /usr/local/cuda/lib/

Reload the library cache:
$ sudo ldconfig -v

On the 32-bit ARM TK1, make runtest did not succeed. Most likely the Caffe checkout from GitHub was too new.

5.4 Fall back to cuDNN 6.5 v2 to match CUDA 6.5

$ sudo rm /usr/local/cuda/include/cudnn.h
$ sudo rm /usr/local/cuda/lib/libcudnn*
# cd into the cuDNN 6.5 v2 directory
$ sudo cp cudnn.h /usr/local/cuda/include
$ sudo cp libcudnn* /usr/local/cuda/lib
$ sudo ldconfig -v

6 – Caffe installation

6.1 Prepare the Caffe environment

$ sudo add-apt-repository universe
$ sudo apt-get update

$ sudo apt-get install libprotobuf-dev protobuf-compiler
$ sudo apt-get install cmake libleveldb-dev libsnappy-dev
$ sudo apt-get install libatlas-base-dev libhdf5-serial-dev libgflags-dev

Building the scipy library for the Python interface needs a Fortran compiler (gfortran); without it the build errors out, so install it up front:
$ sudo apt-get install gfortran

Installing the following may require a proxy/VPN:
$ sudo apt-get install libgoogle-glog-dev liblmdb-dev

Whether you need this boost step depends on your situation. Some say the boost version causes problems and recommend dropping to 1.55, because the Caffe site gives $ sudo apt-get install --no-install-recommends libboost-all-dev , deliberately adding --no-install-recommends, and the installation page explicitly requires Boost >= 1.55.
$ sudo apt-get install libboost-dev libboost-thread-dev libboost-system-dev

However, that commonly seen install line pulls in 1.54 by default on Ubuntu 14.04. So if it is already installed, uninstall it, then install 1.55 instead:
Check:
$ dpkg -S /usr/include/boost/version.hpp
— 1.54
Uninstall:
$ sudo apt-get autoremove libboost1.54-dev (beware: this one autoremove uninstalled over a hundred packages and freed about 980 MB of space; far too destructive)
Then install 1.55:
$ sudo apt-get install libboost1.55-all-dev libboost-thread1.55-dev libboost-system1.55-dev libboost-filesystem1.55-dev

The compiler step also depends on your situation. The system ships with gcc 4.8; if compiling Caffe produces compiler errors, you can downgrade to 4.7 at any time:
$ sudo apt-get install gcc-4.7 g++-4.7 cpp-4.7
$ cd /usr/bin
$ sudo rm gcc g++ cpp
$ sudo ln -s gcc-4.7 gcc
$ sudo ln -s g++-4.7 g++
$ sudo ln -s cpp-4.7 cpp
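
As an alternative to deleting and re-creating the symlinks by hand, update-alternatives can manage the switch reversibly. A sketch, assuming gcc-4.7/g++-4.7 were installed as above and the stock 4.8 is present:
$ sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.8 40
$ sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.7 60
$ sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-4.8 40
$ sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-4.7 60
# The higher priority (4.7 here) wins in auto mode; switch interactively with:
$ sudo update-alternatives --config gcc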

Grant GPU access permissions:
$ sudo usermod -a -G video $USER

all fine!

6.2 Download

# Git clone Caffe
$ git clone https://github.com/BVLC/caffe.git
$ cd caffe
$ git checkout dev   # note: this warns that the dev branch no longer exists; see Q3 in 6.5
$ cp Makefile.config.example Makefile.config

6.3 config

$ vi Makefile.config
Enable USE_CUDNN := 1
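
If you prefer a non-interactive edit, the stock Makefile.config.example ships this line commented out, so a sed toggle works (a sketch, assuming the default "# USE_CUDNN := 1" wording):
$ sed -i 's/^# USE_CUDNN := 1/USE_CUDNN := 1/' Makefile.config
$ grep USE_CUDNN Makefile.config   # should now show the line uncommented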

Check the CUDA compute capability:
$ /usr/local/cuda/samples/1_Utilities/deviceQuery/deviceQuery
— 3.2
In Caffe's Makefile.config, comment out the *_60 and *_61 entries of CUDA_ARCH. If make fails with:

Error: nvcc fatal : Unsupported gpu architecture 'compute_60'

then edit the architecture list:

$ vi Makefile.config
# CUDA architecture setting: going with all of them.
# For CUDA < 6.0, comment the *_50 through *_61 lines for compatibility.
# For CUDA < 8.0, comment the *_60 and *_61 lines for compatibility.
CUDA_ARCH := -gencode arch=compute_20,code=sm_20 \
    -gencode arch=compute_20,code=sm_21 \
    -gencode arch=compute_30,code=sm_30 \
    -gencode arch=compute_35,code=sm_35 \
    -gencode arch=compute_50,code=sm_50 \
    -gencode arch=compute_52,code=sm_52
    # -gencode arch=compute_60,code=sm_60
    # -gencode arch=compute_61,code=sm_61
    # -gencode arch=compute_61,code=compute_61

6.4 build

(Whenever a step fails, run the sequence again from the top: make clean -> make all -j4 -> make test -j4 -> make runtest -j4.)
$ make clean         # ok
$ make all -j4       # about 20 min, ok
$ make test -j4      # about 5 min, ok
$ make runtest -j4   # about 10 min

Nearly all the problems show up in runtest:
Q1
A common mistake is using sudo; running the runtest suite does not need sudo.
Q2
Error
g++: internal compiler error: Killed (program cc1plus)
Solution: reboot. Cause unknown (some say it is memory exhaustion).
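
If the memory theory holds (the TK1 has only 2 GB of RAM and make -j4 runs four compilers at once), two workarounds are lowering the job count and adding a temporary swap file. A sketch, assuming an ext4 rootfs with a few GB free:
$ make all -j2                 # fewer parallel jobs, less memory pressure
# or add temporary swap:
$ sudo fallocate -l 2G /swapfile
$ sudo chmod 600 /swapfile
$ sudo mkswap /swapfile
$ sudo swapon /swapfile
$ free -m                      # confirm the Swap line, then re-run make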

6.5 CUDNN_STATUS_NOT_INITIALIZED error during runtest

Note: Randomizing tests' orders with a seed of 86430.
F1222 00:00:55.561334 15196 cudnn_softmax_layer.cpp:15] Check failed: status == CUDNN_STATUS_SUCCESS (1 vs. 0) CUDNN_STATUS_NOT_INITIALIZED
*** Check failure stack trace: ***

This indicates a CuDNN/CUDA error. I’ve seen this when the CUDA/driver version and the CuDNN version are mismatched (5.0.5 and CUDA 7.0). This is almost certainly something to do with your setup, and not a bug in Caffe.
Q1
Some say to downgrade the compiler (pointing the symlinks at the 4.7 toolchain installed in 6.1):
$ cd /usr/bin
$ sudo rm gcc g++ cpp
$ sudo ln -s gcc-4.7 gcc
$ sudo ln -s g++-4.7 g++
$ sudo ln -s cpp-4.7 cpp
Q2
Some say it is a boost version problem and recommend dropping to 1.55, because the Caffe site gives $ sudo apt-get install --no-install-recommends libboost-all-dev
with a deliberate --no-install-recommends, and the installation page requires Boost >= 1.55. The conflict is that the command above installs 1.54 by default. So uninstall it, then install with: $ sudo apt-get install libboost1.55-all-dev , and hopefully that settles it.
Check:
$ dpkg -S /usr/include/boost/version.hpp
— 1.54
Uninstall:
$ sudo apt-get autoremove libboost1.54-dev (probably unnecessary; this is likely not the cause?)
That autoremove removed over a hundred packages in one go and freed about 980 MB of space; far too destructive.
Install:
$ sudo apt-get install libboost1.55-all-dev
A small hiccup, linker errors like:
cannot find -lboost_system -lboost_filesystem -lboost_thread
$ apt-cache search libboost | grep 1.55
$ sudo apt-get install libboost-system1.55-dev
$ sudo apt-get install libboost-filesystem1.55-dev
$ sudo apt-get install libboost-thread1.55-dev
(The CUDNN_STATUS_NOT_INITIALIZED error persisted regardless.)

Q3: root cause and fix
The $ git checkout dev step already warned that the branch does not exist; it went unnoticed.
"but not the dev branch, even dev branch does not exist now"

So what was downloaded from GitHub is the master branch, which is already too new; the dev branch that used to work on the TK1 is gone.

The current master branch of Caffe requires at least cuDNN 5.0 and CUDA 7.0,
while we have cuDNN 2.0 + CUDA 6.5.

cuDNN v5.1 has different versions for CUDA 7.5 and CUDA 8.0
cuDNN v5 has different versions for CUDA 7.5 and CUDA 8.0
cuDNN v4 and v3 both require CUDA 7.0
cuDNN v2 and v1 both require CUDA 6.5

A Caffe that works with cuDNN 2.x can be cloned from here:
https://github.com/RadekSimkanic/caffe-for-cudnn-v2.5.48
There is also a Caffe said to work for CUDNN v2 on Jetson TK1: https://github.com/platotek/caffetk1
(it may or may not work).
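
A sketch of switching to the cuDNN v2 compatible fork (untested here; it assumes the fork keeps the standard Caffe layout with a Makefile.config.example):
$ git clone https://github.com/RadekSimkanic/caffe-for-cudnn-v2.5.48 caffe-cudnn2
$ cd caffe-cudnn2
$ cp Makefile.config.example Makefile.config   # then edit as in 6.3 above
$ make clean && make all -j4 && make test -j4 && make runtest -j4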

$ make runtest -j4
Major revision number: 3
Minor revision number: 2
Name: GK20A
Total global memory: 1980252160
Total shared memory per block: 49152
Total registers per block: 32768
Warp size: 32
Maximum memory pitch: 2147483647
Maximum threads per block: 1024
Maximum dimension 0 of block: 1024
Maximum dimension 1 of block: 1024
Maximum dimension 2 of block: 64
Maximum dimension 0 of grid: 2147483647
Maximum dimension 1 of grid: 65535
Maximum dimension 2 of grid: 65535
Clock rate: 852000
Total constant memory: 65536
Texture alignment: 512
Concurrent copy and execution: Yes
Number of multiprocessors: 1
Kernel execution timeout: No
Unified virtual addressing: Yes

Runtest still hit a minor LMDB_MAP_SIZE error, but that is easy to fix.

6.6 LMDB_MAP_SIZE error

FAILED:
F1222 07:03:16.822439 16826 db_lmdb.hpp:14] Check failed: mdb_status == 0 (-30792 vs. 0) MDB_MAP_FULL: Environment mapsize limit reached

This comes from the 32-bit limit: the map size is far too large for a 32-bit process, so shrink it from 1 TB to 512 MB.
I think this issue is due to the Jetson being a 32-bit (ARM) device, and the constant LMDB_MAP_SIZE in src/caffe/util/db.cpp being too big for it to understand. Unfortunately master has a really large value for LMDB_MAP_SIZE in src/caffe/util/db.cpp, which confuses our little 32-bit ARM processor on the Jetson, eventually leading to Caffe tests failing with errors like MDB_MAP_FULL: Environment mapsize limit reached.

$ vi src/caffe/util/db_lmdb.cpp
const size_t LMDB_MAP_SIZE = 1099511627776; // 1 TB
Change it to 2^29 (536870912).

$ vi ./examples/mnist/convert_mnist_data.cpp
adjust the value from 1099511627776 to 536870912.
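
Both edits can be done in one go with sed (a sketch; it assumes the constant appears literally as 1099511627776 in both files):
$ sed -i 's/1099511627776/536870912/g' src/caffe/util/db_lmdb.cpp examples/mnist/convert_mnist_data.cpp
$ grep -n 536870912 src/caffe/util/db_lmdb.cpp examples/mnist/convert_mnist_data.cpp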

$ make runtest -j 4
… … …
[==========] 1702 tests from 251 test cases ran. (5165779 ms total)
[ PASSED ] 1702 tests.
YOU HAVE 2 DISABLED TESTS

OKAY !

6.7 Benchmarking

Finally, run Caffe's benchmarking code to gauge performance, checking efficiency on both CPU and GPU:
Finally you can run Caffe’s benchmarking code to measure performance.

* The CPU run takes about 600 seconds
$ build/tools/caffe time --model=models/bvlc_alexnet/deploy.prototxt
… …
I1222 09:11:54.935829 19824 caffe.cpp:366] Average Forward pass: 5738.58 ms.
I1222 09:11:54.935860 19824 caffe.cpp:368] Average Backward pass: 5506.83 ms.
I1222 09:11:54.935890 19824 caffe.cpp:370] Average Forward-Backward: 11246.2 ms.
I1222 09:11:54.935921 19824 caffe.cpp:372] Total Time: 562310 ms.
I1222 09:11:54.935952 19824 caffe.cpp:373] *** Benchmark ends ***
ok.
These results cover a batch of 10 images per pass, so the per-image time for the Average Forward pass is the listed result divided by 10, i.e. 5738.58 ms is ~574 ms per image recognition.

* The GPU run takes about 30 seconds
$ build/tools/caffe time --model=models/bvlc_alexnet/deploy.prototxt --gpu=0
… …
I1222 09:16:02.577358 19857 caffe.cpp:366] Average Forward pass: 278.286 ms.
I1222 09:16:02.577504 19857 caffe.cpp:368] Average Backward pass: 318.795 ms.
I1222 09:16:02.577637 19857 caffe.cpp:370] Average Forward-Backward: 599.67 ms.
I1222 09:16:02.577800 19857 caffe.cpp:372] Total Time: 29983.5 ms.
I1222 09:16:02.577951 19857 caffe.cpp:373] *** Benchmark ends ***
ok.
It's running 50 iterations of the recognition pipeline, and each one is analyzing 10 different crops of the input image, so look at the 'Average Forward pass' time and divide by 10 to get the timing per recognition result (278.286 ms is ~28 ms here).

After that you can try the demos. Caffe ships with two, MNIST and CIFAR-10; MNIST in particular is known as the "hello world" of Caffe programming.
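
For reference, a typical MNIST run with the scripts that ship in the Caffe tree (paths relative to the repo root; the default lenet_solver.prototxt selects GPU mode):
$ ./data/mnist/get_mnist.sh          # download the raw MNIST files
$ ./examples/mnist/create_mnist.sh   # convert them into LMDB databases
$ ./examples/mnist/train_lenet.sh    # train LeNet; edit the solver for CPU-only runs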

7 – Python and/or MATLAB

Only after Caffe itself is built can you take on compiling the pycaffe interface.

Caffe has Python, C++, and shell interfaces; using Caffe from Python is especially convenient, and the bundled examples document the interfaces.
The installation order is: Python dependency packages, MATLAB, then the MATLAB engine for Python.

But even with the default python present, why do you still need python-dev?
Because Linux distributions usually split a library's header files and pkg-config data into a separate xxx-dev(el) package. For Python, some use cases need python-dev, for example:
installing an out-of-repo Python library that contains C/C++ files calling the Python API and needing compilation;
building your own program that links against libpythonXX.(a|so).

When using Python on Ubuntu, you often need to install additional Python libraries.
The two usual routes are pip install and Ubuntu's own apt-get install. The differences?
* pip installs from PyPI; apt-get installs from the Ubuntu repositories.
PyPI carries many more Python packages than Ubuntu, and for the same package offers more versions to choose from. A pip-installed package can also be confined to the current project.
* apt-get installs are system-wide packages, fully integrated into the system.
* apt-get and pip may name the same Python package differently: for Python 2, the apt package is typically named python-<name>, while on pip it is just <name>.

The manifest ~/caffe/python/requirements.txt lists the required dependency libraries:
Cython>=0.19.2
numpy>=1.7.1
scipy>=0.13.2
scikit-image>=0.9.3
matplotlib>=1.3.1
ipython>=3.0.0
h5py>=2.2.0
leveldb>=0.191
networkx>=1.8.1
nose>=1.3.0
pandas>=0.12.0
python-dateutil>=1.4,<2
protobuf>=2.5.0
python-gflags>=2.0
pyyaml>=3.10
Pillow>=2.3.0
six>=1.1.0

7.1 The Ubuntu way (ok)

Install the corresponding build dependencies:
$ sudo apt-get install build-essential

Caffe's Python interface needs the numpy library:
$ sudo apt-get install python-numpy

Install the scipy library:
$ sudo apt-get install python-scipy

Boost Python bindings:
$ sudo apt-get install libboost-python1.55-dev
// (alternative: $ sudo apt-get install libboost-python-dev)

$ sudo apt-get install python-protobuf

$ sudo apt-get install python-skimage

7.2 The pip way (not tested)

Install with pip // Use pip to install numpy and scipy instead, for newer versions.
$ for req in $(cat requirements.txt); do pip install $req; done
(Installing Google's protobuf partway through requires getting over the firewall.)
That said, some people online say not to rely on that requirements file, preferring to install the packages one by one.

Or install everything in one line:
$ sudo pip install cython numpy scipy scikit-image matplotlib ipython h5py leveldb networkx nose pandas python-dateutil protobuf python-gflags pyyaml pillow six

$ sudo pip install scikit-image
generates an error: Exception IndexError: list index out of range

Also note that in the Makefile.config of Caffe there are these lines:

PYTHON_INCLUDE := /usr/include/python2.7 <-- correct
    /usr/lib/python2.7/dist-packages/numpy/core/include <-- doesn't exist

so, try:
$ pip install -U scikit-image
(-U is --upgrade: if the package is already installed, upgrade it to the latest version.)

The official suggestion is to install the Anaconda bundle: download the .sh file from the Anaconda site, run it, and add its bin directory to your PATH. Anaconda is independent of the system Python libraries and provides most of the scientific-computing Python libraries Caffe needs. Beware: when running Caffe you may then get errors that some libxxx.so cannot be found, while locate libxxx.so shows it is installed inside Anaconda. The first instinct is to add $your_anaconda_path/lib to the system library path under /etc/ld.so.conf.d/.

But doing that can leave you unable to reach the desktop after logging out!!! The (suspected) reason is that some libraries in Anaconda's lib conflict with the system's own. The correct approach, so the system does not pull anaconda/lib into the system library directories at boot, is to add the library path in your own ~/.bashrc, for example by appending two lines at the end:

# add library path
LD_LIBRARY_PATH=your_anaconda_path/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH

This takes effect in any newly opened terminal, and after a reboot lightdm loads normally and the desktop comes up.

$ vi ~/.bashrc
export PYTHONPATH=/path/to/caffe/python:$PYTHONPATH
$ sudo ldconfig

7.3 Build the Python interface

Build the Python wrapper:
$ cd ~/caffe
$ make pycaffe -j4
$ make pytest -j4
— the result shows ALL TESTS PASSED, ok

Test it from Python:
$ cd caffe-folder/python
$ python
>>> import caffe
If there is no error, the Caffe installation is fully complete.

7.4 Environment variables

Add caffe/python to the PYTHONPATH variable, so that the Caffe Python interface can be called without first cd-ing into caffe/python:
$ vi ~/.bashrc
#set caffe PYTHONPATH
export PYTHONPATH=/path/to/caffe/python:$PYTHONPATH
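
A quick sanity check that the variable works from any directory (Python 2 here, matching the system python on Ubuntu 14.04):
$ source ~/.bashrc
$ cd ~                # anywhere outside caffe/python
$ python -c "import caffe; print caffe.__file__"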

TK1 OpenCV configuration -03

3 – CUDA installation

Before OpenCV, make sure CUDA is OK.

You have two options for developing CUDA applications for the Jetson TK1:
native compilation and cross-compilation.
Native compilation means building directly on the target board; for the TK1 this means compiling the code on the TK1 itself.
Cross-compilation, the method most embedded projects use, means compiling on a desktop machine and then running the result on the target board.

For TK1 development, native compilation is recommended.

1) Native compilation (compiling code onboard the Jetson TK1) is generally the easiest option, but takes longer to compile. 2) Cross-compilation is typically more complex to configure and debug, but for large projects it will be noticeably faster at compiling.
The CUDA Toolkit currently only supports cross-compilation from an Ubuntu 12.04 or 14.04 Linux desktop.
In comparison, native compilation happens onboard the Jetson device and thus is the same no matter which OS or desktop you have.

So don't download the wrong one: you want the Toolkit for L4T, not the Toolkit for Ubuntu.

Installing the CUDA Toolkit onto your device for native CUDA development
cross-compilation (compiling code on an x86 desktop in a special way so it can execute on the Jetson TK1 target device).
(Make sure you download the Toolkit for L4T and not the Toolkit for Ubuntu since that is for cross-compilation instead of native compilation)

3.1 Install the CUDA 6.5 Toolkit for L4T

Download:
http://developer.download.nvidia.com/embedded/L4T/r21_Release_v3.0/cuda-repo-l4t-r21.3-6-5-prod_6.5-42_armhf.deb
Install:
$ cd ~/Downloads
# Install the CUDA repo metadata that you downloaded manually for L4T
$ sudo dpkg -i ./cuda-repo-l4t-r21.3-6-5-prod_6.5-42_armhf.deb
Update apt-get:
# Download & install the actual CUDA Toolkit including the OpenGL toolkit from NVIDIA. (It only downloads around 15MB)
$ sudo apt-get update

3.2 Install the toolkit
# Install "cuda-toolkit-6-5", etc.
$ sudo apt-get install cuda-toolkit-6-5

# Add yourself to the "video" group to allow access to the GPU
$ sudo usermod -a -G video $USER

Add the 32-bit CUDA paths to your .bashrc login script,
and start using it in your current console:

$ echo "" >> ~/.bashrc
$ echo "# Add CUDA bin & library paths:" >> ~/.bashrc
$ echo "export PATH=/usr/local/cuda/bin:$PATH" >> ~/.bashrc
$ echo "export LD_LIBRARY_PATH=/usr/local/cuda/lib:${LD_LIBRARY_PATH}" >> ~/.bashrc
$ source ~/.bashrc
(optionally) $ sudo reboot

3.3 Verify CUDA capability

Check whether the build environment installed successfully:
$ cd /usr/local/cuda
ok
$ nvcc -V
— nvcc: NVIDIA (R) Cuda compiler driver
— Cuda compilation tools, release 6.5, V6.5.35

3.3.1 Install the CUDA samples via apt (good choice)
# Installing & running the CUDA samples (optional)
$ sudo apt-get install cuda-samples-6-5

Next, build deviceQuery and run it:
$ cd /usr/local/cuda
$ sudo chmod o+w samples/ -R
$ cd samples/1_Utilities/deviceQuery
$ make

Check the CUDA compute capability:
$ ../../bin/armv7l/linux/release/gnueabihf/deviceQuery
—Detected 1 CUDA Capable device(s)
— CUDA Driver Version / Runtime Version 6.5 / 6.5
— CUDA Capability Major/Minor version number: 3.2
— Result = PASS
The last few lines should show CUDA version 6.5 and Result = PASS.
If your GPU's compute capability does not reach 3.0 or above, skip the parts that require it.
or,

3.3.2 Install the CUDA samples from source
If you think you will write your own CUDA code or you want to see what CUDA can do,
then follow this section to build & run all of the CUDA samples.

Install writeable copies of the CUDA samples to your device’s home directory (it will create a “NVIDIA_CUDA-6.5_Samples” folder):
$ cuda-install-samples-6.5.sh /home/ubuntu
Build the CUDA samples (takes around 15 minutes on Jetson TK1):
$ cd ~/NVIDIA_CUDA-6.5_Samples
$ make

Then run some CUDA samples to test again:
1_Utilities/deviceQuery/deviceQuery
1_Utilities/bandwidthTest/bandwidthTest
cd 0_Simple/matrixMul
./matrixMulCUBLAS
cd ../..
cd 0_Simple/simpleTexture
./simpleTexture
cd ../..
cd 3_Imaging/convolutionSeparable
./convolutionSeparable
cd ../..
cd 3_Imaging/convolutionTexture
./convolutionTexture
cd ../..

3.4 Remote testing:

Note:
Many of the CUDA samples use OpenGL GLX and open graphical windows.

If you are running these programs through an SSH remote terminal, you can remotely display the windows on your desktop by typing "export DISPLAY=:0" and then executing the program.
(This will only work if you are using a Linux/Unix machine or you run an X server such as the free “Xming” for Windows).

eg:
$ export DISPLAY=:0
$ cd ~/NVIDIA_CUDA-6.5_Samples/2_Graphics/simpleGL
./simpleGL
$ cd ~/NVIDIA_CUDA-6.5_Samples/3_Imaging/bicubicTexture
./bicubicTexture
$ cd ~/NVIDIA_CUDA-6.5_Samples/3_Imaging/bilateralFilter
./bilateralFilter

If you hit:
#error -- unsupported GNU version! gcc 4.9 and up are not supported!
point the symlinks back at a supported compiler (the stock 4.8, or the 4.7 installed earlier):
$ cd /usr/bin
$ sudo rm gcc g++ cpp
$ sudo ln -s gcc-4.8 gcc
$ sudo ln -s g++-4.8 g++
$ sudo ln -s cpp-4.8 cpp

3.5 Notes

Note:
the Optical Flow sample (HSOpticalFlow) and the 3D stereo sample (stereoDisparity) take roughly 1 minute each to execute, since they compare results with CPU code.

Some of the CUDA samples use other libraries such as OpenMP or MPI or OpenGL.
If you want to compile those samples then you’ll need to install these toolkits like this:
(to be added)

4 – OpenCV installation

The Tegra platform gives OpenCV GPU acceleration.
Besides OpenCV's dedicated GPU module, many commonly used OpenCV modules, including the core Core module, can take advantage of it.

GPU acceleration requires that CUDA is already installed
and that OpenCV is compiled with nvcc for the device.

There are two ways to install and configure OpenCV on the Jetson:

Method 1: use the official JetPack installer, which can install the latest OpenCV4Tegra. This requires a PC running Ubuntu 14.04 LTS x64 and putting the Jetson TK1 into recovery mode.
Using NVIDIA's package is simple and adds CPU NEON optimizations; the drawback is that the version is not current (probably 2.4), so code written against opencv-3.0.0 may need changes.
Method 2: build from source,
strictly in the order CUDA -> OpenCV4Tegra -> OpenCV, never reversed. This allows the newest OpenCV source; the drawback is a more involved process.

Recommended: -> the manual source route.

=== The manual source route: ===
Installing OpenCV breaks down into two parts, OpenCV4Tegra and the OpenCV source build:
*0. pre
*1. Install OpenCV4Tegra.
*2. Install OpenCV.

4.0 pre
Enable the Universe repository:
$ sudo add-apt-repository universe
$ sudo apt-get update

Install a few required libraries:
# Some general development libraries: the basic g++ compiler and cmake
$ sudo apt-get install build-essential make cmake cmake-curses-gui g++

# libav video input/output development libraries
$ sudo apt-get install libavformat-dev libavutil-dev libswscale-dev

# Video4Linux camera development libraries
$ sudo apt-get install libv4l-dev

# Eigen3 math development libraries
$ sudo apt-get install libeigen3-dev

# OpenGL development libraries (to allow creating graphical windows; not the full OpenGL stack)
$ sudo apt-get install libglew1.6-dev

# GTK development libraries (to allow creating graphical windows)
$ sudo apt-get install libgtk2.0-dev

4.1 Install OpenCV4Tegra

4.1.1 Download: libopencv4tegra-repo_l4t-r21_2.4.10.1_armhf.deb
**(Mind the version.)** Download the OpenCV .deb package.
Only 21.2 is offered, there is no 21.3:
https://developer.nvidia.com/embedded/downloads <--- browse for what you want --->

OR directly go to:
http://developer.download.nvidia.com/embedded/OpenCV/L4T_21.2/libopencv4tegra-repo_l4t-r21_2.4.10.1_armhf.deb

4.1.2 Install the optimized OpenCV repo package:
$ sudo dpkg -i libopencv4tegra-repo_l4t-r21_2.4.10.1_armhf.deb
$ sudo apt-get update

4.1.3 Install OpenCV4Tegra:
$ sudo apt-get install libopencv4tegra
$ sudo apt-get install libopencv4tegra-dev
These may fail to install and keep erroring out due to the firewall; keep re-running apt-get update, install all updates, then retry.

4.1.4 Verify that OpenCV installed successfully:
$ cd /usr/lib (or: cd /usr/include)
and check whether the OpenCV libraries or header files are present.

4.2 Install OpenCV from source

4.2.1 Download OpenCV 2.4.10
Download from: http://opencv.org/
or download directly on the TK1:
$ wget http://downloads.sourceforge.net/project/opencvlibrary/opencv-unix/2.4.10/opencv-2.4.10.zip

OR: from the command-line you can run this on the device:
$ wget https://github.com/Itseez/opencv/archive/2.4.10.zip

4.2.2 config OpenCV 2.4.10
$ cd Downloads
$ unzip opencv-2.4.10.zip
$ cd opencv-2.4.10/
$ mkdir build
$ cd build
$ cmake -D WITH_CUDA=ON -D CUDA_ARCH_BIN="3.2" -D CUDA_ARCH_PTX="" -D BUILD_TESTS=OFF -D BUILD_PERF_TESTS=OFF -D WITH_OPENGL=ON -D WITH_QT=ON ..
or,
$ cmake -D CMAKE_BUILD_TYPE=RELEASE -D CMAKE_INSTALL_PREFIX=/usr/local -D WITH_TBB=ON -D BUILD_NEW_PYTHON_SUPPORT=ON -D WITH_V4L=ON -D INSTALL_C_EXAMPLES=ON -D INSTALL_PYTHON_EXAMPLES=ON -D BUILD_EXAMPLES=ON -D WITH_QT=ON -D WITH_OPENGL=ON -D ENABLE_FAST_MATH=1 -D CUDA_FAST_MATH=1 -D WITH_CUBLAS=1 ..
Note: without -D WITH_QT=ON, the -D WITH_OPENGL=ON flag has no effect,
so if you enable OpenGL, set up the Qt environment first.
The trailing .. means the CMakeLists.txt lives in the parent directory; if CMake reports it cannot find one, replace .. with the path that contains OpenCV's CMakeLists.txt.
The value 3.2 depends on your GPU's compute capability; it may be 3.2, 3.5, 3.8, and so on.
… …
— Configuring done
— Generating done
Once these appear, the configuration is OK: the checks passed and you can compile.
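
Section 4.2.3 below builds and installs in a single sudo make step; splitting it keeps the compile itself from running as root (just a preference, not a requirement):
$ make -j4            # compile as the normal user (~20 min on the TK1)
$ sudo make install   # then copy headers/libs to /usr/local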

4.2.3 install OpenCV 2.4.10
Install the built OpenCV library by copying it to "/usr/local/include" and "/usr/local/lib":
$ sudo make -j4 install
This takes about 20 minutes.
If it finishes with no errors,
OpenCV has been installed successfully.
$ sudo ldconfig
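
To double-check what got installed (a sketch; pkg-config finds opencv.pc only if /usr/local/lib/pkgconfig is on its search path, which is the default on Ubuntu):
$ pkg-config --modversion opencv     # expect 2.4.10
$ ls /usr/local/lib/libopencv_gpu*   # the CUDA-enabled gpu module should be present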

4.2.4 Uninstall 2.4.10
If you want to install another version of OpenCV, such as 2.4.13:
first manually delete the files installed by OpenCV:
$ sudo rm -r /usr/local/include/opencv2 /usr/local/include/opencv /usr/include/opencv2 /usr/include/opencv /usr/local/share/opencv /usr/local/share/OpenCV /usr/share/opencv /usr/share/OpenCV /usr/local/bin/opencv* /usr/local/lib/libopencv*
same as before:
$ cd opencv-2.4.13
$ mkdir build
$ cd build
$ cmake -DWITH_CUDA=ON -D CUDA_ARCH_BIN="3.2" -D CUDA_ARCH_PTX="" -D BUILD_TESTS=OFF -D WITH_OPENGL=ON -D WITH_QT=ON -D BUILD_PERF_TESTS=OFF ..
$ sudo make -j4 install
$ sudo ldconfig

4.2.7 Configure environment variables (2.4.13)
Finally, make sure your system searches the “/usr/local/lib” folder for libraries:
$ echo "" >> ~/.bashrc
$ echo "# Use OpenCV and other custom-built libraries." >> ~/.bashrc
$ echo "export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib/" >> ~/.bashrc
$ source ~/.bashrc

With that, OpenCV is configured and the GPU-accelerated OpenCV libraries can be used on the TK1.

4.2.8 Test-run a few OpenCV samples

First:
# Test a simple OpenCV program. Creates a graphical window, hence you should plug a HDMI monitor in or use a remote viewer such as X Tunneling or VNC or TeamViewer on your desktop.
$ cd ~/opencv-2.4.10/samples/cpp
$ g++ edge.cpp -lopencv_core -lopencv_imgproc -lopencv_highgui -o edge
$ ./edge

Second:
# If you have a USB webcam plugged in to your board, then test one of the live camera programs.
$ g++ laplace.cpp -lopencv_core -lopencv_imgproc -lopencv_highgui -lopencv_calib3d -lopencv_contrib -lopencv_features2d -lopencv_flann -lopencv_gpu -lopencv_legacy -lopencv_ml -lopencv_objdetect -lopencv_photo -lopencv_stitching -lopencv_superres -lopencv_video -lopencv_videostab -o laplace
$ ./laplace

Third example:
# Test a GPU accelerated OpenCV sample.
$ cd ../gpu
$ g++ houghlines.cpp -lopencv_core -lopencv_imgproc -lopencv_highgui -lopencv_calib3d -lopencv_contrib -lopencv_features2d -lopencv_flann -lopencv_gpu -lopencv_legacy -lopencv_ml -lopencv_objdetect -lopencv_photo -lopencv_stitching -lopencv_superres -lopencv_video -lopencv_videostab -o houghlines
$ ./houghlines ../cpp/logo_in_clutter.png
REF:http://blog.csdn.net/zyazky/article/details/52388605

Q1
if ERROR:
In file included from /home/ubuntu/opencv-2.4.10/modules/core/src/opengl_interop.cpp:52:0:
/usr/local/cuda/include/cuda_gl_interop.h:64:2: error: #error Please include the appropriate gl headers before including cuda_gl_interop.h
#error Please include the appropriate gl headers before including cuda_gl_interop.h
^
make[2]: *** [modules/core/CMakeFiles/opencv_core.dir/src/opengl_interop.cpp.o] Error 1
make[2]: *** Waiting for unfinished jobs….
make[1]: *** [modules/core/CMakeFiles/opencv_core.dir/all] Error 2
then:
$ sudo vi /usr/local/cuda/include/cuda_gl_interop.h
comment out the following lines:
#if defined(__arm__) || defined(__aarch64__)
//#ifndef GL_VERSION
//#error Please include the appropriate gl headers before including cuda_gl_interop.h
//#endif
//#else
REFs:
https://github.com/opencv/opencv/issues/5205
https://devtalk.nvidia.com/default/topic/1007290/building-opencv-with-opengl-support-/
Q2
IF error:
/home/ubuntu/Downloads/opencv-2.4.10/build/modules/highgui/src/window_QT.cpp:3111:12: error: 'GL_PERSPECTIVE_CORRECTION_HINT' was not declared in this scope
then:
$ vi /home/ubuntu/Downloads/opencv-2.4.10/modules/highgui/src/window_QT.cpp
add :
#define GL_PERSPECTIVE_CORRECTION_HINT 0x0C50

=== Notes on the OpenCV GPU module (from the OpenCV documentation): ===
The OpenCV GPU module is a set of classes and functions to utilize GPU computational capabilities. It is implemented using NVIDIA* CUDA* Runtime API and supports only NVIDIA GPUs. The OpenCV GPU module includes utility functions, low-level vision primitives, and high-level algorithms. The utility functions and low-level primitives provide a powerful infrastructure for developing fast vision algorithms taking advantage of GPU whereas the high-level functionality includes some state-of-the-art algorithms (such as stereo correspondence, face and people detectors, and others) ready to be used by the application developers.
The GPU module is designed as a host-level API. This means that if you have pre-compiled OpenCV GPU binaries, you are not required to have the CUDA Toolkit installed or write any extra code to make use of the GPU.
The OpenCV GPU module is designed for ease of use and does not require any knowledge of CUDA. Though, such a knowledge will certainly be useful to handle non-trivial cases or achieve the highest performance. It is helpful to understand the cost of various operations, what the GPU does, what the preferred data formats are, and so on.
The GPU module is an effective instrument for quick implementation of GPU-accelerated computer vision algorithms. However, if your algorithm involves many simple operations, then, for the best possible performance, you may still need to write your own kernels to avoid extra write and read operations on the intermediate results.

To enable CUDA support, configure OpenCV using CMake with WITH_CUDA=ON . When the flag is set and if CUDA is installed, the full-featured OpenCV GPU module is built.
Otherwise, the module is still built but at runtime all functions from the module throw Exception with CV_GpuNotSupported error code, except for gpu::getCudaEnabledDeviceCount(). The latter function returns zero GPU count in this case.
Building OpenCV without CUDA support does not perform device code compilation, so it does not require the CUDA Toolkit installed. Therefore, using the gpu::getCudaEnabledDeviceCount() function, you can implement a high-level algorithm that will detect GPU presence at runtime and choose an appropriate implementation (CPU or GPU) accordingly.
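
A tiny runtime probe of the behavior described above, compiled from the shell (a sketch against the OpenCV 2.4 gpu module; the link flags assume the libraries from 4.2.3 are in /usr/local/lib):
$ cat > gpucount.cpp <<'EOF'
#include <opencv2/gpu/gpu.hpp>
#include <iostream>
int main() {
    // Returns 0 when OpenCV was built without CUDA or no NVIDIA GPU is present.
    int n = cv::gpu::getCudaEnabledDeviceCount();
    std::cout << "CUDA-enabled devices: " << n << std::endl;
    return 0;
}
EOF
$ g++ gpucount.cpp -lopencv_core -lopencv_gpu -o gpucount
$ ./gpucount    # expect 1 on the TK1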

Compilation for Different NVIDIA* Platforms :
NVIDIA* compiler enables generating binary code (cubin and fatbin) and intermediate code (PTX). Binary code often implies a specific GPU architecture and generation, so the compatibility with other GPUs is not guaranteed. PTX is targeted for a virtual platform that is defined entirely by the set of capabilities or features. Depending on the selected virtual platform, some of the instructions are emulated or disabled, even if the real hardware supports all the features.
At the first call, the PTX code is compiled to binary code for the particular GPU using a JIT compiler. When the target GPU has a compute capability (CC) lower than the PTX code, JIT fails. By default, the OpenCV GPU module includes:
Binaries for compute capabilities 1.3 and 2.0 (controlled by CUDA_ARCH_BIN in CMake)
PTX code for compute capabilities 1.1 and 1.3 (controlled by CUDA_ARCH_PTX in CMake)
This means that for devices with CC 1.3 and 2.0 binary images are ready to run. For all newer platforms, the PTX code for 1.3 is JIT'ed to a binary image. For devices with CC 1.1 and 1.2, the PTX for 1.1 is JIT'ed. For devices with CC 1.0, no code is available and the functions throw Exception. For platforms where JIT compilation is performed first, the run is slow.
On a GPU with CC 1.0, you can still compile the GPU module and most of the functions will run flawlessly. To achieve this, add "1.0" to the list of binaries, for example, CUDA_ARCH_BIN="1.0 1.3 2.0" . The functions that cannot be run on CC 1.0 GPUs throw an exception.
You can always determine at runtime whether the OpenCV GPU-built binaries (or PTX code) are compatible with your GPU. The function gpu::DeviceInfo::isCompatible() returns the compatibility status (true/false).

Utilizing Multiple GPUs :
In the current version, each of the OpenCV GPU algorithms can use only a single GPU. So, to utilize multiple GPUs, you have to manually distribute the work between GPUs. Switching the active device can be done using the gpu::setDevice() function. For more details please read the CUDA C Programming Guide.
While developing algorithms for multiple GPUs, note a data passing overhead. For primitive functions and small images, it can be significant, which may eliminate all the advantages of having multiple GPUs. But for high-level algorithms, consider using multi-GPU acceleration. For example, the Stereo Block Matching algorithm has been successfully parallelized using the following algorithm:
Split each image of the stereo pair into two horizontal overlapping stripes.
Process each pair of stripes (from the left and right images) on a separate Fermi* GPU.
Merge the results into a single disparity map.
With this algorithm, a dual GPU gave a 180% performance increase compared to a single Fermi GPU. For a source code example, see https://github.com/opencv/opencv/tree/master/samples/gpu/.

TK1 system configuration -01

Intelligent robots (57): fusion debugging

Configuration parameters for robot_pose_ekf

freq:
The filter's output frequency. Note that a higher frequency only makes the fused odom output more frequent; it does not improve the accuracy of the pose estimate.

sensor_timeout:
The maximum time to wait for a message from a given sensor. If no new VO or IMU message has arrived within this window, the filter stops waiting for that sensor.

The EKF filter does not require all sensors to stay synchronized and online; missing some is fine. For example, if the filter updated its output at t0, odom data arrives at t1, and IMU data arrives at t2, the filter interpolates the IMU data over t0~t1 and then outputs its pose estimate.
( https://chidambaramsethu.wordpress.com/2013/07/15/a-beginners-guide-to-the-the-ros-robot_pose_ekf-package/)

output_frame:
The example on the wiki is problematic, or at least unclear:
it sets "output_frame" to "odom" by default, which invites confusion; this frame is better named "odom_combined", which does not get confused with the output topic "odom_combined".

The source code is clearer:
(http://docs.ros.org/kinetic/api/robot_pose_ekf/html/odom__estimation__node_8cpp_source.html)
76 // paramters
77 nh_private.param("output_frame", output_frame_, std::string("odom_combined"));
78 nh_private.param("base_footprint_frame", base_footprint_frame_, std::string("base_footprint"));
79 nh_private.param("sensor_timeout", timeout_, 1.0);
80 nh_private.param("odom_used", odom_used_, true);
81 nh_private.param("imu_used", imu_used_, true);
82 nh_private.param("vo_used", vo_used_, true);
83 nh_private.param("gps_used", gps_used_, false);
84 nh_private.param("debug", debug_, false);
85 nh_private.param("self_diagnose", self_diagnose_, false);
86 double freq;
87 nh_private.param("freq", freq, 30.0);

104 pose_pub_ = nh_private.advertise<geometry_msgs::PoseWithCovarianceStamped>("odom_combined", 10);

434 odom_broadcaster_.sendTransform(StampedTransform(tmp, tmp.stamp_, output_frame_, base_footprint_frame_));

Note: robot_pose_ekf's "output_frame" corresponds to amcl's "odom_frame_id".
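
For reference, the parameters above can also be set from the command line via ROS private-parameter remapping (a sketch; a launch file is the usual vehicle, and the vo_used:=false choice here is just an example configuration):
$ rosrun robot_pose_ekf robot_pose_ekf \
    _output_frame:=odom_combined \
    _base_footprint_frame:=base_footprint \
    _freq:=30.0 _sensor_timeout:=1.0 \
    _odom_used:=true _imu_used:=true _vo_used:=false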