TK1: LeNet - 11

Based on Caffe's prototxt and solver structures,
and using the MNIST training and test images,
we create and configure a neural network
to perform handwritten digit recognition.

2 IPython - qtconsole - Notebook

2.1 IPython

Python's built-in pdb debugging is weak inside its IDEs; IPython is a much sharper tool:
tab completion with the Tab key: type im and press Tab, and it completes to import, for example.
$ sudo apt-get install ipython
$ sudo apt-get install python-zmq

$ ipython
This drops you into the interactive programming environment.

Magic commands (prefixed with %) are available:
%run test.py: run a Python script directly.
%pwd: show the current working directory.
%cd: change the working directory.

The qtconsole and notebook below both use matplotlib, the well-known Python plotting library,
which supports several output formats and can display charts interactively through various GUI toolkits.
Here its inline mode embeds matplotlib figures directly in the qtconsole or notebook instead of popping up a separate window.
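A minimal sketch of what inline display looks like in a qtconsole or notebook session (the sine curve is just an arbitrary illustration):

%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 100)
plt.plot(x, np.sin(x))  # the figure renders inline instead of in a separate window
plt.title('inline matplotlib demo')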

2.2 qtconsole

The IPython team developed a Qt-based GUI console that gives terminal applications rich-text features such as inline images, multi-line editing, and syntax highlighting.
A nice property is that no shell console runs behind it; only IPython's Qt console itself runs.
$ sudo apt-get install ipython-qtconsole

$ ipython qtconsole --pylab=inline

2.3 Notebook

Alternatively, use the notebook:
$ sudo apt-get install ipython-notebook

$ ipython notebook --pylab=inline
Once the interface appears, you can type multi-line code and run it with Shift+Enter.
Shift-Enter : run cell
Ctrl-Enter : run cell in-place
Alt-Enter : run cell, insert below

2.4 Notes

IPython debugging is convenient because you can work in whole code blocks;
but when something goes wrong the feedback is indirect: messages like "kernel restarting" reveal very little.

Line-by-line pdb debugging in plain Python is still irreplaceable at times.

In the debugging below, where you start Python from matters a great deal:
$ cd …CAFFE_ROOT…
$ ipython or python … …
Otherwise you will run into many path-related errors.

If you see:
db_lmdb.hpp:14] Check failed: mdb_status == 0 (2 vs. 0) No such file or directory
then the data source paths in the train and test prototxt files are wrong.
If you see:
caffe solver.cpp:442] Cannot write to snapshot prefix 'examples/mnist/lenet'.
then either the path is wrong, or you can change the snapshot prefix from lenet to any other name (e.g. zenet).
If IPython reports "Kernel Restarting" at solver = caffe.SGDSolver('mnist/lenet_auto_solver.prototxt'),
you may need to explicitly add at the end of the solver file:
solver_mode: GPU

3 Setting up the prototxt

3.0
$ cd /home/ubuntu/sdcard/caffe-for-cudnn-v2.5.48
$ ipython qtconsole --pylab=inline

3.1 Configure the Python environment

# we'll use the pylab import for numpy and plot inline
from pylab import *
%matplotlib inline

# Import caffe, adding it to sys.path if needed. (Make sure you've built pycaffe.)
# caffe_root = '../'  # this would be right if running from {caffe_root}/examples/

import sys
caffe_root = './'  # note the trailing slash: 'python' is appended directly below
sys.path.insert(0, caffe_root + 'python')

import caffe

3.2 About the data

Optional step; skip it if the data is already in place.

# Using LeNet example data and networks, if already downloaded skip this step.
## run scripts from caffe root
## import os
## os.chdir(caffe_root)
## Download data
## !data/mnist/get_mnist.sh
## Prepare data
## !examples/mnist/create_mnist.sh
## back to examples
## os.chdir('examples')

3.3 About the network

Optional step; skip it if the files already exist.

The train and test network files ship with the Caffe installation and can be used directly.
You can also generate them yourself; below we build a LeNet variant.

# We'll write the net in a succinct and natural way as Python code that serializes to Caffe's protobuf model format.
# This network expects to read from pre-generated LMDBs,
# but reading directly from ndarrays is also possible using MemoryDataLayer.

# We'll need two external files:
#   the net prototxt, defining the architecture and pointing to the train/test data
#   the solver prototxt, defining the learning parameters
from caffe import layers as L, params as P

def lenet(db_path, batch_size):
    n = caffe.NetSpec()
    n.data, n.label = L.Data(batch_size=batch_size, backend=P.Data.LMDB, source=db_path,
                             transform_param=dict(scale=1./255), ntop=2)
    n.conv1 = L.Convolution(n.data, kernel_size=5, num_output=20, weight_filler=dict(type='xavier'))
    n.pool1 = L.Pooling(n.conv1, kernel_size=2, stride=2, pool=P.Pooling.MAX)
    n.conv2 = L.Convolution(n.pool1, kernel_size=5, num_output=50, weight_filler=dict(type='xavier'))
    n.pool2 = L.Pooling(n.conv2, kernel_size=2, stride=2, pool=P.Pooling.MAX)
    n.fc1 = L.InnerProduct(n.pool2, num_output=500, weight_filler=dict(type='xavier'))
    n.relu1 = L.ReLU(n.fc1, in_place=True)
    n.score = L.InnerProduct(n.relu1, num_output=10, weight_filler=dict(type='xavier'))
    n.loss = L.SoftmaxWithLoss(n.score, n.label)
    return n.to_proto()

# write net to disk in a human-readable serialization using Google's protobuf
# You can read, write, and modify this description directly
with open('examples/mnist/lenet_auto_train.prototxt', 'w') as f:  # train net
    f.write(str(lenet('examples/mnist/mnist_train_lmdb', 64)))
with open('examples/mnist/lenet_auto_test.prototxt', 'w') as f:  # test net
    f.write(str(lenet('examples/mnist/mnist_test_lmdb', 100)))

This writes the train and test network definitions to disk.
The file names are up to you, but they must match the configuration in solver.prototxt.
# you can view the train net structure generated in the previous step
# $ vi examples/mnist/lenet_auto_train.prototxt
# you can view the test net structure generated in the previous step
# $ vi examples/mnist/lenet_auto_test.prototxt

3.4 About the solver

Optional step; skip it if the file already exists.

The lenet_auto_solver.prototxt file ships with the Caffe installation
and can be used directly.

You can also generate this solver.prototxt yourself:
from caffe.proto import caffe_pb2
s = caffe_pb2.SolverParameter()
s.random_seed = 0
# the remaining fields mirror the parameters shown below
… …
# finally
with open(yourpath, 'w') as f:
    f.write(str(s))

Then, as above:
solver = None
solver = caffe.get_solver(yourpath)
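A fuller sketch of this generation step (the field values simply restate the solver configuration listed below; the output path stands in for yourpath):

from caffe.proto import caffe_pb2

s = caffe_pb2.SolverParameter()
s.random_seed = 0
s.train_net = 'examples/mnist/lenet_auto_train.prototxt'
s.test_net.append('examples/mnist/lenet_auto_test.prototxt')  # repeated field
s.test_iter.append(100)                                       # repeated field
s.test_interval = 500
s.base_lr = 0.01
s.momentum = 0.9
s.weight_decay = 0.0005
s.lr_policy = 'inv'
s.gamma = 0.0001
s.power = 0.75
s.display = 100
s.max_iter = 10000
s.snapshot = 5000
s.snapshot_prefix = 'examples/mnist/lenet'

with open('examples/mnist/lenet_auto_solver.prototxt', 'w') as f:
    f.write(str(s))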

The solver file looks roughly like this:
# $ vi examples/mnist/lenet_auto_solver.prototxt
# The train/test net protocol buffer definition
train_net: "examples/mnist/lenet_auto_train.prototxt"
test_net: "examples/mnist/lenet_auto_test.prototxt"

# test_iter specifies how many forward passes the test should carry out.
# In the case of MNIST, we have test batch size 100 and 100 test iterations,
# covering the full 10,000 testing images.
test_iter: 100

# Carry out testing every 500 training iterations.
test_interval: 500

# The base learning rate, momentum and the weight decay of the network.
base_lr: 0.01
momentum: 0.9
weight_decay: 0.0005

# The learning rate policy
lr_policy: "inv"
gamma: 0.0001
power: 0.75

# Display every 100 iterations
display: 100

# The maximum number of iterations
max_iter: 10000

# snapshot intermediate results
snapshot: 5000
snapshot_prefix: "mnist/lenet"

3.5 Load the solver

# Let's pick a device and load the solver
caffe.set_device(0)
caffe.set_mode_gpu()  # using GPU

# load the solver
solver = None
# create train and test nets
# We'll use SGD (with momentum), but other methods are also available.
solver = caffe.SGDSolver('examples/mnist/lenet_auto_solver.prototxt')

Note: Caffe's objective function is non-convex and has no analytical solution, so it must be solved with iterative optimization methods.

3.6 Inspect the network

Check the output shapes: the dimensions of the intermediate features (blobs) and of the parameters (params).
# each output is (batch size, feature dim, spatial dim)

Intermediate features (blobs):
[(k, v.data.shape) for k, v in solver.net.blobs.items()]
… …
[('data', (64, 1, 28, 28)),
('label', (64,)),
('conv1', (64, 20, 24, 24)),
('pool1', (64, 20, 12, 12)),
('conv2', (64, 50, 8, 8)),
('pool2', (64, 50, 4, 4)),
('fc1', (64, 500)),
('score', (64, 10)),
('loss', ())]

Parameters (params):
# just print the weight sizes (we'll omit the biases)
[(k, v[0].data.shape) for k, v in solver.net.params.items()]
… …
[('conv1', (20, 1, 5, 5)),
('conv2', (50, 20, 5, 5)),
('fc1', (500, 800)),
('score', (10, 500))]

3.7 Run one pass

This can crash in qtconsole; it does not in the notebook, and plain Python is safest.

Run one forward pass on the train and test nets:
# check that everything is loaded as we expect,
# by running a forward pass on the train and test nets,
# and check that they contain our data.

# step does one full iteration, covering all three phases: forward evaluation, backward propagation, and update.
# forward does only the first of these.
# One step runs a single iteration on one batch; a full pass over every batch of the dataset is called an epoch.

Train set:
solver.net.forward()  # train net
… …
{'loss': array(2.363983154296875, dtype=float32)}

Test set:
solver.test_nets[0].forward()  # test net (there can be more than one)
… …
{'loss': array(2.365971088409424, dtype=float32)}
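As a sanity check, both initial losses are close to -ln(1/10) ≈ 2.303, exactly what an untrained 10-way softmax classifier should produce.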

Display the first 8 training images and their labels:
imshow(solver.net.blobs['data'].data[:8, 0].transpose(1, 0, 2).reshape(28, 8*28), cmap='gray'); axis('off')

print 'wewewerth', solver.net.blobs['label'].data[:8]
… …
wewewerth [ 5. 0. 4. 1. 9. 2. 1. 3.]

Display the first 8 test images and their labels:
imshow(solver.test_nets[0].blobs['data'].data[:8, 0].transpose(1, 0, 2).reshape(28, 8*28), cmap='gray')

print 'labels', solver.test_nets[0].blobs['label'].data[:8]
… …
labels [ 7. 2. 1. 0. 4. 1. 4. 9.]

3.8 Run the network

Having confirmed that the correct data and labels are loaded, start the solver and run one batch to check that gradients are flowing.
Both train and test nets seem to be loading data, and to have correct labels.

Take one step of SGD and look at how the weights change:
# Let's take one step of (minibatch) SGD and see what happens.
solver.step(1)

The updates to the first-layer weights, shown below as a 4x5 grid of 5x5 filters:
# Do we have gradients propagating through our filters?
# Let's see the updates to the first layer, shown here as a 4*5 grid of 5*5 filters.
imshow(solver.net.params['conv1'][0].diff[:, 0].reshape(4, 5, 5, 5).transpose(0, 2, 1, 3).reshape(4*5, 5*5), cmap='gray'); axis('off')
… …

(-0.5, 24.5, 19.5, -0.5)
Now run the network properly; the process is the same as training through the Caffe binary.

3.9 Train with a custom loop

# Let’s run the net for a while, keeping track of a few things as it goes.
# Note that this process will be the same as if training through the caffe binary :
# *logging will continue to happen as normal
# *snapshots will be taken at the interval specified in the solver prototxt (here, every 5000 iterations)
# *testing will happen at the interval specified (here, every 500 iterations)
# Since we have control of the loop in Python, we're free to do many other things as well, for example:
# *write a custom stopping criterion
# *change the solving process by updating the net in the loop

The previous post used %timeit; here we use %%time:
%%time
niter = 200
test_interval = 25

# losses will also be stored in the log
train_loss = zeros(niter)
test_acc = zeros(int(np.ceil(niter / test_interval)))
output = zeros((niter, 8, 10))

# the main solver loop
for it in range(niter):
    solver.step(1)  # SGD by Caffe
    # store the train loss
    train_loss[it] = solver.net.blobs['loss'].data
    # store the output on the first test batch
    # (start the forward pass at conv1 to avoid loading new data)
    solver.test_nets[0].forward(start='conv1')
    output[it] = solver.test_nets[0].blobs['score'].data[:8]
    # run a full test every so often
    # (Caffe can also do this for us and write to a log, but we show here
    #  how to do it directly in Python, where more complicated things are easier.)
    if it % test_interval == 0:
        print 'iteration', it, 'testing...'
        correct = 0
        for test_it in range(100):
            solver.test_nets[0].forward()
            correct += sum(solver.test_nets[0].blobs['score'].data.argmax(1)
                           == solver.test_nets[0].blobs['label'].data)
        test_acc[it // test_interval] = correct / 1e4
… …
iteration 0 testing…
iteration 25 testing…
iteration 50 testing…
iteration 75 testing…
iteration 100 testing…
iteration 125 testing…
iteration 150 testing…
iteration 175 testing…
CPU times: user 19.4 s, sys: 2.72 s, total: 22.2 s
Wall time: 20.9 s

Plot the training loss and test accuracy:
# plot the train loss and test accuracy
_, ax1 = subplots()
ax2 = ax1.twinx()
ax1.plot(arange(niter), train_loss)
ax2.plot(test_interval * arange(len(test_acc)), test_acc, 'r')
ax1.set_xlabel('iteration')
ax1.set_ylabel('train loss')
ax2.set_ylabel('test accuracy')
ax2.set_title('Test Accuracy: {:.2f}'.format(test_acc[-1]))
… …

# The plot shows the loss dropped quickly and converged (up to stochastic noise).

Plot the classification results:
# Since we saved the results on the first test batch, we can watch how our prediction scores evolved.
# We'll plot time on the x axis and each possible label on the y, with lightness indicating confidence.
for i in range(8):
    figure(figsize=(2, 2))
    imshow(solver.test_nets[0].blobs['data'].data[i, 0], cmap='gray')
    figure(figsize=(20, 2))
    imshow(output[:100, i].T, interpolation='nearest', cmap='gray')  # output[:100, i]: the first 100 iterations

NOTE:
# The results above look good; now look in more detail at how each digit's score evolved.
We started with little idea about any of these digits, and ended up with correct classifications for each.
If you've been following along, you'll see the last digit is the most difficult, a slanted "9" that's (understandably) most confused with "4".

Apply the softmax:
# Note that these are the "raw" output scores rather than the softmax-computed probability vectors.
# The version below makes it easier to see the confidence of our net (but harder to see the scores for less likely digits).
for i in range(8):
    figure(figsize=(2, 2))
    imshow(solver.test_nets[0].blobs['data'].data[i, 0], cmap='gray')
    figure(figsize=(10, 2))
    imshow(exp(output[:50, i].T) / exp(output[:50, i].T).sum(0), interpolation='nearest', cmap='gray')
    xlabel('iteration')
    ylabel('label')
… …
The contrast in the output images is much clearer, because the softmax computation above polarizes low and high scores:
# for i in range(2):
#     figure(figsize=(2, 2))
#     imshow(solver.test_nets[0].blobs['data'].data[i, 0], cmap='gray')
#     figure(figsize=(20, 2))
#     imshow(exp(output[:100, i].T) / exp(output[:100, i].T).sum(0), interpolation='nearest', cmap='gray')

3.10 Next steps

# now we've defined, trained, and tested LeNet
# there are many possible next steps:
1. Define a new architecture (e.g. add fully connected layers, change the ReLU nonlinearity, ...)
2. Tune the learning rate and other hyperparameters (search at exponential intervals: 0.1, 0.01, 0.001)
3. Train longer
4. Switch from SGD to Adam
5. Anything else

http://caffe.berkeleyvision.org/tutorial/interfaces.html
blog.csdn.net/thystar/article/details/50668877
https://github.com/BVLC/caffe/blob/master/examples/00-classification.ipynb
https://github.com/BVLC/caffe/blob/master/examples/01-learning-lenet.ipynb

TK1: MNIST - 09

Deep learning has several well-known open-source frameworks: Caffe, Theano, Torch. Caffe's dependency stack is:
* Caffe has several dependencies:
-CUDA is required for GPU mode. Caffe requires the CUDA nvcc compiler to compile its GPU code and CUDA driver for GPU operation.
-BLAS via ATLAS, MKL, or OpenBLAS. Caffe requires BLAS as the backend of its matrix and vector computations. There are several implementations of this library:ATLAS, Intel MKL or OpenBLAS.
-Boost, version >= 1.55
-protobuf, glog, gflags, hdf5
* Optional dependencies:
-OpenCV >= 2.4 including 3.0
-IO libraries: lmdb, leveldb (note: leveldb requires snappy)
-cuDNN for GPU acceleration (v5) , For best performance, Caffe can be accelerated by NVIDIA cuDNN.
* Pycaffe and Matcaffe interfaces have their own natural needs.
-For Python Caffe: Python 2.7 or Python 3.3+, numpy (>= 1.7), boost-provided boost.python,
The main requirements are numpy and boost.python (provided by boost). pandas is useful too and needed for some examples.
-For MATLAB Caffe: MATLAB with the mex compiler. Install MATLAB, and make sure that its mex is in your $PATH.

1 - MNIST

Caffe ships two demos, mnist and cifar10; mnist in particular is known as the "hello world" of Caffe programming.
MNIST is a well-known handwritten-digit database, said to be written by American high-school students, maintained by Yann LeCun.
It was originally used for handwritten digit recognition on bank checks and has since become the standard introductory dataset for deep learning.
The model purpose-built for MNIST is LeNet, the earliest CNN model.

Caffe's first "hello world" is typically recognizing the MNIST handwritten digits, in three main steps:
prepare the data, adjust the configuration, and use the model.
REF: http://yann.lecun.com/exdb/mnist/
http://caffe.berkeleyvision.org/gathered/examples/mnist.html

1.1 Prepare the data

* Download the data:
$ ./data/mnist/get_mnist.sh

After it completes, the data/mnist/ directory contains four files:
train-images-idx3-ubyte: training set images (9912422 bytes)
train-labels-idx1-ubyte: training set labels (28881 bytes)
t10k-images-idx3-ubyte: test set images (1648877 bytes)
t10k-labels-idx1-ubyte: test set labels (4542 bytes)

MNIST has 60,000 training samples and 10,000 test samples; each is a 28x28 grayscale image of a handwritten digit 0-9, so there are 10 classes.
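These idx files use a simple binary layout: a big-endian magic number and dimension counts, then raw pixel bytes. A minimal sketch for checking the training-image header (file name as downloaded above):

import struct

with open('data/mnist/train-images-idx3-ubyte', 'rb') as f:
    # header: magic number, image count, rows, cols, all big-endian int32
    magic, num, rows, cols = struct.unpack('>IIII', f.read(16))
    print magic, num, rows, cols  # expect: 2051 60000 28 28
    pixels = f.read(rows * cols)  # raw bytes of the first image follow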

* Convert the data:
These files cannot be used by Caffe directly; they must be converted into LMDB data:
$ ./examples/mnist/create_mnist.sh

On success, two directories are generated under examples/mnist/: mnist_train_lmdb and mnist_test_lmdb. The data.mdb and lock.mdb files inside are the data we actually run on.
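To sanity-check the conversion, you can read one record back in Python (a sketch, assuming the lmdb Python package and pycaffe are available):

import lmdb
from caffe.proto import caffe_pb2

env = lmdb.open('examples/mnist/mnist_train_lmdb', readonly=True)
with env.begin() as txn:
    key, value = next(txn.cursor().iternext())
    datum = caffe_pb2.Datum()
    datum.ParseFromString(value)
    # each record is a serialized Datum: channels x height x width pixel bytes plus a label
    print datum.channels, datum.height, datum.width, datum.label  # expect 1 28 28 and a digit 0-9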

1.2 About protobuf

Protocol Buffers (protobuf for short) is a lightweight, efficient structured-data serialization format open-sourced by Google, used to serialize structured data for storage, transfer protocols, and similar scenarios. Everything protobuf does could also be done with XML.
One reason not to use XML is efficiency; another is protobuf's good multi-language support (C++, Java, Python), its code-generation machinery, and more.

The structure of a Protocol Buffers interface definition is simple, something like:
package caffe;  // namespace
message helloworld  // message type
{  // fields
    required int32 xx = 1;  // required value
    optional int32 xx = 2;  // optional value
    repeated xx xx = 3;     // repeatable value
    enum xx {  // enum type
        xx = 1;
    }
}

For example, suppose the language is C++ and module A sends a large volume of order records to module B over a socket.
First write a proto file, say Order.proto, and add a message named "Order" to it:
message Order
{
required int32 time = 1;
required int32 userid = 2;
required float price = 3;
optional string desc = 4;
}

Then compile Order.proto with protobuf's compiler:
$ protoc -I=. --cpp_out=. ./Order.proto
protoc automatically generates the files Order.pb.cc and Order.pb.h.

On the sender side (module A), the order wrapper class can be serialized with code like this:
$ vi Sender.cpp
Order order;
order.set_time(XXXX);
order.set_userid(123);
order.set_price(100.0f);
order.set_desc("a test order");
string sOrder;
order.SerializeToString(&sOrder);
// send the serialized string out via the socket library

On the receiver side (module B), it can be parsed like this:
$ vi Receiver.cpp
string sOrder;
// receive the data via the network library into the string sOrder
Order order;
if (order.ParseFromString(sOrder))  // parse the string
{
cout << "userid:" << order.userid() << endl << "desc:" << order.desc() << endl; } else cerr << "parse error!" << endl; 最后编译文件即可: $ g++ Sender.cpp -o Sender Order.pb.cc -I /usr/local/protobuf/include -L /usr/local/protobuf/lib -lprotobuf -pthread 测试下。。。 1.3 关于 caffe.proto 文件caffe.proto中就是用google protobuf定义的caffe所要用到的结构化数据。 在同级文件夹下还有两个文件caffe.pb.h 和caffe.pb.cc,这两个文件是由caffe.proto编译而来的。 $ vi src/caffe/proto/caffe.proto syntax = "proto2"; //默认就是二 package caffe; //各个结构封装在caffe包中,可以通过using namespace caffe; 或者caffe::**来调用 注意: proto文件为结构、prototext文件为配置。 1.4 caffe结构 caffe通过layer-by-layer的方式逐层定义网络,从开始输入到最终输出判断从下而上的定义整个网络。 caffe可以从四个层次来理解:Blob、Layer、Net、Solver,这四个部分的关系是: * Blob blob是贯穿整个框架的数据单元。存储整个网络中的所有数据(数据和导数),它在cup和GPU之间按需分配内存开销。Blob作为caffe基本的数据结构,通常用四维矩阵 Batch×Channel×Height×Weight表示,某一坐标(n,k,h,w)的物理位置为((n * K + k) * H + h) * W + w),存储了网络的神经元激活值和网络参数,以及相应的梯度(激活值的残差和dW、db)。其中包含有cpu_data、gpu_data、cpu_diff、gpu_diff、mutable_cpu_data、mutable_gpu_data、mutable_cpu_diff、mutable_gpu_diff这一堆很像的东西,分别表示存储在CPU和GPU上的数据。其中带data的里面存储的是激活值和W、b,diff中存储的是残差和dW、db,另外带mutable和不带mutable的一对指针所指的位置是相同的,只是不带mutable的只读,而带mutable的可写。 * Layer Layer代表了神经网络中各种各样的层,组合成一个网络。一般一个图像或样本会从数据层中读进来,然后一层一层的往后传。除了比较特殊的数据层之外其余大部分层都包含4个函数:LayerSetUp、Reshape、Forward、Backward。LayerSetup用于初始化层,开辟空间,填充初始值等。Reshape对输入值进行维度变换。Forward是前向传播,Backward是后向传播。 那么数据是如何在层之间传递的呢?每一层都会有一个(或多个)Bottom和top,分别存储输入和输出。bottom[0]->cpu_data()是存输入的神经元激活值,top就是存输出的,cpu_diff()存的是激活值的残差,gpu是存在GPU上的数据。如果这个层前后有多个输入输出层,就会有bottom[1],bottom[2]。。。每层的参数会存在this->blobs_里,一般this->blobs_[0]存W,this->blobs_[1]存b,this->blobs_[0]->cpu_data()存的是W的值,this->blobs_[0]->cpu_diff()存的梯度dW,b和db也类似,然后换成gpu是存在GPU上的数据,再带上mutable就可写了
Every layer that can run on the GPU has a .cpp and a .cu file of the same name; the .cu file's computations essentially all invoke CUDA kernels.
* Net
A Net stacks the various Layers together as defined in train_val.prototxt. It first initializes every layer, then repeatedly calls Update; each update performs one full forward and backward pass and applies the gradients computed by every layer. Note that Backward only computes each layer's dW and db; W and b themselves are updated together at the end, in the Net's Update. When training a model in Caffe there are generally two Nets, one for train and one for test.
* Solver
A Solver trains a Net according to the parameters defined in solver.prototxt. It first initializes a TrainNet and a TestNet, and then its Step function iterates the network. Two steps repeat: ComputeUpdateValue computes the iteration parameters (for example the learning rate and the weight-decay terms), and then the Net's Update function updates the whole network.

The rough flow:
The solver initializes the Net from the solver.prototxt configuration. The Net then uses the train_val.prototxt parameters to instantiate the corresponding Layers; data blobs flow into each Layer, which processes the incoming data and returns the computed blobs, which the Net passes on to the next Layer. Every Solver step increments a counter and then adjusts values such as the learning rate and weight decay.
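A small sketch of the blob offset formula mentioned under Blob above, for a blob of shape (N, K, H, W) stored contiguously:

def blob_offset(shape, n, k, h, w):
    # physical index of element (n, k, h, w) in a Batch x Channel x Height x Width blob
    N, K, H, W = shape
    return ((n * K + k) * H + h) * W + w

# example: conv1 output (64, 20, 24, 24); element (1, 0, 0, 0) starts
# right after the 20*24*24 = 11520 values of the first image
print blob_offset((64, 20, 24, 24), 1, 0, 0, 0)  # 11520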

* A simple Net example:
name: "logistic-regression"
layer {
    name: "mnist"
    type: "Data"
    top: "data"
    top: "label"
    data_param {
        source: "your-source"
        batch_size: your-size
    }
}
layer {
    name: "ip"
    type: "InnerProduct"
    bottom: "data"
    top: "ip"
    inner_product_param {
        num_output: 2
    }
}
layer {
    name: "loss"
    type: "SoftmaxWithLoss"
    bottom: "ip"
    bottom: "label"
    top: "loss"
}

1.5 LeNet

LeNet is well suited to MNIST handwritten digit recognition; the LeNet model structure is as follows:

Caffe, however, uses a LeNet variant improved with the Rectified Linear Unit (ReLU) activation; the structure is defined in lenet_train_test.prototxt. This modified LeNet contains two convolution layers, two pooling layers, and two fully connected layers, the last of which does the classification, like this:

Two files need to be configured:
lenet_train_test.prototxt for the training part;
lenet_solver.prototxt for the solver part.

1.6 Configure the training part

$ vi $CAFFE_ROOT/examples/mnist/lenet_train_test.prototxt

1.6.0 Give the network a name
name: "LeNet"  # give the network a name

1.6.1 Data Layer
Define the TRAIN data layer.
layer {
    name: "mnist"  # name of this layer
    type: "Data"   # layer type: Data; other types include MemoryData/HDF5Data/HDF5Output/ImageData, etc.
    include {
        phase: TRAIN  # this layer is used only in the TRAIN phase
    }
    transform_param {
        scale: 0.00390625  # normalization factor: 1/256 maps pixel values into [0, 1)
    }  # preprocessing such as mean subtraction, resizing, random crops, and mirroring also goes here
    data_param {
        source: "mnist_train_lmdb"  # path to the training data; required
        backend: LMDB               # the default backend would be leveldb
        batch_size: 64              # the data layer reads 64 LMDB records at a time
    }
    top: "data"   # this layer produces a data blob
    top: "label"  # this layer produces a label blob
}
This layer is named mnist, of type Data, with an LMDB source and batch size 64; the scale factor 1/256 = 0.00390625 normalizes values into [0, 1). It produces two blobs: data and label.
This configures the training input.

1.6.2 Data Layer
Define the TEST data layer.
layer {
    name: "mnist"
    type: "Data"
    top: "data"
    top: "label"
    include {
        phase: TEST  # this layer is used only in the TEST phase
    }
    transform_param {
        scale: 0.00390625
    }
    data_param {
        source: "examples/mnist/mnist_test_lmdb"  # path to the test data
        batch_size: 100
        backend: LMDB
    }
}
This configures the test input.

1.6.3 Convolution Layer - 1
Define convolution layer 1.
layer {
    name: "conv1"        # layer name: convolution layer 1
    type: "Convolution"  # layer type: convolution
    param { lr_mult: 1 } # weight learning-rate multiplier: 1x the global base_lr (0.01 in lenet_solver.prototxt)
    param { lr_mult: 2 } # bias learning-rate multiplier: 2x the global base_lr
    convolution_param {
        num_output: 20   # 20 output feature maps, each of size ((data_size - kernel_size) / stride + 1)^2
        kernel_size: 5   # 5x5 convolution kernel
        stride: 1        # the kernel moves in steps of 1
        weight_filler {
            type: "xavier"  # Xavier initialization: scales initial weights automatically from the layer's input/output neuron counts
        }
        bias_filler {
            type: "constant"  # initialize the biases to the default constant, 0
        }
    }
    bottom: "data"  # input: the data blob produced by the data layer
    top: "conv1"    # output: the conv1 blob
}
This layer:
parameters (20,1,5,5), (20,),
input (64,1,28,28),
convolution output (64,20,24,24),
data flow: 64*1*28*28 -> 64*20*24*24
Computation: output size = (28 - 5) / 1 + 1 = 24.
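A tiny helper to sanity-check these sizes (hypothetical, just restating the formula above; Caffe's pad defaults to 0):

def conv_out(in_size, kernel, stride=1, pad=0):
    # standard convolution/pooling output-size formula
    return (in_size + 2 * pad - kernel) // stride + 1

print conv_out(28, 5)            # conv1: 24
print conv_out(24, 2, stride=2)  # pool1: 12
print conv_out(12, 5)            # conv2: 8
print conv_out(8, 2, stride=2)   # pool2: 4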

1.6.4 Pooling Layer - 1
Define pooling layer 1
layer {
    name: "pool1"
    type: "Pooling"
    pooling_param {
        pool: MAX       # max pooling
        kernel_size: 2  # 2x2 pooling kernel
        stride: 2       # the kernel moves in steps of 2, i.e. non-overlapping
    }
    bottom: "conv1"  # input: the conv1 blob produced by the conv1 layer
    top: "pool1"     # output: the pool1 blob
}
This layer:
output (64,20,12,12); no weights or biases.
Data flow: 64*20*24*24 -> 64*20*12*12
Computation: output size = 24 / 2 = 12.

1.6.5 Conv2 Layer
The remaining layers are a second convolution (num=50, size=5, stride=1) and a second pooling layer.
Define convolution layer 2
layer {
    name: "conv2"
    type: "Convolution"
    param { lr_mult: 1 }
    param { lr_mult: 2 }
    convolution_param {
        num_output: 50
        kernel_size: 5
        stride: 1
        weight_filler { type: "xavier" }
        bias_filler { type: "constant" }
    }
    bottom: "pool1"
    top: "conv2"
}
This layer:
output (64,50,8,8)
Data flow: 64*20*12*12 -> 64*50*8*8

1.6.6 Pool2 Layer
Define pooling layer 2
layer {
    name: "pool2"
    type: "Pooling"
    bottom: "conv2"
    top: "pool2"
    pooling_param {
        pool: MAX
        kernel_size: 2
        stride: 2
    }
}
This layer:
output (64,50,4,4)
Data flow: 64*50*8*8 -> 64*50*4*4.

1.6.7 Fully Connected Layer - 1
Define fully connected layer 1.
Caffe calls a fully connected layer an InnerProduct layer (IP layer).
layer {
    name: "ip1"
    type: "InnerProduct"  # layer type: fully connected
    param { lr_mult: 1 }
    param { lr_mult: 2 }
    inner_product_param {
        num_output: 500  # 500 output channels
        weight_filler {
            type: "xavier"
        }
        bias_filler {
            type: "constant"
        }
    }
    bottom: "pool2"
    top: "ip1"
}
This layer:
parameters (500,800), (500,),
output (64,500),
Data flow: 64*50*4*4 -> 64*500*1*1.
The fully connected layer flattens C*H*W into a 1-D feature vector: 50*4*4 = 800 -> 500.

1.6.8 ReLU Layer
Define the ReLU1 layer, the nonlinearity.
layer {
    name: "relu1"
    type: "ReLU"  # ReLU, the rectified linear unit, an activation function similar in role to sigmoid
    bottom: "ip1"
    top: "ip1"  # bottom and top share the same name to reduce memory use
    # relu_param { negative_slope: ... } can set the leaky-ReLU slope on the negative half-axis
}
This layer:
output (64,500), unchanged;
Data flow: 64*500*1*1 -> 64*500*1*1.
Since ReLU is an element-wise operation, we can do in-place operations to save some memory. This is achieved by simply giving the same name to the bottom and top blobs. Of course, do NOT use duplicated blob names for other layer types!

1.6.9 InnerProduct Layer - 2
Define fully connected layer 2.
layer {
    name: "ip2"
    type: "InnerProduct"
    param { lr_mult: 1 }
    param { lr_mult: 2 }
    inner_product_param {
        num_output: 10  # 10 outputs, one per handwritten digit 0-9
        weight_filler {
            type: "xavier"
        }
        bias_filler {
            type: "constant"
        }
    }
    bottom: "ip1"
    top: "ip2"
}
This layer:
parameters (10,500), (10,),
output (64,10),
Data flow: 64*500*1*1 -> 64*10*1*1

1.6.10 Loss Layer
Define the loss layer.
layer {
    name: "loss"
    type: "SoftmaxWithLoss"  # multi-class loss computed with softmax regression
    bottom: "ip2"
    bottom: "label"  # finally consumes the label produced by the data layer; this layer has no top
}
This layer consumes the 64*10 scores together with the labels and produces the scalar loss.

1.6.11 Accuracy Layer
The accuracy layer. Caffe's LeNet defines an accuracy layer that computes accuracy only in the TEST phase; it has only name, type, bottom, top, and include { phase: TEST }, and exists to report test results.
layer {
    name: "accuracy"
    type: "Accuracy"
    bottom: "ip2"
    bottom: "label"
    top: "accuracy"
    include {
        phase: TEST
    }
}

1.6.12 The whole pipeline

REF: https://www.cnblogs.com/xiaopanlyu/p/5793280.html

1.6.13 Additional Notes: Writing Layer Rules
By default, that is, without layer rules, a layer is always included in the network.
Layer definitions can include rules for whether and when they are included in the network definition, like:
layer {
// …layer definition…
include: { phase: TRAIN }
}
This is a rule, which controls layer inclusion in the network, based on current network’s state.

1.7 Configure the solver part

With the net prototxt in hand we still need a solver, which defines how the model's parameters are updated and solved. For example, the last line,
solver_mode: GPU,
can be changed from GPU to CPU to train on the CPU instead. Whether you use the GPU or the CPU, MNIST is so small that the difference is not as pronounced as with ImageNet.

$ vi ./examples/mnist/lenet_solver.prototxt

# specify the training and test nets
net: "examples/mnist/lenet_train_test.prototxt"  # path to the network model file

# test_iter specifies how many forward passes the test should carry out. With test batch
# size 100 and 100 test iterations we cover the full 10,000 testing images (100 * 100).
test_iter: 100

# Carry out testing every 500 training iterations.
test_interval: 500

# The base learning rate, momentum and the weight decay of the network.
base_lr: 0.01        # base learning rate
momentum: 0.9        # momentum
weight_decay: 0.0005 # weight decay

# The learning rate policy: inv returns base_lr * (1 + gamma * iter) ^ (-power)
lr_policy: "inv"
gamma: 0.0001
power: 0.75
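A quick sketch of what this policy yields (just evaluating the inv formula above in Python):

base_lr, gamma, power = 0.01, 0.0001, 0.75

def inv_lr(it):
    # inv policy: base_lr * (1 + gamma * iter) ^ (-power)
    return base_lr * (1 + gamma * it) ** (-power)

for it in (0, 100, 5000, 10000):
    print it, inv_lr(it)
# 0.01 at iteration 0, ~0.00993 at 100 (matching the training log below), ~0.0060 at 10000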

# Display results every 100 iterations
display: 100

# The maximum number of iterations
max_iter: 10000

# snapshot intermediate results every 5000 iterations
snapshot: 5000
snapshot_prefix: "examples/mnist/lenet"  # prefix: snapshots are named lenet_iter_<iter>.caffemodel instead of iter_<iter>.caffemodel

# solver mode: CPU or GPU
solver_mode: GPU

Supported solver types:
Stochastic Gradient Descent (type: "SGD"),
AdaDelta (type: "AdaDelta"),
Adaptive Gradient (type: "AdaGrad"),
Adam (type: "Adam"),
Nesterov's Accelerated Gradient (type: "Nesterov") and
RMSprop (type: "RMSProp")

1.8 Train the model

With the network-definition protobuf and solver protobuf files in place, training is simple:
$ cd ~/caffe
$ ./examples/mnist/train_lenet.sh

This reads the two configuration files, lenet_solver.prototxt and lenet_train_test.prototxt
… …
creates each layer of the network: data layer - convolution (conv1) - pooling (pooling1) - fully connected (ip1) - nonlinearity - loss
… …
sets up the backward network
… …
prints the test network and the creation process
… …
starts optimizing the parameters
… …
and enters the iterative optimization loop
I1203 solver.cpp:204] Iteration 100, lr = 0.00992565
I1203 solver.cpp:66] Iteration 100, loss = 0.26044
I1203 solver.cpp:84] Testing net
I1203 solver.cpp:111] Test score #0: 0.9785
I1203 solver.cpp:111] Test score #1: 0.0606671
#0 is the accuracy,
#1 is the loss

For each training iteration, lr is the learning rate of that iteration, and loss is the training function. For the output of the testing phase, score 0 is the accuracy, and score 1 is the testing loss function. And after a few minutes, you are done!

Training takes quite a while, about 20 minutes; the final model accuracy is above 0.989.
The trained model is stored as a binary protobuf file at:
./examples/mnist/lenet_iter_10000.caffemodel

The model is now trained, ready for later use.
REF: http://caffe.berkeleyvision.org/gathered/examples/mnist.html

1.9 Use the model

To recognize handwritten digits with the model, here is a community-provided Python script that classifies a handwritten-digit image:
$ vi end_to_end_digit_recognition.py

# manual input image requirement: white background, black digit
# system input image requirement: black background, white digit

# loading setup
caffe_root = "/home/ubuntu/sdcard//caffe-for-cudnn-v2.5.48/"
model_weights = caffe_root + "examples/mnist/lenet_iter_10000.caffemodel"
model_def = caffe_root + "examples/mnist/lenet.prototxt"
image_path = caffe_root + "data/mnist/sample_digit_1.jpeg"

# set up Python environment: numpy for numerical routines, and matplotlib for plotting
import numpy as np
import scipy
import os.path
import time
# import matplotlib.pyplot as plt
from PIL import Image
import sys
sys.path.insert(0, caffe_root + 'python')
import caffe

# caffe.set_mode_cpu()
caffe.set_device(0)
caffe.set_mode_gpu()

# setup a network according to model setup
net = caffe.Net(model_def,      # defines the structure of the model
                model_weights,  # contains the trained weights
                caffe.TEST)     # use test mode (e.g., don't perform dropout)

exist_img_time = 0
while True:
    try:
        new_img_time = time.ctime(os.path.getmtime(image_path))
        if new_img_time != exist_img_time:

            # read image and convert to grayscale
            image = Image.open(image_path, 'r')
            image = image.convert('L')  # makes it grayscale
            image = np.asarray(image.getdata(), dtype=np.float64).reshape((image.size[1], image.size[0]))

            # convert image to suitable size
            image = scipy.misc.imresize(image, [28, 28])
            # the system requires a black background and a white digit
            inputs = 255 - image

            # reshape input to suitable shape
            inputs = inputs.reshape([1, 28, 28])

            # change input data to test image
            net.blobs['data'].data[...] = inputs

            # forward processing of network
            start = time.time()
            net.forward()
            end = time.time()
            output_prob = net.blobs['ip2'].data[0]  # the output score vector for the first image in the batch

            print 'predicted class is:', output_prob.argmax()

            duration = end - start
            print duration, 's'
            exist_img_time = new_img_time
    except IndexError:
        pass
    except IOError:
        pass
    except SyntaxError:
        pass

Test it:
$ python ./end_to_end_digit_recognition.py

I1228 17:22:30.683063 19537 net.cpp:283] Network initialization done.
I1228 17:22:30.698748 19537 net.cpp:761] Ignoring source layer mnist
I1228 17:22:30.702311 19537 net.cpp:761] Ignoring source layer loss
predicted class is: 3
0.0378859043121 s
The recognized result is "3".

TK1: OpenCV - 07

5.3
Check the Jetson TK1 L4T version:
$ head -n 1 /etc/nv_tegra_release
— # R21 (release), REVISION: 3.0,

Check whether the system is 32- or 64-bit (naturally it is 32-bit here):
$ getconf LONG_BIT
You can also check with "uname -a": if the output contains x86_64 the system is 64-bit, otherwise 32-bit.

# check the OpenCV version
$ pkg-config --modversion OpenCV
REF: http://blog.csdn.net/zyazky/article/details/52388756

# and set up the USB 3.0 port; usb_port_owner_info=2 indicates USB 3.0
$ sudo sed -i 's/usb_port_owner_info=0/usb_port_owner_info=2/' /boot/extlinux/extlinux.conf

# Disable USB autosuspend
$ sudo sed -i '$s/$/ usbcore.autosuspend=-1/' /boot/extlinux/extlinux.conf
// USB 3.0 is enabled. The default is USB 2.0, /boot/extlinux/extlinux.conf must be modified to enable USB 3.0.

// Two scripts are installed in /usr/local/bin. To conserve power, by default the Jetson suspends power to the USB ports when they are not in use.
In a desktop environment, this can lead to issues with devices such as cameras and webcams.
The first script disables USB autosuspend.
REF: http://www.cnphp6.com/detail/32448

1. Camera
USB 3.0 at 5 Gbps: the full-sized USB port (J1C2 connector) has enough bandwidth to carry uncompressed 1080p video streams.
USB 2.0 at 480 Mbps is the slowest of the possible camera interfaces; it usually only supports up to 720p at 30 fps.

1.1 First, is there a working USB 3.0 port?
$ lsusb
Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 001 Device 002: ID 8087:07dc Intel Corp.
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Everything shows as a 2.0 root hub: the USB 3.0 port is not enabled.

1.2 Enable USB 3.0
Enabling support for USB 3.0 on the full-sized USB port is easy; only one parameter in /boot/extlinux/extlinux.conf has to change:
Change usb_port_owner_info=0 to usb_port_owner_info=2,
and reboot.
or,
$ sudo sed -i 's/usb_port_owner_info=0/usb_port_owner_info=2/' /boot/extlinux/extlinux.conf
usb_port_owner_info=2 indicates USB 3.0; the default (0) is USB 2.0.

1.3 Disable auto-suspend
To conserve power, by default the Jetson suspends power to the USB ports when they are not in use. In a desktop environment, this can lead to issues with devices such as cameras and webcams. Some USB devices & cameras have problems on Jetson TK1 due to automatic suspending of inactive USB ports in L4T 19.2 OS to save power.
So you might need to disable USB auto-suspend mode.
You can disable this temporarily until you reboot:
$ sudo bash -c 'echo -1 > /sys/module/usbcore/parameters/autosuspend'
or,
to perform this automatically on every boot up, you can modify your '/etc/rc.local' script; add this near the bottom of the file but before the "exit" line:
# Disable USB auto-suspend, since it disconnects some devices such as webcams on Jetson TK1.
echo -1 > /sys/module/usbcore/parameters/autosuspend
or,
# Disable USB autosuspend:
sudo sed -i '$s/$/ usbcore.autosuspend=-1/' /boot/extlinux/extlinux.conf

3. Configuration
Based on a Logitech C920 + tegra-ubuntu 3.10.40-gc017b03:
$ lsusb

The C920 is 046d:082d; explore its USB information:
$ lsusb -d 046d:082d -v | less
Which modules are loaded?
$ lsmod

Load the camera driver so the camera can be accessed via V4L2 (for a USB camera module such as the OV5640):
$ sudo modprobe tegra_camera
-- modprobe: ERROR: could not insert 'tegra_camera': Device or resource busy
Load the UVC module explicitly instead:
$ sudo modprobe uvcvideo

$ ls /dev/vi*
$ cheese --device=/dev/video?
JPEG parameter struct mismatch: library thinks size is 432, caller expects 488
Resolution: the board runs jpeglib version 80 while the downloaded one is version 62, hence the error.
The jpeg error might be because it finds a wrong version of the jpeg library: there is one in the Ubuntu rootfs and one in the nvidia binaries. You may want to move the nvidia version away temporarily and test again.
Exactly right. The problem is in the library /usr/lib/arm-linux-gnueabihf/libjpeg.so.8.0.2: GStreamer expects a particular version of the library at this location, but some packages replace it with their own. The developers have promised to deal with the problem.
As a workaround, replace this library with /usr/lib/arm-linux-gnueabihf/tegra/libjpeg.so:
$ cd /usr/lib/arm-linux-gnueabihf
$ ls -al libjpeg*
lrwxrwxrwx 1 root root 16 Dec 20 2013 libjpeg.so -> libjpeg.so.8.0.2
lrwxrwxrwx 1 root root 16 Dec 20 2013 libjpeg.so.8 -> libjpeg.so.8.0.2
-rw-r--r-- 1 root root 157720 Dec 20 2013 libjpeg.so.8.0.2
$ ls -al tegra/libjp*
-rwxrwxr-x 1 root root 305028 Dec 17 16:03 tegra/libjpeg.so
so,
$ sudo ln -sf tegra/libjpeg.so ./libjpeg.so
$ cheese
OKAY
### if needed we can go back:
$ sudo ln -sf libjpeg.so.8.0.2 ./libjpeg.so

Test it, let's go ...
$ export DISPLAY=:0

4. Do the OpenCV samples run?
First:
# Test a simple OpenCV program. Creates a graphical window, hence you should plug a HDMI monitor in or use a remote viewer such as X Tunneling or VNC or TeamViewer on your desktop.
cd ~/opencv-2.4.10/samples/cpp
g++ edge.cpp -lopencv_core -lopencv_imgproc -lopencv_highgui -o edge
./edge
Second:
# If you have a USB webcam plugged in to your board, then test one of the live camera programs.
g++ laplace.cpp -lopencv_core -lopencv_imgproc -lopencv_highgui -lopencv_calib3d -lopencv_contrib -lopencv_features2d -lopencv_flann -lopencv_gpu -lopencv_legacy -lopencv_ml -lopencv_objdetect -lopencv_photo -lopencv_stitching -lopencv_superres -lopencv_video -lopencv_videostab -o laplace
./laplace
Third:
# Test a GPU accelerated OpenCV sample.
cd ../gpu
g++ houghlines.cpp -lopencv_core -lopencv_imgproc -lopencv_highgui -lopencv_calib3d -lopencv_contrib -lopencv_features2d -lopencv_flann -lopencv_gpu -lopencv_legacy -lopencv_ml -lopencv_objdetect -lopencv_photo -lopencv_stitching -lopencv_superres -lopencv_video -lopencv_videostab -o houghlines
./houghlines ../cpp/logo_in_clutter.png

CPU Time : 217.342 ms
CPU Found : 39
GPU Time : 138.108 ms
GPU Found : 199

5. Do your own programs run?
First:
g++ opencv_stream.cpp -I/usr/include/opencv -L/usr/lib -lopencv_calib3d -lopencv_contrib -lopencv_core -lopencv_features2d -lopencv_flann -lopencv_gpu -lopencv_highgui -lopencv_imgproc -lopencv_legacy -lopencv_ml -lopencv_objdetect -lopencv_photo -lopencv_stitching -lopencv_superres -lopencv_ts -lopencv_video -lopencv_videostab -lopencv_esm_panorama -lopencv_facedetect -lopencv_imuvstab -lopencv_tegra -lopencv_vstab -L/usr/local/cuda/lib -lcufft -lnpps -lnppi -lnppc -lcudart -lrt -lpthread -lm -ldl -o camera_stream

/usr/bin/ld: cannot find -lopencv_esm_panorama
/usr/bin/ld: cannot find -lopencv_facedetect
/usr/bin/ld: cannot find -lopencv_imuvstab
/usr/bin/ld: cannot find -lopencv_tegra
/usr/bin/ld: cannot find -lopencv_vstab
Note: not all of the libraries listed above are required to build these samples; they are included in case you modify the code.
so,
g++ opencv_stream.cpp -I/usr/include/opencv -L/usr/lib -lopencv_calib3d -lopencv_contrib -lopencv_core -lopencv_features2d -lopencv_flann -lopencv_gpu -lopencv_highgui -lopencv_imgproc -lopencv_legacy -lopencv_ml -lopencv_objdetect -lopencv_photo -lopencv_stitching -lopencv_superres -lopencv_ts -lopencv_video -lopencv_videostab -L/usr/local/cuda/lib -lcufft -lnpps -lnppi -lnppc -lcudart -lrt -lpthread -lm -ldl -o camera_stream
./camera_stream

Second:

$ g++ opencv_canny.cpp -I/usr/local/include/opencv -L/usr/local/lib/ -lopencv_contrib -lopencv_core -lopencv_features2d -lopencv_flann -lopencv_gpu -lopencv_highgui -lopencv_imgproc -lopencv_legacy -lopencv_ml -lopencv_nonfree -lopencv_objdetect -lopencv_ocl -lopencv_photo -lopencv_stitching -lopencv_superres -lopencv_video -lopencv_videostab -L/usr/local/cuda/lib -lcufft -lnpps -lnppi -lnppc -lcudart -lrt -lpthread -lm -ldl -o camera_canny
$ ./camera_canny

Conclusion
Use OpenCV4Tegra only if the performance is really required, and check in its documentation whether the functions you are using are actually optimized.

5. GPU-accelerated OpenCV person detection
(Full Body Detection) Build the OpenCV HOG (Histogram of Oriented Gradients) sample person-detector program:
cd opencv-2.4.10/samples/gpu
g++ hog.cpp -lopencv_core -lopencv_imgproc -lopencv_highgui -lopencv_calib3d -lopencv_contrib -lopencv_features2d -lopencv_flann -lopencv_gpu -lopencv_legacy -lopencv_ml -lopencv_nonfree -lopencv_objdetect -lopencv_photo -lopencv_stitching -lopencv_superres -lopencv_video -lopencv_videostab -o hog

./hog --video 768x576.avi
You can run the HOG demo such as on a pre-recorded video of people walking around. The HOG demo displays a graphical output, hence you should plug a HDMI monitor in or use a remote viewer such as X Tunneling or VNC or TeamViewer on your desktop in order to see the output.
Full Body Detection
./hog --camera /dev/video0
Note: This looks for whole bodies and assumes they are small, so you need to stand at least 5m away from the camera if you want it to detect you!

6. GPU-accelerated OpenCV face detection (Face Detection)
$ cd gpu
g++ cascadeclassifier.cpp -lopencv_core -lopencv_imgproc -lopencv_highgui -lopencv_calib3d -lopencv_contrib -lopencv_features2d -lopencv_flann -lopencv_gpu -lopencv_legacy -lopencv_ml -lopencv_objdetect -lopencv_photo -lopencv_stitching -lopencv_superres -lopencv_video -lopencv_videostab -o cascadeclassifier
$ ./cascadeclassifier --camera 0

It runs at about 7 fps on average, short of real-time detection; the remaining work is to combine CUDA with OpenCV through parallel programming to achieve real-time face detection.

7. Optical flow
Optical-flow implementation.
Input can come from the camera, or from a video file.
g++ optical.cpp -lopencv_core -lopencv_imgproc -lopencv_highgui -lopencv_video -lopencv_contrib -lopencv_features2d -lopencv_flann -lopencv_gpu -lopencv_legacy -lopencv_ml -lopencv_objdetect -lopencv_photo -lopencv_stitching -lopencv_superres -lopencv_videostab -lopencv_calib3d -o optical
optical: the optical-flow demo

Optical-flow tracking:
g++ follow.cpp -lopencv_core -lopencv_imgproc -lopencv_highgui -lopencv_video -lopencv_contrib -lopencv_features2d -lopencv_flann -lopencv_gpu -lopencv_legacy -lopencv_ml -lopencv_objdetect -lopencv_photo -lopencv_stitching -lopencv_superres -lopencv_videostab -lopencv_calib3d -o follow
follow: the optical-flow tracking demo

Appendix: things that need patching
Patching OpenCV
The OpenCV version 2.4.9 and cuda version 6.5 are not compatible. It requires a few modifications in OpenCV.
Edit the file /modules/gpu/src/nvidia/core/NCVPixelOperations.hpp in the OpenCV source main directory. Remove the keyword "static" from lines 51 to 58, 61 to 68 and from 119 to 133 in it. This is an example:
Before:
template<> static inline __host__ __device__ Ncv8u _pixMaxVal() {return UCHAR_MAX;}
template<> static inline __host__ __device__ Ncv16u _pixMaxVal() {return USHRT_MAX;}
After:
template<> inline __host__ __device__ Ncv8u _pixMaxVal() {return UCHAR_MAX;}
template<> inline __host__ __device__ Ncv16u _pixMaxVal() {return USHRT_MAX;}

Patching libv4l
This version of OpenCV supports switching resolutions using libv4l library. But it has a limitation. It does not support cameras with resolutions higher than 5 Mega Pixel. Modifying libv4l2 library is required to add that support.
Follow the steps mentioned below to obtain the source code for libv4l2 library, modify and build it and then install it:
$ apt-get source v4l-utils
$ sudo apt-get build-dep v4l-utils
$ cd v4l-utils-/
Edit the file lib/libv4l2/libv4l2-priv.h as follows:
Modify the line from
"#define V4L2_FRAME_BUF_SIZE (4096 * 4096)"
to:
"#define V4L2_FRAME_BUF_SIZE (3 * 4096 * 4096)"
$ dpkg-buildpackage -rfakeroot -uc -b
$ cd ..
$ sudo dpkg -i libv4l-0__.deb
Now OpenCV will support higher resolutions as well.

TK1: Caffe setup - 05

5 - cuDNN installation
5.1 Install the old cudnn-6.5, version 2.0 (recommended)
5.2 Uninstall the old cudnn
5.3 Install the new cudnn-7.0, version 4.0 (not recommended)
5.4 Return to the old version 2.0 cudnn-6.5 to match cuda-6.5
6 - Caffe installation
6.1 Prepare the Caffe environment
6.2 Download
6.3 Config
6.4 Build
6.5 CUDNN_STATUS_NOT_INITIALIZED error in runtest
6.6 LMDB_MAP_SIZE error
6.7 Benchmarking
7 - Python and/or MATLAB
7.1 The Ubuntu way (ok)
7.2 The pip way (not tested)
7.3 Build the Python interface
7.4 Environment variables

5 - cuDNN installation

To install Caffe, cuDNN must be installed first.

For the 32-bit ARM TK1, the working combination is CUDA 6.5 + cuDNN 2.0 + Caffe 0.13; the BVLC 1.0 master branch does not work.
For a 32-bit mini-PC, try CUDA 6.5 + cuDNN 4.0 + Caffe 1.0; CUDA 6.5 is the only choice that supports 32-bit.

5.1 Install the old cudnn-6.5, version 2.0 (recommended)

Download V2:
https://developer.nvidia.com/rdp/cudnn-archive
$ tar -zxvf cudnn-6.5-linux-ARMv7-V2.tgz
$ cd cudnn-6.5-linux-ARMv7-V2

Copy the files:
$ sudo cp cudnn.h /usr/local/cuda/include
$ sudo cp libcudnn* /usr/local/cuda/lib

Reload the library cache:
$ sudo ldconfig -v

make for Caffe then failed, so I tried the newer cuDNN 4.0. Likely cause: the Caffe checkout was too new and required a newer cuDNN, since some function names changed after the R1/V2 releases.

5.2 Uninstall the old cudnn

$ sudo rm /usr/local/cuda/include/cudnn.h
$ sudo rm /usr/local/cuda/lib/libcudnn*

5.3 Install the new cudnn-7.0, version 4.0 (not recommended)

Download from: https://developer.nvidia.com/rdp/cudnn-archive
the 4.0 build, "cuDNN v4 Library for L4T (ARMv7)": cudnn-7.0-Linux-ARMv7-v4.0-prod.tgz
$ tar -zxvf cudnn-7.0-linux-ARMv7-v4.0-prod.tgz

This produces a cuda folder:
$ cd ~/Downloads/cuda/

Copy the cuDNN files:
$ sudo cp include/cudnn.h /usr/local/cuda/include/
$ sudo cp lib/libcudnn* /usr/local/cuda/lib/

Reload the library cache:
$ sudo ldconfig -v

make runtest failed on the 32-bit ARM TK1, presumably because the GitHub Caffe was too new.

5.4 Return to the old version 2.0 cudnn-6.5 to match cuda-6.5

$ sudo rm /usr/local/cuda/include/cudnn.h
$ sudo rm /usr/local/cuda/lib/libcudnn*
# cd 6.5 cudnn v2
$ sudo cp cudnn.h /usr/local/cuda/include
$ sudo cp libcudnn* /usr/local/cuda/lib
$ sudo ldconfig -v

6 - Caffe installation

6.1 Prepare the Caffe environment

$ sudo add-apt-repository universe
$ sudo apt-get update

$ sudo apt-get install libprotobuf-dev protobuf-compiler
$ sudo apt-get install cmake libleveldb-dev libsnappy-dev
$ sudo apt-get install libatlas-base-dev libhdf5-serial-dev libgflags-dev

Building the scipy library for the Python interface needs a Fortran compiler (gfortran) and fails with an error if it is missing, so install it first:
$ sudo apt-get install gfortran

The installs below may need a proxy:
$ sudo apt-get install libgoogle-glog-dev liblmdb-dev

The Boost step depends on your situation. Some report Boost version problems and recommend downgrading to 1.55: the Caffe site gives $ sudo apt-get install --no-install-recommends libboost-all-dev, deliberately adding --no-install-recommends, and the installation page explicitly requires Boost >= 1.55.
$ sudo apt-get install libboost-dev libboost-thread-dev libboost-system-dev

However, the common command above installs 1.54 by default on Ubuntu 14.04, so uninstall it if it is present and install 1.55 instead.
Check:
$ dpkg -S /usr/include/boost/version.hpp
-- 1.54
Uninstall:
$ sudo apt-get autoremove libboost1.54-dev  # beware: this autoremove removed over a hundred packages and freed 980 MB of space; very destructive
Then install 1.55:
$ sudo apt-get install libboost1.55-all-dev libboost-thread1.55-dev libboost-system1.55-dev libboost-filesystem1.55-dev

The compiler step also depends on your situation. The system ships with GCC 4.8; if compiling Caffe produces compiler errors, you can downgrade to 4.7 at any time:
$ sudo apt-get install gcc-4.7 g++-4.7 cpp-4.7
$ cd /usr/bin
$ sudo rm gcc g++ cpp
$ sudo ln -s gcc-4.7 gcc
$ sudo ln -s g++-4.7 g++
$ sudo ln -s cpp-4.7 cpp

Grant access to the video device:
$ sudo usermod -a -G video $USER

all fine!

6.2 Download

# Git clone Caffe
$ git clone https://github.com/BVLC/caffe.git
$ cd caffe
$ git checkout dev
$ cp Makefile.config.example Makefile.config

6.3 config

$ vi Makefile.config
Enable USE_CUDNN := 1

Check the CUDA compute capability:
$ /usr/local/cuda/samples/1_Utilities/deviceQuery
-- 3.2
In Caffe's Makefile.config, disable the *_60 and *_61 entries of CUDA_ARCH:
CUDA_ARCH := -gencode arch=compute_20,code=sm_20 \
    -gencode arch=compute_20,code=sm_21 \
    -gencode arch=compute_21,code=sm_21 <-------------------
    -gencode arch=compute_30,code=sm_30 \
    -gencode arch=compute_35,code=sm_35 \
    -gencode arch=compute_50,code=sm_50 \
    -gencode arch=compute_50,code=compute_50

If make then fails with "Error: nvcc fatal : Unsupported gpu architecture 'compute_60'":
$ vi Makefile.config
# CUDA architecture setting: going with all of them.
# For CUDA < 6.0, comment the *_50 through *_61 lines for compatibility.
# For CUDA < 8.0, comment the *_60 and *_61 lines for compatibility.
CUDA_ARCH := -gencode arch=compute_20,code=sm_20 \
    -gencode arch=compute_20,code=sm_21 \
    -gencode arch=compute_30,code=sm_30 \
    -gencode arch=compute_35,code=sm_35 \
    -gencode arch=compute_50,code=sm_50 \
    -gencode arch=compute_52,code=sm_52
    # -gencode arch=compute_60,code=sm_60
    # -gencode arch=compute_61,code=sm_61
    # -gencode arch=compute_61,code=compute_61

6.4 build

(Whenever something fails, rerun in this order: make clean -> make all -j4 -> make test -j4 -> make runtest -j4)
$ make clean        # ok
$ make all -j4      # 20 mins, ok
$ make test -j4     # 5 mins, ok
$ make runtest -j4  # 10 mins

Problems almost always show up in runtest:
Q1
A common mistake is using sudo; running the runtest suite does not need sudo.
Q2
Error
g++: internal compiler error: Killed (program cc1plus)
Fix: just reboot; cause unknown (some say it is a memory problem).

6.5 CUDNN_STATUS_NOT_INITIALIZED error in runtest

Note: Randomizing tests’ orders with a seed of 86430 .
F1222 00:00:55.561334 15196 cudnn_softmax_layer.cpp:15] Check failed: status == CUDNN_STATUS_SUCCESS (1 vs. 0) CUDNN_STATUS_NOT_INITIALIZED
*** Check failure stack trace: ***

This indicates a cuDNN/CUDA error. I've seen this when the CUDA/driver version and the cuDNN version are mismatched (5.0.5 and CUDA 7.0). This is almost certainly something to do with your setup, and not a bug in Caffe.
"I encountered the same issue and finally found out that there is a mysterious..."
"so you mean in order for this to compile successfully, we need to compile Caffe without the support of CUDNN, right?"
"Thank you very much. I really appreciate your help :)"
Q1
Some say you need to downgrade the compiler:
$ cd /usr/bin
$ sudo rm gcc g++ cpp
$ sudo ln -s gcc-4.9 gcc
$ sudo ln -s g++-4.9 g++
$ sudo ln -s cpp-4.9 cpp
Q2
Some say it is a Boost version problem and recommend downgrading to 1.55, since the Caffe site gives $ sudo apt-get install --no-install-recommends libboost-all-dev
(deliberately with --no-install-recommends) and the installation page requires Boost >= 1.55, whereas the command above installs 1.54 by default. So uninstall, then install: $ sudo apt-get install libboost1.55-all-dev
Check:
$ dpkg -S /usr/include/boost/version.hpp
-- 1.54
Uninstall:
$ sudo apt-get autoremove libboost1.54-dev  # probably not the real cause; this autoremove removed over a hundred packages and freed 980 MB, very destructive
Install:
$ sudo apt-get install libboost1.55-all-dev
A small hiccup:
cannot find -lboost_system -lboost_filesystem -lboost_thread
$ apt-cache search libboost | grep 1.55
$ sudo apt-get install libboost-system1.55-dev
$ sudo apt-get install libboost-filesystem1.55-dev
$ sudo apt-get install libboost-thread1.55-dev
Still: CUDNN_STATUS_NOT_INITIALIZED.

Q3: the real cause and fix
At $ git checkout dev there was already a hint that the branch does not exist; I missed it
("but not the dev branch, even dev branch does not exist now").

So what was cloned from GitHub is the master branch, which is far too new; the dev branch that worked on the TK1 is gone.

The current master branch of Caffe requires at least cuDNN 5.0 and CUDA 7.0,
while we have cuDNN 2.0 + CUDA 6.5.

cuDNN v5.1 has different versions for CUDA 7.5 and CUDA 8.0
cuDNN v5 has different versions for CUDA 7.5 and CUDA 8.0
cuDNN v4 and v3 both require CUDA 7.0
cuDNN v2 and v1 both require CUDA 6.5

A Caffe that works with cuDNN 2.x can be cloned from:
https://github.com/RadekSimkanic/caffe-for-cudnn-v2.5.48
There is also a Caffe that works for CUDNN v2 on Jetson TK1: https://github.com/platotek/caffetk1
but it may not be as convenient.

$ make runtest -j4
Major revision number: 3
Minor revision number: 2
Name: GK20A
Total global memory: 1980252160
Total shared memory per block: 49152
Total registers per block: 32768
Warp size: 32
Maximum memory pitch: 2147483647
Maximum threads per block: 1024
Maximum dimension 0 of block: 1024
Maximum dimension 1 of block: 1024
Maximum dimension 2 of block: 64
Maximum dimension 0 of grid: 2147483647
Maximum dimension 1 of grid: 65535
Maximum dimension 2 of grid: 65535
Clock rate: 852000
Total constant memory: 65536
Texture alignment: 512
Concurrent copy and execution: Yes
Number of multiprocessors: 1
Kernel execution timeout: No
Unified virtual addressing: Yes

There is still a small LMDB_MAP_SIZE error, but it is easy to fix.

6.6 LMDB_MAP_SIZE error

FAILED:
F1222 07:03:16.822439 16826 db_lmdb.hpp:14] Check failed: mdb_status == 0 (-30792 vs. 0) MDB_MAP_FULL: Environment mapsize limit reached

The 32-bit address space is too small for this; the 1 TB map size has to shrink.
I think this issue is due to the Jetson being a 32-bit (ARM) device, and the constant LMDB_MAP_SIZE in src/caffe/util/db.cpp being too big for it to understand. Unfortunately master has a really large value for LMDB_MAP_SIZE in src/caffe/util/db.cpp, which confuses our little 32-bit ARM processor on the Jetson, eventually leading to Caffe tests failing with errors like MDB_MAP_FULL: Environment mapsize limit reached.

$ vi src/caffe/util/db_lmdb.cpp
const size_t LMDB_MAP_SIZE = 1099511627776; // 1 TB
Change it to 2^29 (536870912, i.e. 512 MB).

$ vi ./examples/mnist/convert_mnist_data.cpp
and likewise adjust the value from 1099511627776 to 536870912.

$ make runtest -j 4
… … …
[==========] 1702 tests from 251 test cases ran. (5165779 ms total)
[ PASSED ] 1702 tests.
YOU HAVE 2 DISABLED TESTS

OKAY !

6.7 Benchmarking

Finally, run Caffe's benchmarking code to measure performance on both CPU and GPU:
Finally you can run Caffe’s benchmarking code to measure performance.

* CPU: roughly 600 seconds
$ build/tools/caffe time --model=models/bvlc_alexnet/deploy.prototxt
… …
I1222 09:11:54.935829 19824 caffe.cpp:366] Average Forward pass: 5738.58 ms.
I1222 09:11:54.935860 19824 caffe.cpp:368] Average Backward pass: 5506.83 ms.
I1222 09:11:54.935890 19824 caffe.cpp:370] Average Forward-Backward: 11246.2 ms.
I1222 09:11:54.935921 19824 caffe.cpp:372] Total Time: 562310 ms.
I1222 09:11:54.935952 19824 caffe.cpp:373] *** Benchmark ends ***
ok.
These results are the summation of 10 iterations, so the per-image recognition time is the listed Average Forward pass divided by 10.

* GPU: roughly 30 seconds
$ build/tools/caffe time --model=models/bvlc_alexnet/deploy.prototxt --gpu=0
… …
I1222 09:16:02.577358 19857 caffe.cpp:366] Average Forward pass: 278.286 ms.
I1222 09:16:02.577504 19857 caffe.cpp:368] Average Backward pass: 318.795 ms.
I1222 09:16:02.577637 19857 caffe.cpp:370] Average Forward-Backward: 599.67 ms.
I1222 09:16:02.577800 19857 caffe.cpp:372] Total Time: 29983.5 ms.
I1222 09:16:02.577951 19857 caffe.cpp:373] *** Benchmark ends ***
ok.
It’s running 50 iterations of the recognition pipeline, and each one is analyzing 10 different crops of the input image, so look at the ‘Average Forward pass’ time and divide by 10 to get the timing per recognition result.

After this you can try the demos. Caffe ships two, mnist and cifar10; mnist in particular is known as the "hello world" of Caffe programming.

7 - Python and/or MATLAB

Only after Caffe itself is built can the pycaffe interface be compiled.

Caffe has Python, C++, and shell interfaces; using Python with Caffe is especially convenient, and the examples document the interfaces.
The installation steps: Python dependencies, MATLAB, MATLAB engine for Python.

Even with the default Python present, python-dev is still needed. Why? Linux distributions usually split a library's header files and related pkg-config data into a separate xxx-dev(el) package. Taking Python as an example, some cases require python-dev:
installing an out-of-tree Python library that contains C/C++ files calling the Python API that must be compiled;
compiling your own program that links against libpythonXX.(a|so).

Using Python on Ubuntu often means installing additional Python libraries. Two common ways are pip install and Ubuntu's own apt-get install. The differences:
* pip installs from PyPI, apt-get from the Ubuntu repositories. PyPI carries more Python packages than Ubuntu, and more versions of each; pip can also install a package just for the current project.
* Packages installed by apt-get are system-wide.
* The same Python package may be named differently between apt-get and pip: with apt-get, a Python 2 package is typically named python-<package>.

The manifest ~/caffe/python/requirements.txt lists the required dependencies:
Cython>=0.19.2
numpy>=1.7.1
scipy>=0.13.2
scikit-image>=0.9.3
matplotlib>=1.3.1
ipython>=3.0.0
h5py>=2.2.0
leveldb>=0.191
networkx>=1.8.1
nose>=1.3.0
pandas>=0.12.0
python-dateutil>=1.4,<2
protobuf>=2.5.0
python-gflags>=2.0
pyyaml>=3.10
Pillow>=2.3.0
six>=1.1.0

7.1 The Ubuntu way (ok)

Install the corresponding build dependencies:
$ sudo apt-get install build-essential

Caffe's Python interface needs the numpy library:
$ sudo apt-get install python-numpy

Install the scipy library:
$ sudo apt-get install python-scipy

boost
$ sudo apt-get install libboost-python1.55-dev
//$ sudo apt-get install libboost-python-dev -X

$ sudo apt-get install python-protobuf

$ sudo apt-get install python-skimage

7.2 The pip way (not tested)

Install with pip // Use pip to install numpy and scipy instead for newer versions.
$ for req in $(cat requirements.txt); do pip install $req; done
(installing Google's protobuf along the way may require a proxy)
That said, some people online advise against the requirements file and prefer installing each package individually.

One-line install:
$ sudo pip install cython numpy scipy scikit-image matplotlib ipython h5py leveldb networkx nose pandas python-dateutil protobuf python-gflags pyyaml pillow six

$ sudo pip install scikit-image
generated an error: Exception IndexError: list index out of range

Also note that in the Makefile.config of caffe, there is this line:
PYTHON_INCLUDE := /usr/include/python2.7 \                  <-- correct
    /usr/lib/python2.7/dist-packages/numpy/core/include    <-- doesn't exist
so, try:
$ pip install -U scikit-image
(-U is --upgrade: if the package is already installed, upgrade it to the latest version)

Following the official suggestion, install the Anaconda bundle: download the .sh file from the Anaconda site, run it, and finally add its bin directory to the environment variables.
Anaconda is recommended because it is independent of the system's own Python libraries and provides most of the scientific Python libraries Caffe needs. Note that when running Caffe you may see errors about some libxxx.so not being found, while locate libxxx.so shows it installed inside Anaconda. The first instinct is to add $your_anaconda_path/lib to LD_LIBRARY_PATH under /etc/ld.so.conf.d/.
But doing so may leave you unable to log back into the desktop! The (suspected) reason is that some contents of Anaconda's lib conflict with the system's own libraries. The correct approach: instead of letting the system add anaconda/lib to the system library path at boot, add the library path in your own ~/.bashrc, for example by appending two lines at the end:

# add library path
LD_LIBRARY_PATH=your_anaconda_path/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH

This takes effect in any newly opened terminal, and after a reboot lightdm loads normally and the desktop comes up.

$ vi ~/.bashrc
export PYTHONPATH=/path/to/caffe/python:$PYTHONPATH
$ sudo ldconfig

7.3 Build the Python interface

Build the Python wrapper:
$ cd ~/caffe
$ make pycaffe -j4
-- shows ALL TESTS PASSED, ok
$ make pytest -j4

Test it in Python:
$ cd caffe-folder/python
$ python
>>> import caffe
If there are no errors, the Caffe installation is fully complete.

7.4 Environment variables

Add caffe/python to the PYTHONPATH variable so the Caffe Python interface can be called without first entering the caffe/python directory:
$ vi ~/.bashrc
#set caffe PYTHONPATH
export PYTHONPATH=/path/to/caffe/python:$PYTHONPATH
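A quick check that the path is picked up (run from any directory):

$ python
>>> import caffe
>>> print caffe.__file__  # should point into /path/to/caffe/python/caffe/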