Robot Applications

Intelligent Robots (46): Object Recognition

ORK (Object Recognition Kitchen) is an object recognition toolkit built mainly around template matching: it compares what the camera sees against the objects stored in a database, and if they are similar enough the object counts as recognized.
ORK is based on OpenCV and PCL. OpenCV's purpose is to help turn "seeing" into perception, and active depth sensing is used to "cheat".
ORK was built on top of ecto, which is a lightweight hybrid C++/Python framework for organizing computations as directed acyclic graphs.
There is currently no unique method to perform object recognition. Objects can be textured, non-textured, transparent, articulated, etc. For this reason, the Object Recognition Kitchen was designed to easily develop and run several object recognition techniques simultaneously. In short, ORK takes care of all the non-vision aspects of the problem for you (database management, input/output handling, robot/ROS integration ...) and eases reuse of code.
REF: http://www.ais.uni-bonn.de/~holz/spme/talks/01_Bradski_SemanticPerception_2011.pdf

1. Installing ORK

1.1 Install the ORK dependencies
$ export DISTRO=indigo
$ sudo apt-get install libopenni-dev ros-${DISTRO}-catkin ros-${DISTRO}-ecto* ros-${DISTRO}-opencv-candidate ros-${DISTRO}-moveit-msgs
You may only need ros-${DISTRO}-opencv-candidate.

1.2 Install the ORK packages
To install everything ORK-related:
$ sudo apt-get install ros-indigo-object-recognition-*
or,
$ sudo apt-get install ros-indigo-object-recognition-core ros-indigo-object-recognition-linemod ros-indigo-object-recognition-msgs ros-indigo-object-recognition-renderer ros-indigo-object-recognition-ros ros-indigo-object-recognition-ros-visualization

3. The CouchDB database

In ORK everything is stored in a database: objects, models, and training data.
CouchDB from Apache is the main database and the most tested implementation. To set up a local instance, all you have to do is install couchdb and make sure the service has started.

3.1 Install CouchDB
$ sudo apt-get install couchdb
Check that the installation succeeded:
$ curl -X GET http://localhost:5984
If it worked you should see something like:
— {"couchdb":"Welcome","version":"1.0.1"}

3.2 Web interface for visualizing the data
A set of webpages can be pushed to your couchdb instance to help browse the objects you trained and the models you created.
a) First install couchapp:
$ sudo apt-get install python-pip
$ sudo pip install -U couchapp
b) Visualize the data in the database.
A utility is provided that automatically installs the visualizer on the DB. It uploads the contents of the directory to a collection in your couchdb instance called or_web_ui:
$ rosrun object_recognition_core push.sh

After this you can browse the web UI at:
http://localhost:5984/or_web_ui/_design/viewer/index.html
For now it is empty.

5. TOD Quickstart

ORK is more of a framework that contains several algorithms; one of them is TOD (textured object detection).
$ roscore
$ roslaunch openni_launch openni.launch

5.1 Set up the capture workspace
$ rosrun object_recognition_capture orb_template -o my_textured_plane
Keep the plane facing the camera and centered in the image. Press 's' to save a template image, such as my_textured_plane above, and 'q' to quit.

You can then track it to check whether the template is good enough:
$ rosrun object_recognition_capture orb_track --track_directory my_textured_plane

NOTE:
Use the SXGA (roughly 1 megapixel) mode of your openni device if possible.
$ rosrun dynamic_reconfigure dynparam set /camera/driver image_mode 1
$ rosrun dynamic_reconfigure dynparam set /camera/driver depth_registration True

5.2 Capture objects
Once you are happy with the workspace tracking, it's time to capture an object.
a) Preview mode. Place an object at the origin of the workspace and run the capture program in preview mode. Make sure the mask and pose are being picked up.
$ rosrun object_recognition_capture capture -i my_textured_plane --seg_z_min 0.01 -o silk.bag --preview

b) Full capture. When satisfied with the preview mode, run it for real. The following will capture a bag of 60 views where each view is normally distributed on the view sphere. The mask and pose displays should only refresh when a novel view is captured. The program will finish when 35 (-n) views are captured. Press 'q' to quit early.
$ rosrun object_recognition_capture capture -i my_textured_plane --seg_z_min 0.01 -o silk.bag

c) Upload the data to the DB. Now it is time to upload. Make sure couchdb is installed on your machine. Give the object a name and useful tags separated by spaces, e.g. 'milk soy silk'.
$ rosrun object_recognition_capture upload -i silk.bag -n 'Silk' milk soy silk --commit

5.3 Train objects
Repeat the steps above for the objects you would like to recognize. Once you have captured and uploaded all of the data, it's time to mesh the objects and train the recognition.
a) Generate meshes. Meshing objects can be done in batch mode as follows:
$ rosrun object_recognition_reconstruction mesh_object --all --visualize --commit

b) View them. The currently stored models are at:
http://localhost:5984/or_web_ui/_design/viewer/meshes.html

c) Train. Next, the objects should be trained. It may take some time per object; this is normal. Note that this assumes you are using TOD, which only works for textured objects; please refer to the documentation for the other methods.
$ rosrun object_recognition_core training -c `rospack find object_recognition_tod`/conf/training.ork --visualize

5.4 Detect objects
Now we're ready for detection.
First launch rviz; it should be subscribed to the right markers for the recognition results. /markers is used for the results, and it is a marker array.
$ rosrun object_recognition_core detection -c `rospack find object_recognition_tod`/conf/detection.ros.ork --visualize
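For reference, the recognition markers can also be consumed programmatically. Below is a minimal, hedged rospy sketch (the /markers topic is the one mentioned above; the node name is arbitrary) that subscribes to the marker array and prints where each detected object was placed:

#!/usr/bin/env python
import rospy
from visualization_msgs.msg import MarkerArray

def callback(msg):
    # Each Marker corresponds to one recognition result published for RViz.
    for m in msg.markers:
        p = m.pose.position
        rospy.loginfo("marker %d at (%.3f, %.3f, %.3f)", m.id, p.x, p.y, p.z)

if __name__ == '__main__':
    rospy.init_node('ork_marker_listener')
    rospy.Subscriber('/markers', MarkerArray, callback)
    rospy.spin()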

7. Linemod Tutorials

ORK is more of a framework that contains several algorithms; one of them is Linemod. Linemod is a pipeline that implements one of the best methods for generic rigid object recognition, and it proceeds using very fast template matching.
REF: http://ar.in.tum.de/Main/StefanHinterstoisser.
The exercises here are:
1) add an object's mesh to the DB;
2) learn how to add objects to the DB manually;
3) visualize the data in the ORK database.

7.1 Create an object in the DB
ORK is mainly about recognizing objects, so an object first has to be stored in the DB. A pipeline like the ORK 3D capture one can be used to create objects, but they can also be handled with scripts from the core:
$ rosrun object_recognition_core object_add.py -n "coke" -d "A universal can of coke" --commit
— Stored new object with id: 9633ca8fc57f2e3dad8726f68a000326
After running this command you can check in your own database whether the object has been added:
http://localhost:5984/_utils/database.html?object_recognition/_design/objects/_view/by_object_name
Click the object to see its information, in particular its id: every element in the DB has its own hash as a unique ID (this prevents you from giving the same name to different objects). Remember this ID, 9633ca8fc57f2e3dad8726f68a000326:
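Since everything lives in CouchDB, the stored document can also be fetched directly over CouchDB's REST API. A minimal sketch (assuming the default database name object_recognition seen in the URL above, and the example id from this run; substitute your own id):

#!/usr/bin/env python
import json
import urllib2

OBJECT_ID = "9633ca8fc57f2e3dad8726f68a000326"  # the id printed by object_add.py
url = "http://localhost:5984/object_recognition/" + OBJECT_ID
doc = json.loads(urllib2.urlopen(url).read())
# Print the raw document to inspect the fields that were stored.
print(json.dumps(doc, indent=2))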

7.2 Add a mesh for the object
First, get the correct object id from the DB interface as above.
When you install ORK, the database is empty. Luckily, the ork_tutorials package comes with a 3D mesh of a coke can.
Next, specify the 3D model of the object: the ork_tutorials example package contains coke.stl, the 3D model of a coke can. Download the package and build it:
$ git clone https://github.com/wg-perception/ork_tutorials
$ cd ..
$ catkin_make

NOTICE: the garbled-looking string in the command below is the object ID; look it up in your own database from the previous step.
You can upload the object and its mesh to the database with the scripts from the core:
$ rosrun object_recognition_core mesh_add.py 9633ca8fc57f2e3dad8726f68a000326 /home/dehaou1404/catkin_ws/src/ork_tutorials/data/coke.stl --commit
— Stored mesh for object id : 9633ca8fc57f2e3dad8726f68a000326

7.3 Visualize the object
Now, if you want to visualize the object in the DB, you can just go to the visualization URL:
http://localhost:5984/or_web_ui/_design/viewer/meshes.html

you should see the following:

7.4 How to delete an object
$ rosrun object_recognition_core object_delete.py OBJECT_ID

7.5 Training
Now you can learn object models from the database.
Execute Linemod in training mode, passing the configuration file with the -c option. The configuration file should define a pipeline that reads data from the database and computes object models.
$ rosrun object_recognition_core training -c `rospack find object_recognition_linemod`/conf/training.ork
This training command uses the 3D models in the database to build the templates needed at recognition time. If it runs successfully you will see messages like the following:
Training 1 objects.
computing object_id: 9633ca8fc57f2e3dad8726f68a000326
Info, T0: Load /tmp/fileeW7jSB.stl
Info, T0: Found a matching importer for this file format
Info, T0: Import root directory is ‘/tmp/’
Info, T0: Entering post processing pipeline
Info, T0: Points: 0, Lines: 0, Triangles: 1, Polygons: 0 (Meshes, X = removed)
Error, T0: FindInvalidDataProcess fails on mesh normals: Found zero-length vector
Info, T0: FindInvalidDataProcess finished. Found issues …
Info, T0: GenVertexNormalsProcess finished. Vertex normals have been calculated
Error, T0: Failed to compute tangents; need UV data in channel0
Info, T0: JoinVerticesProcess finished | Verts in: 1536 out: 258 | ~83.2%
Info, T0: Cache relevant are 1 meshes (512 faces). Average output ACMR is 0.669922
Info, T0: Leaving post processing pipeline

7.6 Detection
Once learned, objects can be detected in the input point cloud. To detect objects continuously, execute Linemod in detection mode with a configuration file that defines a source, a sink, and a pipeline.
REF: http://wg-perception.github.io/object_recognition_core/detection/detection.html
$ roslaunch openni_launch openni.launch
$ rosrun dynamic_reconfigure dynparam set /camera/driver depth_registration True
$ rosrun dynamic_reconfigure dynparam set /camera/driver image_mode 2
$ rosrun dynamic_reconfigure dynparam set /camera/driver depth_mode 2
$ rosrun topic_tools relay /camera/depth_registered/image_raw /camera/depth_registered/image
$ rosrun object_recognition_core detection -c `rospack find object_recognition_linemod`/conf/detection.ros.ork

7.7 Visualization with RViz
a) config rviz
$ roslaunch openni_launch openni.launch
$ rosrun rviz rviz
Set the Fixed Frame (in Global Options, Displays window) to /camera_depth_optical_frame.
Add a PointCloud2 display and set the topic to /camera/depth/points. This is the unregistered point cloud in the frame of the depth (IR) camera and it is not matched with the RGB camera images. For visualization of the registered point cloud, the depth data could be aligned with the RGB data. To do it, launch the dynamic reconfigure GUI:
$ rosrun rqt_reconfigure rqt_reconfigure
Select /camera/driver from the drop-down menu and enable the depth_registration checkbox. In RViz, change the PointCloud2 topic to /camera/depth_registered/points and set the Color Transformer to RGB8 to see both color and 3D point cloud of your scene.
REF: https://wg-perception.github.io/ork_tutorials/tutorial03/tutorial.html
b) Then you can view the recognized objects in RViz.
Go to RViz and add the OrkObject in the Displays window.
Select the OrkObject topic and the parameters to display: id / name / confidence.
Here, we show an example of detecting two objects (a coke and a head of NAO) and the outcome visualized in RViz.
For each recognized object, you can visualize its point cloud and also a point cloud of the matching object from the database. For this, compile the package with the CMake option -DLINEMOD_VIZ_PCD=ON. Once an object is recognized, its point cloud from the sensor's 3D data is visualized as shown in the following image (in blue). The cloud is published on the /real_icpin_ref topic.
For the same recognized object, we can visualize the point cloud of the matching object from the database as shown in the following image (in yellow). This point cloud is created from the mesh stored in the database, rendered at the pose returned by Linemod and refined by ICP. The cloud is published on the /real_icpin_model topic.
REF: http://wg-perception.github.io/ork_tutorials/tutorial03/tutorial.html
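As a quick way to confirm those clouds are actually being published, here is a minimal rospy sketch (topic name taken from the paragraph above; the node name is arbitrary) that reports the size of the matched-model cloud:

#!/usr/bin/env python
import rospy
from sensor_msgs.msg import PointCloud2

def callback(cloud):
    # width * height gives the number of points, organized or not.
    rospy.loginfo("matched model cloud: %d points", cloud.width * cloud.height)

if __name__ == '__main__':
    rospy.init_node('icpin_model_listener')
    rospy.Subscriber('/real_icpin_model', PointCloud2, callback)
    rospy.spin()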

11. Algorithm overview
The core idea of the Linemod algorithm is to combine several different modalities; it helps to think of a modality as one kind of feature of the object.
For example, the figure below uses two modalities, gradient and surface normal. Because these two features describe different properties, they complement each other and lead to better recognition.

So Linemod first needs a known object model; templates of the object's various modalities are extracted from it, and at recognition time those templates are matched against the scene.
REF: http://blog.techbridge.cc/2016/05/14/ros-object-recognition-kitchen/
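To make the idea of matching templates of quantized modalities concrete, here is a deliberately simplified numpy sketch that uses only the gradient-orientation modality: orientations are quantized into 8 bins and a template scores high wherever the test image shows the same quantized orientation. It only illustrates the principle; the real Linemod additionally spreads orientations, adds the surface-normal modality from the depth image, and uses heavily optimized lookups.

import numpy as np

def quantized_orientations(gray, bins=8):
    # Quantize the gradient orientation of every pixel into `bins` discrete labels.
    gy, gx = np.gradient(gray.astype(float))
    ang = np.arctan2(gy, gx) % np.pi               # unsigned orientation
    return np.floor(ang / np.pi * bins).astype(int) % bins

def similarity(template_ori, image_ori, offset):
    # Fraction of template pixels whose quantized orientation matches the image
    # at the given (row, col) offset -- a crude stand-in for Linemod's similarity.
    h, w = template_ori.shape
    y, x = offset
    window = image_ori[y:y + h, x:x + w]
    return float(np.mean(window == template_ori))

# Usage sketch: slide the template over the scene and keep the best-scoring offset.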

REF:http://wenku.baidu.com/link?url=hK1FMB2yR_frGcLxmQKAiZYhgKJdt0LBpaK1Sy-dY0ZgXuR2_jnLDYDBlaIz9eSLyvWOd66fVv-t3kH6NVapF2at3Cv9P1DfoxIyAn3MAuG

Intelligent Robots (45): Face Recognition

1. Face detection vs face recognition
2. Face recognition with OpenCV
3. Face recognition in ROS
4. Face recognition with an RGB-D camera

The face recognition described below is based on OpenCV, bridged to ROS by cv_bridge. Pictures of each person are first trained into a database, and recognition is then provided as an actionlib service.
This is training:
(figure: face recognition training)

This is recognition:
(figure: face recognition)

  • 1. Face Detection vs Face Recognition

Face recognition generally involves two steps:
1. Face Detection: a photo is searched to find any face.
2. Face Recognition: the detected and processed face is compared to a database of known faces to decide who that person is.

1. Face detection has already reached 90-95% accuracy, e.g. with OpenCV's Face Detector. The harder part is that it is usually more difficult to detect a person's face when it is viewed from the side or at an angle, and sometimes this requires 3D head pose estimation. It can also be very difficult to detect a face if the photo is not very bright, or if part of the face is brighter than another, has shadows, is blurry, or the person is wearing glasses, etc.
2. Face recognition accuracy is much less encouraging: it is generally only 30-70% accurate. Face recognition has been a strong field of research since the 1990s, but is still far from reliable, and more techniques are being invented each year.

The face recognition below uses the eigenfaces method, also called PCA (Principal Component Analysis), a simple and popular method of 2D face recognition.

1.1 Eigenfaces
The eigenvectors are derived from the covariance matrix of the distribution of face images; the set of eigenfaces forms a basis, which achieves dimensionality reduction.
Eigenfaces is the name given to a set of eigenvectors when they are used in the computer vision problem of human face recognition.
The approach of using eigenfaces for recognition was developed by Sirovich and Kirby (1987) and used by Matthew Turk and Alex Pentland in face classification.
The eigenvectors are derived from the covariance matrix of the probability distribution over the high-dimensional vector space of face images.
The eigenfaces themselves form a basis set of all images used to construct the covariance matrix.
This produces dimension reduction by allowing the smaller set of basis images to represent the original training images. Classification can be achieved by comparing how faces are represented by the basis set.

1.2 Principle
Eigenfaces are the eigenvectors that carry the principal components.
A set of eigenfaces can be generated by performing a mathematical process called principal component analysis (PCA) on a large set of images depicting different human faces.
Informally, eigenfaces can be considered a set of “standardized face ingredients”, derived from statistical analysis of many pictures of faces.
Any human face can be considered to be a combination of these standard faces. For example, one’s face might be composed of the average face plus 10% from eigenface 1, 55% from eigenface 2, and even -3% from eigenface 3.

1.3 Implementation
1.3.1 Prepare a training set of face images. The pictures constituting the training set should have been taken under the SAME lighting conditions, and must be NORMALIZED so that the eyes and mouths are aligned across all images. They must also all be resampled to a common pixel resolution (r × c). Each image is treated as one vector, simply by concatenating the rows of pixels in the original image, resulting in a single row with r × c elements. For this implementation, it is assumed that all images of the training set are stored in a single matrix T, where each column of the matrix is an image.
1.3.2 Subtract the mean (discarding the DC component). The average image a has to be calculated and then subtracted from each original image in T.
1.3.3 Calculate the eigenvectors and eigenvalues of the covariance matrix S. Each eigenvector has the same dimensionality (number of components) as the original images, and thus can itself be seen as an image. The eigenvectors of this covariance matrix are therefore called eigenfaces. They are the directions in which the images differ from the mean image. Usually this will be a computationally expensive step (if at all possible), but the practical applicability of eigenfaces stems from the possibility to compute the eigenvectors of S efficiently, without ever computing S explicitly, as detailed below.
1.3.4 Choose the principal components. Sort the eigenvalues in descending order and arrange the eigenvectors accordingly. The number of principal components k is determined by setting a threshold ε on the total variance.
That is all there is to the training.
These eigenfaces can now be used to represent both existing and new faces: we can project a new (mean-subtracted) image on the eigenfaces and thereby record how that new face differs from the mean face. The eigenvalues associated with each eigenface represent how much the images in the training set vary from the mean image in that direction.
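As a compact illustration of steps 1.3.1 to 1.3.4, here is a numpy sketch (an illustration only, not the OpenCV implementation used later in this article):

import numpy as np

def train_eigenfaces(T, k):
    """T: matrix with one flattened face image per column (r*c x n).
    Returns the mean face, the first k eigenfaces and the training coefficients."""
    mean = T.mean(axis=1, keepdims=True)
    A = T - mean                                  # 1.3.2: subtract the mean
    # 1.3.3: eigenvectors of the covariance matrix via the SVD of A, which avoids
    # ever forming the huge (r*c x r*c) matrix S explicitly.
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    eigenfaces = U[:, :k]                         # 1.3.4: keep k principal components
    coeffs = eigenfaces.T.dot(A)                  # each column: one image's "ratios"
    return mean, eigenfaces, coeffs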

  • 2. Face Recognition with OpenCV

2.1 Detect a face with OpenCV's Face Detector
The classic approach in OpenCV is the Haar cascade: the OpenCV library makes it fairly easy to detect a frontal face in an image using its Haar Cascade Face Detector.
The function "cvHaarDetectObjects" in OpenCV performs the actual face detection, but it is best to write a wrapper function:
// Perform face detection on the input image, using the given Haar Cascade.
// Returns a rectangle for the detected region in the given image.
CvRect detectFaceInImage(IplImage *inputImg, CvHaarClassifierCascade* cascade)
{
    // Smallest face size.
    CvSize minFeatureSize = cvSize(20, 20);
    // Only search for 1 face.
    int flags = CV_HAAR_FIND_BIGGEST_OBJECT | CV_HAAR_DO_ROUGH_SEARCH;
    // How detailed should the search be.
    float search_scale_factor = 1.1f;
    IplImage *detectImg;
    IplImage *greyImg = 0;
    CvMemStorage *storage = cvCreateMemStorage(0);
    CvSeq *rects;
    CvRect rc;
    double t;
    int ms, nFaces;

    // If the input image is in color, convert it to greyscale for the detector.
    detectImg = inputImg;
    if (inputImg->nChannels > 1) {
        greyImg = cvCreateImage(cvGetSize(inputImg), IPL_DEPTH_8U, 1);
        cvCvtColor(inputImg, greyImg, CV_BGR2GRAY);
        detectImg = greyImg;
    }

    // Detect all the faces in the greyscale image.
    t = (double)cvGetTickCount();
    rects = cvHaarDetectObjects( detectImg, cascade, storage,
        search_scale_factor, 3, flags, minFeatureSize);
    t = (double)cvGetTickCount() - t;
    ms = cvRound( t / ((double)cvGetTickFrequency() * 1000.0) );
    nFaces = rects->total;
    printf("Face Detection took %d ms and found %d objects\n", ms, nFaces);

    // Get the first detected face (the biggest).
    if (nFaces > 0)
        rc = *(CvRect*)cvGetSeqElem( rects, 0 );
    else
        rc = cvRect(-1,-1,-1,-1);    // Couldn't find the face.

    // Free the resources created in this function.
    if (greyImg)
        cvReleaseImage(&greyImg);
    cvReleaseMemStorage(&storage);

    return rc;
}

Now you can simply call “detectFaceInImage” whenever you want to find a face in an image.

2.2 Specify the face classifier that OpenCV uses to detect the face
Classifier options: OpenCV comes with several different classifiers for frontal face detection, as well as some for profile faces (side view), eye detection, nose detection, mouth detection, whole body detection, etc. You can actually use this function with any of those other detectors, or even create your own custom detector such as for car or person detection, but since frontal face detection is the only one that is very reliable, it is the only one discussed here.

For frontal face detection, you can choose one of these Haar Cascade Classifiers that come with OpenCV (in the "data/haarcascades" folder):
“haarcascade_frontalface_default.xml”
“haarcascade_frontalface_alt.xml”
“haarcascade_frontalface_alt2.xml”
“haarcascade_frontalface_alt_tree.xml”

So you could do this in your program for face detection:
// Haar Cascade file, used for Face Detection.
char *faceCascadeFilename = "haarcascade_frontalface_alt.xml";
// Load the HaarCascade classifier for face detection.
CvHaarClassifierCascade* faceCascade;
faceCascade = (CvHaarClassifierCascade*)cvLoad(faceCascadeFilename, 0, 0, 0);
if( !faceCascade ) {
    printf("Couldn't load Face detector '%s'\n", faceCascadeFilename);
    exit(1);
}


2.3 Preprocess images for Face Recognition

Now that you have detected a face, you can use that face image for Face Recognition. However, if you tried to simply perform face recognition directly on a normal photo image, you will probably get less than 10% accuracy!

It is extremely important to apply various image pre-processing techniques to standardize the images that you supply to a face recognition system. Most face recognition algorithms are extremely sensitive to lighting conditions, so that if it was trained to recognize a person when they are in a dark room, it probably won't recognize them in a bright room, etc. This problem is referred to as being "illumination dependent". There are also many other issues: the face should be in a very consistent position within the images (such as the eyes being in the same pixel coordinates), with consistent size, rotation angle, hair and makeup, emotion (smiling, angry, etc), and position of lights (to the left or above, etc). This is why it is so important to use good image preprocessing filters before applying face recognition. You should also do things like removing the pixels around the face that aren't used, such as with an elliptical mask to only show the inner face region, not the hair and image background, since they change more than the face does.

For simplicity, the face recognition system I will show you is Eigenfaces using greyscale images. So I will show you how to easily convert color images to greyscale (also called ‘grayscale’), and then easily apply Histogram Equalization as a very simple method of automatically standardizing the brightness and contrast of your facial images. For better results, you could use color face recognition (ideally with color histogram fitting in HSV or another color space instead of RGB), or apply more processing stages such as edge enhancement, contour detection, motion detection, etc. Also, this code is resizing images to a standard size, but this might change the aspect ratio of the face. You can read my tutorial HERE on how to resize an image while keeping its aspect ratio the same.

Here is some basic code to convert from a RGB or greyscale input image to a greyscale image, resize to a consistent dimension, then apply Histogram Equalization for consistent brightness and contrast:
// Either convert the image to greyscale, or use the existing greyscale image.
IplImage *imageGrey;
if (imageSrc->nChannels == 3) {
    imageGrey = cvCreateImage( cvGetSize(imageSrc), IPL_DEPTH_8U, 1 );
    // Convert from RGB (actually it is BGR) to Greyscale.
    cvCvtColor( imageSrc, imageGrey, CV_BGR2GRAY );
}
else {
    // Just use the input image, since it is already Greyscale.
    imageGrey = imageSrc;
}

// Resize the image to be a consistent size, even if the aspect ratio changes.
IplImage *imageProcessed;
imageProcessed = cvCreateImage(cvSize(width, height), IPL_DEPTH_8U, 1);
// Make the image a fixed size.
// CV_INTER_CUBIC or CV_INTER_LINEAR is good for enlarging, and
// CV_INTER_AREA is good for shrinking / decimation, but bad at enlarging.
cvResize(imageGrey, imageProcessed, CV_INTER_LINEAR);

// Give the image a standard brightness and contrast.
cvEqualizeHist(imageProcessed, imageProcessed);

// ... Use 'imageProcessed' for Face Recognition ...

// Only release the greyscale copy if one was actually created above.
if (imageGrey != imageSrc)
    cvReleaseImage(&imageGrey);
if (imageProcessed)
    cvReleaseImage(&imageProcessed);

2.4 Using Eigenfaces for Face Recognition

Now that you have a pre-processed facial image, you can perform Eigenfaces (PCA) for Face Recognition. OpenCV comes with the function “cvEigenDecomposite()”, which performs the PCA operation, however you need a database (training set) of images for it to know how to recognize each of your people.

So you should collect a group of preprocessed facial images of each person you want to recognize. For example, if you want to recognize someone from a class of 10 students, then you could store 20 photos of each person, for a total of 200 preprocessed facial images of the same size (say 100×100 pixels).

Use “Principal Component Analysis” to convert all your 200 training images into a set of “Eigenfaces” that represent the main differences between the training images. First it will find the “average face image” of your images by getting the mean value of each pixel. Then the eigenfaces are calculated in comparison to this average face, where the first eigenface is the most dominant face differences, and the second eigenface is the second most dominant face differences, and so on, until you have about 50 eigenfaces that represent most of the differences in all the training set images.

In these example images above you can see the average face and the first and last eigenfaces that were generated from a collection of 30 images each of 4 people. Notice that the average face will show the smooth face structure of a generic person, the first few eigenfaces will show some dominant features of faces, and the last eigenfaces (eg: Eigenface 119) are mainly image noise. You can see the first 32 eigenfaces in the image below.
*** To explain Eigenfaces (Principal Component Analysis) in simple terms, Eigenfaces figures out the main differences between all the training images, and then how to represent each training image using a combination of those differences.

So for example, one of the training images might be made up of:
(averageFace) + (13.5% of eigenface0) – (34.3% of eigenface1) + (4.7% of eigenface2) + … + (0.0% of eigenface199).
Once it has figured this out, it can think of that training image as the 200 ratios:
{13.5, -34.3, 4.7, …, 0.0}.

It is indeed possible to generate the training image back from the 200 ratios by multiplying the ratios with the eigenface images, and adding the average face. But since many of the last eigenfaces will be image noise or won't contribute much to the image, this list of ratios can be reduced to just the most dominant ones, such as the first 30 numbers, without affecting the image quality much. So now it's possible to represent all 200 training images using just 30 eigenface images, the average face image, and a list of 30 ratios for each of the 200 training images.

Interestingly, this means that we have found a way to compress the 200 images into just 31 images plus a bit of extra data, without losing much image quality. But this tutorial is about face recognition, not image compression, so we will ignore that 🙂

To recognize a person in a new image, it can apply the same PCA calculations to find 200 ratios for representing the input image using the same 200 eigenfaces. And once again it can just keep the first 30 ratios and ignore the rest as they are less important. It can then search through its list of ratios for each of its 20 known people in its database, to see who has their top 30 ratios that are most similar to the 30 ratios for the input image. This is basically a method of checking which training image is most similar to the input image, out of the whole 200 training images that were supplied.
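Continuing the numpy sketch from section 1.3 (again only an illustration, not the OpenCV code): recognition is just a projection onto the eigenfaces followed by a nearest-neighbour search in coefficient space.

import numpy as np

def recognize(face, mean, eigenfaces, train_coeffs, labels):
    # Project the new (flattened) face onto the eigenfaces to get its ratios,
    # then return the label of the closest training image and its distance.
    coeffs = eigenfaces.T.dot(face.reshape(-1, 1) - mean)
    dists = np.linalg.norm(train_coeffs - coeffs, axis=0)   # Euclidean distance
    nearest = int(np.argmin(dists))
    return labels[nearest], float(dists[nearest])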

2.5. Implementing Offline Training

For implementation of offline training, where files are used as input and output through the command-line, I am using a similar method as the Face Recognition with Eigenface implementation in Servo Magazine, so you should read that article first, but I have made a few slight changes.

Basically, to create a facerec database from training images, you create a text file that lists the image files and which person each image file represents. For example, you could put this into a text file called “4_images_of_2_people.txt”:
1 Shervin data/Shervin/Shervin1.bmp
1 Shervin data/Shervin/Shervin2.bmp
1 Shervin data/Shervin/Shervin3.bmp
1 Shervin data/Shervin/Shervin4.bmp
2 Chandan data/Chandan/Chandan1.bmp
2 Chandan data/Chandan/Chandan2.bmp
2 Chandan data/Chandan/Chandan3.bmp
2 Chandan data/Chandan/Chandan4.bmp

This tells the program that person 1 is named "Shervin", and the 4 preprocessed facial photos of Shervin are in the "data/Shervin" folder, and person 2 is called "Chandan" with 4 images in the "data/Chandan" folder. The program can then load them all into an array of images using the function "loadFaceImgArray()". Note that for simplicity, it doesn't allow spaces or special characters in the person's name, so you might want to enable this, or replace spaces in a person's name with underscores (such as Shervin_Emami). A small parsing sketch is shown below.
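For illustration, parsing such a listing is straightforward; a minimal sketch (the file name is the example one above, and this loader is only a hypothetical stand-in for loadFaceImgArray()):

def load_face_list(path):
    # Each line: "<person number> <name> <image path>", e.g. "1 Shervin data/Shervin/Shervin1.bmp"
    entries = []
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) == 3:
                person_id, name, image_path = parts
                entries.append((int(person_id), name, image_path))
    return entries

# e.g. load_face_list("4_images_of_2_people.txt")
# -> [(1, 'Shervin', 'data/Shervin/Shervin1.bmp'), ...]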

To create the database from these loaded images, you use OpenCV’s “cvCalcEigenObjects()” and “cvEigenDecomposite()” functions, eg:
// Tell PCA to quit when it has enough eigenfaces.
CvTermCriteria calcLimit = cvTermCriteria( CV_TERMCRIT_ITER, nEigens, 1);

// Compute average image, eigenvectors (eigenfaces) and eigenvalues (ratios).
cvCalcEigenObjects(nTrainFaces, (void*)faceImgArr, (void*)eigenVectArr,
    CV_EIGOBJ_NO_CALLBACK, 0, 0, &calcLimit,
    pAvgTrainImg, eigenValMat->data.fl);

// Normalize the matrix of eigenvalues.
cvNormalize(eigenValMat, eigenValMat, 1, 0, CV_L1, 0);

// Project each training image onto the PCA subspace.
CvMat *projectedTrainFaceMat = cvCreateMat( nTrainFaces, nEigens, CV_32FC1 );
int offset = projectedTrainFaceMat->step / sizeof(float);
for(int i=0; i<nTrainFaces; i++) {
    cvEigenDecomposite(faceImgArr[i], nEigens, eigenVectArr, 0, 0,
        pAvgTrainImg, projectedTrainFaceMat->data.fl + i*offset);
}

You now have:
the average image “pAvgTrainImg”,
the array of eigenface images “eigenVectArr[]” (eg: 200 eigenfaces if you used nEigens=200 training images),
the matrix of eigenvalues (eigenface ratios) “projectedTrainFaceMat” of each training image.

These can now be stored into a file, which will be the face recognition database. The function “storeTrainingData()” in the code will store this data into the file “facedata.xml”, which can be reloaded anytime to recognize people that it has been trained for. There is also a function “storeEigenfaceImages()” in the code, to generate the images shown earlier, of the average face image to “out_averageImage.bmp” and eigenfaces to “out_eigenfaces.bmp”.

2.6 Implementing Offline Recognition

For implementation of the offline recognition stage, where the face recognition system will try to recognize who is the face in several photos from a list in a text file, I am also using an extension of the Face Recognition with Eigenface implementation in Servo Magazine.

The same sort of text file that is used for offline training can also be used for offline recognition. The text file lists the images that should be tested, as well as the correct person in that image. The program can then try to recognize who is in each photo, and check the correct value in the input file to see whether it was correct or not, for generating statistics of its own accuracy.

The implementation of the offline face recognition is almost the same as offline training:
The list of image files (preprocessed faces) and names are loaded into an array of images, from the text file that is now used for recognition testing (instead of training). This is performed in code by “loadFaceImgArray()”.
The average face, eigenfaces and eigenvalues (ratios) are loaded from the face recognition database file “facedata.xml”, by the function “loadTrainingData()”.
Each input image is projected onto the PCA subspace using the OpenCV function “cvEigenDecomposite()”, to see what ratio of eigenfaces is best for representing this input image.
But now that it has the eigenvalues (ratios of eigenface images) to represent the input image, it looks for the original training image that had the most similar ratios. This is done mathematically in the function “findNearestNeighbor()” using the “Euclidean Distance”, but basically it checks how similar the input image is to each training image, and finds the most similar one: the one with the least distance in Euclidean Space. As mentioned in the Servo Magazine article, you might get better results if you use the Mahalanobis space (define USE_MAHALANOBIS_DISTANCE in the code).
The distance between the input image and most similar training image is used to determine the “confidence” value, to be used as a guide of whether someone was actually recognized or not. A confidence of 1.0 would mean a good match, and a confidence of 0.0 or negative would mean a bad match. But beware that the confidence formula I use in the code is just an extremely basic confidence metric that isn’t reliable, so if you need something more reliable you should look for “Face Verification” algorithms. If you find that it gives misleading values for your images, you should ignore it or disable it in the code (eg: set the confidence always to 1.0).

Once it knows which training image is most similar to the input image, and assuming the confidence value is not too low (it should be at least 0.6 or higher), then it has figured out who that person is; in other words, it has recognized that person!
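As a hedged illustration of that confidence idea (this mirrors the kind of formula the tutorial code uses, but treat the exact expression as an assumption and only a rough guide):

import math

def confidence_from_distance(least_dist_sq, n_train_faces, n_eigens):
    # Map the squared distance to the nearest training image into a rough 0..1 score;
    # values near 1.0 mean a close match, values near 0.0 (or below) a poor one.
    return 1.0 - math.sqrt(least_dist_sq / float(n_train_faces * n_eigens)) / 255.0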

2.7 Improve the Face Recognition accuracy

To improve the recognition performance, there are MANY things that can be improved here (look at commercial Face Recognition systems such as SPOTR for examples), and some improvements can be fairly easy to implement. For example, you could add color processing, edge detection, etc.

You can usually improve the face recognition accuracy by using more input images, at least 50 per person, by taking more photos of each person, particularly from different angles and lighting conditions. If you can't take more photos, there are several simple techniques you could use to obtain more training images, by generating new images from your existing ones:
You could create mirror copies of your facial images, so that you will have twice as many training images and it won't have a bias towards left or right.
You could translate or resize or rotate your facial images slightly to produce many alternative images for training, so that it will be less sensitive to exact conditions.
You could add image noise to have more training images that improve the tolerance to noise.

Remember that it is important to have a lot of variation of conditions for each person, so that the classifier will be able to recognize the person in different lighting conditions and positions, instead of looking for specific conditions. But it's also important to make sure that a set of images for a person is not too varied, such as if you rotated some images by 90 degrees. This would make the classifier too generic and also give very bad results, so if you think you will have a set of images with too much variance (such as rotation of more than 20 degrees), then you could create separate sets of training images for each person. For example, you could train a classifier to recognize "John_Facing_Forward" and another one for "John_Facing_Left" and other ones "Mary_Facing_Forward", "Mary_Facing_Left", etc. Then each classifier can have a lot of variance but not too much, and you simply need to associate the different classifiers for each person with that one person (ie: "John" or "Mary").

That’s why you can often get very bad results if you don’t use good preprocessing on your images. As I mentioned earlier, Histogram Equalization is a very basic image preprocessing method that can make things worse in some situations, so you will probably have to combine several different methods until you get decent results.

That’s why face recognition is relatively easy to do in realtime if you are training on someone and then instantly trying to recognize them after, since it will be the same camera, and background will be the same, their expressions will be almost the same, the lighting will be the same, and the direction you are viewing them from will be the same. So you will often get good recognition results at that moment. But once you try to recognize them from a different direction or from a different room or outside or on a different time of the day, it will often give bad results!

Alternative techniques to Eigenfaces for Face Recognition:
Something you should know is that Eigenfaces is considered the simplest method of accurate face recognition, but many other (much more complicated) methods or combinations of multiple methods are slightly more accurate. So if you have tried the hints above for improving your training database and preprocessing but you still need more accuracy, you will probably need to learn some more complicated methods, or for example you could figure out how to combine separate Eigenface models for the eyes, nose and mouth.

  • 3. Face Recognition in ROS

3.1 Install face_recognition from source
$ cd ~/catkin_ws/src
$ git clone https://github.com/procrob/procrob_functional.git --branch catkin
$ cd ~/catkin_ws
$ catkin_make
$ source ~/catkin_ws/devel/setup.bash

3.2 Workflow
Training images are stored in the data directory.
Training images are listed in the train.txt file.

The ‘train.txt’ follows a specific format which is best understood by looking at the example train.txt file provided in the package. Note that person numbers start from 1, and spaces or special characters are not allowed in persons’ names.

The program trains from the training examples listed in train.txt and creates an Eigenfaces database which is stored in 'facedata.xml'.

Face detection is performed using a haarcascade classifier ‘haarcascade_frontalface_alt.xml’. The ‘data’ folder and ‘train.txt’, ‘facedata.xml’ and ‘haarcascade_frontalface_alt.xml’ files should be placed in the program’s working directory (the directory from which you execute the program).

When the face_recognition program starts: 1) If facedata.xml exists, the Eigenfaces database is loaded from facedata.xml. 2) If facedata.xml does not exist, the program tries to train and create Eigenfaces database from the training images listed in train.txt. Regardless of if the Eigenfaces database is loaded/created at start up or not, you can always add training images directly from the video stream and then update the Eigenfaces database by (re)training. Note: when the program (re)trains, the content of facedata.xml is disregarded and the program trains only based on the training images listed in train.txt.

3.3 Example
For demonstration purposes, an actionlib client example for the face_recognition simple actionlib server is provided.
The client subscribes to face_recognition/FRClientGoal messages. Each FRClientGoal message contains an ‘order_id’ and an ‘order_argument’ which specify a goal to be executed by the face_recognition server.

After receiving a message, the client sends the corresponding goal to the server. By registering relevant call back functions, the client receives feedback and result information from the execution of goals in the server and prints such information on the terminal.

3.3.1 Publish a video stream on the topic /camera/image_raw
For example you can use usb_cam to publish images from your webcam; you need to write this launch file yourself:
$ roslaunch usb_cam usb_cam-zdh.launch
This adds one node, /usb_cam_node, and several topics.
You can check it with:
$ rosrun image_view image_view image:=/camera/image_raw
You may need to install the driver first:
$ sudo apt-get install ros-indigo-usb-cam

3.3.2 Start Fserver
Fserver is a ROS node that provides a simple actionlib server interface for performing different face recognition functionalities in video stream.
Start the Fserver node:
$ cd /home/dehaou1404/catkin_ws/src/procrob_functional
$ rosrun face_recognition Fserver

* The server subscribes to the topic:
— /camera/image_raw – video stream (standard ROS image transport)

* The server uses the parameters:
— confidence_value (double, default = 0.88)
— add_face_number (int, default = 25)

3.3.3 Start Fclient
Fclient is a ROS node that implements an actionlib client example for the face_recognition simple actionlib server (i.e. ‘Fserver’). ‘Fclient’ is provided for demonstration and testing purposes.

Each FRClientGoal message has an ‘order_id’ and an ‘order_argument’, which specify a goal to be executed by the Fserver.
After receiving a message, Fclient sends the corresponding goal to the Fserver. By registering relevant call back functions, Fclient receives feedback and result information from the execution of goals in the Fserver and prints such information on its terminal.

Run the face recognition client:
$ cd /home/dehaou1404/catkin_ws/src/procrob_functional
$ rosrun face_recognition Fclient
* The client subscribes to the topic:
— fr_order (face_recognition/FRClientGoal)

3.3.4 Training and recognition
Publish messages on the topic /fr_order to test the different face recognition functionalities.
NOTICE: watch the info printed in the client terminal after each command.

3.3.4.1 Acquire training images
Acquire training images for one face; this person should appear in the video stream.
$ rostopic pub -1 /fr_order face_recognition/FRClientGoal -- 2 "dehaoZhang"
in the server…:
[ INFO] [1483429069.013208800]: No face was detected in the last frame
[ INFO] [1483429069.266604135]: No face was detected in the last frame
[ INFO] [1483429069.512622293]: No face was detected in the last frame
[ INFO] [1483429069.728653884]: Storing the current face of 'dehaoZhang' into image 'data/6_dehaoZhang1.pgm'.
[ INFO] [1483429075.494751180]: Storing the current face of 'dehaoZhang' into image 'data/6_dehaoZhang24.pgm'.
[ INFO] [1483429075.745061728]: Storing the current face of 'dehaoZhang' into image 'data/6_dehaoZhang25.pgm'.
in the client…:
[ INFO] [1483429043.355890495]: request for sending goal [2] is received
[ INFO] [1483429043.356973474]: Goal just went active
[ INFO] [1483429069.732158190]: Received feedback from Goal [2]
[ INFO] [1483429069.732210837]: A picture of dehaoZhang was successfully added to the training images
[ INFO] [1483429069.980983035]: Received feedback from Goal [2]
[ INFO] [1483429069.981020038]: A picture of dehaoZhang was successfully added to the training images
[ INFO] [1483429075.496298334]: Received feedback from Goal [2]
[ INFO] [1483429075.496374189]: A picture of dehaoZhang was successfully added to the training images
[ INFO] [1483429075.746450499]: Goal [2] Finished in state [SUCCEEDED]
[ INFO] [1483429075.746538156]: Pictures of dehaoZhang were successfully added to the training images

3.3.4.2 Train the sample set
Retrain and update the database so that you can be recognized.
$ rostopic pub -1 /fr_order face_recognition/FRClientGoal -- 3 "none"
in server…:
[ INFO] [1483429419.101123133]: People:
[ INFO] [1483429419.101157041]:
[ INFO] [1483429419.101187136]: ,
[ INFO] [1483429419.101213218]: ,
[ INFO] [1483429419.101241996]: ,
[ INFO] [1483429419.101268921]: ,
[ INFO] [1483429419.101300335]: ,
[ INFO] [1483429419.101334005]: .
[ INFO] [1483429419.101375442]: Got 150 training images.
in client…:
[ INFO] [1483429418.947685612]: request for sending goal [3] is received
[ INFO] [1483429418.948517146]: Goal just went active
[ INFO] [1483429421.776359616]: Goal [3] Finished in state [SUCCEEDED]

3.3.4.3 Recognize
Recognize faces continuously. This will not stop until you preempt or cancel the goal, so let's preempt it by sending the next goal.
$ rostopic pub -1 /fr_order face_recognition/FRClientGoal -- 1 "none"

3.3.4.4 Exit
$ rostopic pub -1 /fr_order face_recognition/FRClientGoal -- 4 "none"

3.4 Parameters and messages
This message includes 2 fields:
— int order_id
— string order_argument

The FaceRecognitionGoal message has 2 fields: ‘order_id’ is an integer specifying a goal, ‘order_argument’ is a string used to specify an argument for the goal if necessary:

order_id = 2, then order_argument = person_name, =(Add face images)
Goal is to acquire training images for a NEW person. The video stream is processed for detecting a face which is saved and used as a training image for the new person. This process is continued until the desired number of training images for the new person is acquired. The name of the new person is provided as “order_argument”

order_id = 3, without args, =(Train)
The database is (re)trained from the training images

order_id = 0, without, =(Recognize Once)
Goal is to acknowledge the first face recognized in the video stream. When the first face is recognized with a confidence value higher than the desirable confidence threshold, the name of the person and the confidence value are sent back to the client as result.
order_id = 1, without, =(Recognize Continuously)
Goal is to continuously recognize faces in the video stream. Every face recognized with a confidence value higher than the desired confidence threshold, together with its confidence value, is sent back to the client as feedback. This goal is pursued for an infinite time until it is canceled or preempted by another goal.

order_id = 4, without, =(Exit)
The program exits.
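The same goals can of course be sent from code instead of rostopic. A minimal rospy sketch (node name arbitrary; message fields as listed above):

#!/usr/bin/env python
import rospy
from face_recognition.msg import FRClientGoal

if __name__ == '__main__':
    rospy.init_node('fr_order_publisher')
    pub = rospy.Publisher('fr_order', FRClientGoal, queue_size=1)
    rospy.sleep(1.0)  # give the subscriber time to connect
    # order_id = 2: acquire training images for the named person.
    pub.publish(FRClientGoal(order_id=2, order_argument='dehaoZhang'))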

  • 4. Face Recognition with an RGB-D camera

This uses an RGB-D camera which, unlike a common webcam, also provides depth information.
This package contains software for detecting heads and faces and recognizing people. Head and face detection utilize the Viola-Jones classifier on depth or color images, respectively. Face recognition of previously learned people is based on either Eigenfaces or Fisherfaces. The recognition method can be configured and trained with a simple interface as explained in the next section.

4.1 Install cob_people_perception
$ roscd
$ cd ../src
$ git clone https://github.com/ipa-rmb/cob_people_perception.git
$ git clone https://github.com/ipa-rmb/cob_perception_common.git
$ cd ..
$ source ./devel/setup.bash
$ rosdep install --from-path src/ -y -i
$ catkin_make -DCMAKE_BUILD_TYPE="Release"

4.2 Workflow
Then install Openni (http://wiki.ros.org/openni_launch) and start the Openni driver (old Kinect, Asus) with
$ roslaunch openni_launch openni.launch
or the Openni2 driver (new Asus, http://wiki.ros.org/openni2_launch) with
$ roslaunch openni2_launch openni2.launch
or any other driver according to your used sensor.

When using the openni or openni2 driver, please ensure that depth_registration is activated (e.g. by using rosrun rqt_reconfigure rqt_reconfigure). Also check that the camera_namespace argument and the camera topics colorimage_in_topic and pointcloud_rgb_in_topic, which are set in the ros/launch/people_detection.launch file, correspond to the topic names of your camera driver.

4.3 Example
4.3.1 Launch people detection
$ roslaunch cob_people_detection people_detection.launch
or with
$ roslaunch cob_people_detection people_detection.launch using_nodelets:=true
The second version uses nodelets for the first stages of data processing which might yield a substantially better runtime if your processor is fast enough on single core usage.

Now a window should pop up and present you with the current image of the camera. Heads will be framed with a light blue rectangle and detected faces are indicated in light green. For your convenience, the package contains a client for easy usage of people detection. Start the client with
$ rosrun cob_people_detection people_detection_client

4.3.2 Functions
No identification data will be available the first time you start the people detection node on your computer. To record some data adjust the frame rate of the camera, first, by choosing 5 – activate/deactivate sensor message gateway in the client and then enter 1 to activate the sensor message gateway. The frame rate should be chosen somewhere between 5 and 30 Hz depending on your computer’s power.
Choose an option:
1 – capture face images
2 – update database labels
3 – delete database entries
4 – load recognition model (necessary if new images/persons were added to the database)
>> 5 – activate/deactivate sensor message gateway <<
6 – get detections
q – Quit

Type 1 to activate or 2 to deactivate the sensor message gateway: 1
At which target frame rate (Hz) shall the sensor message gateway operate: 20
Gateway successfully opened.

Now select 1 – capture face images from the menu of the client and enter the name of the first person to capture. Please do not use any whitespaces in the name. Following, you are asked to select how to record the data: manually by pressing a button or automatically. In the manual mode, you have to press c each time an image shall be captured and q to finish recording. Make sure that only one person is in the image during recording, otherwise no data will be accepted because the matching between face and label would be ambiguous.

Choose an option:
>> 1 – capture face images <<
2 – update database labels
3 – delete database entries
4 – load recognition model (necessary if new images/persons were added to the database)
5 – activate/deactivate sensor message gateway
6 – get detections
q – Quit

Input the label of the captured person: ChuckNorris
Mode of data capture: 0=manual, 1=continuous: 0
Recording job was sent to the server …
Waiting for the capture service to become available …
[ INFO] [1345100028.962812337]: waitForService: Service [/cob_people_detection/face_capture/capture_image] has not been advertised, waiting…
[ INFO] [1345100028.985320699]: waitForService: Service [/cob_people_detection/face_capture/capture_image] is now available.
Hit 'q' key to quit or 'c' key to capture an image.
Image capture initiated …
image 1 successfully captured.
Image capture initiated …
image 2 successfully captured.
Image capture initiated …
image 3 successfully captured.
Image capture initiated …
image 4 successfully captured.
Finishing recording …
Data recording finished successfully.
Current State: SUCCEEDED   Message: Manual capture finished successfully.

If you are using one of the more modern recognition methods (choices 2=LDA2D or 3=PCA2D, can be set in file cob_people_detection/ros/launch/face_recognizer_params.yaml as parameter recognition_method) then please be aware that they require at least two different people in the training data. The next step, building a recognition model, will not start with only one person available with these algorithms. If you quit the program before recording a second person, the program might not start anymore. Then please delete all data from ~/.ros/cob_people_detection/files/training_data, start the program and record two people.

After training, you need to build a recognition model with the new data. To do so, just select 4 – load recognition model from the client’s menu. In the following prompt you can either list all persons that you captured and that shall be recognized by the system, e.g. by typing

Choose an option:
1 – capture face images
2 – update database labels
3 – delete database entries
>> 4 – load recognition model (necessary if new images/persons were added to
>> the database) <<
5 – activate/deactivate sensor message gateway
6 – get detections
q – Quit

Enter the labels that should occur in the recognition model. By sending an empty list, all available data will be used.
Enter label (finish by entering 'q'): ChuckNorris
Enter label (finish by entering 'q'): OlliKahn
Enter label (finish by entering 'q'): q
Recognition model is loaded by the server …

A new recognition model is currently loaded or generated by the server. The following labels will be covered:
– ChuckNorris
– OlliKahn
The new recognition model has been successfully loaded.
Current State: SUCCEEDED   Message: Model successfully loaded.
or you can directly enter q and all available persons will be trained.

After all steps succeeded, you can watch the recognition results in the display. Although you may access all training data functions (update, delete, etc.) from the client directly you may also access the files directly, which are located in your home folder at ~/.ros/cob_people_detection/files/training_data.

Intelligent Robots (43): ROS Vision

7. Vision

7.1 First test. With the Kinect connected:
$ roslaunch openni_launch openni.launch
you can then view the video stream:
$ rosrun image_view image_view image:=/camera/rgb/image_color
7.2 Create a test package
$ cd ~/catkin_ws/src
$ catkin_create_pkg mysample_opencv sensor_msgs cv_bridge rospy std_msgs
$ catkin_make
okay.
$ mkdir ./scripts
$ mkdir ./launch
7.3 Test the depth image
OpenCV 2 links seamlessly to Python. First start the OpenNI driver to get the image streams:
$ roslaunch openni_launch openni.launch
Then run the script:
$ python ~/catkin_ws/src/mysample_opencv/scripts/cv_bridge_demo.py
See the figure.

NOTE: the cv_bridge_demo.py script that bridges OpenCV and rospy (the method bodies below are a minimal sketch; the subscribed topic names assume the openni_launch defaults):
#!/usr/bin/env python
import rospy
import sys
import cv2
import cv2.cv as cv
from sensor_msgs.msg import Image, CameraInfo
from cv_bridge import CvBridge, CvBridgeError
import numpy as np

class cvBridgeDemo():
    def __init__(self):
        rospy.init_node('cv_bridge_demo')
        self.bridge = CvBridge()
        # Subscribe to the color image and to the float depth image.
        rospy.Subscriber('/camera/rgb/image_color', Image, self.image_callback)
        rospy.Subscriber('/camera/depth/image', Image, self.depth_callback)

    def image_callback(self, ros_image):
        # Convert the ROS image to an OpenCV/numpy image and display it.
        frame = self.bridge.imgmsg_to_cv2(ros_image, 'bgr8')
        cv2.imshow('RGB Image', frame)
        cv2.waitKey(3)

    def depth_callback(self, ros_image):
        # The depth image is 32FC1 (meters); scale it roughly into [0, 1] for display.
        depth = self.bridge.imgmsg_to_cv2(ros_image, '32FC1')
        cv2.imshow('Depth Image', depth / 3.0)
        cv2.waitKey(3)

def main(args):
    try:
        cvBridgeDemo()
        rospy.spin()
    except KeyboardInterrupt:
        print "shutting down main"
        cv.DestroyAllWindows()

if __name__ == '__main__':
    main(sys.argv)

7.4 Test the point cloud
PCL supports the OpenNI 3D interface, and the result can be visualized in RViz. First start the OpenNI driver to get the point cloud:
$ roslaunch openni_launch openni.launch
Then visualize it:
$ rosrun rviz rviz
After it starts, set the Fixed Frame to camera_link, add a PointCloud2 display and set its topic to /camera/depth/points to see the point cloud. Then set the Color Transformer to AxisColor so that near points are red and far points are blue/purple, as in the figure.

7.5 Test the laser scan
First start the OpenNI driver:
$ roslaunch openni_launch openni.launch
Then start the depthimage-to-laserscan conversion:
$ roslaunch mysample_opencv a.launch
Finally visualize it:
$ rosrun rviz rviz
After it starts, set the Fixed Frame to camera_link, add a LaserScan display and set its topic to /scan to see the laser scan, as in the figure.
NOTE: the a.launch script that converts the depth image to a laser scan:

args=”load depthimage_to_laserscan/DepthImageToLaserScanNodelet lasersc$

Intelligent Robots (42): IP Camera

1. GStreamer
2. gst-launch
3. Using an IP camera in Linux
4. Using an IP camera in ROS

  • 1. GStreamer

GStreamer is a toolkit for building audio/video pipelines, for example streaming a media file to the internet, or streaming a test.avi file into a V4L camera device. For V4L capture the main tools are:
1–gst-launch-1.0 for capturing video
2–FFMpeg for saving and editing video
3–v4l-ctl for controlling your video card
4–mpv for viewing videos
5–gst-inspect for listing elements

  • 2. gst-launch

GStreamer is mostly driven with the gst-launch CLI. This command-line tool can create a pipeline, initialize it and run it; for example you can make an avi media file act like a camera, loop an ip-camera back as a usb-camera, and so on.
1–The simplest example:
$ gst-launch-1.0 fakesrc ! fakesink
This builds the simplest pipeline: it connects a single (fake) source to a single (fake) sink.
2–Recording video: create an AVI file with raw video and no audio:
$ gst-launch-1.0 v4l2src device=$VIDEO_DEVICE ! $VIDEO_CAPABILITIES ! avimux ! filesink location=test.avi
3–Recording audio: create an AVI file with raw audio and no video:
$ gst-launch-1.0 alsasrc device=$AUDIO_DEVICE ! $AUDIO_CAPABILITIES ! avimux ! filesink location=test.avi
4–Recording audio and video:
$ gst-launch-1.0 v4l2src device=$VIDEO_DEVICE ! $VIDEO_CAPABILITIES ! mux. alsasrc device=$AUDIO_DEVICE ! $AUDIO_CAPABILITIES ! mux. avimux name=mux ! filesink location=test-$.avi
This pipe has three parts: a video source leading to a named element (! name. with a full stop means "pipe to the named element"); an audio source leading to the same element; and a named muxer element leading to a file sink.
5–Splitting one source into several outputs:
$ gst-launch-1.0 v4l2src device=$VIDEO_DEVICE ! $VIDEO_CAPABILITIES ! avimux ! tee name=network ! filesink location=test.avi tcpclientsink host=127.0.0.1 port=5678
This sends your stream to a file (filesink) and out over the network (tcpclientsink). To make this work, you’ll need another program listening on the specified port (e.g. nc -l 127.0.0.1 -p 5678).
6–Viewing the picture from the camera:
$ gst-launch-0.10 v4l2src do-timestamp=true device=$VIDEO_DEVICE ! video/x-raw-yuv,format=(fourcc)UYVY,width=320,height=240 ! ffmpegcolorspace ! autovideosink
7–Other options
gst-launch -v --gst-debug-level=3 prints more debugging information.
For IP cameras, gstreamer and virtual video devices, specify the frame size parameters explicitly.

  • 3. Using an IP camera in Linux

1. Installation
1.1 From the package:
$ sudo apt-get install v4l2loopback-dkms
1.2 From source
The apt-get method may fail during install because some warnings are treated as errors (possibly a compiler compatibility issue). In that case build from source:
$ sudo su
$ git clone https://github.com/umlaeute/v4l2loopback.git
$ cd v4l2loopback
$ make && make install
(I forget whether sudo is needed.)
2. Load the module
2.1 Load it:
$ sudo modprobe v4l2loopback
2.2 If it reports "not found":
$ sudo depmod v4l2loopback
$ sudo modprobe v4l2loopback
2.3 If it still fails:
$ sudo apt-get install linux-generic   <-- this may freeze/replace your kernel
$ sudo apt-get install v4l2loopback-dkms
$ sudo depmod v4l2loopback
$ sudo modprobe v4l2loopback
or
$ sudo modprobe v4l2loopback video_nr=7
3. Verify
$ v4l2-ctl --list-devices
A virtual device shows up; here it is video1, while my existing usb-camera is video0.
Dummy video device (0x0000) (platform:v4l2loopback-000):
/dev/video*
You can also check with:
$ dmesg | grep v4l2
If you see libv4l2: error getting pixformat: Invalid argument
or:
v4l2loopback: module verification failed: signature and/or required key missing – tainting kernel
you can ignore it.

4. Create a media source
The simplest is:
$ gst-launch v4l2src device=/dev/video0 ! v4l2sink device=/dev/video1
4.1 Receiving a stream:
$ gst-launch-0.10 udpsrc port=1234 ! theoradec ! ffmpegcolorspace ! ximagesink
Sending a stream:
$ gst-launch-0.10 v4l2src ! ffmpegcolorspace ! theoraenc ! udpsink host=127.0.0.1 port=1234
Note: start the receiving side first.
4.2 Display or save
Display the video:
$ gst-launch v4l2src ! xvimagesink
Save the video to a file:
$ gst-launch v4l2src ! video/x-raw-yuv,width=320,height=240 ! ffmpegcolorspace ! jpegenc ! avimux ! filesink location=osug-1.avi
4.3 Disguise a test video source as camera video1:
$ gst-launch videotestsrc ! v4l2sink device=/dev/video1
Check it, ok:
$ cheese --device=/dev/video1
Disguise camera 0 as camera 1:
$ gst-launch v4l2src device=/dev/video0 ! v4l2sink device=/dev/video1
Check it, ok:
$ cheese --device=/dev/video1
Disguise a file as camera 1:
$ gst-launch filesrc location="/home/zdh991/tmp/osug-1.avi" ! avidemux ! v4l2sink device=/dev/video1
Check it, ok:
$ cheese --device=/dev/video1
Disguise an IP camera as camera 1:
$ gst-launch souphttpsrc location=http://192.168.1.126:80 ! jpegdec ! ffmpegcolorspace ! v4l2sink device=/dev/video1
Check it, ok:
$ cheese --device=/dev/video1

5. A simulated test using an iPhone
IP camera, gstreamer, and virtual video devices.
If you do not have an IP camera, you can use the iPhone's camera.
Install an app such as Third Eye and let the iPhone join the local network.
After the iPhone gets an IP address, test from the PC by entering iphone-ipaddress:8091 in a browser; if the iPhone camera's video shows in real time, it works.

Use gst-launch with the iPhone's ip:port to treat its camera as a WiFi ip-camera:
$ gst-launch souphttpsrc location="http://172.20.10.11:8091" ! jpegdec ! ffmpegcolorspace ! v4l2sink device=/dev/video1
Here 172.20.10.11:8091 is the address and port of the IP camera. Note that the PC and the iPhone must be on the same subnet.

  • 4. Using an IP camera in ROS

ROS provides the gscam node, and the stream can be viewed with image_view, for example:
$ gst-launch souphttpsrc location=http://[user]:[password]@[camera_ip]/mjpg/video.mjpg ! jpegdec ! v4l2sink device=/dev/video0
$ GSCAM_CONFIG="rtspsrc location=rtsp://CameraIP/ipcam.sdp ! video/x-raw-rgb,framerate=30/1 ! ffmpegcolorspace"
$ rosrun image_view image_view image:=/gscam/image_raw
gscam can also be used together with a rosbridge websocket, e.g. wss://ip:port.

  • A. gst-launch notes

This tool can only build simple pipelines; in particular, it only emulates the interaction between a pipeline and an application above a certain level. It is a quick and easy way to test a pipeline. Note that gst-launch is primarily a debugging tool; real applications create pipelines with the gst_parse_launch() API.
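For completeness, a minimal sketch of doing the same from code with the gst-python 0.10 bindings (matching the gst-launch-0.10 examples in this section; the pipeline string is just the videotestsrc example):

#!/usr/bin/env python
import gobject
import pygst
pygst.require("0.10")
import gst

# Same description syntax as on the gst-launch command line.
pipeline = gst.parse_launch("videotestsrc ! ffmpegcolorspace ! autovideosink")
pipeline.set_state(gst.STATE_PLAYING)
loop = gobject.MainLoop()
try:
    loop.run()
finally:
    pipeline.set_state(gst.STATE_NULL)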

The gst-launch command line consists of a list of options followed by a PIPELINE-DESCRIPTION. Simply put, a PIPELINE-DESCRIPTION is a series of elements separated by !:
$ gst-launch-0.10 videotestsrc ! ffmpegcolorspace ! autovideosink
This uses three elements: videotestsrc, ffmpegcolorspace and autovideosink. GStreamer connects their output and input pads; if more than one input/output pad is available, the pads' Caps are used to find compatible pads.

Elements
Elements can have properties; on the command line the format is property=value, and multiple properties are separated by spaces. Use the gst-inspect tool to look up an element's properties.
$ gst-launch-0.10 videotestsrc pattern=11 ! ffmpegcolorspace ! autovideosink
An element can be given a name with the name property, which makes it possible to build complex pipelines that contain branches. With a name you can refer back to an element created earlier, which is essential when using elements with multiple pads (such as a demuxer or a tee).
$ gst-launch-0.10 videotestsrc ! ffmpegcolorspace ! tee name=t ! queue ! autovideosink t. ! queue ! autovideosink
This connects videotestsrc to ffmpegcolorspace and then to a tee element; the tee is named 't', and one branch goes to a queue and an autovideosink while the other branch goes to another queue and autovideosink.

Pads
When connecting two elements, instead of letting GStreamer choose which pad to use, you can specify the pad directly. This is done by naming the element and then using name.padname (the element must be named first). gst-inspect also shows the names of an element's pads.
$ gst-launch-0.10.exe souphttpsrc location=http://docs.gstreamer.com/media/sintel_trailer-480p.webm ! matroskademux name=d d.video_00 ! matroskamux ! filesink location=sintel_video.mkv
This uses souphttpsrc to fetch a media file from the internet; the file is in webm format. matroskademux opens the file, and because the media contains both audio and video it creates two output pads, named video_00 and audio_00. video_00 is connected to a matroskamux element to re-pack the video stream, which is finally connected to a filesink, so the stream is saved into a file named sintel_video.mkv. In short, we took a webm file, dropped the audio and stored only the video in a new file. To keep the audio instead, do this:
$ gst-launch-0.10.exe souphttpsrc location=http://docs.gstreamer.com/media/sintel_trailer-480p.webm ! matroskademux name=d d.audio_00 ! vorbisparse ! matroskamux ! filesink location=sintel_audio.mka
Here the vorbisparse element extracts some information from the stream and puts it into the pad's Caps, so that the next element, matroskamux, knows how to handle the stream. This was not needed when grabbing the video, because matroskademux had already done it.
Note that in the two examples above the media is never decoded or played; the data is merely moved around.

Caps filters
When an element has more than one output pad, connecting the next element can be ambiguous: the downstream element may have more than one compatible input pad, or its input pad may be compatible with all the output pads. In such cases GStreamer connects the first pad that is available, which in effect means GStreamer picks one at random.
$ gst-launch-0.10 souphttpsrc location=http://docs.gstreamer.com/media/sintel_trailer-480p.webm ! matroskademux ! filesink location=test
This uses the same media file and demuxer as the previous example. The input pad of filesink accepts any format, which means it can take any media type. So which pad of matroskademux gets connected to filesink, video_00 or audio_00? There is no way to know. To remove this ambiguity the previous examples used the pad names; here we use a Caps filter instead:
$ gst-launch-0.10 souphttpsrc location=http://docs.gstreamer.com/media/sintel_trailer-480p.webm ! matroskademux ! video/x-vp8 ! matroskamux ! filesink location=sintel_video.mkv
A Caps filter acts like an element that does nothing and only accepts the given Caps. In this example a video/x-vp8 Caps filter is inserted between matroskademux and matroskamux, indicating that from matroskademux we only want the output pad that produces this kind of video.
Use gst-inspect to see which Caps an element can accept and produce, and gst-discoverer to see which Caps a file contains. To see the Caps an element produces inside a pipeline, pass the -v option to gst-launch.
Here is a pipeline that rescales video. The videoscale element resizes its input before passing it on; the example uses a Caps filter to set the output video size to 320x200:
$ gst-launch-0.10 uridecodebin uri=http://docs.gstreamer.com/media/sintel_trailer-480p.webm ! queue ! videoscale ! video/x-raw-yuv,width=320,height=200 ! ffmpegcolorspace ! autovideosink