GMS - Global Monitoring System
An ounce of knowledge is worth more than a ton of data.
Overview
GMS is a fully automated system that discovers camera IoT devices by
IP address, samples and scores the discovered devices using object
recognition, and annotates camera content. Its application is face
recognition on cameras found in the wild (millions of cameras distributed worldwide).
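The stages named above (discover, sample, score, annotate) can be pictured with a minimal sketch. This is not the GMS implementation: `CameraRecord`, `discover`, `sample_and_score`, and the `is_camera`, `grab_frame`, and `recognize` callbacks are hypothetical names introduced only for illustration, and the scoring rule (highest detection confidence) is an assumption.

```python
from dataclasses import dataclass, field
from ipaddress import ip_network
from typing import Callable, List, Tuple


@dataclass
class CameraRecord:
    """One discovered device (hypothetical record layout)."""
    address: str
    score: float = 0.0
    labels: List[str] = field(default_factory=list)


def discover(cidr: str, is_camera: Callable[[str], bool]) -> List[CameraRecord]:
    """Walk every host address in a CIDR block and keep those the probe accepts."""
    return [
        CameraRecord(str(ip))
        for ip in ip_network(cidr).hosts()
        if is_camera(str(ip))
    ]


def sample_and_score(
    cameras: List[CameraRecord],
    grab_frame: Callable[[str], object],
    recognize: Callable[[object], List[Tuple[str, float]]],
) -> List[CameraRecord]:
    """Grab one frame per camera, annotate it, and rank cameras by score."""
    for cam in cameras:
        frame = grab_frame(cam.address)
        detections = recognize(frame)  # assumed shape: [(label, confidence), ...]
        cam.labels = sorted({label for label, _ in detections})
        # Assumed scoring rule: the highest confidence among all detections.
        cam.score = max((conf for _, conf in detections), default=0.0)
    return sorted(cameras, key=lambda c: c.score, reverse=True)
```

In this sketch the probe, frame grabber, and recognizer are passed in as callbacks, so each stage can be replaced or stubbed without touching the others.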
The Problem
Standard approaches (e.g., deep learning, CNNs, RNNs) fail at this task, even though small children (ages 3-5) are already
able to tell whether an image contains a face and to point out its location. For more information on why PAC theory and its derivatives cannot cope with this and similar complex tasks, see Towards a Practical Estimate of Training Sample Size. To demonstrate this, consider the following 11 images:











Previous Approaches
Face recognition APIs from the following vendors were applied to these images:
- https://www.betafaceapi.com/demo.html
- https://facedetection.com/online-reverse-image-search
- http://www.elgom3a.com/post/online-face-detection
- https://how-old.net/
- https://www.twinsornot.net
- https://howhot.io
- https://www.faceplusplus.com/face-detection
- http://liuliu.me/ccv/js/nss
- http://www.codejungle.org/facedetection
- http://PicWiser.com
- http://facedetection.kuznech.com/
- http://www.facesearch.com
- https://skybiometry.com/demo/face-detect
- https://azure.microsoft.com/en-us/services/cognitive-services/face
- https://www.kairos.com/demos
- https://cmusatyalab.github.io/openface
- https://findface.pro/en/#free-demo
- http://www.bioenabletech.com/bioenable
- https://www.faceblind.org/facetests
- Google
- IBL Classifier
Only one of the first 19 vendor APIs recognizes a face, and it does so in just one image. All the others fail to recognize a face in any of the images.
The important thing to note is that these images are not exceptions: based on a representative sample for one US metropolitan area, there are large numbers of images for which false negatives or false positives are to be expected.
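A comparison of this kind reduces to counting, per detector, how many face-containing images produced no detection. The sketch below shows that tally. The function name `miss_rates` and the detector names are hypothetical, and the boolean lists are placeholder inputs, not the measurements behind this evaluation.

```python
from typing import Dict, List


def miss_rates(detections: Dict[str, List[bool]]) -> Dict[str, float]:
    """Return, per detector, the fraction of face images with no face found.

    Each list holds one boolean per test image that is known to contain
    a face: True if the detector reported a face, False if it missed.
    """
    rates = {}
    for detector, hits in detections.items():
        if not hits:
            raise ValueError(f"no results recorded for {detector!r}")
        rates[detector] = hits.count(False) / len(hits)
    return rates


# Placeholder inputs for illustration only (not measured results).
example = {
    "detector_a": [False] * 11,
    "detector_b": [True] + [False] * 10,
}
rates = miss_rates(example)
```

A false-positive rate could be tallied the same way over a second set of images known to contain no face.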
Why retraining existing models on these images is not an option
When dealing with the output of millions of cameras (not including smartphones and cell phones), testing and possibly retraining on that number of images is computationally infeasible.
Furthermore, a child aged 3-5 years has seen only a very limited number of images of faces, yet very quickly attains the required proficiency for this cognitive task, i.e., dealing with new, unseen images (no retraining necessary).
Some more evaluations...
In the meantime, a couple more of the large international search engines were evaluated. Baidu fares no better than Google. For the first set of images (the child), the recommended similar images have little to nothing in common with the target images.
For the second set of images, Baidu search never completes and delivers no similar image suggestions.
Yandex fares the best of all. For some of the images of the first type, the search engine finds a few somewhat similar images. For the second set, it also fails. It appears that the DoG algorithms used by the Yandex image recognition engine provide
somewhat better results than the deep learning approaches espoused by Google, Baidu, etc. None of the search engines comes anywhere near being a reliable tool for annotating image content.
© , Steve Romaniuk.