- January 26, 2020
- Posted by: Web Team
Part 2: Feasibility – Big Picture
Aerial is my code name for an image analysis service or business that uses machine learning for object recognition and detection on images taken from very low altitude, such as from a drone. The Aerial business would be a consultancy of some sort that explores customer needs, creates a media pipeline, probably trains a neural network for custom object detection, performs the actual object recognition, and creates reports and visualisations for the client.
This is the second part of a four part series where I discuss how I might develop an Aerial Image Analysis Service (or Business) with machine learning.
The first part investigated the desirability of such a service, this part discusses the big picture feasibility, the next part will concentrate on machine learning options, and the final part will cover commercial viability.
Even though we are not going into the details of the machine learning until the next article, this article is quite technical in its own way. With some experience or training in aspects of machine learning (but not necessarily programming), decision makers would quickly come to similar conclusions to those below.
Possible Modalities (with the help of Cairos Augusto)
Modalities is a catch-all term that often describes how we are delivering something; in this case I am using it to describe what types of information we are gathering from our aerial vantage point and the types of analysis we do with it.
Image Analysis
The most obvious thing that we might do is take high resolution images or video (which we would analyse frame by frame anyway) and do some sort of object detection of what is in those frames. The “base level service”, so to speak, would receive the media (how the client delivers that to us would need to be determined), review the images, and annotate on a large scale image/map where those objects were detected. The analysis could also produce structured data, and further visualisations to help communicate what was found.
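As a rough illustration of that base level step, here is a minimal sketch that runs an off-the-shelf pretrained detector over a single image and returns structured detections. The particular model (torchvision’s Faster R-CNN), the confidence threshold and the output shape are my assumptions for the sketch, not decisions about how the service would actually be built.

```python
# Sketch: run an off-the-shelf detector over one frame and emit structured detections.
# Assumes a recent torchvision; the model and threshold are stand-ins, not choices.
import torch
from PIL import Image
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.transforms.functional import to_tensor

model = fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()

def detect_objects(image_path, score_threshold=0.6):
    """Return a list of {box, label_id, score} dicts for one image/frame."""
    image = to_tensor(Image.open(image_path).convert("RGB"))
    with torch.no_grad():
        output = model([image])[0]  # dict with "boxes", "labels", "scores"
    detections = []
    for box, label, score in zip(output["boxes"], output["labels"], output["scores"]):
        if score >= score_threshold:
            detections.append({
                "box": [round(v, 1) for v in box.tolist()],  # [x1, y1, x2, y2] in pixels
                "label_id": int(label),  # generic class id; would map to client classes
                "score": float(score),
            })
    return detections
```

The structured output (rather than just an annotated picture) is what would later feed the maps, reports and visualisations mentioned above.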
Real time detection
If we have control of the drone, and with the addition of drone sensory data such as an infrared camera and GPS, and assuming we can transmit cost effectively to the analysis pipeline, we could detect, for example, animals (or other moving things like dust, water or fire) in real time. Imagine that you have some cattle to count: you may be able to identify each of the animals, put a virtual code or brand on them using the GPS location of the object, and then track the animals. This sort of data is enhanced because we have the velocity, position and height of the drone. We can plot the animal data perhaps on a live map or image – perhaps via an avatar which is uniquely identified for each unique animal. There are challenges like the identification of separate animals from two animals that are close together or that look very similar, but we could potentially use standard machine learning analysis (eg using size, gait, colouring) to distinguish individuals. This live image could also be the client’s interface, such that they could search for specific animals or objects, and retrace their paths (“the fox did go into the henhouse!!”). We could potentially place triggers on items within the live image – for example animals, people, or other moving things that cross boundaries of some sort (“Phar Lap’s cousin just jumped the fence!”).
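To make the boundary-trigger idea a little more concrete, here is a minimal sketch of a geofence check, assuming we already have per-object GPS fixes coming out of the detection and telemetry steps. The paddock coordinates, object IDs and the alerting behaviour are all invented for illustration.

```python
# Sketch: a boundary-crossing trigger for tracked objects, assuming each tracked
# object already has a GPS fix from the detection + drone telemetry pipeline.
from shapely.geometry import Point, Polygon

# Hypothetical paddock boundary as (longitude, latitude) pairs.
paddock = Polygon([(147.001, -36.501), (147.010, -36.501),
                   (147.010, -36.510), (147.001, -36.510)])

last_inside = {}  # object_id -> was the object inside the boundary at the previous fix?

def check_boundary(object_id, lon, lat):
    """Fire an alert when a tracked object crosses the paddock boundary."""
    inside = paddock.contains(Point(lon, lat))
    previously = last_inside.get(object_id)
    if previously is not None and previously != inside:
        direction = "left" if previously else "entered"
        print(f"ALERT: {object_id} {direction} the paddock at ({lat}, {lon})")
    last_inside[object_id] = inside
```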
Larger areas and temporal analysis
To analyse larger areas over lengths of time (probably not live images for really big areas initially) we can use a mixture of machine learning and traditional photogrammetry to piece together large amounts of different landscape data: not only what we are trying to detect, but also topographical and other data. Using the GPS and inertial system in the drone, plus a few reference points on the ground, and potentially machine learning, we can create a mosaic of images both across a wide area and over long temporal periods. The creation of these mosaics of sensory data allows us to analyse one single large image over a specified space and time. It’s likely that for some analyses the large image would be used, but for other analyses (eg machine learning) we would need to break up the original images into even smaller images. The report to the client may, rather than being an image mosaic of the original images, be a data mosaic of everything that is happening, perhaps a constructed image showing topography, landforms and the spatial and temporal flow of objects (animals, vehicles, water, whatever) over the landscape.
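For the machine learning side of that, here is a minimal sketch of breaking a large mosaic into smaller tiles for model input while keeping each tile’s pixel offset, so detections can later be mapped back onto the full mosaic. The tile size and overlap are arbitrary placeholders.

```python
# Sketch: split a large mosaic into fixed-size, overlapping tiles for model input,
# keeping each tile's pixel offset so detections can be mapped back to the mosaic.
from PIL import Image

Image.MAX_IMAGE_PIXELS = None  # large mosaics can exceed PIL's default safety limit

def tile_mosaic(mosaic_path, tile_size=1024, overlap=128):
    """Yield (x_offset, y_offset, tile_image) triples covering the whole mosaic."""
    mosaic = Image.open(mosaic_path)
    width, height = mosaic.size
    step = tile_size - overlap
    for top in range(0, height, step):
        for left in range(0, width, step):
            box = (left, top, min(left + tile_size, width), min(top + tile_size, height))
            yield left, top, mosaic.crop(box)
```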
General thoughts influencing feasibility
The client needs
Every client will have different needs; they might all be about recognising objects or changes in the landscape, but those objects will differ between clients and will have specific temporal and spatial requirements. What is done once images are analysed is important too. Does the client want the images to be analysed in real time? Or do they need to be aggregated over time and space, and assimilated into other data such as weather, movements of goods, etc.?
Fragile models for object recognition
Deep learning models are often very specific to the type of images they have been trained and tested on. For example, a model trained on conifer trees might have difficulty recognising gum trees, or a model trained to find small water courses in one area of the world might not be able to discover them in another part of the world because of different vegetation, rock formations, etc.
I think what Jose Portilla says in one of his fantastic Udemy AI/ML courses is super important:
“…machine learning is not performed in a ‘vacuum’, but instead a collaborative process where we should consult with experts in the domain..”
By experts in the domain, in this case it’s our clients’ expertise that we would need to leverage.
Small data set
Although we can use prebuilt models (public or private) previously trained, and then further fine tune them with the client’s specific images of objects that need detection, we might not have a lot of new images to train on.
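A minimal sketch of what that fine tuning might look like with a small client-specific dataset: freeze a pretrained backbone and retrain only a new classification head. The dataset path, class count and hyperparameters are placeholders, and the classification setup stands in for whatever detection architecture we eventually settle on.

```python
# Sketch: fine-tune a pretrained backbone on a small client-specific dataset by
# freezing the backbone and retraining only the final head. All paths, class
# counts and hyperparameters are illustrative placeholders.
import torch
from torch import nn, optim
from torchvision import datasets, models, transforms

NUM_CLIENT_CLASSES = 3  # e.g. cattle / sheep / water trough -- purely illustrative

model = models.resnet18(weights="DEFAULT")
for param in model.parameters():
    param.requires_grad = False                                  # freeze the backbone
model.fc = nn.Linear(model.fc.in_features, NUM_CLIENT_CLASSES)   # new trainable head

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
train_data = datasets.ImageFolder("client_images/train", transform=transform)
loader = torch.utils.data.DataLoader(train_data, batch_size=16, shuffle=True)

optimiser = optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

model.train()
for epoch in range(5):                  # a handful of epochs may be enough on small data
    for images, labels in loader:
        optimiser.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimiser.step()
```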
Flexibility of approach
Sometimes we can’t recognise the real target because there aren’t enough images of it or it is hidden. We may need to use a number of correlated targets in our analysis pipeline – we may need to be quite creative in how we extract insights from the images.
Flexibility of engagement
Every client is going to have a different way of storing or uploading or managing their images. Each client will therefore need a custom architecture and web portal to manage their images.
Repeatability
We may need to repeatedly analyse very similar images, for example to track fauna, people or vehicles. The pipeline needs to be automated so that we do not have to manually compare analyses, but can still find temporal insights.
Communicability
How to communicate the insights? Every client will likely have different deliverables, and so how the results are communicated needs customisation.
We might start with standard reports – augmented with visual and temporal elements – that is: annotated and live images, or even annotated maps of different types.
Insights, discoveries, detections might simply go into an online queryable database.
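As one possible shape for that, here is a minimal sketch of pushing detections into a simple queryable store. SQLite is used purely as a stand-in for whatever online database a client ends up with, and the schema is illustrative.

```python
# Sketch: store per-frame detections in a simple queryable table. SQLite and the
# schema below are placeholders for whatever online database a client needs.
import sqlite3

conn = sqlite3.connect("aerial_detections.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS detections (
        object_id   TEXT,
        label       TEXT,
        score       REAL,
        latitude    REAL,
        longitude   REAL,
        captured_at TEXT
    )
""")

def store_detection(object_id, label, score, lat, lon, captured_at):
    """Insert one detection row; the client can then query by label, time or place."""
    conn.execute(
        "INSERT INTO detections VALUES (?, ?, ?, ?, ?, ?)",
        (object_id, label, score, lat, lon, captured_at),
    )
    conn.commit()

# A client-side query might then be as simple as:
#   SELECT * FROM detections WHERE label = 'fox' ORDER BY captured_at;
```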
Concluding & “Why not automate the whole process?”
The challenges above tell us that, for quicker and more accurate recognition, the machine learning models need to be customised with the help of domain-specific knowledge. We might need to do specific training for certain landscapes and images (see next blog).
Clients won’t have the same image needs, image storage requirements, nor the same reporting formats. Clients will have specific business needs that need to be replicated within a custom analysis & synthesis system.
In addition, results, reports and conclusions are likely to go through an iterative analysis as we find insights and unexpected discoveries and integrate them into our process.
Thus to solve the clients’ problems we must work across many domains: images, image analysis, machine learning, mapping, landforms, and the actual business needs. It therefore seems likely that this aerial analysis process benefits more from a consultative and collaborative engagement than from interaction with an automated, low human touch process. The close customer engagement and custom requirements will affect the viability of this service.