David O. Selznick once said, “The success of a production depends on the attention paid to detail”. We are in a phase where AI is pervasive and has found applications across industries, from transportation to finance. Among the many fields within AI, perhaps the most powerful, captivating and quietly pervasive is computer vision.
Why computer vision?
Computer vision focuses on simulating certain complexities of the human visual system so that computers can recognise and analyse objects in pictures and videos in a manner similar to humans. In a supply chain, for example, computer vision can identify early indicators of rising demand and notify managers and other participants when they need to make additional product purchases.
Although there have been many such advancements, the majority of these proof-of-concept (PoC) experiments have not been applied in the real world. Why is that?
The PoC-to-Production gap is a situation when ML projects run into major obstacles and difficulties on their way to actual deployment. Some of the most fundamental challenges in computer vision include the need for enormous amounts of computation to carry out tasks like facial recognition or autonomous driving in real-time along with ways to extract and represent the vast amount of human experience within a computer system in a manner that makes retrieval simple.
Like any other product development, computer vision-based product development generally starts with a proof of concept. Often, the focus of the PoC is inclined towards evaluating the algorithm on a sample dataset. This approach is understandable, considering the core of any computer vision solution is the algorithm, and data scientists want to validate its feasibility upfront. But during the PoC, it is equally important to understand how these algorithms are going to work in the production environment on real-life data. Unlike many other AI solutions, computer vision solutions depend on several factors in production that need upfront attention, especially the infrastructure ecosystem—camera hardware, compute, network bandwidth, data pipelines, and more. If these are not accounted for during the early stages of the project, it certainly impacts the productisation journey.
Below are a few of the key factors that influence productisation efforts:
A typical computer vision solution takes its input from a security camera, custom camera hardware, or images captured by mobile phones, drones or other digital sources.
It is important that these input sources are reliable and meet the algorithm's requirements, especially around image quality, camera positioning, field of view, frames per second (fps) and more. In general, a greenfield project, where we install new cameras exclusively for the solution, is relatively easier than a brownfield project, where the solution must work on the existing ecosystem. In the latter case, the challenges lie in making sure the solution is compatible with the existing cameras, video sources, firewall limitations (if any), available network bandwidth and so on.
While developing computer vision solutions, it is advisable to build application layers that abstract the underlying camera hardware. This provides significant benefits during productisation, regardless of whether the project is greenfield or brownfield. Another important aspect of infrastructure is compute. Establishing the compute requirements and the strategy on where to run the algorithm—edge, data centre, cloud or far edge—should be part of the PoC evaluation. The general recommendation is that if the use case demands real-time processing of a huge volume of streaming data, it is advisable to run the algorithm closer to the source (edge computing); otherwise, leveraging the cloud is an option.
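The camera-abstraction idea above can be sketched as a thin interface that application code targets instead of any specific camera SDK. This is a minimal illustration; the class and method names (`FrameSource`, `read_frame`, `RtspCamera`) are hypothetical, and the frame payloads are placeholders rather than real video data.

```python
from abc import ABC, abstractmethod

class FrameSource(ABC):
    """Abstract camera interface: the application layer codes against
    this, not against any specific camera SDK or stream protocol."""

    @abstractmethod
    def read_frame(self):
        """Return the next frame dict, or None when the stream ends."""

class RtspCamera(FrameSource):
    """Hypothetical adapter for an existing (brownfield) IP camera."""
    def __init__(self, url):
        self.url = url

    def read_frame(self):
        # A real adapter would pull a frame over RTSP here.
        return {"source": self.url, "frame": b"..."}

class ImageFolder(FrameSource):
    """Adapter for batch images (e.g. drone or mobile captures)."""
    def __init__(self, paths):
        self._paths = list(paths)

    def read_frame(self):
        if not self._paths:
            return None
        return {"source": self._paths.pop(0), "frame": b"..."}

def run_pipeline(source: FrameSource):
    """Algorithm code sees only FrameSource, so swapping cameras
    during productisation requires no algorithm changes."""
    frames = []
    while (frame := source.read_frame()) is not None:
        frames.append(frame)
    return frames
```

With this shape, moving from a brownfield RTSP camera to a batch of drone images is a one-line change at the call site.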
Credit: Swaroop Shivaram
In addition, as computer vision solutions are compute intensive, it is important that the hardware investment and cost are validated against the business returns (ROI) upfront to avoid any surprises during the production transition.
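A back-of-the-envelope version of that ROI check can be captured in a few lines. This is a sketch with illustrative numbers, not real cost data; the function name and figures are assumptions for the example.

```python
def hardware_roi(hw_cost, yearly_opex, yearly_benefit, years=3):
    """Rough ROI check over a planning horizon:
    (total benefit - total cost) / total cost."""
    total_cost = hw_cost + yearly_opex * years
    total_benefit = yearly_benefit * years
    return (total_benefit - total_cost) / total_cost

# Illustrative: 50k of edge hardware, 10k/yr upkeep,
# 40k/yr of business benefit over a 3-year horizon.
roi = hardware_roi(50_000, 10_000, 40_000, years=3)
```

A negative result at PoC time is exactly the kind of surprise worth catching before the production transition.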
A majority of computer vision algorithms today use deep learning neural networks, which require a significant amount of training data to build a robust model. The ability to collect diverse datasets from various sources, store this huge volume of data, selectively filter the required dataset, and curate/label it in a timely manner for model training is crucial to productise and scale any computer vision model. Following are a few things to keep in mind for a robust pipeline:
- Process necessary data: A typical security camera with decent quality video will generate nearly 3GB of data per day, but not all this data is useful for training the model. The ability to filter out the relevant data and process it is key to data management. Filtering data based on activity in the video and eliminating near-duplicate frames using traditional image processing techniques are two viable options to consider.
- Checkpoints: A computer vision data pipeline typically includes multiple stages, from data ingestion from multiple sources to pre-processing, selective filtering and curation, followed by model training. Each of these stages is sequential and critical. The fundamental notion behind checkpoints is to avoid repeating the entire cycle in the event that one step fails. A pipeline’s distinct steps must be segregated so that each can be re-triggered independently after a failure.
- Thorough documentation: This knowledge not only allows the pipeline to be maintained after you leave the team, it also enables new members to redesign things as necessary.
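The near-duplicate filtering mentioned in the first point can be sketched with simple frame differencing: keep a frame only when it differs enough from the last kept frame. For this sketch, frames are flat lists of grayscale pixel values (0–255); a real system would operate on decoded image arrays, and the threshold is an assumption to be tuned per camera.

```python
def mean_abs_diff(frame_a, frame_b):
    """Mean absolute pixel difference between two equal-sized
    grayscale frames (flat lists of 0-255 ints in this sketch)."""
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)

def filter_active_frames(frames, threshold=5.0):
    """Keep a frame only if it differs enough from the last kept
    frame, discarding near-duplicates before storage or labelling."""
    kept = []
    for frame in frames:
        if not kept or mean_abs_diff(kept[-1], frame) > threshold:
            kept.append(frame)
    return kept
```

On a mostly static security feed, this kind of activity gate can shave most of that ~3GB/day down to the frames worth labelling.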
Investing in a robust data pipeline pays off; tools and techniques like active learning, model-assisted data labelling, dataset metadata management and labelling tools come in handy here. By investing in strong data analytics pipelines, companies can transform data into active intelligence that drives smarter decisions and improves the bottom line. This also ensures data quality and integrity, along with the data classification, metadata management and lineage needed for data governance.
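The checkpoint pattern described in the list above can be sketched as stages that persist their output and skip re-computation on a re-run. This is a minimal illustration using JSON files on disk; the stage names, the `run_stage` helper and the toy transforms are all hypothetical, and a production pipeline would use a workflow orchestrator rather than ad-hoc files.

```python
import json
import os

def run_stage(name, fn, data, checkpoint_dir="checkpoints"):
    """Run one pipeline stage, skipping it if its checkpoint already
    exists, so a downstream failure never forces a full re-run."""
    os.makedirs(checkpoint_dir, exist_ok=True)
    path = os.path.join(checkpoint_dir, f"{name}.json")
    if os.path.exists(path):
        with open(path) as fh:
            return json.load(fh)  # resume from the saved checkpoint
    result = fn(data)
    with open(path, "w") as fh:
        json.dump(result, fh)     # persist before the next stage runs
    return result

def pipeline(raw, checkpoint_dir="checkpoints"):
    """Two toy sequential stages standing in for ingest -> filter."""
    ingested = run_stage("ingest", lambda d: [x.lower() for x in d],
                         raw, checkpoint_dir)
    return run_stage("filter", lambda d: [x for x in d if x],
                     ingested, checkpoint_dir)
```

Because each stage is independently re-triggerable, a crash during model training only re-runs training, not the ingestion and curation that preceded it.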
Many different activities might be involved in model deployment, but it always depends on how the business plans to use the model. Once the solution is deployed in production, we need a feedback loop to ensure that the model is performing as desired on the real-world dataset. Building tools to proactively understand model performance in production, with real-time metrics indicating accuracy deviations, helps teams take corrective action. Another option is to leverage end users for real-time feedback through a well-defined interactive tool—this can be part of the human-in-the-loop feedback. These approaches help support, scale up and maintain the solution effectively.
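That feedback loop can be sketched as a rolling accuracy monitor over recent human-in-the-loop labels, raising a flag when production accuracy drifts below the PoC baseline. The class name, thresholds and window size are illustrative assumptions, not a real monitoring API.

```python
from collections import deque

class AccuracyMonitor:
    """Rolling accuracy over recent user-confirmed labels; flags
    drift when it falls below the baseline minus a tolerance."""

    def __init__(self, baseline, tolerance=0.05, window=100):
        self.baseline = baseline
        self.tolerance = tolerance
        self.results = deque(maxlen=window)  # recent correct/incorrect

    def record(self, prediction, user_label):
        """Log one piece of human-in-the-loop feedback."""
        self.results.append(prediction == user_label)

    def accuracy(self):
        if not self.results:
            return None
        return sum(self.results) / len(self.results)

    def drifted(self):
        """True when rolling accuracy is below the acceptable band."""
        acc = self.accuracy()
        return acc is not None and acc < self.baseline - self.tolerance
```

A `drifted()` flag is the cue to inspect recent data and trigger corrective action, such as retraining on newly curated samples.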
But are we solving the right problem?
As computer vision solutions mainly deal with videos and images, it’s very easy to visualise and understand the solution. This often creates a lot of excitement and potential around this technology. It’s important that this excitement is translated into a well-defined business problem; else, we will end up building a cool tech solution without creating any business impact. With the power of computer vision technology, we often get biased to use this technology everywhere (to solve every problem), and this results in force fitting the technology when the same problem can be solved effectively without computer vision. Not every problem needs computer vision as a solution. The key here is to ensure we are solving the right problem using the right technology.
With video and images there is a lot of sensitivity around the data, as it includes personally identifiable information (PII). Being responsible and ethical while building computer vision-based solutions is extremely important so that we don’t compromise an individual’s privacy. Having strong governance policies to review the solution and ensure privacy at each stage of the project is one way to tackle this.
In addition, creating awareness among the developer community on ethical and responsible AI is a must-have. Build for what is required rather than what the technology can offer: if the requirement is to count people in a video, we really don’t need face recognition; there are multiple ways to count people without capturing or compromising personally identifiable information. Creating this awareness and ensuring everyone building the solution acts responsibly is key to success.
Simply put, unless the data problem is solved, computer vision and image recognition technology will stay out of reach for the majority of businesses. Since the model’s outcomes can change if the incoming data changes, it is crucial to regularly monitor the model’s performance, and in most cases model outputs must be followed by some sort of action to be useful. It is possible to create effective, high-performing computer vision models by streamlining your data and model pipelines and avoiding common mistakes. You must create a continuous learning loop to constantly retrain and test your model in order to combat data drift and the problem of stale models. By establishing repeatable, automated operations, models can be designed to scale.
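One crude way to operationalise the data-drift monitoring mentioned above is to compare a simple statistic of incoming data (say, mean image brightness) against the training-time reference distribution. This is a sketch under strong assumptions—a single scalar feature and a z-score threshold—whereas production drift detection would typically compare full distributions.

```python
import statistics

def mean_shift_drift(reference, live, z_threshold=3.0):
    """Flag drift when the live batch mean of a scalar feature
    (e.g. image brightness) sits more than z_threshold reference
    standard deviations away from the reference mean."""
    mu = statistics.mean(reference)
    sigma = statistics.stdev(reference)
    z = abs(statistics.mean(live) - mu) / sigma
    return z > z_threshold
```

A positive result is the trigger for the continuous learning loop: pull fresh samples into curation, retrain, and redeploy.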
This article is written by a member of the AIM Leaders Council. AIM Leaders Council is an invitation-only forum of senior executives in the Data Science and Analytics industry. To check if you are eligible for a membership, please fill out the form here.