The idea of video cameras everywhere is often used to conjure up thoughts of police states or 1984. Today, however, each of us lives with at least two cameras at the ready: one on our phone and likely another, such as a surveillance camera inside the home or overlooking the doorstep. Cameras are everywhere, and the tech giants have been investing huge sums into making this technology cheap, accessible, and ubiquitous.
In March, Amazon announced it was acquiring Ring, the video doorbell company. Several years back, Google acquired Nest, which then acquired Dropcam and brought it into the fold. The two deals represent billions of dollars invested in developing both the hardware and the infrastructure to support large-scale video recording and analysis. Over the same period, dozens of alternative products have come to market, such as the WeMo NetCam, Netgear Arlo, and Canary. The video cameras on other devices, such as the Echo Show and JIBO, also have the flexibility to double as cameras for the home.
With so much video data being streamed, it raises the question: what's possible when you combine multiple streams with some of the latest AI technologies? What can consumers expect of these devices over the next 2-3 years, and what considerations, especially around privacy, will we need to resolve?
Large advances in hardware, coupled with new ways of processing video, have driven costs down exponentially over the years while the capabilities of these devices have grown just as fast. With the bandwidth, latency, and congestion issues of wireless networking being addressed, 4K, 60 FPS video can be streamed without concern about grainy images or buffering.
Behind the scenes, computer vision technology has become commoditized, with more service providers offering it and more functionality being exposed to developers. New technology around edge computing may deliver the benefits of computer vision AI with the security of local processing.
What can you do today?
In the home, the primary placements of cameras likely include:
- Baby cams to check on infants and toddlers
- Doorbell cameras that face out onto a front porch
- Outdoor cameras looking at backyards
- Indoor cameras looking at entryways
Most of the cameras on the market today come with the ability to stream video to a phone or desktop, back up the video online, take time-lapse images, and push your voice through the camera's speaker. Some also offer alerts through app notifications, email, or text message for event triggers such as motion or the identification of a person.
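The motion-trigger logic behind those alerts can be sketched with simple frame differencing. This is a minimal illustration, not any vendor's implementation: frames are modeled as flat lists of grayscale pixel values, and the threshold is arbitrary.

```python
def mean_abs_diff(frame_a, frame_b):
    """Average per-pixel brightness difference between two frames."""
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)

def motion_detected(prev_frame, curr_frame, threshold=10.0):
    """Flag motion when consecutive frames differ by more than the threshold."""
    return mean_abs_diff(prev_frame, curr_frame) > threshold

# A static scene, then the same scene with a bright region entering it.
still = [50] * 100
moved = [50] * 80 + [200] * 20

print(motion_detected(still, still))  # no change -> False
print(motion_detected(still, moved))  # large change -> True
```

Real cameras add background modeling and noise filtering on top of this, but the core trigger is the same idea: compare what the sensor sees now to what it saw a moment ago.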
With these features alone, you can already do a lot:
- Know if a package has been delivered
- See if anyone is home or has come / gone
- Check if the surrounding area is safe
- Get a sense of the environment remotely (e.g. is there light inside yet)
- Provide voice communication to someone in the area
- Check if an infant child is sleeping / safe
However, when you start to add more cameras combined with AI, you can extract a lot more information about the environment.
Google, Amazon, Microsoft, and IBM, among others, now offer computer vision APIs that even a novice developer can implement, with impressive functionality. These include the ability to:
- Identify an object
- Identify a person
- Understand logos
- Extract text
- Determine “inappropriate content”
- Transcribe the video
- Identify handwriting
- Identify smiles
- Identify emotion
- Estimate peoples’ ages
- Identify gestures
- Identify foods
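To illustrate how little code these APIs require, here's a sketch of working with a label-detection result. The sample JSON below is invented for the example, shaped like Google Cloud Vision's annotate response; a real call would POST the image to the service with credentials.

```python
import json

# Invented sample mimicking the shape of a label-detection response.
sample_response = json.loads("""
{
  "responses": [{
    "labelAnnotations": [
      {"description": "Dog", "score": 0.97},
      {"description": "Sofa", "score": 0.81},
      {"description": "Indoor", "score": 0.64}
    ]
  }]
}
""")

def confident_labels(response, min_score=0.75):
    """Keep only the labels the service is reasonably sure about."""
    annotations = response["responses"][0]["labelAnnotations"]
    return [a["description"] for a in annotations if a["score"] >= min_score]

print(confident_labels(sample_response))  # ['Dog', 'Sofa']
```

The hard part, the model itself, lives on the provider's side; the developer's job reduces to sending an image and filtering the returned labels by confidence.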
There is a lot of overlap among the service providers, and while these services are still too expensive for continuous use today (they cost pennies per minute), the price will likely drop to pennies per hour or per day over the next few years. Even with only this capability, it's already possible to start extending the applications currently available on today's webcams, such as:
- Tracking a user from room to room
- Logging when someone arrives home or leaves
- Tracking the emotion of different people in frame throughout the day
- Keeping a record of what we’re talking about
- Tracking visitors to the home
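The arrival/departure logging above can be sketched by diffing who is visible from one observation to the next. The names and timestamps are made up, and the per-frame face recognition itself is assumed to come from one of the APIs above.

```python
def presence_log(frames):
    """frames: list of (timestamp, set of names recognized in that frame).
    Returns a chronological list of (timestamp, name, event) tuples."""
    events, present = [], set()
    for ts, seen in frames:
        for name in sorted(seen - present):   # newly visible -> arrival
            events.append((ts, name, "arrived"))
        for name in sorted(present - seen):   # no longer visible -> departure
            events.append((ts, name, "left"))
        present = set(seen)
    return events

frames = [
    (0, {"Alice"}),
    (1, {"Alice", "Bob"}),
    (2, {"Bob"}),
]
print(presence_log(frames))
# [(0, 'Alice', 'arrived'), (1, 'Bob', 'arrived'), (2, 'Alice', 'left')]
```

A production version would debounce momentary occlusions (someone walking behind a couch shouldn't count as leaving), but the bookkeeping is this simple.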
Today, this is achievable without needing to develop new technologies. What’s coming next will reshape how we adopt these devices.
Living on the Edge
Edge computing conjures up thoughts of sensor data collection with rules run against that data. So where is the consumer application for video?
The next big revolution in video will come from on-device chips able to perform much of the same functionality as connected, always-streaming devices, but without the video being stored in the cloud where it is vulnerable to hacking or other misuse.
Google Clips is a good example of edge processing with vision. The camera uses its onboard hardware to analyze the image and pick the best clips, which the user can then retrieve. This allows the device to be always on without always compromising privacy.
More capabilities are being pushed to the edge including face detection and facial recognition, emotion detection, and object analysis. With this, some camera systems might only stream an abstraction of what they see versus the entire video stream. It’s possible that future versions of these cameras with edge computing can have all of the functionality of today’s cloud cameras plus ultra low latency because they’re working on the local network.
The abstraction capabilities might be so good that there would essentially be an air gap between what the camera sees and what information is communicated. Retrieving photos might then involve physically connecting the camera to a computer. However, this could mean ultimate privacy.
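The abstraction idea can be made concrete with a sketch: the edge device runs detection locally and publishes only a compact summary of each frame, never the pixels. The event fields here are illustrative, not any vendor's actual schema.

```python
def summarize_frame(timestamp, detections):
    """detections: list of (label, confidence) pairs from an on-device model.
    Returns a compact event dict; the raw frame never leaves the device."""
    return {
        "ts": timestamp,
        "people": sum(1 for label, _ in detections if label == "person"),
        "labels": sorted({label for label, conf in detections if conf >= 0.5}),
    }

event = summarize_frame(1234, [("person", 0.91), ("person", 0.88), ("dog", 0.60)])
print(event)  # {'ts': 1234, 'people': 2, 'labels': ['dog', 'person']}
```

A few dozen bytes of metadata per frame is enough to answer "is anyone home?" or "did the dog get on the couch?" without a single pixel crossing the network.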
The Best Version of You aka Let Me Take a #Selfie
Another application we're likely to see emerge from cameras in the home over the next 3-5 years is more health and beauty tracking. We've already seen the strange Amazon Echo Look, which came out last year, with its ability to track your outfits and recommend new styles. It also incorporates the Alexa assistant, so you can speak with it.
We’re seeing other cameras incorporate 3D vision to be able to do in-house body scans to check for body composition and measure changes over time. Eventually, this capability might be built into regular cameras or through post processing to determine whether you’re looking sickly, gaining or losing weight, or stressed. This information can feed into your AI assistant to help make you happier and healthier.
The Virtual Concierge or Caretaker
There's a growing market for aging in the home while, at the same time, people are becoming more comfortable with the idea of having video feeds inside their homes. Strategically placed cameras can avoid compromising footage while still giving a sense of activity (e.g. pointed at the front door, toward the fridge in the kitchen, or in a laundry room).
Today, there's already a market for live remote camera monitoring of commercial buildings. Companies like UCIT have real people sit behind monitors, acting as both surveillance and concierge for a property. The same approach, in a limited capacity, can be applied to homes to provide monitoring that is less intrusive but still reliable.
A hybrid approach could be useful: a live human checks in on activity, while computer vision raises an alert when there hasn't been activity or when there have been visitors. If the human sees an issue, they can escalate to a family member or the police.
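The automated half of that hybrid could be as simple as an inactivity check: escalate to a human when no motion event has been seen within an expected window. The eight-hour window below is an arbitrary illustration.

```python
def should_escalate(last_activity_ts, now_ts, max_quiet_seconds=8 * 3600):
    """True when the home has been quiet longer than expected (timestamps
    in seconds). A monitoring service would then route this to a person."""
    return (now_ts - last_activity_ts) > max_quiet_seconds

print(should_escalate(0, 6 * 3600))   # quiet for 6 hours -> False
print(should_escalate(0, 10 * 3600))  # quiet for 10 hours -> True
```

The threshold would realistically vary by time of day (overnight quiet is normal; midday quiet may not be), which is exactly where the human in the loop adds judgment.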
The Next Five Years
The next generation of in-home cameras is going to combine advanced embedded AI features together with a highly reliable connection to online processing. We’ll see Alexa, Google Assistant, and Bixby, among others, embedded into the products and with that, the capability for them to understand what’s happening around us. Maybe we’ll become more comfortable with the idea of live streaming inside our home if the benefits are substantial.