Big Data & Smart Cities: A Convoluted Web

Susan Morrow

- Last Updated: December 2, 2024

Continuing on from my previous post about the "blood" of our smart cities in "Data and Identity in the Smart City," I want to look further into the way data is used in these smarter living spaces.

IoT devices act as the beating heart of a city, pumping the blood (data) around. But IoT devices also generate data, and the underlying structures of the data life cycle will be aggregated and deeply intertwined.

IoT and Cyber-Physical Systems: The Beating Heart of the Smart City

The Internet of Things (IoT) is a manufacturing success story of the modern era. These devices are used for both consumer and commercial purposes. This dual usage model means that IoT devices are part of the critical infrastructure of the smart city, almost like a smart superstructure. IoT devices act like the beating heart of the city, pumping the blood (data) around. But IoT devices also generate data. For example, the health apps market is expected to grow at an astonishing CAGR of over 44% to 2025 (Grand View Research). This is an important application of IoT technology, as it can play a crucial part in optimizing perpetually stretched healthcare services.

As our population grows in size, its healthcare needs grow too. Non-communicable diseases are as harmful to the smart city as communicable ones were to our first cities. Type 2 diabetes, for example, is predicted by Boyle et al. to grow by 165% to 2050 in North America. Healthcare apps, which communicate health data to healthcare workers, can help to alleviate the stress on emergency responders and hospital care that major cities are already experiencing.

Cyber-physical systems (CPS) are needed to create a ubiquitous, always-connected service infrastructure within the smart city. Where the cyber (an integrated network, e.g., the internet) and the physical (e.g., an electrical grid) meet, data feeds the collaboration of the two. There are a number of technical challenges in creating a homogeneous layer in the smart city, and the cyber-physical is one of them. However, technical challenges aside, this post, and the associated book to come, are about privacy. As we discover more ways to connect the cyber and the physical, we need to be cognizant of these systems' thirst for our data, which they analyze and use to optimize the service.

Much of the smart city will depend on a variety of cyber-physical systems, including smart transport, smart grids, smart medical services, and so on. The risk of a data breach is a serious consideration in the design and implementation of a CPS. A CPS is critical infrastructure in its own right, but overlaid with data it becomes a superstructure. Some of these systems may also interact and become co-dependent. You can see where this is going. Touch one CPS and you may well expose others. An example of how this already affects our lives, outside the smart city, is credential stuffing.
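To make the "touch one CPS and you may well expose others" point concrete, here is a minimal sketch in Python. The system names and the data flows between them are entirely hypothetical, invented for illustration; the idea is simply that if we map which systems consume each other's data, a breadth-first walk shows everything a single breach could reach.

```python
from collections import deque

# Hypothetical map (an assumption for illustration): each system lists
# the systems that consume its data.
DATA_FLOWS = {
    "smart_grid": ["smart_transport", "smart_medical"],
    "smart_transport": ["city_dashboard"],
    "smart_medical": ["emergency_dispatch"],
    "emergency_dispatch": ["smart_transport"],
    "city_dashboard": [],
}


def exposed_systems(breached, flows):
    """Return every system reachable from the breached one via data flows."""
    seen = {breached}
    queue = deque([breached])
    while queue:
        current = queue.popleft()
        for downstream in flows.get(current, []):
            if downstream not in seen:
                seen.add(downstream)
                queue.append(downstream)
    seen.discard(breached)  # report only the knock-on exposure
    return sorted(seen)


if __name__ == "__main__":
    for system in DATA_FLOWS:
        print(system, "->", exposed_systems(system, DATA_FLOWS))
```

In this toy map, a breach of the grid reaches every other system, while a breach of the dashboard reaches none. A real city would need an honest inventory of its data flows before a walk like this means anything.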

A recent Federal Trade Commission (FTC) investigation into data exposure affecting an online tax preparation service, TaxSlayer, set a new security and compliance precedent. This concerned "credential stuffing," a practice that relies on indirect credential exposure. For example, company Y has user credentials compromised; the knock-on effect is that company Z suffers a data exposure when attackers reuse those compromised credentials. The precedent set by the FTC was in the ruling that company Z was as liable as company Y. This has created a rule of interoperability that will need to be understood and accounted for in co-dependent CPS within a smart city context, a kind of “six degrees of separation” law. In a highly connected smart city, this will have ramifications.
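As a rough illustration of the company Y and company Z scenario, the Python sketch below shows one way company Z might screen a login against credentials known to have leaked elsewhere. Everything here is an assumption made for the example: the function names, the sample accounts, and the idea that Z holds a set of leaked pairs at all. It is not a description of how TaxSlayer or the FTC approached the problem.

```python
import hashlib


def fingerprint(username, password):
    """Hash a username/password pair so the raw pair is never stored.

    Plain SHA-256 is used only as a lookup key in this sketch; it is not
    a suitable way to store passwords.
    """
    pair = f"{username.lower()}:{password}"
    return hashlib.sha256(pair.encode("utf-8")).hexdigest()


# Hypothetical corpus: pairs exposed in a breach at company Y.
LEAKED_FROM_Y = {
    fingerprint("alice@example.com", "hunter2"),
    fingerprint("bob@example.com", "letmein"),
}


def is_stuffing_risk(username, password, leaked):
    """Return True if this exact pair is already known to be compromised."""
    return fingerprint(username, password) in leaked


if __name__ == "__main__":
    # Company Z checks a login attempt before accepting it.
    print(is_stuffing_risk("alice@example.com", "hunter2", LEAKED_FROM_Y))
    print(is_stuffing_risk("alice@example.com", "a-unique-passphrase", LEAKED_FROM_Y))
```

A match would not prove an attack, only that the pair is unsafe to accept without a further step such as a forced reset or a second factor.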

Again, respect for the privacy of an individual's data has to become second nature in smart city design. Compliance and law, at least, will dictate this.

A Case Study for a Smart City and Its Data: Big Medical Data

“A Process-Based Approach to Informational Privacy and the Case of Big Medical Data” by Michael Birnhack, professor of law at Tel Aviv University, offers an interesting perspective. Birnhack’s views fit in with the idea of big data being its own interconnected superstructure. In a comparison of traditional medical research to big medical data research, Birnhack says:

“Data, not the body. Big medical data research is conducted ex-post, after treatment and after the collection of data. Whereas in traditional data, the patient becomes a human subject and then a data subject, consenting to the anonymous use, in a big data context, the patient becomes a data subject, skipping the status of human subject.”

Birnhack urges us to take a process-based approach to system design and data privacy. He also points out the failings of de-identification of big medical data, as the practicalities of research result in pseudonymity rather than anonymity.
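The gap between pseudonymity and anonymity can be shown with a small Python sketch. The records, the public register, the key, and the function names are all made up for illustration and are not taken from Birnhack's paper. The patient ID is replaced with a keyed hash, yet the postcode and birth year left in place for research purposes are enough to link some rows back to named people.

```python
import hashlib
import hmac

SECRET_KEY = b"hypothetical-research-key"  # assumption: held by the data custodian


def pseudonymize(patient_id):
    """Replace a patient ID with a stable keyed hash (a pseudonym)."""
    digest = hmac.new(SECRET_KEY, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:12]


# Hypothetical clinical records.
records = [
    {"patient_id": "P-001", "postcode": "M5V", "birth_year": 1961, "diagnosis": "type 2 diabetes"},
    {"patient_id": "P-002", "postcode": "M5V", "birth_year": 1984, "diagnosis": "asthma"},
    {"patient_id": "P-003", "postcode": "K1A", "birth_year": 1984, "diagnosis": "hypertension"},
]

# The "de-identified" research set keeps the quasi-identifiers.
research_set = [
    {
        "pseudonym": pseudonymize(r["patient_id"]),
        "postcode": r["postcode"],
        "birth_year": r["birth_year"],
        "diagnosis": r["diagnosis"],
    }
    for r in records
]

# Hypothetical public register holding the same quasi-identifiers.
public_register = [
    {"name": "Resident A", "postcode": "M5V", "birth_year": 1961},
    {"name": "Resident B", "postcode": "K1A", "birth_year": 1984},
    {"name": "Resident C", "postcode": "M4C", "birth_year": 1990},
]


def reidentify(research_rows, public_rows):
    """Link names to diagnoses where postcode and birth year match uniquely."""
    matches = {}
    for person in public_rows:
        candidates = [
            row for row in research_rows
            if row["postcode"] == person["postcode"]
            and row["birth_year"] == person["birth_year"]
        ]
        if len(candidates) == 1:
            matches[person["name"]] = candidates[0]["diagnosis"]
    return matches


if __name__ == "__main__":
    print(reidentify(research_set, public_register))
```

Two of the three residents are linked to a diagnosis without the key ever being touched, which is the practical sense in which such data is pseudonymous rather than anonymous.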

Recently, Ann Cavoukian, former Information and Privacy Commissioner of Ontario, Canada, presented her views on smart city data. Cavoukian is currently an advisor for Waterfront Toronto, an initiative looking at sustainable development using data and smart technologies. Cavoukian stated that the project should mandate the use of “strong de-identification protocols” that are highly effective at preventing the kind of re-identification discussed in Birnhack’s paper. However, because smart city data is collected, aggregated, and shared across a complex, superstructure-like ecosystem, obtaining consent at every point is very complicated. As Cavoukian points out, in a city full of sensors, collecting consent can be very difficult.

What is most interesting about big medical data, and this applies equally to other personal big data, is the dehumanization of this data. Our smart cities will draw on this data like never before. The underlying structures of the data life cycle will be aggregated and deeply intertwined.

We need to bear this in mind for all our smart city data governance. Governance should not be siloed because our data is not siloed.