As development teams race to build AI tools, training algorithms on edge devices is becoming increasingly common. Federated learning, a form of distributed machine learning, is a relatively new approach that lets companies improve their AI tools without directly accessing raw user data.
Introduced publicly by Google in 2017, federated learning is a decentralized approach in which models are trained on edge devices. In Google’s “on-device machine learning” deployment, the search giant pushed its predictive text model to Android devices, where it was trained locally on each user’s data. Each device then sent a summary of what it had learned back to a central server, which aggregated the summaries. To protect user data, these summaries were secured with either homomorphic encryption or differential privacy, the practice of adding noise to the data in order to obscure any individual’s contribution.
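The noise-adding idea can be illustrated with a minimal sketch. It is not Google's implementation: the function names, the clipping threshold `max_norm`, and the noise level `noise_std` are all assumptions chosen for illustration. Each device limits the size of its update and then adds Gaussian noise before sending it.

```python
import math
import random


def clip_update(update, max_norm):
    """Scale the update down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(x * x for x in update))
    if norm <= max_norm or norm == 0.0:
        return list(update)
    scale = max_norm / norm
    return [x * scale for x in update]


def add_gaussian_noise(update, noise_std, rng):
    """Add independent zero-mean Gaussian noise to every coordinate."""
    return [x + rng.gauss(0.0, noise_std) for x in update]


def privatize(update, max_norm=1.0, noise_std=0.1, rng=None):
    """Clip, then add noise (hypothetical defaults, not tuned values)."""
    rng = rng or random.Random()
    return add_gaussian_noise(clip_update(update, max_norm), noise_std, rng)


if __name__ == "__main__":
    raw_update = [3.0, 4.0]  # made-up local model update
    print(privatize(raw_update, rng=random.Random(0)))
```

Because the noise has zero mean, it tends to cancel out when many devices' updates are averaged, while any single device's contribution stays obscured.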
Generally speaking, with federated learning, the AI algorithm is trained without the central server ever seeing any individual user’s specific data; in fact, the raw data never leaves the device itself. Only aggregated model updates are sent back. These model updates are then decrypted upon delivery to the central server. Test versions of the updated model are then sent back to select devices, and after this process is repeated thousands of times, the AI algorithm is significantly improved, all without jeopardizing user privacy.
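The train-locally, aggregate-centrally loop can be sketched with a toy model. This is a simplified illustration in the spirit of federated averaging, not a production system: the one-parameter model `y = w * x`, the client datasets, the learning rate, and the function names are all assumptions, and encryption is left out.

```python
def local_update(weight, data, lr=0.1, epochs=5):
    """Train y = w * x on one device's data; return only the weight change and sample count."""
    w = weight
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w - weight, len(data)


def federated_round(global_weight, client_datasets):
    """One round: every client trains locally, the server averages the weight changes."""
    updates = [local_update(global_weight, data) for data in client_datasets]
    total_samples = sum(n for _, n in updates)
    # Weight each client's change by how many samples it trained on.
    avg_delta = sum(delta * n for delta, n in updates) / total_samples
    return global_weight + avg_delta


# Made-up data held on three devices; every point follows y = 2 * x.
clients = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(0.5, 1.0), (1.5, 3.0), (3.0, 6.0)],
    [(1.0, 2.0)],
]

w = 0.0
for _ in range(50):
    w = federated_round(w, clients)

if __name__ == "__main__":
    print(f"learned weight: {w:.4f}")
```

The server function only ever receives `(delta, sample_count)` pairs; the `(x, y)` pairs stay inside `local_update`, which stands in for the device.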
This technology is expected to make waves in the healthcare sector. For example, federated learning is currently being explored by medical start-up Owkin. Seeking to leverage patient data from several healthcare organizations, Owkin uses federated learning to build AI algorithms with data from various hospitals. This can have far-reaching effects: hospitals can learn from one another’s disease progression data while preserving the integrity of patient data and adhering to HIPAA regulations. Healthcare is by no means the only sector employing this technology; federated learning will be increasingly used by autonomous car companies, smart cities, drones, and fintech organizations. Several other federated learning start-ups are coming to market, including Snips, S20.ai, and Xnor.ai, which was recently acquired by Apple.
Because these AI algorithms are worth a great deal of money, the models are expected to be a lucrative target for hackers. Nefarious actors will attempt to perform man-in-the-middle attacks. However, as mentioned earlier, by adding noise, aggregating data from various devices, and then encrypting the aggregate, companies can make such attacks difficult.
Perhaps more concerning are attacks that poison the model itself. A hacker could conceivably compromise the model through his or her own device, or by taking over another user’s device on the network. Ironically, because federated learning aggregates the data from different devices and sends the encrypted summaries back to the central server, hackers who enter via a backdoor are given a degree of cover. As a result, it is difficult, if not impossible, to trace an anomaly back to the device that produced it.
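A toy example shows why aggregation gives an attacker cover. All values and names here are made up for illustration, and each update is reduced to a single number for simplicity. The server keeps only the average, so one extreme update shifts the result, yet the same average is produced no matter which device sent it.

```python
def aggregate(updates):
    """Average scalar model updates; the server keeps only this one number."""
    return sum(updates) / len(updates)


# Hypothetical honest updates from five devices.
honest = [0.10, 0.12, 0.09, 0.11, 0.08]

# Hypothetical attack: the last device submits a large opposing value.
poisoned_by_last = honest[:-1] + [-5.0]

# The same values, but with the malicious one arriving from the first device.
poisoned_by_first = list(reversed(poisoned_by_last))

if __name__ == "__main__":
    print("honest aggregate:  ", aggregate(honest))
    print("poisoned aggregate:", aggregate(poisoned_by_last))
    print("other culprit:     ", aggregate(poisoned_by_first))
```

The two poisoned runs yield the same aggregate up to floating-point rounding, which is all the server sees, so the aggregate alone cannot identify the compromised device.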
Bandwidth and Processing Limitations
Although on-device machine learning effectively trains algorithms without exposing raw user data, it does require substantial local power and memory. Companies attempt to work around this by training their AI algorithms on the edge only when devices are idle, charging, and connected to Wi-Fi; however, resource constraints remain an ongoing challenge.
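The scheduling rule described above might look like the following sketch. `DeviceState`, its fields, and `eligible_for_training` are hypothetical names, not a real mobile API; the `require_all` flag is an assumption that lets the policy demand every condition or just one of them.

```python
from dataclasses import dataclass


@dataclass
class DeviceState:
    """Hypothetical snapshot of the conditions a scheduler might check."""
    is_idle: bool
    is_charging: bool
    on_wifi: bool


def eligible_for_training(state: DeviceState, require_all: bool = True) -> bool:
    """Decide whether on-device training may run right now."""
    conditions = (state.is_idle, state.is_charging, state.on_wifi)
    # Strict policy: every condition must hold. Relaxed policy: one is enough.
    return all(conditions) if require_all else any(conditions)


if __name__ == "__main__":
    overnight = DeviceState(is_idle=True, is_charging=True, on_wifi=True)
    commuting = DeviceState(is_idle=False, is_charging=False, on_wifi=False)
    print(eligible_for_training(overnight))
    print(eligible_for_training(commuting))
```

The strict policy protects battery life and mobile data allowances at the cost of fewer training opportunities per device.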
The Impact of 5G
As 5G expands across the globe, edge devices will be far less limited by bandwidth and latency constraints. According to a recent Nokia report, 4G base stations can support 100,000 devices per square kilometer, whereas the forthcoming 5G stations will support up to 1 million devices in the same area. With enhanced mobile broadband and low latency, 5G will improve energy efficiency while facilitating device-to-device (D2D) communication. In fact, it is predicted that 5G will usher in a 10-100x increase in bandwidth and a 5-10x decrease in latency.
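To put the predicted 10-100x bandwidth increase in concrete terms, here is a back-of-the-envelope calculation. The 20 MB update size and the 10 Mbps baseline uplink are assumed figures chosen only for illustration; they do not come from the Nokia report or any other measurement.

```python
def upload_seconds(update_megabytes, uplink_mbps):
    """Time to upload one model update: megabytes * 8 bits per byte / megabits per second."""
    return update_megabytes * 8 / uplink_mbps


UPDATE_MB = 20.0      # assumed size of one model update
BASELINE_MBPS = 10.0  # assumed pre-5G uplink speed

if __name__ == "__main__":
    for factor in (1, 10, 100):
        seconds = upload_seconds(UPDATE_MB, BASELINE_MBPS * factor)
        print(f"{factor:>3}x bandwidth: {seconds:.2f} s per update")
```

Under these assumptions, an upload that takes 16 seconds at the baseline drops to 1.6 seconds at 10x and 0.16 seconds at 100x, which makes frequent training rounds far more practical.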
When 5G becomes more prevalent, we’ll experience faster networks, more endpoints, and a larger attack surface, which may attract an influx of DDoS attacks. Also, 5G comes with a slicing feature, which allows slices (virtual networks) to be easily created, modified, and deleted based on the needs of users. According to a research manuscript on the disruptive force of 5G, it remains to be seen whether this network slicing component will allay security concerns or bring a host of new problems.
To summarize, there are new concerns from both a privacy and a security perspective; however, the fact remains: 5G is ultimately a boon for federated learning.