How Federated Learning Overcomes AI Privacy Concerns

Updated on July 19, 2020
Ron is a retired engineer and manager for IBM and other high tech companies. He writes extensively and in depth about modern technology.

The use of artificial intelligence (AI) and machine learning (ML) to extract value from mountains of data is accelerating. In areas such as marketing, health, autonomous vehicles, banking, and the internet of things (IoT), the ability of AI/ML* to discern subtle patterns and correlations in large datasets is providing insights and capabilities that were previously unavailable.

To perform its magic, a machine learning model or algorithm must be “trained” to discern patterns of interest in the data it ingests. The accuracy of the model depends heavily on the amount (and quality) of the data used to train it. That’s why for most real-world use cases, producing an effective and useful AI/ML model requires huge amounts of training data. And that presents a problem with respect to privacy.

* For our purposes in this article, we'll use the terms AI, ML, and AI/ML interchangeably.

Privacy is a major issue for AI today

Here’s an example of the problem.

Developing AI/ML algorithms that can reliably assist physicians in diagnosing medical conditions requires that the models be trained using immense quantities of data from real patients. The amount and variety of data required is far beyond what a single hospital could provide. Traditionally, that has meant that the data from many institutions had to be pooled in a centralized repository to aggregate the huge amount required for training the ML model.

But with today’s emphasis on privacy, sharing patients’ personal information has become extremely problematic. The European Union’s General Data Protection Regulation (GDPR), for example, strictly limits the exchange of an individual’s personal information (PI) between organizations, typically requiring that person’s explicit consent or another specific legal basis. It also gives individuals control over how their information can be used. Because it is impractical to obtain consent from every person whose data forms part of a training dataset, the development of effective AI/ML diagnostic assistants is severely limited.

Now, however, a new approach called federated learning, which Google introduced publicly in 2017, allows AI models to be trained without requiring that private information be shared and consolidated.

What is federated learning?

Federated learning was developed as a means of eliminating the requirement for a central store of raw data for AI model training. Instead, model training is carried out at each data source. (Examples of data sources, often referred to as endpoint devices or clients, include consumers’ smartphones, IoT devices, autonomous vehicles, and electronic health information systems.) Only model updates, never the raw data residing on the endpoint devices, are sent to a central location.

Here’s how it works.

The federated learning process

[Figure: How federated learning works]

First, a generic machine learning model is generated at a central server. This model, which is nothing more than a starting baseline, is distributed to all endpoint or client devices. In the case of smartphones or IoT devices, for example, these could number in the millions. It is in the clients that the raw data, including any potentially sensitive or protected personal information, resides.

Each client trains the model it receives from the central server, using its own local data as training inputs. The client then returns its locally updated model to the central server, which aggregates the updates from the participating clients (in practice, often only a sample of the available clients in each round) and uses them to generate a new baseline model. The new baseline is then distributed to the clients, and the cycle repeats until the model converges or reaches the desired accuracy.
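
To make that cycle concrete, here is a minimal simulation sketch. It is not Google’s implementation: it assumes a toy linear-regression model with two parameters, ten simulated clients holding synthetic data, and a server that averages the returned models weighted by how many examples each client trained on (the idea behind the Federated Averaging algorithm discussed later). All function names and parameter values are hypothetical.

```python
import random

# Toy setup (an assumption for illustration): each client privately holds
# (x, y) pairs drawn from y = 3x + 1 plus noise, and the shared model is a
# linear regression with two parameters (w, b).

def make_client(rng, n=20):
    """Generate one client's private dataset (synthetic, hypothetical)."""
    data = []
    for _ in range(n):
        x = rng.uniform(-1, 1)
        data.append((x, 3 * x + 1 + rng.gauss(0, 0.1)))
    return data

def local_train(model, data, lr=0.05, epochs=5):
    """Client side: refine the received model with a few epochs of gradient
    descent on local data. Only the resulting parameters are returned."""
    w, b = model
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return (w, b)

def aggregate(updates):
    """Server side: average the client models, weighting each by the number
    of examples it was trained on."""
    total = sum(n for _, n in updates)
    w = sum(model[0] * n for model, n in updates) / total
    b = sum(model[1] * n for model, n in updates) / total
    return (w, b)

def federated_training(clients, rounds=50, clients_per_round=3, seed=0):
    rng = random.Random(seed)
    model = (0.0, 0.0)  # generic starting baseline created on the server
    for _ in range(rounds):
        # Pick a sample of clients to receive the current baseline.
        selected = rng.sample(clients, clients_per_round)
        # Each selected client trains locally; its raw data never leaves it.
        updates = [(local_train(model, data), len(data)) for data in selected]
        # Combine the returned models into the next baseline.
        model = aggregate(updates)
    return model

if __name__ == "__main__":
    rng = random.Random(42)
    clients = [make_client(rng) for _ in range(10)]
    w, b = federated_training(clients)
    print(f"learned w={w:.2f}, b={b:.2f}")  # should land close to w=3, b=1
```

Note that the server only ever sees parameter pairs returned by `local_train`; the `(x, y)` pairs stay inside each client’s list.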

In its announcement of this new technology, Google provided a concrete, real-world example of its value. Although most users are unaware of it, whenever they type text into their smartphones, they are using AI. That’s because smartphones use an AI-based predictive text model that tries to predict the next word as the user types. As Karen Hao, artificial intelligence reporter for the MIT Technology Review, notes in a recent article, it is federated learning that “allowed Google to train its predictive text model on all the messages sent and received by Android users—without ever actually reading them or removing them from their phones.”

Impact of federated learning on machine learning

Federated learning is expected to fundamentally change how AI models are developed. A good example of that transformation is in the way medical AI models are trained. Before the advent of federated learning, the necessity of amassing huge quantities of data at a central location severely limited researchers’ ability to develop effective AI diagnostic models. As Karen Hao says,

“You can’t deploy a breast cancer detection model around the world when it’s only been trained on a few thousand patients from the same hospital. All this could change with federated learning.”

Today, most organizations have only a limited supply of internally generated data they can use to train their AI models, and they face huge obstacles, due to legal, regulatory, or business restrictions, in acquiring valid training data from other organizations to augment what is available internally. Federated learning should give a tremendous boost to the use of AI in areas such as medicine, IoT, and autonomous vehicles by allowing organizations to collaborate in building accurate AI models while keeping their sensitive personal or business data safely in-house.

Potential issues with federated learning

Training AI models is a compute- and memory-intensive process. Because federated learning requires that such training take place on endpoint devices such as smartphones, autonomous vehicles, or IoT devices, the compute load on those devices could be disruptive to their normal functions. One approach to mitigating these difficulties is to schedule AI model training processes for times when the device would normally be idle.
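
A client-side scheduling rule of this kind can be very simple. The sketch below assumes a hypothetical `DeviceState` record and combines the idle condition with two further checks (charging, and an unmetered network connection) as an illustration of how a device might avoid draining its battery or its user’s data plan. The field names are assumptions, not any real platform API.

```python
from dataclasses import dataclass

@dataclass
class DeviceState:
    """Hypothetical snapshot of conditions a client device might check."""
    is_idle: bool
    is_charging: bool
    on_unmetered_network: bool

def eligible_for_training(state: DeviceState) -> bool:
    """Allow local training only when it is unlikely to interfere with the
    device's normal use, battery life, or the user's data plan."""
    return state.is_idle and state.is_charging and state.on_unmetered_network

if __name__ == "__main__":
    overnight = DeviceState(is_idle=True, is_charging=True, on_unmetered_network=True)
    in_use = DeviceState(is_idle=False, is_charging=True, on_unmetered_network=True)
    print(eligible_for_training(overnight))  # True
    print(eligible_for_training(in_use))     # False
```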

In addition, having perhaps millions of devices sending and receiving model updates across a network could cause bandwidth problems. Google has addressed this issue with its Federated Averaging algorithm, which, according to Google, can train deep networks using 10-100x less communication than a naively federated version of stochastic gradient descent.
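
The intuition behind Federated Averaging is that each client does more local training before reporting back, so fewer rounds of communication are needed than when clients report after every single gradient step (federated SGD). The sketch below counts communication rounds on a toy linear-regression problem with synthetic data and ten equal-sized clients that all participate in every round. These are assumptions for illustration; the round counts it prints are specific to this toy setup and are not a reproduction of Google’s 10-100x figure.

```python
import random

# Toy comparison (assumptions: synthetic linear-regression data, 10 clients
# that all participate in every round, equal data per client).

def make_clients(num_clients=10, n=20, seed=1):
    rng = random.Random(seed)
    clients = []
    for _ in range(num_clients):
        data = []
        for _ in range(n):
            x = rng.uniform(-1, 1)
            data.append((x, 3 * x + 1 + rng.gauss(0, 0.1)))
        clients.append(data)
    return clients

def mse(model, data):
    w, b = model
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def local_steps(model, data, steps, lr=0.05):
    """Run `steps` full-batch gradient steps on one client's data."""
    w, b = model
    n = len(data)
    for _ in range(steps):
        gw = sum(2 * (w * x + b - y) * x for x, y in data) / n
        gb = sum(2 * (w * x + b - y) for x, y in data) / n
        w, b = w - lr * gw, b - lr * gb
    return (w, b)

def rounds_to_target(clients, steps_per_round, target=0.05, max_rounds=10_000):
    """Count communication rounds until the loss falls below `target`.
    steps_per_round=1 behaves like federated SGD; larger values follow the
    Federated Averaging idea of doing more local work per round."""
    # Pooled data is used here only to evaluate the simulation; a real
    # server would never have access to it.
    all_data = [point for data in clients for point in data]
    model = (0.0, 0.0)
    for r in range(1, max_rounds + 1):
        results = [local_steps(model, data, steps_per_round) for data in clients]
        # Equal-sized clients, so a plain mean equals the example-weighted mean.
        model = (sum(m[0] for m in results) / len(results),
                 sum(m[1] for m in results) / len(results))
        if mse(model, all_data) < target:
            return r
    return max_rounds

if __name__ == "__main__":
    clients = make_clients()
    print("rounds with 1 local step:  ", rounds_to_target(clients, 1))
    print("rounds with 10 local steps:", rounds_to_target(clients, 10))
```

The trade-off is that each round asks more computation of every client, which ties back to the device-load concern above.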

Another, perhaps more serious issue is the vulnerability of federated learning to what’s called “model poisoning.” Because a federated learning AI model is developed by ingesting model update data from large numbers of endpoint devices, malicious actors may have the opportunity to compromise the final model by fabricating or “poisoning” the model update information sent from some endpoint devices. This might allow them to create back doors into the model. Because model update data is extremely difficult for humans to interpret, and because keeping the source of model information anonymous is a design feature of many federated learning implementations, identifying the source, or even the existence, of tainted information provided to the baseline model could be extremely difficult.

Protecting against this possibility will probably involve development of some kind of “set a good AI model to catch a bad AI model” strategy.
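
Detection is not the only option. One existing family of defenses, swapped in here for illustration and not proposed by the author, is robust aggregation: the server replaces the plain average with a statistic that a few outliers cannot drag far. The sketch below uses a coordinate-wise median on hypothetical two-parameter update vectors, one of which is a crude, scaled-up poisoned update.

```python
import statistics

def mean_aggregate(updates):
    """Plain coordinate-wise mean of client update vectors."""
    return [sum(coords) / len(coords) for coords in zip(*updates)]

def median_aggregate(updates):
    """Coordinate-wise median: a simple robust alternative to averaging."""
    return [statistics.median(coords) for coords in zip(*updates)]

# Hypothetical updates from four honest clients and one malicious client.
honest = [[0.10, -0.20], [0.12, -0.18], [0.09, -0.21], [0.11, -0.19]]
poisoned = [50.0, 50.0]  # scaled up so it dominates a plain average
updates = honest + [poisoned]

if __name__ == "__main__":
    print(mean_aggregate(updates))    # dragged far from the honest values
    print(median_aggregate(updates))  # [0.11, -0.19]
```

A median only helps while malicious clients are a minority, and small, carefully crafted poisoned updates (the kind used to plant back doors) can look much like honest ones, which is why the detection problem described above remains hard.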

The future of federated learning

The ability to train AI/ML models without violating data privacy is a huge technological advancement. That’s why federated learning has the potential to be a game-changer in many AI application areas, including computer vision, natural language processing, health care, autonomous vehicles, IoT, and the large-scale prediction and recommendation applications used in e-commerce systems. It would be no exaggeration to say that, to a significant degree, federated learning is reshaping the future of AI.

© 2020 Ronald E Franklin
