Federated Learning for Domain-Specific Language Models

Led an applied research project to train BERT on customer data without data commingling using federated learning. Validated technical feasibility and ran adversarial privacy evaluations.

Federated Learning · NLP · BERT · Privacy · Python

Overview

At ServiceNow, most ML use cases were NLP: text classification, sentiment analysis, ticket routing, summarization. The standard pipeline was a language model encoder (BERT) plus a task-specific head trained on downstream data. The problem was that open-domain language models did not perform well on enterprise text. Customer data is full of internal jargon, product names, workflow terminology, and domain-specific patterns that models trained on Wikipedia and news articles have never seen. We needed domain-specific language models, and training those required customer data.

ServiceNow's customers each operate in completely siloed environments, in the cloud or on-premises. Their data never leaves their infrastructure. The existing process involved asking customers for explicit permission, then extracting and storing their data internally. Most customers were reluctant due to privacy concerns, and even when they agreed, the extraction and storage required significant compliance and governance work. The pipeline was slow, expensive, and undersupplied with data because too few customers opted in.

I led the applied research effort to show that federated learning could replace the centralized training procedure, removing the need to extract or commingle customer data.

Result

The federated model's performance on downstream intent classification was indistinguishable from the centrally trained model.

The Hypothesis

Federated learning offered a different approach. Instead of bringing the data to the model, you bring the model to the data. Each customer trains a copy of the model locally on their own data, then sends back only the updated model weights. A central server aggregates these updates and produces a new model. The raw data never leaves the customer's infrastructure.

The business hypothesis: if customers no longer need to hand over their data, more of them will participate, which means more diverse training data and better ML products. The technical hypothesis: a federatively trained BERT model would perform as well as one trained on the same data in the traditional centralized way.

What I Did

Phase 1: Federated Training and Benchmarking

We gathered datasets from five customers who had opted into ServiceNow's data-sharing program. Rather than pooling this data centrally, we simulated a federated setup. A server initialized a BERT model, made copies, and distributed one to each client. Each client trained on their local data and sent the updated model back to the server, which averaged the weights using FedAvg (McMahan et al.). This process ran for multiple rounds until the model converged.
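The aggregation step above is plain FedAvg: a size-weighted average of each client's updated parameters. A minimal sketch, with weights represented as plain dicts of float lists rather than real model tensors (`fedavg` and its argument shapes are illustrative, not the actual internal code):

```python
def fedavg(client_weights, client_sizes):
    """FedAvg aggregation (McMahan et al.): average the clients' updated
    parameters, weighting each client by its local dataset size.

    client_weights: list of dicts, parameter name -> list of floats
    client_sizes:   number of local training examples per client
    """
    total = sum(client_sizes)
    return {
        name: [
            sum(w[name][i] * (n / total)
                for w, n in zip(client_weights, client_sizes))
            for i in range(len(client_weights[0][name]))
        ]
        for name in client_weights[0]
    }
```

In the real setup this average replaces the server model's weights, which are then redistributed to the clients for the next round, repeating until convergence.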

To evaluate the result, we froze the federated language model and used it as a feature extractor for downstream intent classification tasks. We ran a benchmarking pipeline: embeddings from the frozen model fed into downstream classifiers, with hyperparameter search to find the best configuration for each task. We compared the aggregated scores against a BERT model trained on the same data in the conventional centralized way.
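The benchmarking loop has a simple shape: embed each task's texts with the frozen encoder, grid-search the downstream classifier's hyperparameters, and keep the best score per task. A hedged scaffold, where `embed` (the frozen model) and `train_eval` (train a classifier with given hyperparameters and return its validation score) are stand-ins for the real pipeline components:

```python
import itertools

def benchmark(embed, tasks, grid, train_eval):
    """Score a frozen encoder on downstream tasks.

    embed:      frozen-model text -> embedding function (placeholder)
    tasks:      task name -> (texts, labels)
    grid:       hyperparameter name -> list of candidate values
    train_eval: (X, y, params) -> validation score (placeholder)
    """
    results = {}
    for name, (texts, labels) in tasks.items():
        X = [embed(t) for t in texts]  # encoder stays frozen
        results[name] = max(
            train_eval(X, labels, dict(zip(grid, combo)))
            for combo in itertools.product(*grid.values())
        )
    return results
```

Running this once with the federated encoder and once with the centrally trained encoder, over the same tasks and grid, gives the paired scores we compared.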

We also checked for catastrophic forgetting to confirm the federated model retained its general-purpose capabilities. It did. No measurable degradation.

Phase 2: Privacy Evaluation

Federated learning prevents the transfer of raw customer data, but it does not automatically prevent the model from memorizing that data. A model trained on sensitive text can encode specific details (names, phone numbers, internal identifiers) into its parameters. The fact that the training data never left the customer's infrastructure does not mean the resulting model is safe to share.

We ran memorization probes on the federated model to measure what it had learned about individual training examples. The methodology was based on "The Secret Sharer" by Carlini et al., which quantifies unintended memorization in neural networks by measuring the exposure of canary sequences. We specifically tested whether the model could be probed to extract sensitive information like phone numbers from the training data. I documented the findings in an internal report.
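The core measurement in the Secret Sharer framework is exposure: rank the model's score for the true secret against random candidates of the same format, and a secret that ranks suspiciously high was memorized. A minimal sketch, where `log_prob` stands in for the trained model's sequence-scoring function:

```python
import math

def exposure(canary, candidates, log_prob):
    """Secret Sharer-style exposure of a canary sequence.

    Rank the canary's model log-probability among random same-format
    candidates; exposure = log2(pool size) - log2(rank). High exposure
    means the model assigns the real secret unusually high probability,
    i.e. it memorized it. `log_prob` is a placeholder for the model.
    """
    rank = 1 + sum(log_prob(c) > log_prob(canary) for c in candidates)
    return math.log2(len(candidates) + 1) - math.log2(rank)
```

In practice the candidate pool is large (e.g. all phone-number-shaped strings sampled from the format), and exposure near log2 of the pool size means the secret is effectively extractable.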

Federated learning is a data transfer solution, not a privacy solution. If you want actual privacy guarantees, you need additional techniques on top of it.

Results

  • Demonstrated that a federatively trained BERT model performs indistinguishably from its centrally trained counterpart on intent classification tasks.
  • Validated the technical feasibility of replacing ServiceNow's centralized training pipeline with federated learning.
  • Completed a privacy evaluation documenting the model's memorization characteristics under adversarial probing.
  • Established the business case: customers who previously refused to share data can now participate, since their data never leaves their infrastructure. The impact metric is the number of new customers willing to opt in under the federated setup.