Apple Is Using Differential Privacy to Improve Apple Intelligence

Apple has been using differential privacy for nearly ten years to collect its users’ data in a way that isn’t traceable back to an individual. As Apple explains in a recent post on its Machine Learning Research site:

This approach works by randomly polling participating devices for whether they’ve seen a particular fragment, and devices respond anonymously with a noisy signal. By noisy, we mean that devices may provide the true signal of whether a fragment was seen or a randomly selected signal for an alternative fragment or no matches at all. By calibrating how often devices send randomly selected responses, we ensure that hundreds of people using the same term are needed before the word can be discoverable.
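To make that mechanism concrete, here’s a minimal sketch of this kind of randomized response in Swift. Everything in it, from the function names to the noise rate, is an illustrative assumption rather than Apple’s actual implementation:

```swift
import Foundation

// One round of the polling scheme Apple describes, sketched as local
// differential privacy via randomized response. With probability 1 - p
// the device answers truthfully; with probability p it answers at
// random, either naming an alternative fragment or claiming no match.
enum FragmentResponse {
    case match(String)  // "I saw this fragment"
    case noMatch        // "I saw no matching fragment"
}

func respond(to fragment: String,
             seenFragments: Set<String>,
             alternatives: [String],
             noiseRate p: Double) -> FragmentResponse {
    if Double.random(in: 0..<1) < p {
        // Noise branch: a randomly selected signal, unrelated to the truth.
        if Bool.random(), let alt = alternatives.randomElement() {
            return .match(alt)
        }
        return .noMatch
    }
    // Truthful branch: report whether this device actually saw the fragment.
    return seenFragments.contains(fragment) ? .match(fragment) : .noMatch
}
```

Because every device injects noise at a known rate, the server can subtract the expected noise from its tallies, and a fragment only rises above that noise floor once many devices report it. That’s the calibration Apple is describing: set the noise rate high enough, and a term needs hundreds of users before it becomes discoverable.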

The company has used the technique to analyze everything from the popularity of emoji to which words QuickType should suggest.

Now, Apple is using differential privacy to improve Apple Intelligence by mining the data of users who have opted into sharing device analytics. So far, the technique has only been applied to improving Genmoji, but in upcoming OS releases it will be used for “Image Playground, Image Wand, Memories Creation and Writing Tools in Apple Intelligence, as well as in Visual Intelligence,” too.

Apple’s post explains:

Building on our many years of experience using techniques like differential privacy, as well as new techniques like synthetic data generation, we are able to improve Apple Intelligence features while protecting user privacy for users who opt in to the device analytics program. These techniques allow Apple to understand overall trends, without learning information about any individual, like what prompts they use or the content of their emails. As we continue to advance the state of the art in machine learning and AI to enhance our product experiences, we remain committed to developing and implementing cutting-edge techniques to protect user privacy.

For Genmoji, this means collecting data on the most popular prompts used to create the emoji-like images. Apple explains that written content is more challenging, but that it can use an LLM to generate synthetic data, such as emails. The synthetic data is then sent to the devices of users who have opted into device analytics, where it is compared against actual user data to determine which synthetic samples match most closely and most frequently, again using differential privacy to prevent individual devices from being identified.
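Apple hasn’t published the details of that matching step, but a rough sketch of its shape, with hypothetical names and a stand-in cosine similarity over text embeddings, might look like this:

```swift
import Foundation

// Hypothetical on-device step: given a batch of server-generated
// synthetic emails (represented as embedding vectors), find the one
// closest to the user's real messages, then report that choice through
// the same noisy channel so no single device is identifiable.
func cosineSimilarity(_ a: [Double], _ b: [Double]) -> Double {
    let dot = zip(a, b).map { $0.0 * $0.1 }.reduce(0, +)
    let normA = (a.map { $0 * $0 }.reduce(0, +)).squareRoot()
    let normB = (b.map { $0 * $0 }.reduce(0, +)).squareRoot()
    return dot / (normA * normB)
}

func closestSyntheticIndex(synthetic: [[Double]], userEmails: [[Double]]) -> Int {
    var best = (index: 0, score: -Double.infinity)
    for (i, candidate) in synthetic.enumerated() {
        // Score each synthetic sample by its best match among real emails.
        let score = userEmails.map { cosineSimilarity(candidate, $0) }.max() ?? -1
        if score > best.score { best = (index: i, score: score) }
    }
    return best.index
}

func noisyIndexReport(trueIndex: Int, candidateCount: Int, noiseRate p: Double) -> Int {
    // Randomized response over indices: usually the truth, sometimes noise.
    return Double.random(in: 0..<1) < p ? Int.random(in: 0..<candidateCount) : trueIndex
}
```

Aggregated across many opted-in devices, the noisy reports tell Apple which synthetic emails resemble real ones most often, which can steer the next round of synthetic generation without any individual message or device ever standing out from the noise.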

Using differential privacy to improve Apple Intelligence without directly scraping user data is clever, but it does make me wonder why something similar wasn’t used to train Apple’s large language models, which instead learned from the contents of the Internet. Perhaps that’s not possible at the scale of an LLM, or maybe an initial model needs a level of precision that differential privacy can’t offer, but I think it’s fair to ask.