According to rumors, Apple has long been working on AI features for 2024 and beyond, and new research could now go a long way toward making this a reality while maintaining Apple's demands for security and privacy.
To date, large language models (LLMs) like those on which ChatGPT is based have been powered by computers hosted in data centers and accessed via a web page or iPhone app. These are enormous pieces of software that require equally enormous resources to run properly, making it impractical to run them locally on phones like the upcoming iPhone 16. But running LLMs in data centers raises a privacy concern, and since Apple already strives to keep as many Siri queries as possible on-device, it is not surprising that it would want to do the same with any LLM implementation it is working on.
Now, a research paper may have the answer, opening the door to Apple's internal GPT debuting beyond Apple Park. And if Siri really needs a major upgrade, can the 2024 iPhones arrive soon enough?
The research paper, titled "LLM in a flash: Efficient Large Language Model Inference with Limited Memory," is written by several Apple engineers and explains how an LLM could run on devices with limited RAM (DRAM), such as iPhones. The approach could also help bring an upgraded Siri to other RAM-constrained devices, such as low-end MacBooks and iPads, not to mention the Apple Watch.
"Large language models (LLMs) are central to modern natural language processing, delivering exceptional performance in various tasks," the paper begins. "However, their intensive computational and memory requirements present challenges, especially for devices with limited DRAM capacity. This paper tackles the challenge of efficiently running LLMs that exceed the available DRAM capacity by storing the model parameters on flash memory but bringing them on demand to DRAM."
Flash storage, the storage capacity you choose when buying your iPhone, is far more abundant than DRAM, and part of it can be set aside to hold an LLM's data. The paper explores ways of serving a model from a device's flash storage instead of DRAM, focusing on two main techniques: "windowing" and "row-column bundling." Windowing reuses the parameters activated for recent tokens so they don't have to be reloaded from flash, while row-column bundling stores related rows and columns together so they can be read from flash in larger, contiguous chunks.
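To make the windowing idea concrete, here is a minimal Python sketch, assuming a model whose parameters are split into fixed-size chunks stored in a flat file standing in for flash. The FlashParameterCache class and all of its details are invented for illustration; this is not Apple's implementation.

```python
from collections import OrderedDict
import numpy as np

class FlashParameterCache:
    """Illustrative sliding-window cache: keeps only the most recently
    used parameter chunks in RAM and loads the rest from flash on demand.
    A conceptual sketch of "windowing", not Apple's actual code."""

    def __init__(self, flash_path, chunk_shape, max_chunks_in_ram):
        self.flash_path = flash_path          # file standing in for flash storage
        self.chunk_shape = chunk_shape        # shape of one parameter chunk
        self.max_chunks = max_chunks_in_ram   # RAM budget, measured in chunks
        self.ram = OrderedDict()              # chunk_id -> array, in LRU order

    def _load_from_flash(self, chunk_id):
        # Read one chunk of parameters from "flash" (a memory-mapped file here).
        count = int(np.prod(self.chunk_shape))
        mm = np.memmap(self.flash_path, dtype=np.float16, mode="r")
        start = chunk_id * count
        return np.array(mm[start:start + count]).reshape(self.chunk_shape)

    def get(self, chunk_id):
        # Cache hit: mark the chunk as most recently used and return it.
        if chunk_id in self.ram:
            self.ram.move_to_end(chunk_id)
            return self.ram[chunk_id]
        # Cache miss: evict the least recently used chunk if over budget...
        if len(self.ram) >= self.max_chunks:
            self.ram.popitem(last=False)
        # ...then pull the needed chunk from flash into the RAM window.
        chunk = self._load_from_flash(chunk_id)
        self.ram[chunk_id] = chunk
        return chunk
```

In the paper's setting, only the parameters predicted to be needed for the current tokens would be requested, so successive tokens mostly hit chunks already sitting in the window instead of being reread from flash.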
The paper states that "these methods collectively enable running models up to twice the size of the available DRAM, with a 4-5x and 20-25x increase in inference speed compared to naive loading approaches in CPU and GPU, respectively."
The benefits of such an approach are obvious. Running an LLM on the iPhone itself would not only remove the need for a remote data center, improving privacy, it would also be much faster. Eliminating the latency of poor data connections is one thing, but the speed increase goes beyond that and could allow Siri to respond more quickly and accurately than ever.
According to some rumors, Apple is already working on improved microphones for the iPhone 16 range, likely to ensure that Siri hears requests more clearly. Add the potential of an LLM breakthrough, and the 2024 iPhones could have serious AI capabilities.