The M3 Ultra Mac Studio for Local LLMs →

Linked By Federico Viticci

Speaking of the new Mac Studio and Apple making the best computers for AI: this is a terrific overview by Max Weinbach about the new M3 Ultra chip and its real-world performance with various on-device LLMs:

The Mac I’ve been using for the past few days is the Mac Studio with M3 Ultra SoC, 32-core CPU, 80-core GPU, 256GB Unified Memory (192GB usable for VRAM), and 4TB SSD. It’s the fastest computer I have. It is faster in my workflows for even AI than my gaming PC (which will be used for comparisons below; it has an Intel i9 13900K, RTX 5090, 64GB of DDR5, and a 2TB NVMe SSD).

It’s a very technical read, but the comparison between the M3 Ultra and a vanilla (non-optimized) RTX 5090 is mind-blogging to me. According to Weinbach, it all comes down to Apple’s MLX framework:

I’ll keep it brief; the LLM performance is essentially as good as you’ll get for the majority of models. You’ll be able to run better models faster with larger context windows on a Mac Studio or any Mac with Unified Memory than essentially any PC on the market. This is simply the inherent benefit of not only Apple Silicon but Apple’s MLX framework (the reason we can efficiently run the models without preloading KV Cache into memory, as well as generate tokens faster as context windows grow).

In case you’re not familiar, MLX is Apple’s open-source framework that – I’m simplifying – optimizes training and serving models on Apple Silicon’s unified memory architecture. It is a wonderful project with over 1,600 community models available for download.

As Weinbach concludes:

I see one of the best combos any developer can do as: M3 Ultra Mac Studio with an Nvidia 8xH100 rented rack. Hopper and Blackwell are outstanding for servers, M3 Ultra is outstanding for your desk. Different machines for a different use, while it’s fun to compare these for sport, that’s not the reality.⁠⁠

There really is no competition for an AI workstation today. The reality is, the only option is a Mac Studio.

Don’t miss the benchmarks in the story.

Permalink

Is Apple Shipping the Best AI Computers?→

Linked By Federico Viticci

For all the criticism (mine included) surrounding Apple’s delay of various Apple Intelligence features, I found this different perspective by Ben Thompson fascinating and worth considering:

What that means in practical terms is that Apple just shipped the best consumer-grade AI computer ever. A Mac Studio with an M3 Ultra chip and 512GB RAM can run a 4-bit quantized version of DeepSeek R1 — a state-of-the-art open-source reasoning model — right on your desktop. It’s not perfect — quantization reduces precision, and the memory bandwidth is a bottleneck that limits performance — but this is something you simply can’t do with a standalone Nvidia chip, pro or consumer. The former can, of course, be interconnected, giving you superior performance, but that costs hundreds of thousands of dollars all-in; the only real alternative for home use would be a server CPU and gobs of RAM, but that’s even slower, and you have to put it together yourself. Apple didn’t, of course, explicitly design the M3 Ultra for R1; the architectural decisions undergirding this chip were surely made years ago. In fact, if you want to include the critical decision to pursue a unified memory architecture, then your timeline has to extend back to the late 2000s, whenever the key architectural decisions were made for Apple’s first A4 chip, which debuted in the original iPad in 2010. Regardless, the fact of the matter is that you can make a strong case that Apple is the best consumer hardware company in AI, and this week affirmed that reality.

Anecdotally speaking, based on the people who cover AI that I follow these days, it seems there are largely two buckets of folks who are into local, on-device models: those who have set up pricey NVIDIA rigs at home for their CUDA cores (the vast minority); and – the undeniable majority – those who run a spectrum of local models on their Macs of different shapes and configurations (usually, MacBook Pros). If you have to run high-end, performance-intensive local models for academic or scientific workflows on a desktop, the M3 Ultra Mac Studio sounds like an absolute winner.

However, I’d point out that – again, as far as local, on-device models are concerned – Apple is not shipping the best possible hardware on smartphones.

While the entire iPhone 16 lineup is stuck on 8 GB of RAM (and we know how memory-hungry these models can be), Android phones with at least 12 GB or 16 GB of RAM are becoming pretty much the norm now, especially in flagship territory. Even better in Android land, what are being advertised as “gaming phones” with a whopping 24 GB of RAM (such as the ASUS ROG Phone 9 Pro or the RedMagic 10 Pro) may actually make for compelling pocket computers to run smaller, distilled versions of DeepSeek, LLama, or Mistral with better performance than current iPhones.

Interestingly, I keep going back to this quote from Mark Gurman’s latest report on Apple’s AI challenges:

There are also concerns internally that fixing Siri will require having more powerful AI models run on Apple’s devices. That could strain the hardware, meaning Apple either has to reduce its set of features or make the models run more slowly on current or older devices. It would also require upping the hardware capabilities of future products to make the features run at full strength.

Given Apple’s struggles, their preference for a hybrid on-device/server-based AI system, and the market’s evolution on Android, I don’t think Apple can afford to ship 8 GB on iPhones for much longer if they’re serious about AI and positioning their hardware as the best consumer-grade AI computers.

Permalink

Notes on the Apple Intelligence Delay

By Federico Viticci

Simon Willison, one of the more authoritative independent voices in the LLM space right now, published a good theory on what may have happened with Apple’s delay of Apple Intelligence’s Siri personalization features:

I have a hunch that this delay might relate to security.

These new Apple Intelligence features involve Siri responding to requests to access information in applications and then perform actions on the user’s behalf.

This is the worst possible combination for prompt injection attacks! Any time an LLM-based system has access to private data, tools it can call and potentially malicious instructions (like emails and text messages from untrusted strangers) there’s a risk that an attacker might subvert those tools and use them to damage or exfiltration a user’s data.

Willison has been writing about prompt injection attacks since 2023. We know that Mail’s AI summaries were (at least initially?) sort of susceptible to prompt injections (using hidden HTML elements), as were Writing Tools during the beta period. It’s scary to imagine what would happen with a well-crafted prompt injection when the attack’s surface area becomes the entire assistant directly plugged into your favorite apps with your data. But then again, one has to wonder why these features were demoed at all at Apple’s biggest software event last year and if those previews – absent a real, in-person event – were actually animated prototypes.

On this note, I disagree with Jason Snell’s idea that previewing Apple Intelligence last year was a good move no matter what. Are we sure that “nobody is looking” at Apple’s position in the AI space right now and that Siri isn’t continuing down its path of damaging Apple’s software reputation, like MobileMe did? As a reminder, the iPhone 16 lineup was advertised as “built for Apple Intelligence” in commercials, interviews, and Apple’s website.

If the company’s executives are so certain that the 2024 marketing blitz worked, why are they pulling Apple Intelligence ads from YouTube when “nobody is looking”?

On another security note: knowing Apple’s penchant for user permission prompts (Shortcuts and macOS are the worst offenders), I wouldn’t be surprised if the company tried to mitigate Siri’s potential hallucinations and/or the risk of prompt injections with permission dialogs everywhere, and later realized the experience was terrible. Remember: Apple announced an App Intents-driven system with assistant schemas that included actions for your web browser, file manager, camera, and more. Getting any of those actions wrong (think: worse than not picking your mom up at the airport, but actually deleting some of your documents) could have pretty disastrous consequences.

Regardless of what happened, here’s the kicker: according to Mark Gurman, “some within Apple’s AI division” believe that the delayed Apple Intelligence features may be scrapped altogether and replaced by a new system rebuilt from scratch. From his story, pay close attention to this paragraph:

There are also concerns internally that fixing Siri will require having more powerful AI models run on Apple’s devices. That could strain the hardware, meaning Apple either has to reduce its set of features or make the models run more slowly on current or older devices. It would also require upping the hardware capabilities of future products to make the features run at full strength.

Inference costs may have gone down over the past 12 months and context windows may have gotten bigger, but I’m guessing there’s only so much you can do locally with 8 GB of RAM when you have to draw on the user’s personal context across (potentially) dozens of different apps, and then have conversations with the user about those results. It’ll be interesting to watch what Apple does here within the next 1-2 years: more RAM for the same price on iPhones, even more tasks handed off to Private Cloud Compute, or a combination of both?

We’ll see how this will play out at WWDC 2025 and beyond. I continue to think that Apple and Google have the most exciting takes on AI in terms of applying the technology to user’s phones and apps they use everyday. The only difference is that one company’s announcements were theoretical, and the other’s are shipping today. It seems clear now that Apple got caught off guard by LLMs while they were going down the Vision Pro path, and I’ll be curious to see how their marketing strategy will play out in the coming months.

Introducing NPC XL: More NPC, Every Week

By Federico Viticci

Welcome to NPC XL.

Ever since Brendon, John, and I started our podcast about portable gaming – NPC: Next Portable Console – last year, I knew I’d found something special. It’s not just that the three of us are obsessed with handhelds and portable consoles; it’s that we work well together, and we’re having so much fun doing the show every two weeks. Who wouldn’t want to do even more with a project they love?

So today, we’re announcing some big changes to NPC:

We’re taking the regular show weekly, for free, for everyone!
We’re introducing NPC XL, a members-only version of NPC with extra content, available exclusively through our new Patreon for $5/month.
NPC is getting its own YouTube channel. With an expansion of the show, it made sense to let it grow beyond the MacStories YouTube channel.
NPC is joining the (awesome) TWG Discord server with a dedicated channel for community feedback and participation.

You can find our Patreon here, and we also dropped a surprise episode of NPC today announcing the expansion of the show:

Now, allow me to spend a few more words on why we’re doing this and what you can expect from becoming a patron of NPC XL.

Gemini for iOS Gets Lock Screen Widgets, Control Center Integration, Basic Shortcuts Actions

By Federico Viticci

Gemini for iOS.

When I last wrote about Gemini for iOS, I noted the app’s lackluster integration with several system features. But since – unlike others in the AI space – the team at Google is actually shipping new stuff on a weekly basis, I’m not too surprised to see that the latest version of Gemini for iOS has brought extensive support for widgets.

Specifically, Gemini for iOS now offers a collection of Lock Screen widgets that also appear as controls in iOS 18’s Control Center, and there are barebones Shortcuts actions to go along with them. In both the Lock Screen’s widget gallery and Control Center, you’ll find Gemini widgets to:

type a prompt,
Talk Live,
open the microphone (for dictation),
open the camera,
share an image (with a Photos picker), and
share a document (with a Files picker).

It’s nice to see these integrations with Photos and Files; notably, Gemini now also has a share extension that lets you add the same media types – plus URLs from webpages – to a prompt from anywhere on iOS.

The Shortcuts integration is a little less exciting since Google implemented old-school actions that do not support customizable parameters. Instead, Gemini only offers actions to open the app in three modes: type, dictate, or Talk Live. That’s disappointing, and I would have preferred to see the ability to pass text or images from Shortcuts directly to Gemini.

While today’s updates are welcome, Google still has plenty of work left to do on Apple’s platforms. For starters, they don’t have an iPad version of the Gemini app. There are no Home Screen widgets yet. And the Shortcuts integration, as we’ve seen, could go much deeper. Still, the inclusion of controls, basic Shortcuts actions, and a share extension goes a long way toward making Gemini easier to access on iOS – that is, until the entire assistant is integrated as an extension for Apple Intelligence.

“Everyone Is Caught Up, Except for Apple”→

Linked By Federico Viticci

Good post by Parker Ortolani (who’s blogging more frequently now; I recommend subscribing to his blog) on the new (and surprisingly good looking?) Alexa+ and where Apple stands with Siri:

So here we are. Everyone is caught up, except for Apple. Siri may have a pretty glowing animation but it is not even remotely the same kind of personal assistant that these others are. Even the version of Siri shown at WWDC last year doesn’t appear to be quite as powerful as Alexa+. Who knows how good the app intents powered Siri will even be at the end of the day when it ships, after all according to reports it has been pushed back and looks like an increasingly difficult endeavor. I obviously want Siri to be great. It desperately needs improvement, not just to compete but to make using an iPhone an even better experience.

I continue to think that Apple has immense potential for Apple Intelligence and Siri if they get both to work right with their ecosystem. But at this point, I have to wonder if we’ll see GTA 6 before Siri gets any good.

Permalink

With Pokémon Champions, Competitive Pokémon ‘VGC’ May Finally Go Mainstream

By Federico Viticci

Pokémon Champions artwork. Source: Pokémon.com.

Today is Pokémon Day, and, as they do every year, The Pokémon Company held a Pokémon Presents keynote showcasing the latest updates coming to their slate of titles, including a gameplay reveal for the upcoming Pokémon Legends: Z-A for Nintendo Switch.

Beyond ChatGPT’s Extension: How to Redirect Safari Searches to Any LLM

By Federico Viticci

xSearch for Safari.

Earlier this week, OpenAI’s official ChatGPT app for iPhone and iPad was updated with a native Safari extension that lets you forward any search query from Safari’s address bar to ChatGPT Search. It’s a clever approach: rather than waiting for Apple to add a native ChatGPT Search option to their list of default search engines (if they ever will), OpenAI leveraged extensions’ ability to intercept queries in the address bar and redirect them to ChatGPT whenever you type something and press Return.

However, this is not the only option you have if you want to redirect your Safari search queries to a search engine other than the one that’s set as your default. While the solution I’ll propose below isn’t as frictionless as OpenAI’s native extension, it gets the job done, and until other LLMs like Claude, Gemini, Perplexity, and Le Chat ship their own Safari extensions, you can use my approach to give Safari more AI search capabilities right now.

Apple Vision Glasses Will Be Irresistible →

Linked By Federico Viticci

I found myself nodding in agreement from beginning to end with this story by Lachlan Campbell, who, after a year of Vision Pro, imagines what future Apple Vision glasses may be able to do and how they’d reshape our societal norms:

I’ve written about my long-term belief in spatial computing, and how visionOS 2 made small but notable progress. The pieces have clicked into place more recently for me for what an AR glasses version of Apple Vision would look like, and how it will change us. We don’t have the technology, hardware-wise, to build this product today, or we’d already be wearing it. We need significant leaps in batteries, mobile silicon, and displays to make this product work. Leaps in AI assistance, cameras, and computer vision would make this product better, too. But the industry is hard at work at all of these problems. This product is coming.

The basic pitch: augmented reality glasses with transparent lenses that can project more screen than you could ever own, wherever you are. The power of real software like iPad/Mac, an always-on intelligent assistant, POV photos/video/audio, and listening to audio without headphones. Control it like Apple Vision Pro with your eyes, hands, and voice, optionally pairing accessories (primarily AirPods and any of stylus/keyboard/trackpad/mice work for faster/more precise inputs). It’s cellular (with an Apple-designed modem) and entirely wireless. It combines the ideas of ambient computing that Humane (RIP) and Meta Ray-Bans have begun, including a wearable assistant, POV photography, and ambient audio with everything you love about your current Apple products.

I may be stating the obvious here, but I fundamentally believe that headsets are a dead end and glasses are the ultimate form factor we should be striving for. Or let me put it another way: every time I use visionOS, I remember how futuristic everything about it still feels…and how much I wish I was looking at it through glasses instead.

There’s a real possibility we may have Apple glasses (and an Apple foldable?) by 2030, and I wish I could just skip ahead five years now. As Lachlan argues, we’re marching toward all of this.

Permalink