‘Lunar Lake’ Explained: How Intel’s Moonshot Mobile CPUs Will Escalate the AI Wars

June 4, 2024

TAIPEI: “Chips are cool again,” said Michele Johnston Holthaus, executive vice president and general manager of the Client Computing Group at Intel, last week near the start of a spate of presentations to press, analysts, and developers about the silicon titan’s coming mobile processors. Indeed, we agree with her: These last few weeks have been wild times for laptop CPUs, with field-shaking claims, releases, and moves by Qualcomm, AMD, and now Intel.

Intel just rolled out deeper details on “Lunar Lake,” its next-generation mobile platform. This article follows a two-day event preceding the 2024 Computex trade show in Taiwan, the seminal annual event for the PC ecosystem. The big takeaways: Lunar Lake will be a significant leap on multiple fronts over the chipmaker’s first generation of mobile chips with local AI or neural processing units (NPUs), 2023’s “Meteor Lake.” The new architecture promises a massive jump in the AI benchmark of trillions of operations per second (TOPS), up to 120 TOPS to be specific, along with a much-improved integrated graphics core dubbed Xe2 and a design that emphasizes power efficiency and battery life. In a nod to changing times and tech, Lunar Lake drops Hyper-Threading, Intel’s longtime simultaneous multithreading (SMT) technology, along the way.

(Credit: John Burek)

To clarify, Intel isn’t talking yet about specific Lunar Lake processors, clock speeds, or other specs; this event focused on the architectural details and design thinking behind the upcoming CPU line. Even so, there’s a lot to chew on, including some substantial counterclaims to recent proclamations from Qualcomm about its Arm-based Snapdragon X chips, which had a moment in the sun at Microsoft’s Build conference in May and are featured in much-anticipated models from Lenovo, HP, and others. No one has independently tested a Snapdragon X laptop in the wild, but Qualcomm has already made lofty performance and efficiency claims.

“Lunar Lake offers faster core performance than Arm,” Holthaus said flatly. “We’re going to bust the myth that [x86] can’t be as efficient.”

What Is Lunar Lake?

Lunar Lake is the codename for Intel’s next-generation mobile processors, expected to hit the market in the third quarter of 2024 and announced in a teaser a few weeks before the Taipei event. The chips will be manufactured partly by TSMC and partly by Intel, with the latter responsible for the assembly, testing, and packaging (the last employing Intel’s Foveros 3D stacking tech).

Lunar Lake’s expected debut follows last year’s Meteor Lake mobile processors, the first Intel chips with a built-in neural processing unit. The company’s forthcoming high-performance desktop chips, “Arrow Lake,” will follow later in ’24, ushering in a new AI-enabled desktop platform and new CPU socket. As a mobile CPU emphasizing power efficiency in every area, Lunar Lake will employ a new GPU, a new NPU, and a rethought set of Performance cores (P-cores) and Efficient cores (E-cores). The goal of longer battery life is intertwined with changes to how system memory will be implemented.

(Credit: Intel)

Many of Intel’s claims for Lunar Lake revolve around AI and integrated graphics (IGP) performance, memory bandwidth, and power efficiency, and they address head-on the TOPS arms race developing among Intel, Qualcomm, and AMD. One of the presenters in Taipei, Rob Bruckner, corporate VP and CTO of client platform architecture in Intel’s Client Computing Group, declared that AI supremacy is not just about having the top TOPS, however. Implementing supporting technologies like memory (and its associated bandwidth) and access to a robust ecosystem of software partners and tools are just as important. The last was a theme Intel hammered repeatedly during the event.

These changes follow what Intel says is a strong start to the “AI PC” movement that arguably began with Meteor Lake. The chipmaker says it has shipped 8 million Core Ultra parts to date, incorporated into 230-plus design wins among ecosystem partners, which, Intel claims, outpaces all of its rivals’ shipments of AI PCs combined. It expects Lunar Lake to keep that momentum going.

The Thinking Behind Lunar Lake

Intel outlines four main goals for the Lunar Lake project: maximizing core performance per watt, implementing a powerful new graphics engine, muscling up compute throughput, and examining all of this through a lens of energy efficiency.

Intel’s approach leverages many techniques. The first is to supercharge the “hybrid cores” in a new way to balance compute power and power consumption. Intel will also double down on its Arc integrated graphics, looking beyond mere display and gaming performance: GPU muscle is also about content creation, and a nontrivial segment of developers has shown an affinity for AI workloads that rely on the GPU. Indeed, Intel is optimizing its new platform for AI performance with multiple engines behind it. The IGP, CPU, and NPU all contribute to platform AI power, as different developers leverage different processor resources in their designs. (The NPU is still in its infancy, after all.)

(Credit: John Burek)

The company’s professed lead in advanced packaging (anchored by its Foveros 3D tech) will be deployed alongside a new memory-on-package approach. TOPS figures have gained some prominence as a simple expression of AI potential, but they leave out a lot of nuance. For one thing, memory bandwidth is also a crucial piece of AI performance, which makes optimizing the memory interface important. Moving Lunar Lake’s LPDDR5X memory onto the chip package (with room for two ranks of 32GB LPDDR5X) may reduce PC manufacturers’ configuration flexibility, but it reduces the space required (a plus for makers of thin laptops) and delivers a decided improvement in memory bandwidth plus a modest one in power consumption.

Intel’s overall performance claims for Lunar Lake are impressive. The company promises faster multithreaded performance than Meteor Lake but perhaps even more remarkable gains in power consumption for single-threaded work (more about that in the next section). As for AI throughput, Intel predicts more than triple the AI throughput across CPU, GPU, and NPU. And graphics should see up to a 50% boost in performance versus Iris Xe in Meteor Lake for gamers and creators.

Last but not least, Intel says overall system-on-chip (SoC) power consumption should drop by as much as 40%. That’s exciting news for battery life, giving system makers more flexibility when juggling unplugged life versus chassis size.

AI, though, will be a defining metric from now on. With 120 platform TOPS (up to 48 provided by the NPU, based on INT8), Lunar Lake is ready for the coming Copilot+ PC era and aims to balance the heavy simultaneous demands of AI, battery savings, cost control, and thin system design. It’s doing that with a reorg of the CPU portion (containing both P-cores and E-cores), the new IGP, and a revamped NPU. Let’s start with the compute portion.

The New CPU Cores: A New ‘Island’ Government

Lunar Lake isn’t just about more muscle (though it has that) or optimizing for all-out performance (though it can shift between efficient and high-performance modes as needed). This silicon generation is a rethink based on power efficiency and is purpose-built to better run the right tasks on the right cores at the right time.

The compute tile (one of the “chiplets” that comprise the processor as a whole) represents a complete redesign from Meteor Lake. Lunar Lake features four Performance cores and four Efficient cores, with the four P-cores sharing 12MB of L3 cache and each core allocated 2.5MB of L2. The four E-cores have been moved to a single Low Power Island (LPI) cluster; they share 4MB of L2 cache, double the amount of Meteor Lake’s LPI. As far as classic CPU performance is concerned, Intel claims single-threaded performance matching Meteor Lake at half the power consumption. Plus, the architecture allows finer ramping up and down of P-core clocks, in increments of 16.67MHz versus the earlier 100MHz.
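To see why a finer clock step matters, here is a minimal Python sketch. It rests on assumptions that are not from Intel: that "16.67MHz" means exactly 100/6 MHz, and that at any moment there is some hypothetical ceiling frequency that power and thermals would allow. The function names are invented for illustration. The sketch picks the highest clock bin at or below that ceiling for each step size and reports how much headroom is left unused.

```python
from fractions import Fraction

# Step sizes quoted in the article. Treating "16.67MHz" as exactly 100/6 MHz
# is an assumption made here so the arithmetic stays exact.
COARSE_STEP_MHZ = Fraction(100)
FINE_STEP_MHZ = Fraction(100, 6)


def highest_bin_at_or_below(limit_mhz, step_mhz):
    """Return the highest multiple of step_mhz that does not exceed limit_mhz."""
    whole_steps = Fraction(limit_mhz) // step_mhz  # floor division yields an int
    return whole_steps * step_mhz


def stranded_headroom(limit_mhz, step_mhz):
    """Return the MHz left unused between the chosen bin and the ceiling."""
    return Fraction(limit_mhz) - highest_bin_at_or_below(limit_mhz, step_mhz)


if __name__ == "__main__":
    ceiling = 3480  # hypothetical ceiling in MHz, chosen only for illustration
    for label, step in (("100MHz steps", COARSE_STEP_MHZ),
                        ("16.67MHz steps", FINE_STEP_MHZ)):
        chosen = highest_bin_at_or_below(ceiling, step)
        unused = stranded_headroom(ceiling, step)
        print(f"{label}: run at {float(chosen):.2f} MHz, "
              f"{float(unused):.2f} MHz unused")
```

With the made-up 3,480MHz ceiling, the coarse step leaves 80MHz on the table while the fine step leaves about 13MHz. Real clock selection depends on far more than a single ceiling value, so treat this as arithmetic on the quoted step sizes rather than a model of Intel's power management.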

(Credit: John Burek)

The architecture behind the P-cores is dubbed “Lion Cove,” with that of the E-cores called “Skymont.” With Lion Cove, Intel has dropped the Hyper-Threading thread-doubling capabilities (a.k.a. simultaneous multithreading) that have prevailed for two decades to improve performance with heavily threaded tasks. (The company has said Hyper-Threading could return where it makes sense in future designs.) The logic behind ditching Hyper-Threading here: E-cores are now so effective for day-to-day tasks (and you’ll find more E-cores in the LPI) that they can deliver comparable performance without SMT most of the time. (The kinds of tasks that Lunar Lake systems will face tend to be lightly threaded, at four threads or fewer.)

It’s not all down to the hardware, either: Intel’s Thread Director (the firmware that, using machine learning, tips off the operating system’s scheduler as to which cores are best suited to run which portions of a task) can land on the proper core in more fine-grained fashion than before. Applications can be cast into different “containment zones” in the OS that indicate the best cores to use, and laptop makers can fine-tune Thread Director aggressiveness for performance versus efficiency. The ability to shift between core types mid-task has been streamlined, too.

(Credit: John Burek)

Another factor is that the main system memory is being integrated into the SoC. Intel estimates Lion Cove should bring about a 14% performance improvement versus Meteor Lake’s “Redwood Cove” P-cores.

The new Skymont E-cores look to be the real stars this time, however. The E-cores on the LPI have more cache and cores than their Meteor Lake “Crestmont” counterparts; Intel projects up to a whopping 68% more instructions per clock (IPC).

(Credit: John Burek)

In addition, the new E-cores are expected to be 20% to 80% more performant per watt at the low end. This range illustrates how Intel was able to sideline Hyper-Threading in the new design. Regardless, the cores are designed to cover a full range of workloads, placing task threads on the most efficient cores first and only moving up to more power-hungry P-cores when necessary.

Another key change is the second-gen redesign of the LPI, the home of the efficiency cores. Meteor Lake marked the debut of the Low Power Island, but only some E-cores resided on the LPI while others were on the compute tile proper. With Lunar Lake, all of the E-cores are contained in one LPI unit. That keeps the compute tile from “lighting up” when only E-core-appropriate tasks are required.

Meet NPU 4: Pops of the TOPS

Intel notes that the NPU in Lunar Lake is technically its “NPU 4.0,” following that of Meteor Lake and two earlier discrete-chip efforts with its acquired Movidius tech in 2018 and 2021. As before, the NPU is expected near term to deliver the computational oomph for “persistent” background tasks that require AI acceleration (such as blur effects, background imposition, and filters in video calls). The NPU will also cover AI assistants and generative AI, but at a lower power cost than shunting those tasks onto the CPU or GPU. As before, the NPU tends to represent concentrated power savings rather than “performance first.”

(Credit: John Burek)

As mentioned, the Lunar Lake NPU is rated for 48 peak TOPS (under INT8) versus the 11.5 of Meteor Lake’s NPU, while being twice as power-efficient. Intel also increased the engine count and frequency and, naturally, tweaked the architecture. Vector and matrix calculations predominate in AI workloads, and much of the change has centered around servicing those with specialized engines.

The mechanics of exactly how the new NPU works are beyond the scope of the average reader (or this writer), but the top-line points are that Intel claims Lunar Lake’s NPU is the largest dedicated AI accelerator for PCs today, with six neural compute engines plus a dozen specialized digital signal processors that speed up operations with large language models (LLMs) and transformer workloads. Intel sees the client AI world evolving, with certain portions of the processor better suited to servicing particular tasks. The chipmaker sees approximately 30% of its software partners favoring the NPU for primary AI acceleration, which leaves a significant number writing code that relies on the CPU and GPU.

The New Xe2 Graphics: Stealth ‘Battlemage’ Emerges First on Lunar Lake

Some excitement buzzed when Intel teased that Lunar Lake’s new integrated graphics processor (IGP) would be based on the follow-up to its first-generation Arc “Alchemist” graphics core, “Battlemage.” Most onlookers assumed that Battlemage would appear first in discrete graphics solutions as Alchemist did. Here, it shows up in an IGP solution called Xe2, along the lines of the Iris Xe integrated graphics seen before. Intel notes that Xe2 should benefit from the two years of software growing pains that Arc went through.

(Credit: John Burek)

Intel’s overarching claims for Xe2 include up to one and a half times the graphics performance of the Meteor Lake IGP, a big leap if it pans out. This IGP is all about the eights: It comprises eight next-gen Xe cores, eight enhanced ray-tracing units, and 8MB of new “memory side” cache. The Xe cores also have an AI acceleration aspect governed by what Intel dubs Xe Matrix Extensions (XMX). These are the power behind the 67 peak TOPS (again under INT8) that the IGP contributes to Lunar Lake’s total 120 TOPS.

The display engine portion of the graphics tile is another matter. With three discrete display pipes, the output supports DisplayPort, HDMI 2.1, and (new here) eDisplayPort 1.5. The engine allows up to four connections including eDP (used internally for laptop panels). Supported peak resolution and screen refresh rate combos include 1080p or 1440p at up to 360Hz, up to three 4K HDR streams at 60Hz, and 8K HDR at 60Hz.

Intel notes several tweaks to this portion of the IGP, including panel replay, which yields some fine-grained reduction in power consumption during idle frames or even static portions of a given frame. Additional fine-tuning includes more efficient frame sending (to reduce the number of times a resource needs to be activated) or reducing the need to send redundant frames at all, which saves power. Repeated portions of frames, such as aspect-ratio blackout bars, can be withheld for further efficiencies.

Of course, the beneficial effects of the display engine tweaks will vary according to the application and use case, but Intel showed a slide that listed power savings for various tasks such as YouTube viewing or multi-tab web browsing.

The new memory-side cache comes into play with the IGP’s media engine portion. The aim of this cache is to limit (where possible) access to system memory when dealing with media workloads. It’s also designed to cut power consumption when doing encodes. Encode/decode at 10-bit HDR 8K 60Hz is supported, with support for (among other codecs) AVC, HEVC (H.265), AV1, and VP9. New to the party is VVC decode (H.266), which Intel says will reduce bitrate while preserving quality, for a 10% file-size reduction versus AV1. One of VVC’s tricks is to send “reference data” in a frame to a buffer that retains it for use, preserving bandwidth.
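Stepping back to the AI numbers for a moment, the TOPS figures quoted in this article can be tallied in a few lines of Python. This is only arithmetic on the quoted peak INT8 figures, with one assumption flagged: the article does not give a CPU figure, so the sketch labels whatever remains after subtracting the NPU and IGP contributions as an assumed CPU share. The function names are invented for illustration.

```python
# Peak INT8 TOPS figures as quoted in this article.
PLATFORM_TOPS = 120.0
NPU_TOPS = 48.0
IGP_TOPS = 67.0
METEOR_LAKE_NPU_TOPS = 11.5


def tops_breakdown():
    """Map each engine to a (TOPS, percent of platform total) pair."""
    # Assumption: whatever is not NPU or IGP is attributed to the CPU cores.
    remainder = PLATFORM_TOPS - NPU_TOPS - IGP_TOPS
    parts = {
        "NPU": NPU_TOPS,
        "IGP (XMX)": IGP_TOPS,
        "remainder (assumed CPU)": remainder,
    }
    return {name: (tops, 100.0 * tops / PLATFORM_TOPS)
            for name, tops in parts.items()}


def npu_generational_ratio():
    """Return Lunar Lake NPU peak TOPS divided by Meteor Lake NPU peak TOPS."""
    return NPU_TOPS / METEOR_LAKE_NPU_TOPS


if __name__ == "__main__":
    for name, (tops, share) in tops_breakdown().items():
        print(f"{name:24s} {tops:6.1f} TOPS  {share:5.1f}% of platform")
    print(f"NPU vs. Meteor Lake NPU: {npu_generational_ratio():.2f}x")
```

On these figures the IGP, not the NPU, supplies the largest slice of the 120-TOPS headline, which fits Intel's point that many developers still target the GPU and CPU for AI work. Peak TOPS are theoretical maximums, so the split says nothing about delivered performance in any particular application.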

(Credit: John Burek)

Much of this will be under-the-hood stuff invisible to most users. But for visual appeal, Intel showed off the spanking-new game F1 2024 running at more than 60fps on the Xe2 IGP, albeit with use of the company’s XeSS upscaling.

(Credit: John Burek)

Intel’s Moonshot: A Mobile Processor for a Fast-Moving Target

All this is just an overview of the silicon. There’s much more nuance attached to Lunar Lake that’s beyond the scope of this article, such as the integration of Wi-Fi 7 and Bluetooth 5.4, not to mention the chipmaker’s recommendation to manufacturers that all Lunar Lake PCs have at least two and preferably three 40Gbps Thunderbolt 4 ports. That would be a boon for high-speed connectivity to emerging external storage devices, and for easier data sharing and system sync via the emerging Thunderbolt Share feature.

(Credit: John Burek)

Overall, Lunar Lake looks well positioned to catch the AI-everywhere zeitgeist. As Intel’s Rob Bruckner pointed out, AI will come at us in a trio of waves. The first, centered around machine learning, was roughly in a parallel timescale to Meteor Lake, reflecting the first steps into a PC understanding and adapting to how we use it and the application of AI to a few critical background tasks.

The AI wave underway nowadays, the highly visible and controversial generative-experience wave, comprises things like Microsoft’s Copilot, inferencing, user-accessible LLMs, and AI enhancements to entertainment and work collaboration. The last, the agent wave, is coming fast: You’ll empower AI agents as virtual “experts” on your behalf to execute complex tasks.

Meeting these demands to keep this revolution going is as much about amping up compute performance as maintaining battery life and efficiency at scale. Of course, retaining a balance among the contending elements will be critical.

Considering that Intel forecasts an addressable market of $1 trillion for AI processors by 2030, you can see a lot of incentive for the company and its competitors to ramp up on the predictable data-center side of things (just look at Nvidia in 2023 and 2024!). But this goes for the client side, too, offloading some of the insatiable processing, bandwidth, and cost demands on the data center to local processing. As several Intel reps noted during its days of presentations, AI in processors is today roughly where graphics acceleration on processors was years ago. Once, integrated graphics seemed pointless and barely able to light up the screen. Today, IGPs are far more capable and intrinsic to most CPUs.

Look ahead a few years, and local AI processing won’t be the kind of discrete novelty we see right now. AI will just be a natural, inseparable part of every processor. Lunar Lake looks to be a giant leap in that on-die space race. With systems expected to hit the market in Q3 for the holiday season, we’ll know soon enough if Intel succeeded in achieving orbit.

[Note: PCMag attended Intel’s 2024 Tech Tour in Taipei by invitation but paid the cost of all travel and lodging.]
