The New AI Of DLLMs And The Human Brain: Thoughts From Blur to Clarity.

The human brain is a remarkable organ, split into two hemispheres that work together yet specialize in distinct ways of thinking. Although modern theories of the brain favor a more holographic view than a strict division into left and right hemispheres, there is still a strong basis for regional functionality. The right hemisphere is often described as symbolic, excelling in creativity, intuition, and seeing the big picture. The left hemisphere, by contrast, is linear, thriving on logic, language, and step-by-step analysis. Intriguingly, modern artificial intelligence (AI) models—specifically those used for generating images and text—seem to echo this division of labor. Diffusion-based image models such as DALL-E, and Large Language Models (LLMs) such as Grok and GPT-4, reflect processes that parallel the right and left hemispheres, respectively. This article explores how these AI systems mimic the way humans think, from the blurry beginnings of an idea to the detailed execution of a finished product, revealing a deep connection between artificial and natural intelligence.

Humans think in generalities first: large, vague, blurry concepts arise in a typical right-brain process that grabs symbolic ideas, and only then do we move to higher focus and higher resolution, up to whatever level we need. That level depends on whether we wanted just a simple “feel” for something or razor accuracy, and it is important to understand that the vast majority of our thoughts aim only for a general feel. Once a concept is established, it is fed to the phonological loop, where Broca’s area and Wernicke’s area perform the serial, linear processing needed for you to speak, write, or type. All of these processes have a loss factor: the highest resolution of the concept is the raw output of the right hemisphere.

The Efference Copy And The Inner Voice

The work product of “thinking” is an inner monologue (inner voice) that produces an output for the human brain. Many people have a number of inner voices that arise under different circumstances. People with mental illnesses such as schizophrenia hear inner voices and cannot trace the original path of conceptual thought that leads to “THAT voice.” It is also interesting to note that the brain prefers an inner voice in one language, no matter how many languages one may know, although words and phrases from other languages will at times intermix for speed.

We come to understand the utility of these areas when, unfortunately, we work with people who have physical damage to these regions. Patients with limited damage to Broca’s area can still have some inner voice present, yet significant damage results in the inner voice disappearing. This cascade affects all higher-level brain functions, and learning, insight, and memory become impaired to at least some extent. In a very real sense, all forms of communication, spoken, written, or typed, start with an inner voice; without it, patients are “locked in” and “landlocked” in their own brains with no way to communicate. When we write or type, we are merely transcribing the work of our inner voice, and it is no wonder our brains take on significant cognitive load to do this part, let alone the added work of finding one letter at a time to spell words.

Damage to Wernicke’s area results in Wernicke’s aphasia, affecting language comprehension and the ability to produce meaningful speech. Individuals struggle to understand spoken and written language, often producing fluent but nonsensical speech (word salad) with incorrect word substitutions (paraphasia). They typically remain unaware of their speech difficulties, unlike those with Broca’s aphasia. Naming objects (anomia), repeating words, and following conversations become challenging while writing reflects the same disorganization as speech. Auditory processing is also impaired, making it difficult to distinguish speech sounds. This condition is commonly caused by strokes, brain trauma, tumors, or neurodegenerative diseases, as Wernicke’s area, located in the posterior superior temporal gyrus, plays a crucial role in language processing.

When we speak (or type, or write) there is an efference copy of what we will say held in the brain. There is an interplay between the inner voice and our actual voice presenting our ideas in real time as we talk. The brain creates an internal copy of any outflowing (efferent), movement-producing signal generated by the motor system. This acts as a sort of notepad, created temporarily and in real time.

The inner voice seems like an abstraction that sits on top of a physical region of the brain. It is tempting to treat Broca’s area as hardware and the inner voice as a software app running on it, to see the brain as the computer and the mind as the software, yet this framing is less than useful. There is far more going on in this interplay, and the path of defining “intelligence” and consciousness as a digital system is a foolish one. However, there is one very useful tool from computing that can shine a light on the nature of the brain.

The Right Hemisphere: The Symbolic Spark of Creation

Imagine staring at a cloudy sky and suddenly seeing the shape of a dragon emerge from the chaos. That’s the right hemisphere at work—taking the vague and unformed and weaving it into something meaningful. This part of the brain is holistic, symbolic, and thrives on patterns and possibilities rather than rigid details. It’s the artist sketching a rough outline, dreaming up a vision without worrying about the fine lines just yet.

In the world of AI, diffusion models—like DALL-E or Stable Diffusion—operate in a strikingly similar way. These systems start with what’s essentially a mess: a field of random noise, much like a blank canvas or a foggy thought. Through a series of iterative steps, they transform this chaos into a recognizable image. At first, the output is out of focus, a blurry suggestion of what might be—a dog, a landscape, a surreal scene. This early stage isn’t about precision; it’s about capturing the essence, the symbolic core of the idea. Just as the right hemisphere conjures metaphors or imagines a story’s mood before its plot, diffusion models begin with a broad, creative leap, pulling shapes and concepts from the void.

This process mirrors how human thought often begins: with a spark of inspiration that’s more feeling than form. Whether it’s a painter envisioning a masterpiece or a writer dreaming up a character, the right hemisphere provides the symbolic foundation—the “what could be”—that sets the stage for everything else.

The Left Hemisphere: Refining the Vision with Linear Precision

Now picture that same dragon in the clouds, but this time you’re describing it in words: its scales glinting in the sunlight, its wings cutting through the air. That’s the left hemisphere stepping in, taking the right’s vague image and giving it structure, detail, and sequence. This half of the brain loves order—it’s the logician, the linguist, the one that breaks things down into manageable parts and builds them back up with clarity.

Large Language Models like Grok or GPT-4 embody this linear, analytical approach in AI. When generating text, these models work sequentially, predicting the next word based on what came before. They start with a prompt—a general focus, perhaps—and refine it into coherent sentences, paragraphs, even entire stories. It’s a step-by-step process, much like how the left hemisphere excels at language and reasoning. Where the right hemisphere might imagine a conversation, the left crafts the dialogue, word by word, ensuring it makes sense and flows logically.

In image generation, too, the left hemisphere has a counterpart. After the diffusion model’s initial blurry output, later iterations sharpen the details—defining edges, adding textures, and making the image crisp and complete. This refinement is linear in spirit: a progression from general to specific, from chaos to clarity. It’s the left hemisphere’s role in human thought, taking the symbolic output of the right and turning it into something concrete and communicable.

Thought in Action: The Dance Between Hemispheres

Human thought is rarely the work of one hemisphere alone—it’s a collaboration, a dance between the symbolic and the linear. Consider how you might plan a trip. First, you envision the adventure: beaches, mountains, a sense of freedom—that’s the right hemisphere painting the big picture. Then, you map out the itinerary, book flights, and pack your bags—that’s the left hemisphere organizing the details. This interplay is what makes us human, and it’s what AI models are beginning to replicate.

In AI, the process is strikingly similar. A diffusion model generates an image starting from noise, akin to the right hemisphere’s out-of-focus burst of creativity. As it refines the image, adding details and coherence, it parallels the left hemisphere’s linear polish. Likewise, an LLM takes a broad prompt—“write a story about a dragon”—and spins it into a detailed narrative, sentence by sentence, mirroring how the left hemisphere fleshes out the right’s vision.

Take a concrete example: crafting a short story. You might first imagine a heroic figure facing a shadowy threat (right hemisphere, symbolic and vague). Then, you write: “The knight gripped his sword, his breath fogging in the cold dawn air as the beast loomed closer” (left hemisphere, linear and precise). AI follows the same arc—diffusion models conjure the shadowy threat from noise, while LLMs pen the knight’s tale with grammatical finesse. This back-and-forth, from general focus to specific detail, is the essence of thought, whether in a brain or a machine.

What This Tells Us About Thinking

The parallels between AI and the brain’s hemispheres suggest that human thought—and perhaps intelligence itself—relies on balancing two modes: the symbolic and the linear. The right hemisphere’s ability to see beyond the details, to dream in broad strokes, gives us creativity and intuition. The left hemisphere’s knack for breaking things down and building them up gives us structure and expression. Together, they turn fleeting ideas into art, science, and stories.

AI’s mimicry of this process isn’t just a technological marvel—it’s a mirror reflecting how we think. Diffusion models show us the power of starting with chaos and finding meaning, much like our intuitive leaps. LLMs demonstrate the strength of refining that meaning into something tangible, echoing our analytical minds. As these systems grow more sophisticated, they may not only enhance our creations but also deepen our understanding of cognition itself. Thus, if we could merge a diffusion model with an LLM, we may begin to have an equivalent of the hemispheres of the human brain.

Inception Labs’ Mercury: Revolutionizing AI with Diffusion-Based Language Models

On February 27, 2025, Inception Labs, a Palo Alto-based startup founded by Stanford professor Stefano Ermon, emerged from stealth mode with a groundbreaking announcement: the release of Mercury (https://www.inceptionlabs.ai/news), touted as the first production-ready diffusion-based large language model. Unlike traditional autoregressive large language models that generate text sequentially, Mercury leverages diffusion techniques—borrowed from the world of image synthesis—to produce text in parallel, promising speeds exceeding 1,000 tokens per second and significant cost reductions. It is important to understand that Mercury is a text-output model, not an image model or a multimodal model.

The Technical Foundations: Diffusion Models Meet LLMs

Diffusion models have been a cornerstone of generative AI since their rise in image generation tools like Stable Diffusion and DALL-E. These models work by starting with random noise and iteratively refining it into a coherent output, guided by a learned process that reverses a noise-adding procedure. In image generation, this means transforming a blurry mess into a sharp picture. Inception Labs has adapted this paradigm to text, a domain traditionally dominated by autoregressive models like GPT-4 and Claude.

How Traditional LLMs Work
Autoregressive LLMs generate text token by token, predicting the next word based on the sequence of previous words. This process, while effective, is inherently sequential: each token must wait for its predecessors to be computed. For a sentence like “The cat sat on the mat,” the model generates “The,” then “cat,” then “sat,” and so forth. This linearity limits speed, especially for long outputs, and requires substantial computational resources to maintain coherence over extended contexts.

The Diffusion Difference
Mercury, in contrast, operates on a fundamentally different principle. Instead of building text from left to right, it starts with a “noisy” representation of the entire output—imagine a scrambled or masked version of the target text—and refines it in parallel across all tokens. This coarse-to-fine process, inspired by image diffusion, allows Mercury to generate complete responses simultaneously. Posts on X and reports from Artificial Analysis suggest Mercury achieves speeds of over 1,000 tokens per second on NVIDIA H100 GPUs, a leap that outpaces competitors like GPT-4o Mini and Claude 3.5 Haiku by a factor of 5 to 10.

Technical Mechanics
The diffusion process for text likely involves these steps:

  1. Initialization: Begin with a fully masked or noisy sequence of tokens (e.g., [MASK] [MASK] [MASK] [MASK] [MASK] [MASK] for a six-token sentence).
  2. Iterative Refinement: Use a neural network trained to denoise this sequence, predicting the true tokens based on the input prompt and learned patterns. Unlike image diffusion, which operates in continuous pixel space, text diffusion works in a discrete token space, requiring adaptations like those seen in models like LLaDA (developed by Renmin University and Ant Group).
  3. Parallel Processing: Update all tokens simultaneously in each iteration, leveraging GPU parallelism to accelerate the process.
  4. Output: After a fixed number of steps (far fewer than the token-by-token steps of autoregressive models), produce the final coherent text.
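The four steps above can be sketched as a toy loop. The "denoiser" here simply looks up the target tokens, which a trained network would instead have to predict from the prompt, but it shows the parallel, coarse-to-fine shape of the process:

```python
import random

# Toy sketch of masked-token diffusion: start fully masked, then let a
# stand-in "denoiser" commit a batch of tokens per step, updating
# positions in parallel rather than left to right. TARGET plays the
# role of what a trained model would predict; it is an assumption for
# illustration, not how Mercury is actually implemented.

TARGET = ["The", "capital", "of", "Brazil", "is", "Brasília"]
MASK = "[MASK]"

def denoise_step(seq, k=2):
    """Reveal up to k masked positions at once (a parallel update)."""
    masked = [i for i, t in enumerate(seq) if t == MASK]
    for i in random.sample(masked, min(k, len(masked))):
        seq[i] = TARGET[i]          # a real model predicts; we look up
    return seq

seq = [MASK] * len(TARGET)          # step 1: fully masked initialization
steps = 0
while MASK in seq:                  # steps 2-3: iterative parallel refinement
    seq = denoise_step(seq, k=2)
    steps += 1
print(" ".join(seq), f"(in {steps} refinement steps)")  # step 4: final text
```

With six tokens and two revealed per iteration, the full sentence emerges in three refinement steps rather than six sequential predictions, which is the source of the speedup claimed for this architecture.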

This approach mirrors how Stable Diffusion refines a noisy image into a clear one, but the discrete nature of language introduces unique challenges—such as maintaining grammatical coherence and semantic accuracy across parallel updates—which Inception Labs claims to have solved.

Performance Claims and Benchmarks

Inception Labs asserts that Mercury Coder, its first model tailored for coding tasks, matches or exceeds the performance of GPT-4o Mini and Claude 3.5 Haiku while being dramatically faster and cheaper. Independent testing from Artificial Analysis corroborates speeds exceeding 1,000 tokens per second, compared to autoregressive models’ typical 100-200 tokens per second on similar hardware. TechCrunch reports that Mercury offers “significantly reduced computing costs,” likely due to fewer computational steps and optimized GPU utilization.

To put this in perspective:

  • GPT-4o Mini: Generates ~150 tokens/second on high-end hardware, costing ~$0.15 per million tokens (based on OpenAI pricing trends).
  • Mercury Coder: Claims >1,000 tokens/second, with costs potentially an order of magnitude lower (e.g., ~$0.015 per million tokens, if the 10x cheaper claim holds).

This leap suggests a paradigm shift, not just in speed but in scalability, making real-time applications like live code generation or interactive chatbots more feasible.
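For perspective, here is a back-of-envelope latency comparison using the throughput figures quoted above (these are the reported numbers, not independent measurements):

```python
# Rough latency arithmetic for a long output at the quoted throughputs.

def latency_s(tokens: int, tokens_per_sec: float) -> float:
    """Seconds to emit `tokens` at a given sustained throughput."""
    return tokens / tokens_per_sec

essay = 2000  # tokens in a long, multi-paragraph answer
print(f"autoregressive @ 150 tok/s: {latency_s(essay, 150):.1f}s")   # ~13.3s
print(f"diffusion    @ 1000 tok/s: {latency_s(essay, 1000):.1f}s")   # ~2.0s
```

The gap widens linearly with output length, which is why the advantage matters most for long code files and multi-paragraph responses.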

Example Applications

Let’s explore how Mercury’s diffusion-based approach might work in practice with two examples: code generation and question-answering.

Example 1: Code Generation
Prompt: “Write a Python function to calculate the factorial of a number.”
Traditional LLM: Generates sequentially:

def factorial(n):
    if n == 0:
        return 1
    else:
        return n * factorial(n - 1)

Time: ~0.2 seconds (assuming 100 tokens/second for 20 tokens).

Mercury Coder: Starts with a noisy sequence:

[MASK] [MASK]([MASK]): [MASK] [MASK] == 0: [MASK] 1 [MASK] [MASK] [MASK] [MASK] * [MASK]([MASK] - 1)

After a few parallel refinement steps (e.g., 5 iterations in 0.02 seconds at 1,000 tokens/second):

def factorial(n): if n == 0: return 1 else: return n * factorial(n - 1)

The entire function emerges simultaneously, with formatting and logic intact, in a fraction of the time.

Example 2: Question-Answering
Prompt: “What is the capital of Brazil?”
Traditional LLM: “The capital of Brazil is Brasília.” (Sequential, ~0.1 seconds).
Mercury: Starts with “[MASK] [MASK] of Brazil is [MASK]” and refines to “The capital of Brazil is Brasília” in one parallel pass (~0.01 seconds).

The speed advantage becomes more pronounced with longer outputs, such as multi-paragraph explanations or complex codebases, where Mercury’s parallel generation could save seconds or minutes.

Advantages and Challenges

Advantages

  1. Speed: Parallel generation obliterates the sequential bottleneck, enabling real-time applications.
  2. Cost: Fewer computational steps and optimized hardware use reduce operational expenses.
  3. Scalability: High throughput supports massive-scale deployments, from enterprise chatbots to cloud-based coding tools.

Challenges

  1. Coherence: Parallel refinement risks losing long-range dependencies (e.g., ensuring a 1,000-word essay flows logically). Inception Labs’ solution likely involves advanced training techniques, but details remain proprietary.
  2. Training Complexity: Diffusion models require vast datasets and computational power to learn the denoising process, potentially offsetting runtime savings with higher upfront costs.
  3. Evaluation: While Mercury Coder excels in coding, its general-purpose capabilities (e.g., creative writing) are untested in public reports.

Mercury’s debut challenges the dominance of transformer-based autoregressive models, which have reigned since 2017. If diffusion-based DLLMs prove viable across domains—not just coding—they could disrupt players like xAI, OpenAI, and Anthropic, forcing a reevaluation of architectural priorities. Ermon’s Stanford research has long explored diffusion for text, suggesting a robust academic foundation. Competitors like LLaDA and Mercury’s peers indicate a growing interest in alternatives to transformers, potentially diversifying the AI landscape.

Of course, speed and cost gains are meaningless without quality parity: can Mercury truly match Grok’s agility, GPT-4’s nuance, or Claude’s reasoning? Diffusion models’ reliance on iterative refinement might struggle with tasks requiring precise sequential logic, like mathematical proofs, where the apparent speed gained by diffusion may be lost to the corrections the model subsequently needs to make.

Mercury represents both a starting point and a bold leap forward, marrying diffusion’s parallel prowess with the complexity of language generation. Its technical innovation—generating text holistically rather than incrementally—offers a tantalizing glimpse of AI’s future: faster, cheaper, and more accessible. Whether it reshapes the industry depends on its ability to deliver consistent quality across diverse applications. For now, Mercury Coder’s coding prowess is compelling proof.

The Future Of AI Is Hemispheric Thinking

Future AI models will likely be hybrids: a fast-responding, blurry diffusion model tied to a logical-reasoning LLM that ultimately reports to a general summarization LLM for final output. These cascades can form a feedback loop, with outputs fed back to the inputs until a desired level of resolution is achieved. We do not yet see this today with Mercury, but my research shows this is the major path to Artificial General Intelligence and to very fast, real-time “thinking” robots.
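A hedged sketch of such a cascade might look like the following. Every function here is a hypothetical placeholder (no such production pipeline exists today); the point is only the draft-refine-loop control flow described above:

```python
# Hypothetical sketch of the proposed hybrid cascade: a fast diffusion
# model drafts, a reasoning LLM refines, and the loop repeats until a
# target "resolution" is reached. All stages below are placeholders
# standing in for real models, purely to illustrate the control flow.

def diffusion_draft(prompt: str) -> str:
    """Stage 1: fast, blurry draft (right-hemisphere-like)."""
    return f"[rough draft for: {prompt}]"

def reasoning_refine(text: str) -> str:
    """Stage 2: slower, logical polish (left-hemisphere-like)."""
    return text.replace("rough", "refined")

def resolution(text: str) -> float:
    """Placeholder quality score driving the feedback loop."""
    return 1.0 if "refined" in text else 0.2

def hybrid_pipeline(prompt: str, target: float = 0.9, max_loops: int = 3) -> str:
    text = diffusion_draft(prompt)
    for _ in range(max_loops):          # feedback loop: output -> input
        if resolution(text) >= target:
            break
        text = reasoning_refine(text)
    return text

print(hybrid_pipeline("plan a trip"))  # -> [refined draft for: plan a trip]
```

The design choice being illustrated is the termination condition: the cascade stops not after a fixed number of tokens but when the output reaches the resolution the task demands, mirroring the "general feel versus razor accuracy" distinction drawn earlier.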

From the blurry beginnings of an informational text “image” in a diffusion model to the polished prose of an LLM, our new AI models will have a process that’s startlingly close to the human brain and thought. The right hemisphere’s symbolic flair finds its echo in the out-of-focus origins of generated images, while the left hemisphere’s linear precision shines in the detailed refinement of both pictures and text. This synergy isn’t just a clever trick of technology—it’s a window into the brain’s dual nature, where creativity and logic intertwine to make us who we are. As we push the boundaries of artificial intelligence, we might just uncover more about the mysteries of our minds, blurring the line between the human and the machine in ways that inspire awe and wonder.


We can now see the path toward Artificial General Intelligence much more clearly and it is guided by millions of years of human evolution and human thought. Our human brain contemplating a nonhuman brain, from general to specific, from chaos to clarity. One brain two hemispheres.







