An AI Model Just Compressed An Entire Encyclopedia Into A Single, High-Resolution Image.

A mind-blowing breakthrough from China’s DeepSeek: they just unleashed DeepSeek-OCR, an electrifying 3-billion-parameter vision-language model that obliterates the boundary between text and vision with jaw-dropping optical compression. It isn’t just an OCR upgrade; it’s a seismic paradigm shift in how machines perceive and conquer data. DeepSeek-OCR compresses long documents into vision tokens with a staggering 97% decoding precision at a 10x compression ratio. That’s thousands of textual tokens distilled into a mere 100 vision tokens per page, beating GOT-OCR2.0 (256 tokens) and MinerU2.0 (roughly 6,000 tokens) on OmniDocBench while using up to 60x fewer tokens.

This article and academic paper are sponsored by Read Multiplex members, who subscribe here to support my work.

Link: https://readmultiplex.com/join-us-become-a-member/

It is also sponsored by many who have donated a “cup of Coffee”.

Link: https://ko-fi.com/brianroemmele

It’s like compressing an entire encyclopedia into a single, high-definition snapshot. At the core of this is the DeepEncoder, a turbocharged fusion of the SAM (Segment Anything Model) and CLIP (Contrastive Language–Image Pretraining) backbones, supercharged by a 16x convolutional compressor.

This maintains high-resolution perception while slashing activation memory, transforming thousands of image patches into a lean 100 to 200 vision tokens. Get ready for the multi-resolution “Gundam” mode, scaling from 512×512 all the way to a monstrous 1280×1280 pixels!

It blends local tiles with a global view, tackling invoices, blueprints, and newspapers with zero retraining. It’s a shape-shifting computational marvel, mirroring the human eye’s dynamic focus with pixel-perfect precision.
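
To see where numbers like “100 vision tokens per page” come from, here is a back-of-the-envelope Python sketch, assuming the 16×16-pixel patchifier and the 16x convolutional compressor described above. This is illustrative arithmetic only, not the model’s actual code path:

  # Illustrative arithmetic only: 16x16-pixel patches, then a 16x token
  # compressor, as described above. Not the model's actual pipeline.
  PATCH = 16      # pixels per patch side (ViT-style patchifier)
  COMPRESS = 16   # DeepEncoder's 16x convolutional compressor

  def vision_tokens(side_px: int) -> int:
      patches = (side_px // PATCH) ** 2  # raw image patches
      return patches // COMPRESS         # tokens after compression

  for side in (512, 640, 1024, 1280):
      print(f"{side}x{side} -> {vision_tokens(side)} vision tokens")
  # 512x512   ->  64 vision tokens
  # 640x640   -> 100 vision tokens  (the ~100-per-page figure above)
  # 1024x1024 -> 256 vision tokens
  # 1280x1280 -> 400 vision tokens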

The training data?

Supplied by the Chinese government for free and not available to any US company.

Do you understand now why I have said the US needs a Manhattan Project for AI training data? Do you hear me now? Oh, still no? I’ll continue.

Over 30 million PDF pages across roughly 100 languages, spiked with 10 million natural-scene OCR samples, 10 million charts, 5 million chemical formulas, and 1 million geometry problems. This model doesn’t just read... it devours scientific diagrams and equations, turning raw data into multidimensional knowledge.

Throughput? I am floored: over 200,000 pages per day on a single NVIDIA A100 GPU! This scalability is a game-changer, turning LLM data generation into a firehose of innovation, democratizing access to terabytes of insight for every AI pioneer out there.

This optical compression is the holy grail for LLM long-context woes. Imagine a million-token document shrunk into a 100,000-token visual map—DeepSeek-OCR reimagines context as a perceptual playground, paving the way for a GPT-5 that processes documents like a supercharged visual cortex!

The two-stage architecture is pure engineering poetry: DeepEncoder generates tokens, while a Mixture-of-Experts decoder spits out structured Markdown with multilingual flair. It’s a universal translator for the visual-textual multiverse, optimized for global domination!

Benchmarks? DeepSeek-OCR obliterates GOT-OCR2.0 and MinerU2.0, holding 60% accuracy at 20x compression! This opens a portal to applications once thought impossible—pushing the boundaries of computational physics into uncharted territory!

Live document analysis, streaming OCR for accessibility, and real-time translation with visual context are now economically viable, thanks to this compression breakthrough. It’s a real-time revolution, ready to transform our digital ecosystem!

This paper is a blueprint for the future, proving text can be visually compressed 10x for long-term memory and reasoning. It’s a clarion call for a new AI era where perception beats text alone, and models like Grok see documents in a single glance.

Clarifying the Concept in DeepSeek-OCR

DeepSeek-OCR isn’t about creating or modifying images to store data inside them (like hiding files or text within a picture file). Instead, its “optical compression” (or “contexts optical compression”) refers to an efficient way the AI model processes and represents visual information from images or documents during OCR (optical character recognition).

Here’s a breakdown:

  • What it actually does: The model takes an input image (e.g., a scanned document, PDF page, chart, or photo with text) and compresses its visual details into a small number of “vision tokens”—internal representations that the AI uses to understand and extract content. For example, it can boil down an entire document page into about 100 vision tokens while keeping 97% accuracy in decoding text, layouts, formulas, or even handwritten notes. This compression happens inside the model’s processing pipeline to make it faster and more efficient (e.g., using fewer computational resources like GPU memory), especially for large-scale tasks like processing 200,000+ pages per day on a single GPU.
  • Why it’s not about storing data in images: These vision tokens aren’t saved as part of a new image file; they’re temporary data structures used by the AI to generate outputs like extracted text, Markdown-formatted documents, or descriptions. The model outputs readable text (e.g., converting a blurry scan into clean, editable content), not a modified image. It’s more like a super-efficient scanner/reader than a data-hiding tool. The compression is for AI inference speed and token efficiency in language models, not for creating storage mediums.
  • Where confusion might come from: Terms like “10x compression” and “100 vision tokens per page” sound like data storage (e.g., shrinking files), but in this context, it’s about reducing the input size for AI processing without losing key details. If you’re running DeepSeek-OCR locally (as in the installation guide), you input an existing image and get text output—nothing gets embedded or stored in a new image.
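
As a quick sanity check on those numbers, here is the arithmetic behind the headline ratios. The text-token count is an assumption for illustration; the vision-token count and accuracy figures are the ones reported above:

  # Illustrative arithmetic only: "10x compression" means ~10 text tokens
  # decoded per vision token, not a smaller file on disk.
  text_tokens_per_page = 1000   # assumption: a dense page of text
  vision_tokens_per_page = 100  # reported figure for DeepSeek-OCR

  ratio = text_tokens_per_page / vision_tokens_per_page
  print(f"{ratio:.0f}x compression")  # 10x -> ~97% decoding precision (reported)
  # At ~20x (e.g., 2000 text tokens into the same 100 vision tokens),
  # reported accuracy drops to ~60%.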

The paper and code: https://github.com/deepseek-ai/DeepSeek-OCR

Installing DeepSeek-OCR Locally

System Requirements

Normally I would recommend LM Studio for beginners to install a model like this; however, LM Studio has not been able to run the first generation of DeepSeek-OCR and may not gain this ability for a while, even with new versions. So we have to take a different, more complex path.

Before installing DeepSeek-OCR, ensure your computer meets these minimum requirements. DeepSeek-OCR is a 3-billion-parameter vision-language model designed for optical character recognition (OCR) and document processing. It performs best with GPU acceleration, but can run on CPU (though much slower, especially for large files).

  • Operating System:
      • Windows 10 or 11 (64-bit).
      • macOS 12 (Monterey) or later (Apple Silicon recommended for better performance; Intel Macs will use CPU only).
      • Linux (e.g., Ubuntu 20.04 or later, or equivalent distributions like Fedora or Debian).
  • Hardware:
      • CPU: Modern multi-core processor (e.g., Intel i5/i7 or AMD equivalent; Apple M1 or later for Macs).
      • RAM: At least 16 GB (32 GB recommended for processing large documents or images).
      • Storage: At least 10 GB free space (for the model files, dependencies, and temporary files).
      • GPU (optional but highly recommended for speed): NVIDIA GPU with at least 8 GB VRAM (e.g., RTX 3060 or better) and CUDA compute capability 6.0 or higher. Not available on Macs without external NVIDIA hardware.
  • Software:
      • Python: Version 3.12 or later (installed via Conda for ease).
      • Conda: Version 23.x or later (a package manager that simplifies environment setup; we’ll install it).
      • Git: For downloading the code repository.
      • Internet Connection: Required during installation for downloading packages and the model (offline use is possible afterward).
      • For GPU users: CUDA Toolkit 11.8 or later (installed separately on Windows/Linux; not supported on Macs).

Note: On Macs without NVIDIA GPUs, you’ll use CPU or Apple’s Metal Performance Shaders (MPS) for acceleration on Apple Silicon. Performance on CPU will be slower (e.g., minutes per page vs. seconds on GPU). If you have an older computer, test with small images first.
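
Once PyTorch is installed (Step 7 of your OS section below), you can confirm which of these paths your machine will take with a short snippet like this:

  # Quick device check: run after installing PyTorch (Step 7 below).
  import torch

  if torch.cuda.is_available():             # NVIDIA GPU (Windows/Linux)
      device = 'cuda'
  elif torch.backends.mps.is_available():   # Apple Silicon GPU
      device = 'mps'
  else:                                     # fallback: CPU (slowest)
      device = 'cpu'

  print(f'DeepSeek-OCR will run on: {device}')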

Introduction

DeepSeek-OCR is an advanced AI model developed by DeepSeek-AI, released on October 20, 2025. It specializes in “contexts optical compression,” meaning it can efficiently process images, documents, and PDFs—extracting text, layouts, and even formulas with high accuracy while using fewer computational resources than similar models. For example, it can compress a full document page into just 100 “vision tokens” (a way AI models represent visual data) while achieving 97% accuracy on benchmarks.

Installing DeepSeek-OCR locally on your computer allows you to run it offline, ensuring privacy (no data sent to cloud servers) and customization. This is ideal for developers, researchers, or anyone handling sensitive documents like scanned books, invoices, or charts. Unlike cloud-based OCR tools, a local installation gives you full control, but it requires some setup.

This guide assumes you have only basic computer knowledge—no prior experience with programming or AI models. We’ll explain every term and step in simple language. “Command line” or “terminal” refers to a text-based interface where you type commands (like a black window for entering instructions). We’ll walk you through opening it and copying/pasting commands.

The installation involves:

  1. Installing basic tools (Conda and Git).
  2. Downloading the code and model.
  3. Setting up a “virtual environment” (an isolated space for the software to avoid conflicts with other programs).
  4. Installing dependencies (required libraries like PyTorch for AI computations).
  5. Testing the installation.

We’ll cover separate instructions for Windows, macOS, and Linux, as some steps differ by OS. If you encounter errors, refer to the “Debugging” section at the end.

Installation on Windows

Windows users: We’ll use the Command Prompt or PowerShell for commands. If you have an NVIDIA GPU, we’ll enable CUDA for faster performance.

Step 1: Install Conda

Conda is a free tool that manages Python and packages, making installation easier.

  • Go to https://docs.conda.io/en/latest/miniconda.html in your web browser.
  • Scroll to “Miniconda3 Windows 64-bit” and download the installer (it’s an .exe file).
  • Double-click the downloaded file to run it.
  • Follow the on-screen prompts: Accept the license, choose “Just Me” installation, and check “Add Miniconda3 to my PATH environment variable” (this lets you use Conda from the command line).
  • Click “Install” and wait (it may take a few minutes).
  • Once done, close and reopen any open windows.

Step 2: Install Git

Git is a tool for downloading code from online repositories.

  • Go to https://git-scm.com/downloads.
  • Download the Windows installer.
  • Run the .exe file.
  • Follow prompts: Use default options, but ensure “Git from the command line and also from 3rd-party software” is selected.
  • Finish the installation.

Step 3: Open Command Prompt

  • Press Windows key + S, type “cmd”, and open “Command Prompt”.

Step 4: Create a Project Folder

  • In Command Prompt, type: mkdir deepseek-ocr and press Enter (this creates a folder).
  • Type: cd deepseek-ocr and press Enter (navigates into the folder).

Step 5: Clone the Repository

  • Type: git clone https://github.com/deepseek-ai/DeepSeek-OCR.git and press Enter.
  • Wait for it to download (may take a minute).

Step 6: Create and Activate Conda Environment

  • Type: conda create -n deepseek-ocr python=3.12 -y and press Enter (creates an isolated Python space).
  • Type: conda activate deepseek-ocr and press Enter (activates it; you’ll see “(deepseek-ocr)” in the prompt).

Step 7: Install Dependencies

  • If you have an NVIDIA GPU:
  • First, install CUDA if it isn’t already installed: Download it from https://developer.nvidia.com/cuda-11-8-download-archive (select Windows and your version, then follow the instructions).
  • Then, type: pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu118 and press Enter.
  • If no GPU (CPU only):
  • Type: pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cpu and press Enter.
  • Install other packages: Type pip install transformers==4.46.3 tokenizers==0.20.3 einops addict easydict and press Enter.
  • If using GPU, install Flash Attention: pip install flash-attn==2.7.3 --no-build-isolation. Skip this for CPU (it won’t work and isn’t needed).
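
Before moving on, it’s worth a quick sanity check that PyTorch installed correctly and (if you have one) can see your GPU. Save this as check_torch.py (any name works) and run python check_torch.py:

  # Sanity check for Step 7: verify PyTorch and (optionally) CUDA.
  import torch

  print('torch:', torch.__version__)                  # expect 2.6.0
  print('CUDA available:', torch.cuda.is_available())
  if torch.cuda.is_available():
      print('GPU:', torch.cuda.get_device_name(0))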

Step 8: Download and Test the Model

  • Navigate to the repo: cd DeepSeek-OCR\DeepSeek-OCR-hf (use backslashes on Windows).
  • Create a test script: Open Notepad, paste the code below, save as test_ocr.py in the current folder.
  from transformers import AutoModel, AutoTokenizer
  import torch

  model_name = 'deepseek-ai/DeepSeek-OCR'
  tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
  model = AutoModel.from_pretrained(model_name, trust_remote_code=True, use_safetensors=True)

  # Pick the best available device: CUDA GPU if present, otherwise CPU.
  device = 'cuda' if torch.cuda.is_available() else 'cpu'
  dtype = torch.bfloat16 if device == 'cuda' else torch.float32  # BF16 needs a GPU
  model = model.eval().to(device).to(dtype)

  prompt = "<image>\n<|grounding|>OCR this image."
  image_file = 'path/to/your/image.jpg'  # Replace with a real image path
  output_path = 'output'  # Folder for results

  res = model.infer(tokenizer, prompt=prompt, image_file=image_file, output_path=output_path, base_size=1024, image_size=640, crop_mode=True, save_results=True, test_compress=True)
  print(res)
  • Replace path/to/your/image.jpg with a real image file path (e.g., C:\Users\YourName\Downloads\test.jpg).
  • Run: python test_ocr.py and press Enter. It should download the model (several GB) and output OCR results.

Installation on macOS

macOS users: We’ll use Terminal for commands. Apple Silicon Macs can use MPS for speedup; Intel Macs use CPU.

Step 1: Install Conda

  • Open Safari or Chrome, go to https://docs.conda.io/en/latest/miniconda.html.
  • Download “Miniconda3 macOS Apple M1 64-bit bash” (for Apple Silicon) or “Miniconda3 macOS Intel x86 64-bit bash” (for Intel).
  • Open Terminal (press Command + Space, type “Terminal”).
  • Navigate to Downloads: cd Downloads.
  • Install: bash Miniconda3-latest-MacOSX-arm64.sh (or the file name you downloaded). Follow prompts (type “yes” when asked).

Step 2: Install Git

  • Git is usually pre-installed. Check: Type git --version in Terminal. If not, download from https://git-scm.com/downloads and install.

Step 3: Open Terminal and Create Project Folder

  • In Terminal, type: mkdir deepseek-ocr and Enter.
  • cd deepseek-ocr.

Step 4: Clone the Repository

  • git clone https://github.com/deepseek-ai/DeepSeek-OCR.git.

Step 5: Create and Activate Conda Environment

  • conda create -n deepseek-ocr python=3.12 -y.
  • conda activate deepseek-ocr.

Step 6: Install Dependencies

  • For Apple Silicon (MPS): pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0.
  • For Intel Mac (CPU): Same as above (PyTorch auto-detects).
  • Other packages: pip install transformers==4.46.3 tokenizers==0.20.3 einops addict easydict.
  • Skip Flash Attention (not supported on Mac).
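
As on Windows, a quick sanity check is worthwhile before moving on; on a Mac it looks like this (MPS reports True only on Apple Silicon):

  # Sanity check for Step 6: confirm PyTorch and the MPS backend on macOS.
  import torch

  print('torch:', torch.__version__)  # expect 2.6.0
  print('MPS available:', torch.backends.mps.is_available())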

Step 7: Download and Test the Model

  • cd DeepSeek-OCR/DeepSeek-OCR-hf.
  • Create test_ocr.py using TextEdit or nano (in Terminal: nano test_ocr.py, paste code, Ctrl+O to save, Ctrl+X to exit).
  • Use this modified code (for MPS/CPU):
  from transformers import AutoModel, AutoTokenizer
  import torch

  model_name = 'deepseek-ai/DeepSeek-OCR'
  tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
  model = AutoModel.from_pretrained(model_name, trust_remote_code=True, use_safetensors=True)

  device = 'mps' if torch.backends.mps.is_available() else 'cpu'
  dtype = torch.float32  # BF16 may not work on MPS/CPU
  model = model.eval().to(device).to(dtype)

  prompt = "<image>\n<|grounding|>OCR this image."
  image_file = '/path/to/your/image.jpg'  # e.g., /Users/YourName/Downloads/test.jpg
  output_path = 'output'

  res = model.infer(tokenizer, prompt=prompt, image_file=image_file, output_path=output_path, base_size=1024, image_size=640, crop_mode=True, save_results=True, test_compress=True)
  print(res)
  • Run: python test_ocr.py.

Installation on Linux

Linux users: Use your distribution’s terminal. We’ll assume Ubuntu; adjust for others (e.g., use dnf instead of apt on Fedora).

Step 1: Install Conda

  • Open browser, go to https://docs.conda.io/en/latest/miniconda.html.
  • Download “Miniconda3 Linux 64-bit”.
  • Open Terminal (Ctrl+Alt+T on Ubuntu).
  • cd Downloads.
  • bash Miniconda3-latest-Linux-x86_64.sh. Follow prompts.

Step 2: Install Git

  • sudo apt update && sudo apt install git -y (enter password if prompted).

Step 3: Create Project Folder

  • mkdir deepseek-ocr.
  • cd deepseek-ocr.

Step 4: Clone the Repository

  • git clone https://github.com/deepseek-ai/DeepSeek-OCR.git.

Step 5: Create and Activate Conda Environment

  • conda create -n deepseek-ocr python=3.12 -y.
  • conda activate deepseek-ocr.

Step 6: Install Dependencies

  • If GPU: Install CUDA first (https://developer.nvidia.com/cuda-11-8-download-archive, follow Linux instructions).
  • GPU: pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu118.
  • CPU: pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cpu.
  • Others: pip install transformers==4.46.3 tokenizers==0.20.3 einops addict easydict.
  • GPU only: pip install flash-attn==2.7.3 --no-build-isolation.

Step 7: Download and Test the Model

  • cd DeepSeek-OCR/DeepSeek-OCR-hf.
  • Create test_ocr.py with nano: nano test_ocr.py, then paste the GPU/CPU code from the Windows section (the script auto-selects 'cuda' or 'cpu').
  • Run: python test_ocr.py.

Using DeepSeek-OCR

Once installed, DeepSeek-OCR can be used for:

  • AI Model Training: Generate training data for language models at scale (the paper cites 200,000+ pages per day on a single A100).
  • Document Conversion: Turn scanned PDFs or images into editable Markdown or text (e.g., prompt: “<image>\n<|grounding|>Convert the document to markdown.”).
  • Text Extraction: OCR handwritten notes, receipts, or books in roughly 100 languages.
  • Chart/Formula Parsing: Analyze diagrams or math equations (e.g., “<image>\nParse the figure.”).
  • Image Description: Generate detailed captions (e.g., “<image>\nDescribe this image in detail.”).
  • Batch Processing: Run on folders of files for automation, like archiving old documents (see the sketch below).
  • Integration: Embed in scripts for apps, like a personal document scanner.

For PDFs, use the vLLM scripts in the repo (GPU only; edit config.py for paths).
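
For the batch-processing use case above, here is a minimal sketch: it loads the model once (exactly as in the test scripts) and then loops model.infer over every image in a folder. The “scans” folder, the *.jpg glob, and the prompt are assumptions to adapt to your own files:

  # Minimal batch-OCR sketch. Assumptions: a local "scans" folder of .jpg
  # pages; adapt paths, glob, and prompt to your needs.
  from pathlib import Path

  import torch
  from transformers import AutoModel, AutoTokenizer

  model_name = 'deepseek-ai/DeepSeek-OCR'
  tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
  model = AutoModel.from_pretrained(model_name, trust_remote_code=True, use_safetensors=True)

  # Use 'mps' instead of 'cuda' on Apple Silicon if you prefer.
  device = 'cuda' if torch.cuda.is_available() else 'cpu'
  dtype = torch.bfloat16 if device == 'cuda' else torch.float32
  model = model.eval().to(device).to(dtype)

  prompt = "<image>\n<|grounding|>Convert the document to markdown."
  input_dir = Path('scans')    # assumption: your folder of scanned pages
  output_dir = Path('output')
  output_dir.mkdir(exist_ok=True)

  for image_file in sorted(input_dir.glob('*.jpg')):
      # One results subfolder per page keeps the Markdown outputs separated.
      page_out = output_dir / image_file.stem
      page_out.mkdir(exist_ok=True)
      model.infer(tokenizer, prompt=prompt, image_file=str(image_file),
                  output_path=str(page_out), base_size=1024, image_size=640,
                  crop_mode=True, save_results=True)
      print(f'Done: {image_file.name}')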

Debugging and Troubleshooting

If something goes wrong, here are common issues and fixes:

  • Command Not Found (e.g., conda, git): Ensure they’re installed and added to PATH. Restart your terminal/Command Prompt.
  • Pip Install Errors: Check internet; try pip install --upgrade pip. If wheel fails, ensure Python version matches (3.12).
  • CUDA Not Available: Verify NVIDIA drivers/CUDA installed (run nvidia-smi on Windows/Linux). For CPU, remove .cuda() from code.
  • Flash-Attn Install Fails: Skip it for CPU/MPS; load the model without the '_attn_implementation' argument (see the sketch after this list).
  • Model Download Stuck: Hugging Face may throttle; try later or use a VPN. Model is ~6-8 GB.
  • Out of Memory: Reduce image_size (e.g., to 512) or use smaller mode (tiny/small).
  • Whitespace-Only Output: Change prompt (e.g., from grounding to “Free OCR.”).
  • Slow Performance: Normal on CPU; use smaller images or get a GPU.
  • Permission Denied: Run as admin (Windows: right-click Command Prompt > Run as Admin; Linux: sudo).
  • For more, check the GitHub issues: https://github.com/deepseek-ai/DeepSeek-OCR/issues.
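
For the flash-attn failure above, here is a sketch of the workaround. The repo’s GPU examples pass _attn_implementation='flash_attention_2'; simply omitting that argument (as the test scripts in this guide already do) falls back to the standard attention path, which works on CPU and MPS:

  # Workaround sketch: load without flash-attn. Omitting the
  # _attn_implementation argument uses standard attention, which runs on
  # CPU/MPS (slower, but no flash-attn build required).
  from transformers import AutoModel

  model = AutoModel.from_pretrained(
      'deepseek-ai/DeepSeek-OCR',
      trust_remote_code=True,
      use_safetensors=True,
      # note: no _attn_implementation='flash_attention_2' here
  )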

I’ll have more insights as I do more testing on this platform. I am experimenting with it now on 1870-1970 offline data that I have digitized.

I will offer more of my insight soon to ReadMultiplex.com members below. If you are not a member, join us. This supports my work. You can also just buy me a Coffee to keep me awake. Either way, thank you!
