Skip to main content
  1. Data Science Blog/

Exploring Graphics Processing Units (GPUs)

·713 words·4 mins· loading · ·
AI Hardware & Infrastructure IT Infrastructure Artificial Intelligence (AI) Machine Learning (ML) AI Hardware

On This Page

Table of Contents
Share with :

Exploring Graphics Processing Units (GPUs)

Exploring Graphics Processing Units (GPUs)
#

Overall Computational Power of GPUs
#

  • โšก Incredible Calculation Speed: Modern GPUs can perform tens of trillions of calculations per second (e.g., 36 trillion for Cyberpunk 2077).
  • ๐ŸŒ Human Comparison: Achieving this manually would require the equivalent of over 4,400 Earths full of people, each doing one calculation every second.

GPU vs. CPU
#

  • ๐Ÿšข Cargo Ship vs. Airplane Analogy: GPUs are like cargo ships (massive capacity, slower), and CPUs are like jets (fast, versatile, fewer tasks at once).
  • โš–๏ธ Different Strengths: CPUs handle operating systems, flexible tasks, and fewer but more complex instructions. GPUs excel at huge amounts of simple, repetitive calculations.
  • ๐Ÿ”€ Parallel vs. General Purpose: GPUs are less flexible but highly parallel, CPUs are more general-purpose and can run a wide variety of programs and instructions.

GPU Architecture & Components (GA102 Example)
#

  • ๐Ÿ’ฝ Central GPU Die (GA102): A large chip with 28.3 billion transistors organized into Graphics Processing Clusters (GPCs), Streaming Multiprocessors (SMs), and cores.
  • ๐Ÿ—๏ธ Hierarchical Structure: GA102 has 7 GPCs โ†’ 12 SMs per GPC โ†’ 4 Warps per SM โ†’ 32 CUDA Per Wrap and 4 Tensor Per Warmp and 1 Ray Tracing Per GPC.
  • ๐Ÿ”ข Types of Cores:
    • โš™๏ธ CUDA Cores: Handle basic arithmetic (addition, multiplication) most commonly used in gaming.
    • ๐Ÿงฉ Tensor Cores: Perform massive matrix calculations for AI and neural networks.
    • ๐Ÿ’Ž Ray Tracing Cores: Specialized for lighting and reflection calculations in real-time graphics.

Manufacturing & Binning
#

  • ๐Ÿ”ง Shared Chip Design: Different GPU models (e.g., 3080, 3090, 3090 Ti) share the same GA102 design.
  • ๐Ÿ•ณ๏ธ Defects & Binning: Manufacturing imperfections result in some cores being disabled. This leads to different โ€œtiersโ€ of the same GPU architecture.

CUDA Core Internals
#

  • โž• Simple Calculator Design: Each CUDA core is basically a tiny calculator that does fused multiply-add (FMA) and a few other operations.
  • ๐Ÿ’ป Common Operations: Primarily handles 32-bit floating-point and integer arithmetic. More complex math (division, trignometry) is done by fewer, special function units.

Memory Systems: GDDR6X & GDDR7
#

  • ๐Ÿ’พ Graphics Memory: GDDR6X chips (by Micron) feed terabytes of data per second into the GPUโ€™s thousands of cores.
  • ๐Ÿš€ High Bandwidth: GPU memory operates at huge bandwidths (over 1 terabyte/s) compared to typical CPU memory (~64 GB/s).
  • ๐Ÿ”ข Beyond Binary: GDDR6X and GDDR7 use multiple voltage levels (PAM-4 and PAM-3) to encode more data per signal, increasing transfer rates.
  • ๐Ÿ—๏ธ Future Memory Tech: Micron also develops HBM (High Bandwidth Memory) for AI accelerators, stacking memory chips in 3D, greatly boosting capacity and speed while reducing power.

Parallel Computing Concepts (SIMD & SIMT)
#

  • โ™ป๏ธ Embarrassingly Parallel: Tasks like graphics rendering, Bitcoin mining, or AI training are easily split into millions of independent calculations.
  • ๐Ÿ“œ Single Instruction Multiple Data (SIMD): Apply the same instruction to many data points at onceโ€”perfect for transforming millions of vertices in a 3D scene.
  • ๐Ÿ”“ From SIMD to SIMT: Newer GPUs use Single Instruction Multiple Threads (SIMT), allowing threads to progress independently and handle complex branching more efficiently.

Thread & Warp Organization
#

  • ๐Ÿ“ฆ Thread Hierarchy: Threads โ†’ Warps (groups of 32 threads) โ†’ Thread Blocks โ†’ Grids.
  • ๐ŸŽ›๏ธ Gigathread Engine: Manages the allocation of thread blocks to streaming multiprocessors, optimizing parallel processing.

Practical Applications
#

  • ๐ŸŽฎ Video Games: GPUs transform coordinates, apply textures, shading, and handle complex rendering pipelines. Millions of identical operations on different vertices and pixels are done in parallel.
  • โ‚ฟ Bitcoin Mining: GPUs can run the SHA-256 hashing algorithm in parallel many millions of times per second. Though now replaced by ASIC miners, GPUs were initially very efficient at this.
  • ๐Ÿค– AI & Neural Networks: Tensor cores accelerate matrix multiplications critical for training neural nets and powering generative AI.
  • ๐Ÿ’ก Ray Tracing: Specialized cores handle ray tracing calculations for realistic lighting and reflections in real-time graphics.

Micronโ€™s Role & Advancements
#

  • ๐Ÿญ Micron Memory Chips: GDDR6X and future GDDR7 designed by Micron power high-speed data transfers on GPUs.
  • ๐Ÿ”ฎ Innovations in Memory: High Bandwidth Memory (HBM) for AI chips stacks DRAM vertically, creating high-capacity, high-throughput solutions at lower energy costs.
  • ๐Ÿ“š Technological Marvel: Modern graphics cards are a blend of advanced materials, clever architectures, and innovative manufacturing. They enable astonishing levels of visual realism, parallel computation, and AI capabilities.

How do Graphics Cards Work? Exploring GPU Architecture

Dr. Hari Thapliyaal's avatar

Dr. Hari Thapliyaal

Dr. Hari Thapliyal is a seasoned professional and prolific blogger with a multifaceted background that spans the realms of Data Science, Project Management, and Advait-Vedanta Philosophy. Holding a Doctorate in AI/NLP from SSBM (Geneva, Switzerland), Hari has earned Master's degrees in Computers, Business Management, Data Science, and Economics, reflecting his dedication to continuous learning and a diverse skill set. With over three decades of experience in management and leadership, Hari has proven expertise in training, consulting, and coaching within the technology sector. His extensive 16+ years in all phases of software product development are complemented by a decade-long focus on course design, training, coaching, and consulting in Project Management. In the dynamic field of Data Science, Hari stands out with more than three years of hands-on experience in software development, training course development, training, and mentoring professionals. His areas of specialization include Data Science, AI, Computer Vision, NLP, complex machine learning algorithms, statistical modeling, pattern identification, and extraction of valuable insights. Hari's professional journey showcases his diverse experience in planning and executing multiple types of projects. He excels in driving stakeholders to identify and resolve business problems, consistently delivering excellent results. Beyond the professional sphere, Hari finds solace in long meditation, often seeking secluded places or immersing himself in the embrace of nature.

Comments:

Share with :

Related

Roadmap to Reality
·990 words·5 mins· loading
Philosophy & Cognitive Science Interdisciplinary Topics Scientific Journey Self-Discovery Personal Growth Cosmic Perspective Human Evolution Technology Biology Neuroscience
Roadmap to Reality # A Scientific Journey to Know the Universe โ€” and the Self # ๐ŸŒฑ Introduction: The โ€ฆ
From Being Hacked to Being Reborn: How I Rebuilt My LinkedIn Identity in 48 Hours
·893 words·5 mins· loading
Personal Branding Cybersecurity Technology Trends & Future Personal Branding LinkedIn Profile Professional Identity Cybersecurity Online Presence Digital Identity Online Branding
๐Ÿ’” From Being Hacked to Being Reborn: How I Rebuilt My LinkedIn Identity in 48 Hours # “In โ€ฆ
Exploring CSS Frameworks - A Collection of Lightweight, Responsive, and Themeable Alternatives
·1378 words·7 mins· loading
Web Development Frontend Development Design Systems CSS Frameworks Lightweight CSS Responsive CSS Themeable CSS CSS Utilities Utility-First CSS
Exploring CSS Frameworks # There are many CSS frameworks and approaches you can use besides โ€ฆ
Dimensions of Software Architecture: Balancing Concerns
·873 words·5 mins· loading
Software Architecture Software Architecture Technical Debt Maintainability Scalability Performance
Dimensions of Software Architecture # Call these “Architectural Concern Categories” or โ€ฆ
Understanding `async`, `await`, and Concurrency in Python
·616 words·3 mins· loading
Python Asyncio Concurrency Synchronous Programming Asynchronous Programming
Understanding async, await, and Concurrency # Understanding async, await, and Concurrency in Python โ€ฆ