
Topic Modeling with BERT

The key steps in BERTopic modeling are as follows.

  • Embed the documents of the corpus with a sentence-embedding model.
  • Reduce the dimensionality of the embeddings using UMAP.
  • Cluster the reduced embeddings using HDBSCAN.
  • Use c-TF-IDF (class-based TF-IDF) to extract keywords for each cluster, based on their frequency within the cluster and their IDF across clusters.
  • MMR (Maximal Marginal Relevance): re-rank the candidate keywords so that the words chosen to represent a topic are relevant without being redundant.
  • Use the intertopic distance map to see how the topics relate to one another.
  • Use a similarity matrix (heatmap) and a dendrogram (hierarchical map) to visualize the topics and their keywords.
  • Track the traction of each topic over time. Some topics may become irrelevant, while for others traction may be increasing or decreasing.
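
To make the c-TF-IDF step above more concrete, here is a minimal sketch in plain Python. It assumes the weighting `tf * log(1 + A / f)`, where `tf` is a word's share of all words in a cluster, `f` is the word's count across all clusters, and `A` is the average number of words per cluster. The function names `c_tf_idf` and `top_words`, the toy documents, the cluster labels, and the whitespace tokenization are all illustrative assumptions, not BERTopic's own code.

```python
import math
from collections import Counter


def c_tf_idf(docs, labels):
    """Return {cluster: {word: score}} using a class-based TF-IDF weighting."""
    # Treat all documents of a cluster as one big document (bag of words).
    cluster_counts = {}
    for doc, label in zip(docs, labels):
        cluster_counts.setdefault(label, Counter()).update(doc.lower().split())

    # f: how often each word occurs across all clusters.
    total_counts = Counter()
    for counts in cluster_counts.values():
        total_counts.update(counts)

    # A: average number of words per cluster.
    avg_words = sum(total_counts.values()) / len(cluster_counts)

    scores = {}
    for label, counts in cluster_counts.items():
        n_words = sum(counts.values())
        scores[label] = {
            word: (count / n_words) * math.log(1 + avg_words / total_counts[word])
            for word, count in counts.items()
        }
    return scores


def top_words(scores, label, n=3):
    """Return the n highest-scoring words for one cluster."""
    ranked = sorted(scores[label].items(), key=lambda kv: kv[1], reverse=True)
    return [word for word, _ in ranked[:n]]


# Toy corpus; the labels stand in for cluster assignments (e.g. from HDBSCAN).
docs = [
    "cat chased mouse today",
    "cat slept today",
    "stock market fell today",
    "stock prices rose today",
]
labels = [0, 0, 1, 1]

scores = c_tf_idf(docs, labels)
print(top_words(scores, 0, n=1))  # ['cat']
print(top_words(scores, 1, n=1))  # ['stock']
```

The word "today" appears as often as "cat" in cluster 0, but because it is spread across both clusters its score is lower. That is the effect the IDF term is meant to have.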

Installation

# Install BERTopic (with sentence-transformers) from PyPI:

pip install bertopic

# To use other embedding backends, install one or more of the optional extras:

pip install "bertopic[flair,gensim,spacy,use]"

# Topic modeling with images
pip install "bertopic[vision]"

Supported Topic Modelling Techniques

BERTopic supports a wide range of topic modeling techniques, listed below.

  • Guided
  • Supervised
  • Semi-supervised
  • Manual
  • Multi-topic distributions
  • Hierarchical
  • Class-based
  • Dynamic
  • Online/Incremental
  • Multimodal
  • Multi-aspect
  • Text Generation/LLM
  • Merge Models

Related Resources

Tools in BERTopic

[Figure: Tools in BERTopic]

Best Topic Modeling Tool in BERTopic

[Figure: Best tools in BERTopic]

BERTopic Model Building

[Figure: BERTopic model building]

Application

  • arXiv dataset (1.7M+ STEM papers)
  • Images/photographs
  • Historical Documents
  • News articles
