Topic Modeling with BERT

Natural Language Processing (NLP) Data Analysis & Visualization AI/ML Models Language Models (LLMs) NLP Applications Text Analysis Natural Language Processing (NLP) Machine Learning Text Mining

The key steps in BERTopic modelling are as follows.

  • Embed each document with a sentence-embedding model (sentence-transformers by default).
  • Reduce the dimensionality of the embeddings using UMAP.
  • Cluster the reduced embeddings using HDBSCAN.
  • Use c-TF-IDF (class-based TF-IDF) to extract the keywords of each cluster, weighting how often a word occurs in the cluster against how often it occurs across all clusters.
  • Optionally, apply MMR (Maximal Marginal Relevance) to choose keywords that represent the topic well without repeating each other.
  • Visualize how far apart the topics are with an intertopic distance map.
  • Use a similarity matrix (heatmap) and a dendrogram (hierarchical map) to visualize the topics and their keywords.
  • Track topics over time: some may become irrelevant, while others gain or lose traction.
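The c-TF-IDF step in the list above can be sketched in plain Python. This is a minimal sketch, assuming the weighting described in the BERTopic documentation, W(x, c) = tf(x, c) · log(1 + A / f(x)), where tf(x, c) is the L1-normalized frequency of word x in cluster c, f(x) is the frequency of x across all clusters, and A is the average number of words per cluster. The function names, the whitespace tokenizer, and the toy clusters are hypothetical; BERTopic itself builds the word counts with scikit-learn's CountVectorizer. The cosine similarity matrix at the end is one simple way, assumed here, to obtain topic-by-topic scores of the kind a similarity heatmap displays.

```python
import math
from collections import Counter

# Hypothetical helper names and toy data, for illustration only.


def c_tf_idf(docs_per_topic):
    """Class-based TF-IDF: one weight per (word, topic) pair.

    docs_per_topic maps a topic id to the documents assigned to that cluster;
    all documents of a topic are treated as one large "class document".
    """
    # Word counts per topic (naive whitespace tokenization, lower-cased).
    class_counts = {
        topic: Counter(word for doc in docs for word in doc.lower().split())
        for topic, docs in docs_per_topic.items()
    }
    # f(x): how often each word occurs across all topics.
    total_freq = Counter()
    for counts in class_counts.values():
        total_freq.update(counts)
    # A: average number of words per topic.
    avg_words = sum(total_freq.values()) / len(class_counts)

    weights = {}
    for topic, counts in class_counts.items():
        n_words = sum(counts.values())
        weights[topic] = {
            # L1-normalized term frequency times log(1 + A / f(x)).
            word: (count / n_words) * math.log(1 + avg_words / total_freq[word])
            for word, count in counts.items()
        }
    return weights


def top_words(weights, topic, n=3):
    """Highest-weighted words of a topic (ties broken alphabetically)."""
    ranked = sorted(weights[topic].items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:n]]


def cosine(u, v):
    """Cosine similarity of two sparse {word: weight} vectors."""
    dot = sum(w * v.get(word, 0.0) for word, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity_matrix(weights):
    """Topic-by-topic cosine similarities, e.g. as input for a heatmap."""
    topics = sorted(weights)
    return [[cosine(weights[a], weights[b]) for b in topics] for a in topics]


docs_per_topic = {
    0: ["cat chased mouse", "cat sat near dog"],
    1: ["stock market fell", "stock prices near record"],
}
weights = c_tf_idf(docs_per_topic)
print(top_words(weights, 0))  # ['cat', 'chased', 'dog']
print(top_words(weights, 1))  # ['stock', 'fell', 'market']
```

The word "near" occurs in both toy clusters, so its class-based IDF is lower and it ranks below words that occur only once but belong to a single cluster.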
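The MMR step in the list above can be sketched the same way. This is a minimal sketch, assuming greedy Maximal Marginal Relevance as used for keyword diversification (the idea behind BERTopic's optional `MaximalMarginalRelevance` representation model): after the most relevant word is picked, each next keyword maximizes (1 - diversity) · similarity to the topic minus diversity · similarity to the closest keyword already chosen. The function names and the 2-D vectors are hypothetical stand-ins for real word and topic embeddings.

```python
import math

# Hypothetical helper names; toy vectors stand in for real embeddings.


def cosine(u, v):
    """Cosine similarity of two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mmr(topic_vec, word_vecs, top_n=3, diversity=0.5):
    """Greedy Maximal Marginal Relevance keyword selection.

    topic_vec: embedding that represents the topic.
    word_vecs: {candidate word: embedding}.
    diversity: 0 ranks by relevance only; higher values penalize redundancy.
    """
    candidates = list(word_vecs)
    relevance = {w: cosine(word_vecs[w], topic_vec) for w in candidates}
    # Start with the single most relevant candidate.
    selected = [max(candidates, key=relevance.get)]
    candidates.remove(selected[0])

    while candidates and len(selected) < top_n:
        scores = {}
        for w in candidates:
            # Similarity to the closest keyword that is already selected.
            redundancy = max(cosine(word_vecs[w], word_vecs[s]) for s in selected)
            scores[w] = (1 - diversity) * relevance[w] - diversity * redundancy
        best = max(candidates, key=scores.get)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy 2-D "embeddings" (hypothetical values, not from a real model).
topic = [1.0, 1.0]
words = {"car": [1.0, 0.8], "cars": [1.0, 0.75], "engine": [0.4, 1.0]}
print(mmr(topic, words, top_n=2, diversity=0.0))  # ['car', 'cars']
print(mmr(topic, words, top_n=2, diversity=0.5))  # ['car', 'engine']
```

With `diversity=0.0` the near-duplicate "cars" wins on relevance alone; at 0.5 its high similarity to "car" is penalized, so the less redundant "engine" is chosen instead.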

Installation

# Installation, with sentence-transformers, can be done from PyPI:

pip install bertopic

# If you want to install BERTopic with other embedding models, you can choose one of the following:

# Choose an embedding backend
pip install bertopic[flair,gensim,spacy,use]

# Topic modeling with images
pip install bertopic[vision]

Supported Topic Modelling Techniques

BERTopic supports a wide range of topic modeling techniques, listed below.

  • Guided
  • Supervised
  • Semi-supervised
  • Manual
  • Multi-topic distributions
  • Hierarchical
  • Class-based
  • Dynamic
  • Online/Incremental
  • Multimodal
  • Multi-aspect
  • Text Generation/LLM
  • Merge Models

Related Resources

Tools in BERTopic

[Figure: Tools in BERTopic]

Best Topic Modeling Tool in BERTopic

[Figure: Best tools in BERTopic]

BERTopic Model Building

[Figure: BERTopic model building]

Application

  • arXiv dataset (1.7M+ STEM papers)
  • Images/photographs
  • Historical Documents
  • News articles
