Skip to main content
  1. Data Science Blog/

Types of Large Language Models (LLM)

·1071 words·6 mins· loading · ·
Language Models (LLMs) AI/ML Models Language Models (LLMs) Artificial Intelligence (AI) Natural Language Processing (NLP) Machine Learning (ML) AI Models Generative AI AI Technology Deep Learning (DL)

On This Page

Table of Contents
Share with :

Introduction:
#

The world of Generative AI (GenAI) is expanding at an astonishing rate, with new models emerging almost daily, each sporting unique names, capabilities, versions, and sizes. For AI professionals, keeping track of these models can feel like a full-time job. But for business users, IT professionals, and software developers trying to make the right choice, understanding the model’s name and what it represents can seem overwhelming. Wouldn’t it be helpful if we could decode the meaning behind these names to know if a model fits our needs and is worth the investment? In this article, we’ll break down how the names of GenAI models can reveal clues about their functionality and suitability for specific tasks, helping you make informed decisions with confidence.

Keep in mind you cannot read the entire letter from the envelop. But, from the handwriting, address from, address to, paper of envelop, weight of envelop speaks a lots. Sometimes envelop can confuse and for that purpose reading is the only choice, but most of the time you would know whether you should read it or handover it to someone else, as you are too busy to read all the letters.

There are indeed several variants of large language models (LLMs), each tailored for specific tasks or user interactions. Each variant serves its unique purpose, making LLMs adaptable across various domains, from customer service to creative arts and technical industries.

Instruct Models (Instruction-following models)
#

  • Purpose: These models are trained to follow instructions and produce concise, specific responses to queries.
  • Example Models: OpenAI’s GPT-4-turbo with instruct settings, Google’s T5.
  • Example Use Case: A project manager inputs “Generate a weekly report summary for the team based on last week’s meeting notes.” The model returns a structured report based on the prompt. They can be used for
    • Task automation,
    • Question answering,
    • Writing assistance,
    • Code generation, and
    • Instruction-based prompts.

Chat Models
#

  • Purpose: Designed for interactive dialogue, these models manage conversational turns, retain context, and respond coherently across exchanges.
  • Example Models: OpenAI’s ChatGPT, Anthropic’s Claude, Google’s Bard.
  • Example Use Case: A healthcare chatbot discusses symptoms with patients and schedules appointments, holding context and providing personalized responses. These models can be used for creating
    • Customer support bots,
    • Personal assistants,
    • Tutoring bots, and
    • Interactive Q&A.

Code Models
#

  • Purpose: Code models are tuned to understand, generate, and refactor code, aiding programming tasks.
  • Example Models: OpenAI’s Codex, GitHub Copilot, Meta’s CodeLLaMA.
  • Example Use Case: A developer asks, “How do I implement a binary search algorithm?” The model generates code with comments explaining each step. They can be used for
    • Code completion,
    • Code debugging,
    • Documentation generation, and
    • Code refactoring.

Vision Models
#

  • Purpose: Vision models integrate text and image processing, interpreting and linking visual content with textual input.
  • Example Models: OpenAI’s GPT-4 with vision, Google’s Bard with image capabilities, DeepMind’s Flamingo.
  • Example Use Case: An e-commerce assistant identifies products in customer-uploaded photos, suggesting similar items from the catalog. These models can be used for
    • Image captioning,
    • Object recognition,
    • Multi-modal content generation, and
    • Visual Q&A.

Embedding Models
#

  • Purpose: Optimized to transform text into embeddings, these models facilitate similarity searches and semantic comparisons.
  • Example Models: OpenAI’s Ada embeddings, Google’s Universal Sentence Encoder, Cohere’s embeddings.
  • Example Use Case: These models can be used for
    • Semantic search,
    • Recommendation engines,
    • Clustering, and
    • Information retrieval.

Multimodal Models
#

  • Purpose: Capable of processing multiple data types, these models handle text, images, audio, and sometimes video for more comprehensive responses.
  • Example Models: Meta’s ImageBind, OpenAI’s GPT-4 with multimodal capabilities.
  • Example Use Case: A smart assistant in augmented reality uses visual and text input to explain objects, overlaying instructions and tips in real-time. These models can be used for
    • Complex queries involving text and images,
    • Virtual assistants that can process sensory data, and
    • Interactive educational tools.

Fine-Tuned Specialized Models
#

  • Purpose: Tailored for specific domains like medical, legal, or scientific fields, these models handle specialized knowledge and terminology.
  • Example Models: Legal-BERT, BioBERT, BloombergGPT.
  • Example Use Case: A legal assistant uses the model to quickly analyze case law, retrieving relevant information and saving manual review time. A fined tuned model can be used for
    • Legal document summarization,
    • Medical diagnosis support, and
    • Scientific literature analysis.

Summarization Models
#

  • Purpose: These models distill key information from long texts into concise summaries while retaining essential meanings.
  • Example Models: OpenAI’s GPT-3.5/4 with summarization settings, Hugging Face’s BART, Pegasus.
  • Example Use Case: A journalist summarizes a 20-page report into a brief article, capturing key findings and statistics. These models can be used by professionals for
    • Summarizing articles,
    • Generating abstracts,
    • Creating meeting notes, and
    • Summarizing legal or technical documents.

Generative Art Models
#

  • Purpose: Focused on creative outputs like stories, poetry, and dialogue, these models support artistic endeavors.
  • Example Models: ChatGPT in creative mode, Google’s LaMDA, character AI for roleplay.
  • Example Use Case: An author co-writes a fantasy novel with the model, generating dialogues and descriptive passages based on story arcs. A use can perform following tasks with these kinds of models
    • Storytelling,
    • Poetry generation,
    • Creative writing support, and
    • Interactive fiction.

Model Naming Convention
#

There is no standard around the model convention but model repository maintainer names these model in such a way that it is easy to identify. Thus the model name contains.

  • Creator (organization/individual)
  • Architecture (like GPT, Llama, Bard, gemma etc)
  • Capability (code, instruction, vision, language, english, multi-modal, etc.)
  • Model version (1,2,3 etc.)
  • Model size (1B, 3B, 7B, 11B, 70B, 504B etc.)

Example of GenAI Models
#

Below are some example of GenAI models. Now, you can read one at and time and figure out who made them, what modality they serve, what are their capabilities and what performance you expect from that model.

  • core42/jais-13b
  • core42/jais-13b-chat
  • elyza/ELYZA-japanese-Llama-2-13b
  • elyza/ELYZA-japanese-Llama-2-13b-instruct
  • elyza/ELYZA-japanese-Llama-2-7b
  • elyza/ELYZA-japanese-Llama-2-7b-instruct
  • google/codegemma-1.1-2b
  • google/codegemma-1.1-7b-it
  • google/codegemma-2b
  • google/codegemma-7b
  • google/gemma-1.1-7b-it
  • google/gemma-2b
  • google/gemma-2b-it
  • google/gemma-7b
  • meta-llama/Llama-3.2-11B-Vision
  • meta-llama/Llama-3.2-11B-Vision-Instruct
  • meta-llama/Llama-3.2-1B
  • meta-llama/Llama-3.2-1B-Instruct
  • meta-llama/Llama-3.2-3B
  • meta-llama/Llama-3.2-3B-Instruct
  • meta-llama/Llama-3.2-90B-Vision
  • meta-llama/Llama-3.2-90B-Vision-Instruct
  • meta-llama/Meta-Llama-3-70B
  • meta-llama/Meta-Llama-3-70B-Instruct
  • meta-llama/Meta-Llama-3-8B
  • meta-llama/Meta-Llama-3-8B-Instruct
  • meta-llama/Meta-Llama-3.1-405B-FP8
  • meta-llama/Meta-Llama-3.1-405B-Instruct-FP8
  • meta-llama/Meta-Llama-3.1-70B
  • meta-llama/Meta-Llama-3.1-70B-Instruct
  • meta-llama/Meta-Llama-3.1-8B
  • meta-llama/Meta-Llama-3.1-8B-Instruct
  • microsoft/Phi-3-vision-128k-instruct
  • microsoft/Phi-3-mini-128k-instruct
  • microsoft/Phi-3-mini-4k-instruct
  • microsoft/Phi-3-mini-4k-instruct-gguf-fp16
  • microsoft/Phi-3-mini-4k-instruct-gguf-q4
  • mistralai/Mixtral-8x7B-v0.1
  • mistralai/Mistral-7B-Instruct-v0.3
  • Mistral-7B-Instruct-v0.2
  • Mistral-7B-v0.1
  • Mistral-7B-Instruct-v0.1
  • tiiuae/falcon-7b
  • falcon-40b-instruct
  • phi-2
  • CodeLlama-34b-Instruct-hf
  • CodeLlama-13b-Instruct-hf
  • CodeLlama-7b-Instruct-hf
  • Mixtral-8x7B-Instruct-v0.1

Conclusion:
Decoding the names of GenAI models equips us with insights that go beyond surface-level branding. By understanding what each component of a model’s name represents—its purpose, generation, size, and intended use—you can make better decisions aligned with your goals, whether you’re an AI expert or a newcomer. As the GenAI field continues to grow, being able to interpret these model names will remain a valuable skill, allowing you to stay ahead and choose the right tools to drive success in your projects.

Dr. Hari Thapliyaal's avatar

Dr. Hari Thapliyaal

Dr. Hari Thapliyal is a seasoned professional and prolific blogger with a multifaceted background that spans the realms of Data Science, Project Management, and Advait-Vedanta Philosophy. Holding a Doctorate in AI/NLP from SSBM (Geneva, Switzerland), Hari has earned Master's degrees in Computers, Business Management, Data Science, and Economics, reflecting his dedication to continuous learning and a diverse skill set. With over three decades of experience in management and leadership, Hari has proven expertise in training, consulting, and coaching within the technology sector. His extensive 16+ years in all phases of software product development are complemented by a decade-long focus on course design, training, coaching, and consulting in Project Management. In the dynamic field of Data Science, Hari stands out with more than three years of hands-on experience in software development, training course development, training, and mentoring professionals. His areas of specialization include Data Science, AI, Computer Vision, NLP, complex machine learning algorithms, statistical modeling, pattern identification, and extraction of valuable insights. Hari's professional journey showcases his diverse experience in planning and executing multiple types of projects. He excels in driving stakeholders to identify and resolve business problems, consistently delivering excellent results. Beyond the professional sphere, Hari finds solace in long meditation, often seeking secluded places or immersing himself in the embrace of nature.

Comments:

Share with :

Related

Roadmap to Reality
·916 words·5 mins· loading
Philosophy & Cognitive Science Interdisciplinary Topics Scientific Journey Self-Discovery Personal Growth Cosmic Perspective Human Evolution Technology Biology Neuroscience
Roadmap to Reality # A Scientific Journey to Know the Universe — and the Self # 🌱 Introduction: The …
From Being Hacked to Being Reborn: How I Rebuilt My LinkedIn Identity in 48 Hours
·893 words·5 mins· loading
Personal Branding Cybersecurity Technology Trends & Future Personal Branding LinkedIn Profile Professional Identity Cybersecurity Online Presence Digital Identity Online Branding
💔 From Being Hacked to Being Reborn: How I Rebuilt My LinkedIn Identity in 48 Hours # “In …
Exploring CSS Frameworks - A Collection of Lightweight, Responsive, and Themeable Alternatives
·1378 words·7 mins· loading
Web Development Frontend Development Design Systems CSS Frameworks Lightweight CSS Responsive CSS Themeable CSS CSS Utilities Utility-First CSS
Exploring CSS Frameworks # There are many CSS frameworks and approaches you can use besides …
Dimensions of Software Architecture: Balancing Concerns
·871 words·5 mins· loading
Software Architecture Software Architecture Technical Debt Maintainability Scalability Performance
Dimensions of Software Architecture # Call these “Architectural Concern Categories” or …
Understanding `async`, `await`, and Concurrency in Python
·637 words·3 mins· loading
Python Asyncio Concurrency Synchronous Programming Asynchronous Programming
Understanding async, await, and Concurrency # Understanding async, await, and Concurrency in Python …