
GitHub Repos for Data Science


No. Repo Description Language Stars Forks
1 Link A curated list of awesome transformer models.   504 35
2 Link 📝 An awesome Data Science repository to learn and apply for real world problems.   21350 5461
3 Link Detailed and tailored guide for undergraduate students or anybody who wants to dig deep into the field of AI with a solid foundation.   6440 900
4 Link A collaborative catalog of NLP resources for Indic languages   432 64
5 Link The code from the Machine Learning Bookcamp book and a free course based on the book Jupyter Notebook 6042 1562
6 Link Publication-ready NN-architecture schematics. JavaScript 3770 478
7 Link This is a very early attempt at having ChatGPT work within a Telegram bot Python 1639 247
8 Link Code and data associated with the book “Statistics for Data Scientists: 50 Essential Concepts” R 1018 633
9 Link Source Code for ‘Text Analytics with Python,’ 2nd Edition by Dipanjan Sarkar Jupyter Notebook 72 70
10 Link 365 Days Computer Vision Learning LinkedIn Post   395 129
11 Link 500 AI Machine learning Deep learning Computer vision NLP Projects with code   12607 3747
12 Link Computer Vision Papers of the week   16 4
13 Link Various vrittis associated with the ashtadhyayi Python 8 5
14 Link Python library to aid with your Hindi NLP tasks Python 2 2
15 Link   Python 38 27
16 Link A library for training and deploying machine learning models on Amazon SageMaker Python 1856 985
17 Link 200+ detailed flashcards useful for reviewing topics in machine learning, computer vision, and computer science.   1631 135
18 Link A curated list of community detection research papers with implementations. Python 2145 356
19 Link Fast Python Collaborative Filtering for Implicit Feedback Datasets Python 3177 599
20 Link State-of-the-art deep learning model for analyzing sentiment, emotion, sarcasm etc. Python 1449 317
21 Link 🌸 Run 100B+ language models at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading Python 4807 175
22 Link Framework for fast prototyping of Graph Neural Networks Python 37 15
23 Link BookNLP, a natural language processing pipeline for books Python 700 74
24 Link The Carpentries website HTML 62 122
25 Link Instructor Training   161 271
26 Link A curated list of awesome Deep Learning tutorials, projects and communities.   21030 5852
27 Link The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels. Python 6057 500
28 Link A project to deploy an online app that predicts the win probability for each NBA game every day. Demonstrates end-to-end Machine Learning deployment. Jupyter Notebook 104 12
29 Link Learn deep learning with tensorflow2.0, keras and python through this comprehensive deep learning tutorial series. Learn deep learning from scratch. Deep learning series for beginners. Tensorflow tutorials, tensorflow 2.0 tutorial. deep learning tutorial python. Jupyter Notebook 643 1685
30 Link 🦘 Explore multimedia datasets at scale Jupyter Notebook 920 41
31 Link Python library for converting Python calculations into rendered LaTeX. CSS 5211 398
32 Link Scales the ChatGPT API to multiple simultaneous sessions with infinite contextual and adaptive memory, powered by GPT and a Redis datastore. Python 363 39
33 Link   Jupyter Notebook 9796 17732
34 Link 🧠 Material for the Deep Learning Study Group   385 52
35 Link Explanations of key concepts in ML   4297 355
36 Link ✍️ A carefully curated list of NLP paper summaries   1453 249
37 Link   Python 16 12
38 Link Fish Weight Prediction Deployment Python 1 0
39 Link   Jupyter Notebook 1 0
40 Link Malaria Detection Deployed PureBasic 1 0
41 Link NLP Jupyter Notebook 2 0
42 Link code for deep learning courses Jupyter Notebook 877 282
43 Link Epidemic Modeling for Everyone Jupyter Notebook 261 72
44 Link To map publicly available datasets related to General Assembly (Lok Sabha) elections in India. Jupyter Notebook 137 114
45 Link Free MLOps course from DataTalks.Club Jupyter Notebook 7175 1416
46 Link Easily train or fine-tune SOTA computer vision models with one open source training library. The home of Yolo-NAS. Python 2721 235
47 Link Democratizing Deep-Learning for Drug Discovery, Quantum Chemistry, Materials Science and Biology Python 4367 1499
48 Link List of Computer Science courses with video lectures.   56725 8028
49 Link Contains relevant notebooks for the hands-on NLP workshop for the Analytics India Magazine Plugin Conference, 2020 Edition Jupyter Notebook 70 45
50 Link Extensive tutorials for the Advanced NLP Workshop in Open Data Science Conference Europe 2020. We will leverage machine learning, deep learning and deep transfer learning to learn and solve popular tasks using NLP including NER, Classification, Recommendation / Information Retrieval, Summarization, Language Translation, Q&A and Topic Models. Jupyter Notebook 130 65
51 Link Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines. Python 25169 7588
52 Link A tool for refurbishing and modernizing Python codebases Python 2250 44
53 Link Materials for Mathematical Tools for Neuroscience course at Harvard (Neurobio 212) Jupyter Notebook 410 55
54 Link MlOps End 2 End Jupyter Notebook 13 11
55 Link 📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.   24281 3381
56 Link 🪐 End-to-end NLP workflows from prototype to production Python 1106 444
57 Link 💫 Industrial-strength Natural Language Processing (NLP) in Python Python 26318 4134
58 Link Code release for “Dropout Reduces Underfitting” Python 290 16
59 Link Facebook AI Research Sequence-to-Sequence Toolkit written in Python. Python 26266 5833
60 Link Library for fast text representation and classification. HTML 24685 4608
61 Link HiPlot makes understanding high dimensional data easy TypeScript 2485 125
62 Link Inference code for LLaMA models Python 23064 3680
63 Link The fastai book, published as Jupyter Notebooks Jupyter Notebook 18575 7081
64 Link Free online textbook of Jupyter notebooks for fast.ai Computational Linear Algebra course Jupyter Notebook 9442 2413
65 Link List of Data Science Cheatsheets to rule the world   12211 3437
66 Link A very simple framework for state-of-the-art Natural Language Processing (NLP) Python 12852 2027
67 Link freeCodeCamp.org’s open-source codebase and curriculum. Learn to code for free. TypeScript 368430 32357
68 Link Learn how to responsibly develop, deploy and maintain production machine learning applications. Jupyter Notebook 33342 5462
69 Link Quick tool to draw fully connected neural network architectures   42 5
70 Link Google Research Jupyter Notebook 29664 7307
71 Link State of the Art Language models and Classifier for Sanskrit language (ancient Indian language) Jupyter Notebook 63 20
72 Link Plotting Assignment 1 for Exploratory Data Analysis R 1 0
73 Link A curated list of awesome embedding models tutorials, projects and communities. Jupyter Notebook 1629 243
74 Link [ICML 2023] Code for our paper “Compositional Exemplars for In-context Learning”. Python 50 4
75 Link 🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch Python 15407 3082
76 Link ✨Fast Coreference Resolution in spaCy with Neural Networks C 2698 470
77 Link 🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX. Python 103371 20879
78 Link An NLP workshop about concrete solutions to real problems Jupyter Notebook 1078 453
79 Link ⚡ Building applications with LLMs through composability ⚡ Python 46250 5424
80 Link Badges for your personal developer branding, profile, and projects. SCSS 8133 1193
81 Link Instruction Tuning with GPT-4 HTML 2805 198
82 Link Seamlessly integrate powerful language models like ChatGPT into scikit-learn for enhanced text analysis tasks. Python 1789 139
83 Link Open3D: A Modern Library for 3D Data Processing C++ 8999 1987
84 Link Dnotebook is a Jupyter-like library for JavaScript environments. It allows you to create and share pages that contain live code, text and visualizations. TypeScript 139 10
85 Link An unnecessarily tiny implementation of GPT-2 in NumPy. Python 2392 301
86 Link Machine Learning for Cyber Security   5964 1626
87 Link A generic, simple and fast implementation of Deepmind’s AlphaZero algorithm. Julia 1132 119
88 Link A curated list of awesome Machine Learning frameworks, libraries and software. Python 59061 14052
89 Link Core functionality for the MLJ machine learning framework Julia 140 39
90 Link General Assembly’s Data Science course in Washington, DC Jupyter Notebook 187 212
91 Link MetaSeg: Packaged version of the Segment Anything repository Python 649 41
92 Link Essential Cheat Sheets for deep learning and machine learning researchers https://medium.com/@kailashahirwar/essential-cheat-sheets-for-machine-learning-and-deep-learning-researchers-efb6a8ebd2e5   14619 3457
93 Link Web interface for browsing, search and filtering recent arxiv submissions Python 4846 1319
94 Link A Python framework for creating maintainable and modular data science code. Python 8421 796
95 Link Drench yourself in Deep Learning, Reinforcement Learning, Machine Learning, Computer Vision, and NLP by learning from these exciting lectures!! HTML 11207 2832
96 Link   Python 67 100
97 Link 😎 Awesome list of tools and projects with the awesome LangChain framework   2834 145
98 Link 🧑‍🏫 59 Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, …), optimizers (adam, adabelief, …), gans(cyclegan, stylegan2, …), 🎮 reinforcement learning (ppo, dqn), capsnet, distillation, … 🧠 Jupyter Notebook 24047 2580
99 Link Let us control diffusion models! Python 20472 1899
100 Link Implementation of DALL-E 2, OpenAI’s updated text-to-image synthesis neural network, in PyTorch Python 9799 934
101 Link GUI-based software for training, evaluating and applying deep neural nets for image classification Python 82 18
102 Link PRegEx - Programmable Regular Expressions Python 718 21
103 Link Matplotlib Jupyter Integration TypeScript 1434 216
104 Link Code for the Behavior Retrieval Paper Python 9 1
105 Link Examples of Data Science projects and Artificial Intelligence use-cases Jupyter Notebook 344 267
106 Link 10 Weeks, 20 Lessons, Data Science for All! Jupyter Notebook 19639 3872
107 Link 12 weeks, 26 lessons, 52 quizzes, classic Machine Learning for all Jupyter Notebook 49227 10169
108 Link Transformers at any scale Python 1667 92
109 Link Lab files for AI-102 - AI Engineer C# 342 452
110 Link GitHub User Guide for MCTs   38 25
111 Link Software and Data Carpentry instructor training course material HTML 2 0
112 Link Lightwood is Legos for Machine Learning. Python 369 82
113 Link Build Web Apps in Jupyter Notebook with Python only Python 3117 188
114 Link   Python 6 7
115 Link gpt4all: an ecosystem of open-source chatbots trained on a massive collection of clean assistant data including code, stories and dialogue C++ 45786 4869
116 Link Easy-to-use JavaScript library for most common data analysis tasks. TypeScript 121 8
117 Link Official PyTorch implementation of GroupViT: Semantic Segmentation Emerges from Text Supervision, CVPR 2022. Python 595 47
118 Link A comprehensive machine learning repository containing 30+ notebooks on different concepts, algorithms and techniques. Jupyter Notebook 4139 654
119 Link Overview of Modern Deep Learning Techniques Applied to Natural Language Processing CSS 1294 198
120 Link 🌊 Online machine learning in Python Python 4262 474
121 Link Examples and guides for using the OpenAI API Jupyter Notebook 38518 5764
122 Link A C library for parsing/normalizing street addresses around the world. Powered by statistical NLP and open geo data. C 3718 392
123 Link Python bindings to libpostal for fast international address parsing/normalization C 678 80
124 Link NeuralProphet: A simple forecasting package Python 2977 419
125 Link   Jupyter Notebook 79 89
126 Link The full dataset behind paperswithcode.com   267 27
127 Link Hindi POS Tags and keywords using TNT model. Created Date: 28 Sept 2018 Python 22 10
128 Link   Python 125 49
129 Link The fastest ⚡️ way to build data pipelines. Develop iteratively, deploy anywhere. ☁️ Python 3088 210
130 Link This is used for identifying whether a given text has sarcasm in it or not. Java 1 0
131 Link “Probabilistic Machine Learning” - a book series by Kevin Murphy Jupyter Notebook 3935 484
132 Link Using tensorboardX (TensorBoard for PyTorch), e.g. plotting more than one graph in the same chart, etc. Python 5 0
133 Link An open-source, low-code machine learning library in Python Jupyter Notebook 7373 1604
134 Link Tensors and Dynamic neural networks in Python with strong GPU acceleration Python 67669 18541
135 Link Build Low Code Automated Tensorflow explainable models in just 3 lines of code. Library created by: Hasan Rafiq - https://www.linkedin.com/in/sam04/ Python 177 37
136 Link Code for 30DayChartChallenge R 34 11
137 Link Deep Web Extractor (DWX): a system that uses statistical machine learning models for crawling and data discovery on the Deep Web (i.e., a massive, high-quality portion of the World Wide Web) to build knowledge-based databases. HTML 4 1
138 Link 💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants Python 16510 4359
139 Link DEPRECATED: We recommend using Rasa X https://rasa.com/docs/rasa-x/ for managing NLU data JavaScript 467 183
140 Link Containers for machine learning Python 4838 286
141 Link Examples and tutorials on using SOTA computer vision models and techniques. Learn everything from old-school ResNet, through YOLO and object-detection transformers like DETR, to the latest models like Grounding DINO and SAM. Jupyter Notebook 2403 338
142 Link An end to end Interactive Interface for correcting mistakes in OCR output. C++ 45 46
143 Link Sent2Vec encoder and training code from the paper “Skip-Thought Vectors” Python 2050 555
144 Link PyTorch code for Learning Deep Time-index Models for Time Series Forecasting (ICML 2023) Python 257 43
145 Link Merlion: A Machine Learning Framework for Time Series Intelligence Python 2991 258
146 Link The SAS Scripting Wrapper for Analytics Transfer (SWAT) package is the Python client to SAS Cloud Analytic Services (CAS). It allows users to execute CAS actions and process the results all from Python. Python 134 54
147 Link Datasets for deep learning with satellite & aerial imagery   246 33
148 Link Algorithms for outlier, adversarial and drift detection Python 1838 180
149 Link An MLOps framework to package, deploy, monitor and manage thousands of production machine learning models HTML 3754 758
150 Link Deep Learning book that covers the principles of deep learning, motivation, explanations, state of the art papers for the various tasks and architectures: CNNs, object detection, semantic segmentation, generative models, denoising, super resolution, style transfer and style manipulation, inpainting, self-supervised learning, vision transformers, OCR, and multimodal. Hope that it will be useful to some of you 🙂   91 20
151 Link This shows how to fine-tune the BERT language model and use PyTorch-transformers for text classification Jupyter Notebook 63 35
152 Link A Machine Learning project to translate Sanskrit text to English Jupyter Notebook 37 22
153 Link Data and code for “DocPrompting: Generating Code by Retrieving the Docs” @ICLR 2023 Python 165 10
154 Link Model to predict the sentiment of Hindi sentences; developed during my 2nd-year internship @ algo8.ai Jupyter Notebook 8 1
155 Link Code Sample of Book “Effective Python: 59 Specific Ways to Write Better Python” by Brett Slatkin Python 1362 213
156 Link This repository is a curated collection of links to various courses and resources about Artificial Intelligence (AI) Python 2855 224
157 Link CSCI-544 Final Project Python 9 6
158 Link StableLM: Stability AI Language Models Jupyter Notebook 14581 878
159 Link An awesome & curated list of best LLMOps tools for developers Shell 905 83
160 Link TensorFlow GNN is a library to build Graph Neural Networks on the TensorFlow platform. Python 986 136
161 Link Rich is a Python library for rich text and beautiful formatting in the terminal. Python 43554 1569
162 Link   Jupyter Notebook 2 2
163 Link My attempt at researching Quantum Mechanics & Quantum Computing when I was a junior. Jupyter Notebook 116 55
164 Link Code release for “Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting” (NeurIPS 2021), https://arxiv.org/abs/2106.13008 Jupyter Notebook 1132 286
165 Link 🤖 Python examples of popular machine learning algorithms with interactive Jupyter demos and math being explained Jupyter Notebook 21377 3912
166 Link   Jupyter Notebook 40 41
167 Link A comprehensive set of fairness metrics for datasets and machine learning models, explanations for these metrics, and algorithms to mitigate bias in datasets and models. Python 2046 687
168 Link   Jupyter Notebook 102 340
169 Link A python library for user-friendly forecasting and anomaly detection on time series. Python 5968 673
170 Link Extrapolating knowledge graphs from unstructured text using GPT-3 🕵️‍♂️ JavaScript 3502 289
171 Link Official repo for paper “LeTI: Learning to Generate from Textual Interactions.” Python 50 6
172 Link A Guide for Feature Engineering and Feature Selection, with implementations and examples in Python. Jupyter Notebook 1118 371
173 Link   Python 15233 2094
174 Link Resource to help you prepare for your upcoming data science interviews   409 65
175 Link   Python 867 479
176 Link The GitHub repository for the paper “Informer” accepted by AAAI 2021. Python 3764 851
177 Link Scalable identity resolution, entity resolution, data mastering and deduplication using ML Java 741 85