Summary of AI ML Project
Introduction
- This page lists all my GitHub repositories (private and public).
- These GitHub repositories cover my projects, consulting work, courses, POCs (proofs of concept), technology explorations, open-source contributions on GitHub, etc.
- These projects are related to Project Management, AI/ML, LLM, NLP, Cloud Computing, and Software Architectures and Solutions.
- The purpose of this listing is twofold: first, to help others see what is possible and what I have explored; second, to remind myself what I have already explored and what worked versus what didn't.
- Big Data: Hadoop, Hive, Spark
Tech Skills
- LLM Expertise: Prompt Engineering, Finetuning & Deployment (ChatGPT, GPT-4, Bard, LLaMA, LaMDA, PaLM).
- ML Model Development: Feature Engineering, Tuning, Evaluation, Cross-Validation, Classical ML, NLP metrics, Regression/Classification/Clustering, Ensemble Trees, Decision Tree, Random Forest, SVM.
- AutoML: Automated ML (PyCaret, TPOT).
- MLOps/DevOps:
- Deep Learning / NLP & Embedding: Hugging Face, RNN, LSTM, GRU, Transformers, BERT, FastText, NLTK, spaCy, Word Embedding, Keras, PyTorch, TensorFlow, OpenAI, Embedding Transfer, CV model evaluation, CNN, YOLO
- Big Data & Cloud: Hadoop, Spark, PySpark, Kafka, NoSQL (Cassandra, MongoDB)
- Cloud Platforms: AWS, GCP, Azure, AWS SageMaker, Azure AutoML, Vertex AI
- ML Frameworks: TensorFlow, TensorFlow Lite/LiteRT, TensorFlow.js, PyTorch
- Data Visualization: Power BI, Tableau, Plotly, Seaborn, Matplotlib
- Mobile/Web App Dev: Flask, Gradio, Streamlit, Android Studio, Flutter
- Programming Languages: Python, R, Dart; Package Managers (pip, conda, npm)
- Markup Language: Markdown, LaTeX, HTML/CSS
- Statistics: Descriptive/Inferential Statistics, Prescriptive Statistics in AI.
AI/ML - Industries - Developed/ Created/ Expanded work
Projects in this section are listed according to Industry/Business Domain.
Agri
Airlines
Flightdelay-Analysis-Bigdata
Apache Hive is a data warehouse and SQL-like query engine built on top of Hadoop. Hadoop provides the Hadoop Distributed File System (HDFS) for distributed storage, and Hive runs queries over that data as distributed jobs. This lets Hive work on tables with billions of rows: aggregations and filters that would be impractical on a single machine are split across the cluster, so they still complete at scale. Hive supports inserts and queries on all tables; row-level UPDATE and DELETE require transactional (ACID) tables.
In this project, the folder “\server\airlines” on the server holds hundreds of files containing daily airline flight information, with fields such as Origin, Dest, Distance, DepTime, CRSDepTime, ArrTime, CRSArrTime, UniqueCarrier, FlightNum, TailNum, ActualElapsedTime, CRSElapsedTime, AirTime, ArrDelay, DepDelay, TaxiIn, TaxiOut, Cancelled, CancellationCode, Diverted, CarrierDelay, WeatherDelay, NASDelay, SecurityDelay, LateAircraftDelay. The airlines wanted to analyze this data for the last 20 years.
Activities (Pipeline) in project:
- Create a Hive table (for storage) from the external files
- Create the partitioned table schema
- Partition the Hive table by year and load the data into the partitioned table
- Run SQL queries on the partitioned table
Tech Stack: Hadoop/HDFS, Hive, SQL, HiveQL, ORC (Optimized Row Columnar) or Parquet, Python, Matplotlib/Seaborn.
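The pipeline steps above can be sketched as a small Python helper that builds the HiveQL statements in order. This is a minimal sketch, not the project's actual code: the table names (flights_raw, flights_by_year), the column types, the comma-delimited text format, the presence of a Year column in the raw files, the HDFS location passed in, and the example query year are all assumptions made for illustration.

```python
from typing import List

# Subset of the columns named in the project description; the types are assumed.
COLUMNS = [
    ("Origin", "STRING"),
    ("Dest", "STRING"),
    ("Distance", "INT"),
    ("UniqueCarrier", "STRING"),
    ("ArrDelay", "INT"),
    ("DepDelay", "INT"),
    ("Cancelled", "INT"),
]


def build_pipeline(location: str,
                   raw_table: str = "flights_raw",        # hypothetical name
                   part_table: str = "flights_by_year",   # hypothetical name
                   query_year: int = 2008) -> List[str]:  # hypothetical year
    """Return the HiveQL statements for the four pipeline steps, in order."""
    cols = ",\n  ".join(f"{name} {hive_type}" for name, hive_type in COLUMNS)
    names = ", ".join(name for name, _ in COLUMNS)

    # Step 1: external table over the existing files
    # (assumes comma-delimited text files that carry a Year column).
    create_raw = (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {raw_table} (\n"
        f"  Year INT,\n  {cols}\n)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        "STORED AS TEXTFILE\n"
        f"LOCATION '{location}'"
    )

    # Step 2: partitioned table schema; the partition column is declared
    # in PARTITIONED BY, not in the regular column list.
    create_part = (
        f"CREATE TABLE IF NOT EXISTS {part_table} (\n  {cols}\n)\n"
        "PARTITIONED BY (Year INT)\n"
        "STORED AS ORC"
    )

    # Step 3: dynamic partitioning; the partition column goes last in SELECT.
    enable_dynamic = "SET hive.exec.dynamic.partition=true"
    nonstrict_mode = "SET hive.exec.dynamic.partition.mode=nonstrict"
    load_part = (
        f"INSERT OVERWRITE TABLE {part_table} PARTITION (Year)\n"
        f"SELECT {names}, Year\n"
        f"FROM {raw_table}"
    )

    # Step 4: a query that filters on the partition column, so Hive only
    # needs to read the matching partition.
    query = (
        "SELECT UniqueCarrier, AVG(ArrDelay) AS avg_arr_delay\n"
        f"FROM {part_table}\n"
        f"WHERE Year = {query_year}\n"
        "GROUP BY UniqueCarrier"
    )

    return [create_raw, create_part, enable_dynamic, nonstrict_mode,
            load_part, query]


if __name__ == "__main__":
    # Print the statements so they can be pasted into a Hive session.
    for statement in build_pipeline("/server/airlines"):
        print(statement + ";\n")
```

The helper only generates text, so it can be checked without a Hadoop cluster. Running the statements still needs a Hive client of your choice.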
My article on Hive
GitHub Code
BFSI
Credit-Fraud-Detection
DoeJones-Prediction-with-News
Loan-Approval
eCommerce
Amazon-Sentiment Analysis
Bigdata-AmazonReviews
Economics
Economy-Analysis
Prosperity-Clustering
Education
Electronics
Hand-Gesture-Recognition
Energy
UK-Energy-Consumption
Entertainment
Movies-Recommendations
Health
Breast-Cancer-Prediction
Chest-XRay
Covid-worldwide-EDA
India-Covid-Graphs
Malaria-Detection_dep
pnemonia_prediction
Hospitality
Restaurant_Sales_Order_Forcasting
Infra
AirQuality-Prediction
House-Price-Prediction_dep
House-Price-Prediction_Docker
House-Prices-KCH
Surprise-House-Pricing
Law+Justice
Media+Pub
Fakenews-Detection
Olympic-QA-System-with-GPT
HBQAS
NewsClassification-20Groups
Podcast-Transcription
SDSHL
SpamFilter
Toxic-Comment
Twitter-Sentiment-Analysis
YELP-Review-Prediction
Misc
Restaurants
FoodDemand-Forcast
Tools-and-Food-Gradient-Identifcation
Retail
Online-Retail-Customer-Clustering
Sales
CarPrice
CarSales
Lead-Conversion
Sanskrit
Telecom
Telcom-Churn
Travel+Logistic
Bigdata-pySpark-NYC-Parking
Driver-Availablity-Prediction
Uber-Cancellation
Vehicle-Classification
Vehicle-Tracking
README.md
AI/ML - Technology Stack - Developed/ Created/ Expanded work
Projects in this section are listed according to Technology/Tech Product/POC
0-Experiments
Analytics
bokeh
pandas
plotly
PyGWalker
tableau
Audio
Speech Recognition
Bigdata
Bigdata-HiveScoop
Bigdata-mySQL
Cloud
AWS-Amazon-Bedrock-for-Serverless-LLM
AWS-SageMaker
GCP
CV
Flower-Prediction
ImageAugmentation
ImageProcessing
MNIST-Experiments
Object-Detection-InBrowser
DE
DataCleaning
Datacollection
PyWebScrapping
Machine Learning Frameworks
mobilenet-v2
Using a deep learning model on mobile. GitHub README
TensorFlow Lite for Regression
GAN
MusicGeneration
IOT
BOLTIOT
LLM
finetune-bloom-7b
finetune-llama2
finetune-llama3-8b
huggingface
Langchain
neo2.7b
openAI
openai-quickstart-python
quantization
RAG
GroqCloud
Misc
ML
Classification
Clustering
DataImbalance
ML-Retraining
Regression
ROC
MLOps
ML-Pipelines
naptune
NLP
embedding
LSTM.ipynb
NLP
NLP-Concepts
NLP-Hindi-Bible
NLP-Plugin20Event
NLP-rasa
NLP-SanskritTrans
Python-Automation
R-Projects
RL
Stats
Tech-Products
Hive
mongodb
PowerBI
TensorFlow-ImageRecognition
Timeseries
TS-multivariate
TS-Smoothing
Transfer-Learning
Utils-JypterNB
Readme.md
AI/ML - Forked
100Days-ML
automl
chroma
diffusers
evalml
gcp-python-docs-samples
GFPGAN
google-gemini-cookbook
intel-scikit-learn-intelex
langchain
langgraph
Learning-Pandas-Second-Edition
LeetCode-js
microsoft-generative-ai-for-beginners
ML+DL-Code-for-my-YouTube-Channel-Rohan
packages
stanford_alpaca
tensorboard
tensorflow
tensorflow-examples
tfjs-examples
Visualization
viz-github-repo
README-forked-repo.md
Management
Management-Main
- 11-PMO
- 00-General
- 01-Chemfab-PMO
- 02-ISCON-PMO
- 03-Tagros-PMO
- 04-FFI-PMO
- 05-BFL-PMO
- 06-TEAM-PMO
- 07-SignitySolutions
- 12-Projects-PM
- 01-Vikram-Solar-PMF
- 02-FFI-Agile-Consulting
- 03-AllSysServices-PMI-ACP
- 04-Astrowix-PMI-ACP
- 05-Colossal-Hibu-PM
- 06-TGroup-PMP
- 07-BirlaSoft-SageTech-PMI-ACP
- 08-Sagetech-Project-Estimation
- 09-VGL-PM-2days
- 10-CompetenceCurve-FTM
- 11-Sanofi-PM
- 12-Konsberg-Scrum-Agile
- 13-HRLEHR-Dubai
- 14-SEO-PMLOGY
- A01-ContractManagement
- P01-PRINCE2
- 12-Project-NGO
- S01-Rajiv-Malhotra
- S02-YFS
- S03-HSP
- S04-RKM-Ashram
- S05-RKM-Kankhal
- S06-RKM-Trivendram
- 14-Process-Courses
- Process-CMMI
- Process-ISMS
- Process-ISO
- Process-SixSigma
- Process-ZED
- 52-Work-PMI-Chapters
- 2012 LIMC Application Information
- CPC_Presentation_Foundations.ppt
- ITnTelecom-Webinars
- LIM-Brazil-2011
- OPM3-Package
- PMBoK-Hindi
- PMI-International
- PMI-Leadership-Component
- PMI-NC
- PMI-Team-India
- PMIEF
- PMIMC-BestPractices
- Regional-BP-Task-Force
- Work-PMICC
- Work-PMIMC
Management-PM-Courses
- PM-Agile
- PM-Customized
- PM-EPM
- PM-EVM-MSP
- PM-Microsoft-Project
- PM-Misc-Training
- PM-PMP-v5
- PM-PMP-v6
- PM-PRINCE2
- PM-RMP
- PM-SharePoint
- PM-SoftwareSizeEstimation
Management-PMO
- PM-Templates
- PMO
Management-PMIPrep
Training-Feedbacks
Web+Mobile App Development - POC Work
- Android
- Falcon_Android
- ImageRecognition
- Java
- nodejs
- react
AI/ML Datasets
There is no dearth of datasets, but during training sessions my learners and I often struggle to get hold of the ones we need. Either they have been removed or renamed, or internet availability and access restrictions waste a lot of time. To avoid that, I have created this GitHub repo of datasets. They are meant for classical machine learning, not for deep learning or LLMs, unless mentioned specifically.
- 50_Startups - 50_Startups.csv
- Abalone
- Accidental Drug Related Deaths in Connecticut, US
- airline-pass-stats.csv
- airline-passengers.csv
- AirQuality
- Amazon Product Reviews
- amazon_alexa.tsv
- application_train.csv
- Autism Screening Adult
- Auto MPG
- Banknote Authentication
- Beijing PM2.5
- Bike Sharing
- Birmingham Parking Dataset
- Blog_Article_Popularity
- Blood Transfusion Service Center
- Breast Cancer Wisconsin
- Car Evaluation
- CarPrice.csv
- CarPrice_DescribeData.csv
- Census Income
- childweight_SCA01.csv
- Concrete Compressive Strength
- Coronavirus
- Daily Demand Forecasting Orders
- daily-min-temperatures.csv
- Default of Credit Card Clients
- Dow Jones Index
- Echocardiogram
- EEG Eye State Dataset
- EEG Steady State Evoked Potential Dataset
- Energy Efficiency
- EU Population Poverty Status Dataset
- Fakenames
- FB.csv
- Fertility
- FIFA-Worldcup - World Cup.csv
- financial_crime_aylien_news_data.tar.gz
- fine_food_reviews_with_embeddings_1k.csv
- Flights
- Frequent_Names
- Glass Identification
- Heart Disease
- HelpInternational-Country-data.csv
- Hepatitis
- Hepatitis C Virus (HCV) Classification Dataset
- Immigrants
- Individual Household Electric Power Consumption
- Interstate-94 (I-94) Traffic Volume Dataset
- Istanbul Stock Exchange
- Liver Disorders
- Movie-Rating.zip
- Occupancy Detection
- OCR-Samples
- olympics_qa.csv
- olympics_search.jsonl
- olympics_sections.csv
- Online News Popularity
- Online_Retail
- pima_indian_diabetes.csv
- POIClassification.csv
- Population
- Portugal 2019 Election Dataset
- Qualitative Bankruptcy
- random-ocr-images
- Real Estate Valuation
- Risk Factors for Cervical Cancer
- spotfy-2000.zip
- Startup_Investment
- Suicide
- Telecom_Churn
- Travel Reviews
- Unemployment
- US Tuberculosis Dataset
- User Knowledge Modeling
- Wholesale Customers
- Wireless Indoor Localization
- README.md