Project Index Page
This is the master project page; it links to my different project categories. To learn about my work in a particular domain, follow the link that interests you.
Introduction
This page shares my project work, capabilities, expertise, understanding of business domains, technology solutions, and approaches.
- This page lists all my GitHub repositories (private and public).
- These GitHub repositories relate to my projects, consulting, courses, POCs (proofs of concept), and technology explorations.
- It also lists some forked repositories. Some I forked to extend the original, some for teaching, and some to build my own solutions.
- Some of these projects demonstrate my project management capabilities in different domains, including IT.
- Others demonstrate my technology capabilities, especially around AI, Deep Learning, GenAI, NLP, and Analytics.
- The purpose of this listing is:
- to help others know what is possible and what I have explored.
- to remind myself what I have already explored, and what worked versus what didn’t work during exploration.
Tech Skills
- LLM Expertise: Prompt Engineering, Fine-tuning, and Model Deployment
- Models: LLaMA, ChatGPT, GPT-4, Bard, LaMDA, PaLM, Gemma, Claude, Mistral, T5, Flan, BERT, Phi, and various others
- Model UI: Ollama, LM Studio, Open WebUI
- ML Model Development: Feature Engineering, Tuning, Evaluation, Cross-Validation, Classical ML, NLP metrics, Regression/Classification/Clustering, Ensemble Trees, Decision Tree, Random Forest, SVM
- AutoML: Automated ML (PyCaret, TPOT)
- MLOps/DevOps:
- Deep Learning / NLP & Embedding: Hugging Face, RNN, LSTM, GRU, Transformers, BERT, FastText, NLTK, spaCy, Embedding, Keras, PyTorch, TensorFlow, OpenAI, Embedding Transfer, CV model evaluation, CNN, YOLO
- Big Data & Cloud: Hadoop, Spark, PySpark, Kafka, NoSQL (Cassandra, MongoDB)
- Cloud Platforms: AWS, GCP, Azure, AWS SageMaker, Azure AutoML, Vertex AI, Oracle AI
- ML Frameworks: TensorFlow, TensorFlow Lite/LiteRT, TensorFlow.js, PyTorch
- Data Visualization: Power BI, Tableau, Plotly, Seaborn, Matplotlib
- Mobile/Web App Dev: Flask, Gradio, Streamlit, Android Studio, Flutter
- Programming Languages: Python, R, Dart; Package Managers (pip, conda, npm)
- IDE/CLI/SDK: Visual Studio Code, Cursor, Visual Studio, Eclipse, Android Studio, Flutter
- Markup Language: Markdown, LaTeX, HTML/CSS
- Statistics: Descriptive/Inferential Statistics, Prescriptive Statistics in AI
AI/ML/DL, GenAI, LLM, Analytics, Technology Work Summary
My POC and Technology Stacks
Summary of My Project Management Work/Projects
AI/ML Datasets
There is no dearth of datasets, but during training sessions, when my learners or I need a particular dataset, we often struggle to get it. Datasets are removed or renamed, or internet availability and access restrictions waste a lot of time. To avoid that, I created this GitHub repo of datasets. They are for classical machine learning, not for deep learning or LLMs, unless specifically mentioned.
AI/ML - Industries - Developed/ Created/ Expanded work
Projects in this section are listed by industry/business domain. It is sometimes difficult to decide which domain a particular project falls into, so I created this page to settle the domain.
BFSI (Banking, Financial Services, and Insurance)
BFSI includes financial institutions, banks, insurance companies, investment firms, and other entities offering services such as lending, investment, wealth management, and financial protection. This sector is heavily regulated and technology-driven for security and risk management.
Credit-Fraud-Detection
DoeJones-Prediction-with-News
Loan-Approval
HR (Human Resources)
This domain covers employee management, recruitment, training, compensation, and workplace culture. It also includes HR technology and services related to people management and organizational development.
HR Analysis of Employee Attrition & Performance
- Github Repo
- Colab
- HR Analysis of Employee Attrition & Performance - R
- Objective: Uncover the factors that lead to employee attrition and explore important questions such as ‘show me a breakdown of distance from home by job role and attrition’ or ‘compare average monthly income by education and attrition’. This is a fictional dataset created by IBM data scientists.
- Github dataset
- HR Analysis of Employee Attrition & Performance - Python
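The kind of breakdown named in the objective (distance from home by job role and attrition) can be sketched with the standard library alone. This is a minimal illustration, not the code from the linked notebooks: the rows are made-up sample values, the field names `JobRole`, `Attrition`, and `DistanceFromHome` are assumed to mirror the dataset's columns, and `breakdown` is a hypothetical helper.

```python
from collections import defaultdict
from statistics import mean

# Made-up sample rows for illustration only; not taken from the IBM dataset.
# The field names are assumed to mirror the dataset's column names.
rows = [
    {"JobRole": "Sales Executive", "Attrition": "Yes", "DistanceFromHome": 20},
    {"JobRole": "Sales Executive", "Attrition": "No", "DistanceFromHome": 5},
    {"JobRole": "Sales Executive", "Attrition": "No", "DistanceFromHome": 7},
    {"JobRole": "Research Scientist", "Attrition": "Yes", "DistanceFromHome": 12},
    {"JobRole": "Research Scientist", "Attrition": "No", "DistanceFromHome": 4},
]


def breakdown(rows, value_key, group_keys):
    """Average value_key for each combination of the group_keys values."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in group_keys)].append(row[value_key])
    return {key: mean(values) for key, values in groups.items()}


result = breakdown(rows, "DistanceFromHome", ["JobRole", "Attrition"])
for (role, attrition), avg in sorted(result.items()):
    print(f"{role:20s} attrition={attrition:3s} mean distance={avg:.1f}")
```

The same helper answers the second question in the objective by passing a monthly income field as `value_key` and education plus attrition as `group_keys`.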
Health
The health domain includes healthcare providers, hospitals, pharmaceuticals, health insurance, and healthcare technology focused on improving patient care, medical research, and public health initiatives. This vertical does not include projects related to health infrastructure development.
Liver Patient Analysis
Breast-Cancer-Prediction
Chest-XRay - Effusion Segmentation
Chest-XRay - Effusion Classification
Covid-worldwide-EDA
India-Covid-Graphs
Malaria-Detection_dep
pnemonia_prediction
Energy
This domain involves the production, management, and distribution of energy, including fossil fuels, renewable energy (solar, wind, hydro), nuclear power, and energy conservation technologies, along with grid management.
UK-Energy-Consumption
AirQuality-Prediction
Climate
Climate and energy are interrelated; to avoid confusion, any project related to energy is kept out of the climate vertical. This domain focuses on climate science, environmental monitoring, and sustainability initiatives, including research and development on climate change, renewable energy, environmental policy, and green technologies to reduce carbon footprints.
Acea Smart Water Analytics & Prediction
Objective
- The Acea Group deals with four different types of waterbodies: water springs (for which three datasets are provided), a lake (one dataset), a river (one dataset), and aquifers (four datasets).
- This competition uses nine different datasets, completely independent and not linked to each other. Each dataset can represent a different kind of waterbody. As each waterbody is different from the others, the related features also differ from one dataset to another.
- It is important to note that some features, such as rainfall and temperature, which are present in every dataset, do not take effect on the date they are recorded. Both rainfall and temperature affect features such as level, flow, depth to groundwater, and hydrometry only some time after they occur. For instance, rain that fell on 1st January does not affect these features that same day but some time later. As we don’t know how many days, weeks, or months later rainfall affects these features, this lag is another aspect to take into consideration when analyzing the datasets.
- Github Repo
- Colab
- Acea Water Prediction & Analysis - Kaggle
- Kaggle Dataset
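One common way to probe the unknown delay between rainfall and its effect is to correlate the rainfall series with the target at several candidate lags and see which lag aligns best. The sketch below is an illustration on synthetic numbers, not the approach used in the linked notebook; `pearson` and `best_lag` are hypothetical helper names, and the series are invented so that the true lag is known.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def best_lag(driver, target, max_lag):
    """Return (lag, correlation) for the lag at which driver[t - lag]
    has the strongest absolute correlation with target[t]."""
    scores = {}
    for lag in range(max_lag + 1):
        shifted = driver[: len(driver) - lag]  # driver values at t - lag
        aligned = target[lag:]                 # target values at t
        scores[lag] = pearson(shifted, aligned)
    lag = max(scores, key=lambda k: abs(scores[k]))
    return lag, scores[lag]


# Synthetic series: the level responds to rainfall exactly 3 steps later.
rainfall = [0, 5, 0, 0, 8, 1, 0, 0, 3, 0, 6, 0, 0, 2, 0, 0]
level = [0, 0, 0] + rainfall[:-3]

lag, corr = best_lag(rainfall, level, max_lag=6)
print(f"best lag: {lag} steps, correlation: {corr:.2f}")
```

On real waterbody data the response is usually spread over several days rather than a single step, so the lag found this way is better treated as a starting point for building lagged or rolling-sum features.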
eCommerce
The e-commerce domain comprises online platforms and businesses that facilitate buying and selling goods and services over the internet. It includes marketplaces, payment processing, logistics, and digital retailing.
Black Friday Sales Data Analysis Prediction
- About Dataset: This dataset comprises sales transactions captured at a retail store. It is a classic dataset covering multiple shopping experiences. This is a regression problem. The dataset has 550,069 rows and 12 columns.
- Github Repo
- Colab
- Black Friday Sales Data Analysis Prediction - Kaggle
- Dataset
Amazon Sentiment Analysis
Bigdata-AmazonReviews
Online-Retail-Customer-Clustering
Recommendation System Amazon Electronics
Economics and International Trade
This field involves the study and application of economic theories, policies, and data analysis to understand markets, consumer behavior, global trade, and financial trends. It serves as the foundation for economic research, policy-making, and financial planning.
Economy-Analysis
Prosperity-Clustering
Marine Consultant - GOI
- Github Repo - This is a prototype custom GPT on ChatGPT that helps plan strategy for bilateral or multilateral engagements with other countries.
Electronics
The electronics domain includes the design, manufacturing, and distribution of electronic devices and components, such as semiconductors, consumer electronics, computing hardware, and embedded systems.
Hand Gesture Recognition
Industrial Safety
Industrial safety focuses on workplace safety standards, risk management, and protocols to protect employees and prevent accidents in industrial environments. It includes safety training, hazard assessments, and regulatory compliance.
Industrial Accident Cause Analysis
OSHA Accidents and Injury
Tourism, Hospitality, Hotel, Restaurant and Event Management
The hospitality domain involves businesses that provide accommodation, food, and leisure services, such as hotels, resorts, restaurants, and cafes, focusing on guest experiences and comfort.
Zomato Review
Indian Food Item Recommendations in Restaurants
FoodDemand Forecast
Travel & Logistics
The Travel & Logistics domain encompasses the movement of people and goods. It includes various industries such as transportation, warehousing, distribution, and supply chain management for both individuals and businesses. The focus in this domain is on efficient, timely, and cost-effective transport, as well as providing seamless travel experiences. This sector is heavily influenced by technology for tracking, route optimization, and resource management. This domain has some overlap with eCommerce and Sales.
Flight Delay Analysis using Hive
This project uses the 2004-2005 flight data from the 2009 ASA Statistical Computing and Graphics Data Expo dataset, which consists of flight arrival and departure details for all commercial flights on major carriers within the United States of America from October 1987 to April 2008.
Activities (pipeline) in the project:
- Creating a Hive table (for storage) from the external files
- Creating the partitioned table schema
- Partitioning the Hive table by year and loading the data into the partitioned table
- Running SQL queries on the partitioned table
Links
- My article on Hive
- Github Repo
- Kaggle Dataset 1.34GB
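The four pipeline activities listed above map onto a handful of HiveQL statements. The following sketch assembles them in Python so they can be printed or handed to a Hive client; it is an assumption-laden illustration, not the code from the linked repo. The table names, the reduced column list, the HDFS path `/data/flights/raw`, the ORC storage format, and the helper `build_hive_pipeline` are all hypothetical.

```python
def build_hive_pipeline(raw_table, part_table, location):
    """Return the HiveQL statements for the pipeline steps, in order."""
    return [
        # Step 1: external table over the raw CSV files already in HDFS.
        f"""CREATE EXTERNAL TABLE IF NOT EXISTS {raw_table} (
  flight_year INT, flight_month INT, carrier STRING,
  arr_delay INT, dep_delay INT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '{location}'""",
        # Step 2: partitioned table schema; the partition column is declared
        # in PARTITIONED BY, not in the regular column list.
        f"""CREATE TABLE IF NOT EXISTS {part_table} (
  flight_month INT, carrier STRING, arr_delay INT, dep_delay INT
)
PARTITIONED BY (flight_year INT)
STORED AS ORC""",
        # Step 3: enable dynamic partitioning, then load the data so that
        # each year lands in its own partition.
        "SET hive.exec.dynamic.partition=true",
        "SET hive.exec.dynamic.partition.mode=nonstrict",
        f"""INSERT OVERWRITE TABLE {part_table} PARTITION (flight_year)
SELECT flight_month, carrier, arr_delay, dep_delay, flight_year
FROM {raw_table}""",
        # Step 4: a query that reads only the 2004 partition.
        f"""SELECT carrier, AVG(arr_delay) AS avg_arr_delay
FROM {part_table}
WHERE flight_year = 2004
GROUP BY carrier""",
    ]


statements = build_hive_pipeline("flights_raw", "flights_by_year", "/data/flights/raw")
print(";\n\n".join(statements) + ";")
```

Filtering on the partition column in the last statement lets Hive skip the other years' partitions entirely, which is the main reason for partitioning the table by year.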
Flight Delay Analysis - 2008 (Bigdata)
The U.S. Department of Transportation’s (DOT) Bureau of Transportation Statistics (BTS) tracks the on-time performance of domestic flights operated by large air carriers. Summary information on the number of on-time, delayed, canceled, and diverted flights appears in DOT’s monthly Air Travel Consumer Report, published about 30 days after the month’s end, as well as in summary tables posted on the BTS website.
Apache Hive is a data warehouse and SQL-like query engine built on top of Hadoop. Hadoop provides the Hadoop Distributed File System (HDFS) and handles distributed storage and processing of the data. Hive can handle billions of transactions, so we can run SQL queries without worrying about whether an aggregation or filter will ever complete. Hive can also handle CRUD operations.
The dataset contains daily airline flight information such as Origin, Dest, Distance, DepTime, CRSDepTime, ArrTime, CRSArrTime, UniqueCarrier, FlightNum, TailNum, ActualElapsedTime, CRSElapsedTime, AirTime, ArrDelay, DepDelay, TaxiIn, TaxiOut, Cancelled, CancellationCode, Diverted, CarrierDelay, WeatherDelay, NASDelay, SecurityDelay, and LateAircraftDelay. Airlines wanted to analyze the data of the last 20 years.
Links
Tech Stack: Hadoop/HDFS, Hive, SQL, HiveQL, ORC (Optimized Row Columnar) or Parquet, Python, Matplotlib/Seaborn.
Flight Delay and Cancellation Analysis - 2015
The U.S. Department of Transportation’s (DOT) Bureau of Transportation Statistics (BTS) tracks the on-time performance of domestic flights operated by large air carriers. Summary information on the number of on-time, delayed, canceled, and diverted flights appears in DOT’s monthly Air Travel Consumer Report, published about 30 days after the month’s end, as well as in summary tables posted on the BTS website.
NYC Parking - 2008
NYC Parking - 2004-2005 (Bigdata and pySpark)
NYC Parking - 2015
NYC Parking - 2017
Driver Availability Prediction
- Github Repo
- Colab
- Dataset & ping.csv
Uber Cancellation
Vehicle Classification
Vehicle Tracking
Entertainment, Games & Sports
This sector covers the creation/production, distribution, and consumption of media, including film, music, gaming, and live performances. It involves production houses, streaming services, and digital content platforms.
Movies-Recommendations
Olympic-QA-System-with-GPT
Media and Publication
This domain includes businesses involved in publishing content across print, digital, and broadcast formats. It covers books, news, newspapers, magazines, digital media platforms, and content creation and distribution.
Media+Publication-TWO - Talk with Osho
- Github Repo - This is an educational GPT on ChatGPT. It is based on selected books of Osho. It is a prototype, because there is a limit on how many books can be loaded on ChatGPT. When this constraint is removed in the future, this project will be updated with more books.
Media+Publication-TWSV - Talk with Swami Vivekananda
- Github Repo - This is an educational GPT on ChatGPT. It is based on the 8 volumes of the Complete Works of Swami Vivekananda.
Summarising A Long Hindi Video into Different English Audio
- Github Repo - This project takes a long Hindi video (e.g., a YouTube video) and summarizes it into a single English audio summary.
Fakenews-Detection
HBQAS
NewsClassification-20Groups
Multiclass classification. Overall test accuracy: 0.88. The 20 news classes are: rec.sport.hockey, rec.motorcycles, rec.sport.baseball, rec.autos, talk.politics.guns, talk.religion.misc, sci.med, sci.electronics, sci.space, sci.crypt, misc.forsale, comp.os.ms-windows.misc, comp.graphics, comp.sys.ibm.pc.hardware, comp.windows.x, comp.sys.mac.hardware, soc.religion.christian, talk.politics.mideast, alt.atheism, talk.politics.misc
Per-class results (P = precision, R = recall):
- Class 0: P 0.46, R 0.50, F1 0.48, support 12
- Class 1: P 0.83, R 0.52, F1 0.64, support 29
- Class 2: P 0.62, R 0.69, F1 0.66, support 29
- Class 3: P 0.67, R 0.52, F1 0.58, support 31
- Class 4: P 0.85, R 0.68, F1 0.76, support 25
- Class 5: P 0.89, R 0.92, F1 0.91, support 26
- Class 6: P 0.89, R 0.69, F1 0.77, support 35
- Class 7: P 0.90, R 0.89, F1 0.89, support 108
- Class 8: P 0.88, R 0.97, F1 0.93, support 117
- Class 9: P 0.94, R 0.97, F1 0.96, support 121
- Class 10: P 0.95, R 0.99, F1 0.97, support 124
- Class 11: P 1.00, R 0.97, F1 0.98, support 31
- Class 12: P 0.67, R 0.73, F1 0.70, support 45
- Class 13: P 0.75, R 0.88, F1 0.81, support 43
- Class 14: P 0.81, R 0.92, F1 0.86, support 38
- Class 15: P 0.92, R 0.55, F1 0.69, support 20
- Class 16: P 0.87, R 0.97, F1 0.92, support 103
- Class 17: P 0.93, R 0.93, F1 0.93, support 14
- Class 18: P 1.00, R 0.39, F1 0.56, support 18
- Class 19: P 0.91, R 0.89, F1 0.90, support 81
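For readers less familiar with these metrics, the F1 value in each row is the harmonic mean of that row's precision and recall. The short sketch below recomputes it for three rows copied from the report above (classes 0, 1, and 18) and shows how macro and support-weighted averages differ. The function name `f1_score` is a local helper defined here, not an import from any library, and the averages cover only the three sample rows, not the full report.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (class id, precision, recall, support) copied from three rows of the report.
report = [(0, 0.46, 0.50, 12), (1, 0.83, 0.52, 29), (18, 1.00, 0.39, 18)]

for cls, p, r, support in report:
    print(f"class {cls}: F1 = {f1_score(p, r):.2f} (support {support})")

# Macro average treats every class equally; the weighted average lets
# classes with more test samples count for more.
total_support = sum(row[3] for row in report)
macro_f1 = sum(f1_score(p, r) for _, p, r, _ in report) / len(report)
weighted_f1 = sum(f1_score(p, r) * s for _, p, r, s in report) / total_support
print(f"macro F1 over these rows: {macro_f1:.2f}")
print(f"weighted F1 over these rows: {weighted_f1:.2f}")
```

Class 18 illustrates why F1 is reported alongside precision: a perfect precision of 1.00 still yields a modest F1 because recall is only 0.39.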
SDSHL
Toxic-Comment
Twitter-Sentiment-Analysis
YELP-Review-Prediction
- Github Repo
- Colab - Fine_Tuning_Transformer_BERT_Customer_Review
- Colab - Yelp customer_review_classification
Sales
Sales overlaps with eCommerce and retail. To avoid confusion, anything related to the sale of big items such as cars, houses, or other capital items is placed in the Sales domain. The sale may happen online or through a physical shop.
House-Price-Prediction
CarPrice
CarSales
Lead-Conversion
Telecom
This domain covers telecommunications services and infrastructure, including phone networks, internet providers, satellite communications, and emerging technologies like 5G, enabling global connectivity and communication.
Telecom Churn
Public Safety and Security
This sector involves efforts to maintain public order, safety, and security in communities. It includes law enforcement, emergency services, disaster response, and security solutions for protecting people and assets.
Barcelona Accidents
Indian Judiciary - Verdict Dataset
Agriculture
This domain encompasses activities related to farming, crop production, livestock management, and the broader agricultural supply chain. It also includes agricultural technology (agritech), sustainable farming practices, and rural development.
Education
This sector covers educational institutions, e-learning platforms, and educational technology (edtech) designed to facilitate teaching, learning, and research. It spans K-12, higher education, corporate training, and continuous learning.
Infrastructure (Infra) Development
This domain encompasses the planning, design, and construction of physical facilities and systems, including transportation, telecommunications, water supply, and utilities essential for supporting economic activity.
Datasets
====================================================
My Technology Stack - Developed/ Created/ Expanded work
Work in this section is listed according to Technology/Tech Product/POC
LLM/Agentic AI
CrewAI
Agentic-LLM
- Github Repo - Agentic-LLM/Bee-Agentic-Framework
- Github Repo - Agentic-LLM/Create-Custom-Audio-Summaries
BigScience
Cloud-AWS
- Github Repo - Cloud-AWS/AWS-Amazon-Bedrock-for-Serverless-LLM
- Github Repo - Cloud-AWS/AWS-Amplify
- Github Repo - Cloud-AWS/AWS-Runner
- Github Repo - Cloud-AWS/AWS-SageMaker
Cloud-GCP
Cloud-Groq
Deepseek
EleutherAI
Langchain
MLOps
- Github Repo - MLOps/ML-Pipelines
- Github Repo - MLOps/Pydantic+logfire
- Github Repo - MLOps/cog
- Github Repo - MLOps/comet.com
- Github Repo - MLOps/naptune
MetaAI
- Github Repo - MetaAI/finetune-llama2
- Github Repo - MetaAI/finetune-llama3-8b
- Github Repo - MetaAI/ollama
- Github Repo - MetaAI/ts.llamaindex
MistralAI
RAG
StabilityAI
Vercel
huggingface
lmstudio
openAI
- Github Repo - openAI/open-webui
- Github Repo - openAI/openAI
- Github Repo - openAI/openai-quickstart-python
otterai
quantization
AI/ML - Technology Stack
API-Development
- Github Repo - API-Development/Python-Jinja
- Github Repo - API-Development/curl
- Github Repo - API-Development/fastapi_example
Analytics
- Github Repo - Analytics/PowerBI
- Github Repo - Analytics/PyGWalker
- Github Repo - Analytics/bokeh
- Github Repo - Analytics/pandas
- Github Repo - Analytics/plotly
- Github Repo - Analytics/tableau
Audio
- Github Repo - Audio/GAN-MusicGeneration
- Github Repo - Audio/Speech-Recognition
- Github Repo - Music-Vocal-Separation
CICD
CMS-Content-Management-Systems
CUDA
CV
- Github Repo - CV/Flower-Prediction
- Github Repo - CV/ImageAugmentation
- Github Repo - CV/ImageProcessing
- Github Repo - CV/MNIST-Experiments
- Github Repo - CV/Object-Detection-InBrowser
DB-Bigdata
- Github Repo - DB-Bigdata/Bigdata-HiveScoop
- Github Repo - DB-Bigdata/Bigdata-mySQL
- Github Repo - DB-Bigdata/mongodb
DB-Databases
DE
Docker
- Github Repo - Docker/Dockerfile-golang1.21-alpine
- Github Repo - Docker/Dockerfile-php7.2-apache
- Github Repo - Docker/Dockerfile-python3.9-slim
- Github Repo - Docker/Dockerfile-tensorflow-latest
- Github Repo - Docker/Gemini-Docker-by-Hari
- Github Repo - Docker/Jupyter-from-Docker-with-GPU
- Github Repo - Docker/Jupyter-from-Docker-wo-GPU
- Github Repo - Docker/bindmount-apps
- Github Repo - Docker/colab-in-docker-in-local
- Github Repo - Docker/count-program
- Github Repo - Docker/demo1
- Github Repo - Docker/welcome-to-docker
IOT
LaTex
ML
- Github Repo - ML/Classification
- Github Repo - ML/Clustering
- Github Repo - ML/DataImbalance
- Github Repo - ML/ML-Retraining
- Github Repo - ML/ROC
- Github Repo - ML/Regression
MLFrameworks
- Github Repo - MLFrameworks/TensorFlow-ImageRecognition
- Github Repo - MLFrameworks/mobilenet-v2
- Github Repo - MLFrameworks/tensorflow-js
- Github Repo - MLFrameworks/tensorflow-lite
NLP
- Github Repo - NLP/Embedding
- Github Repo - NLP/NLP-Concepts
- Github Repo - NLP/NLP-Hindi-Bible
- Github Repo - NLP/NLP-Misc
- Github Repo - NLP/NLP-Plugin20Event
- Github Repo - NLP/NLP-SanskritTrans
- Github Repo - NLP/NLP-rasa
Python-Automation
- Github Repo - Python-Automation/1.Table-Extraction
- Github Repo - Python-Automation/2.Automate-The-News
- Github Repo - Python-Automation/3.Excel-Report
- Github Repo - Python-Automation/4.WhatsApp
- Github Repo - Python-Automation/datasets
- Github Repo - Python-Automation/spanish
- Github Repo - Python-Automation/test-folder
PythonicWay
- Github Repo - PythonicWay/8things-you-must-know
- Github Repo - PythonicWay/ManagingPythonProjects
- Github Repo - PythonicWay/python_pkg
R-Projects
RL
Stats
Timeseries
Transfer-Learning
UI-Design
Web+Mobile
- Github Repo - Web+Mobile/Android
- Github Repo - Web+Mobile/JS
- Github Repo - Web+Mobile/Java
- Github Repo - Web+Mobile/Nodejs
- Github Repo - Web+Mobile/nginx-web-app
AI/ML - Forked
- Github Repo - 100Days-ML
- Github Repo - AgenticAI-agent-zero
- Github Repo - automl-mljar-supervised
- Github Repo - automl-pycaret
- Github Repo - automl-tpot
- Github Repo - bolt.diy
- Github Repo - evalml
- Github Repo - gcp-python-docs-samples
- Github Repo - GFPGAN
- Github Repo - google-ai-edge-litert-samples
- Github Repo - google-gemini-cookbook
- Github Repo - intel-scikit-learn-intelex
- Github Repo - langchain
- Github Repo - langchain-ai-langchain-academy
- Github Repo - langgraph
- Github Repo - Learning-Pandas-Second-Edition
- Github Repo - LeetCode-js
- Github Repo - microsoft-generative-ai-for-beginners
- Github Repo - ML+DL-Code-for-my-YouTube-Channel-Rohan
- Github Repo - mlops-logfire
- Github Repo - packages
- Github Repo - Pyramid-Flow
- Github Repo - Sandhi_Prakarana
- Github Repo - stanford_alpaca
- Github Repo - supabase
- Github Repo - tensorboard
- Github Repo - tensorflow
- Github Repo - tensorflow-examples
- Github Repo - tensorflow-tfjs-examples
- Github Repo - tensorflow-tfjs-models
- Github Repo - vector-admin
- Github Repo - visualization-forks
- Github Repo - Browse-use
Web+Mobile App Development - POC Work
- Android
- Falcon_Android
- ImageRecognition
- Java
- nodejs
- react