Project Index Page
This is the master project page; it links to my different project categories. To learn about my work in a particular domain, click the link that interests you.
Introduction
This page shares my project work, capabilities, expertise, and understanding of business domains, technology solutions and approaches.
- This page lists all my GitHub repos (private and public).
- These GitHub repositories relate to all my projects, consulting, courses, POCs (proofs of concept) and technology explorations.
- It also lists some forked repositories. Some I forked to extend the existing project, some for teaching, and some to build my own solutions.
- Some of these projects showcase my project management capabilities in different domains, including IT.
- Others showcase my technology capabilities, especially around AI, Deep Learning, GenAI, NLP and Analytics.
- The purpose of this listing is:
- to help others know what is possible and what I have explored.
- to remind myself what I have already explored, and what worked versus what didn’t work during exploration.
Tech Skills
- LLM Expertise: Prompt Engineering, Fine-tuning & Model Deployment
- Models: Llama, ChatGPT, GPT-4, Bard, LaMDA, PaLM, Gemma, Claude, Mistral, T5, Flan, BERT, Phi and various others
- Model UI: Ollama, LMStudio, OpenWebUI
- ML Model Development: Feature Engineering, Tuning, Evaluation, Cross-Validation, Classical ML, NLP metrics, Regression/Classification/Clustering, Ensemble Trees, Decision Tree, Random Forest, SVM
- AutoML: Automated ML (PyCaret, TPOT)
- MLOps/DevOps:
- Deep Learning / NLP & Embedding: Hugging Face, RNN, LSTM, GRU, Transformers, BERT, FastText, NLTK, SpaCy, Embedding, Keras, PyTorch, TensorFlow, OpenAI, Embedding Transfer, CV model evaluation, CNN, YOLO
- Big Data & Cloud: Hadoop, Spark, PySpark, Kafka, NoSQL (Cassandra, MongoDB)
- Cloud Platforms: AWS, GCP, Azure, AWS SageMaker, Azure AutoML, VertexAI, Oracle AI
- ML Frameworks: TensorFlow, TensorFlow Lite/LiteRT, TensorFlow.js, PyTorch
- Data Visualization: PowerBI, Tableau, Plotly, Seaborn, Matplotlib
- Mobile/Web App Dev: Flask, Gradio, Streamlit, Android Studio, Flutter
- Programming Languages: Python, R, Package Managers (pip, conda, npm), Dart
- IDE/CLI/SDK: VS Code, Cursor, Visual Studio, Eclipse, Android Studio, Flutter
- Markup Language: Markdown, LaTeX, HTML/CSS
- Statistics: Descriptive/Inferential Statistics, Prescriptive Statistics in AI
AI/ML/DL, GenAI, LLM, Analytics, Technology Work Summary
My POC and Technology Stacks
Summary of My Project Management Work/Projects
AI/ML Datasets
There is no dearth of datasets, but during training sessions, when my learners or I need a particular dataset, we often struggle to get it. Datasets get removed or renamed, or internet availability and access restrictions waste a lot of time. To avoid that, I have created this GitHub repo of datasets. These are for classical machine learning; they are not for deep learning or LLMs unless specifically mentioned.
AI/ML - Industries - Developed/ Created/ Expanded work
Projects in this section are listed by industry or business domain. Sometimes it is difficult for me to decide which domain a particular project falls into, so I created this page to settle the domain for each project.
BFSI (Banking, Financial Services, and Insurance)
BFSI includes financial institutions, banks, insurance companies, investment firms, and other entities offering services such as lending, investment, wealth management, and financial protection. This sector is heavily regulated and technology-driven for security and risk management.
Credit-Fraud-Detection
DoeJones-Prediction-with-News
Loan-Approval
HR (Human Resources)
This domain covers employee management, recruitment, training, compensation, and workplace culture. It also includes HR technology and services related to people management and organizational development.
HR Analysis of Employee Attrition & Performance
- Github Repo
- Colab
- HR Analysis of Employee Attrition & Performance - R
- Objective: Uncover the factors that lead to employee attrition and explore important questions such as ‘show me a breakdown of distance from home by job role and attrition’ or ‘compare average monthly income by education and attrition’. This is a fictional dataset created by IBM data scientists.
- Github dataset
- HR Analysis of Employee Attrition & Performance - Python
Health
The health domain includes healthcare providers, hospitals, pharmaceuticals, health insurance, and healthcare technology focused on improving patient care, medical research, and public health initiatives. This vertical does not include projects related to health infrastructure development.
Liver Patient Analysis
Breast-Cancer-Prediction
Chest-XRay - Effusion Segmentation
Chest-XRay - Effusion Classification
Covid-worldwide-EDA
India-Covid-Graphs
Malaria-Detection_dep
pnemonia_prediction
Energy
This domain involves the production, management, and distribution of energy, including fossil fuels, renewable energy (solar, wind, hydro), nuclear power, and energy conservation technologies, along with grid management.
UK-Energy-Consumption
AirQuality-Prediction
Climate
Climate and energy are interrelated; to avoid confusion, any project related to energy is placed in the Energy vertical rather than here. This domain focuses on climate science, environmental monitoring, and sustainability initiatives, including research and development on climate change, renewable energy, environmental policy, and green technologies to reduce carbon footprints.
Acea Smart Water Analytics & Prediction
Objective
- The Acea Group deals with four different types of waterbodies: water spring (for which three datasets are provided), lake (for which a dataset is provided), river (for which a dataset is provided) and aquifers (for which four datasets are provided).
- This competition uses nine different datasets, completely independent and not linked to each other. Each dataset can represent a different kind of waterbody. As each waterbody is different from the others, the related features also differ from one dataset to the next.
- It is of the utmost importance to notice that the effect of some features, like rainfall and temperature, which are present in each dataset, is not aligned with the date on which they are recorded. Both rainfall and temperature affect features like level, flow, depth to groundwater and hydrometry only some time after they occur. This means, for instance, that rain that fell on 1st January doesn’t affect the mentioned features on that same day but some time later. As we don’t know how many days, weeks or months later rainfall affects these features, this lag is another aspect to take into consideration when analyzing the dataset.
- Github Repo
- Colab
- Acea Water Prediction & Analysis - Kaggle
- Kaggle Dataset
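The delayed effect of rainfall described in the objective can be explored by shifting one series against the other and comparing correlations. The following is a minimal sketch of that idea, not the method used in the repository: the function names `pearson` and `best_lag` are hypothetical, the data is synthetic rather than taken from the Acea datasets, and plain Pearson correlation is only one of several possible lag criteria.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def best_lag(rainfall, level, max_lag):
    """Return (lag, correlation) for the shift in days that maximizes the
    correlation between rainfall on day t and the water level on day t + lag."""
    scores = {}
    for lag in range(max_lag + 1):
        # Pair rainfall[t] with level[t + lag] and drop the unmatched ends.
        xs = rainfall[: len(rainfall) - lag]
        ys = level[lag:]
        scores[lag] = pearson(xs, ys)
    lag = max(scores, key=scores.get)
    return lag, scores[lag]


# Synthetic example (assumption, not Acea data): the level follows rainfall three days later.
rainfall = [0, 5, 0, 0, 12, 0, 3, 0, 0, 8, 0, 0, 0, 6, 0, 0]
level = [10.0] * 3 + [10.0 + 0.5 * r for r in rainfall[:-3]]

lag, corr = best_lag(rainfall, level, max_lag=7)
print(lag, round(corr, 3))
```

On real data the relationship is unlikely to be this clean, so the lag found this way is better treated as a starting point for building lagged features than as a definitive answer.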
eCommerce
The e-commerce domain comprises online platforms and businesses that facilitate buying and selling goods and services over the internet. It includes marketplaces, payment processing, logistics, and digital retailing.
Black Friday Sales Data Analysis Prediction
- About Dataset: This dataset comprises sales transactions captured at a retail store. It’s a classic dataset which has multiple shopping experiences. This is a regression problem. The dataset has 550,069 rows and 12 columns.
- Github Repo
- Colab
- Black Friday Sales Data Analysis Prediction - Kaggle
- Dataset
Amazon Sentiment Analysis
Bigdata-AmazonReviews
Online-Retail-Customer-Clustering
Recommendation System Amazon Electronics
Economics and International Trade
This field involves the study and application of economic theories, policies, and data analysis to understand markets, consumer behavior, global trade, and financial trends. It serves as the foundation for economic research, policy-making, and financial planning.
Economy-Analysis
Prosperity-Clustering
Marine Consultant - GOI
- Github Repo - This is a prototype GPT on ChatGPT that helps with planning strategy for bilateral or multilateral engagements with other countries.
Electronics
The electronics domain includes the design, manufacturing, and distribution of electronic devices and components, such as semiconductors, consumer electronics, computing hardware, and embedded systems.
Hand Gesture Recognition
Industrial Safety
Industrial safety focuses on workplace safety standards, risk management, and protocols to protect employees and prevent accidents in industrial environments. It includes safety training, hazard assessments, and regulatory compliance.
Industrial Accident Cause Analysis
OSHA Accidents and Injury
Tourism, Hospitality, Hotel, Restaurant and Event Management
The hospitality domain involves businesses that provide accommodation, food, and leisure services, such as hotels, resorts, restaurants, and cafes, focusing on guest experiences and comfort.
Zomato Review
Indian Food Item Recommendations in Restaurants
FoodDemand Forecast
Travel & Logistics
The Travel & Logistics domain encompasses the movement of people and goods. It includes various industries such as transportation, warehousing, distribution, and supply chain management for both individuals and businesses. The focus in this domain is on efficient, timely, and cost-effective transport, as well as providing seamless travel experiences. This sector is heavily influenced by technology for tracking, route optimization, and resource management. This domain has some overlap with eCommerce and Sales.
Flight Delay Analysis using Hive
This dataset contains the 2004-2005 flight data from the 2009 ASA Statistical Computing and Graphics Data Expo, which consisted of flight arrival and departure details for all commercial flights on major carriers within the United States of America from October 1987 to April 2008.
Activities (pipeline) in the project:
- Creating a Hive table (for storage) from the external files
- Creating the partitioned table schema
- Partitioning the Hive table by year and loading the data into the partitioned table
- Running SQL queries on the partitioned table
Links
- My article on Hive
- Github Repo
- Kaggle Dataset 1.34GB
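To illustrate what partitioning by year buys in the pipeline above, here is a small pure-Python imitation of the directory layout that Hive uses for partitioned tables (`<column>=<value>/` subdirectories). It is a sketch under stated assumptions, not the project's code and not Hive itself: in Hive these steps would be expressed with `CREATE EXTERNAL TABLE`, `CREATE TABLE ... PARTITIONED BY` and `INSERT ... PARTITION` statements. The helper names `write_partitioned` and `read_partition`, the sample rows, and the `Year` column are hypothetical.

```python
import csv
import tempfile
from collections import defaultdict
from pathlib import Path


def write_partitioned(rows, root, partition_key):
    """Write rows into Hive-style partition directories: <root>/<key>=<value>/part-0.csv."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[partition_key]].append(row)
    for value, group in groups.items():
        part_dir = Path(root) / f"{partition_key}={value}"
        part_dir.mkdir(parents=True, exist_ok=True)
        # As in Hive, the partition column lives in the directory name, not in the file.
        fields = [name for name in group[0] if name != partition_key]
        with open(part_dir / "part-0.csv", "w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=fields, extrasaction="ignore")
            writer.writeheader()
            writer.writerows(group)


def read_partition(root, partition_key, value):
    """Read a single partition only, which is the effect of partition pruning in a query."""
    path = Path(root) / f"{partition_key}={value}" / "part-0.csv"
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))


# Hypothetical sample rows (assumption): three flights across two years.
flights = [
    {"Year": 2004, "UniqueCarrier": "AA", "ArrDelay": 12},
    {"Year": 2004, "UniqueCarrier": "UA", "ArrDelay": -3},
    {"Year": 2005, "UniqueCarrier": "AA", "ArrDelay": 30},
]

root = tempfile.mkdtemp()
write_partitioned(flights, root, "Year")

# A query filtered on Year = 2004 only needs to touch the Year=2004 directory.
rows_2004 = read_partition(root, "Year", 2004)
mean_delay_2004 = sum(int(r["ArrDelay"]) for r in rows_2004) / len(rows_2004)
print(sorted(p.name for p in Path(root).iterdir()), mean_delay_2004)
```

The design point is that a filter on the partition column turns into a directory lookup, so queries restricted to one year never scan the other years' files.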
Flight Delay Analysis - 2008 (Bigdata)
The U.S. Department of Transportation’s (DOT) Bureau of Transportation Statistics (BTS) tracks the on-time performance of domestic flights operated by large air carriers. Summary information on the number of on-time, delayed, canceled and diverted flights appears in DOT’s monthly Air Travel Consumer Report, published about 30 days after the month’s end, as well as in summary tables posted on this website.
Apache Hive is a data warehouse and SQL-like query engine built on top of Hadoop. Hadoop provides the Hadoop Distributed File System (HDFS) and handles distributed storage and processing of the data. Hive can handle billions of records, so we can run SQL-style queries without worrying about whether aggregation or filter operations will ever finish. Hive also supports CRUD operations (updates and deletes require transactional tables).
The dataset contains daily airline flight information such as Origin, Dest, Distance, DepTime, CRSDepTime, ArrTime, CRSArrTime, UniqueCarrier, FlightNum, TailNum, ActualElapsedTime, CRSElapsedTime, AirTime, ArrDelay, DepDelay, TaxiIn, TaxiOut, Cancelled, CancellationCode, Diverted, CarrierDelay, WeatherDelay, NASDelay, SecurityDelay and LateAircraftDelay. Airlines wanted to analyze the data of the last 20 years.
Tech Stack: Hadoop/HDFS, Hive, SQL, HiveQL, ORC (Optimized Row Columnar) or Parquet, Python, Matplotlib/Seaborn.
Flight Delay and Cancellation Analysis - 2015
The U.S. Department of Transportation’s (DOT) Bureau of Transportation Statistics (BTS) tracks the on-time performance of domestic flights operated by large air carriers. Summary information on the number of on-time, delayed, canceled and diverted flights appears in DOT’s monthly Air Travel Consumer Report, published about 30 days after the month’s end, as well as in summary tables posted on this website.
NYC Parking - 2008
NYC Parking - 2004-2005 (Bigdata and pySpark)
NYC Parking - 2015
NYC Parking - 2017
Driver Availability Prediction
- Github Repo
- Colab
- Dataset & ping.csv
Uber Cancellation
Vehicle Classification
Vehicle Tracking
Entertainment, Games & Sports
This sector covers the creation/production, distribution, and consumption of media, including film, music, gaming, and live performances. It involves production houses, streaming services, and digital content platforms.
Movies-Recommendations
Olympic-QA-System-with-GPT
Media and Publication
This domain includes businesses involved in publishing content across print, digital, and broadcast formats. It covers books, news, newspapers, magazines, digital media platforms, and content creation and distribution.
Media+Publication-TWO - Talk with Osho
- Github Repo - This is an educational GPT on ChatGPT. It is based on selected books of Osho. It is a prototype because ChatGPT limits the number of books that can be loaded. When this constraint is removed, the project will be updated with more books.
Media+Publication-TWSV - Talk with Swami Vivekananda
- Github Repo - This is an educational GPT on ChatGPT. It is based on the 8 volumes of the Complete Works of Swami Vivekananda.
Summarising A Long Hindi Video into Different English Audio
- Github Repo - This project takes a long Hindi video (e.g., a YouTube video) and summarizes it into a single English audio summary.
Fakenews-Detection
HBQAS
NewsClassification-20Groups
Multiclass classification. Overall test accuracy: 0.88. The 20 news classes are: rec.sport.hockey, rec.motorcycles, rec.sport.baseball, rec.autos, talk.politics.guns, talk.religion.misc, sci.med, sci.electronics, sci.space, sci.crypt, misc.forsale, comp.os.ms-windows.misc, comp.graphics, comp.sys.ibm.pc.hardware, comp.windows.x, comp.sys.mac.hardware, soc.religion.christian, talk.politics.mideast, alt.atheism, talk.politics.misc
Per-class precision (P), recall (R), F1 and support:
- Class 0: P: 0.46, R: 0.50, F1: 0.48, support: 12
- Class 1: P: 0.83, R: 0.52, F1: 0.64, support: 29
- Class 2: P: 0.62, R: 0.69, F1: 0.66, support: 29
- Class 3: P: 0.67, R: 0.52, F1: 0.58, support: 31
- Class 4: P: 0.85, R: 0.68, F1: 0.76, support: 25
- Class 5: P: 0.89, R: 0.92, F1: 0.91, support: 26
- Class 6: P: 0.89, R: 0.69, F1: 0.77, support: 35
- Class 7: P: 0.90, R: 0.89, F1: 0.89, support: 108
- Class 8: P: 0.88, R: 0.97, F1: 0.93, support: 117
- Class 9: P: 0.94, R: 0.97, F1: 0.96, support: 121
- Class 10: P: 0.95, R: 0.99, F1: 0.97, support: 124
- Class 11: P: 1.00, R: 0.97, F1: 0.98, support: 31
- Class 12: P: 0.67, R: 0.73, F1: 0.70, support: 45
- Class 13: P: 0.75, R: 0.88, F1: 0.81, support: 43
- Class 14: P: 0.81, R: 0.92, F1: 0.86, support: 38
- Class 15: P: 0.92, R: 0.55, F1: 0.69, support: 20
- Class 16: P: 0.87, R: 0.97, F1: 0.92, support: 103
- Class 17: P: 0.93, R: 0.93, F1: 0.93, support: 14
- Class 18: P: 1.00, R: 0.39, F1: 0.56, support: 18
- Class 19: P: 0.91, R: 0.89, F1: 0.90, support: 81
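For readers unfamiliar with the P, R, F1 and support figures reported above, the sketch below shows how such per-class numbers are typically derived from true and predicted labels. It is a minimal illustration with toy labels, not the project's evaluation code; the function name `per_class_report` is hypothetical, and libraries such as scikit-learn provide an equivalent `classification_report`.

```python
from collections import Counter


def per_class_report(y_true, y_pred):
    """Return {label: (precision, recall, f1, support)} for every label seen."""
    true_pos = Counter()
    pred_count = Counter(y_pred)  # true positives + false positives per class
    support = Counter(y_true)     # true positives + false negatives per class
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            true_pos[truth] += 1
    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = true_pos[label]
        precision = tp / pred_count[label] if pred_count[label] else 0.0
        recall = tp / support[label] if support[label] else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = (precision, recall, f1, support[label])
    return report


# Toy labels (assumption), not the project's predictions.
y_true = [0, 0, 1, 1, 1, 2]
y_pred = [0, 1, 1, 1, 2, 2]

report = per_class_report(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
for label, (p, r, f1, n) in report.items():
    print(f"Class {label}: P: {p:.2f}, R: {r:.2f}, F1: {f1:.2f}, support: {n}")
print(f"Overall accuracy: {accuracy:.2f}")
```

Support is simply the number of true examples of a class, which is why classes with small support tend to show less stable precision and recall.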
SDSHL
Toxic-Comment
Twitter-Sentiment-Analysis
YELP-Review-Prediction
- Github Repo
- Colab - Fine_Tuning_Transformer_BERT_Customer_Review
- Colab - Yelp customer_review_classification
Sales
Sales overlaps with eCommerce and Retail. To avoid confusion, anything related to the sale of big-ticket items such as cars, houses or other capital items is placed in the Sales domain, whether it is sold online or through a physical shop.
House-Price-Prediction
CarPrice
CarSales
Lead-Conversion
Telecom
This domain covers telecommunications services and infrastructure, including phone networks, internet providers, satellite communications, and emerging technologies like 5G, enabling global connectivity and communication.
Telecom Churn
Public Safety and Security
This sector involves efforts to maintain public order, safety, and security in communities. It includes law enforcement, emergency services, disaster response, and security solutions for protecting people and assets.
Barcelona Accidents
Indian Judiciary - Verdict Dataset
Agriculture
This domain encompasses activities related to farming, crop production, livestock management, and the broader agricultural supply chain. It also includes agricultural technology (agritech), sustainable farming practices, and rural development.
Education
This sector covers educational institutions, e-learning platforms, and educational technology (edtech) designed to facilitate teaching, learning, and research. It spans K-12, higher education, corporate training, and continuous learning.
Infrastructure (Infra) Development
This domain encompasses the planning, design, and construction of physical facilities and systems, including transportation, telecommunications, water supply, and utilities essential for supporting economic activity.
Datasets
====================================================
My Technology Stack - Developed/ Created/ Expanded work
Work in this section is listed according to Technology/Tech Product/POC.
LLM/Agentic AI
CrewAI
Agentic-LLM
- Github Repo - Agentic-LLM/Bee-Agentic-Framework
- Github Repo - Agentic-LLM/Create-Custom-Audio-Summaries
BigScience
Cloud-AWS
- Github Repo - Cloud-AWS/AWS-Amazon-Bedrock-for-Serverless-LLM
- Github Repo - Cloud-AWS/AWS-Amplify
- Github Repo - Cloud-AWS/AWS-Runner
- Github Repo - Cloud-AWS/AWS-SageMaker
Cloud-GCP
Cloud-Groq
Deepseek
EleutherAI
Langchain
MLOps
- Github Repo - MLOps/ML-Pipelines
- Github Repo - MLOps/Pydantic+logfire
- Github Repo - MLOps/cog
- Github Repo - MLOps/comet.com
- Github Repo - MLOps/naptune
MetaAI
- Github Repo - MetaAI/finetune-llama2
- Github Repo - MetaAI/finetune-llama3-8b
- Github Repo - MetaAI/ollama
- Github Repo - MetaAI/ts.llamaindex
MistralAI
RAG
StabilityAI
Vercel
huggingface
lmstudio
openAI
- Github Repo - openAI/open-webui
- Github Repo - openAI/openAI
- Github Repo - openAI/openai-quickstart-python
otterai
quantization
AI/ML - Technology Stack
API-Development
- Github Repo - API-Development/Python-Jinja
- Github Repo - API-Development/curl
- Github Repo - API-Development/fastapi_example
Analytics
- Github Repo - Analytics/PowerBI
- Github Repo - Analytics/PyGWalker
- Github Repo - Analytics/bokeh
- Github Repo - Analytics/pandas
- Github Repo - Analytics/plotly
- Github Repo - Analytics/tableau
Audio
- Github Repo - Audio/GAN-MusicGeneration
- Github Repo - Audio/Speech-Recognition
- Github Repo - Music-Vocal-Separation
CICD
CMS-Content-Management-Systems
CUDA
CV
- Github Repo - CV/Flower-Prediction
- Github Repo - CV/ImageAugmentation
- Github Repo - CV/ImageProcessing
- Github Repo - CV/MNIST-Experiments
- Github Repo - CV/Object-Detection-InBrowser
DB-Bigdata
- Github Repo - DB-Bigdata/Bigdata-HiveScoop
- Github Repo - DB-Bigdata/Bigdata-mySQL
- Github Repo - DB-Bigdata/mongodb
DB-Databases
DE
Docker
- Github Repo - Docker/Dockerfile-golang1.21-alpine
- Github Repo - Docker/Dockerfile-php7.2-apache
- Github Repo - Docker/Dockerfile-python3.9-slim
- Github Repo - Docker/Dockerfile-tensorflow-latest
- Github Repo - Docker/Gemini-Docker-by-Hari
- Github Repo - Docker/Jupyter-from-Docker-with-GPU
- Github Repo - Docker/Jupyter-from-Docker-wo-GPU
- Github Repo - Docker/bindmount-apps
- Github Repo - Docker/colab-in-docker-in-local
- Github Repo - Docker/count-program
- Github Repo - Docker/demo1
- Github Repo - Docker/welcome-to-docker
IOT
LaTex
ML
- Github Repo - ML/Classification
- Github Repo - ML/Clustering
- Github Repo - ML/DataImbalance
- Github Repo - ML/ML-Retraining
- Github Repo - ML/ROC
- Github Repo - ML/Regression
MLFrameworks
- Github Repo - MLFrameworks/TensorFlow-ImageRecognition
- Github Repo - MLFrameworks/mobilenet-v2
- Github Repo - MLFrameworks/tensorflow-js
- Github Repo - MLFrameworks/tensorflow-lite
NLP
- Github Repo - NLP/Embedding
- Github Repo - NLP/NLP-Concepts
- Github Repo - NLP/NLP-Hindi-Bible
- Github Repo - NLP/NLP-Misc
- Github Repo - NLP/NLP-Plugin20Event
- Github Repo - NLP/NLP-SanskritTrans
- Github Repo - NLP/NLP-rasa
Python-Automation
- Github Repo - Python-Automation/1.Table-Extraction
- Github Repo - Python-Automation/2.Automate-The-News
- Github Repo - Python-Automation/3.Excel-Report
- Github Repo - Python-Automation/4.WhatsApp
- Github Repo - Python-Automation/datasets
- Github Repo - Python-Automation/spanish
- Github Repo - Python-Automation/test-folder
PythonicWay
- Github Repo - PythonicWay/8things-you-must-know
- Github Repo - PythonicWay/ManagingPythonProjects
- Github Repo - PythonicWay/python_pkg
R-Projects
RL
Stats
Timeseries
Transfer-Learning
UI-Design
Web+Mobile
- Github Repo - Web+Mobile/Android
- Github Repo - Web+Mobile/JS
- Github Repo - Web+Mobile/Java
- Github Repo - Web+Mobile/Nodejs
- Github Repo - Web+Mobile/nginx-web-app
AI/ML - Forked
- Github Repo - 100Days-ML
- Github Repo - AgenticAI-agent-zero
- Github Repo - automl-mljar-supervised
- Github Repo - automl-pycaret
- Github Repo - automl-tpot
- Github Repo - bolt.diy
- Github Repo - evalml
- Github Repo - gcp-python-docs-samples
- Github Repo - GFPGAN
- Github Repo - google-ai-edge-litert-samples
- Github Repo - google-gemini-cookbook
- Github Repo - intel-scikit-learn-intelex
- Github Repo - langchain
- Github Repo - langchain-ai-langchain-academy
- Github Repo - langgraph
- Github Repo - Learning-Pandas-Second-Edition
- Github Repo - LeetCode-js
- Github Repo - microsoft-generative-ai-for-beginners
- Github Repo - ML+DL-Code-for-my-YouTube-Channel-Rohan
- Github Repo - mlops-logfire
- Github Repo - packages
- Github Repo - Pyramid-Flow
- Github Repo - Sandhi_Prakarana
- Github Repo - stanford_alpaca
- Github Repo - supabase
- Github Repo - tensorboard
- Github Repo - tensorflow
- Github Repo - tensorflow-examples
- Github Repo - tensorflow-tfjs-examples
- Github Repo - tensorflow-tfjs-models
- Github Repo - vector-admin
- Github Repo - visualization-forks
- Github Repo - Browse-use
Web+Mobile App Development - POC Work
- Android
- Falcon_Android
- ImageRecognition
- Java
- nodejs
- react