
Project Index Page

This is the master project page; it links to my different project categories. To learn about my work in a particular domain, click the link that interests you.

Introduction

This page shares my project work, capabilities, and expertise, along with my understanding of business domains, technology solutions, and approaches.

  • This page lists all my GitHub repos (private and public).
  • These GitHub repositories relate to my projects, consulting, courses, POCs (proofs of concept), and technology explorations.
  • It also lists some forked repositories. Some I forked to extend the original, some for teaching, and some to build my own solutions.
  • Some of these projects demonstrate my Project Management capabilities in different domains, including IT.
  • Others demonstrate my Technology capabilities, especially around AI, Deep Learning, GenAI, NLP, and Analytics.
  • The purpose of this listing is
    • to help others know what is possible and what I have explored.
    • to remind myself what I have already explored, what worked, and what did not work during exploration.

Tech Skills

  • LLM Expertise: Prompt Engineering, Fine-tuning, and Model Deployment
    • Models: Llama, ChatGPT, GPT-4, Bard, LaMDA, PaLM, Gemma, Claude, Mistral, T5, Flan, BERT, Phi, and various others
    • Model UI: Ollama, LM Studio, Open WebUI.
  • ML Model Development: Feature Engineering, Tuning, Evaluation, Cross-Validation, Classical ML, NLP metrics, Regression/Classification/Clustering, Ensemble Trees, Decision Tree, Random Forest, SVM.

  • AutoML: Automated ML (PyCaret, TPOT).

  • MLOps/DevOps:

  • Deep Learning / NLP & Embedding: Hugging Face, RNN, LSTM, GRU, Transformers, BERT, FastText, NLTK, spaCy, Embedding, Keras, PyTorch, TensorFlow, OpenAI, Embedding Transfer, CV model evaluation, CNN, YOLO

  • Big Data & Cloud: Hadoop, Spark, PySpark, Kafka, NoSQL (Cassandra, MongoDB)

  • Cloud Platforms: AWS, GCP, Azure, AWS SageMaker, Azure AutoML, Vertex AI, Oracle AI

  • ML Frameworks: TensorFlow, TensorFlow Lite/LiteRT, TensorFlow.js, PyTorch

  • Data Visualization: PowerBI, Tableau, Plotly, Seaborn, Matplotlib

  • Mobile/Web App Dev: Flask, Gradio, Streamlit, Android Studio, Flutter

  • Programming Languages: Python, R, Dart, Package Managers (pip, conda, npm)

  • IDE/CLI/SDK: Visual Studio Code, Cursor, Visual Studio, Eclipse, Android Studio, Flutter

  • Markup Language: Markdown, LaTeX, HTML/CSS

  • Statistics: Descriptive/Inferential Statistics, Prescriptive Statistics in AI.

AI/ML/DL, GenAI, LLM, Analytics, Technology Work Summary

My POC and Technology Stacks

Summary of My Project Management Work/Projects

AI/ML Datasets

There is no dearth of datasets, but during training sessions, when my learners or I need a particular dataset, we often struggle to find it. Datasets get removed or renamed, or internet availability and access restrictions waste a lot of time. To avoid this, I have created a GitHub repo of datasets. These datasets are for classical machine learning. They are not for deep learning or LLMs unless specifically mentioned.

AI/ML - Industries - Developed/ Created/ Expanded work

Projects in this section are listed according to Industry/Business Domain. Sometimes it is difficult for me to work out which domain a particular project falls into, so I have created this page to settle the domain.

BFSI (Banking, Financial Services, and Insurance)

BFSI includes financial institutions, banks, insurance companies, investment firms, and other entities offering services such as lending, investment, wealth management, and financial protection. This sector is heavily regulated and technology-driven for security and risk management.

Credit-Fraud-Detection

DoeJones-Prediction-with-News

Loan-Approval

HR (Human Resources)

This domain covers employee management, recruitment, training, compensation, and workplace culture. It also includes HR technology and services related to people management and organizational development.

HR Analysis of Employee Attrition & Performance

Health

The health domain includes healthcare providers, hospitals, pharmaceuticals, health insurance, and healthcare technology focused on improving patient care, medical research, and public health initiatives. This vertical does not include projects related to health-infrastructure development.

Liver Patient Analysis

Breast-Cancer-Prediction

Chest-XRay - Effusion Segmentation

Chest-XRay - Effusion Classification

Covid-worldwide-EDA

India-Covid-Graphs

Malaria-Detection_dep

pnemonia_prediction

Energy

This domain involves the production, management, and distribution of energy, including fossil fuels, renewable energy (solar, wind, hydro), nuclear power, and energy conservation technologies, along with grid management.

UK-Energy-Consumption

AirQuality-Prediction

Climate

Climate and energy are interrelated; to avoid confusion, any project related to Energy is kept out of the Climate vertical. This domain focuses on climate science, environmental monitoring, and sustainability initiatives, including research and development on climate change, renewable energy, environmental policy, and green technologies to reduce carbon footprints.

Acea Smart Water Analytics & Prediction

Objective

  • The Acea Group deals with four different types of waterbodies: water springs (three datasets), a lake (one dataset), a river (one dataset), and aquifers (four datasets).
  • This competition uses nine datasets that are completely independent and not linked to each other. Each dataset can represent a different kind of waterbody, and because each waterbody is different, the related features differ as well.
  • It is of the utmost importance to note that some features, such as rainfall and temperature, which are present in every dataset, do not act on the date they are recorded. Both rainfall and temperature affect features like level, flow, depth to groundwater, and hydrometry only some time later. For instance, rain that fell on 1st January does not affect these features on the same day but some time afterwards. Because we do not know how many days, weeks, or months later rainfall affects these features, this lag is another aspect to take into consideration when analyzing the datasets.
  • Github Repo
  • Colab
  • Acea Water Prediction & Analysis - Kaggle
  • Kaggle Dataset
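
The lag effect described in the objective can be explored with a lagged correlation. The following is a minimal sketch in plain Python, not the method used in the repository or the competition: the function names `pearson` and `best_lag`, the toy rainfall values, and the built-in 3-day delay are all assumptions made for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series has no defined correlation
    return cov / sqrt(var_x * var_y)


def best_lag(rainfall, level, max_lag):
    """Return (lag, correlation) for the lag in 0..max_lag with the
    strongest absolute correlation between rainfall and level."""
    best = (0, 0.0)
    for lag in range(max_lag + 1):
        # Pair rainfall on day t with the level on day t + lag.
        x = rainfall[: len(rainfall) - lag]
        y = level[lag:]
        r = pearson(x, y)
        if abs(r) > abs(best[1]):
            best = (lag, r)
    return best


# Hypothetical toy data: the level responds to rain exactly 3 days later.
rainfall = [0, 5, 0, 0, 12, 0, 3, 0, 0, 8, 0, 0, 0, 6, 0, 0]
level = [10.0] * 3 + [10.0 + 0.5 * r for r in rainfall[:-3]]

lag, r = best_lag(rainfall, level, max_lag=7)
print(f"strongest correlation at lag={lag} days (r={r:.2f})")
```

On this toy series the search recovers the 3-day delay that was built into the data. Real waterbody data is noisier, and the delay may differ per feature and per dataset, so a scan like this is only a starting point for choosing lag features.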

eCommerce

The e-commerce domain comprises online platforms and businesses that facilitate buying and selling goods and services over the internet. It includes marketplaces, payment processing, logistics, and digital retailing.

Black Friday Sales Data Analysis Prediction

Amazon Sentiment Analysis

Bigdata-AmazonReviews

Online-Retail-Customer-Clustering

Recommendation System Amazon Electronics

Economics and International Trade

This field involves the study and application of economic theories, policies, and data analysis to understand markets, consumer behavior, global trade, and financial trends. It serves as the foundation for economic research, policy-making, and financial planning.

Economy-Analysis

Prosperity-Clustering

Marine Consultant - GOI

  • Github Repo - This is a prototype GPT on ChatGPT that helps them plan strategy for bilateral or multilateral engagements with other countries.

Electronics

The electronics domain includes the design, manufacturing, and distribution of electronic devices and components, such as semiconductors, consumer electronics, computing hardware, and embedded systems.

Hand Gesture Recognition

Industrial Safety

Industrial safety focuses on workplace safety standards, risk management, and protocols to protect employees and prevent accidents in industrial environments. It includes safety training, hazard assessments, and regulatory compliance.

Industrial Accident Cause Analysis

OSHA Accidents and Injury

Tourism, Hospitality, Hotel, Restaurant and Event Management

The hospitality domain involves businesses that provide accommodation, food, and leisure services, such as hotels, resorts, restaurants, and cafes, focusing on guest experiences and comfort.

Zomato Review

Indian Food Item Recommendations in Restaurants

FoodDemand Forcast

Travel & Logistics

The Travel & Logistics domain encompasses the movement of people and goods. It includes various industries such as transportation, warehousing, distribution, and supply chain management for both individuals and businesses. The focus in this domain is on efficient, timely, and cost-effective transport, as well as providing seamless travel experiences. This sector is heavily influenced by technology for tracking, route optimization, and resource management. This domain has some overlap with eCommerce and Sales.

Flight Delay Analysis using Hive

This dataset contains the 2004-2005 flight data from the 2009 ASA Statistical Computing and Graphics Data Expo, which consisted of flight arrival and departure details for all commercial flights on major carriers within the United States of America from October 1987 to April 2008.

Activities (pipeline) in the project:

  • Creating a Hive table (for storage) from the external files
  • Creating the partitioned table schema
  • Partitioning the Hive table by year and loading the data into the partitioned table
  • Running SQL queries on the partitioned table

Links

  • My article on Hive
  • Github Repo
  • Kaggle Dataset 1.34GB
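
The pipeline activities listed above can be sketched as HiveQL statements, held here in Python strings so they can be printed or handed to whatever Hive client is in use. This is an illustrative outline and not the code from the repository: the table names `flights_raw` and `flights_part`, the partition column `flight_year`, the `/data/flights` location, and the trimmed column list are all hypothetical.

```python
# Hypothetical names throughout: flights_raw, flights_part, flight_year,
# the /data/flights path and the trimmed column list are illustrative only.
COLUMNS = "flightnum INT, uniquecarrier STRING, arrdelay INT, depdelay INT"

# Step 1: external table over the raw delimited files.
CREATE_EXTERNAL = f"""
CREATE EXTERNAL TABLE IF NOT EXISTS flights_raw (
  flight_year INT, {COLUMNS}
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/flights'
""".strip()

# Step 2: partitioned table schema; the partition column is declared
# in PARTITIONED BY rather than in the regular column list.
CREATE_PARTITIONED = f"""
CREATE TABLE IF NOT EXISTS flights_part (
  {COLUMNS}
)
PARTITIONED BY (flight_year INT)
STORED AS ORC
""".strip()

# Step 3: dynamic-partition insert; the partition column comes last in SELECT.
LOAD_PARTITIONS = [
    "SET hive.exec.dynamic.partition = true",
    "SET hive.exec.dynamic.partition.mode = nonstrict",
    """
INSERT OVERWRITE TABLE flights_part PARTITION (flight_year)
SELECT flightnum, uniquecarrier, arrdelay, depdelay, flight_year
FROM flights_raw
""".strip(),
]


def avg_delay_query(year: int) -> str:
    """Step 4: a query whose filter lets Hive scan a single year's partition."""
    return (
        "SELECT uniquecarrier, AVG(arrdelay) AS avg_arr_delay\n"
        "FROM flights_part\n"
        f"WHERE flight_year = {int(year)}\n"
        "GROUP BY uniquecarrier"
    )


statements = [CREATE_EXTERNAL, CREATE_PARTITIONED, *LOAD_PARTITIONS, avg_delay_query(2004)]
for stmt in statements:
    print(stmt + ";\n")
```

Filtering on the partition column is the point of the design: a query for one year only needs to read that year's partition instead of the whole table.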

Flight Delay Analysis - 2008 (Bigdata)

The U.S. Department of Transportation’s (DOT) Bureau of Transportation Statistics (BTS) tracks the on-time performance of domestic flights operated by large air carriers. Summary information on the number of on-time, delayed, canceled and diverted flights appears in DOT’s monthly Air Travel Consumer Report, published about 30 days after the month’s end, as well as in summary tables posted on this website.

Apache Hive is a data warehouse and SQL-like query engine built on top of Hadoop. Hadoop includes the Hadoop Distributed File System (HDFS) and can handle distributed storage and processing of the data at hand. Hive can handle billions of records, so we can run SQL-style queries without worrying about whether aggregation or filter functions will ever complete. Hive supports CRUD operations (updates and deletes require transactional ACID tables).

The dataset contains daily airline flight information such as Origin, Dest, Distance, DepTime, CRSDepTime, ArrTime, CRSArrTime, UniqueCarrier, FlightNum, TailNum, ActualElapsedTime, CRSElapsedTime, AirTime, ArrDelay, DepDelay, TaxiIn, TaxiOut, Cancelled, CancellationCode, Diverted, CarrierDelay, WeatherDelay, NASDelay, SecurityDelay, LateAircraftDelay.

Airlines wanted to analyze the data of the last 20 years.

Links

Tech Stack: Hadoop/HDFS, Hive, SQL, HiveQL, ORC (Optimized Row Columnar) or Parquet, Python, Matplotlib/Seaborn.
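
The on-time, delayed, canceled and diverted summary mentioned above can be derived from the `ArrDelay`, `Cancelled` and `Diverted` columns. The following is a small Python sketch under stated assumptions: the sample rows are made up, the `classify` helper is hypothetical, and the 15-minute cutoff reflects the usual DOT convention that a flight arriving less than 15 minutes late counts as on time.

```python
from collections import Counter

# Assumption: arrivals less than 15 minutes behind schedule count as on time.
ON_TIME_THRESHOLD_MIN = 15


def classify(flight):
    """Assign one flight record to a single summary category."""
    if flight["Cancelled"] == 1:
        return "cancelled"
    if flight["Diverted"] == 1:
        return "diverted"
    if flight["ArrDelay"] < ON_TIME_THRESHOLD_MIN:
        return "on_time"
    return "delayed"


# Hypothetical sample rows using column names from the dataset description.
flights = [
    {"UniqueCarrier": "AA", "ArrDelay": 5, "Cancelled": 0, "Diverted": 0},
    {"UniqueCarrier": "AA", "ArrDelay": 42, "Cancelled": 0, "Diverted": 0},
    {"UniqueCarrier": "UA", "ArrDelay": None, "Cancelled": 1, "Diverted": 0},
    {"UniqueCarrier": "UA", "ArrDelay": None, "Cancelled": 0, "Diverted": 1},
    {"UniqueCarrier": "DL", "ArrDelay": -3, "Cancelled": 0, "Diverted": 0},
    {"UniqueCarrier": "DL", "ArrDelay": 15, "Cancelled": 0, "Diverted": 0},
]

summary = Counter(classify(f) for f in flights)
for category in ("on_time", "delayed", "cancelled", "diverted"):
    print(category, summary[category])
```

Cancelled and diverted flights are checked first because they have no usable arrival delay; in Hive the same logic would typically be a `CASE` expression inside a `GROUP BY` query.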

Flight Delay and Cancellation Analysis - 2015

The U.S. Department of Transportation’s (DOT) Bureau of Transportation Statistics (BTS) tracks the on-time performance of domestic flights operated by large air carriers. Summary information on the number of on-time, delayed, canceled and diverted flights appears in DOT’s monthly Air Travel Consumer Report, published about 30 days after the month’s end, as well as in summary tables posted on this website.

NYC Parking - 2008

NYC Parking - 2004-2005 (Bigdata and pySpark)

NYC Parking - 2015

NYC Parking - 2017

Driver Availability Prediction

Uber Cancellation

Vehicle Classification

Vehicle Tracking

Entertainment, Games & Sports

This sector covers the creation/production, distribution, and consumption of media, including film, music, gaming, and live performances. It involves production houses, streaming services, and digital content platforms.

Movies-Recommendations

Olympic-QA-System-with-GPT

Media and Publication

This domain includes businesses involved in publishing content across print, digital, and broadcast formats. It covers books, news, newspapers, magazines, digital media platforms, and content creation and distribution.

Media+Publication-TWO - Talk with Osho

  • Github Repo - This is an educational GPT on ChatGPT. It is based on selected books of Osho. It is a prototype because ChatGPT limits the number of books that can be loaded. When this constraint is removed, the project will be updated with more books.

Media+Publication-TWSV - Talk with Swami Vivekananda

  • Github Repo - This is an educational GPT on ChatGPT. It is based on the 8 volumes of the Complete Works of Swami Vivekananda.

Summarising A Long Hindi Video into Different English Audio

  • Github Repo - This project takes a long Hindi video (e.g., a YouTube video) and summarizes it into a single English audio summary.

Fakenews-Detection

HBQAS

NewsClassification-20Groups

Multiclass classification. Overall test accuracy: 0.88. The 20 classes of news are: rec.sport.hockey, rec.motorcycles, rec.sport.baseball, rec.autos, talk.politics.guns, talk.religion.misc, sci.med, sci.electronics, sci.space, sci.crypt, misc.forsale, comp.os.ms-windows.misc, comp.graphics, comp.sys.ibm.pc.hardware, comp.windows.x, comp.sys.mac.hardware, soc.religion.christian, talk.politics.mideast, alt.atheism, talk.politics.misc

Per-class precision (P), recall (R), F1 and support:

  • Class 0: P 0.46, R 0.50, F1 0.48, support 12
  • Class 1: P 0.83, R 0.52, F1 0.64, support 29
  • Class 2: P 0.62, R 0.69, F1 0.66, support 29
  • Class 3: P 0.67, R 0.52, F1 0.58, support 31
  • Class 4: P 0.85, R 0.68, F1 0.76, support 25
  • Class 5: P 0.89, R 0.92, F1 0.91, support 26
  • Class 6: P 0.89, R 0.69, F1 0.77, support 35
  • Class 7: P 0.90, R 0.89, F1 0.89, support 108
  • Class 8: P 0.88, R 0.97, F1 0.93, support 117
  • Class 9: P 0.94, R 0.97, F1 0.96, support 121
  • Class 10: P 0.95, R 0.99, F1 0.97, support 124
  • Class 11: P 1.00, R 0.97, F1 0.98, support 31
  • Class 12: P 0.67, R 0.73, F1 0.70, support 45
  • Class 13: P 0.75, R 0.88, F1 0.81, support 43
  • Class 14: P 0.81, R 0.92, F1 0.86, support 38
  • Class 15: P 0.92, R 0.55, F1 0.69, support 20
  • Class 16: P 0.87, R 0.97, F1 0.92, support 103
  • Class 17: P 0.93, R 0.93, F1 0.93, support 14
  • Class 18: P 1.00, R 0.39, F1 0.56, support 18
  • Class 19: P 0.91, R 0.89, F1 0.90, support 81
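
The F1 values in the per-class figures above are the harmonic mean of precision and recall. As a quick worked check, here is a minimal Python sketch that recomputes F1 for three of the reported classes; the helper name `f1_score` is an assumption for illustration, and because the published P and R are already rounded to two decimals, other classes may differ from the reported F1 in the last digit.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the per-class results.
reported = {
    1: (0.83, 0.52, 0.64),
    10: (0.95, 0.99, 0.97),
    18: (1.00, 0.39, 0.56),
}

for cls, (p, r, f1) in reported.items():
    recomputed = round(f1_score(p, r), 2)
    print(f"class {cls}: recomputed F1={recomputed}, reported F1={f1}")
```

Class 18 shows why F1 is worth reporting next to precision: a precision of 1.00 looks perfect, but the recall of 0.39 pulls the F1 down to 0.56.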

SDSHL

Toxic-Comment

Twitter-Sentiment-Analysis

YELP-Review-Prediction

Sales

Sales overlaps with e-Commerce and Retail. To avoid confusion, anything related to the sale of big-ticket items such as cars, houses, or other capital items is placed in the Sales domain. These sales may happen online or via a physical shop.

House-Price-Prediction

CarPrice

CarSales

Lead-Conversion

Telecom

This domain covers telecommunications services and infrastructure, including phone networks, internet providers, satellite communications, and emerging technologies like 5G, enabling global connectivity and communication.

Telecom Churn

Public Safety and Security

This sector involves efforts to maintain public order, safety, and security in communities. It includes law enforcement, emergency services, disaster response, and security solutions for protecting people and assets.

Barcelona Accidents

Indian Judiciary - Verdict Dataset

Agriculture

This domain encompasses activities related to farming, crop production, livestock management, and the broader agricultural supply chain. It also includes agricultural technology (agritech), sustainable farming practices, and rural development.

Education

This sector covers educational institutions, e-learning platforms, and educational technology (edtech) designed to facilitate teaching, learning, and research. It spans K-12, higher education, corporate training, and continuous learning.

Infrastructure (Infra) Development

This domain encompasses the planning, design, and construction of physical facilities and systems, including transportation, telecommunications, water supply, and utilities essential for supporting economic activity.

Datasets

====================================================

My Technology Stack - Developed/ Created/ Expanded work

Works in this section are listed according to Technology/Tech Product/POC.

LLM/Agentic AI

CrewAI

Agentic-LLM

BigScience

Cloud-AWS

Cloud-GCP

Cloud-Groq

Deepseek

EleutherAI

Langchain

MLOps

MetaAI

MistralAI

RAG

StabilityAI

Vercel

huggingface

lmstudio

openAI

otterai

quantization

AI/ML - Technology Stack

API-Development

Analytics

Audio

CICD

CMS-Content-Management-Systems

CUDA

CV

DB-Bigdata

DB-Databases

DE

Docker

IOT

LaTex

ML

MLFrameworks

NLP

Python-Automation

PythonicWay

R-Projects

RL

Stats

Timeseries

Transfer-Learning

UI-Design

Web+Mobile

AI/ML - Forked


Web+Mobile App Development - POC Work

  1. Android
  2. Falcon_Android
  3. ImageRecognition
  4. Java
  5. nodejs
  6. react
