Big Data Analytics

Big Data Systems

What is Big Data
Data Warehouse, Data Lakes
Hadoop – Components
Storage – HDFS, HBase
Processing and Resource Management (MapReduce, YARN)
Types of data formats (JSON, ORC, Parquet, AVRO)
Scripting (Hive, Pig)
Stream Processing
Massively Parallel Processing (Spark, Impala, Mahout)
RDDs in Spark
Data Migration (Sqoop, Flume)
Scheduler (Oozie)
Coordination Service (ZooKeeper)
Relational Database (RDBMS)
Columnar Database
Multi-model Database
NoSQL (HBase, Cassandra, MongoDB, DynamoDB)
RDBMS (MySQL, PostgreSQL)
Cosmos DB
In memory database (Redis)
Spark SQL
Case Study
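
As a small illustration of the MapReduce model listed above, the following is a minimal sketch in plain Python, not Hadoop code. It runs in a single process and only mimics the three stages (map, shuffle, reduce) with a word count. The function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical and chosen for this example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    # Map: emit one (word, 1) pair for every word in the input line.
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    # Shuffle: group all emitted values by their key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reduce: collapse the list of values for one key into a single count.
    return key, sum(values)


def word_count(lines):
    # Chain the three stages together over an iterable of text lines.
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data systems", "big data analytics"])
```

In a real cluster the map and reduce calls would run on different nodes and the shuffle would move data across the network; here everything happens in memory to keep the idea visible.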

Stream Processing & Analytics

Real Time Streaming Architecture
Service Configuration and Coordination
Data Flow Management, Storing and Processing Streaming Data
Visualization Techniques for Real Time Streaming Data
Aggregation (Timed Counting, Multi Resolution Time Series Aggregation)
Statistical Approximation
Approximating with sketches
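
To give a feel for what "approximating with sketches" can mean, here is a minimal sketch of a Count-Min sketch in plain Python. It is one common sketching structure, picked here as an assumption since the outline does not name a specific one. The class name `CountMinSketch`, the default `width` and `depth`, and the use of SHA-256 for hashing are all illustrative choices.

```python
import hashlib


class CountMinSketch:
    """Approximate frequency counter using a fixed-size table of counters."""

    def __init__(self, width=1000, depth=5):
        self.width = width    # counters per row
        self.depth = depth    # number of rows, one hash function per row
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # Derive a per-row hash by salting the item with the row number.
        digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).hexdigest()
        return int(digest, 16) % self.width

    def add(self, item, count=1):
        # Increment one counter in every row.
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # Collisions can only inflate a counter, so the minimum across rows
        # is the tightest available estimate and never undercounts.
        return min(
            self.table[row][self._index(item, row)]
            for row in range(self.depth)
        )


sketch = CountMinSketch(width=200, depth=4)
for event in ["click", "view", "click", "purchase", "click", "view"]:
    sketch.add(event)
```

The memory used stays fixed at `width * depth` counters no matter how many distinct items arrive, which is the trade-off that makes this kind of structure useful on unbounded streams: estimates may be too high, but never too low.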

PySpark

Overview & Installation
RDD
DataFrame
Architecture
MLlib
NLP
Linear regression
Logistic regression
Decision tree
Naive Bayes
XGBoost
Time series
Spark Job automation with Scheduler
NYC Parking Case Study: Apache Spark


Dr. Hari Thapliyaal

Dr. Hari Thapliyal is a seasoned professional and prolific blogger with a multifaceted background that spans the realms of Data Science, Project Management, and Advait-Vedanta Philosophy. Holding a Doctorate in AI/NLP from SSBM (Geneva, Switzerland), Hari has earned Master's degrees in Computers, Business Management, Data Science, and Economics, reflecting his dedication to continuous learning and a diverse skill set. With over three decades of experience in management and leadership, Hari has proven expertise in training, consulting, and coaching within the technology sector. His extensive 16+ years in all phases of software product development are complemented by a decade-long focus on course design, training, coaching, and consulting in Project Management. In the dynamic field of Data Science, Hari stands out with more than three years of hands-on experience in software development, training course development, training, and mentoring professionals. His areas of specialization include Data Science, AI, Computer Vision, NLP, complex machine learning algorithms, statistical modeling, pattern identification, and extraction of valuable insights. Hari's professional journey showcases his diverse experience in planning and executing multiple types of projects. He excels in driving stakeholders to identify and resolve business problems, consistently delivering excellent results. Beyond the professional sphere, Hari finds solace in long meditation, often seeking secluded places or immersing himself in the embrace of nature.
