Data Science Python

Strong skills are needed to undertake an intensive and challenging course that is taught and examined with multiple tools. To help students gain the necessary tool skills, we provide Data Science course training with tools that improve their course proficiency. Course details: Python programming with 15+ hands-on case studies (List of Case Studies). Altogether, the programme runs 250+ hours and is taught by experts with 20 years of thorough experience in the field of data science.

Module 1

Foundations of Data Science: Data Visualization and Interpretation

Part-1: Reference Details for Data Science and Business Analytics

 Scope & Facts of Data Science and Business Analytics

 SWOT Analysis of Data Science Business Analytics

 Introduction to Advanced Data Analytics

 Journey Mathematics-Statistics-Econometrics

 Flow chart for Data Science and Business Analytics

 Data warehouse conceptual discussions

 Hadoop for Data Science

 OLTP and OLAP for data information

 Web Application report

Part-2: Descriptive Statistics:

 Descriptive Statistics

 Inferential Statistics

 Types of Variables

 Measures of central tendency

 Data Variability and Dispersion

 Five number Summary Analysis

 Data Distribution Techniques

 Exploration Techniques for Numerical data

 Exploration techniques for Character Data

 Visualization Exploration

 Summary Exploration

 Chebyshev's Inequality
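
As an illustration of the five-number summary topic in this part, here is a minimal Python sketch using only the standard library. The function name `five_number_summary` and the sample data are hypothetical, and the "inclusive" quartile convention is an assumption; other conventions give slightly different Q1 and Q3 values.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of at least two numbers."""
    data = sorted(values)
    # method="inclusive" treats the data as the whole population;
    # the default "exclusive" method would give different quartiles.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]

sample = [2, 4, 4, 5, 7, 9, 10, 12, 15]  # hypothetical data
summary = five_number_summary(sample)
print(summary)  # (2, 4.0, 7.0, 10.0, 15)
```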

Part-3: Basic Probability for Business Issues:

 Simple Probability

 Marginal Probability

 Joint Probability

 Conditional probability (linked with decision tree algorithms)

 Bayes’ Theorem probability (linked with Naïve Bayes Algorithms)

 Discrete Distributions

 Binomial Distribution

 Hypergeometric Distribution

 Poisson Distribution

 Continuous Distributions

 Normal Distribution and Properties

 Standardized Distributions
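
The conditional probability and Bayes' theorem topics in this part can be sketched in a few lines of Python. The scenario (a screening test) and all numbers are assumptions chosen for illustration, and the function name `posterior` is hypothetical.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(condition | positive test) via Bayes' theorem."""
    # Total probability of a positive result (the evidence term).
    evidence = sensitivity * prior + false_positive_rate * (1.0 - prior)
    return sensitivity * prior / evidence

# Hypothetical inputs: 1% prevalence, 95% sensitivity, 5% false positives.
p = posterior(prior=0.01, sensitivity=0.95, false_positive_rate=0.05)
print(round(p, 3))  # 0.161
```

Even with a fairly accurate test, the posterior stays low here because the assumed prior is small.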

Part-4: Sampling Techniques for Big Data

 Sampling Distributions

 Simple Random

 Systematic Sample

 Stratified sample

 Cluster Sample

 Standard Error of the Mean

 Skewness Std. Error

 Kurtosis Std. Error

 Central Limit Theorem

 Sampling from Infinite Populations

 Sampling Distributions for Mean

 Sampling Distributions for proportions

Module 2

Data Preprocessing and Imputation

Part-5: Data Validation and Data Normality

 Univariate normality techniques

 Bivariate techniques

 Multivariate techniques

 Q-Q probability plots

 Cumulative frequency

 Exploratory analysis

 Stem-and-leaf analysis

 Histogram

 Box plot

 Scores for Normality Check

 Kolmogorov-Smirnov test

 Shapiro-Wilk test

 Anderson-Darling test

Part-6: Data Cleaning Process and Quality Check

 PCA for Big Data Analysis or Unsupervised data

 PCA Regression Scores for Supervised data

 Noisy data detection

 Data cleaning with Regression Residual

 Data Scrubbing with statistical sense

Part-7: Data Imputation and outlier treatment

 Outlier treatment with robust measurements

 Outlier treatment with central tendency Mean

 Outlier with Min Max Likelihood methods

 Outlier Detection with Density Based

 Visualize Outlier Treatment

 Outlier with Residual Analysis

 Outlier Detection with PCA Analysis

 Data Imputation with series Central Tendency

Part-8: Test of Hypothesis

 Null Hypothesis formulation

 Alternative Hypothesis

 Type I and Type II errors

 Power Value

 One tail and Two tail

 One Sample T-TEST

 Paired T-TEST

 Independent Sample T-TEST

 Analysis of Variance (ANOVA)

 MANOVA

 Chi Square Test

 Kendall Chi Square

 Kruskal-Wallis Rank Test Chi Square

 Mann-Whitney, Chi Square

 Wilcoxon, Chi Square

 McNemar test Chi Square

Part-9: Data Transformation

 Log transformation

 Box-Cox transformation

 Square root transformation

 Inverse transformation

 Min Max Data normalization
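
Two of the transformations listed in this part, min-max normalization and the log transformation, can be sketched in plain Python as follows. The function names and sample values are hypothetical, and using `log1p` (so that zeros are allowed) is an assumption rather than something the syllabus specifies.

```python
import math

def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: avoid division by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def log_transform(values):
    """Apply log(1 + x); inputs are assumed to be non-negative."""
    return [math.log1p(v) for v in values]

raw = [10, 20, 30, 40, 50]  # hypothetical data
scaled = min_max_normalize(raw)
logged = log_transform(raw)
print(scaled)  # [0.0, 0.25, 0.5, 0.75, 1.0]
```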

Module 3

Predictive Analytics: Supervised Learning Algorithms

Part-10: Predictive modeling & Diagnostics

 Correlation

 SLR Regression

 MLR Regression

 Examination Residual analysis

 Auto Correlation

 ANOVA test of significance

 VIF Analysis

 t-test of significance

 CP Indexing

 Eigen Value for PCA Analysis

 Homoscedasticity

 Heteroskedasticity

 Stepwise regression

 Forward Regression

 Backward Regression

 Multicollinearity

 Cross validation

 MAPE

 Check prediction accuracy

 Standardized regression

 Transformed Regression

 Dummy Variables Regression

Part-11: Logistic Regression Analysis

 Logistic Regression

 Discriminant Regression Analysis

 Multiple Discriminant Analysis

 Stepwise Discriminant Analysis

 Logit function

 Test of Associations

 Chi-square strength of association

 Binary Regression Analysis

 Probit and Logit Models

 Estimation of probability using logistic regression

 Wald Test statistics for Model

 Hosmer-Lemeshow test

 Nagelkerke R square

 Pseudo R square

 Maximum likelihood estimation

 Model Fit

 Model cross validation

 Discrimination functions

 AIC

 BIC (Bayesian information criterion)

 Kappa Statistics

 Error/ Confusion matrices

 ROC

 APE

 MAPE

 Lift Curve

 Sensitivity

 Misclassification Rate

 Specificity

 Maximum Absolute Error

 Recall

 Misclassification

 Root Final Prediction Error

 Gini Coefficient

 Schwarz’s Bayesian Criterion
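
Several of the evaluation measures listed in this part (sensitivity, specificity, misclassification rate) come straight from a confusion matrix. The following is a minimal sketch with hypothetical counts; the function name `confusion_metrics` is an assumption for illustration.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Derive basic classification metrics from confusion-matrix counts."""
    total = tp + fn + fp + tn
    return {
        # Sensitivity (recall): share of actual positives correctly flagged.
        "sensitivity": tp / (tp + fn),
        # Specificity: share of actual negatives correctly rejected.
        "specificity": tn / (tn + fp),
        # Misclassification rate: share of all cases predicted wrongly.
        "misclassification_rate": (fp + fn) / total,
    }

# Hypothetical counts for a binary classifier on 100 cases.
metrics = confusion_metrics(tp=40, fn=10, fp=5, tn=45)
print(metrics)
```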

Module 4

(Advanced Analytics 1) Unsupervised Learning Algorithms

Part-12: Dimension Reduction Analysis

 Introduction to Factor Analysis

 Principal component analysis

 Reliability Test

 KMO and MSA tests, Eigenvalue interpretation

 Rotation and Extraction steps

 Varimax Models

 Confirmatory Factor Analysis

 Exploratory Factor Analysis

 Factor Score for Regression

Part-13: Cluster Analysis

 Introduction to Cluster Techniques

 Hierarchical clustering

 K Means clustering

 Ward's Method

 Agglomerative Clustering

 Variation Methods

 Maximum distance Linkage Methods

 Centroid distance Methods

 Minimum distance Linkage Method

 Cluster Dendrogram

 Euclidean distance methods

Module 5

Forecasting and Operations Analytics

Naive Forecasting

Moving Average

Exponential smoothing

ARIMA

Auto-Regressive Integrated Moving Average (ARIMA) models

ARIMAX

Conjoint analysis

Discriminant analysis
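
The two simplest forecasting techniques in this module, the moving average and simple exponential smoothing, can be sketched in plain Python. The function names, the sample series, and the choice to seed the smoothed series with the first observation are all assumptions made for illustration.

```python
def moving_average(series, window):
    """Trailing moving average; returns one value per full window."""
    return [
        sum(series[i - window:i]) / window
        for i in range(window, len(series) + 1)
    ]

def exponential_smoothing(series, alpha):
    """Simple exponential smoothing with smoothing factor 0 < alpha <= 1."""
    smoothed = [series[0]]  # assumption: initialise with the first observation
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

demand = [10, 12, 14, 16]  # hypothetical series
ma = moving_average(demand, window=2)
es = exponential_smoothing(demand, alpha=0.5)
print(ma)  # [11.0, 13.0, 15.0]
print(es)  # [10, 11.0, 12.5, 14.25]
```

A larger `alpha` makes the smoothed series react faster to recent observations, while a smaller value smooths more heavily.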

Module 5

(Advanced Analytics 3) Machine Learning Algorithms

 Prediction

 Support Vector Machines (SVM)

 Binary Regression/Logit Model

 Probit Model

 Naïve Bayes

 Naïve Bayes Multinomial

 Ordinal Regression

 Multinomial Regression

 k-Nearest Neighbor Classification

 Decision Stump

 CHAID Analysis

Recommender Systems

Collaborative Filtering

Bootstrap Aggregating (Bagging)

Random forest

Support vector machine

Neural Network

 C4.5 / C5.0

 J48 Pruning, Unpruning

 Decision trees

Module 6

(Advanced Analytics 4) Artificial Intelligence (3 Days)

Introduction to neural networks; rule-based expert systems

Introduction to artificial neural networks (ANN); Neuron as computing element; Perceptron: McCulloch-Pitts model; Backpropagation algorithm; Multi-layer Neural Networks

Deep learning algorithms: Convolutional networks; Recurrent nets; Auto-encoders

Deep Learning Platform: H2O.ai; Dato GraphLab; TensorFlow

Module 7

Survival Analysis

 Mantel-Haenszel Test

 Kaplan-Meier (Product-Limit) Estimator

 Cox’s Proportional Hazards Model

 Cox-Snell Residual

 Hazard Functions

 Proportional Hazards Assumption

Module 8

Big Data Analytics

Introduction to Big Data

Sources of Big Data

Big Data technologies: Hadoop distributed file system; Employing Hadoop

Statistical Analysis of Big Data

Pig

Hive

MapReduce

NoSQL