# Data Science New Batch Starts on 9 July, 2018

Data Science
(With SAS, R, WEKA, SPSS, and Excel)*

Module 1: Introduction to Data Science

Part-1: Referential Details for Data Science and Business Analytics

 Scope & Facts of Data Science and Business Analytics
 SWOT Analysis of Data Science and Business Analytics
 Introduction to Advanced Data Analytics
 Journey: Mathematics, Statistics, Econometrics
 Flow chart for Data Science and Business Analytics
 Data warehouse conceptual discussions
 OLTP and OLAP for data information
 Web Application report

Module 2: Data Visualization and Summarization

Part-2: Descriptive Statistics:

 Descriptive Statistics
 Inferential Statistics
 Types of Variables
 Measures of central tendency
 Data Variability and Dispersion
 Five-Number Summary Analysis
 Data Distribution Techniques
 Exploration Techniques for Numerical data
 Exploration techniques for Character Data
 Visualization Exploration
 Summary Exploration
 Chebyshev’s Inequality
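As a rough illustration of two items in the list above (the five-number summary and Chebyshev's inequality), here is a minimal sketch. Python is an assumption made only for this example; the course itself lists SAS, R, WEKA, SPSS and Excel. The function names and the sample data are hypothetical.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    ordered = sorted(values)
    # quantiles(n=4) returns the three quartile cut points; the "inclusive"
    # method treats the data as the full population (an assumption here).
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]

def chebyshev_bound(k):
    """Lower bound on the fraction of any distribution lying within
    k standard deviations of its mean (meaningful only for k > 1)."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2

def fraction_within(values, k):
    """Observed fraction of values within k population standard deviations of the mean."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    inside = sum(1 for v in values if abs(v - mean) <= k * sd)
    return inside / len(values)

# Hypothetical sample data, for illustration only.
data = [2, 4, 4, 5, 7, 9, 10, 12, 15]
print(five_number_summary(data))
print(chebyshev_bound(2), fraction_within(data, 2))
```

The observed fraction within two standard deviations should never fall below the Chebyshev bound of 0.75, whatever the shape of the data.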

Part-3: Basic Probability for Business Issues:

 Simple Probability
 Marginal Probability
 Joint Probability
 Conditional probability (linked with Decision Tree Algorithms)
 Bayes’ Theorem probability (linked with Naïve Bayes Algorithms)
 Discrete Distributions
 Binomial Distribution
 Hypergeometric Distribution
 Poisson Distribution
 Continuous Distributions
 Normal Distribution and Properties
 Standardized Distributions
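To make the Bayes' theorem item in the list above concrete, the following is a small sketch in Python (an assumed language; the course tools are SAS, R, WEKA, SPSS and Excel). The function name, parameter names and the numbers in the usage line are hypothetical.

```python
def posterior_probability(prior, true_positive_rate, false_positive_rate):
    """Bayes' theorem for a binary event A given positive evidence B:
    P(A | B) = P(B | A) * P(A) / P(B),
    where P(B) = P(B | A) * P(A) + P(B | not A) * P(not A)."""
    evidence = true_positive_rate * prior + false_positive_rate * (1 - prior)
    if evidence == 0:
        raise ValueError("the evidence has zero probability")
    return true_positive_rate * prior / evidence

# Hypothetical business case: 1% of transactions are fraudulent, a rule flags
# 90% of fraudulent and 5% of legitimate transactions.
print(round(posterior_probability(0.01, 0.90, 0.05), 4))  # 0.1538
```

Even with a fairly accurate rule, a flagged transaction is fraudulent only about 15% of the time here, because the prior is so small.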

Part-4: Sampling Techniques for Big Data

 Sampling Distributions
 Simple Random Sample
 Systematic Sample
 Stratified sample
 Cluster Sample
 Standard Error of the Mean
 Skewness Std. Error
 Kurtosis Std. Error
 Central Limit Theorem
 Sampling from Infinite Populations
 Sampling Distributions for Mean
 Sampling Distributions for proportions
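The sampling schemes and the standard error of the mean listed above can be sketched as follows. This is an illustrative Python sketch only (Python is an assumption; the course names other tools), and all function names and parameters are hypothetical.

```python
import math
import random
import statistics

def simple_random_sample(population, k, seed=None):
    """Draw k items without replacement, each item equally likely."""
    rng = random.Random(seed)
    return rng.sample(list(population), k)

def systematic_sample(population, step, start=0):
    """Take every step-th item, beginning at index start."""
    return list(population)[start::step]

def standard_error_of_mean(sample):
    """Sample standard deviation divided by the square root of the sample size."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

population = list(range(100))  # hypothetical population of 100 record IDs
print(simple_random_sample(population, 5, seed=1))
print(systematic_sample(population, 25, start=3))  # [3, 28, 53, 78]
print(standard_error_of_mean([2, 4, 4, 4, 5, 5, 7, 9]))
```

The standard error shrinks with the square root of the sample size, which is the practical reason larger samples give tighter estimates of the mean.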

Module 3: Data Preparation and Quality Check

Part-5: Data Validation and Data Normality

 Univariate normality techniques
 Bivariate techniques
 Multivariate techniques
 Q-Q probability plots
 Cumulative frequency
 Exploratory analysis
 Stem-and-leaf analysis
 Histogram
 Box plot
 Scores for Normality Check
 Kolmogorov-Smirnov test
 Shapiro-Wilk test
 Anderson-Darling test

Part-6: Data Cleaning Process and Quality Check

 PCA for Big Data Analysis or Unsupervised data
 PCA Regression Scores for Supervised data
 Noisy data detection
 Data cleaning with Regression Residual
 Data Scrubbing with statistical sense

Part-7: Data Imputation and Outlier Treatment

 Outlier treatment with robust measurements
 Outlier treatment with central tendency Mean
 Outlier with Min Max Likelihood methods
 Outlier Detection with Density Based
 Visualize Outlier Treatment
 Summarized Outlier Treatment
 Multivariate Outlier Detection with Mahalanobis Distance
 Multivariate Chi-square statistics
 Outlier with Residual Analysis
 Outlier Detection with PCA Analysis
 Data Imputation with series Central Tendency

Part-8: Test of Hypothesis

 Null Hypothesis formulation
 Alternative Hypothesis
 Type I and Type II errors
 Power Value
 One tail and Two tail
 One Sample T-TEST
 Paired T-TEST
 Independent Sample T-TEST
 Analysis of Variance (ANOVA)
 MANOVA
 Chi Square Test
 Kendall Chi Square
 Kruskal-Wallis Rank Test Chi Square
 Mann-Whitney, Chi Square
 Wilcoxon, Chi Square
 McNemar test Chi Square

Part-9: Data Transformation

 Log transformation
 Arcsine transformation
 Box-Cox transformation
 Square root transformation
 Inverse transformation
 Min Max Data normalization
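A few of the transformations listed above can be sketched in a handful of lines. The sketch below uses Python as an assumed language (the course tools are SAS, R, WEKA, SPSS and Excel); the function names and the input values are hypothetical.

```python
import math

def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        # A constant column carries no spread; map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]

def log_transform(values):
    """Natural-log transformation; defined only for strictly positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("log transformation requires positive values")
    return [math.log(v) for v in values]

def square_root_transform(values):
    """Square-root transformation; defined only for non-negative values."""
    if any(v < 0 for v in values):
        raise ValueError("square-root transformation requires non-negative values")
    return [math.sqrt(v) for v in values]

print(min_max_normalize([10, 20, 30]))      # [0.0, 0.5, 1.0]
print(square_root_transform([0, 4, 9]))     # [0.0, 2.0, 3.0]
print(log_transform([1, 10, 100]))
```

Log and square-root transformations compress large values and are typically tried on right-skewed data, while min-max normalization only changes the scale and leaves the shape of the distribution unchanged.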

Module 4: Predictive & Estimation Models (Supervised Learning)

Part-10: Predictive modeling & Diagnostics

 Correlation – Pearson, Kendall, Wilcox
 SLR Regression
 MLR Regression
 Examination Residual analysis
 Auto Correlation
 Test of ANOVA Significance
 VIF Analysis
 Test of t-test Significance
 CP Indexing
 Eigen Value for PCA Analysis
 Homoscedasticity
 Heteroskedasticity
 Stepwise regression
 Forward Regression
 Backward Regression
 Multicollinearity
 Cross validation
 MAPE
 Check prediction accuracy
 Standardized regression
 Transformed Regression
 Dummy Variables Regression

Part-11: Logistic Regression Analysis

 Logistic Regression
 Discriminant Regression Analysis
 Multiple Discriminant Analysis
 Stepwise Discriminant Analysis
 Logit function
 Test of Associations
 Chi-square strength of association
 Binary Regression Analysis
 Probit and Logit Models
 Estimation of probability using logistic regression
 Wald Test statistics for Model
 Hosmer-Lemeshow
 Nagelkerke R square
 Pseudo R square
 Maximum likelihood estimation
 Model Fit
 Model cross validation
 Discriminant functions
 AIC
 BIC (Bayesian information criterion)

Module 5: Advanced Big Data Analytics

Part-12: Dimension Reduction Analysis

 Introduction to Factor Analysis
 Principal component analysis
 Reliability Test
 KMO MSA tests, Eigen Value Interpretation
 Rotation and Extraction steps
 Varimax Models
 Confirmatory Factor Analysis
 Exploratory Factor Analysis
 Factor Score for Regression

Part-13: Cluster Analysis

 Introduction to Cluster Techniques
 Hierarchical clustering
 K Means clustering
 Ward’s Method
 Agglomerative Clustering
 Variation Methods
 Centroid distance Methods
 Cluster Dendrogram
 Euclidean distance methods

Module 6: Data Mining (Machine Learning)

Part-14: Data Mining Machine Learning / Artificial Intelligence Functional Models

 Prediction
 Support Vector Machines (SVM)
 Gaussian Models
 Neural Network
Classification Models
 Binary Regression/Logit Model
 Probit Model
 Naïve Bayes
 Naïve Bayes Multinomial
 Ordinal Regression
 Multinomial Regression
 Discriminant analysis
Clustering Models
 DBSCAN
 EM (Expectation Maximization)
 K-Means Clustering
 Simple Cluster
 Hierarchical Cluster
 k-Nearest Neighbor Classification
Tree Models
 Random Forests: Bagging & Boosting
 Decision Stump
 CHAID Analysis
 C4.5 / C5.0
 J48 Pruning, Unpruning
 Decision trees
Survival Analysis
 Mantel-Haenszel Test
 Kaplan-Meier (Product-Limit) Estimator
 Cox’s Proportional Hazards Model
 Cox-Snell Residual
 Hazard Functions
 Proportional Hazards Assumption

Part-15: Time Series

Autoregressive Models
Moving Average Model
Multiplicative model
ARMA Model

Part-16: Model Validation and Testing

 Kappa Statistics
 AIC
 BIC
 Error/Confusion matrices
 ROC
 APE
 MAPE
 Lift Curve
 Sensitivity
 Misclassification Rate
 Specificity
 Maximum Absolute Error
 Root Final Prediction Error
 Gini Coefficient
 Schwarz’s Bayesian Criterion
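Several of the validation measures listed above (confusion matrix, sensitivity, specificity, misclassification rate, MAPE) follow directly from simple counts. The sketch below is illustrative only: Python is an assumed language, and the function names and the toy labels are hypothetical.

```python
def confusion_counts(actual, predicted, positive=1):
    """Return (tp, fp, tn, fn) for binary labels."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        else:
            if a == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn

def sensitivity(tp, fn):
    """True positive rate: share of actual positives that were detected."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """True negative rate: share of actual negatives that were correctly rejected."""
    return tn / (tn + fp)

def misclassification_rate(tp, fp, tn, fn):
    """Share of all cases that were predicted incorrectly."""
    return (fp + fn) / (tp + fp + tn + fn)

def mape(actual, predicted):
    """Mean absolute percentage error; actual values must be non-zero."""
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100 * sum(errors) / len(errors)

# Toy labels, for illustration only.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]
tp, fp, tn, fn = confusion_counts(actual, predicted)
print(tp, fp, tn, fn)  # 3 1 3 1
print(sensitivity(tp, fn), specificity(tn, fp), misclassification_rate(tp, fp, tn, fn))
```

Sensitivity, specificity and the misclassification rate apply to classification models, whereas MAPE is meant for numeric predictions such as regression or forecasting output.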

Part-17: Hadoop Ecosystem (Big Data Handling)

Pig
Hive
MapReduce
Mahout
NoSQL

Note: * Open-source tools are freely available; for commercial tools (SAS, SPSS) we use trial versions.

77-A Journalist Colony, Andhra jyothy Lane, Jubilee Hills, Hyderabad – 500033, India
www.robaservices.com, www.reachoutanalytics.com

Landline +91 40 32910202

Mobile +91 9700213845