INTRO TO DATA SCIENCE, ML AND AI
1.1. What is Data Science?
1.2. What is ML? Parametric, non-parametric
1.3. What is AI?
1.4. What problems can a Data Scientist solve?
1.4.1. Predict a Number (Regression Analysis)
1.4.2. Predict a Category (Classification Analysis)
1.4.3. Group Things (Clustering Analysis)
1.4.4. Find the Odd Ones Out (Anomaly Detection)
1.4.5. Automate Decisions (Reinforcement Learning)
1.5. Types of Machine Learning
1.5.1. Supervised Learning
1.5.2. Unsupervised Learning
1.5.3. Reinforcement Learning
1.6. File Types
1.6.1. Structured, Semi-Structured, Unstructured
PYTHON
2.1. Data Types
2.1.1. int, float, boolean, list, dictionary, tuple, set, string
2.2. Conditional and Looping Statements
2.2.1. If, If...Else, For, While, For each
2.3. Range, Enumerate, Lambda functions, List Comprehension
2.4. Python for Data Science (NumPy and pandas)
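A minimal sketch of the constructs listed in 2.3, assuming a small made-up list of scores (the data and the names `scores`, `is_pass`, and `passed` are illustrative only and not part of the course material):

```python
# Hypothetical sample data, for illustration only.
scores = [72, 85, 90, 64]

# range: generate the positions 0 .. len(scores) - 1
indices = list(range(len(scores)))

# enumerate: pair each position with its value
indexed = [(i, s) for i, s in enumerate(scores)]

# lambda: a small anonymous function (here, an assumed pass mark of 70)
is_pass = lambda s: s >= 70

# list comprehension: keep only the scores that pass
passed = [s for s in scores if is_pass(s)]

print(indices)
print(indexed)
print(passed)
```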
STATISTICS AND EXPLORATORY DATA ANALYSIS
3.1. Types of data
3.1.1. Numerical
3.1.2. Categorical
3.2. Exploratory Data Analysis (Univariate, Bivariate Analysis)
3.2.1. Data summarization methods: tables, graphs, charts, histograms,
frequency distributions, relative frequency, measures of central tendency
and dispersion, box plots, etc.
3.2.2. Numeric
3.2.2.1. Measures of central tendency
3.2.2.1.1. mean, median, mode, midrange, weighted mean
3.2.2.2. Measures of variation
3.2.2.2.1. range, variance, standard deviation, mean
deviation, coefficient of variation
3.2.2.3. Measures of position
3.2.2.3.1. percentile, quartiles, Interquartile Range, decile, outliers
3.2.2.4. Five Point Summary
3.2.2.4.1. Min, 1st Quartile, Median, 3rd Quartile, Max
3.2.2.5. Data distribution
3.2.2.5.1. Continuous and discrete distributions, Transformation of
random variables
3.2.2.5.2. Shape of a distribution: skewness (symmetry) and kurtosis (peakedness)
3.2.2.6. Charts
3.2.2.6.1. Scatter Plot, Box Plot, Histogram
3.2.3. Categorical
3.2.3.1. Measurements
3.2.3.1.1. Frequencies (total count), likelihood table, levels, group
count, proportion, percentage
3.2.3.2. Charts
3.2.3.2.1. Pie Chart, Bar Chart
3.2.4. Null
3.2.4.1. NA, NaN frequency
3.2.4.2. NAs count, Empty Values Count, NULL values count
3.3. Advanced Statistics
Central Limit Theorem, Random Variable, Probability Density Function,
Probability Mass Function, Distributions (Normal, Binomial, Uniform), p-value,
t-test (Student's t-test), F-statistic, Chi-Square Test, Hypothesis Testing, A/B
Testing, Correlation and Covariance
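As a rough illustration of the five point summary and interquartile range from 3.2.2, the sketch below uses only Python's standard `statistics` module on a made-up sample. The data and the helper name `five_point_summary` are assumptions for this example, and quartile conventions differ between tools, so other libraries may report slightly different quartiles for the same data.

```python
import statistics

# Hypothetical sample, for illustration only.
data = [2, 4, 4, 5, 7, 9, 10, 12]

def five_point_summary(values):
    """Return (min, Q1, median, Q3, max) using the 'inclusive' quartile method."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)

summary = five_point_summary(data)
iqr = summary[3] - summary[1]          # interquartile range = Q3 - Q1

print("five point summary:", summary)
print("IQR:", iqr)
print("mean:", statistics.mean(data))
print("sample standard deviation:", statistics.stdev(data))
```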
PROBABILITY
4.1. Trial, Experiment
4.2. Odds & Events – Dependent Events, Independent Events
4.3. Conditional Probability
4.4. Bayes Theorem
4.5. Probability Density Function
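A small worked example of Bayes Theorem (4.4) combined with conditional probability (4.3). The numbers are invented for illustration: a condition with 1% prevalence, a test that detects it 95% of the time, and a 5% false positive rate. The function name `posterior` is likewise just a label chosen for this sketch.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(A | B) for a binary event A, via Bayes' theorem.

    prior               = P(A)
    likelihood          = P(B | A)
    false_positive_rate = P(B | not A)
    """
    # Total probability of observing B (the "evidence").
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

# Hypothetical inputs: P(disease) = 0.01, P(+ | disease) = 0.95, P(+ | healthy) = 0.05
p_disease_given_positive = posterior(prior=0.01, likelihood=0.95,
                                     false_positive_rate=0.05)
print(round(p_disease_given_positive, 3))
```

Even with a fairly accurate test, the posterior stays well below 50% here because the condition is rare, which is the usual point of this kind of exercise.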
MATHEMATICS
5.1. Algebra, Linear Algebra, Vector and Matrix Algebra, Eigenvalues, Eigenvectors,
Calculus, Set Theory
MACHINE LEARNING
6.1. Data Preprocessing
6.1.1. Data Cleaning:
6.1.1.1. Filling in missing values
6.1.1.2. Smoothing the noisy data
6.1.1.3. Resolving inconsistencies in the data.
6.1.1.4. Treatment of Outliers
6.1.2. Data Transformation:
6.1.2.1. Log Transformation, Cube Root, Square Root (to reduce skewness)
6.1.2.2. Min-Max Normalization (scaling)
6.1.2.3. Z-Score Standardization (scaling)
6.1.3. Data Reduction:
6.1.3.1. PCA
6.1.4. Data Discretization:
6.1.4.1. Binning
6.1.5. Handling Imbalanced Datasets
6.2. Regression
6.2.1. Simple Linear Regression
6.2.2. Multiple Linear Regression
6.2.3. Decision Tree Regression
6.2.4. Random Forest Regression
6.2.5. Support Vector Regression
6.2.6. K-NN for Regression
6.3. Classification
6.3.1. Naive Bayes
6.3.2. Decision Tree
6.3.3. Random Forest (an ensemble ML model based on bagging, i.e. bootstrap
aggregation)
6.3.4. K-NN for Classification (using distance-based and angle-based similarity)
6.3.5. Logistic Regression
6.3.6. Linear Discriminant Analysis
6.3.7. Support Vector Machine
6.4. Clustering
6.4.1. Centroid Model – K-Means clustering
6.4.2. Connectivity Model – Hierarchical clustering
6.4.3. Distribution Model – Expectation-Maximization Algorithm
6.4.4. Density Model – DBSCAN
6.5. Time Series Analysis and Forecasting
6.6. Association Rules
6.6.1. Apriori – Market Basket Analysis
6.7. Dimensionality Reduction
6.7.1. PCA
6.7.2. LDA
6.7.3. Kernel PCA
6.8. Model Evaluation Metrics
6.8.1. Classification
6.8.1.1. Confusion Matrix
6.8.1.2. ROC Curve and AUC
6.8.1.3. Log Loss
6.8.2. Regression
6.8.2.1. MAE
6.8.2.2. MSE
6.8.2.3. RMSE
6.8.2.4. MAPE
6.8.2.5. MPE
6.9. Model Tuning
6.9.1. Bias-Variance Trade-off (overfit, underfit, best fit)
6.9.2. K-Fold Cross Validation
6.9.3. Algorithm Parameter Tuning
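To make the idea behind K-Fold Cross Validation (6.9.2) concrete, here is a minimal pure-Python sketch of how sample indices can be split into folds. The function name `k_fold_indices` is hypothetical, and the sketch uses contiguous, unshuffled folds for simplicity; in practice the data is usually shuffled first, and a library implementation such as scikit-learn's `KFold` would normally be used instead.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")

    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]

    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, test
        start = stop

# Each sample lands in the test set exactly once across the k folds.
folds = list(k_fold_indices(n_samples=10, k=3))
for train, test in folds:
    print("train:", train, "test:", test)
```

A model would be fitted on each `train` split and scored on the matching `test` split, and the k scores averaged to estimate performance.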
BUSINESS USE CASES – PROJECTS
7.1. Natural Language Processing (NLP) – Text Mining
7.2. Recommendation Engine (Amazon, Netflix)
7.3. Click-Through Rate Prediction (Digital Marketing)
7.4. Spam Mail Detector
7.5. Diabetes Prediction
7.6. Automobile Data – Exploratory Data Analysis
DEPLOYMENT AND PRODUCTIONIZATION
8.1. Deploy Machine Learning models in Production as APIs using Flask
8.2. Deploy Machine Learning models in Production as APIs using Azure ML
BIG DATA
9.1. Hadoop Architecture, HDFS
9.2. Hive
9.3. Spark SQL
9.4. SparkML