What are your career goals this year?
- Master analytical skills.
- Learn from the best & get mentored by an industry expert.
- Land your dream job in the dynamic analytics industry.
- Be able to code in Python and solve data analysis problems.
- Participate in Kaggle competitions & hackathons.
NumPy: The first step towards data analysis using Python. We will cover the following topics:
- Why use NumPy: motivating examples
- NumPy arrays – creation, methods, and attributes
- Basic math with arrays
- Manipulating arrays
- Using NumPy for simulations
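As a rough illustration of the topics listed above, here is a minimal NumPy sketch covering array creation, attributes, basic math, manipulation and a small simulation. The coin-flip simulation and the seed value are illustrative choices of ours, not taken from the course material.

```python
import numpy as np

# Creation: from a Python list and with helper constructors
a = np.array([1, 2, 3, 4, 5, 6])
zeros = np.zeros((2, 3))           # 2 x 3 array filled with 0.0
evens = np.arange(0, 10, 2)        # [0, 2, 4, 6, 8]

# Attributes: shape, number of dimensions, element type
print(a.shape, a.ndim, a.dtype)

# Basic math: element-wise operations without an explicit loop
squares = a ** 2                   # [1, 4, 9, 16, 25, 36]
total = a.sum()                    # 21
mean = a.mean()                    # 3.5

# Manipulation: reshape into 2 rows x 3 columns, then transpose
grid = a.reshape(2, 3)
flipped = grid.T                   # shape (3, 2)

# Simulation: 10,000 fair coin flips with a seeded random generator
rng = np.random.default_rng(seed=42)
flips = rng.integers(0, 2, size=10_000)   # 0 = tails, 1 = heads
heads_fraction = flips.mean()
print(f"Fraction of heads: {heads_fraction:.3f}")  # should be close to 0.5
```

Seeding the generator makes the simulation reproducible, which is useful when comparing results across runs.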
Data Analysis with Pandas
- Pandas Series and the operations on them.
- Pandas DataFrames and the operations on them.
Matplotlib – needed for visualizing data.
In this course we will not plot directly with Matplotlib; instead, we will use two higher-level plotting libraries: Seaborn and Pandas. However, since both of these libraries are built on top of Matplotlib, we need to learn Matplotlib's basic terminology and concepts, because we will frequently need to modify the objects and plots those libraries produce. This lesson is therefore not a complete introduction to Matplotlib; we will learn just enough to get started visualizing data.
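To make the point about modifying plots concrete, here is a small sketch: Pandas draws the chart with Matplotlib and returns a Matplotlib `Axes` object, which can then be adjusted with ordinary Matplotlib methods. The sales figures are hypothetical, and the `Agg` backend is chosen only so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: no window is opened
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical monthly sales figures, for illustration only
sales = pd.Series([120, 135, 128, 150], index=["Jan", "Feb", "Mar", "Apr"])

# Pandas plots through Matplotlib and hands back a Matplotlib Axes
ax = sales.plot(kind="bar")

# Because ax is a plain Matplotlib Axes, Matplotlib methods work on it
ax.set_title("Monthly sales")
ax.set_xlabel("Month")
ax.set_ylabel("Units sold")

fig = ax.get_figure()   # the Figure that contains this Axes
fig.tight_layout()      # adjust spacing so labels are not clipped
```

Seaborn's axes-level functions (for example `sns.barplot`) also return a Matplotlib `Axes`, so the same adjustments apply there.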
Exploratory Data Analysis with Seaborn and Pandas
Exploratory data analysis is an approach to analyzing data sets to summarize their main characteristics, often with visual methods. It is used to understand the data, get context about it, understand the variables and the relationships between them, and formulate hypotheses that could be useful when building predictive models.
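A minimal, non-visual sketch of those EDA steps in Pandas: summarizing each variable, measuring the relationship between two variables, and comparing a variable across groups. The toy data set and its column names are hypothetical.

```python
import pandas as pd

# Hypothetical toy data set, for illustration only
df = pd.DataFrame({
    "age":    [23, 35, 31, 52, 46, 29],
    "income": [28_000, 52_000, 44_000, 81_000, 67_000, 39_000],
    "region": ["North", "South", "North", "South", "North", "South"],
})

# Main characteristics of each numeric variable (count, mean, std, quartiles, ...)
summary = df.describe()

# Relationship between two numeric variables (Pearson correlation by default)
age_income_corr = df["age"].corr(df["income"])

# How a numeric variable differs across the groups of a categorical variable
income_by_region = df.groupby("region")["income"].mean()

print(summary)
print(f"age vs income correlation: {age_income_corr:.2f}")
print(income_by_region)
```

A strong correlation like this one would be a hypothesis worth checking on real data, not a conclusion in itself.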
Assignment_1: EDA with Python Pandas (30-45 mins)
In this assignment, you will learn by doing:
- importing datasets, dealing with missing values, changing data types.
- filtering, sorting, selecting specific column(s).
- dealing with duplicate values, dropping and adding rows and columns.
- counting values, counting unique values.
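The operations listed above can be sketched on a tiny in-memory CSV. The data, the column names and the specific choices (filling missing scores with 0, dropping rows without an age) are hypothetical and only show what each Pandas call does; the assignment's own dataset and rules may differ.

```python
import io
import pandas as pd

# Hypothetical CSV contents; in the assignment you would read a real file
csv_text = """name,city,age,score
Asha,Pune,25,88
Ravi,Delhi,,72
Asha,Pune,25,88
Meera,Mumbai,31,95
Karan,Delhi,28,
"""

# Importing a dataset
df = pd.read_csv(io.StringIO(csv_text))

# Missing values: fill missing scores with 0, drop rows with no age
df["score"] = df["score"].fillna(0)
df = df.dropna(subset=["age"])

# Changing data types: age was read as float because of the missing value
df["age"] = df["age"].astype(int)

# Duplicate values: remove fully identical rows
df = df.drop_duplicates()

# Adding and dropping columns
df["passed"] = df["score"] >= 75
df = df.drop(columns=["name"])

# Filtering, sorting, and selecting specific columns
delhi = df[df["city"] == "Delhi"]
top = df.sort_values("score", ascending=False)[["city", "score"]]

# Counting values and counting unique values
city_counts = df["city"].value_counts()
n_cities = df["city"].nunique()
print(top)
print(city_counts)
```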
Assignment_2: Data Analysis of Customer Attrition (45-60 mins)
Telephone service companies, Internet service providers, pay TV companies, insurance firms, and alarm monitoring services often use customer attrition analysis and customer attrition rates as key business metrics, because the cost of retaining an existing customer is far less than the cost of acquiring a new one. Companies in these sectors often have customer service branches that attempt to win back defecting clients, because recovered long-term customers can be worth much more to a company than newly recruited ones.
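An attrition (churn) rate is the share of customers who left during a period. A minimal sketch of computing it overall and per segment, assuming a hypothetical customer table with a boolean `churned` column and a `contract` column; these names are illustrative and not from the assignment data.

```python
import pandas as pd

# Hypothetical customer table; column names are illustrative only
customers = pd.DataFrame({
    "customer_id": [101, 102, 103, 104, 105, 106, 107, 108],
    "contract":    ["Monthly", "Monthly", "Yearly", "Monthly",
                    "Yearly", "Monthly", "Yearly", "Yearly"],
    "churned":     [True, False, False, True, False, True, False, False],
})

# Overall attrition rate = customers who left / all customers
# (the mean of a boolean column is the fraction of True values)
overall_rate = customers["churned"].mean()

# Attrition rate per contract type, to see which segment loses more customers
rate_by_contract = customers.groupby("contract")["churned"].mean()

print(f"Overall churn rate: {overall_rate:.1%}")
print(rate_by_contract)
```

Breaking the rate down by segment is what points a retention team towards the customers most worth winning back.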
Assignment_3: Data Analysis of Customer Churn rate at a Telecom Co.
Analyzing a dataset on the churn rate of telecom operator clients.
Note: This dataset is different from the ones used in the earlier assignments.
Module 5: Doing Statistics Using SciPy
The SciPy package contains various toolboxes dedicated to common problems in scientific computing. Its submodules correspond to different applications, such as interpolation, integration, optimization, image processing, statistics, and special functions. SciPy is the core package for scientific routines in Python; it is designed to operate efficiently on NumPy arrays, so NumPy and SciPy work hand in hand. In this module we will learn only the statistics submodule, scipy.stats.
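A short sketch of three typical uses of `scipy.stats`: descriptive statistics, a probability from a distribution, and a two-sample hypothesis test. The measurements are hypothetical, and Welch's t-test (`equal_var=False`) is just one reasonable choice of test here.

```python
import numpy as np
from scipy import stats

# Hypothetical measurements from two groups, for illustration only
group_a = np.array([12.1, 11.8, 12.4, 12.0, 11.9, 12.3])
group_b = np.array([12.9, 13.1, 12.7, 13.0, 12.8, 13.2])

# Descriptive statistics: nobs, min/max, mean, variance, skewness, kurtosis
desc = stats.describe(group_a)

# Probability from a distribution: P(Z <= 1.96) for a standard normal Z
p_below = stats.norm.cdf(1.96)

# Hypothesis test: do the two group means differ? (Welch's two-sample t-test)
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)

print(desc)
print(f"P(Z <= 1.96) = {p_below:.4f}")
print(f"t = {t_stat:.2f}, p = {p_value:.4g}")
```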
Module 6_A: Time Series Analysis – Introduction & terminology
Module 6_B: Time Series Analysis – Application in real-time
Module 6_C: Time Series Forecasting – Concepts & Problem Definition
Module 6_D: Time Series Forecasting – Problem Solving
Module 6 deals with problems such as predicting the electricity consumption of a household for the next three months, estimating road traffic during certain periods, and predicting the price at which a stock will trade on the BSE or NSE.
All of these problems involve time series data: you cannot accurately predict any of these results without the ‘time’ component. As more and more data is generated in the world around us, time series forecasting is becoming an increasingly critical technique for a data analyst or business analyst to master.
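To show why the time component matters, here is a minimal forecasting sketch on hypothetical daily electricity consumption: the data is split by time (never shuffled), and a seasonal naive baseline repeats last week's values. The baseline, the 28-day horizon and the synthetic data are our own illustrative choices, not the method taught in the module.

```python
import numpy as np
import pandas as pd

# Hypothetical daily consumption (kWh) with a weekly pattern plus noise
dates = pd.date_range("2023-01-01", periods=120, freq="D")
weekly_pattern = np.tile([10, 11, 11, 12, 12, 15, 16], 18)[:120]
rng = np.random.default_rng(seed=0)
consumption = pd.Series(weekly_pattern + rng.normal(0, 0.5, 120), index=dates)

# Time-ordered split: the test period must come strictly after training
train = consumption.iloc[:-28]
test = consumption.iloc[-28:]

# Seasonal naive forecast: each future day repeats the same weekday last week
last_week = train.iloc[-7:].to_numpy()
forecast = pd.Series(np.tile(last_week, 4), index=test.index)

# Mean absolute error of the forecast on the held-out period
mae = (test - forecast).abs().mean()
print(f"MAE: {mae:.2f} kWh")
```

A simple baseline like this gives a reference error that any more sophisticated forecasting model should beat.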
Compulsory projects to be completed as part of the course.
Project 1:
Analyze the 1994 “Census Income” dataset by answering a series of queries.
(Data set source: https://archive.ics.uci.edu/ml/datasets/Adult)
Limited Googling is allowed. Maximum time for this project: 60 min.
Project 2:
Analyze the “IPL matches” dataset covering the last 10 years by answering a series of queries.
(Data set source: provided in class)
Limited Googling is allowed. Maximum time for this project: 90 min.
Project 3:
In this project, we will work with the daily time series of Open Power System Data (OPSD) for Germany, which has been rapidly expanding its renewable energy production in recent years.
The data set includes country-wide totals of electricity consumption, wind power production, and solar power production for 2006-2017.
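A sketch of the kind of time series handling this project calls for: a `DatetimeIndex`, calendar features, partial-string date selection, resampling, rolling means and a yearly renewable share. The numbers below are synthetic stand-ins, and the column names (`Consumption`, `Wind`, `Solar`) are assumptions; check them against the actual OPSD file used in class.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the OPSD daily data (GWh); column names are assumptions
dates = pd.date_range("2016-01-01", "2017-12-31", freq="D")
rng = np.random.default_rng(seed=1)
opsd = pd.DataFrame(
    {
        "Consumption": rng.normal(1300, 50, len(dates)),
        "Wind": rng.uniform(20, 500, len(dates)),
        "Solar": rng.uniform(10, 250, len(dates)),
    },
    index=dates,
)

# Derived column and calendar features taken from the DatetimeIndex
opsd["Wind+Solar"] = opsd["Wind"] + opsd["Solar"]
opsd["Year"] = opsd.index.year
opsd["Month"] = opsd.index.month

# Partial-string indexing: select every day of June 2017
june_2017 = opsd.loc["2017-06"]

# Downsample daily values to monthly means ("MS" = month start labels)
monthly = opsd[["Consumption", "Wind", "Solar"]].resample("MS").mean()

# 7-day centered rolling mean, which smooths out the weekly cycle
rolling_7d = opsd["Consumption"].rolling(window=7, center=True).mean()

# Share of consumption covered by wind and solar, per year
yearly = opsd.groupby("Year")[["Consumption", "Wind+Solar"]].sum()
yearly["Renewable share"] = yearly["Wind+Solar"] / yearly["Consumption"]
print(yearly)
```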