Worked on Executing Exploratory Data Analysis and LDA – Rishi Kalpa Mukherjee, PGP DSBA



I am Rishi Kalpa Mukherjee. I graduated with a B.A. in English Literature (English Honors) from Gurudas College, Calcutta University, and I hold a B.Sc IT diploma from NIIT. I’m currently pursuing a Post Graduate Program in Data Science and Business Analytics from Great Learning, Great Lakes Institute of Management, and I work as an SME on a Citibank project. I’m a team player committed to customer service, with a long track record in customer service and operations roles, good process-related and domain skills, and the ability to communicate confidently at all levels.

The problem statement is based on the Citibank Credit Card Default Status Analysis. The dataset was provided by the operations manager of our Citibank Security Operation Project and contains attributes such as Gender, Loan Offered, Job, WorkExp, Credit Score, EMI Ratio, Status, Credit History, Purpose, and Dependents. I analyzed the data through Exploratory Data Analysis and then used a Linear Discriminant Analysis (LDA) model to find out whether the model is relevant and efficient enough for this dataset and what it reveals about default.

Initial Exploratory Data Analysis:

First, we import the required libraries and load the dataset. The data is shown below.

Top 10 rows of the dataset:

Bottom 10 rows of the dataset:

Checking the Descriptive Summary of the dataset:
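The loading and inspection steps above can be sketched with pandas as follows. This is only an illustration: the rows are made up, the column names are adapted from the attributes listed in the problem statement, and the real file name and location are not given in this post.

```python
import io
import pandas as pd

# Hypothetical stand-in for the real file: a few made-up rows with
# column names adapted from the attributes listed in the article.
csv_text = """Gender,Loan_Offered,Job,Work_Exp,Credit_Score,EMI_Ratio,Status,Credit_History,Own_House,Purpose,Dependents
Male,0,unskilled,14,86,3.0,No,critical,1,personal,2
Female,1,skilled,15,94,3.0,No,poor,1,personal,1
Male,0,skilled,16,86,2.0,No,critical,0,personal,2
Female,1,management,13,94,1.5,Default,good,0,car,3
"""

# With a real file this would be pd.read_csv("<path to the CSV>").
df = pd.read_csv(io.StringIO(csv_text))

print(df.head(10))      # top rows (all 4 here, since the toy frame is short)
print(df.tail(10))      # bottom rows
print(df.describe())    # descriptive summary of the numeric columns
```

By default, `describe()` summarizes only the numeric columns when the frame mixes numeric and text columns, which is why text attributes such as Job or Purpose do not appear in that summary.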

These are the observations made after executing initial data exploration: 

  1. The dataset contains 781 rows and 11 columns. 
  2. The size of the dataset is 8,591 values (781 rows × 11 columns). 
  3. One variable in the dataset has the ‘float64’ data type, 5 variables have ‘int64’ and 5 variables have ‘object’.
  4. There are no missing values in the dataset. 
  5. There are no duplicate records in the dataset. 
  6. There is no bad data and there are no anomalies in the dataset. 
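The checks behind these observations can be reproduced with a handful of pandas calls. The sketch below uses a tiny made-up frame with assumed column names, not the real dataset, so the numbers it prints differ from the ones listed above.

```python
import pandas as pd

# Tiny made-up frame standing in for the real 781 x 11 dataset.
df = pd.DataFrame({
    "Work_Exp": [14, 15, 16, 13],
    "EMI_Ratio": [3.0, 3.0, 2.0, 1.5],
    "Status": ["No", "No", "No", "Default"],
})

n_rows, n_cols = df.shape                   # (rows, columns)
n_cells = df.size                           # rows * columns
dtype_counts = df.dtypes.value_counts()     # how many columns per data type
n_missing = int(df.isnull().sum().sum())    # total missing cells
n_duplicates = int(df.duplicated().sum())   # fully duplicated rows

print(n_rows, n_cols, n_cells)
print(dtype_counts)
print(n_missing, n_duplicates)
```

On the real data, `df.size` is simply the product of the two shape values, which is where 781 × 11 = 8,591 comes from.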

The Own House variable is converted to the object data type. 
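A conversion like this is typically a one-liner in pandas. The sketch below assumes the column is named `Own_House` and holds 0/1 flags; both are assumptions, since the original values are not shown in this post.

```python
import pandas as pd

# Made-up values; "Own_House" is an assumed column name with 0/1 flags.
df = pd.DataFrame({"Own_House": [1, 0, 1, 1]})

# Treat the 0/1 flag as a category label rather than a number.
df["Own_House"] = df["Own_House"].astype(object)

print(df["Own_House"].dtype)
```

After the conversion, numeric summaries no longer average the flag as if it were a quantity.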

The dataset is cleaned and corrected, and the Credit History variable is restructured. 

Numbers of Default and Non-Default for the Target Column: 

We can see that there are 125 defaults. This class is our area of interest for the analysis.
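Counting the classes of a target column is usually done with `value_counts()`. The labels below are made up for illustration, and `Status` is assumed to be the name of the target column.

```python
import pandas as pd

# Made-up labels; the real "Status" column has 781 entries.
status = pd.Series(
    ["No", "No", "Default", "No", "No", "Default", "No"], name="Status"
)

counts = status.value_counts()   # number of rows per class, largest first
print(counts)
```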

Univariate Plot:

Bivariate Plot: 

The bivariate plot shows Work Exp in comparison to Loan Offered. 

Strip plot of the Status variable in comparison to the Work Exp variable. We can see that most of the points in the strip plot belong to Non default rather than Default. 
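A strip plot draws one point per observation, grouped by category, with a little sideways jitter so that overlapping points stay visible. The sketch below builds one with plain matplotlib on made-up values. It is not the plotting code used for the figure in this post, which is not shown; seaborn's `stripplot` produces the same kind of chart in one call.

```python
import random
import matplotlib
matplotlib.use("Agg")            # render off-screen; no display needed
import matplotlib.pyplot as plt

# Made-up observations: (status label, years of work experience).
rows = [("No", 14), ("No", 15), ("No", 16), ("No", 12), ("No", 18),
        ("Default", 13), ("Default", 9)]

categories = ["No", "Default"]
rng = random.Random(0)           # fixed seed so the jitter is repeatable

fig, ax = plt.subplots()
for position, label in enumerate(categories):
    values = [exp for status, exp in rows if status == label]
    # Small horizontal jitter keeps overlapping points visible.
    xs = [position + rng.uniform(-0.1, 0.1) for _ in values]
    ax.scatter(xs, values, label=label)

ax.set_xticks(range(len(categories)))
ax.set_xticklabels(categories)
ax.set_xlabel("Status")
ax.set_ylabel("Work Exp")
```

Calling `fig.savefig(...)` with a file name of your choice writes the chart to disk.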

Distribution of Dependent Variable Categories: 

We can see that the percentage of No default is 84% whereas Default is 16%. 
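These percentages follow directly from the counts reported earlier, assuming no rows were dropped during cleaning:

```python
total_rows = 781          # rows reported in the initial exploration
defaults = 125            # defaults reported for the target column
non_defaults = total_rows - defaults

default_pct = 100 * defaults / total_rows
non_default_pct = 100 * non_defaults / total_rows

print(f"Default: {default_pct:.1f}%")         # Default: 16.0%
print(f"No default: {non_default_pct:.1f}%")  # No default: 84.0%
```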

The plot here compares the Dependents and Status variables, and we can easily see that defaults make up the smaller share of records.

Linear Discriminant Analysis Model:

Build the LDA model and fit it to the data. 
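A minimal sketch of this step with scikit-learn is shown below. It runs on synthetic data generated to have the same row count and roughly the same 84% / 16% class balance as the dataset described here. The feature count, the 70/30 split and the random seed are assumptions, not details taken from the original analysis.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 781 rows, roughly 84% / 16% class balance.
X, y = make_classification(
    n_samples=781, n_features=6, n_informative=4,
    weights=[0.84, 0.16], random_state=42,
)

# Hold out 30% for evaluation; stratify keeps the class ratio in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42,
)

lda = LinearDiscriminantAnalysis()
lda.fit(X_train, y_train)

y_pred = lda.predict(X_test)           # predicted class labels
y_prob = lda.predict_proba(X_test)     # per-class probabilities
print(y_pred[:10])
```

With the real dataset, text columns such as Gender or Purpose would first have to be encoded as numbers (for example with `pd.get_dummies`), because LDA works on numeric features only.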


CONFUSION MATRIX

DATA ACCURACY 
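The confusion matrix and accuracy can be computed with scikit-learn's metrics module. The labels below are made up for ten hypothetical held-out customers, so the numbers are not the results of this analysis.

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Made-up labels for ten held-out customers: 1 = default, 0 = no default.
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]

# Rows are actual classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)
print(cm)
# [[6 1]
#  [1 2]]

print(accuracy_score(y_true, y_pred))          # 0.8
print(classification_report(y_true, y_pred))   # precision, recall, f1 per class
```

Because about 84% of the records are non-defaults, a model that always predicts "no default" would already reach roughly 84% accuracy. That is why precision and recall for the default class are worth reading alongside the overall accuracy.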

The LDA model performed very well on this problem in terms of accuracy, precision, recall and f1-score, so it gives a sound basis for recommendations and data-driven decisions for management and stakeholders. 

INSIGHTS: 

Based on the analysis, the important factors for this credit card data are Work Experience, Status, and Loan Offered. 

RECOMMENDATIONS:

Customers who have defaulted need to be given the necessary credit card details and information about the combo offer.

LOAN:

Offer a package with new goodies to help customers clear their defaults. This would help retain these customers, who could then become strong contributors to business growth.



Source: Great Learning


By bpci
