One of the goals of this course is to develop your ability to analyze data and study economic problems using econometric methods. To that end, you will complete an empirical project that demonstrates your ability to work with statistical software and data and to interpret the results of econometric models.
Your “client” for this project is a mortgage lender that wishes to engage in risk-based pricing for its mortgage loans. The first step in establishing risk-based pricing is the construction of a mortgage “scorecard” that predicts the probability that a loan defaults. You will develop such a scorecard using data that I will provide, derived from the Freddie Mac Single Family Loan-Level Dataset; the full version of this data “covers approximately 22.19 million fixed-rate mortgages originated between January 1, 1999 and June 30, 2015 [that were purchased or guaranteed by Freddie Mac].” I will provide you with a small random sample of loans from a particular origination cohort. Your task is to develop an econometric model that predicts the probability that a given loan defaults, where a default is defined as becoming 60 days past due or entering foreclosure at any point within 4 years of origination. In what follows, a “good” account is one that does not default, whereas a “bad” account is one that does default.
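To make the default definition concrete, here is a minimal sketch of how a good/bad flag could be derived from a loan's monthly performance history. The record layout and field names (`loan_age_months`, `days_past_due`, `in_foreclosure`) are hypothetical and are not the layout of the data you will receive. The sketch also assumes that "within 4 years" means a loan age of 48 months or less, and that "60 days past due" means 60 or more days past due.

```python
from dataclasses import dataclass
from typing import Iterable

WINDOW_MONTHS = 48   # assumption: "within 4 years" = loan age <= 48 months
DPD_THRESHOLD = 60   # assumption: 60 or more days past due counts as default


@dataclass
class MonthlyRecord:
    """One month of performance for one loan (hypothetical layout)."""
    loan_age_months: int   # months elapsed since origination
    days_past_due: int     # days past due at month end
    in_foreclosure: bool   # True if the loan is in foreclosure this month


def is_bad(records: Iterable[MonthlyRecord]) -> bool:
    """Return True if the loan defaults inside the observation window."""
    return any(
        r.loan_age_months <= WINDOW_MONTHS
        and (r.days_past_due >= DPD_THRESHOLD or r.in_foreclosure)
        for r in records
    )


# A loan that reaches 60 days past due in month 30 is "bad".
bad_loan = [MonthlyRecord(29, 30, False), MonthlyRecord(30, 60, False)]

# A loan that first becomes delinquent after month 48 is still "good".
late_delinquency = [MonthlyRecord(47, 0, False), MonthlyRecord(50, 90, False)]

print(is_bad(bad_loan), is_bad(late_delinquency))
```

The resulting flag is the binary dependent variable that the scorecard model would predict.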