
 2022-06-23 20:05:30


Abstract




Keywords: machine learning, credit rating, support vector machine, decision tree, logistic regression


Bank-oriented credit rating methods for individuals are currently valued by major banks. However, the core algorithm of each model has its limitations. This paper attempts to optimize the choice of a model's core algorithm according to the size of the data set, so as to obtain a solution suited to the current situation.

This paper builds models in Python to simulate a bank's credit rating process for ordinary users, and evaluates the time cost and prediction accuracy of each model using 10-fold cross-validation and ROC curves. The results show that in the initial stage, when no training set is available, the decision tree method is the most efficient and accurate, because it is the most consistent with traditional business logic, which makes existing rules easy to convert, and it does not need to consider implicit relationships in the data. When the data set size is within a certain range (in this paper, 0-50), the support vector machine gives higher prediction accuracy, but at a significant time cost. When the data set is large, logistic regression performs better still.
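The evaluation procedure described above can be sketched roughly as follows. This is a minimal illustration, not the thesis's own code: the data come from scikit-learn's `make_classification` as a synthetic stand-in for the credit data set, and the model settings (an RBF kernel for the SVM, `max_iter=1000` for logistic regression) are assumptions chosen only so the example runs.

```python
import time

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary data standing in for a credit data set (assumption).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# The three candidate core algorithms; hyperparameters are illustrative.
models = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "svm": SVC(kernel="rbf"),
    "logistic_regression": LogisticRegression(max_iter=1000),
}

results = {}
for name, model in models.items():
    start = time.perf_counter()
    # 10-fold cross-validation, scoring accuracy and area under the ROC curve.
    scores = cross_validate(model, X, y, cv=10, scoring=("accuracy", "roc_auc"))
    elapsed = time.perf_counter() - start
    results[name] = {
        "accuracy": scores["test_accuracy"].mean(),
        "roc_auc": scores["test_roc_auc"].mean(),
        "seconds": elapsed,
    }

for name, row in results.items():
    print(f"{name}: acc={row['accuracy']:.3f} "
          f"auc={row['roc_auc']:.3f} time={row['seconds']:.2f}s")
```

Repeating the loop on subsets of different sizes would give a comparison of accuracy and running time against data set size. The numbers printed for synthetic data say nothing about the thesis's findings.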

A comparative study of the three models shows that the core algorithm can be selected according to the training set size at which each algorithm performs best. Choosing the algorithm in this way can effectively avoid problems such as over-fitting.
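The selection rule summarized above might be expressed as a small dispatch function. This is a sketch under assumptions: the function name `choose_core_algorithm` and the parameter `small_limit` are hypothetical, and the default of 50 simply reuses the "0-50" range quoted in the abstract, whose unit is not stated there.

```python
def choose_core_algorithm(n_training_samples: int, small_limit: int = 50) -> str:
    """Pick a core algorithm from the size of the available training set.

    small_limit is a hypothetical threshold; 50 mirrors the "0-50" range
    mentioned in the abstract, and its unit is an assumption.
    """
    if n_training_samples < 0:
        raise ValueError("n_training_samples must be non-negative")
    if n_training_samples == 0:
        # No training data yet: fall back to rule-like decision trees.
        return "decision_tree"
    if n_training_samples <= small_limit:
        # Small data set: support vector machine.
        return "svm"
    # Large data set: logistic regression.
    return "logistic_regression"


for size in (0, 30, 5000):
    print(size, choose_core_algorithm(size))
```

In practice the threshold would be re-estimated for each data set, for example from a cross-validation comparison, rather than fixed in advance.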

KEY WORDS: Machine Learning, Credit Rating, Support Vector Machine, Decision Tree, Logistic Regression


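Abstract (Chinese)

Abstract

Chapter 1 Introduction

1.1 Research Background and Significance

1.1.1 Research Background

1.1.2 Research Significance

1.2 Basic Concepts

1.2.1 Credit

1.2.2 Credit Evaluation

1.2.3 Ordinary Users

1.3 Research Methods

Chapter 2 Overview of the Development of Credit Evaluation and Literature Review

2.1 Development of Credit Evaluation Theory

2.1.1 Classical Credit Evaluation Methods

2.1.2 Statistical Methods

2.1.3 Current Credit Scoring Methods

2.2 Review of Domestic and International Literature

2.2.1 Bank Rating Systems for Ordinary Users

2.2.2 Machine Learning Methods

Chapter 3 Comparison of Three Machine Learning Algorithms for User Credit Evaluation

3.1 Programming Language and Related Libraries

3.1.1 The Python Language

3.1.2 The Scikit-learn Library

3.1.3 The Pandas Library

3.2 Dataset Selection and Description

3.3 Dataset Preprocessing

3.4 Support Vector Machine

3.4.1 Introduction to the Algorithm

3.4.2 Mathematical Explanation of the Algorithm

3.4.3 Kernel Functions

3.5 Logistic Regression

3.5.1 Mathematical Explanation of the Algorithm

3.5.2 Over-fitting

3.6 Decision Tree Method

3.6.1 Overview of the Algorithm

3.6.2 Limitations and Remedies

3.7 Building a Default Prediction Model with scikit-learn

3.8 Analysis of Results

3.8.1 Comparison of ROC Curves

3.8.2 Comparison of 10-fold Cross-validation Results and Running Time

Chapter 4 Building a Credit Evaluation Model Based on a Composite Algorithm

4.1 Principle of the Composite Algorithm

4.2 Implementation of the Composite Algorithm

4.3 Case Validation of the Composite Algorithm

Chapter 5 Conclusions and Outlook

Acknowledgements

References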







