China's auto industry has grown alongside the rapid development of the economy. China has become the world's largest automobile consumer market and the market with the greatest consumption potential. Growth in automobile production and sales has strongly driven auto finance: China's auto finance penetration rate has risen from 13% five years ago to nearly 40%. As the penetration rate increases, the demands on risk control capability keep rising, and enterprises in the sector have begun to combine technology with auto finance risk control. Because car loans involve large amounts, fraudulent intermediaries are motivated to falsify application information, or to dupe people with no credit record, in order to obtain vehicles through installment purchases and then transfer the vehicles through illegal means. Identifying fraud risk among car buyers has therefore become the focus of current risk control.
This paper uses sample user data from an Internet finance company to explore the application of machine learning to anti-fraud modeling, studies the relevant customer characteristics, and proposes anti-fraud rules. A logistic regression model, which is widely accepted in industry, is built and used as the baseline. Support vector machine, AdaBoost, and XGBoost anti-fraud models are then built; the main parameters of each model are introduced, and the optimal parameter combinations are explored to optimize model performance. The models are compared using the evaluation indices for machine learning classification models. The comparison shows that XGBoost performs best in this study, with a large improvement over logistic regression. Finally, the feature analysis and the anti-fraud models are used to provide a reference for building an anti-fraud system, and the outlook for future anti-fraud systems is discussed.
Keywords: Auto Finance, Anti-fraud, Machine Learning, Logistic Regression, XGBoost
Abstract (Chinese)
Abstract (English)
Chapter 1 Introduction
1.1 Research Background
1.2 Research Significance
1.3 Literature Review
1.3.1 Review of International Research
1.3.2 Review of Domestic Research
1.3.3 Problem Statement
1.4 Research Approach and Methods
1.5 Technical Roadmap
Chapter 2 Related Theory
2.1 Auto Finance and Anti-Fraud Theory
2.2 Machine Learning Algorithms
2.2.1 Logistic Regression
2.2.2 Support Vector Machine
2.2.3 Decision Tree
2.2.4 Bagging and Boosting
2.2.5 Random Forest
2.2.6 AdaBoost
2.2.7 XGBoost
2.2.8 Evaluation Metrics for Machine Learning Classification Models
Chapter 3 Data Description and Feature Engineering
3.1 Data Description
3.2 Feature Engineering
3.2.1 Data Cleaning
3.2.2 Handling Continuous Variables
3.2.3 Quantifying Categorical Variables
3.2.4 Feature Selection
3.2.5 Analysis of Important Features
Chapter 4 Model Training and Optimization
4.1 Model Building and Model Selection
4.1.1 Logistic Regression Model
4.1.2 Support Vector Machine Model
4.1.3 AdaBoost Model
4.1.4 XGBoost Model
4.2 Model Comparison and Analysis
Chapter 5 Conclusions and Outlook
5.1 Conclusions and Recommendations
5.2 Limitations and Outlook
References
Appendix 1 Partial Data
Acknowledgements
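1.1 Research Background
China's automobile industry has grown along with the country's rapid economic development. According to survey data from the China Association of Automobile Manufacturers, China sold 29.1225 million vehicles in 2017, ranking first in the world for the ninth consecutive year. Compared with 2016, vehicle sales in 2017 grew by 3.90%. Annual sales have remained above 23 million vehicles since 2014, with average sales growth staying above 4%. In the first eight months of 2018, automobile production and sales reached 18.13 million vehicles, up 2.77% year on year. China has become the world's largest automobile consumer market and the market with the greatest consumption potential.
1.2 Research Significance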