



 2021-11-09 21:13:51  

Abstract









As the Internet continues to develop and to penetrate more and more fields, the number of users on social platforms keeps growing, and the accompanying information explosion and fragmentation have changed how people obtain information and how they read. Because people want the most valuable information in the least time, obtaining it from peers and domain experts through online Q&A communities has become the most popular way of exchanging information. Research on the users of Q&A communities can therefore help community operators to profile users accurately, to provide different services to different user groups, and to promote paid-knowledge activities according to users' consumption-related behavioral characteristics.

This thesis takes Zhihu as its research object, divides users into groups with different behavioral characteristics, and applies visualization techniques to analyze them. The main work is as follows:

(1) A multi-threaded crawler based on the Scrapy framework was designed. Using a breadth-first strategy, it starts from a seed user and crawls basic user information; the crawled data is stored in MongoDB. In total, data on nearly 100,000 users was collected.

(2) Users were clustered with the K-means algorithm according to their behavioral characteristics. Each cluster was interpreted from the characteristics of its data, the resulting user groups were analyzed, and potential consumers and potential providers of paid knowledge were identified.

(3) Using users' basic information, user behavior was analyzed from a data-analysis perspective by region, university, industry, and number of followers.

(4) Visualization techniques were used to display the user groups, their basic information, and their social relationships.

Key Words: social network; Zhihu users; group analysis; clustering; web crawler
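
As an illustration of the two core techniques named in the abstract, the following is a minimal sketch of a breadth-first traversal from a seed user and of K-means clustering on behavior features. It is not the thesis's implementation: it uses neither Scrapy nor MongoDB, the follow graph and the feature values are invented toy data, the names `bfs_crawl_order`, `kmeans`, `FOLLOWS`, and `features` are hypothetical, and the centroid initialization (the first k points) is a simplification chosen only to keep the example deterministic.

```python
from collections import deque


def bfs_crawl_order(seed, get_followees, max_users):
    """Return user IDs in breadth-first order, starting from a seed user."""
    seen = {seed}
    queue = deque([seed])
    order = []
    while queue and len(order) < max_users:
        user = queue.popleft()
        order.append(user)  # a real crawler would fetch and store the profile here
        for followee in get_followees(user):
            if followee not in seen:  # never enqueue the same user twice
                seen.add(followee)
                queue.append(followee)
    return order


def kmeans(points, k, iterations=20):
    """Plain K-means (Lloyd's algorithm) on equal-length numeric tuples."""
    centroids = [tuple(p) for p in points[:k]]  # simplistic init: first k points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Toy follow graph (hypothetical data, not taken from the thesis).
FOLLOWS = {"a": ["b", "c"], "b": ["d"], "c": ["d", "a"], "d": []}
order = bfs_crawl_order("a", lambda u: FOLLOWS.get(u, []), max_users=10)

# Hypothetical behavior features: (answers written, paid items bought).
features = [(1, 0), (50, 1), (2, 1), (48, 0)]
labels, centroids = kmeans(features, k=2)
```

In this sketch the `seen` set plays the role of duplicate filtering in a crawler, and `max_users` bounds the crawl size. In practice the features would usually be scaled before clustering, since K-means relies on Euclidean distance and a wide-ranging feature would otherwise dominate.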

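Contents

Chapter 1 Introduction

1.1 Research Purpose and Significance

1.2 Research Status at Home and Abroad

1.3 Research Content and Main Work

1.4 Thesis Organization

Chapter 2 Zhihu Data Collection and Dataset Construction

2.1 Scrapy Framework Architecture

2.2 Data Collection

2.2.1 Zhihu Site and Page Parsing

2.2.2 Data Crawling Process

2.2.3 Handling Anti-Crawling Mechanisms

2.2.4 Data Storage

2.3 Data Preprocessing

2.3.1 Data Cleaning

2.3.2 Data Transformation

2.4 Chapter Summary

Chapter 3 Analysis of Zhihu Users Based on Behavioral Characteristics

3.1 Feature Selection

3.2 Experiment Design and Implementation

3.2.1 Categories of Clustering Algorithms

3.2.2 K-means Clustering Algorithm

3.2.3 Experimental Procedure

3.3 Analysis and Interpretation of Experimental Results

3.3.1 Clustering Evaluation Criteria

3.3.2 Analysis of Clustering Results

3.4 Potential Consumers and Potential Providers of Paid Knowledge

3.5 Chapter Summary

Chapter 4 Design and Implementation of the Zhihu User Analysis System

4.1 Technical Overview of the User Analysis System

4.1.1 Front-End Technologies

4.1.2 Back-End Technologies

4.2 Design of the User Analysis System

4.2.1 Overall System Design

4.2.2 Functional Module Design

4.3 Implementation of the User Analysis System

4.3.1 Overview Analysis Module

4.3.2 User Group Analysis Module

4.3.3 User Influence Analysis Module

4.3.4 Interpersonal Topology Module

4.4 Chapter Summary

Chapter 5 Summary and Outlook

5.1 Main Conclusions

5.2 Research Outlook

Acknowledgments

References

Appendix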





