Total thesis length: 20,516 characters
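Abstract (Chinese version)
Convolutional neural networks (CNNs) are a common type of neural network and play an increasingly important role in fields such as image processing and speech recognition. Most deep CNNs run on parallel computing platforms built from multiple GPUs. In settings that demand small size and low power consumption, such as some embedded platforms, GPUs are a poor fit because of their large size and high power draw.
Field-programmable gate arrays (FPGAs) can address this situation. An FPGA can fully exploit the parallelism in CNN computation while remaining small and low-power. High-level synthesis (HLS) lets users design in high-level languages such as C/C++, which offers a high level of abstraction, improves development efficiency, and gives good portability. To this end, this thesis makes the following contributions:
1) build the classic CNN model LeNet-5 on the Caffe platform and train it to obtain the parameters (weights and biases) used by the FPGA implementation;
2) design each layer of the CNN in SDSoC (Software Defined System on a Chip), including the convolutional, pooling, fully connected, and activation function layers as basic modules;
3) use the optimization directives provided by HLS to exploit the parallelism in CNN computation, seeking a balance among resource usage, running speed, and optimization depth;
4) implement the designed CNN on a ZedBoard development board and test it on handwritten digits.
In the end, without any loss of accuracy, the FPGA implementation runs about 3 times faster than the software implementation and makes full use of the ZedBoard's hardware resources.
Keywords: field-programmable gate array, high-level synthesis, convolutional neural network, parallel structure optimization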
Abstract
A convolutional neural network (CNN) is a kind of neural network that is good at processing data with a grid-like structure, such as time series and image data, so it is widely used in image recognition and speech recognition. Because CNNs are computation-intensive, most deep CNNs run on parallel computing platforms composed of multiple GPUs. For applications that require small size and low power consumption, such as some embedded platforms, GPUs are unsuitable because of their large size and high power consumption.
A field-programmable gate array (FPGA) can cope with this situation. An FPGA can make full use of the parallelism in CNN computation while being small and low-power, so it can be used where size and power are limited. High-level synthesis (HLS) lets users design in high-level languages such as C/C++, which offers a high level of description, better development efficiency, and good portability. In this regard, the main work of this thesis is as follows:
1) A classic CNN model, LeNet-5, is built on the Caffe platform, and the parameters (weights and biases) of the CNN for the FPGA implementation are obtained by training.
2) In SDSoC, each layer of the CNN is designed, including the convolutional layer, pooling layer, fully connected layer, and activation function layer.
3) The optimization directives provided by HLS are used to exploit the parallelism of the CNN, and a balance is sought among resource usage, running speed, and optimization depth.
4) The designed CNN is implemented on a ZedBoard development board and tested on handwritten digits.
In the end, without reducing accuracy, the FPGA implementation is about 3 times faster than the software implementation and makes full use of the hardware resources of the ZedBoard.
KEY WORDS: field-programmable gate array, high-level synthesis, convolutional neural network, parallel structure optimization
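Contents
Abstract (Chinese version)
Abstract
Chapter 1 Introduction
1.1 Background and Significance
1.2 Related Work
1.3 Research Content and Organization of This Thesis
Chapter 2 Convolutional Neural Networks and FPGAs
2.1 Introduction to Convolutional Neural Networks
2.2 FPGAs for Convolutional Neural Networks
2.2.1 Why FPGAs Accelerate Convolutional Neural Networks Effectively
2.2.2 Brief Introduction to SDSoC
2.3 Chapter Summary
Chapter 3 Design of the Key FPGA-Based Modules
3.1 Overall Design Framework
3.2 Building LeNet-5 in Caffe
3.2.1 LeNet-5 Network Structure
3.2.2 Building the Model in Caffe
3.3 FPGA Implementation of Each Layer
3.3.1 Convolutional Layer
3.3.2 Pooling Layer
3.3.3 Fully Connected Layer
3.3.4 Activation Function Layer
3.4 Chapter Summary
Chapter 4 Optimization and Parameters
4.1 Quantization
4.2 Parallelism Optimization
4.2.1 Parallelism in Convolutional Neural Networks
4.2.2 Optimization Operations
4.3 Chapter Summary
Chapter 5 Testing
5.1 Hardware Test Method and Test Set
5.2 Hardware Platform
5.3 SDSoC Development Flow
5.4 Hardware Resource Usage and Parameter Tuning
5.5 Hardware Results
Chapter 6 Conclusions and Outlook
6.1 Summary of This Work
6.2 Future Outlook
Acknowledgements
References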
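Chapter 1 Introduction
1.1 Background and Significance
Deep learning is one route to artificial intelligence. More specifically, it is a branch of machine learning, and the convolutional neural network (CNN) is among its most common models. The CNN is generally regarded as having been inspired by the biological concept of the receptive field. Like an ordinary neural network, a CNN consists of interconnected neurons with learnable weights and biases. CNNs are specialized for processing data with a grid-like structure [1], such as time series and image data.
Figure 1-1 An example of a typical convolutional neural network structure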
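To make the role of the learnable weights and bias concrete, the following is a minimal C++ sketch of a single-channel "valid" 2D convolution (stride 1, no padding), the core operation of a convolutional layer. It is an illustration only and not the thesis's FPGA implementation: the `Map2D` type and the `conv2d_valid` function are hypothetical names introduced here, and floating-point arithmetic is assumed for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type for this sketch: a row-major 2D feature map.
struct Map2D {
    std::size_t rows;
    std::size_t cols;
    std::vector<float> data;

    Map2D(std::size_t r, std::size_t c, float fill = 0.0f)
        : rows(r), cols(c), data(r * c, fill) {}

    float& at(std::size_t r, std::size_t c) { return data[r * cols + c]; }
    float at(std::size_t r, std::size_t c) const { return data[r * cols + c]; }
};

// "Valid" 2D convolution with stride 1 and no padding, written as
// cross-correlation (the usual convention in CNN frameworks):
//   out(r, c) = bias + sum over (i, j) of in(r + i, c + j) * kernel(i, j)
// The kernel entries are the learnable weights; bias is the learnable offset.
Map2D conv2d_valid(const Map2D& in, const Map2D& kernel, float bias) {
    assert(kernel.rows >= 1 && kernel.cols >= 1);
    assert(kernel.rows <= in.rows && kernel.cols <= in.cols);

    Map2D out(in.rows - kernel.rows + 1, in.cols - kernel.cols + 1);
    for (std::size_t r = 0; r < out.rows; ++r) {
        for (std::size_t c = 0; c < out.cols; ++c) {
            float acc = bias;
            // Each output pixel depends only on a small window of the input
            // (its receptive field), and the same weights are reused everywhere.
            for (std::size_t i = 0; i < kernel.rows; ++i) {
                for (std::size_t j = 0; j < kernel.cols; ++j) {
                    acc += in.at(r + i, c + j) * kernel.at(i, j);
                }
            }
            out.at(r, c) = acc;
        }
    }
    return out;
}
```

With this definition, an input of size H x W and a kernel of size K x K produce an output of size (H - K + 1) x (W - K + 1); for example, a 28 x 28 input and a 5 x 5 kernel give a 24 x 24 output. The computations for different output pixels are independent of one another, which is the kind of parallelism a hardware implementation can exploit.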