An Improved Method for Designing High-Speed FIR Digital Filters Based on FPGA*

2015-03-24 | Yan Yihua, Chen Linjie, Liu Donghao, Chen Zhijun

Astronomical Research & Technology, 2015, No. 1


Zhao An1,2, Yan Yihua1, Chen Linjie1, Liu Donghao1, Wang Wei1, Chen Zhijun1

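(1. Key Laboratory of Solar Activity, National Astronomical Observatories, Chinese Academy of Sciences, Beijing 100012, China; 2. University of Chinese Academy of Sciences, Beijing 100049, China)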


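In the design of high-speed finite impulse response (FIR) digital filters, sustaining the data-flow rate and using hardware resources efficiently as the filter order grows are both key requirements and major difficulties. Building on the basic principle of high-speed parallel FIR digital filters, we propose an improved method that applies the bit-plane method, canonical signed digit (CSD) coding, and decimation to a parallel FIR digital filter. The design was simulated in Matlab, compiled, simulated, and synthesized in Quartus II, and then downloaded to a field-programmable gate array (FPGA) for testing. The results show that the improved method effectively balances filter order and data-flow rate against hardware resources.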

Keywords: high-speed parallel filter; bit-plane method; canonical signed digit coding; decimation algorithm

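With continuing theoretical and technological advances in digital signal processing, digital techniques, owing to the portability and high reliability of their design and implementation, are rapidly replacing some analog techniques [1]. In radio telescope receivers, more and more digital processing methods are being adopted, and digital receivers have become an important part of radio telescope receiving systems [2-3]. Among these methods, FIR digital filter design is a fundamental and crucial technique. For radio astronomical signals, the digital back ends now use analog-to-digital converters (ADCs) sampling at GHz rates; the resulting high-rate data stream creates a storage bottleneck, so the hardware design of a digital filter must reduce the data rate or split the data stream. Conventional direct-form filters are too slow. Filters with an improved distributed arithmetic (DA) structure run faster [4], but they require excessive chip area and consume large amounts of logic, which makes it hard to optimize speed and logic-resource usage together. FIR filters based on the reduced adder graph (RAG) algorithm [5] use fewer resources but are slower than improved DA filters. Parallel filters have a clear throughput advantage over serial filters, but in FPGA designs the filter order, receiver bandwidth, number of channels, and other parameters of conventional parallel filters are limited by hardware resources such as the numbers of multipliers and memory blocks [6]. This paper proposes an improved design method for high-speed parallel FIR digital filters that greatly reduces hardware resource usage and lowers the data rate to the MHz range, effectively balancing filter order and data rate against hardware resources. Simulations and experiments confirm the feasibility of the design.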

1 High-Speed Parallel FIR Filters

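FIR digital filters have two notable properties. First, they are inherently stable and, with symmetric coefficients, have exactly linear phase, so signals pass through without phase distortion. Second, their unit impulse response has finite length, so they can be implemented with fast Fourier transform algorithms, which greatly improves computational efficiency [7]. Because commonly used ADCs now sample at GHz rates with resolutions up to 16 bits, data must be moved faster, and a high-speed FIR filter needs algorithmic improvements to handle such high data rates. The output $y(n)$ of an FIR digital filter is the convolution of the filter coefficients with the input signal. With $x(n)$ the input signal and $h$ the impulse response of the filter, it is given by (1) [8]: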

$$
y(n)=\sum_{k=0}^{N-1} h(k)\,x(n-k) \tag{1}
$$

where $N$ is the filter length (number of taps).
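As a reference point for the parallel and CSD variants discussed below, Eq. (1) can be sketched as a direct-form convolution in Python. This is a minimal illustration, not the authors' implementation; the function name `fir_direct` is hypothetical, and the sketch assumes zero initial state ($x(n)=0$ for $n<0$) and returns as many outputs as there are input samples.

```python
def fir_direct(h, x):
    """Direct-form FIR filter, Eq. (1): y(n) = sum_k h(k) x(n - k).

    Assumes zero initial state (x(n) = 0 for n < 0) and returns len(x) outputs.
    """
    y = []
    for n in range(len(x)):
        acc = 0
        # one multiply-accumulate per tap that overlaps the available input
        for k in range(min(len(h), n + 1)):
            acc += h[k] * x[n - k]
        y.append(acc)
    return y


# Usage: a 3-tap moving sum
print(fir_direct([1, 1, 1], [1, 2, 3, 4]))  # prints [1, 3, 6, 9]
```

Each output costs $N$ multiply-accumulates, so a serial implementation must finish $N$ of them in every sample period; this is the throughput limit that the parallel structure below is meant to relieve.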

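The impulse response of an FIR filter of length $N$ can be split into $L$ polyphase subfilters of length $N/L$, from which an $L$-parallel FIR filter that processes $L$ samples per clock cycle is constructed [9-10]. Taking an 8-parallel filter as an example, the z-transforms of $h(n)$, $x(n)$, and $y(n)$ are written in polyphase form as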

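$$
H(z)=\sum_{i=0}^{7} H_i(z^8)\,z^{-i},\qquad H_i(z^8)=\sum_{k} h(8k+i)\,z^{-8k} \tag{2}
$$

$$
X(z)=\sum_{j=0}^{7} X_j(z^8)\,z^{-j},\qquad X_j(z^8)=\sum_{k} x(8k+j)\,z^{-8k} \tag{3}
$$

$$
Y(z)=\sum_{m=0}^{7} Y_m(z^8)\,z^{-m},\qquad Y_m(z^8)=\sum_{k} y(8k+m)\,z^{-8k} \tag{4}
$$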

The 8-parallel filter then takes the form

$$
Y(z)=H(z)\,X(z) \tag{5}
$$

Expanding this product gives

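$$
\begin{aligned}
&Y_0(z^8)+Y_1(z^8)z^{-1}+Y_2(z^8)z^{-2}+Y_3(z^8)z^{-3}+Y_4(z^8)z^{-4}+Y_5(z^8)z^{-5}+Y_6(z^8)z^{-6}+Y_7(z^8)z^{-7}\\
&\quad=\left[H_0(z^8)+H_1(z^8)z^{-1}+H_2(z^8)z^{-2}+H_3(z^8)z^{-3}+H_4(z^8)z^{-4}+H_5(z^8)z^{-5}+H_6(z^8)z^{-6}+H_7(z^8)z^{-7}\right]\\
&\qquad\times\left[X_0(z^8)+X_1(z^8)z^{-1}+X_2(z^8)z^{-2}+X_3(z^8)z^{-3}+X_4(z^8)z^{-4}+X_5(z^8)z^{-5}+X_6(z^8)z^{-6}+X_7(z^8)z^{-7}\right]
\end{aligned}
\tag{6}
$$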

Collecting terms by phase, we obtain, for example,

$$
\begin{aligned}
Y_0(z^8)={}&H_0(z^8)X_0(z^8)+\big[H_1(z^8)X_7(z^8)+H_2(z^8)X_6(z^8)\\
&+H_3(z^8)X_5(z^8)+H_4(z^8)X_4(z^8)+H_5(z^8)X_3(z^8)\\
&+H_6(z^8)X_2(z^8)+H_7(z^8)X_1(z^8)\big]z^{-8}
\end{aligned}
\tag{7}
$$

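The remaining branches $Y_1$ to $Y_7$ follow in the same way and are not listed here. Figure 1 illustrates these relations: D denotes a delay of one time unit, and the inputs X and their delayed values on the left pass through the matrix operation in the middle to produce the outputs Y on the right.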
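Collecting the terms of (6) for every phase gives, in general (our own rearrangement, consistent with (7)), $Y_m(z^8)=\sum_{i=0}^{m}H_i(z^8)X_{m-i}(z^8)+z^{-8}\sum_{i=m+1}^{7}H_i(z^8)X_{m+8-i}(z^8)$ for $m=0,\dots,7$. The Python sketch below evaluates each branch this way, with $z^{-8}$ becoming a one-block delay, and checks that interleaving the branches reproduces the serial output. It is a behavioral model only, not the hardware structure of Fig. 1; the names `parallel_fir` and `conv_truncated` are hypothetical, and the input length is assumed to be a multiple of $L$.

```python
def conv_truncated(a, b, length):
    """First `length` samples of the linear convolution a * b (zero initial state)."""
    out = [0] * length
    for n in range(length):
        for k in range(len(a)):
            if 0 <= n - k < len(b):
                out[n] += a[k] * b[n - k]
    return out


def parallel_fir(h, x, L=8):
    """Behavioral model of an L-parallel FIR filter built from polyphase subfilters.

    Branch m computes Y_m = sum_{i<=m} H_i X_{m-i} + z^{-L} sum_{i>m} H_i X_{m+L-i},
    which for m = 0 is Eq. (7). Assumes len(x) is a multiple of L.
    """
    B = len(x) // L                      # number of L-sample input blocks
    hp = [h[i::L] for i in range(L)]     # H_i: taps h(Lk + i)
    xp = [x[j::L] for j in range(L)]     # X_j: samples x(Lb + j)
    y = [0] * (B * L)
    for m in range(L):                   # one output branch per phase
        acc = [0] * B
        for i in range(L):
            if i <= m:                   # H_i X_{m-i}: same block
                term = conv_truncated(hp[i], xp[m - i], B)
                for b in range(B):
                    acc[b] += term[b]
            else:                        # z^{-L} H_i X_{m+L-i}: one-block delay
                term = conv_truncated(hp[i], xp[m + L - i], B)
                for b in range(1, B):
                    acc[b] += term[b - 1]
        for b in range(B):               # parallel-to-serial conversion
            y[L * b + m] = acc[b]
    return y


# Usage: compare with the serial filter on an arbitrary integer example
h = [1, -2, 3, 4, -5, 6, 7, -8, 9, 10, 11, -12, 13]
x = [(7 * n) % 11 - 5 for n in range(64)]
print(parallel_fir(h, x) == conv_truncated(h, x, len(x)))  # prints True
```

Integer data keep the comparison exact; with floating-point data the two results would differ only by rounding error.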

Fig. 1 The matrix relating input and output in the parallel filter

2 Improved High-Speed Parallel FIR Filter

2.1 Canonical Signed Digit Coding

A number $x$ can be represented with signed binary digits as in (8):

$$
x=\sum_{i=0}^{M} s_i\,2^{-i},\qquad s_i\in\{-1,0,1\} \tag{8}
$$

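A number of word length $M+1$ contains at most $L$ nonzero digits. Canonical signed digit coding is the signed-digit representation with the fewest nonzero digits; in it, no two adjacent digits are both nonzero [11]. Compared with two's-complement binary, CSD coding reduces the number of nonzero digits by about 33% on average.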
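A common way to obtain the CSD digits of an integer (for example, a coefficient already quantized to a fixed-point integer) is the nonadjacent-form recoding sketched below in Python. The sketch is illustrative and assumes integer inputs; the names `to_csd` and `from_csd` are hypothetical. Each odd remainder gets the digit +1 or -1, chosen so that the next remainder is divisible by 4, which guarantees that no two adjacent digits are nonzero.

```python
def to_csd(n: int) -> list[int]:
    """CSD digits of integer n, least-significant first; each digit is -1, 0 or +1."""
    digits = []
    while n != 0:
        if n % 2:                # odd: pick +1 or -1 so that (n - d) is divisible by 4
            d = 2 - (n % 4)      # n % 4 == 1 -> +1, n % 4 == 3 -> -1
            n -= d
        else:
            d = 0
        digits.append(d)
        n //= 2                  # exact division: n is even at this point
    return digits


def from_csd(digits: list[int]) -> int:
    """Reconstruct the integer value: sum of d_i * 2**i."""
    return sum(d * 2**i for i, d in enumerate(digits))


# Usage: 7 = 0b111 has three nonzero bits, but CSD writes it as 8 - 1
print(to_csd(7))               # prints [-1, 0, 0, 1]
print(from_csd(to_csd(-3)))    # prints -3
```

Any integer of magnitude up to 1023 (10 bits) needs at most 11 CSD digits, which is consistent with the 11-digit CSD code used for the 10-bit quantized coefficients in this design.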

2.2 Bit-Plane Method

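The bit-plane method rearranges the order of the additions and multiplications in a filter [12]. Figure 2 shows the bit-plane structure of an $N$-tap FIR filter with $M$-bit coefficient quantization. All filter coefficients are first converted to CSD code. In the first bit plane, each coefficient's CSD digit is multiplied by the corresponding input sample and the products are summed; because each digit is -1, 0, or +1, these multiplications reduce to subtractions, nothing, or additions. The output of the first bit plane is shifted and then waits to be added to the results of the other bit planes. The other bit planes are processed in the same way, differing only in the number of bits shifted.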
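The Python sketch below models the bit-plane evaluation for integer coefficients and integer input samples: plane $p$ collects the $p$-th CSD digit of every tap, so the plane sum needs only additions and subtractions, and the plane result is shifted left by $p$ bits before accumulation. It is a behavioral model under these assumptions (integer data, CSD digits least-significant first, zero initial state), not the paper's hardware; `bit_plane_fir` and `to_csd` are hypothetical names, and a compact CSD recoding (nonadjacent form) is included so the block runs on its own.

```python
def to_csd(n):
    """CSD digits of integer n, least-significant first (nonadjacent form)."""
    digits = []
    while n != 0:
        d = 2 - (n % 4) if n % 2 else 0   # +1/-1 for odd n, 0 for even n
        digits.append(d)
        n = (n - d) // 2
    return digits


def bit_plane_fir(coeffs, x):
    """FIR output y(n) = sum_p 2**p * sum_k d[k][p] * x(n - k), evaluated plane by plane.

    coeffs: integer taps h(k); x: integer samples; zero initial state.
    """
    csd = [to_csd(c) for c in coeffs]
    planes = max((len(d) for d in csd), default=0)
    y = [0] * len(x)
    for p in range(planes):
        plane = [0] * len(x)
        for k, digits in enumerate(csd):
            d = digits[p] if p < len(digits) else 0
            if d == 0:
                continue                     # zero digit: no adder needed for this tap
            for n in range(k, len(x)):
                # digit is +1 or -1, so the "multiply" is an add or a subtract
                plane[n] += x[n - k] if d > 0 else -x[n - k]
        for n in range(len(x)):
            y[n] += plane[n] << p            # shift by the plane index, then accumulate
    return y


# Usage: agrees with the direct form of Eq. (1)
h = [3, -7, 12, 5, -1]
x = [1, 0, -2, 5, 3, -4, 0, 2]
ref = [sum(h[k] * x[n - k] for k in range(len(h)) if k <= n) for n in range(len(x))]
print(bit_plane_fir(h, x) == ref)  # prints True
```

The result equals the direct form because $h(k)=\sum_p d_{k,p}2^p$, so swapping the order of the two sums changes nothing arithmetically; the benefit is only that no general multiplier is needed.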

2.3 Decimating High-Speed Parallel FIR Filter Using CSD Coding and the Bit-Plane Method

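From the derivation of the 8-parallel filter, converting the 8 parallel outputs back to serial order gives exactly the output of the serial filter. Using only one output phase of the parallel filter is therefore equivalent to decimating the serial output by 8. For the Gsps-class ADCs used in radio telescopes, this 8-fold decimation satisfies the Nyquist criterion and causes no distortion, provided the filter passband lies below the new Nyquist frequency, i.e. 1/8 of the original one. Because only one of the eight branches, such as $Y_0$ in (7), has to be computed, the amount of computation is further reduced while decimation is obtained at the same time, which greatly lowers hardware resource usage.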
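The saving can be counted directly from (6) and (7), writing $N$ for the filter length and taking, as an example, the 1 Gsps processing rate reported in the test section (our own arithmetic):

$$
\begin{aligned}
\text{all 8 branches, Eq. (6):}\quad & 64\times\tfrac{N}{8}=8N\ \text{tap operations per 8 input samples}=N\ \text{per sample},\\
\text{branch } Y_0 \text{ only, Eq. (7):}\quad & 8\times\tfrac{N}{8}=N\ \text{tap operations per 8 input samples}=\tfrac{N}{8}\ \text{per sample},\\
\text{lane and output rate:}\quad & \tfrac{1\ \text{Gsps}}{8}=125\ \text{MSa/s}.
\end{aligned}
$$

Keeping only $Y_0$ thus cuts the arithmetic by a factor of 8 and leaves a 125 MSa/s output stream, consistent with the MHz-range data rate mentioned in the introduction.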

Fig. 2 Illustration of the bit-plane structure in an N-tap FIR filter

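In the high-speed parallel FIR filter design, the coefficients are CSD-encoded and the bit-plane method and decimation are applied together. In hardware this reduces the number of additions and multiplications, raises the processing speed, and reduces resource usage. The design proceeds as follows: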

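(1) Decimate the input signal by 8, i.e. split it into the eight polyphase streams $X_0,\dots,X_7$;

(2) Decimate the filter coefficients by 8 in the same way to obtain the subfilters $H_0,\dots,H_7$;

(3) Quantize the filter coefficients and convert them to 11-digit CSD code;

(4) Compute according to (7) using the bit-plane method: multiply each input sample by the corresponding coefficient's digit in each bit plane, add and subtract the products, and finally shift the result by the number of bits given by the bit-plane index;

(5) Add the shifted results to obtain the filter output.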
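Steps (1) to (5) can be combined into one behavioral sketch in Python that computes only the $Y_0$ branch of (7), i.e. every 8th output of the serial filter. It assumes the coefficients have already been quantized to integers (so step (3) reduces to CSD conversion) and that the input is integer-valued with a length that is a multiple of 8; all names are hypothetical, and the code models the arithmetic, not the FPGA pipeline.

```python
L = 8  # parallelism and decimation factor used in the paper


def to_csd(n):
    """CSD digits of integer n, least-significant first (nonadjacent form)."""
    digits = []
    while n != 0:
        d = 2 - (n % 4) if n % 2 else 0
        digits.append(d)
        n = (n - d) // 2
    return digits


def decimating_csd_fir(h_int, x, L=8):
    """Return y(L*b) for b = 0 .. len(x)//L - 1, i.e. branch Y_0 of Eq. (7)."""
    B = len(x) // L
    xp = [x[j::L] for j in range(L)]                            # step (1): X_0..X_7
    hp = [[to_csd(c) for c in h_int[i::L]] for i in range(L)]   # steps (2)-(3): H_i in CSD
    planes = max((len(d) for sub in hp for d in sub), default=0)
    y0 = [0] * B
    for p in range(planes):                                     # step (4): plane by plane
        plane = [0] * B
        for i in range(L):
            j = (L - i) % L             # pairs H_0 with X_0 and H_i with X_{L-i}
            delay = 0 if i == 0 else 1  # the z^{-8} factor in Eq. (7) = one block
            for k, digits in enumerate(hp[i]):
                d = digits[p] if p < len(digits) else 0
                if d == 0:
                    continue
                for b in range(k + delay, B):
                    # d is +1 or -1, so this product is just an add or a subtract
                    plane[b] += d * xp[j][b - k - delay]
        for b in range(B):
            y0[b] += plane[b] << p      # step (5): shift by plane index and sum
    return y0


# Usage: compare with the serial filter decimated by 8
h = [(37 * k) % 61 - 30 for k in range(127)]     # arbitrary integer taps
x = [(13 * n) % 29 - 14 for n in range(256)]     # arbitrary integer input
serial = [sum(h[k] * x[n - k] for k in range(min(len(h), n + 1))) for n in range(len(x))]
print(decimating_csd_fir(h, x) == serial[::L])  # prints True
```

Only 8 of the 64 subfilter products in (6) are evaluated, which is where the decimation saves arithmetic.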

3 Simulation and Testing

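To verify the design, we simulated and tested it. First, the filter parameters were chosen. Taking the FPGA's hardware resource limits into account while aiming for reasonably good magnitude-response performance, we designed a 127th-order filter with the equiripple method and quantized its coefficients to 10 bits; its magnitude and phase responses are shown in Fig. 3. The passband is $[0.02, 0.03]\,F_{\mathrm{Nyq}}$, where $F_{\mathrm{Nyq}}$ is the Nyquist frequency, and the highest sidelobe level is -60 dB.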
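A quick check, using only the numbers above, that 8-fold decimation after this filter keeps the passband inside the first Nyquist zone of the decimated signal:

$$
F_{\mathrm{Nyq}}'=\frac{F_{\mathrm{Nyq}}}{8}=0.125\,F_{\mathrm{Nyq}}\;>\;0.03\,F_{\mathrm{Nyq}}\quad(\text{upper passband edge})
$$

The passband therefore lies below the new Nyquist frequency, and out-of-band components that alias after decimation are attenuated by at least the sidelobe suppression quoted above.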

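Next, the improved parallel filtering algorithm was applied to the coefficients of this 127th-order filter and simulated in Matlab with Gaussian white noise as input. The results are shown in Fig. 4; from top to bottom, the panels show the input signal, the filter coefficients, the serial filter output, the parallel filter output, the improved parallel filter output, and the error between the serial and parallel outputs. The serial-parallel error is on the order of $10^{-16}$, i.e. at the level of double-precision rounding error, which shows that the algorithm is feasible.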

Fig. 3 Magnitude response (upper panel) and phase response (lower panel) of the bandpass filter

Fig. 4 Comparison of the simulated outputs of the serial filter, the parallel filter, and the improved parallel filter

For the FPGA implementation, the design was synthesized and tested on a Cyclone II EP2C50F672 device with the parameter configuration listed in Table 1.

Table 1 Parameter values of the FPGA filter design

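Tables 2 and 3 give the Quartus II synthesis results for the parallel filter and the improved parallel filter, respectively. The comparison shows that the improved parallel filtering algorithm markedly reduces FPGA resource usage, so that other design parameters can be raised accordingly.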

Table 2 FPGA resources used by the parallel filter

Table 3 FPGA resources used by the improved parallel filter

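In the Quartus II simulation, the filter has the magnitude response shown in Fig. 5, with a maximum sidelobe suppression of -50 dB. The Matlab and Quartus II simulation results are broadly consistent, which verifies that the algorithm was implemented correctly.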

Fig. 5 Magnitude response of the improved parallel filter

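The compiled design was downloaded to the hardware for testing. The input was an intermediate-frequency white-noise signal spanning 50 to 450 MHz with a 25 MHz single-frequency tone added. Fig. 6 shows the spectra of the input signal (upper panel) and of the filtered and decimated signal (lower panel).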

Fig. 6 Test results of the improved parallel filter implemented on the FPGA

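The test results show that the output obtained for the input tone after the improved parallel filter agrees with theory: the theoretical input-to-output power difference is 10 dB and the measured difference is 9.88 dB, a relative error of 1.2%. This confirms the correctness of the improved parallel filter, which reached a processing speed of 1 Gsps.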
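The quoted relative error is computed on the dB values; for reference, the corresponding linear power ratios are shown alongside (our own arithmetic):

$$
\frac{10\ \text{dB}-9.88\ \text{dB}}{10\ \text{dB}}=1.2\%,\qquad 10^{10/10}=10,\qquad 10^{9.88/10}\approx 9.73
$$

In linear power terms, the measured ratio is therefore about 2.7% below the theoretical one.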

4 Conclusion

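By analyzing the parallel FIR filtering algorithm, this paper proposed an improved high-speed parallel filtering algorithm. Applying the bit-plane method, CSD coding, and decimation to a parallel FIR digital filter increases the speed at which the filter processes data and effectively balances increasing filter order against data-flow rate and hardware resources. A 127th-order improved high-speed parallel FIR filter was implemented on an Altera Cyclone device, and simulations and tests verified the feasibility of the design. In practice, a balance point between FPGA resource usage and data-flow rate should be chosen according to the filter's performance requirements. In short, the improved high-speed parallel filter design offers a new approach to designing digital filters for radio telescopes.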

[1] Dou Yujiang, Yan Yihua, Wang Wei, et al. Digital fiber-optic transmission of the CSRH[J]. Astronomical Research & Technology: Publications of National Astronomical Observatories of China, 2013, 10(1): 13-16. (in Chinese)

[2] Wang Wei, Yan Yihua, Chen Zhijun. Array configuration design for CSRH[J]. Astronomical Research & Technology: Publications of National Astronomical Observatories of China, 2013, 10(1): 17-21. (in Chinese)

[3] Wang Wei, Dou Yujiang, Yan Yihua, et al. Analysis of the sensitivity of the CSRH[J]. Astronomical Research & Technology: Publications of National Astronomical Observatories of China, 2013, 10(1): 22-25. (in Chinese)

[4] Qu Shiru, Peng Jichang. A resource optimizing algorithm in FPGA based FIR digital filters[J]. Electronic Design Engineering, 2013, 21(14): 147-150. (in Chinese)

[5] Cui Liang, Zhang Zhixian. Realization and comparison of the FIR based on FPGA[J]. Electronic Design Engineering, 2012, 20(20): 168-170. (in Chinese)

[6] Peng Yu, Jiang Honglan, Yang Zhiming, et al. Design of general digital signal processing system based on DSP and FPGA[J]. Foreign Electronic Measurement Technology, 2013, 32(1): 17-21. (in Chinese)

[7] Lu Yingchun, Li Xiang, Wang Zhuangbing. Design of high-speed FIR filters and implementation based on FPGA[J]. Journal of Hefei University of Technology: Natural Science, 2007, 30(12): 1705-1707. (in Chinese)

[8] Li Zeming, Li Jinming, Yang Yanjiao. The design of high-order FIR filter based on FPGA[J]. Science Technology and Engineering, 2013, 13(23): 6903-6906. (in Chinese)

[9] Zhang Weiliang, Zhang Yu, Yang Zaichu, et al. FPGA implementation of high speed parallel FIR filters[J]. Systems Engineering and Electronics, 2009, 31(8): 1819-1822. (in Chinese)

[10] Zhai Haitao, Yang Jun, Zhu Jiang. Design of high-speed FIR filter based on FPGA[J]. Informatization Research, 2009, 35(4): 26-28. (in Chinese)

[11] Ding Wei. Design and realization of FIR low-pass filter based on FPGA[J]. Ship Electronic Engineering, 2013, 33(10): 117-119. (in Chinese)

[12] Wang Yihai, Yu Xiaonan, Jiang Zhipeng. FPGA design of FIR digital filter based on distributed arithmetic[J]. Chinese Journal of Electron Devices, 2012, 35(5): 545-548. (in Chinese)

CN 53-1189/P ISSN 1672-7673

A Design of an Improved High-Speed FIR Digital Filter Based on the FPGA

Zhao An1,2, Yan Yihua1, Chen Linjie1, Liu Donghao1, Wang Wei1, Chen Zhijun1

(1. Key Laboratory of Solar Activity, National Astronomical Observatories, Chinese Academy of Sciences, Beijing 100012, China; 2. University of Chinese Academy of Sciences, Beijing 100049, China, Email: azhao@bao.ac.cn)

With steady theoretical and technological progress in digital signal processing, digital devices are rapidly replacing some analog devices owing to the portability and high reliability of their design and implementation. In radio astronomy, digital-processing techniques are increasingly applied in receivers and have become important parts of them. The design of FIR digital filters is a fundamental and critical digital technique. The digital-processing modules for radio-astronomy signals now use analog-to-digital converters sampling at GHz rates. Such high-rate data streams form bottlenecks in data storage, so the hardware design of a digital filter must reduce the data rate or split the data stream. Conventional direct-form filters are too slow. Filters with improved distributed arithmetic (DA) structures run faster, but they occupy a large chip area and use many logic units, so it is very difficult to balance operating speed against logic resources. An FIR filter based on the reduced adder graph (RAG) algorithm needs fewer logic resources but is slower than an improved DA filter. Balancing data-rate performance against hardware requirements becomes increasingly important, and increasingly difficult, in high-speed FIR filter design as filters have more taps. In this paper we present an improved parallel FIR digital filter design that builds on the basic theory of high-speed parallel FIR digital filters and applies the bit-plane method, CSD coding, and a decimation algorithm.
After simulation in Matlab, the design was compiled, simulated, and synthesized in Quartus II and finally downloaded to an FPGA device for testing. Our simulation and test results show that the design effectively balances filter order and data-flow rate against hardware resource requirements. In practice, the balance point should be chosen according to the specific filter performance requirements. In conclusion, our improved high-speed FIR digital filter design provides a new approach to designing digital filters for radio telescopes.

Keywords: High-speed parallel filter; Bit-plane method; CSD coding; Decimation algorithm

Supported by the National Natural Science Foundation of China (11003028) and the National Major Equipment Development Project (ZDYZ2009-3).

Received: 2014-02-18; revised: 2014-03-21

Zhao An, female, Ph.D.; research interests: astronomical techniques and methods. Email: azhao@bao.ac.cn

CLC number: P111.44

Article ID: 1672-7673(2015)01-0109-08