碎纸片拼接复原的算法研究
2015-04-26崔艳星
郭 伟,崔艳星
(长治学院 数学系,山西 长治 046011)
1 引言
破碎文件的拼接在司法物证复原、历史文献修复以及军事情报获取等领域都有着重要的应用。传统上,拼接复原工作需由人工完成,准确性较高,但效率较低。当碎片数量巨大,人工拼接很难在短时间内完成任务。随着计算机技术的发展,人们试图开发研究碎纸片的自动拼接技术,以提高拼接复原效率。现对以下问题进行讨论:对给定的来自同一页印刷文字文件的碎纸机破碎纸片(仅纵切),建立碎纸片拼接复原模型和算法,并对附件1给出的一页文件的碎片数据进行拼接复原。复原结果表达以图片形式表达。
2 复原算法思想描述
对于给定的来自同一页印刷文字文件的碎纸机破碎纸片(仅纵切);首先,通过观察这些字的特点,可以通过人工干预[1]将纸条左边全是整字、符号的碎纸条找出来,这就找出了这些纸条中的第一条。同样,我们可以确定出最后一条碎片。因为每条碎纸片中离边缘最近的那些字,有的是整字,有的是半字,有的是符号,有的是空格,所以对碎片的拼接有一定的难度.为了使问题简单化,我们联想到把文字转化为数字,其中这些数字用“0”或者“1”来代替;半个字用0表示,整个字、符号、空格都用1表示。
其次,我们把纸条中最左边转化成的数字按所在行形成左向量,同样最右边的那些文字形成一个右向量。这样每个碎纸条都可以用两个向量来表示。进而我们就把碎纸片文字的拼接问题转化为向量是否相等的问题了。如果一个纸条的左向量和另一个纸条的右向量相等,说明这两条纸条可拼接到一起,这个过程可利用c#编程完成排序[2]。
最后,根据上述排序结果,利用MATLAB程序可以检验c#编程运行结果并实现纸片的复原[3,4]。
3 复原算法的实现
首先,从对于给定的来自同一页印刷文字文件的碎纸机破碎纸片(仅纵切)的向量中找出初始向量。因为破碎前纸片的最左边的字都是整字,我们可以通过人工干预,很容易把都是整字的碎纸条找出来,即左向量是[0,0,…,0,0]的纸条找出来。这样我们就找出了初始向量。假如找到的初始向量为Li,则初始向量所在纸条的右向量为。
其次,寻找与Ri相等的向量Lj(其中i 不等于j)。假如Ri-Lj=0,即Ri与Lj相等,则第i-1条碎片与第j-1条碎片拼接成功.成功后再用Rj寻找下一条。假如Rj-Lk≠0,则说明Rj与Lk不相等,则第j-1条碎片与第k-1条碎片拼接不成功,不成功的话再用Rj-Lk是否等于0来寻找下一条,{其中,k≠j};假如Ri-Lj≠0则假如有Ri-Lh=0,则拼接成功.假如Ri-Lh≠0,则依次类推。
最后,由Rm-Ln=0.则碎纸片排序完成。对于上述过程。我们建立了如下的流程图:
例如:下图是给定碎片,共19支(从左至右编号000~018),现将其复原。
首先,找到其中左侧为整字的碎片,向量表示结果见附录1,上图中第一条碎片是第8条碎片,最后一条碎片是第6条碎片,所以在用该流程时,i=9,n=7。然后,利用MATLAB程序,检验结果并将碎纸片拼接复原。具体程序见附录2.复原图片结果分别见附录3。
4 附录
附录1
L1=[1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1]
R1=[0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,0,1,1,0,0,0,0,0,0,0,0,1]
L2=[1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,1,0,0,0,0,0,0,0,1,0,0]
R2=[0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0]
L3=[0,0,0,0,0,0,1,1,0,0,1,0,1,1,0,0,1,0,1,0,0,0,1,0,1,0,0,1]
R3=[0,1,0,0,0,0,0,0,1,1,0,1,0,1,0,1,0,1,0,0,0,1,0,1,0,0,0,0]
L4=[0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0]
R4=[0,0,0,1,0,0,0,1,1,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0]
L5=[0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0]
R5=[0,0,0,0,1,1,0,1,0,1,0,0,0,1,0,1,0,1,1,1,1,0,0,0,0,0,0,0]
L6=[0,0,0,0,1,1,0,1,0,1,0,0,0,1,0,1,0,1,1,1,1,0,0,0,0,0,0,0]
R6=[0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,1,0,1,1,0,0]
L7=[0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,0,1,1,0,0,0,0,0,0,0,0,1]
R7=[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1]
L8=[1,1,0,1,0,0,1,0,0,0,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,0,0,0]
R8=[0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,1,0,0,1,0,0,0,0,0,0,1]
L9=[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
R9=[1,0,1,0,1,1,0,1,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,1,0]
L10=[0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,1,0,1,1,0,0]
R10=[0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0]
L11=[0,0,0,1,0,0,0,1,1,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0]
R11=[0,0,0,0,0,0,1,1,0,0,1,0,1,1,0,0,1,0,1,0,0,0,1,0,1,0,0,1]
L12=[0,0,1,1,1,1,1,0,1,1,0,0,0,0,1,1,1,1,0,0,0,0,0,1,0,0,1,0]
R12=[1,1,0,1,0,0,1,0,0,0,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,0,0,0]
L13=[1,0,0,1,0,0,0,1,0,1,0,1,1,1,0,1,1,1,1,1,1,1,1,0,1,0,0,1]
R13=[0,0,1,0,1,1,1,0,1,0,1,1,0,1,1,0,0,1,0,1,1,1,0,1,1,1,1,1]
L14=[0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0]
R14=[1,0,1,0,1,1,1,0,0,0,0,0,1,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0]
L15=[1,0,1,0,1,1,0,1,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,1,0]
R15=[1,0,0,1,0,0,0,1,0,1,0,1,1,1,0,1,1,1,1,1,1,1,1,0,1,0,0,1]
L16=[0,0,1,0,1,1,1,0,1,0,1,1,0,1,1,0,0,1,0,1,1,1,0,1,1,1,1,1]
R16=[0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0]
L17=[0,1,0,0,0,0,0,0,1,1,0,1,0,1,0,1,0,1,0,0,0,1,0,1,0,0,0,0]
R17=[1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,1,0,0,0,0,0,0,0,1,0,0]
L18=[0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,1,0,0,1,0,0,0,0,0,0,1]
R18=[1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1]
L19=[1,0,1,0,1,1,1,0,0,0,0,0,1,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0]
R19=[0,0,1,1,1,1,1,0,1,1,0,0,0,0,1,1,1,1,0,0,0,0,0,1,0,0,1,0]
附录2
I1=imread ('008.bmp');
I2=imread ('014.bmp');
I3=imread ('012.bmp');
I4=imread ('015.bmp');
I5=imread ('003.bmp');
I6=imread ('010.bmp');
I7=imread ('002.bmp');
I8=imread ('016.bmp');
I9=imread ('001.bmp');
I10=imread ('004.bmp');
I11=imread ('005.bmp');
I12=imread ('009.bmp');
I13=imread ('013.bmp');
I14=imread ('018.bmp');
I15=imread ('011.bmp');
I16=imread ('007.bmp');
I17=imread ('017.bmp');
I18=imread ('000.bmp');
I19=imread ('006.bmp');
I=[I1,I2,I3,I4,I5,I6,I7,I8,I9,I10,I11,I12,I13,I14,I15,I16,I17,I18,I19];imshow(I)
附录3
[1]韩中庚,数学建模方法及其应用,第二版[M].北京:高等教育出版社,2009.
[2]王聚丰,杨启帆.数学建模案例集[M].北京:高等教育出版社,2003.
[3]司守奎,孙玺菁.数学建模算法与应用[M].北京:国防工业出版社,2011.
[4]贾海燕,碎纸自动拼接,http://www.docin.com,2013年9月15日[DB/OL].