AB - This paper presents the design and implementation of MFIRS, a mathematical formula information retrieval system, and discusses its architecture. A similarity indexing method based on the sub-formulas of a formula's MathML representation is proposed, which makes the system math-aware. The mathreteval dataset was created from more than 4,500,000,000 arXiv documents and 158,106,118 mathematical formulas, and the scalability of the system is verified on this dataset. The web front end lets users submit complex queries that combine plain text with mathematical formulas written in TeX or MathML. When a query is written in TeX, the system converts it on the fly into the MathML tree representation used for indexing. MFIRS is thus a math-aware formula retrieval engine that supports retrieval by sub-formula similarity and indexes adjacent mathematical formulas.
