Abstract [eng] |
In practice, documents are increasingly stored as a single XML file or a set of XML files. XML makes document data easy to read, and the format is widely used to exchange information between applications and between systems. Such exchange assumes that the documents sent and received follow a uniform, unchanging specification. Problems arise when the originating side does not provide a specification of the structure in which the data will be sent and has no obligation to keep that structure unchanged. A second problem is reading data from documents that have an XML-based structure (MS Office and Open Office suite documents, Magic Draw project files): the specifications for these files are not publicly available, are incomplete, or are voluminous. As a result, creating software components that work with such documents takes a lot of time. The aim of this thesis is to suggest ways to help automate the creation and regeneration of these libraries. Several approaches are proposed for using information gathered about the structure of incoming XML files to generate serialization code, and for loading the resulting document processing library into the .NET environment without interrupting the running software. The analysis showed that all of the methods considered can be applied, depending on the goals and on what is known about the incoming documents. The XSD evolution based approach is appropriate when all possible versions of the documents must be supported; however, it increases the size of the specification. The weight-based approach produces specifications that differ little from the real document specifications; it can be applied when the document specification does not change substantially. Blending these methods gives up full version support; however, the generated document specification is able to drop obsolete items.
By borrowing some ideas from genetic (evolutionary) algorithms, the method manages to improve the readability of incoming documents, at the cost of differences between the generated specification and the original document specification. |