Abstract
Computers are widely used in many domains, and with the rapid development of the Internet, more and more event information is stored and processed in computers as electronic documents. The Internet is gradually becoming the main carrier of information and communication, and it has become the largest collection of information of all kinds. With the arrival of the big data era, 80% of information is stored on the network as unstructured data (natural language, images, videos, etc.).
Because Chinese text is unstructured, non-standardized, and uncertain, this thesis adopts the technical roadmap of “text description - normalized expression - structured extraction - pattern mining” and focuses on temporal attribute information extraction and on the classification, resolution, and extraction methods for the emergency event domain. It lays a solid theoretical foundation for the study of event information extraction and provides a viable solution for the construction of national geographic-based information services.
First, building on the study of the structured expression of emergencies, several methods for extracting event attribute information from Chinese text are proposed to make extraction more accurate. For Chinese text classification, an SVM model was applied and achieved good results. For the non-temporal attribute information of emergencies, both a rule-based model and a statistical model were proposed and applied. Both models were studied, and they produce different results in natural language processing, so a combination of the two can be effective for extracting Chinese event text in specific domains. A method combining an HMM model with a syntax analysis model was finally used in this thesis for text attribute extraction, and experiments showed that this method gives better results. Finally, the feasibility of the method was proved through the rea