Data is generally divided into structured and unstructured data. Structured data organizes entities and relationships in a regular, predictable way. Most of the data that needs to be processed, however, is unstructured.
Information extraction has been applied in many fields, such as business intelligence, resume harvesting, media analysis, sentiment detection, patent search, and email scanning. An especially important area of current research is the extraction of structured data from electronic scientific literature, particularly in the biological and medical fields.
The biological and medical literature often describes a variety of complex experimental systems and specific experimental treatments, along with the results of the composite experiments carried out after these treatments.
At present, the core work of researchers can be regarded as a three-step cycle: 1) designing experiments (based on previous experimental results); 2) generating the experimental results (including preparing the relevant materials, equipment, or other resources); 3) interpreting the experimental results and communicating them to other researchers (face to face, or through indirect media such as text (scientific papers) or video).
Most researchers must spend a great deal of time reading these scientific papers, which describe complex reaction systems and experimental factors (and often include misleading or even fraudulent content), in order to extract both structured and unstructured data from them: 1) the reaction system to which the reported experimental results correspond (cell lines, animal models, or humans); 2) the number and type of experimental treatment factors defined by the specific reaction system and reaction conditions (in gene function studies, for example, introducing mutations, overexpressing genes, or silencing genes); 3) the raw data obtained for these treatment factors under various testing techniques (such as image data; cell morphology; high-throughput sequencing data in its original format; survival time and proliferation ability of cell lines; survival time of mice; five-year survival of tumor patients); 4) the results mined from the data after treatment (correlations and interactions among factors; regulatory networks; degree of enrichment; etc.).
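As a rough illustration, the kinds of information described above (reaction system, treatment factors, raw data, and mined results) could be held in a small structured record. The sketch below is only one possible layout: the class names `TreatmentFactor` and `ExperimentRecord`, all field names, and the example values (including the placeholder gene `GENE_X`) are hypothetical and do not come from any existing openbiox code or real paper.

```python
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class TreatmentFactor:
    """One experimental treatment factor (hypothetical schema)."""
    kind: str    # e.g. "mutation", "overexpression", "silencing"
    target: str  # e.g. a gene symbol


@dataclass
class ExperimentRecord:
    """Structured view of one experiment described in a paper (hypothetical schema)."""
    reaction_system: str  # cell line, animal model, or human
    treatments: List[TreatmentFactor] = field(default_factory=list)
    raw_data_types: List[str] = field(default_factory=list)  # e.g. "image data"
    mined_results: List[str] = field(default_factory=list)   # e.g. "regulatory network"


# Illustrative values only; not taken from any real paper.
record = ExperimentRecord(
    reaction_system="cell line",
    treatments=[TreatmentFactor(kind="silencing", target="GENE_X")],
    raw_data_types=["cell morphology", "proliferation ability"],
    mined_results=["correlation between factors"],
)

# Convert to plain dictionaries and lists, ready for JSON export or a table.
record_dict = asdict(record)
print(record_dict["treatments"][0]["kind"])
```

A record like this is the "structured" end product. The unstructured input is the free text of the paper from which each field would have to be filled.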
Natural language processing technology is evolving rapidly, making it possible to automatically organize and summarize the many kinds of information expressed or implied in the scientific literature. The need is becoming increasingly urgent, especially in the biological and medical fields.
Building on the practical projects envisioned earlier by openbiox, we are now formally establishing the natural language processing and scientific literature information extraction technology learning/practice group, which will ultimately take responsibility for and complete a specific medical literature information extraction project.