In this report we present a brief overview of Information Extraction, an area of natural language processing that deals with finding factual information in free text. The automatic extraction of information from unstructured sources has opened up new avenues for querying, organizing, and analyzing data by drawing upon the clean semantics of structured databases and the abundance of unstructured data. In formal terms, facts are structured objects, such as database records. Such a record may capture a real-world entity with its attributes mentioned in text, or a real-world event, occurrence, or state, with its arguments or actors: who did what to whom, where, and when. Information is typically sought in a particular target setting, e.g., corporate mergers and acquisitions. Searching for specific, targeted factual information constitutes a large proportion of all searching activity on the part of information consumers. There has been sustained interest in Information Extraction for over two decades, due to its conceptual simplicity on the one hand and its potential utility on the other. We present various dimensions derived from the nature of the extraction task, the techniques used for extraction, the variety of input resources exploited, and the type of output produced.
Information Extraction refers to the automatic extraction of structured information, such as entities, relationships between entities, and attributes describing entities, from unstructured sources. This enables much richer forms of queries on the abundant unstructured sources than are possible with keyword searches alone. When structured and unstructured data co-exist, information extraction makes it possible to integrate the two types of sources and pose queries spanning them.
Recent decades have witnessed a rapid proliferation of textual information available in digital form in a myriad of repositories on the Internet and intranets. A significant part of such information (e.g., online news, government documents, corporate reports, legal acts, medical alerts and records, court rulings, and social media communication) is transmitted through unstructured, free-text documents and is thus hard to search. This has resulted in a growing need for effective and efficient techniques for analyzing free-text data and discovering valuable and relevant knowledge from it in the form of structured information, and has led to the emergence of Information Extraction technologies.
The task of Information Extraction (IE) is to identify a predefined set of concepts in a specific domain, ignoring other irrelevant information, where a domain consists of a corpus of texts together with a clearly specified information need. In other words, IE is about deriving structured factual information from unstructured text. For instance, consider the extraction of information on violent events from online news, where one is interested in identifying the main actors of the event, its location, and the number of people affected.
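To make the idea of deriving a structured record from free text concrete, the violent-event example can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the `EventRecord` template, the `extract_event` function, the regular expression, and the sample sentence are all hypothetical and do not come from any particular IE system. Practical systems rely on far richer linguistic analysis, and identifying the main actors of an event is left out here because it typically needs more than a single surface pattern.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class EventRecord:
    """Hypothetical target template for a violent-event extraction task."""
    event_type: str
    location: Optional[str]
    num_affected: Optional[int]


# Toy surface pattern (an assumption for illustration only):
# "<number> people were <verb> ... in <Capitalized Location>"
PATTERN = re.compile(
    r"(?P<count>\d+) people were (?P<verb>killed|injured|wounded)"
    r".*? in (?P<location>[A-Z][a-z]+(?: [A-Z][a-z]+)*)"
)


def extract_event(sentence: str) -> Optional[EventRecord]:
    """Fill the template from one sentence, or return None if nothing matches."""
    match = PATTERN.search(sentence)
    if match is None:
        # Text outside the predefined information need is simply ignored.
        return None
    return EventRecord(
        event_type=match.group("verb"),
        location=match.group("location"),
        num_affected=int(match.group("count")),
    )


text = "At least 12 people were injured when a bomb exploded in Baghdad on Monday."
record = extract_event(text)
print(record)
```

The sample sentence yields a record with the event type, location, and number of people affected filled in, while a sentence that does not express the targeted kind of event returns `None`, mirroring the requirement that irrelevant information be ignored.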