The high volume of documents currently produced by UN organisations, together with the number of non-UN documents that have to be processed, poses a significant challenge to the UN system. Effective and efficient information management and accessibility are well beyond the “human processing” capabilities of UN organisations.
To address these challenges, UN organisations must move from the paper paradigm, where documents are “designed for humans to read, not for computer programs to manipulate meaningfully”, to a new form of content “meaningful to computers [that] will unleash a revolution of new possibilities” (Sir Tim Berners-Lee, 2001).
The process of making documents “machine-readable” and identifying the information and knowledge they contain, so that innovative information services can be delivered, has been both highly specialised and labour intensive. UN organisations are not in a position to make available the human and financial resources that would be required to produce machine-readable documents in the “traditional” labour-intensive way. In any case, due to the rapid growth in the number of documents to be processed, the “manual way” would not be a sustainable approach for legacy documentation.
The only viable option is to exploit natural language processing technologies that are mature enough to semantically analyse and process the information and data contained in textual documents. The use of machine learning and artificial intelligence has the potential to greatly reduce the cost of carrying out structural and semantic analysis and to deal effectively with the considerable volume of information that needs to be processed daily.
To enable this transition, HLCM has been promoting a system-wide approach in the critical domain of information management. In March 2017, HLCM approved the UN Semantic Interoperability Framework (UNSIF), composed of Akoma Ntoso for the United Nations System (AKN4UN) and the United Nations System Documentation Ontology (UNDO) as the first building blocks of the seamless machine readability of textual documents across the UN system.
UNSIF is meant to create a UN-wide ecosystem of machine-readable documents that will foster collaboration and reduce costs in information management by transforming machine-unreadable documents into a web of information that can be processed by computers to deliver significant benefits in terms of governance, accountability and transparency.
A seamless UN-wide ecosystem of machine-readable documents and data will prove to be a considerable asset for the implementation of the 2030 Agenda for Sustainable Development, which requires a robust review mechanism and a solid framework for evidence-based policies and accountability.
This challenge is focused on automatic generation of machine-readable documents with rich semantic mark-up by making use of open source natural language processing and text mining applications to carry out automatic entity extraction and content analysis.
These semantically enhanced documents will be ideally suited to effectively support smart decision tracking and document retrieval tools as well as query the advanced metadata and descriptions to create innovative information services for end users.
Specifically, this challenge aims to pilot open source tools that carry out automatic entity extraction and content analysis, showcasing the semi-automatic generation of machine-readable documents with rich semantic mark-up. The purpose of these enhanced documents is to support decision tracking and document retrieval through semantics-driven tools that can query the advanced metadata and use its descriptions to create additional decision management systems for end users.
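As a rough illustration of what automatic entity extraction with semantic mark-up can mean in practice, the sketch below finds citations of earlier resolutions in a sentence and wraps the cited number in an inline element. It is a minimal sketch only: the regular expression, the function name `mark_up_references` and the simplified `<ref>` element are assumptions made for this example and do not reproduce the AKN4UN schema or any tool required by the challenge.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical pattern: matches citations written as "resolution 70/1".
RESOLUTION_REF = re.compile(r"\bresolution\s+(\d+/\d+)\b")


def mark_up_references(text: str) -> str:
    """Wrap each cited resolution number in an illustrative <ref> element.

    The element and attribute names are placeholders for this sketch,
    not the official AKN4UN vocabulary.
    """
    parts = []
    last = 0
    for match in RESOLUTION_REF.finditer(text):
        number = match.group(1)
        # Escape the plain text before the number so the output stays well-formed.
        parts.append(escape(text[last:match.start(1)]))
        parts.append(f'<ref href="{number}">{number}</ref>')
        last = match.end(1)
    parts.append(escape(text[last:]))
    return "".join(parts)


if __name__ == "__main__":
    # Invented sample sentence in the style of a preambular paragraph.
    sample = "Recalling its resolution 70/1 of 25 September 2015,"
    print(mark_up_references(sample))
```

A real pipeline would likely replace the regular expression with a trained named-entity recogniser and map each entity to the identifiers defined by the target schema; the point here is only the shape of the input and output.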
The challenge is focused on the analysis and categorization of information contained in UN General Assembly (UNGA) resolutions.
UN General Assembly resolutions are formal expressions of the will and opinions of the Member States: they provide policy recommendations, assign mandates, and adopt codes, guidelines, procedures, recommendations, amendments to codes, conventions, etc. They are at times articulated in hierarchical structures in which the text is segmented into higher and lower subdivisions. Generally, they include a preamble (missing in rare cases) and an operative part, always present, made up of one or more paragraphs. In more detail:
The objective of the challenge is to carry out automatic entity extraction and content analysis to identify the following elements in UN General Assembly resolutions:
Structures:
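The distinction between the preamble and the operative paragraphs described above can be approximated with a simple rule. The sketch below assumes that operative paragraphs begin with a number followed by a full stop (for example "1. Decides ...") and that everything before the first such paragraph belongs to the preamble. That convention, the function name `split_resolution` and the sample text are assumptions for illustration; real resolutions would need more robust handling of subparagraphs, annexes and formatting.

```python
import re
from typing import Dict, List

# Assumption: an operative paragraph starts with a number and a full stop.
OPERATIVE_START = re.compile(r"^\s*\d+\.\s+")


def split_resolution(paragraphs: List[str]) -> Dict[str, List[str]]:
    """Split a list of paragraphs into 'preamble' and 'operative' parts.

    Everything before the first numbered paragraph is treated as preamble;
    everything from that paragraph onwards is treated as operative, so
    unnumbered subparagraphs stay with the operative part.
    """
    result: Dict[str, List[str]] = {"preamble": [], "operative": []}
    seen_operative = False
    for paragraph in paragraphs:
        if not paragraph.strip():
            continue  # skip blank lines
        if OPERATIVE_START.match(paragraph):
            seen_operative = True
        key = "operative" if seen_operative else "preamble"
        result[key].append(paragraph.strip())
    return result


if __name__ == "__main__":
    # Invented placeholder text, not quoted from an actual resolution.
    sample = [
        "The General Assembly,",
        "Recalling its previous resolutions on the matter,",
        "1. Decides to review the matter at its next session;",
        "(a) Including any related reports;",
        "2. Requests a report on the implementation of the present resolution.",
    ]
    parts = split_resolution(sample)
    print(len(parts["preamble"]), "preambular,", len(parts["operative"]), "operative")
```

Because the preamble list is simply left empty when no unnumbered text precedes the first numbered paragraph, the same function also covers the rare case of a resolution without a preamble.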
By the end of the challenge, the following functional deliverables should be released:
Throughout all phases below, individuals and teams are encouraged to interact with the core team to discuss the requirements for this project.
As previously stated in the Expected outcomes section, all the inputs and outputs of this project must be covered by the GNU Affero GPL v3.0, GNU GPL v3.0, Creative Commons Attribution 4.0 International or CC0 licences, depending on the nature of the resource, unless the participants justify the use of another free/open source/copyleft licence. You will be asked to accept terms and conditions prior to submitting any content.
Teams are encouraged to leverage and extend existing open source frameworks.
Qualified submissions will be judged on a combination of the following criteria:
The winning team/individual and the winning solution will:
Get started!
Click on "Post Idea", register on the Unite Ideas website, and then post your draft or even just the title of your preliminary idea. You will be able to edit your idea until the last day of the submission phase.
For any questions regarding the challenge, please contact Francesco Sansoni by creating an account on Unite Ideas or by email at francesco.sansoni@un.org.