    Leveraging Language Technologies to extract Information and Metadata from GA Resolutions

    by Hussein Ghaly 02/08/2019 05:04 PM GMT

          United Nations documents contain data and information that pertain to their procedural and substantive function in the organization. Much of this material is prepared for human readers, which shapes how the documents are written, stored, hosted, described, annotated, and formatted. The challenge, therefore, is to make these documents machine-readable by exploiting two things: the regular patterns in how they are prepared and published, such as standard formats, and the metadata stored both within each document and at the location where it is hosted. Natural Language Processing (NLP) techniques will also be needed to extract more structured information from the text.

          This combined approach involves a pipeline for processing documents, with a view to adding more structure and improving their machine readability. We would start by crawling the document repository to retrieve the documents and the descriptive information available online. This can be achieved with basic libraries in programming languages, such as the Python requests library. The retrieved Word files are then converted into more machine-readable formats, such as HTML or XML, using software packages such as Tika or Antiword. The structure of these converted documents would make it possible to tag specific elements in the text (e.g. titles, paragraphs, tables) and then parse their content. Further processing can be done with NLP packages such as NLTK, CoreNLP, and spaCy, which have built-in functionality for part-of-speech tagging, parsing, and named entity recognition, among others.
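
The tagging step of such a pipeline could be sketched roughly as follows. This is only an illustration under stated assumptions: the sample text is invented and stands in for the output of a converter, the `Paragraph` class and `tag_paragraphs` function are hypothetical names, and the rule that a preambular paragraph opens with a capitalized "-ing" word is a rough heuristic that would miss openings such as "Deeply concerned". Retrieval and conversion are left out so that the example needs only the Python standard library.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Invented sample text standing in for the plain-text output of a converter.
SAMPLE = """\
The General Assembly,

Recalling its resolution 70/1 of 25 September 2015,

Recognizing the importance of multilingualism,

1. Decides to continue its consideration of the question;

2. Requests the Secretary-General to report at its next session.
"""


@dataclass
class Paragraph:
    kind: str               # "preambular", "operative" or "other"
    number: Optional[int]   # paragraph number, operative paragraphs only
    text: str


# Operative paragraphs are assumed to start with "<number>. ".
OPERATIVE = re.compile(r"^(\d+)\.\s+(.*)$")
# Heuristic assumption: preambular paragraphs open with a word ending in "ing".
PREAMBULAR = re.compile(r"^[A-Z][a-z]+ing\b")


def tag_paragraphs(text: str) -> List[Paragraph]:
    """Split text on blank lines and label each paragraph by its opening."""
    paragraphs = []
    for block in re.split(r"\n\s*\n", text.strip()):
        block = " ".join(block.split())  # collapse line breaks and extra spaces
        match = OPERATIVE.match(block)
        if match:
            paragraphs.append(
                Paragraph("operative", int(match.group(1)), match.group(2))
            )
        elif PREAMBULAR.match(block):
            paragraphs.append(Paragraph("preambular", None, block))
        else:
            paragraphs.append(Paragraph("other", None, block))
    return paragraphs


if __name__ == "__main__":
    for paragraph in tag_paragraphs(SAMPLE):
        print(paragraph.kind, paragraph.number, paragraph.text[:40])
```

In a fuller pipeline, the text passed to `tag_paragraphs` would come from the converted documents, and an NLP package could replace the regular expressions with part-of-speech information.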

          With the documents structured in this way, it would be possible to build search, filtering, and visualization interfaces that let users view linkages between resolutions, compare similar resolutions, or display other information related to specific queries, based on the available data types.
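
As a rough illustration of what linkage and comparison could mean in practice, the sketch below extracts cross-references of the form "70/1" from resolution text and scores pairs of resolutions by word overlap. Everything here is an assumption made for the example: the three-entry corpus is invented, the function names are hypothetical, the reference pattern only covers the "session/number" style, and Jaccard similarity is a simple stand-in for whatever comparison method a real system would use.

```python
import re
from typing import Dict, List

# Invented miniature corpus: resolution symbol -> extracted text.
CORPUS: Dict[str, str] = {
    "A/RES/72/1": "Recalling its resolution 70/1 of 25 September 2015 "
                  "on sustainable development",
    "A/RES/73/2": "Recalling its resolutions 70/1 and 72/1 "
                  "on sustainable development goals",
    "A/RES/73/3": "Reaffirming the importance of multilingualism "
                  "in the work of the Organization",
}

# Assumption: cross-references are written as "session/number", e.g. "70/1".
REFERENCE = re.compile(r"\b(\d{1,3})/(\d{1,3})\b")


def find_links(symbol: str, text: str) -> List[str]:
    """Return the sorted symbols of resolutions cited in the text."""
    cited = {
        "A/RES/{}/{}".format(session, number)
        for session, number in REFERENCE.findall(text)
    }
    cited.discard(symbol)  # ignore self-references
    return sorted(cited)


def jaccard(text_a: str, text_b: str) -> float:
    """Word-set overlap between two texts, from 0.0 to 1.0."""
    words_a = set(re.findall(r"[a-z]+", text_a.lower()))
    words_b = set(re.findall(r"[a-z]+", text_b.lower()))
    if not words_a and not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


if __name__ == "__main__":
    for symbol, text in CORPUS.items():
        print(symbol, "cites", find_links(symbol, text))
    print(jaccard(CORPUS["A/RES/72/1"], CORPUS["A/RES/73/2"]))
```

The citation lists could feed a graph visualization, while the similarity scores could rank candidate resolutions for side-by-side comparison.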

          Co-authors to your solution

          Jose Garcia-Verdugo
