

    Leveraging Language Technologies to Extract Information and Metadata from GA Resolutions

    by Hussein Ghaly 02/08/2019 05:04 PM GMT


      I accept the terms and conditions (see sidebar). I understand that all content I am submitting must be licensed under an open-source software or Creative Commons license, as described in the Terms and Conditions.


      Description

      United Nations documents contain data and information that pertain to their procedural and substantive function within the organization. Much of this data is intended for human consumption, reflected in how the documents are written, stored, hosted, described, annotated, and formatted. The challenge, therefore, is to make such documents machine-readable by exploiting the regularities in how they are prepared and published, such as their standard formats and the metadata stored both within each document and on the site where it is hosted. It will also require Natural Language Processing (NLP) techniques to identify more structured information in the text.
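One of the regularities mentioned above is the standard format of GA document symbols. As a minimal sketch (the sample text below is illustrative, not a real excerpt), a simple pattern match already yields structured metadata from raw text:

```python
import re

# UN General Assembly resolution symbols follow a regular pattern,
# e.g. "A/RES/73/1" (session 73, resolution 1). This regularity can
# be exploited to pull machine-readable metadata out of otherwise
# unstructured text.
SYMBOL_RE = re.compile(r"A/RES/(\d+)/(\d+)")

def extract_symbols(text):
    """Return (session, number) pairs for every GA resolution symbol found."""
    return [(int(s), int(n)) for s, n in SYMBOL_RE.findall(text)]

sample = "Recalling its resolutions A/RES/70/1 and A/RES/71/243, ..."
print(extract_symbols(sample))  # -> [(70, 1), (71, 243)]
```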

      This combined approach involves a pipeline for processing documents, with a view to adding more structure and improving their machine readability. We would start by crawling the repository of documents to retrieve them along with their descriptive information available online. This can be achieved with basic libraries, such as Python's requests library. The retrieved Word files are then converted into more machine-readable formats, such as HTML or XML, using software packages such as Tika or Antiword. The structure of these converted documents makes it possible to tag specific elements of the text (e.g. titles, paragraphs, tables) and then parse their content. Further processing can be done with NLP packages such as NLTK, CoreNLP, or spaCy, which provide built-in functionality for part-of-speech tagging, parsing, and named entity recognition, among others.
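The tagging stage of this pipeline can be sketched with the standard library alone. Assuming a Word file has already been converted to HTML (e.g. by Tika or Antiword), the markup structure lets us separate headings from body paragraphs; the sample markup below is hypothetical:

```python
from html.parser import HTMLParser

class ResolutionParser(HTMLParser):
    """Collect titles and paragraphs from HTML produced by a Word converter."""

    def __init__(self):
        super().__init__()
        self._current = None   # which element type we are inside, if any
        self.titles = []
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2"):
            self._current = "title"
        elif tag == "p":
            self._current = "para"

    def handle_endtag(self, tag):
        self._current = None

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._current == "title":
            self.titles.append(text)
        elif self._current == "para":
            self.paragraphs.append(text)

html_doc = "<h1>Transforming our world</h1><p>The General Assembly adopts...</p>"
parser = ResolutionParser()
parser.feed(html_doc)
print(parser.titles)      # -> ['Transforming our world']
print(parser.paragraphs)  # -> ['The General Assembly adopts...']
```

The tagged paragraphs would then be handed to an NLP package (NLTK, CoreNLP, or spaCy) for part-of-speech tagging and named entity recognition.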

      Using this data structure, it would be possible to set up search/filtering and visualization interfaces that let users view linkages between resolutions, compare similar resolutions, or display other information related to specific queries, based on the available data types.
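Once resolutions are reduced to structured records, the search and linkage operations above become straightforward. The field names in this sketch (`symbol`, `title`, `cites`) are assumptions for illustration, not a fixed schema:

```python
# Hypothetical structured records produced by the pipeline.
resolutions = [
    {"symbol": "A/RES/70/1", "title": "Transforming our world", "cites": []},
    {"symbol": "A/RES/71/243", "title": "Quadrennial comprehensive policy review",
     "cites": ["A/RES/70/1"]},
]

def search(records, keyword):
    """Filter records whose title contains the keyword (case-insensitive)."""
    return [r for r in records if keyword.lower() in r["title"].lower()]

def linked_to(records, symbol):
    """Find resolutions that cite the given symbol, for linkage views."""
    return [r["symbol"] for r in records if symbol in r["cites"]]

print(search(resolutions, "world")[0]["symbol"])   # -> A/RES/70/1
print(linked_to(resolutions, "A/RES/70/1"))        # -> ['A/RES/71/243']
```

A visualization front end would draw the output of `linked_to` as edges in a citation graph between resolutions.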

      Co-authors to your solution

      Jose Garcia-Verdugo

      Link to your concept design and documentation (Required by the final day of the Submission & Collaboration phase):

      Link to an online working solution or prototype (Required by the final day of the Submission & Collaboration phase):

      Link to a video or screencast of your solution or prototype (Required by the final day of the Submission & Collaboration phase):

      Link to source code of your solution or prototype above. (If you submitted a link to an online solution or prototype, or to a video of your solution or prototype, you must provide a link to the source code. This item is required by the final day of the submission phase):

      naturalLanguageProcessing

      Attachments


        User Tasks (required for graduation)
        Task: Judge review | Due Date: 05/24/2019 | Status: Completed on 05/28/2019
