    Background: Internal displacement and the Sustainable Development Goals (SDGs)

    In 2015, UN Member States unanimously adopted the Sustainable Development Goals (SDGs) and, in doing so, pledged that “no one will be left behind.” The more than 40 million men, women, and children displaced within their countries of residence as a result of conflict, disasters, development projects and other causes are among those most likely to be excluded from social and economic opportunities for development. Many face increased vulnerability to further cycles of displacement when durable solutions that reduce the risks they face are not found.

    Displacement is commonly addressed as a humanitarian problem, but it is also a sustainable development challenge. It is closely associated with poverty, inequality, insecurity, environmental degradation, exposure to hazards and the vulnerability of populations whose governments are unable or unwilling to protect them. In fact, it is often both a cause and a consequence of such issues. Livelihoods, economic activity, and capacities that strengthen communities’ resilience are seriously compromised when people are forced to flee their homes as a result of a crisis.

    Internal displacement poses a large – and growing – problem with more people displaced today than in years past by conflict, violence, and disasters. It will be difficult for the Member States to make progress on the SDGs without addressing the associated challenge of internal displacement. One factor making this challenge more difficult is the incomplete picture of internal displacement due to the inability to identify and account for all displacement events around the world in a systematic manner. This picture of internal displacement can come into clearer focus by leveraging data science, and by using techniques that have already proven effective in addressing similar challenges, such as disease detection and surveillance.
     

    We challenge you to:

    Create a tool that will be used to monitor internal displacement resulting from natural hazard-induced disasters, armed conflicts, generalized violence and development projects. The tool will make the monitoring of internal displacement more efficient and comprehensive. It will also provide the humanitarian community with an easy way to extract and analyze facts from any type of document (news, field reports, social media and any other relevant source).

    1. Filtering and tagging

    Filtering: 

    The filtering step should be a binary classification of the URLs contained in the input dataset. It should exclude:

    ‣ documents not in English

    ‣ broken URLs

    ‣ documents not reporting on human mobility (see example below).

    Only the information tagged as “Relevant” should be retained for the next analysis steps.
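
    The three exclusion rules above can be sketched as a single predicate. This is only an illustration, assuming each URL has already been fetched into a record holding its HTTP status, a detected language code and the extracted text. The names `FetchedDoc`, `MOBILITY_TERMS` and `is_relevant` are hypothetical, and the keyword list merely stands in for a trained binary classifier.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical keyword list; a real filter would be a trained classifier.
MOBILITY_TERMS = ("displaced", "evacuated", "fled", "forced to flee", "homeless")


@dataclass
class FetchedDoc:
    url: str
    status: Optional[int]    # HTTP status code, None if the request failed
    language: Optional[str]  # language code from any language detector
    text: str


def is_relevant(doc: FetchedDoc) -> bool:
    """Return True only if the document passes all three exclusion rules."""
    if doc.status != 200:      # broken URL
        return False
    if doc.language != "en":   # document not in English
        return False
    lowered = doc.text.lower()
    # Crude proxy for "reports on human mobility".
    return any(term in lowered for term in MOBILITY_TERMS)
```

    Keeping the checks in this order means the cheap tests (status, language) run before any text analysis.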

    Tagging: 

    Relevant documents should be classified into three categories representing different triggers of displacement:

    ‣ “Disasters”,

    ‣ “Conflict and violence”,

    ‣ “Other”.

    The tagging should be based on the training dataset provided. The training dataset is extracted from the Global Internal Displacement Database. It consists of a list of URLs to documents (mainly web pages and pdf documents) already tagged as either “Disasters” or “Conflict and violence” by IDMC’s team of monitoring experts. “Other” should include all the documents not tagged as either “Disasters” or “Conflict and violence” but containing relevant information. Tags are not mutually exclusive as the same document can report on multiple triggers of displacement.
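
    Because tags are not mutually exclusive, the tagging step is a multi-label problem with “Other” as the fallback. A minimal sketch of that logic follows, assuming the document has already passed the relevance filter. The vocabularies in `TRIGGER_TERMS` and the function `tag_document` are hypothetical placeholders for a model learned from the training dataset.

```python
from typing import Dict, List, Tuple

# Hypothetical trigger vocabularies; a real system would learn these
# from the training dataset rather than hard-code them.
TRIGGER_TERMS: Dict[str, Tuple[str, ...]] = {
    "Disasters": ("flood", "earthquake", "cyclone", "landslide", "wildfire"),
    "Conflict and violence": ("fighting", "clashes", "attack", "shelling", "militia"),
}


def tag_document(text: str) -> List[str]:
    """Assign every matching trigger tag; fall back to "Other" if none match."""
    lowered = text.lower()
    tags = [
        tag
        for tag, terms in TRIGGER_TERMS.items()
        if any(term in lowered for term in terms)
    ]
    return tags or ["Other"]
```

    A document reporting clashes after an earthquake would receive both tags, while a relevant document matching neither vocabulary would be tagged “Other”.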

    Once integrated into the IDMC’s information system, the tool should be able to learn from new documents as they are added to the Global Internal Displacement Database (through online learning or by updating the training dataset).

     

    2. Natural language processing analysis

    Using natural language processing algorithms, the #IDETECT tool should automatically extract “facts” from the documents. A fact is a displacement figure reported in the document. Each fact should include:

    ‣ The date of publication of the document the fact is extracted from;

    ‣ The location where displacement happened;

    ‣ The reporting term used in the document (see table below);

    ‣ The reporting unit. The different reporting units used to identify displaced populations should be grouped into two main reporting units: people and households (see table below); and

    ‣ The displacement figure (i.e. the number of people/households reported displaced).

    Note: Multiple facts can originate from the same document.
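
    The structure of a fact can be illustrated with a small pattern-based extractor. This is a rough sketch rather than a full NLP pipeline: the word lists `UNIT_WORDS` and `TERMS` are hypothetical and incomplete, the location is supplied by the caller instead of being geocoded from the text, and matching is limited to a figure followed by a unit and a reporting term within one sentence.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical mapping of surface words to the two reporting units.
UNIT_WORDS = {
    "people": "people", "persons": "people", "residents": "people",
    "families": "households", "households": "households",
}
# Hypothetical, partial list of reporting terms.
TERMS = ("displaced", "evacuated", "fled")

# <figure> <unit> ... <term>, without crossing a full stop.
PATTERN = re.compile(
    r"(?P<figure>\d[\d,]*)\s+(?P<unit>" + "|".join(UNIT_WORDS) + r")\b"
    r"[^.]*?\b(?P<term>" + "|".join(TERMS) + r")\b",
    re.IGNORECASE,
)


@dataclass
class Fact:
    published: str  # publication date of the source document
    location: str   # supplied by the caller; no geocoding in this sketch
    term: str       # reporting term found in the text
    unit: str       # "people" or "households"
    figure: int     # number of people/households reported displaced


def extract_facts(text: str, published: str, location: str) -> List[Fact]:
    """Return one Fact per figure/unit/term match; a document may yield several."""
    return [
        Fact(
            published=published,
            location=location,
            term=m.group("term").lower(),
            unit=UNIT_WORDS[m.group("unit").lower()],
            figure=int(m.group("figure").replace(",", "")),
        )
        for m in PATTERN.finditer(text)
    ]
```

    Returning a list reflects the note above that multiple facts can originate from the same document.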

    3. Visualization and quantitative analysis of facts

    The tool should provide a platform to visualize and analyze facts:

    Visualization:  The visualization tool should allow analysts to dive into the data, explore the facts extracted using NLP and uncover new knowledge on internal displacement. The visualization tool can include:

    ‣ An interactive map that helps the humanitarian community easily identify “hotspots”. The map should ideally be browsable and should allow information analysts to explore trends as a function of time.

    ‣ A histogram to analyze trends in a selected region and time range.

    Quantitative analysis:

    In order to make sure that “no one will be left behind”, the #IDETECT tool should provide a platform to analyze, compare and explore the displacement figures contained in the facts. Analysts should be able to select which facts and displacement figures are visualized based on location and time. This quantitative analysis tool will allow information analysts to go from the number of facts to the displacement figures in the facts.

    From the displacement figure, analysts should be able to go back to the document (URL) reporting that displacement figure and possibly visualize the excerpts of the documents where the information was reported.
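
    One way to picture the selection described above is a function that filters facts by location and time, then reports the number of facts, the totals per reporting unit, and the source URLs. The tuple layout `FactRow` and the function `summarize` are hypothetical. Note that simply summing figures can double-count an event reported by several documents, so the totals here are only indicative.

```python
from datetime import date
from typing import Dict, Iterable, Tuple

# Hypothetical layout: (country_code, publication_date, unit, figure, url)
FactRow = Tuple[str, date, str, int, str]


def summarize(facts: Iterable[FactRow], country: str,
              start: date, end: date) -> Dict[str, object]:
    """Count facts and sum figures per unit for one country and date range."""
    selected = [f for f in facts if f[0] == country and start <= f[1] <= end]
    totals: Dict[str, int] = {}
    for _, _, unit, figure, _ in selected:
        totals[unit] = totals.get(unit, 0) + figure
    return {
        "fact_count": len(selected),                      # number of facts
        "totals": totals,                                 # figures per reporting unit
        "sources": sorted({f[4] for f in selected}),      # URLs to trace back to
    }
```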

    Download the graphical representation of the #IDETECT challenge workflow here.

     

    Deliverables

    ‣ A web link (URL) to a working (live) demo (suggested: GitHub Pages or BitBucket Pages, or similar).

    ‣ A repository of the original open source code, data files, and other electronic files (include GNU license). This package can be hosted in a public repository and should allow IDMC or UN Agencies to run the tool on local servers. Only original, open source work will be accepted. It is acceptable that your solution uses other existing open source libraries.

    ‣ A .csv or .xls answer file containing the analysis of the test dataset. One week before the submission deadline, a test dataset will be uploaded to the Unite Ideas web page. It will consist of a list of URLs without tags. The answer file should contain the result of the analysis using the same algorithm developed for the challenge. This file will be used to evaluate the performance of the algorithm. The answers will be:

    ● the output of the filtering (i.e. whether the document contains relevant information or not)

    ● the tag(s) assigned to the document.

    ● the facts extracted from the document, in particular:

         ◦ the displacement figure;

         ◦ the reporting unit (i.e. “people” or “households”)

         ◦ the location (at the country level using the Country Codes - ISO 3166)

    ‣ A brief document describing the functionalities, such as a user guide.

    ‣ A document describing the steps to maintain and update the tool with further features, such as an admin guide.

    ‣ The tool will be integrated into the information system of IDMC and should send data validated by the information analysts to the backend of the Global Internal Displacement Database. No restrictions are imposed on the technology used to tackle this challenge. However, teams should keep in mind that:

    ● the tool should be independent and run as external service

    ● it should send data in a standard format (.json, .csv or others) with all the facts extracted from the documents selected by the information analysts
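
    As an illustration of the answer file and the standard export format, the sketch below serializes the same rows to CSV and JSON using only the Python standard library. The column names in `COLUMNS` are hypothetical, since the challenge does not prescribe a schema, and the two-letter country code is an assumption about which ISO 3166 code set is expected.

```python
import csv
import io
import json
from typing import Dict, List

# Hypothetical column names; the challenge does not prescribe a schema.
COLUMNS = ["url", "relevant", "tags", "figure", "unit", "country"]


def to_csv(rows: List[Dict[str, object]]) -> str:
    """Serialize answer rows to CSV text, one row per extracted fact."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        flat = dict(row)
        # Tags are not mutually exclusive, so join them into one cell.
        flat["tags"] = ";".join(row["tags"])
        writer.writerow(flat)
    return buffer.getvalue()


def to_json(rows: List[Dict[str, object]]) -> str:
    """Serialize the same rows as JSON for sending to a backend."""
    return json.dumps(rows, indent=2)
```

    Writing to an in-memory buffer keeps the serializers independent of where the file is eventually stored or sent.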

     

    Datasets

    1. Input dataset (~79 MB)

    The input dataset contains a list of URLs (~600 thousand articles) in English extracted from the GDELT GKG database. The input dataset should be used as input for the analysis. It consists of a .csv file with three fields:

    ‣ “GKGRECORDID”, a unique identifier of the document, 

    ‣ “DATE”, the publication date-time of the document, 

    ‣ “DocumentIdentifier”, a fully-qualified URL that can be used to access the document on the web.
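
    Reading the three fields could look like the sketch below. The function name `read_input` is hypothetical, and the timestamp layout (YYYYMMDDHHMMSS) is an assumption that should be checked against the actual file before use.

```python
import csv
import io
from datetime import datetime
from typing import Iterator, TextIO, Tuple


def read_input(handle: TextIO) -> Iterator[Tuple[str, datetime, str]]:
    """Yield (record id, publication datetime, URL) for each input row."""
    for row in csv.DictReader(handle):
        # Assumption: DATE uses a compact YYYYMMDDHHMMSS layout.
        published = datetime.strptime(row["DATE"], "%Y%m%d%H%M%S")
        yield row["GKGRECORDID"], published, row["DocumentIdentifier"]


# Usage with an in-memory sample instead of the real ~79 MB file.
sample = io.StringIO(
    "GKGRECORDID,DATE,DocumentIdentifier\n"
    "20170101000000-1,20170101000000,http://example.org/a\n"
)
records = list(read_input(sample))
```

    Using a generator means the full file does not need to be held in memory at once.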

    2. Training dataset (fully labelled version here)

    The training dataset consists of a list of URLs tagged by our monitoring experts. We ask teams to mainly focus on “Conflict and violence” and “Disasters” and tag all the remaining documents as “Other”. The training dataset may contain documents not in English or URLs to videos, which should be identified and excluded from training.

    3. Test dataset

    Prizes & Recognition

    The winner (or group of winners) and the winning solution will:

    ‣ receive a letter of recognition from IDMC;

    ‣ be featured and referenced in the 2017 edition of the Global Report on Internal Displacement and on the IDMC website;

    ‣ be invited to write a blog post on the project on the Unite Ideas website;

    ‣ be offered the opportunity to have an advisory role in the further development of the submitted code; and

    ‣ have the possibility to present the solution to partner organizations such as IOM, UNHCR, OHCHR and ICRC.

     

    Review Process

    The judges will review the submitted solutions within three weeks after the closure of the challenge. Qualified submissions will be judged on a combination of the following criteria:

    1. Usability - the ease of use and user-friendliness of the submission.

    2. Accuracy - the degree to which the tagging and the information extraction of the tool are correct.

    3. Insights - the degree to which the tool's results and visualizations are useful for detecting displacement events and the corresponding displacement figures, and are presented in a creative manner.

    4. Modularity - the ease of customization enabled by the solution.

    5. Elegance - the elegance of the code written and the quality of documentation provided.

    6. Documentation - the quality of the documentation provided alongside the code.

     

    Judging Panel

    • Justin Ginnetti, Head of Data and Analysis Dept, IDMC
    • Leonardo Milano, Senior Data Scientist, IDMC
    • Sarah Telford, project manager, HDX
    • Luca Vernaccini, INFORM Index for Disaster Management
    • Nuno Nunes, Global CCCM Cluster Coordinator, IOM
    • Andrew Palmer, Coordinator of the Early Warning and Information Support Unit, OHCHR

    Submission guidelines

    • This challenge is open to the general public. Public, private, and academic organizations are also invited to take part.
    • Only original, open source work will be accepted. It is acceptable that your solution uses other existing open source libraries.
    • There are no limitations on the number of submissions per participant/participating team.
    • Participants are required to agree to the terms and conditions.

    View winning solution

    Data4Democracy Internal Displacement

    1st place

    Data4Democracy team: Aneel Nazareth, George Richardson, Simon Bedford, Wendy Mak, James Allen, Yane Frenski, Domingo Hui, Charles Neiswender, Daniel Forsyth, Joshua Arnold, and Alex Rich

    This solution has been worked on by a team of volunteers from around the world as part of the Data4Democracy (D4D) initiative.

    The D4D team has attempted to build a complete end-to-end solution for scraping, classifying, processing, extracting reports from, storing and visualizing URLs for articles that may or may not pertain to the displacement of people.

    The initial approach to report extraction was to be as broad as possible, including identifying terms and units beyond the scope of the project. This approach has since been constrained and optimized based on the nature of the testing data provided.

    All of the code, details and results are available via the links shared below. Key components of this solution include:

    Data Sources: the team used additional labelled training data obtained via the CrowdFlower platform: Dropbox

    » Launch Demo

    » View Code

    » Read the UN Press Release (English)

    » Read the UN Press Release (French)

    » Read IDMC's blog post on the #IDETECT challenge

    Human Displacement Analyzer

    Honorary Mention

    Mr. Samuel Bollier

    The Human Displacement Analyzer analyzes texts to determine whether they pertain to disaster-driven or conflict-driven human displacement. It also determines the number of people displaced and the location of displacement. Finally, it creates visualizations based on this analysis, in both chart and map form.

    Note: In the Link box below, I have included two Dropbox links: one to the results of the project's tagging component, and one to the results of the project's natural language processing component. Both files are saved in tab-delimited form, and can be viewed in Microsoft Excel.

    » Launch Demo

    » View Code

    Challenger organization: 

    IDMC - The Internal Displacement Monitoring Centre

    The Internal Displacement Monitoring Centre (IDMC) is the leading source of information and analysis on internal displacement worldwide. Since 1998, IDMC’s role has been recognised and endorsed by United Nations General Assembly resolutions, which call upon governments, the UN and others to collaborate with it. For the millions of people displaced within their own country, IDMC plays a unique role as an independent and impartial global monitor and analyst to inform and influence policy and action by governments, UN agencies, donors and INGOs. IDMC monitors and analyses internal displacement caused by conflict, generalised violence, human rights violations and natural hazard-induced disasters. In particular, IDMC obtains and reports information on:

    • The scope and trends of new, evolving and protracted situations of displacement worldwide
    • Obstacles to durable solutions to displacement
    • Drivers of future displacement risk
    • Policy, legal and institutional frameworks for protecting people affected by displacement or at risk of being displaced