Over the past 25 years, the world has seen a rise in the frequency of natural disasters in rich and poor countries alike. Today, there are more people at risk from natural hazards than ever, with those in developing countries particularly at risk. This essay series explores measures that have been taken, and could be taken, in order to improve responses to the threat or occurrence of natural disasters in the MENA and Indo-Pacific regions.
Natural disasters are complex events. Responders require accurate, reliable, and timely data to ensure the decision-making process is efficient and that their efforts are effective. As discussed in the essay Data for Disaster Management: Mind the Gap, big data is emerging at a fast pace in our digital age and may contribute to enhancing situational awareness. However, there are still large differences between regions in terms of information flows and levels of access to Information and Communication Technologies. This digital divide further widens the information and collaboration gaps between responders and affected communities. This essay tries to provide a realistic assessment of the role that big data could play in improving response in a context of high data poverty by focusing on a case study of a sudden-onset natural disaster.
Case Study: Hindu Kush Earthquake and Digital Landscape in Pakistan and Afghanistan
The case study presented in this essay is the 7.5 magnitude earthquake that struck the Hindu Kush mountains on October 26, 2015 and resulted in the deaths of 280 people in Pakistan, 115 in Afghanistan, and four in India. A complicating factor in the response was that the remote northeastern area of Afghanistan where the disaster struck was affected by a Taliban-led insurgency. Furthermore, the earthquake happened in remote and mountainous areas where there was high data poverty. In terms of big data, we focused on social media as well as satellite and aerial imagery. We did not come across evidence that other big data sources, such as Call Detail Records or financial transaction data, were available or used during the response.
In order to understand the digital context, we used a mix of nationally available governmental statistics (National Statistics Office, Telecom Regulatory Authority), internationally available institutional statistics (ITU), commercially available statistics (We Are Social), and studies commissioned by, for example, development agencies (GIZ, USAID, Norwegian Peacebuilding Resource Center). We found data mostly at the national level; not much data was available at a more granular level describing specifically the region that was hit by the earthquake. Culled from this data are the following findings:
► Both Afghanistan and Pakistan have relatively low mobile penetration at just over 60 percent, well below Asia’s regional average of 82 percent.
► About 10 percent of Pakistanis use the internet, while only 5 percent of Afghans do.
► Barriers to internet usage include lack of electricity, education, and capacity; the absence of an internet connection; and cultural restrictions, especially for women.
► In Pakistan, only 4 percent of the population uses Facebook, and 53 percent of those users belong to the top 10 percent of households by income.
► Facebook is by far the most popular social media platform in both countries.
► Only 4 percent of the social media users in Afghanistan use Twitter, compared to about 25 percent in Pakistan.
► In both countries, it is mainly men who have social media profiles and actively use them. In Afghanistan 76 percent of the social media users use smartphones for access, while in Pakistan 30 percent use a smartphone to access the internet.
It is important to mention that services are available that connect those with smartphone and internet access with those who have only low-tech mobile phones and no internet. So-called Social SMS allows each participant to send a single message to multiple contacts and to receive messages from multiple contacts, all within a single platform, connected to Facebook and Twitter, among others. In Pakistan, Pring is one of the biggest; it began operating in 2009 and to date has reached more than 4.5 million users. In Afghanistan, Paywast processes 30 percent of all SMS messages through its social SMS platform. In terms of OpenStreetMap (OSM) volunteers, both countries have a very low number of active members per population. (See figure below.)
Case Study: Social Media Analytics
We used social media analytics through SCRAAWL, desk research, and a limited number of expert interviews to assess the potential of big data for improving response in the context described above. Although SCRAAWL cannot be used to analyze public Facebook postings or social SMS, it can be employed to analyze Twitter, whose API is more open and accessible than those of other social media platforms. Twitter is also fast-moving, meaning that up-to-date information is often available sooner on Twitter than through other media. We monitored the Twitter stream with three search windows consisting of different combinations of event hashtags and geographical names of the affected area. Each monitoring job was stopped after approximately 100,000 tweets were collected. Two reached this level after one day and one only after six days. A monitoring job with more generic terms (e.g., earthquake, Pakistan) collected a large number of tweets in a very short time. As expected, after an initial, limited number of tweets relevant to the search objective directly after the earthquake, the results for the more generic terms became “diluted” with tweets about other events unrelated to the earthquake.
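The monitoring setup described above can be illustrated with a minimal sketch. It is not how SCRAAWL works internally; the token matching rule, the function names, and the sample tweets are assumptions made for illustration, and only the hashtags are borrowed from the search window listed in the notes.

```python
from typing import Iterable, Iterator, Set

# Hashtags borrowed from one of the essay's search windows. Case-insensitive
# matching on whole tokens is an assumption, not documented tool behavior.
SEARCH_WINDOW = {"#afg", "#afghanistan", "#pakistan", "#hunzavalley", "#gilgit", "#kpk"}


def tokens(text: str) -> Set[str]:
    """Lowercase whitespace-separated tokens with surrounding punctuation stripped."""
    return {word.strip(".,;:!?\"'()").lower() for word in text.split()}


def monitor(stream: Iterable[str], window: Set[str], limit: int) -> Iterator[str]:
    """Yield tweets containing at least one window term; stop once `limit` is reached."""
    collected = 0
    for text in stream:
        if collected >= limit:
            break
        if tokens(text) & window:
            collected += 1
            yield text


# Hypothetical sample stream; the essay's jobs stopped at roughly 100,000 tweets.
sample_stream = [
    "Strong shaking felt in #Gilgit, buildings evacuated",
    "Earthquake drill scheduled in California next week",
    "Roads blocked near Chitral #KPK",
    "Prayers for everyone affected #Pakistan",
]
matched = list(monitor(sample_stream, SEARCH_WINDOW, limit=2))
```

Adding a generic term such as "earthquake" to the window would also match the second sample tweet, which is the dilution effect noted above.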
The essay on Information Filtering in Social Media during Disasters by T. Nazer et al. explains that social media content consists to a large extent of unwanted content. The share of relevant and informative messages in social media varies widely depending on context but will in most cases be below 10 percent. A monitoring window with hashtags and user handles provides a first reduction of the overall Twitter API data stream but still leaves in much of the unwanted content. Imran et al. define different categories to describe the content of a tweet: Personal Only, Informative (Direct), Informative (Indirect), and Other. An Informative (Direct) tweet is usually written by an eyewitness, and such tweets are difficult to find. Even if one is able to isolate only the informative tweets, a further step is required: classifying and extracting the “information nuggets” that contribute to increased situational awareness. Imran et al. defined information nuggets in terms of Caution and advice, Information source, Donation, and Casualties and damage. They tested both manual classification and extraction via crowdsourcing and automatic classification and extraction via machine learning with pre-specified ontologies. The social media analytics software we used has a set of predefined basic and advanced analytics that offer a similar analysis only to a limited degree. One can, for example, look at the top URLs mentioned in tweets, which gives an idea of the point of origin of indirect information. One can detect bots (and subsequently decide to remove them) to reduce the unwanted content. Lists of Top Words and Top Users give an impression of the type of information and of whether the accounts are mainly official ones or also include individuals.
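As a rough illustration of the basic analytics mentioned above (top URLs, top users, and the share of directly informative tweets), the following sketch uses only the Python standard library. The tweet records, the example.org URLs, and the pre-assigned labels are hypothetical; in practice the labels would come from manual or automatic classification, which is not shown here.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

URL_PATTERN = re.compile(r"https?://\S+")


def top_urls(tweets: List[Dict[str, str]], n: int = 3) -> List[Tuple[str, int]]:
    """Count how often each URL is mentioned across all tweet texts."""
    counts: Counter = Counter()
    for tweet in tweets:
        counts.update(URL_PATTERN.findall(tweet["text"]))
    return counts.most_common(n)


def top_users(tweets: List[Dict[str, str]], n: int = 3) -> List[Tuple[str, int]]:
    """Count how many tweets each account contributed."""
    return Counter(tweet["user"] for tweet in tweets).most_common(n)


def label_share(tweets: List[Dict[str, str]], label: str) -> float:
    """Fraction of tweets carrying the given category label."""
    return sum(tweet["label"] == label for tweet in tweets) / len(tweets)


# Hypothetical records, labeled with the categories of Imran et al.
tweets = [
    {"user": "news_a", "text": "Situation report https://example.org/report", "label": "Informative (Indirect)"},
    {"user": "news_b", "text": "Read this https://example.org/report", "label": "Informative (Indirect)"},
    {"user": "resident_1", "text": "Our street wall collapsed, two injured", "label": "Informative (Direct)"},
    {"user": "news_a", "text": "Donate here https://example.org/donate", "label": "Other"},
    {"user": "fan_9", "text": "Thinking of you all", "label": "Personal Only"},
]

most_shared = top_urls(tweets, n=1)
most_active = top_users(tweets, n=1)
direct_share = label_share(tweets, "Informative (Direct)")
```

In this toy sample one tweet in five is a direct eyewitness report; in a real collection the share would typically be far lower, as noted above.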
It is not straightforward to identify tweets from people who were eyewitnesses to the earthquake and/or directly affected. Not many people in the affected areas will have Twitter, given the low social media penetration, and if they do, their tweets will not show up at the top of rankings because they are a weak signal amid the noise. This means one needs ways to filter out the accounts with many followers, whose tweets will otherwise prevail. It is therefore beneficial to include hashtags in local languages and accounts of local civil society organizations in the search window. One can do this via the translate keywords function in SCRAAWL (which uses Google Translate), whereby all tweets that match those keywords are returned regardless of the language of the tweet. However, Google Translate does not include Dari (one of the two official languages of Afghanistan). More advanced analytics use the body of the tweet beyond the keywords. Sentiment analysis performs better with English, as does the matching of people, locations, and organizations against a white list. One can make a heatmap to plot geo-coded posts (GPS coordinates recorded when posted), geo-referenced posts (locations identified by analyzing their contents; a post can have multiple locations), and geo-profiled posts (locations associated with the posting user's profile).
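One simple way to keep high-follower accounts from dominating a ranking is a follower threshold, sketched below. This is an illustrative assumption rather than a feature of the tool used in this study; the account names, the threshold of 5,000, and the sample tweets are hypothetical, with follower magnitudes loosely based on the figures reported later in this essay.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Tweet:
    user: str
    followers: int
    text: str


def surface_low_reach(tweets: List[Tweet], max_followers: int) -> List[Tweet]:
    """Keep tweets from accounts at or below the threshold, smallest accounts first."""
    kept = [tweet for tweet in tweets if tweet.followers <= max_followers]
    return sorted(kept, key=lambda tweet: tweet.followers)


# Hypothetical sample: two prominent accounts and two local ones.
sample = [
    Tweet("head_of_government", 22_400_000, "We stand ready for assistance"),
    Tweet("party_media_team", 300_000, "Relief update from the province"),
    Tweet("local_student", 180, "Bridge damaged near our village"),
    Tweet("local_ngo", 2_500, "Tents needed in the valley"),
]

# The threshold is an arbitrary illustrative value, not an empirical cutoff.
low_reach = surface_low_reach(sample, max_followers=5_000)
```

A fixed cutoff is crude: it also removes well-followed local journalists, so in practice it would be combined with a curated list of trusted accounts.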
Results: Social Media
The Twitter analysis shows that the majority of the monitored tweets are in relation to Pakistan with #Pakistan being the most influential hashtag. Top mentions and retweets come from politicians or their parties, provincial administrations, celebrities, and international organizations such as UNICEF. (See figures below.)
These are all accounts with many followers. The Indian Prime Minister Narendra Modi tweeted “We stand ready for assistance where required, including Afghanistan & Pakistan” and received more than 3,500 retweets from his 22.4 million followers. Prominent Pakistani politicians from across the spectrum have joined Twitter as well, with the exception of Pakistan Prime Minister Nawaz Sharif, who does not have a Twitter account, though his daughter did tweet in reaction to the earthquake. Imran Khan, the former cricket star who now heads the Tehreek-e-Insaf (PTI) party, boasts nearly 300,000 followers. He created a trending hashtag, #IkstandsWithEQvictims, to express his compassion for the earthquake victims. A top mention is the account of the PTI Social Media Team, which gives news on the Khyber Pakhtunkhwa province, where PTI rules. Shahid Afridi is a cricket celebrity who has many followers.
As the above observations show, Twitter can be very helpful for getting a quick overview of the social media networks of politicians, gatekeepers, and trusted users. It can be used to contextualize and complement a prototypical list of humanitarian actors. The digital disaster response NGO Humanity Road pre-staged a list of important Twitter accounts in this way. It consisted of accounts from both individuals (such as well-informed journalists) and organizations. The Named Entities functionality in the social media analytics tool ranks the organizations whose names show up most often in the data set. This functionality was used to further complement the pre-staged list. Following tweets from these professional accounts was very beneficial in getting rapid notifications when organizations disclosed new situational overviews and initial assessments of the affected areas. Tweets referred mostly to URLs from Western news sources. The non-exhaustive list below gives an impression of the many organizations involved in providing response-related information:
► National organizations from the countries affected: National Disaster Management Authorities (NDMAs), police and several ministries such as the Ministry of National Health Services, Regulation and Coordination (in Pakistan).
► International organizations: UN OCHA, UNITAR-UNOSAT, IOM, Global Disaster Alert and Coordination System (GDACS), EU Joint Research Centre (JRC).
► National organizations outside the countries affected: USAID, United States Geological Survey (USGS).
► International NGOs: Assessment Capacities Project (ACAPS), Humanity Road, Information Management and Mine Action Programs (iMMAP).
► Red Cross and Red Crescent family: IFRC, The Afghan Red Crescent Society (ARCS), Indian Red Cross Society (IRCS), and Pakistan Red Crescent (PRC).
► Digital volunteers: Humanitarian OpenStreetMap Team (HOT), Google crisis maps.
Some of these organizations, such as the USGS and JRC, collected data themselves using their own sensors. Government agencies also collected data directly, though the very early reports did not clearly state how. It was mentioned that helicopters were being sent to get an overview and that local government officials were contacted. After the very first days, Standard Operating Procedures were executed for damage and needs assessments. Other organizations, such as ACAPS, focused on collating and analyzing data sources. As the days passed, the information became more detailed and accurate. “Badakhshan is a remote, mountainous province, where access is often a challenge,” said IOM Humanitarian Assistance Program Manager Gul Mohammed Ahmadi. “It may take some time before a full picture of the damage emerges.” The National Disaster Management Authority of Pakistan issued warnings as well as response-related information through SMS in the areas of Shangla, Upper Dir, Chitral, and Bajaur through the Pakistan Telecommunications Authority (PTA). Looking at the pictures that were shared, one sees that pictures of politicians (in a few cases visiting the affected areas, but also more general pictures) and pictures of casualties and damage (such as pictures of people in hospital) prevail.
A heatmap of the tweets shows that tweets came from all over the world. Pakistan had roughly three times as many tweets as Afghanistan. Within Europe a peak can be seen in the UK, likely due to the large number of Pakistani migrants in the United Kingdom. There were no geo-profiled tweets. Within Pakistan and Afghanistan, the geo-coded and geo-referenced tweets came mostly from the capitals and other large cities and much less from the affected areas. (See figures below.)
An advantage of this small volume is that we could analyze these tweets manually by opening them in the heatmap one by one. The content consists mostly of direct eyewitness reports in the Casualties and damage category. (See figure below left.)
For example, there is footage of the earthquake by a police civil servant and a picture of a damaged bridge by a student photographer. These valuable tweets were repeatedly retweeted and hence showed up at the same location. In addition, many geo-referenced alerts from different global earthquake alert accounts crowded out the relevant tweets. A very large number of geo-coded tweets came from near Multan in Pakistan, in relation to a local election. English was the main language used in the tweets analyzed, with Urdu and Indonesian a distant second and third.
Results: Satellite and Aerial Imagery
The Pakistan Air Force carried out an aerial photography survey and the Pakistan Space and Upper Atmosphere Research Commission (SUPARCO) provided satellite imagery. Pakistan has started developing a National Spatial Data Infrastructure, but it is not clear whether this was already used to share the satellite data. UNITAR-UNOSAT, on behalf of UN OCHA, requested a charter activation from The International Charter Space and Major Disasters. The activation was accepted roughly eight hours later on the day of the earthquake, and satellite data became available from DigitalGlobe’s WorldView-1, 2, and 3 satellites. The geospatial data also became available through the Humanitarian Data Exchange (HDX). The Pakistan NDMA uses mostly SUPARCO imagery in the preparedness and risk assessment phase; in the response phase the International Charter data is also vital. In both cases in-house SUPARCO expertise is used for image processing (for example for damage assessments). There was no reference to the use of Unmanned Aerial Vehicles (UAVs) in any of the situational reports.
Immediately after the disaster, HOT started annotating satellite imagery from DigitalGlobe provided to them via MapBox. The activation for the earthquake on the Afghanistan and Pakistan border was not a typical response, as HOT did not have much of a local community, although a HOT member who is originally from Afghanistan assisted in connecting with local organizations. Nor did HOT receive specific information needs from responding organizations. (See figure above left.) Given the lack of baseline OSM data, the HOT Activation Working Group members determined as their first priority to map the road network towards remote and mountainous areas and to identify residential area polygons (in relation to identifying populated places). The HOT team did not map the affected area in Pakistan, because there were concerns as to whether Pakistani OSM volunteers would be compliant with Pakistani regulations. Fifty percent of the affected area in Afghanistan was mapped by remote volunteers. Since there was no local OSM community to help with ground verification and mapping priorities, it was difficult to hold the attention of the remote mapping community for very long. There is another OSM-based volunteer mapping initiative in Pakistan (i.e., MapGive-Pakistan); however, it did not seem to be involved in mapping activities for the earthquake response. (MapGive is a global initiative of the U.S. Department of State’s Humanitarian Information Unit with chapters in a number of countries.)
Conclusions and Discussion
We performed a preliminary study of the added value of big data analysis for disaster response in a data-poor setting. We analyzed two forms of big data: Twitter data, and satellite and aerial imagery. Lessons learned from this single case study are as follows.
Twitter analysis can improve situational awareness but only to a limited degree. It is most helpful in terms of information management. One can identify information sources and (networks of) stakeholders (including for example information about their donations or services offered). One is rapidly notified when organizations disclose situational overviews and assessments of the affected areas. It was a time-consuming task to retrieve eyewitness reports on casualties and damages among the many tweets. The best way to identify them turned out to be to use geofencing in a heatmap and subsequent manual classification of the geo-coded and geo-referenced tweets in the affected areas.
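The geofencing step can be sketched as a bounding-box filter over geo-coded and geo-referenced posts, producing a short queue for manual classification. The bounding box below is an illustrative rectangle and not an official delineation of the affected area; the class and function names and the sample posts are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

LatLon = Tuple[float, float]  # (latitude, longitude) in decimal degrees


@dataclass
class BoundingBox:
    south: float
    west: float
    north: float
    east: float

    def contains(self, point: LatLon) -> bool:
        lat, lon = point
        return self.south <= lat <= self.north and self.west <= lon <= self.east


@dataclass
class Post:
    text: str
    geo_coded: Optional[LatLon] = None  # GPS coordinates recorded when posted
    geo_referenced: List[LatLon] = field(default_factory=list)  # places named in the text


def in_geofence(post: Post, box: BoundingBox) -> bool:
    """True if any known location of the post falls inside the box."""
    points = list(post.geo_referenced)
    if post.geo_coded is not None:
        points.append(post.geo_coded)
    return any(box.contains(point) for point in points)


# Illustrative rectangle over the mountainous border region (an assumption).
AFFECTED_AREA = BoundingBox(south=34.0, west=69.0, north=38.0, east=74.0)

posts = [
    Post("Bridge damaged near Chitral", geo_referenced=[(35.85, 71.79)]),
    Post("Felt the tremor in Islamabad", geo_coded=(33.68, 73.05)),
    Post("Thoughts with Pakistan from London", geo_coded=(51.51, -0.13)),
    Post("No location attached"),
]

# Posts left for one-by-one manual review.
review_queue = [post for post in posts if in_geofence(post, AFFECTED_AREA)]
```

Posts without any location are dropped by this filter, which is one reason such a queue can never be treated as a complete picture of the affected area.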
Overall, social media conversations did not reflect the voices of the most vulnerable and showed only a very limited number of eyewitness reports due to lack of access to technology. This is compounded by the technical complexity of finding a very limited number of relevant tweets in the haystack. Twitter analysis run by locals will most likely yield additional key information, given that they have more direct access to the Twitter firehose of tweets from their region, understand the local languages and culture, and can cross-analyze with other local media channels.
Satellite and aerial imagery
Digital Humanitarian Networks (DHN) provide an interface between formal professional humanitarian organizations and informal yet skilled and agile Volunteer & Technical Communities (VTC). DHN created a number of guidance notes for both groups. The interfacing between these two groups has proven to work well in several disasters since digital humanitarians first contributed during the Haiti Earthquake in 2010. However, if one examines how many of the requests for collaborative mapping in the tasking manager of the Humanitarian OSM Team (HOT) come from national or local organizations, the number seems to be very small. A majority of the tasks result from requests by international NGOs and organizations. This case study clearly showed a disconnect between well-meant international data initiatives and national and local realities in Pakistan. Often such a disconnect can occur when digital teams develop tools that never get integrated into "traditional" systems, because no organizations or persons can be found that are willing to continually pursue the partnership. In Pakistan there is a more profound reason. In May 2014, Pakistan’s National Assembly passed “The Surveying and Mapping Act, 2014,” which aims to regulate the production of geo-spatial data by making the Survey of Pakistan the nodal agency. One of the objectives given for the act is to “stop unqualified/unregistered firms to take part in Surveying and Mapping activities that can pose a security risk to the state.” It states: “No public or private organization, private firm or individual, national or international, shall undertake any geospatial data collection, production or analysis work and surveying and mapping activities unless they are registered with Survey of Pakistan for such purpose as may be prescribed.” As a result, the OSM community in Pakistan has shifted its attention to mapping for crises outside of Pakistan, which is allowed.
As far as the author is aware, the digital response network in Pakistan has not yet tried to register as such or to reach an agreement with the government on whether, and which types of, mapping activities it is allowed to carry out inside Pakistan. The Pakistan NDMA acknowledged that, up to now, digital volunteerism has not been part of its strategy; however, its future inclusion may be considered. Local mappers are key for analyzing satellite imagery and for OSM. People are needed who care and are passionate about making sure their town or region is represented on the map, kept up to date, and ready before disaster happens.
It should not be forgotten that social media and satellite imagery can never replace other forms of assessments, but are complementary. Social media monitoring alone can never provide a comprehensive situational overview, especially in areas where at-risk communities severely lack access to technology. One has to understand its limitations and biases and contrast the findings with ground-truth data. One has to develop a coherent data strategy, a digital roadmap, of how to include social media in the disaster management cycle. NDMAs can play a pivotal role in developing such a comprehensive digital strategy, including a way forward for engaging with VTCs. A strong focus on Data Preparedness is essential. For both social media analysis and satellite imagery it is important to establish a baseline beforehand. For social media, this means compiling a list of important @accounts and #hashtags and running analyses to get a feel for the volume of tweets and trends on these accounts and hashtags. Authorities can promote the use of standards in social media, as OCHA did in the Ebola response and the Philippine government did with its Official Strategy on Crisis Hashtags. The national and international community has to develop new capabilities in geoinformatics, information management, and big data. National staff play a key role with their linguistic and cultural understanding of the local situation, familiarity with the local digital and non-digital media landscape, and knowledge of the local geography.
Acknowledgements: The author gratefully acknowledges Dr. Rebecca Goolsby for her very valuable and stimulating support and for generously providing access to social media analytics tooling. The author is thankful to Russell Deffner, lead of the HOT activation for the Eastern Afghanistan Earthquake; Dr. Sabina Durrani, NDMA Pakistan; Roxanne Moore, UN OCHA and DHN; and Usman Latif, OSM volunteer in Pakistan and marcom specialist, for making time available to answer questions and share their views.
 S. Vieweg, A.L. Hughes, K. Starbird, and L. Palen, “Microblogging during two natural hazards events: what twitter may contribute to situational awareness.” In Proceedings of the 28th international conference on Human factors in computing systems (2010), 1079-1088.
 Mark Graham, “Time machines and virtual portals: The spatialities of the digital divide,” Progress in Development Studies 11, 3 (2011): 211-227.
 The Data Poverty Index is determined by internet speeds, computer owners, internet users, mobile phone ownership, network coverage, and higher education.
 ITU Statistics, http://wearesocial.com/sg/special-reports/social-digital-mobile-pakistan-jan-2013, http://ez-afghanistan.de/fileadmin/content/news/Social_Media_251114.pdf; and Michael Kugelman, “Social media in Pakistan: catalyst for communication, not change,” Norwegian Peacebuilding Resource Center (NOREF), 2012.
 J.L. Williams and A. Gilchrist, “SMS Engagement in Pakistan: A Practical Guide for Civil Society, the Humanitarian Sector, and Government,” Popular Engagement Policy Lab, 2011.
 One search window consisted of: #AFG, #Afghanistan, #Pakistan, #GB, #HunzaValley, #Gilgit, #KPK, #KPUpdates, #KPKUpdates, #ANDMA, NDMA and Pakistan.
 Tina Comes, World Risk Report 2016, Social Media in Disasters.
 Muhammad Imran et al., “Extracting Information Nuggets from Disaster-Related Messages in Social Media,” ISCRAM 2013, accessed September 6, 2016, http://www.iscram.org/legacy/ISCRAM2013/files/129.pdf.
 Confusingly, the Pakistan NDMA had two Twitter handles: @ndmapk and @PakistanNDMA, with the latter counting 160 tweets since 2010. There is no Twitter account from the Afghanistan NDMA (ANDMA).
 Humanitarian Response, Afghanistan/Pakistan: Earthquake - Oct 2015, accessed September 6, 2016, https://www.humanitarianresponse.info/en/operations/pakistan/afghanista…
 International Organization for Migration, “IOM Responds to Massive Earthquake in Afghanistan, Pakistan,” October 27, 2015, accessed September 6, 2016, https://www.iom.int/news/iom-responds-massive-earthquake-afghanistan-pa….
 “NDMA response to earthquake, 29 Oct 2015,” Reliefweb, October 29, 2015, accessed September 6, 2016, http://reliefweb.int/report/pakistan/ndma-response-earthquake-29-oct-2015.
 Ali Asmat and Munir Ahmad, “Geospatial data sharing in Pakistan: Possibilities and problems,” Global Geospatial Conference, UNECA Conference Center, Addis Ababa, Ethiopia, 2013.
 International Charter Space and Major Disasters, “Earthquake in Afghanistan,” October 26, 2015, accessed September 6, 2016, https://www.disasterscharter.org/web/guest/activations/-/article/earthq….
 Wiki, “2015 Eastern Afghanistan Earthquake,” accessed September 6, 2016, http://wiki.openstreetmap.org/wiki/2015_Eastern_Afghanistan_Earthquake; and Russell Deffner, “Eastern Afghanistan Earthquake Update,” November 2, 2015, accessed September 6, 2016, https://lists.openstreetmap.org/pipermail/hot/2015-November/010384.html.
 Communication with Russell Deffner, Activation Lead Humanitarian OpenStreetMap Team (HOT) - Eastern Afghanistan Earthquake.
 “Lessons Learned Social Media Monitoring during Humanitarian Crises, a case study on Nepal,” ACAPS, 2015.
 Guidance for creating a local Digital Response Network (DRN), http://digitalhumanita
 Private communications with Roxanne Moore.
 Devirupa Mitra, “Pakistan Objects to India’s Map Bill But its Own 2014 Law Regulates Geospatial Data Too,” The Wire, May 18, 2016, accessed September 6, 2016, http://thewire.in/37028/pakistani-complaint-to-un-over-geospatial-bill-….