
Machine learning systems create rules based on the data they are given. If the data is skewed or incomplete, the rules will be fundamentally flawed. These types of data bias can sabotage machine learning models:
- Confirmation bias: outcomes confirm existing assumptions and prejudices.
- Correlation bias: misrepresenting variables results in erroneous inferences.
- Sample bias: flaws in the training set cause errors in data-driven models.
- Stereotype bias: unrepresentative training data causes false correlations in models.
- Insufficient-data bias: the sample data set is too small to produce accurate assumptions.
- Systematic value distortion: errors in collecting and organizing data lead to measurement bias.
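As a rough illustration of how sample bias might be spotted before training, the sketch below compares each group's share of a training sample with its share of a reference population. The function name, the group labels, and all of the numbers are hypothetical and chosen only for the example; a real project would need its own reference figures and its own judgment about how large a gap matters.

```python
from collections import Counter


def representation_gaps(sample_groups, population_shares):
    """Return (sample share - population share) for each group.

    A large positive value suggests the group is overrepresented in the
    sample; a large negative value suggests it is underrepresented.
    """
    total = len(sample_groups)
    if total == 0:
        raise ValueError("sample_groups must not be empty")
    counts = Counter(sample_groups)
    return {
        group: counts.get(group, 0) / total - share
        for group, share in population_shares.items()
    }


# Hypothetical training sample: one group label per record.
sample = ["urban"] * 80 + ["rural"] * 20
# Hypothetical reference shares (for example, taken from census data).
population = {"urban": 0.55, "rural": 0.45}

gaps = representation_gaps(sample, population)
for group, gap in gaps.items():
    print(f"{group}: {gap:+.2f}")
```

In this made-up sample, urban records are overrepresented by about 25 percentage points and rural records are underrepresented by the same amount, which is the kind of skew that sample bias describes.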
Technology continues to transform our world, but in itself is neither good nor bad. Whether data science and other technological innovations benefit humanity or make our lives more difficult is entirely up to us.
Data scientists are taking a lead role in applying advanced analytics tools and techniques in ways that can help people in need. The data science field plays an increasingly important role in all aspects of modern life. Like people in other professions, data scientists feel a duty to contribute their skills and knowledge to serve people in their communities and around the world.
Data analytics’ potential to find practical solutions to the serious problems that threaten the health, safety, and well-being of diverse populations increases by the day. This guide examines the people, companies, organizations, and government agencies that are using advanced data analytics to make the world a better, healthier, safer place.
What Types of Data Are Used for Social Good?
Data used to serve the public need has to be accessible to scientists without copyrights, patents, or other restrictions on its use for noncommercial purposes. Towards Data Science describes “open data” characteristics:
- The data is structured using internationally accepted classifications, such as ISO 3166 from the International Organization for Standardization (ISO).
- The data uses nonproprietary file formats, such as comma-separated values (CSV) and JavaScript Object Notation (JSON).
- The data can be accessed using standards-based communication channels, including the JSON-based Statistical Data and Metadata eXchange (SDMX-JSON).
- The data is accompanied by metadata that fully and completely describes it.
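The characteristics above can be sketched in a few lines of Python using only the standard library. The example reads a small CSV table keyed by ISO 3166-1 alpha-3 country codes and republishes it as JSON alongside descriptive metadata. The column names, the metadata fields, and every value in the table are placeholders invented for illustration; they are not drawn from any real data set, and the metadata layout is an assumption rather than a published standard.

```python
import csv
import io
import json

# Placeholder CSV content. Country codes follow ISO 3166-1 alpha-3;
# the numeric values are invented for illustration only.
CSV_TEXT = """country_code,year,example_rate
KEN,2019,0.10
EGY,2019,0.20
"""


def csv_to_open_json(csv_text, metadata):
    """Parse CSV text and return a JSON document with metadata attached."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps({"metadata": metadata, "records": rows}, indent=2)


# Hypothetical metadata describing the table above.
metadata = {
    "title": "Example indicator by country",
    "country_code_standard": "ISO 3166-1 alpha-3",
    "columns": {
        "country_code": "Three-letter country code",
        "year": "Reference year",
        "example_rate": "Placeholder value between 0 and 1",
    },
}

document = csv_to_open_json(CSV_TEXT, metadata)
print(document)
```

Because both CSV and JSON are nonproprietary formats, a table published this way can be read by almost any analysis tool without special software.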
Many types of data that are free on the internet aren’t open because their reuse is subject to copyright and other restrictions that their creators have applied. However, nonprofits and others applying data for good causes can often receive unfettered access upon request.
How Data Scientists Apply Data to Serve the Public
Bloomberg points out that for many technologists, all science is considered “good” in and of itself, yet the good of that science is often distributed unevenly. For example, machine learning in the form of algorithmic advertising has generated billions of dollars in profits for private companies, but the technology has accomplished far less in the public sector.
The goal of data science for social good is to focus the power of new analytics technologies on the serious problems faced by people who lack the “significant market power” of private firms. Rather than relying only on data gathered to address a single social issue, data scientists can apply data collected for other purposes to model problems related to public health and welfare.
- Sharing Data Resources. Nonprofit organizations such as the Data Science for Social Good Foundation provide researchers with open data sets applicable to problems related to health care infrastructure, school enrollment, air quality, and other matters of public interest, as described in Stanford Social Innovation Review.
- Creating a Holistic Data Ecosystem. Sharing data is the first step in creating an open platform to ensure that the data has the desired social impact. The data ecosystem for social research includes policies for securing the data, the skills required to perform effective analyses, and the abilities and limitations of the public organizations that are the intended beneficiaries of the research.
- International Aid Transparency Initiative. Established in 2008, the International Aid Transparency Initiative (IATI) promotes transparency and openness in the use of data resources to help developing countries. Its members include governments, multilateral institutions, private businesses, and development and humanitarian organizations that collect and use data for the benefit of at-risk populations.
Types of Data That Are Used to Promote Public Health and Welfare
Data scientists must distinguish truly open data from data shared with use restrictions. “Open Data and Public Health,” an article in the Pan American Journal of Public Health, explains that much of the data from government agencies and public health departments can’t be modified, for example. In particular, before sharing open public health data, researchers must weigh the research’s potential benefits against the risks of private health data becoming public.
- Nonprofit Data. Nonprofit organizations can apply data analytics techniques to the information they collect in the course of their operations just as private businesses do. Frye Institute of Education describes the types of data that are most useful to nonprofits, which include internal performance metrics and project management efficiency.
- Public Sector Data. Primary responsibility for applying public data in ways that benefit the public falls to the chief data officers (CDOs) of government agencies, as Deloitte explains. The data that government agencies collect relates to housing, health care, education, and national security; it includes census data, information about the workforce and employment, financial information, weather data, and geographic information.
- Organizing and Standardizing Data. Modern data analytics techniques improve social and economic forecasting accuracy, but government policymakers struggle to access the data they need for their forecast models. The journal Technological Forecasting and Social Change presents a framework to help agencies locate and apply socioeconomic data from reliable sources in formats that are compatible with their modeling architecture.

Types of Organizations That Benefit from Data Analytics
Data science has improved the refugee placement process; helped water districts in drought-stricken areas of California save money; and connected people in need to food, shelter, health care, and other programs they qualify for. The Rockefeller Foundation spearheads a group that’s committed $50 million over five years to promote the use of data science for social impact projects.
The group’s initial $20 million investment was awarded to DataKind, whose goal is to give organizations working in service to humanity the same access to advanced data analytics that large businesses have. The following organizations are among the beneficiaries of data science projects designed to help people in need:
- Immigration officials in Switzerland used an algorithm that researchers at Stanford University and ETH Zurich developed to improve the process of placing incoming refugees in neighborhoods where they’re most likely to find jobs.
- DataKind helped California’s Moulton Niguel Water District save more than $25 million by using data analytics to predict water resource demand more accurately. The improved forecasts prevented the district from having to use expensive water tankers to transport water it didn’t need.
- Benefits Data Trust helps people in need of food, housing, or medical attention link to public services. Data analytics contributed to the group processing more than 930,000 applications, representing $7 billion in benefits for individuals and families in need.
How Data Science for Good Benefits People in Need
Data scientists’ work impacts people in their communities and around the world. For example, homelessness impacts cities and towns in all parts of the world, but its causes vary from place to place. As Gartner reports, the Community Technology Alliance (CTA) applies data science to gain a better understanding of the local characteristics of each community’s homeless population and the resources available to address their food, housing, and health care needs.
Examples of how data science improves the lives of our neediest neighbors range from worldwide programs to local projects:
- The U.N. refugee agency UNHCR reports that at the end of 2019, 79.5 million people worldwide were forcibly displaced, including 26 million refugees, about half of whom were under the age of 18. UN Global Pulse worked with UNHCR to create social media campaigns that made people more receptive to serving as host communities.
- The U.N. Sustainable Development Goals (SDGs) set the standard for data science for social good. The 17 goals range from ending poverty to promoting “peaceful and inclusive societies for sustainable development.” The Act Now bot is designed to help individuals determine the best way to contribute to achieving the SDGs.
- Deloitte describes how the U.S. Department of Housing and Urban Development (HUD) is working with local government agencies to apply data science to discover the most effective approaches to combating homelessness. One example is a case management system that combines data analytics and digital technology to track, monitor, and support people as they transition through the three stages of homelessness: at risk, currently homeless, and in homes but in need of assistance to remain.
Companies Using Data Science for Social Good
In 2018, online marketing firm EveryAction released survey results showing that 90% of nonprofit organizations collected data about their operations, yet only 5% stated that their decisions were always data-driven. Many nonprofits lack the time, resources, and expertise required to benefit from the data they collect.
Many data science companies are stepping up to offer their services to nonprofits whose work benefits the public. Here are examples of companies applying data science for social good:
- Qlik sells a data analytics platform that large firms use to make their business processes run more efficiently. The company has made a commitment to provide its technology to nonprofits working toward building a more sustainable world. One Qlik project involved creating a platform that nonprofits can use to boost data literacy in their communities.
- IBM Science for Social Good brings together scientists and engineers who work for IBM Research with subject matter experts at nongovernmental organizations (NGOs), government agencies, and nonprofits. Projects include a natural language processing algorithm for the United Nations Development Program (UNDP) sustainability projects and personalized financial advice for low-wage earners via Neighborhood Trust Financial Partners.
- Mastercard’s Center for Inclusive Growth had collaborated with 55 research organizations and had participated in programs that impacted more than 1.5 million people in 30 different countries as of the end of 2019. Its efforts include a training program for micro-merchants in Kenya, another training program that converted workers in Egypt from cash to more secure digital wallets, and yet another that brought data science expertise to underserved communities in New Orleans as well as other U.S. cities.
How Tech Companies Collaborate with Public Service Organizations
From modern capitalism’s earliest days, businesses have sought to contribute to the public good, as Harvard Business Review describes. The 18th-century economist and philosopher Adam Smith wrote that our innate morality naturally compels us to create a just and harmonious society in which to live and conduct business.
In their book Social Value Investing, authors Howard W. Buffett and William B. Eimicke describe the five aspects of establishing effective partnerships among private businesses, government agencies, and nonprofit organizations working to promote public welfare:
- Create cross-sector partnerships to develop a process that helps diverse but complementary organizations coordinate their efforts and build on their comparative strengths.
- Lead collaboratively to make it easier to manage people in decentralized teams that span the various participating organizations.
- Integrate stakeholders by establishing a specific place that instills a sense of permanent community and cements long-term relationships among place-based co-owners.
- Secure financing for the project by developing financing portfolios for public data projects to diversify risk and pool available capital.
- Define success collaboratively so that the performance of social impact projects can be measured in ways that align with partners’ and stakeholders’ goals and principles.
Data Scientists at the Forefront of Data for Social Good
One of the most extensive data scientist networks collaborating on projects to benefit the public is run by The Alan Turing Institute in the U.K. Researchers affiliated with the institute work with businesses, universities, government offices, and nonprofit organizations on developing prediction algorithms to support health care decisions, the ethics of machine learning in children’s social care, and similar efforts.
The following scientists are among the leading practitioners of applying data science for social good:
- Fei-Fei Li is a Stanford University professor of computer science and founder of AI4ALL, a nonprofit that encourages diversity and inclusion in artificial intelligence (AI). She and her group work to overcome the potential for bias in machine learning and other AI algorithms that results from development teams that don’t represent the populations that the systems are intended to serve.
- Paul Duan founded Bayes Impact, a nonprofit that supports “citizen-led public services” that are designed and built by citizens for citizens. The group worked with the U.S. Department of Justice and California’s attorney general to create URSUS, a tool that examines police use of force in an attempt to restore trust between police departments and the citizens they serve.
- Sara Hooker, founder of the nonprofit Delta Analytics and a research scholar at Google Brain, has established a community of more than 90 data scientists who volunteer their services to help nonprofit organizations benefit from the application of data science to achieve their public service goals.
Resources for Data Science for Social Good
- InsideBIGDATA, “Using Data Science for Social Good” — Discover examples of data science projects intended to promote the public welfare, including org and MIT Media Lab’s Ginger, which provides 24/7 mental health support.
- Data Science for Social Good, Resources from the Data Science for Social Good Fellowship — Find tools and data sets that support data science research efforts, including source code for past projects and peer-reviewed publications.
- Inside Angle, “AI Talk: AI for Social Good” — Learn about the resources that Microsoft researcher Lester Mackey described in his presentation at the 2020 International Conference on Machine Learning.
Data Analysts for Social Good
Data scientists interested in applying their skills and experience to help people in need must approach their projects deliberately to ensure that they don’t end up doing more harm than good. Data analysts’ work in a small village in northern India demonstrates the steps entailed in successfully completing a public service project, as Towards Data Science explains.
- Before determining the problems that the villagers faced, the team had to get to know the people of the village and their way of life.
- The researchers had to make sure that their work wouldn’t unduly disrupt the villagers’ daily routines.
- The team used two frameworks to observe and interpret the environment, activities, and interactions of the villagers.
- The team developed several maps and diagrams to identify problems the villagers experienced that data analytics could alleviate.
- Only after completing the preparatory steps did the team start collecting and analyzing the data. The result was improved access to clean water for families in the village.
Volunteer with a Public-Focused Data Science Program or Organization
Many organizations conduct programs using data analytics techniques to solve problems that affect community members. Among the most popular organizations that recruit volunteer data analysts for social good projects are the following:
- DataCorps recruits data scientists to volunteer as members of teams working on long-term projects for nonprofit organizations whose efforts contribute to the public welfare. The teams include a project manager, a data ambassador, and two DataCorps data experts who are joined by a representative, a project champion, and two data specialists from the partner organization.
- The Digital Humanitarian Network (DHN) raises awareness among nonprofit organizations and public agencies about the growing number of technology organizations dedicated to helping people in need. While the group no longer directly operates projects promoting data science for social good, it continues to offer humanitarian organizations access to sources for data analytics expertise.
- Code for America brings together technologists, government data experts, and social justice activists to work on projects designed to “help government work for the people who need it most.” Among the tools that Code for America has helped create is Clear My Record, which is an app that helps people seal or clear their criminal records after a period of crime-free living.
- Thorn, which Ashton Kutcher and Demi Moore cofounded in 2012, helps combat child sex trafficking. Law enforcement agencies in all 50 states and Canada use the organization’s Spotlight tool to identify human trafficking victims and assist in investigating and prosecuting traffickers.

Participate in a Data Analytics Competition
Data scientists looking for ways to contribute their skills to benefit the public commonly participate in one of the many coding contests, or “hackathons,” that private companies and organizations sponsor. The goal of the contests is to devise solutions that public agencies and nonprofits can apply to better serve the populations in need that they target.
The following are examples of groups that sponsor data analytics competitions:
- Kaggle provides data scientists with free access to more than 50,000 public data sets and 400,000 public notebooks for running machine learning code. Kaggle-sponsored competitions include analyzing data compiled by the global nonprofit CDP to identify key performance indicators (KPIs) that relate to environmental and social issues.
- International Data Analysis Olympiad (IDAO) sponsors an annual event in which teams of data scientists compete to create machine learning models and resource-efficient algorithms that address real-world problems. The 2020 contest is nearing its final round; it entails using simulation data to predict the position of space objects to protect orbiting satellites.
- DrivenData combines data science and crowdsourcing to create competitions that address the most serious social challenges that people face around the world. The competitions typically last from two to three months and call for creating the most efficient statistical model to use for solving difficult predictive problems.
Resources for Data Analytics for Social Good
- Data Science for Social Good Foundation, Projects — Browse dozens of data science projects sponsored by public agencies and nonprofit organizations, such as the World Resources Institute, the UNICEF Office of Innovation, and Covid Act Now.
- HData Systems, “How Will Data Science Help Foster the Society?” — Learn about programs that support the use of data analytics, machine learning, and other technologies to help nonprofits and government agencies.
- The Bridgespan Group, Stories of Impact — Discover the many ways that data science volunteers have contributed to solutions that directly address the effects of climate change in the U.S. and around the world.
Tenets for Ethical Use of Data for Social Good
No single set of data ethics principles applies to how tech companies and other organizations use and protect the data they collect, store, analyze, and share. The Conversation found that many large tech firms have no data ethics guidelines of their own. In place of homegrown ethics guidelines, the companies rely on toothless third-party ethics initiatives. As a result, ethics violators face no significant consequences. Tech giants that have developed their own ethical principles relating to AI operations include Google, Microsoft, and IBM.
The ethical use of data encompasses five areas:
- Privacy acknowledges that the private information that customers share with data collectors becomes the property of the collectors, but the collectors have a responsibility to respect customer confidentiality.
- Governance addresses accountability in ensuring data accuracy and quality and the ethical use of algorithms.
- Fairness requires that the data be treated with consideration and respect for the individuals associated with the data. The data must never be used in a way that discriminates against or marginalizes community members.
- Shared benefit means the people who are the source of the data retain some control over its use and have a right to expect that use of the data will benefit them in some way.
- Transparency implies that organizations will be open about how they collect and use the data and that they collect no more data than necessary for their immediate purposes. This is the area in which many tech firms balk at compliance.
Fairness, Accountability, and Transparency in Machine Learning
In the course of training machine learning algorithms, developers sometimes transfer their biases to the data used to train the systems, as the UX Collective explains. This can cause the resulting machine learning engines to discriminate against segments of the community, which helps perpetuate systemic injustice. The problem will persist as long as women and minorities remain underrepresented in the tech fields responsible for designing machine learning systems.
One problem in removing bias from machine learning algorithms is that if the systems are too transparent, they become easy to “game,” so their results can be skewed to favor certain parties unjustly. Some AI researchers have concluded that the level of social responsibility that a machine learning system demonstrates should be based on how it’ll be used. For example, machine learning tools used to distribute education, employment, police protection, health care, and other social benefits require a higher level of ethical accountability than those designed to identify the ads a person sees while browsing the web.
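One simple way to look for the kind of discrimination described above is to compare how often a system produces a favorable decision for each group. The sketch below computes per-group selection rates and the ratio of the lowest rate to the highest. The function names, the group labels "A" and "B", the decision records, and the 0.8 review threshold are all assumptions made for illustration; this is only one of many possible fairness checks and is not a method prescribed by the sources cited here.

```python
from collections import defaultdict


def selection_rates(decisions):
    """Compute the share of favorable decisions for each group.

    decisions: iterable of (group, approved) pairs, where approved is a bool.
    """
    totals = defaultdict(int)
    approved = defaultdict(int)
    for group, ok in decisions:
        totals[group] += 1
        approved[group] += int(ok)
    return {group: approved[group] / totals[group] for group in totals}


def disparity_ratio(rates):
    """Return the lowest selection rate divided by the highest."""
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # No group received a favorable decision.
    return min(rates.values()) / highest


# Hypothetical decision log: group A approved 6 of 10, group B 3 of 10.
decisions = (
    [("A", True)] * 6 + [("A", False)] * 4
    + [("B", True)] * 3 + [("B", False)] * 7
)

rates = selection_rates(decisions)
ratio = disparity_ratio(rates)

REVIEW_THRESHOLD = 0.8  # Assumed cutoff for flagging a system for review.
print(rates, ratio, "needs review" if ratio < REVIEW_THRESHOLD else "ok")
```

In this made-up log, group B is approved half as often as group A, so the ratio falls below the assumed threshold and the system would be flagged for closer review. A higher-stakes use, such as distributing health care or employment, might justify a stricter threshold than an ad-selection system.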
Algorithmic Accountability
Algorithmic decision-making is regularly applied to processing job applications, distributing social services, and determining the type of information that people view when they visit websites. When these algorithms lead to decisions that are discriminatory or inequitable, people can be denied opportunities and benefits unfairly.
The Algorithmic Accountability Act of 2019 attempts to address this source of potential bias by requiring that large tech firms assess the impact of “high-risk automated decision systems,” as the Center for Data Innovation explains. However, the legislation doesn’t clearly define the types of systems it would apply to, nor what constitutes a “significant risk” to consumer data privacy.
Communication Director presents three principles for managing algorithmic accountability:
- Facilitate access to continuous debate to ensure that anyone negatively affected by the algorithm can participate in addressing and eliminating the bias.
- Help people understand the issues at stake in identifying bias in algorithms. This task becomes more challenging as algorithms become more complicated.
- Include all arguments in the discussion to gain as many relevant viewpoints as possible. Algorithmic harms frequently result from how people are categorized, potentially leading to stigma. The only way to avoid this built-in bias is by allowing all people affected by the algorithm to participate in its development.
Data Science Association Code of Conduct
Various data science organizations have attempted to establish a code of ethics for the field, but as Towards Data Science notes, reaching a consensus among data scientists as to the principles’ scope and their relationship to an individual’s own value system is difficult. Among the groups that have formulated a code of ethics for data scientists are the Association of Data Scientists and the Data Science Association, whose code of conduct covers eight areas:
- Demonstrating competence, including knowledge, skill, thoroughness, and preparation
- Clarifying the scope of services provided to clients and committing to meeting client objectives
- Maintaining regular communication with clients and keeping them fully informed
- Protecting the confidentiality of client information
- Avoiding conflicts of interest
- Honoring duties to prospective clients of fairness and openness
- Adhering to full disclosure to clients about the quality of data and evidence
- Avoiding misconduct, including fraud, deceit, misrepresentation, and prejudice
Resources for Ethical Use of Data for Social Good
- Data & Society, “Algorithmic Accountability: A Primer” — Delve deeper into the ethical considerations that must be addressed when designing decision-support algorithms.
- Towards Data Science, “Doing Data the Right Way: The Ethics of Data Science” — Learn more about murky ethical areas, such as informed consent, data ownership, and responsibility for data validity.
- Deon, An Ethics Checklist for Data Scientists — The Deon command-line tool automates the process of adding an ethics checklist to data science projects.
The Growing Importance of Data Science to Public Health and Welfare
The same advanced data analytics tools that have driven the growth of large tech firms hold great promise for delivering public services effectively and efficiently to the people who need them. Data scientists possess valuable skills and the deep-seated desire to apply those skills to benefit their communities and the world at large. Their contributions are limited only by their imagination.
Infographic Sources:
Explorium, “Data Bias and What It Means for Your Machine Learning Models”