By Nishitha Chidipothu, Snigdha Mishra, Shiya John and Jim Samuel
“Data has the power to revolutionize and disrupt the way societies are governed” – Rikke Duus and Mike Cooray
Artificial Intelligence (AI), adaptive AI technologies, and open data are critical for future socioeconomic value creation (Samuel, et al., 2022; DeLallo and Tennison, 2020). With the promise of solving a vast array of societal and economic problems, AI initiatives have attracted huge amounts of attention and investment. It is undoubtedly worthwhile to invest in AI technologies that help advance societal agendas, yet it is also necessary to ensure stability and sustainability (Neufeld, 2021). AI projects vary in size, complexity, and scope, and the dynamics of AI value creation and returns on investment vary with each institution’s goals, culture, and needs. For processes that do not require human intervention, such as ‘AI-assisted fraud detection,’ returns could be visible within months. For processes that need human collaboration, such as ‘AI-supported customer service,’ results may take longer to observe (Fountaine, et al., 2019). Given the growing importance of ‘AI+Open-Data’ trends, this article clarifies and highlights the role of AI and open data in society, illustrates domain applications, and ends with an emphasis on the implied need for public policy.
AI has been defined as a “set of technologies that mimic the functions and expressions of human intelligence, specifically cognition and logic” (Samuel, 2021). The term ‘open data’ has been used for a few decades, and though definitions vary, there is common agreement that the concept revolves around making data, especially government data, freely available for the purposes of transparency, ethics, accelerated insights generation, and innovation (Chauvette, et al., 2019). The Canadian Tri-Agency articulates a compelling view of open data:
“When properly managed and responsibly shared, these digital resources enable researchers to ask new questions, pursue novel research programs, test alternative hypotheses, deploy innovative methodologies and collaborate across geographic and disciplinary boundaries. The ability to store, access, reuse and build upon digital research data has become critical to the advancement of science and scholarship, supports innovative solutions to economic and social challenges, and holds tremendous potential for Canada’s productivity, competitiveness and quality of life” (Government of Canada, 2016).
In accordance with the International Open Data Charter (ODC), we define ‘open data’ as being data that is made freely available for open consumption, at no direct cost to the public, which can be efficiently located, filtered, downloaded, processed, shared, and reused without any significant restrictions on associated derivatives, use, and reuse (ODC, 2015). Our definition does not restrict source, and therefore it includes but is not limited to open government data (OGD).
The combination of AI and open data is a powerful economic enabler, and citizens and organizations are benefiting from the new opportunities made possible by open data. Greater transparency and more OGD information are promoting higher quality decision making, increasing data-driven accountability, opening up new industries, and supporting innovation globally. The open data movement takes data out of the confines of the technologically privileged few and makes it freely available for all people and entities to use, reuse, and consume. Teams that bring together people from different backgrounds will be more effective in conceptualization and insights generation, and in balancing institutional priorities and business problems in decision making (Fountaine, et al., 2019). Open data can also be used for continuous improvement: “When Open Data is used for new products or services, it can increase data demand – and drive the release of more datasets and improvements in data quality” (ODT-Worldbank, 2022). This in turn leads to improved products and services.
Open data can be used in domains ranging from agriculture and environment to budget and public finance. Agricultural science has improved continuously with the development of new technologies and is a vital component of national and global economies. With the introduction of informatics and AI, the new AI+Open-Data driven agriculture is leading the way in increasing crop yields, fighting plant diseases, and improving logistics. Researchers collect large volumes of data to help resolve a broad range of challenges, including helping farmers analyze soil conditions and weather patterns. AI tools and applications also help in managing logistics and making optimal business decisions, including identifying crop sequences that maximize profit in the context of external variables (Walch, 2020; Hidalgo, 2021). For example, open-data applications in agriculture draw on “weather datasets, data on seed genetics, data on environmental conditions, and soil data” (data.europa, 2019).
Numerous open data and AI driven agriculture support initiatives have been implemented globally, including ‘Open data for local procedures’ in France, ‘Open data for Nutritional Health’ in the USA, ‘Open data for food safety’ in Europe, and ‘Open data for food security’ in Sub-Saharan Africa (Hidalgo, 2021).
Open data has also grown to be of significant importance in assessing the adverse consequences of environmental crises. Key environmental domains traditionally include “climate change, air quality, biodiversity, and water resources” (Willoughby, 2019). Open-data initiatives in the environment domain grew during the early 2000s, and they have been used to show the negative consequences of environmental dangers and to help researchers improve their understanding of potential hazards.
The synthesis and creation of multidisciplinary data using open geographic information systems (GIS) data is a strong avenue for value creation. GIS data can be combined with many other variables, such as financial information, crime statistics, agricultural information, and real estate information (Tate, 2020). Cities can use GIS data to share travel directions, updates for emergency services, events at specific places, and much more. Tate (2020) states it aptly: “providing open data in conjunction with GIS to relay this information is an essential step in the advancement of civil society.”
The importance of the healthcare domain cannot be overstated, and with the recent COVID-19 pandemic, open data has been widely acknowledged as critical for the management of public healthcare issues. Publicly available patient-related information (i.e., partially deidentified access that does not harm privacy), availability of medicines and medical care, epidemic outbreak mapping with GIS information, daily COVID-19 case counts, and social media data have proven to be very beneficial in estimating public sentiment, envisaging reopening scenarios, evaluating the socioeconomic impact of the pandemic, and predictive modeling for the COVID-19 pandemic (Samuel, et al., 2020a, 2020b; Ali, et al., 2021; Rahman, et al., 2021, 2022).
Open data on budget and public finance is vital to transparency and accountability: to nurture public trust, which in turn leads to better public engagement, governments are taking steps to disclose budgetary and financial data (OECD, 2021).
It is evident from these domain illustrations that the scope and possibilities for value creation with AI and open data are nearly boundless. However, it is necessary to address stability and sustainability issues to ensure that the promise of AI and open data to society is delivered without derailment. Our research focuses on identifying open data portals in the state of New Jersey, and on the development of best practices and public policy to facilitate positive outcomes for the combination of open data and AI. Public policy needs to address multiple dimensions of open data for AI applications, including risks, security, ethics, and value creation. Operationally, this is challenging to implement consistently because of the dynamic nature of data and changing models and perceptions of risk. Our research draws on the concept of ‘Availability-Accessibility-Usability’ from high performance computing (HPC) and posits that it is not sufficient for open data to be merely available; to be truly transformative, open data must also be accessible and easily usable for AI applications (Samuel, et al., 2022). Application of open data best practices will lead to improved outcomes. With trillions of dollars of potential value creation with open data at stake, it will be worthwhile and necessary to invest in the development of public policy to support the ‘Availability-Accessibility-Usability’ strategy for open data (John, et al., 2022). While there has been a fair amount of focus on open data and on AI separately, current ‘AI+Open-Data’ trends clearly imply that it is now necessary to develop public policy to support the powerful combination of AI and open data for the benefit of human society.
- Duus, R. and Cooray, M., The future will be built on open data – here’s why, 2016.
- Samuel, J., Kashyap, R., Samuel, Y. and Pelaez, A. Adaptive cognitive fit: Artificial intelligence augmented management of information facets and representations. International Journal of Information Management 65 (2022) 102505. https://doi.org/10.1016/j.ijinfomgt.2022.102505
- Neufeld, S. Deploying open government data for AI-Enabled Public Interest Technologies. 2021, ORF.
- DeLallo, D. and Tennison, J. “How to make the most of AI? Open up and share data.” McKinsey, 2020.
- Knapp, B. (2018). Here’s where the Pentagon wants to invest in artificial intelligence in 2019. C4ISRNET, February, 16.
- Samuel, J. (2021). A call for proactive policies for informatics and artificial intelligence technologies. Scholars Strategy Network. URL: https://scholars.org/contribution/call-proactive-policies-informatics-and
- Chauvette, A., Schick-Makaroff, K., & Molzahn, A. E. (2019). Open data in qualitative research. International Journal of Qualitative Methods, 18, 1609406918823863.
- Government of Canada. (2016). Tri-agency statement of principles on digital data management. Ottawa, Canada.
- ODC, Open Data Charter. URL: https://opendatacharter.net/principles/
- Fountaine, T., McCarthy, B., & Saleh, T. (2019). Building the AI-powered organization. Harvard Business Review, 97(4), 62-73.
- Walch, K. (2020, October 30). How AI can be used in agriculture: Applications and benefits. SearchEnterpriseAI.
- Open data in the agricultural sector. (n.d.). Retrieved October 16, 2022, from https://data.europa.eu/en/datastories/open-data-agricultural-sector
- Hidalgo, AC., 5 ways to use Open Data in Agriculture. 2021, Opendatasoft.
- OECD budgeting transparency toolkit. OECD. (n.d.).
- Willoughby, S. (2019) Open Data and the Environment. In T. Davies, S. Walker, M. Rubinstein, & F. Perini (Eds.), The State of Open Data: Histories and Horizons. Cape Town and Ottawa: African Minds and International Development Research Centre.
- Tate, L. (2020, March 24). What are the benefits of open data and GIS to the public? The GIS Blog.
- Samuel, J., Ali, G. G., Rahman, M., Esawi, E., & Samuel, Y. (2020). Covid-19 public sentiment insights and machine learning for tweets classification. Information, 11(6), 314.
- Rahman, M. M., Ali, G. M. N., Li, X. J., Samuel, J., Paul, K. C., Chong, P. H., & Yakubov, M. (2021). Socioeconomic factors analysis for COVID-19 US reopening sentiment with Twitter and census data. Heliyon (ScienceDirect by Elsevier), e06200.
- Ali, G. M. N., Rahman, M. M., Hossain, M. A., Rahman, M. S., Paul, K. C., Thill, J. C., & Samuel, J. (2021, August). Public perceptions of COVID-19 vaccines: Policy implications from US spatiotemporal sentiment analytics. In Healthcare (Vol. 9, No. 9, p. 1110). MDPI.
- Samuel, J., Rahman, M., Ali, Nawaz G. G. Md., Samuel, Y., Pelaez, A., Chong, P. H. J. and Yakubov, M (2020) “Feeling Positive About Reopening? New Normal Scenarios From COVID-19 US Reopen Sentiment Analytics,” in IEEE Access, vol. 8, pp. 142173-142190, 2020, https://ieeexplore.ieee.org/document/9154672
- Rahman, M., Paul, K. C., Samuel, J., Thill, J. C., Hossain, M., & Ali, G. G. (2022). Pandemic Vulnerability Index of US Cities: A Hybrid Knowledge-based and Data-driven Approach. arXiv preprint arXiv:2203.06079.
- Samuel, J., Brennan-Tonetta, M., Samuel, Y., Subedi, P. and Smith, J. “Strategies for Democratization of Supercomputing: Availability, Accessibility and Usability of High Performance Computing for Education and Practice of Big Data Analytics”, Journal of Big Data – Theory & Practice, 2022.
- John, S., Mishra, S. and Samuel, J., Catalyzing the Information Economy: Moving Towards Strategic Expansions of Open Data-Driven Value Creation. New Jersey State Policy Lab, 2022