Key Takeaways

Museums and cultural institutions are stewards of valuable data that is at risk of exploitation by AI companies.
Institutions are not the only owners of this data; the descendants and communities of the peoples from whom these collections originate also have ownership.
Adapting cultural competency frameworks may help protect this data.
This moment offers an opportunity to ensure all stakeholders of this valuable data are informed and compensated; collective action is critical.
Slow down: Don’t let tech companies set the pace for decisions and action.

It's no surprise that AI was a prevailing topic at AAM's Annual Meeting in Philadelphia. Even when sessions were not specifically about AI, the subject came up during Q&As or side conversations in the halls: stories of displeased visitors asking staff if certain content was AI, colleagues asking each other about their institutions' AI use, sharing tips and considerations for drafting AI policies.

One session focused on a different aspect of the impact of AI. During "Museums, Data, and Culturally Competent AI," session moderator Tiffany Lyons of Lord Cultural Resources led panelists through a discussion about how museums can inject their core values — inclusion, accuracy, and social equity — into the digital future.

Data scraping

The automated process of extracting large amounts of information from websites. AI companies use web scrapers and crawlers to collect publicly available text, images, and metadata to train their models. While some sites use robots.txt files to signal they don't want to be scraped, there is no guaranteed way to block all scrapers.
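
As a rough illustration of how a robots.txt file signals a site's preferences, the sketch below uses Python's standard-library parser to check which crawlers a set of rules would turn away. The rules, the museum.example address, and the SomeOtherBot name are hypothetical; GPTBot is used only as an example of an AI crawler's user-agent string. The check reflects what a well-behaved crawler would do, since honoring robots.txt is voluntary.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example museum site (an assumption for
# illustration, not taken from any real institution).
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /admin/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the rules permit this user agent to fetch the URL."""
    parser = RobotFileParser()
    # Parse the rules from text, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://museum.example/collections/object/123"
    for agent in ("GPTBot", "SomeOtherBot"):
        verdict = "allowed" if is_allowed(ROBOTS_TXT, agent, page) else "asked to stay out"
        print(f"{agent}: {verdict}")
```

These rules only state a request. A scraper that ignores robots.txt can still fetch the page, which is why the file is a signal and not a guarantee.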

In a way, this session was about aspects of the AI landscape that have been outside the control of individual institutions because of the speed at which AI companies have been working. Right now, organizations are having internal discussions about when, where, and whether it is appropriate to use AI tools to produce visitor-facing content and materials, or how to use these tools internally to free up staff time. But AI tools are able to do these tasks because tech companies have been training their AI models on all the publicly accessible data they can find, including information from museums' websites and collections.

Like any resource, data can be exploited. Numerous lawsuits against these tech giants argue exploitation has already occurred and is ongoing. So what does that mean for museums?

AAM session panelists Dr. Jaelle Scheuerman, co-founder of Train with Intent, and Ali Hossaini, research fellow at King's College London and co-author of the Manual of Digital Museum Planning, said that while museums may not individually have the resources to take on these companies in court, and the current lawsuits will take some time to play out, there are conversations the industry can have and actions it can take now to protect its unscraped data and prepare for future disruptions.

The value of data

To understand the value of data, Hossaini asked the audience to consider the different perspectives on a salt flat in Chile. People may begin extracting that resource, arguing that it's not worth much, but it actually contains lithium, which is incredibly valuable.

“So right now, when we digitize collections and put them online, we're essentially giving up this heritage to an extractive economy that we didn't quite realize existed, because it was new a few years ago,” Hossaini said.

The data museums hold is rich in cultural and historical information. By training large language models (LLMs) on this data, tech companies create culturally competent AI. These tech firms can then sell more sophisticated and informed AI systems to huge corporations, and those corporations in turn can improve their products and marketing.

Large language model (LLM)

A type of AI system trained on massive datasets of text to understand and generate human language. LLMs power tools like ChatGPT, Google Gemini, and Claude. Their capabilities depend directly on the quality and breadth of data they are trained on.

What is culturally competent AI?

Cultural competence, or cultural competency, is a skill that's recognized by corporations and organizations all around the world.

"It means having the capacity to adapt oneself," Hossaini said. "In other words, your bearing and your presentation, and the way that you treat another person in a way that's respectful of their positionality — in other words, respectful of their background, their culture, and their identity."

This has moral value, with benefits like supporting democracy, civil society, basic decency, and equity. But, Hossaini said, cultural competency also has commercial value. When a fast food restaurant wants to open in a new country, it researches how best to tailor its product to local tastes and how best to advertise to people in that area. That information correlates directly with financial outcomes for the restaurant.

“But let's look at what happens…when you give all the tools for cultural competency out for free to organizations that may or may not have your best interests in mind,” Hossaini said.

That's where cultural competency frameworks come in.

Sovereignty frameworks: OCAP principles

Dr. Scheuerman said growing up during a period when institutions were sharing their digital collections online was wonderful, but the rapidly shifting tech landscape has changed things.

"However, then along came the AI companies, and they sucked up all the data, and I had to step back and reevaluate my thoughts a lot on this," Dr. Scheuerman said. "Over the course of the last couple of months, I've been starting to look more into these OCAP principles from the First Nations as a way to frame how we give creators control over their data and still think about the access."

Integrating these principles may help guide the industry as it balances providing access to information as a public good with retaining control over its data.

The OCAP principles show both the challenge and the opportunity with AI, Hossaini said. Data is valuable, which holds exciting potential for museums as a financial opportunity. But extending the concept of stewardship from collections to data and metadata about these collections requires embracing joint sovereignty that enables all people to control their legacy and heritage.

"I live in London, and we all know the British Museum's position on restitution, pretty hard line, but there's no reason why they can't give joint ownership of the data and metadata around the collection," Hossaini said. This data and metadata has value that could help both museums and source communities, he added. The people "whose ancestors produced this incredible wealth, which is now in museums, primarily in the global north, need to have not just sovereignty over their data…but should also have a benefit from that marketing and licensing."

“So, I think there's a very positive thing that can happen now, because all of a sudden museums could say, ‘Look, we recognize your role in this, we recognize your legacy, and we want you to benefit from it,’” Hossaini said.

Hossaini likened it to the music industry. When someone writes a song, they can work with a collective management organization that will intervene on their behalf to prevent exploitation of the music. Luckily, museums have been slow to fully digitize their data, giving them an opportunity to answer critical questions about how to move forward: Who is going to control the data? Who is going to benefit, and who is going to get license fees from it? How can we begin these conversations with the descendants of the people who created the artifacts, both tangible and intangible?

Institutions can take steps to proactively nurture the relationship between communities and the data sets they are stewarding, Dr. Scheuerman said. With a legacy of collaborating with subject matter experts, it is a natural extension for museums and cultural institutions to create relationships between computer scientists and the data stakeholders to ensure that data is being properly protected. This could include facilitating data literacy training, so everyone understands the benefits and consequences of how data is stored.

"Then when these communities are asked to make these decisions about how the data will be made available online or otherwise, they can make that decision with a lot of knowledge and understanding of what the outcomes might be," Dr. Scheuerman said.

Building value in the new data economy

How can institutions and the data stakeholders create new revenue streams from culturally competent AI data sets? What will that look like?

Model collapse

A phenomenon where AI models trained primarily on AI-generated content (rather than human-created content) progressively degrade in quality and diversity. This makes authentic, human-curated data — like museum collections — increasingly valuable over time.

The good news is we don't need new laws, Hossaini said. Existing laws enable people to form collective management organizations and data trusts, and if they're big enough, to manage their data themselves.

The New York Times and Getty Images are recent examples of companies that felt their content was being exploited, without any benefit to them, by companies training LLMs on it.

"Well, they negotiated a licensing agreement with those companies, established some commercial baselines, and now they have another income stream," Hossaini said. "Museums can do the same, and I think it's critical for moral reasons, for all the reasons that we want to preserve and ensure that people's heritage is sustained, but it also makes sense financially, because we all need new revenue streams."

Collective action is critical

Very few museums would individually have much power to effect the kind of change needed, Hossaini said. Just as the music industry collaborated to address piracy and licensing issues, museums should band together and work toward forming collective licensing agreements.

This is a "generational project," Hossaini said, so continuing the conversations across conferences like AAM is a good idea. Hossaini also mentioned a European concept called the Knowledge Triangle, which is a partnership between a charitable institution or nonprofit, an academic institution, and a commercial institution.

The commercial partnership is not about arming multinational companies with the cultural competency to break into new markets, Hossaini said, though there is no way to fully control how data is ultimately used.

"We want to support cultural competency where people's positionality and their identity is respected and not just catered to momentarily to sell them on something they don't need," Hossaini said. "Now, it will be used for that, but if it does, at least let the person who's buying it get some benefit out of it."

Don't move at the speed of tech

One key piece of advice from this session: Don’t rush into any decisions. Dr. Scheuerman said that while it seems as if AI companies are moving quickly on purpose, museums do have time to pause and make informed decisions about their data: “How do we actually want this used? Who should have access to it? Who should have possession of it? And how can we get there in a way that aligns with our values and our stewardship, so that in the future, when we're looking back, we're not like, okay, lots of bad decisions were made because we're all forced to just run and do something as quickly as possible.”

Hossaini agreed with Dr. Scheuerman.

"Think about it, your institution might have been around for a few centuries already, but it's at least been decades, and the artifacts in it are certainly much older, so there's no real rush," Hossaini said. "I think what people need to remember foremost is, look, you own your data, and we've already established that you don't — you're not the sole owner of it, actually somebody else whose legacy is embodied in that collection also owns it, and we have sought to have dialog with these individuals and these communities who are all over the world for many, many years."

Actions and considerations for your institution:

Think carefully about what data you make publicly accessible on your website
Schedule intentional conversations about AI and data ownership at your organization
Explore the OCAP principles as a framework for data governance decisions
Consider connecting with peers at conferences and through professional organizations to discuss collective approaches to data licensing
