Big Digital Tech Moves Into Synthetic Biology: The Generative AI Rush / Black Box Biotech P1/2

Yves here. If you are the sort who has reservations about GMOs, you really won’t like synthetic biology, particularly the AI “we don’t really know how we got there” kind. GMOs have reduced diversity while also reducing fitness (as in less robustness to threats other than the one that was the target of the genetic modification).

By Lynn Fries. Originally published at GPENewsdocs

Jim Thomas says that profit-driven generative biology, Big Tech’s integration of artificial intelligence with synthetic biology, raises serious challenges for global oversight of biotechnology, and that governments need to separate hype from reality at the upcoming 2024 UN Biodiversity Conference (CBD COP16).

October 4, 2024 / Produced by Lynn Fries / GPEnewsdocs

Transcript

LYNN FRIES: Hello and welcome. I’m Lynn Fries, producer of Global Political Economy or GPEnewsdocs.

Today’s segment will focus on AI Generative Biology and why this new approach to biotechnology raises serious challenges for the Convention on Biological Diversity. It does so in featured comments by Jim Thomas. A key message is that the integration of artificial intelligence with synthetic biology under the control of the world’s largest and best capitalized companies is a recipe for real problems down the line.

Problems like unexpected impacts on health, economies and biodiversity in the longer term, once a greater variety of engineered proteins makes it into the market, our bodies and the biosphere.

As argued by Jim Thomas in his report DNAI The Artificial Intelligence / Artificial Life Convergence: the world’s regulators and governments have the chance to apply the precautionary principle before the number of novel protein entities entering the biosphere starts to mimic the toxic trajectory of synthetic chemicals. It was that trajectory which led scientists to report in early 2022 that humans had breached the safe ‘planetary boundary’ for novel chemical entities in the biosphere.

To précis points made by Thomas, it was from the enormous challenge of trying to deal with the negative effects of unassessed, poorly understood synthetic chemicals that the Precautionary Principle was first established in environmental governance.

This principle roughly states that it is appropriate and prudent to take early action to prevent, regulate and control an emerging threat before we have all the data needed to conclude on its exact nature. The precautionary principle was crafted precisely to try to prevent highly disruptive technological developments from running into widespread application ahead of proper oversight and governance.

The precautionary principle is enshrined in the UN Convention on Biological Diversity and the Cartagena Protocol on Biosafety. It is this core principle that has guided the Convention on Biological Diversity in meeting its ecological and socioeconomic objectives including assessment of potential threats to livelihoods, the sustainable use of biodiversity and ethical and cultural considerations.

Since it was established in 1992 at the UN Rio Earth Summit, the Convention on Biological Diversity and its protocols have served as the world’s premier global instrument for oversight of biotechnology. The vast majority of the world’s governments have adopted the Treaty and are called the Parties to the UN Convention on Biological Diversity. And when these Parties meet, that’s called the Conference of the Parties.

And this is the supreme decision making body of the Convention on Biological Diversity. The next Conference (COP for short) will be held from 21 October to 1 November in Cali, Colombia. To further clarify all this, when you hear reference to COP 16 of the CBD, what that’s referring to is the 16th meeting of the Conference of the Parties to the Convention on Biological Diversity.

In the lead up to the upcoming CBD COP 16, the African Centre for Biodiversity, together with Third World Network and ETC Group, produced Black Box Biotech, a new report on generative biology. The report was researched and written by Jim Thomas and serves as a briefing paper ahead of CBD COP 16.

Under the auspices of the African Centre for Biodiversity, Third World Network and ETC Group, the report was published on September 3, followed by an online webinar on September 12th as a further briefing ahead of the CBD’s COP 16. Titled Black Box Biotech, the online briefing featured five speakers and was moderated by the African Centre for Biodiversity.

The online briefing announcement provided an overview as follows:

The United Nations Convention on Biological Diversity has for 30 years governed new developments in biotechnology in the frame of precaution and justice and has also recently established a process of technology horizon scanning, assessment and monitoring of new developments. Now there is an industrial attempt to converge next-generation genetic engineering tools (synthetic biology) with generative AI (of the sort used by ChatGPT) in a new “generative biology” industry.

On the agenda of the Black Box Biotech briefing was:

Why the Convention on Biological Diversity’s expert group proposes an urgent assessment of this newest AI-biotech convergence

How the use of generative AI in biology brings thorny new problems stemming from the opaque and error-prone ‘black box’ character of AI

How the world’s largest digital tech companies (including Google, Microsoft, Amazon and NVIDIA) are fueling a ‘generative biology rush’ including a bold biopiracy grab of all the world’s digital sequence information on genomic resources.

What can be done at COP16 in Cali, Colombia?

This segment, as noted at the open, will feature comments on all this by Jim Thomas. Of special relevance to today’s segment on generative biology, Jim Thomas was a member of the Multidisciplinary Ad Hoc Technical Expert Group (mAHTEG) on Synthetic Biology established by the Conference of the Parties to the UN Convention on Biodiversity. He has almost three decades of experience tracking emerging technologies, ecological change, biodiversity and food systems – on behalf of movements and in UN fora.

Jim Thomas’s home page is www.scanthehorizon.org where he posts on his current engagements as researcher, writer, and strategist. Prior to this, he shared the work of the ETC Group where he was Co-Executive Director and Research Director.

LYNN FRIES: We go now to our featured clip.

SABRINA MASINJILA: Now without further ado, I would like to now welcome our first speaker, Jim Thomas, whose bio has already been posted in the chat. We won’t read bios because of time. Now, Jim, welcome. And please share your screen and start off with the presentation.

JIM THOMAS: Great and I hope everybody can see my slides. As Sabrina says, it’s my task here to present this new briefing, Black Box Biotech, which is a sort of short introduction to the question of the integration of artificial intelligence and synthetic biology, what’s being called generative biology.

And I really do want to thank the African Centre for Biodiversity, who really showed great foresight in commissioning this work along with ETC Group and Third World Network, and also the reviewers, particularly Dr. Maya Montenegro of UC Santa Cruz and Dr. Dan McQuillan from Goldsmiths College in London.

One thing I would really like to emphasize is that this report is just a briefing. It’s an introduction. It’s a sort of a preliminary scan of the issues that are raised.

And that’s because there isn’t yet a significant deep-dive report that has looked at the many policy issues, equity issues and sustainability issues that are now raised by these new developments in so-called generative biology, and there urgently needs to be one.

Such a study needs to happen under the aegis of a trusted international body such as the Convention on Biological Diversity. And luckily, that’s exactly the option that’s in front of the Conference of the Parties, the 16th Conference of the Parties in Cali next month.

The option to actually commission an in-depth assessment of the potential impacts of the integration of artificial intelligence and machine learning into synthetic biology. This is something that’s urgently needed. This technology, this integration, is moving very fast into commercial use.

For those who may not know some of these terms, synthetic biology is the term that’s used to broadly describe the next generation of techniques and approaches to genetic engineering. It tends to refer to more experimental approaches and newly emerging technologies. And honestly, it’s very often a sort of hype term that mobilizes capital and investment but also research agendas.

And the underlying concept behind synthetic biology is to try to make the somewhat messy world of biology more predictable, more rational, more of an engineering substrate, programmable even.

And so there are a lot of metaphors in the field of synthetic biology – like programming DNA as code, or life as machines – which are strong, powerful metaphors but are problematic in that they obscure some of the complexities and messiness of the world.

And of course, because this is a term that’s often used for mobilizing money, there’s a tremendous amount of hype. We’re talking about techniques such as genome editing, synthesizing new DNA and RNA or proteins, and so forth.

If the field of synthetic biology is full of hype and obscuring metaphors, then even more so the topic of artificial intelligence. And I think it’s important to recognize that artificial intelligence as a term covers a whole basket of computational technologies used for data analytics, forecasting, natural language processing, and so forth.

And most importantly, what we’re not talking about here is the science fiction version of artificial intelligence. This is not thinking machines, intelligent computers, computer sentience.

What today passes for artificial intelligence is sets of computation that calculate probabilities and then make sort of predictions. And they’re often trained on extremely large sets of data that are then interrogated in order to make these kinds of predictions.

There are different types of artificial intelligence. I’m not going to talk about traditional AI. Discriminative AI is the sort of AI system that takes large amounts of unstructured data. And is able to look within it and sort of recognize patterns. For example, to look at pictures and recognize that there’s a cat there.

And generative AI, which is much of what we’re going to be talking about, also depends upon taking large sets of data. And then building a model which can generate similar types of data, basically.

This is the kind of AI that does not just recognize a cat, but will draw you a picture of a cat or write you an article about a cat. It creates synthetic data in a predictive way, rather in the way that your phone will make predictive text. It will try and work out what you want and present it to you.
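To make that predictive-text analogy concrete, here is a minimal toy sketch in Python of the underlying idea: count which character tends to follow which in a tiny sample text, then generate new text by sampling from those counts. It is purely illustrative; real generative AI systems are vastly larger neural networks, and the corpus and function names here are made up for the example.

```python
# Toy illustration only: a character-level bigram model, a vastly simplified
# stand-in for the "calculate probabilities, predict the next token" idea
# behind generative AI. Nothing here resembles a real large language model.
import random
from collections import defaultdict

corpus = "the cat sat on the mat. the cat ate the rat."  # hypothetical tiny training text

# "Training": count which character tends to follow which.
counts = defaultdict(lambda: defaultdict(int))
for a, b in zip(corpus, corpus[1:]):
    counts[a][b] += 1

def generate(seed: str, length: int = 40) -> str:
    """Generate text by repeatedly sampling a probable next character."""
    out = seed
    for _ in range(length):
        followers = counts.get(out[-1])
        if not followers:
            break
        chars, weights = zip(*followers.items())
        out += random.choices(chars, weights=weights)[0]
    return out

print(generate("th"))  # plausible-looking but meaning-free synthetic text
```

The point of the sketch is simply that the output is a statistical guess shaped by whatever data went in, which is why the training data and the opacity of the model matter so much in what follows.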

The reason why generative AI is so much to the fore is that there is a massive investment boom right now around these generative AI systems, really sparked by the release of ChatGPT at the end of 2022.

Now we see hundreds of billions of dollars being sunk into the generative AI space with the hope by investors that they’re going to get very real outcomes here.

So far, we’ve seen Goldman Sachs and others point out that these hundreds of billions of dollars are really not yielding very much. And so there’s a hope that by moving into synthetic biology and other areas, they will yield a bit more.

In the report, we lay out four different areas where artificial intelligence is combining with synthetic biology and biotech. And I’m going to focus mostly on the first of these: generative biology, the use of artificial intelligence for bio-design.

And what we’re talking about here is asking an artificial intelligence model to come up with new strands of DNA, or new protein sequences, that might not have existed before. And I’ll talk more about that in a moment.

But it’s worth noting that the use of artificial intelligence can also improve machine vision, laboratory processes or fermentation in bio-production, and that we’re seeing, for example in digital agriculture, the combination of living organisms being modified alongside AI. And that’s what we call bio-digital convergence.

And we’re even seeing artificial intelligence computation being carried out in an experimental way in living cells, for example brain organoids. So there’s a bio-computation part to this.

Biodesign and generative biology, which is mostly what we’re going to be talking about, sits on a very simple idea. If you’ve used ChatGPT, you’ll know that that’s an AI model which is trained on millions, even billions of pieces of text. Such that you can say to it: write me a poem about a dog. And it will write you what looks like a poem about a dog.

Or you can get something like MidJourney, and you can say: draw me a picture of a dog. And it will draw on the millions of images that it’s been trained on to draw you a picture of a dog.

And so the idea is you can take one of these generative AI models and train them on millions of pieces of digital sequence information about genetic resources, on DNA and RNA, and then you can also say: design me a protein. And as Jason Kelly here of Ginkgo Bioworks puts it: The idea is to make an AI model that can speak protein or speak DNA, just like ChatGPT speaks English.
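To illustrate, in the crudest possible terms, what “speaking protein” means, here is a small Python sketch that samples amino-acid letters from frequencies observed in a few made-up training sequences and emits a “novel” sequence. The training sequences and names are hypothetical; real generative biology models are deep neural networks trained on enormous sequence databases, and whether any generated sequence actually folds or functions is exactly the question the briefing goes on to raise.

```python
# Toy illustration only: "design me a protein" reduced to its simplest form.
# A real system learns deep statistical structure from huge sequence databases;
# this sketch just samples amino-acid letters from observed frequencies.
import random
from collections import Counter

# Hypothetical stand-in for digital sequence information used as training data.
training_sequences = [
    "MKTAYIAKQR",
    "MKVLGAYIAD",
    "MRTAYLAKQG",
]

# "Training": tally how often each amino-acid letter appears in the data.
residue_counts = Counter("".join(training_sequences))

def design_protein(length: int = 10) -> str:
    """Emit a novel sequence by sampling residues from the training frequencies."""
    letters = list(residue_counts.keys())
    weights = [residue_counts[a] for a in letters]
    return "".join(random.choices(letters, weights=weights, k=length))

novel = design_protein()
# A "new" sequence, almost certainly never seen in the training set; nothing
# here says whether it would fold, function, or be safe.
print(novel, novel in training_sequences)
```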

The poster child for this, the sort of proof of principle, is a very high profile program called AlphaFold, developed by DeepMind, which is Google’s AI section.

And in about 2017, 2018, AlphaFold was trained on many thousands, ultimately hundreds of thousands, of protein sequences, the sequences of amino acids that then fold into living proteins.

And by 2021, DeepMind were claiming that AlphaFold could work out how every single protein that is known, every protein sequence, folds into actual proteins. And this solved, supposedly, what is called the protein folding problem in biology.

And AI was able to do what takes many years for a human scientist to do. This was really held up as a major leap forward for big science and for AI-driven biotechnology.

The excitement over Google AlphaFold is also about the fact that now we can get an AI to begin to control or predict the living world at the molecular level.

But it’s worth raising a bit of a red flag here. While there’s a lot of excitement at the lab bench, protein scientists are saying: well, wait a minute. These are just predictions, as is true of much of AI. They have to be checked.

And in fact, we’re seeing a large number of errors in what AlphaFold is predicting. And limits and even hallucinations, which I’ll come to soon.

And so AlphaFold, even though it’s held up as this wonderful example of using AI to solve biological problems, actually was very much over-claimed. And this is something we’ve seen, whether it’s with gene editing or with generative AI: instant over-claims, and then a need to row back on them.

One metaphor that we’ve leaned on heavily in this briefing, and it’s important to understand if you’re not familiar with debates around artificial intelligence, is the concept of the black box.

Basically, the black box problem, which is much discussed in AI policy, refers to the fact that when you train an AI model and then it makes outputs and decisions, it does so sort of hidden away in a black box.

It’s not possible to understand why it made the decisions it made. It’s just simply too complex. And the black box problem of not having explainability, this opaqueness, causes real problems for policy and, in this case, for outcomes.

It means that humans have been cut out of the loop on decision making and deciding why particular genetic sequences are used. It has serious implications for safety and accountability and traceability. And I’ll come back to that in a moment.

Another common topic in AI policy that’s very relevant here is the notion of hallucinations. This is when you have an AI model, for example, supposed to produce images, and those images look okay. But when you look closely at them, you find that they have all sorts of errors or AI text that is full of errors.

So here we have an old man who, when you look closely, has an impossibly long right arm and six digits on his hand, being impaled by a unicorn. This is because of a hallucination by the AI image system.

Analysts have estimated that AI systems will hallucinate about a third of the time. And about half the time there’s some kind of error within their results. This is quite significant.

It’s significant when you’re talking about text and images. It becomes extremely important if you’ve got hallucinations occurring in living organisms or within biological molecules.

Scholars have pointed out that this isn’t the system not working properly. This is the system working properly. This is actually baked into how generative AI works.

And [scholars] have suggested that generative AI systems should be scientifically classified as bullshit machines. They just make something that kind of looks right but they’re not fundamentally interested in finding truth.

There are also very serious issues around bias that we refer to in the report. And that has to be brought into considering the use of AI for building synthetic organisms.

In the report, we touch on some of the biological molecules that AI systems, generative AI systems, are now being asked to generate. These are novel biological molecules.

New synthetic proteins that would never have existed before in nature, new strands of DNA and RNA. And each time, the decision making on how to order those genetic codes or those protein codes is hidden. It’s hidden in the black box.

There are also companies that are building new gene editing proteins. People will be familiar with CRISPR-Cas9 as a system. But there are companies like Profluent who are now creating new AI-generated gene editing systems.

Or indeed ways of changing the epigenetics, things like histone modification, the ways in which genetic systems express themselves. That’s also being redesigned through artificial intelligence.

One of the things we focus on in this short report is how much Big Tech, Big Digital Tech is embracing this shift to putting together artificial intelligence and synthetic biology.

A recent high profile book on artificial intelligence by Mustafa Suleyman, who was in fact one of the founders of DeepMind and is now the head of Microsoft AI, really is focused on this question of how artificial intelligence and synthetic biology are creating, as he says, one of the most profound moments in history.

Well, that’s a lot of hype, but it is significant. And we’re seeing some clusters of work by very large digital tech companies.

Google, of course, because of their work on DeepMind and AlphaFold, have their own generative biology company called Isomorphic Labs, but are also working with probably the leading synthetic biology company, Ginkgo Bioworks, to produce synthetic versions of flavors and fragrances and food ingredients.

Microsoft has their Microsoft GPT platform and OpenAI, which is largely owned by Microsoft, is working with Los Alamos lab on these issues.

Amazon is also working with a generative biology company called EvolutionaryScale, and they have a model called ESM3. But interestingly, the Bezos Earth Fund, which was set up by Amazon founder Jeff Bezos, has put a hundred million dollars into using artificial intelligence for climate and nature, largely focused on synthesizing proteins for food.

And then there are other companies, such as NVIDIA, Salesforce, Meta, Tencent and Alibaba. These are literally the world’s largest and best capitalized companies, and they are all going fully into this area.

So I wanted to end by touching on five urgent challenges that this commercial rush into generative biology raises.

I was part of the Multidisciplinary Ad Hoc Technical Expert Group on Synthetic Biology [mAHTEG of the Conference of the Parties to the Convention on Biological Diversity] that began to look at this topic earlier this year. And very quickly, the issues that that group started to identify were about biosafety.

Of course, if you’re producing new DNA strands and so forth that have hallucinations in them, then we have a worry about safety.

But in fact, many on that group who were biosafety assessors pointed out that, as assessors, if the decision making over how to make these new proteins and DNA is done in a black box, they have no data to work with to do safety assessments, and that’s very significant.

So the black box is obscuring the ability to do biosafety assessments.

Military planners have also pointed to something called the pacing problem. Briefly, in the same way that we’re now seeing large amounts of synthetic pictures and text coming out of ChatGPT and overwhelming the internet, what happens when we start to see large amounts of synthetically produced, artificial-intelligence-designed living organisms that may overwhelm biosafety regulators?

Probably the issue that I found most concerning, however, was about biopiracy. As I’ve mentioned in passing, in order to have these models, you first have to train them on massive amounts of what’s called digital sequence information.

Take a company like NVIDIA. Here there’s a quote from NVIDIA saying that for their GenSLM platform they took all the DNA and RNA data for viruses and bacteria, about 110 million genomes, learnt a language model over that, and can now ask it to generate new genomes, for profit.

That’s a massive utilization of digital genetic sequences. And because this is all done in a black box, there’s no traceability back to which sequences are being drawn on to create the new genomic sequences, protein sequences and so forth. You’ve lost traceability. You have this massive utilization for commercial uses.

And this gets away from the core principle that the Convention on Biological Diversity has worked on: the idea of Fair and Equitable Access and Benefit Sharing. The idea that we trace where genetic sequences come from, and then, when they’re used for commercial and other purposes, their benefits go back to the original stewards of that biodiversity. By having this in a black box, that’s lost.

And just to emphasize, every single one of these models requires massive amounts of data, the training on those genetic sequences. And that massive amount of data is increasing.

We now have companies, I think there may be people here from Basecamp Research, who are now trying to get new genetic sequences to enlarge the amount of data going in. So this is very important.

We’re going to hear a lot of promises about the idea that the use of generative biology could create new drugs, new proteins for so-called sustainable foodstuffs, for sustainable biomaterials, and that this may help the sustainable use of biodiversity, reduce fossil fuel use and so forth.

But that has to be put in context. These systems, these generative AI systems, are incredibly energy hungry. The computation required at the moment is using up electricity on the scale of, say, the whole country of Sweden.

There is also massive water use, of course, with water being extracted from agricultural systems and so forth. And of course, massive use of minerals: silicon, copper, and so forth.

So, maybe at the end of the day we find that use of these systems actually puts too much pressure on biodiversity. There are also concerns about how you can not just produce new plastics or new drugs but also new viruses and new toxins. And this has to be dealt with through the Biological Weapons Convention.

And finally, one of the core tasks of the Convention on Biological Diversity is the commitment to respect, preserve, and maintain the knowledge, the innovations, and practices of indigenous and local communities.

Much of what is being promised on the promises side is the ability to make new sweeteners, new proteins, new flavors, new fragrances. And these will, you know, directly replace those that have been stewarded, grown and looked after by indigenous peoples and local communities, and change the underlying economies that these communities depend upon.

I’m going to leave it there. Others will have important things to say. I really encourage you to look at this new report on the ACBio website (acbio.org.za). Thanks very much.

FRIES: So that gives us a picture of the generative biology rush and the serious challenges it raises for the UN Convention on Biological Diversity over the long term, and immediately this October 21 to November 1 at COP 16.

Will the current rush of big technology companies, investment and hype around generative biology and the use of AI to boost biotechnology production have its way over biotech and biodiversity policy making at the CBD COP 16?

Or will Parties at COP 16 of the CBD follow the recommendations of the Convention on Biological Diversity’s technical expert group? By which, as put in the report: Parties have a very clear and straightforward opportunity to prevent the issues of generative biology from upturning and hollowing out the decades of good work that the Convention on Biological Diversity has led to help establish good governance over biotechnology.

This as the CBD needs to find answers to at least the following questions:

1. Does AI generative biology undermine access and benefit-sharing arrangements of the Nagoya Protocol and the governance of Digital Sequence Information / DSI?

2. Does AI generative biology undermine biosafety arrangements of the Cartagena Protocol on Biosafety?

3. Does AI generative biology pose biosecurity/bioweapons risks?

4. Will the integration of AI with SynBio improve or worsen health and sustainability?

5. What are the implications of AI/SynBio integration for traditional knowledge and practices?

With the benefit of this briefing in the lead up to COP 16 of the CBD, hopefully we will all now be able to separate hype from reality in news and analysis of all this. Especially as the outcome of this 16th Conference of the Parties to the Convention on Biological Diversity becomes a news item with the conclusion of COP 16 on Friday, November 1st.

This concludes today’s segment from GPEnewsdocs. Thank you for your interest.

This video documentation has been adapted from the original recording of the September 12th, 2024 online briefing, courtesy of the African Centre for Biodiversity.

BIOs

Sabrina Masinjila has worked as a research and advocacy officer for the African Centre for Biodiversity (ACB) for the past decade, with a focus on biosafety, seed systems and agricultural biodiversity.

Jim Thomas is a writer, researcher and strategist whose current work can be followed at www.scanthehorizon.org. Formerly, he shared the work of ETC Group where he was Co-Executive Director and Research Director. Thomas has decades of experience tracking new trends, emerging futures and developments on the policy horizon in technology, biodiversity, food and justice. He served as member of the Multidisciplinary Ad Hoc Technical Expert Group (mAHTEG) on Synthetic Biology established by the Conference of the Parties to the UN Convention on Biodiversity.


One comment

  1. PixelNomad

    Yves, you’ve hit the nail on the head—this generative biology rush feels like we’re fast-forwarding into a future where we’re not even sure of the script anymore. The parallel to Jurassic Park is perfect, except this time, the “wildlife” being created are proteins and DNA sequences that don’t even have a natural precedent, thanks to AI systems that, as you rightly point out, hallucinate. We’re looking at a black-box operation where not even the people running the show fully understand the outcomes, but the hype machine is already in full swing.

    AlphaFold, once heralded as the solution to the protein folding problem, is emblematic of this. Yes, it was groundbreaking in theory, but the reality is messier. When you’re dealing with life at the molecular level, “good enough” predictions just don’t cut it. The moment we start seeing errors or hallucinations in how these AI models generate biological data, the stakes aren’t just high—they’re existential. This isn’t about producing faulty code in software; it’s creating biological materials that could interact unpredictably with ecosystems and even human health.

    Then there’s the sheer energy and resource drain from all this. We’re talking about AI models that require massive computational power—comparable to the electricity consumption of entire nations—while these same companies wave the sustainability flag. How sustainable is it, really, when your AI needs as much juice as a small country? The disconnect between the greenwashing narrative and the resource-intensive nature of these technologies needs serious scrutiny.

    Your focus on biopiracy also can’t be understated. Companies like NVIDIA mining vast amounts of DNA data to fuel these AI systems without any meaningful traceability or compensation to the communities whose genetic resources they’re using? It’s another chapter in the long history of exploitation under the guise of innovation. The Nagoya Protocol was meant to safeguard these resources, but in the opaque world of AI-driven synthetic biology, those safeguards are quickly being eroded.

    We’re speeding into this future without guardrails, and COP16 is one of the last stops where governments can apply the precautionary principle to slow down this corporate takeover of biodiversity. It’s not just about stopping a few bad actors—it’s about putting the brakes on a system that is fundamentally opaque, profit-driven, and, at its core, uninterested in anything resembling accountability.

    Thanks for keeping this conversation alive! I was pushed here by the suggestion of another comment. Thank you

