“The boundaries are being breached right now. With the arrival of these advanced technologies, the most profound in history, we are approaching a turning point. The coming technological wave is built chiefly on two general-purpose technologies capable of operating at both the grandest and the most granular level: artificial intelligence and synthetic biology.”
Mustafa Suleyman, co-founder of DeepMind (now part of Google) and Inflection AI
“Text-to-organism” is my shorthand for a hypothetical system in which you type, in plain text, the characteristics you want a genetically modified organism to have, and an artificial intelligence algorithm then designs the DNA for you – perhaps even printing out the modified DNA, ready to be incorporated into a living organism.
Unlike Midjourney or ChatGPT, the ability to convert text into organisms does not yet exist – but it is not as far off as you might assume. Like other developments along the rapidly merging frontier of artificial life and artificial intelligence, ideas that three years ago would have been filed under “Science Fiction/Fantasy” are now filed under “Business/Corporate Capital.” […]
Speaking of policy-making, I am a member of the UN’s Multidisciplinary Ad Hoc Technical Expert Group (mAHTEG) on Synthetic Biology, under the Convention on Biological Diversity. We are a small group tasked by governments with “horizon scanning, assessment and monitoring” of new developments in cutting-edge genetic engineering (or “synthetic biology”). I was delighted when mAHTEG took on this issue of integrating artificial intelligence and synthetic biology. However, as I began to examine it more closely, I was astonished by how far the field had advanced. Specifically, developments in so-called “generative AI” – the generation of new texts, images, videos and so on, as exemplified by OpenAI’s ChatGPT – are now driving a parallel shift in how life can be redesigned, manipulated and constructed. They have also brought a deluge of additional capital, corporate enthusiasm and advertising campaigns to the already overheated commercial space known as “Syn Bio”. The major corporate players in the AI revolution, many of whom have no institutional background in biology, are now eagerly beginning to play with the smallest parts of living organisms. As I recently wrote to my fellow mAHTEG members, “the current pace, investments and shifts in technical and corporate developments in this field should make us all sit up with great attention and focus to deal with this issue.”
“Synthetic biology” (Syn Bio) is the more familiar term in technology, investment, and science policy circles1. It is presented as the construction or “programming” of artificial life forms (a kind of “GMO 2.0” or “extreme genetic engineering”). As a field, Syn Bio was established two decades ago on the premise that large genomic datasets (big data) and new tools would allow genetic engineers to rationally redesign biological parts, organisms, and systems. Artificial intelligence has always been present in some form in this redesign of genetics. Synthetic biology companies such as Amyris Biotechnologies, Zymergen, and Ginkgo Bioworks (only the latter still exists) have been using forms of AI for over a decade – employing algorithms to classify genomic data and select viable DNA sequences within genome design processes. However, last year’s release of mass-market “foundation models” of artificial intelligence (such as OpenAI’s ChatGPT, Google’s Bard, Meta’s LLaMA, or Stability AI’s Stable Diffusion) marked not only a historic shift in the field of artificial intelligence; it is now also driving a parallel transformation in Syn Bio. This change is described as a move from “discriminative AI” to “generative AI,” and in biotechnology it accompanies a shift from genomic analysis to a more generative synthetic biology.
To put it broadly: where the discriminative AI of the previous decade focused on classifying differences among data in order to detect clusters and outliers (automated high-throughput analysis), generative AI directly produces new forms or rearrangements of that data (automated high-throughput de novo design). Today’s large language models (LLMs), such as ChatGPT, respond to a human request or “prompt” by predicting which data elements are most likely to satisfy the user’s request. The AI then automatically composes texts, images, or videos that appear to us entirely new and creative, but are in reality just a series of predictions. Generative AI programs release these new forms into the world as synthetic images, synthetic text, synthetic video, and so on.
The mechanism by which generative artificial intelligence operates may seem somewhat magical, but in reality it is simply statistical. First, these systems undergo massive computational “machine learning” or “training,” which classifies and maps patterns of relationships between elements across billions of ingested digital texts and images in order to derive algorithmic rules. For example, the training can record how much more frequently the word “pizza” is associated with the word “cheese” than with the word “elephant” – and turn that into a rule. AI specialists then “fine-tune” these general relationship models on more specialized data sets (to refine the rules). The system subsequently generates results that conform to the algorithmic rules inferred by the trained model, selecting the statistically most probable outcome (e.g., the most likely next word in a paragraph). This yields seemingly new variations of text or image that satisfy users’ requests for new yet coherent “synthetic media.” […]
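To make the “simply statistical” point concrete, here is a minimal toy sketch in Python (my own illustration, not any company’s actual system, and vastly simplified – real models learn billions of parameters rather than counting word pairs) of how co-occurrence counts can be turned into a “most likely next word” rule:

```python
from collections import Counter, defaultdict

# A toy corpus standing in for "billions of ingested texts".
corpus = (
    "pizza with cheese . pizza with cheese . pizza with mushrooms . "
    "the elephant ate grass ."
).split()

# "Training": count how often each word follows each other word (bigrams).
following = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    following[current_word][next_word] += 1

def predict_next(word):
    """Return the statistically most likely next word seen during training."""
    return following[word].most_common(1)[0][0]

print(predict_next("with"))      # 'cheese' – it followed "with" more often
print(predict_next("elephant"))  # 'ate'
```

A real large language model replaces this crude counting with learned statistical representations, but the underlying principle the paragraph above describes – predict the most probable continuation – is the same.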

AI-enhanced synthetic biology
Having apparently conquered the generation of synthetic texts, images, sounds, and videos, leading artificial intelligence companies (including some of the largest and richest companies in the world by market capitalization) are now focusing on other commercially valuable forms of “language” that they could use to train their “large language models.” Specifically, there is enormous excitement about AI’s potential to master the biological “languages” of genomics and proteomics (and, by extension, the field of synthetic biology). Having already dramatically disrupted graphic design, scriptwriting, advertising, cinema, the legal profession, journalism, and much else, AI companies hunting for the next “breakthrough application” or their next substantial payout now have the life industries (agriculture, food, conservation, healthcare) and biological materials production firmly in their crosshairs. This is billed as the next multi-billion-dollar commercial frontier for the generative AI revolution – one that will “shake up the market” and deliver significant returns for investors.
Anima Anandkumar, director of AI research at NVIDIA (the leading AI chip manufacturer), offers an example of how a top artificial intelligence company is making this transition to applying generative AI capabilities in synthetic biology. She describes Nvidia’s new large language model, GenSLM, as follows:
“Instead of modelling the English language or any other natural language, why not think about the language of genomes? You know, we took all the DNA data that is available – both DNA and RNA data, for the viruses and bacteria known to us – about one hundred and ten million such genomes. We trained a language model on them, and now we can ask it to create new genomes.”
If Nvidia’s arrogant description of simply “acquiring” all this DNA data sounds like an act of colonial plunder (biopiracy) – that’s because it is. I’ll return to this.
As a demonstration of GenSLM, Nvidia recently “fine-tuned” its genomic language model (built on those 110 million “acquired” genomes) with an additional dataset of 1.5 million covid viral sequences, in order to generate DNA sequences for new variants of the coronavirus. They did this, they said, to help predict pandemics and design vaccines – what they did not openly acknowledge is that the work also carries dual-use security implications. If you or I set about designing infectious strains of biological viruses on our home computers, we would expect the security services to come knocking very quickly. I’ll return to this as well.
Nvidia proudly announced that among the synthetic covid variants GenSLM produced digitally were strains that closely match real biological variants that have since emerged in nature. In other words, it correctly predicted ways in which SARS-CoV-2 could mutate. That matters because it helps surveillance efforts and vaccine design. But this work is not only about covid. Nvidia emphasizes that the large genomic language model (GenSLM) could now be fine-tuned to generate genomes of other viruses or bacteria, allowing the automatic creation of new syn-bio microbes, just as ChatGPT produces texts. Once these new sequences are in hand, the DNA can either be “printed” on a DNA synthesizer or edited into a living organism using CRISPR-style gene-editing tools. Nowadays, going from a new genome blueprint on a computer to implementing it in living biological matter is a comparatively trivial step – the hard part is getting the design right. In effect, Nvidia has created a first “ChatGPT for designing viruses and microbes.”
It should be noted that NVIDIA – traditionally a chip manufacturer and the sixth-largest company in the world by market capitalization – was not previously considered a biotech company. As such, it may lack the internal instincts needed to deal with microbiological risks or the complex politics of synthetic biology. Even biotech companies aren’t very good at this – see the track record of Bayer (formerly Monsanto): global protests, lawsuits, genetic contamination, and so on.
Nvidia is far from the only trillion-dollar tech giant applying generative artificial intelligence to organisms and Syn Bio components. Meta (Facebook), Microsoft (which now controls OpenAI), Alphabet (Google), and Stability AI are also investing heavily in developing generative AI tools for synthetic biology. The first three of these, like NVIDIA, are among the seven richest companies in the world. Established corporate giants of the biotech world (e.g. Bayer, Syngenta, Corteva) also use generative AI, or partner with smaller companies that apply it on their behalf. A recent AI report from a British think tank, the Ada Lovelace Institute, suggests that the market for AI-based genomic technologies could exceed £19.5 billion (approximately $25 billion) by 2030, up from just half a billion in 2021. But even this forecast may already be outdated, given the pace of developments.
The impact of purely digital generative AI is already being felt in many other economic sectors (e.g., entertainment, law, education, advertising). However, leaders in biotechnology and artificial intelligence are proclaiming that applying generative AI to biology will be an even more explosive act of “disruption” than anything seen so far – even predicting an “atom-splitting” moment for artificial intelligence. In his recent book “The Coming Wave,” Mustafa Suleyman, co-founder of DeepMind (now part of Google), describes the current convergence of generative AI with Syn Bio as the most significant “tsunami” technologists have ever seen. Another advocate of the AI and Syn Bio union is Jason Kelly, CEO of Ginkgo Bioworks, whose company recently entered a five-year collaboration with Google to train large language models for synthetic biology. He describes the exceptional business opportunity of applying generative AI to Syn Bio as follows:
“That’s why ‘Bio’ is particularly interesting to people interested in artificial intelligence: the idea of a foundation model plus fine-tuning on specialized data – AI people understand this. Try it with one of the categories of the English language – say, legal. That thing has to compete with a lawyer at Robeson Grey who trained for 15 years, was taught by other people, and writes contracts designed to be understood by human brains in English (a language that co-evolved with our brains). That gives us the home advantage of how our own brains work – and yet we ask these computer brains – neural networks – to compete with us in our own fields.
Now let’s move on to biology. Remember that anything that runs on code (sequential letters) looks very much like a language – but it’s not our language: we didn’t invent it, we don’t speak it, we don’t read it, and we don’t write it. So I feel these computer brains will cut our grass much faster here than in English… That’s where you should look if you’re trying to understand where AI will really turn the tables… It won’t be a simple disruption; it will be something like the splitting of the atom: in biology.”

Black box biology
In short, Kelly is pointing out that AI bots will navigate the code of biology more comfortably than human brains can – and therefore we human brains will increasingly struggle to understand what AI bots are doing when they carry out genetic engineering. Reassuring? One of the fundamental issues in AI ethics is already known as “the black box problem.” In practice, this describes how AI systems develop for themselves highly complex sets of rules that are neither obvious nor easily comprehensible to humans. These self-developed rules then drive AI decisions that have real-world impacts. The “black box” problem becomes thorny when, for example, a self-driving car trained by artificial intelligence fails to recognize a bicycle and crashes into it, or decides to accelerate alongside a truck. Because the decision-making process of an AI is a “black box” to us humans, we cannot understand why it made that potentially fatal decision, and so it is nearly impossible to protect ourselves against it.
Related to this is the phenomenon of “hallucinations” produced by generative AI systems. Large language models such as ChatGPT routinely incorporate elements into their output that appear compelling but are inaccurate or bizarre: living people are described as dead, dates are given incorrectly, AI images show humans with extra body parts or distorted, illegible license plates, and so on. Such hallucinations and black-box failures can be quite problematic in the two-dimensional, electronic domains of text, image, video, or sound; they could be extremely problematic if incorporated into the genomic design of four-dimensional living organisms or active biological proteins that are released into the body or the biosphere.
Genetic engineers already routinely face unexpected and emergent results from small genomic changes (even changes as small as a single base pair). If an AI-designed genome begins to behave unpredictably or has significant side effects, it may be impossible to understand why these changes occurred, or to identify the cause, until long after the organism or protein has entered the biosphere. It is unclear how biosafety assessment of organisms or proteins designed by artificial intelligence can proceed when the system cannot even explain its design logic in ways we can understand. In response to the serious problems of AI’s black box, the European Union is now prioritizing the development of “explainable AI.” Governments could likewise insist that any organism, protein, or other biological component designed with generative AI be accompanied by robust, human-understandable explanations of the design decisions behind it.

Creating new nanomachines – Making AI “speak” proteins
To return to Ginkgo Bioworks: its artificial intelligence collaboration with Google initially focuses on using generative AI for protein design – leveraging Ginkgo’s internal database of 2 billion protein sequences. Jason Kelly of Ginkgo explains that “The new idea is: can I build a foundation model that… speaks proteins the way GPT-4 speaks English?”
It is worth pausing on why proteins matter as a commercial target. While the word usually brings to mind a class of food ingredient (meat, dairy, legumes, etc.), a protein is actually a very specific kind of biological form: a chain of amino acids folded into a three-dimensional structure, which in turn often has bioactive properties. High-school biology reminds us of the so-called “central dogma” of genetics: DNA in the cell is transcribed into RNA, which encodes the order in which amino acids are strung together into long chains (polypeptides). These chains then fold into complex proteins: nanoscale structures that carry out the processes of life. They include the enzymes and catalysts that enable, accelerate, or slow down essential biochemical reactions, as well as the proteins that are assembled into living and non-living materials. Proteins have been described as nature’s nanomachines, carrying out much of the work of the living world at the molecular scale. The way amino acids are encoded by DNA and RNA and then fold into different structures determines how these machines are “programmed.” Being able to “speak proteins” therefore means having the language in which biological nanomachines are programmed.
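For readers who want that coding step spelled out, here is a minimal sketch of how a DNA sequence specifies an amino acid chain (a toy codon table covering only a handful of the 64 codons, for illustration only – not a bioinformatics tool):

```python
# Toy illustration of the "central dogma" coding step described above:
# DNA is transcribed into RNA, and RNA codons (triplets of letters) are
# translated into amino acids, which chain together into a polypeptide.

CODON_TABLE = {
    "AUG": "Met",  # start codon
    "UUU": "Phe", "UUC": "Phe",
    "GGU": "Gly", "GGC": "Gly",
    "GCU": "Ala", "UAA": "STOP",
}

def transcribe(dna):
    """Transcribe a DNA coding strand into messenger RNA (T -> U)."""
    return dna.replace("T", "U")

def translate(mrna):
    """Translate an mRNA string codon by codon until a stop codon."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE.get(mrna[i:i + 3], "???")
        if amino_acid == "STOP":
            break
        peptide.append(amino_acid)
    return peptide

dna = "ATGGGTGCTTTTTAA"            # a tiny made-up coding sequence
print(translate(transcribe(dna)))  # ['Met', 'Gly', 'Ala', 'Phe']
```

A real protein runs to hundreds of amino acids, and it is the subsequent folding of that chain into a three-dimensional shape – not shown here – that gives the “nanomachine” its function.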
For years, the essence of protein research was the attempt to understand how amino acid sequences fold and unfold into shapes with distinctive and significant biological and structural roles. Protein scientists painstakingly tried to predict, one by one, how a given linear amino acid sequence would fold into a specific three-dimensional shape with particular physical and biological properties. It was a slow process. Then, in 2018, the artificial intelligence company DeepMind (owned by Google) seemingly cracked the puzzle of protein folding. DeepMind’s AlphaFold program was an artificial intelligence model trained on over 170,000 proteins from a public repository of protein sequences and structures, and it proved able to predict the shape of a folded protein from its linear code correctly almost every time. In 2022, DeepMind released a database of predicted protein structures covering nearly every protein known to science – almost 200 million structures. AlphaFold’s achievement is often cited as one of the strongest examples of artificial intelligence “success” in science.
But it is not enough to work only with known proteins. Following the success of AlphaFold, several of the first generative AI models in synthetic biology focus on creating entirely new proteins that have never appeared in nature (“generative protein design”), as well as on modifying and “optimizing” existing natural proteins.
In fact, there are now quite a few generative artificial intelligence tools for protein engineering, with names such as ProtGPT2, ProteinDT, and Chroma – as well as several newly established companies (beyond Ginkgo) that focus exclusively on using artificial intelligence to create a range of new proteins for commercial markets, including enzymes, catalysts, food ingredients, pharmaceuticals, biomaterials, coatings, gene therapies, and more. In another example of how AI brings unconventional technological players into Syn Bio, the global cloud services company Salesforce has developed ProGen: yet another large AI language model for creating new proteins. The model was trained by feeding the amino acid sequences of 280 million different proteins into a machine learning system. Salesforce then refined the model by fine-tuning it on 56,000 sequences from a single category of proteins – lysozymes – in order to create functional new lysozymes (which are used, among other things, as food ingredients). A report on this work in Science Daily highlights just how vast the protein design space is, even within this one category of proteins:
“With proteins, the design space is nearly unlimited. Lysozymes are small as proteins go, with about 300 amino acids. But with 20 possible amino acids at each position, there is an enormous number of possible combinations (20 to the power of 300). That is more than the number of all the humans who have ever lived, multiplied by the number of grains of sand on Earth, multiplied by the number of atoms in the universe. Given the unlimited possibilities, it is remarkable that the model can so readily create functional enzymes.”
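As a rough back-of-the-envelope check of that comparison (the population, sand-grain, and atom figures below are commonly cited order-of-magnitude estimates and are my assumptions, not numbers from the article):

```python
import math

# Order-of-magnitude estimates used only for this illustration (assumptions):
humans_ever_lived = 1e11    # ~100 billion people, a commonly cited estimate
grains_of_sand    = 7.5e18  # a commonly cited estimate for Earth's sand grains
atoms_in_universe = 1e80    # observable universe, order of magnitude

# Size of the sequence space: 300 positions, 20 possible amino acids each.
log10_sequence_space = 300 * math.log10(20)                # ~390

# The quoted comparison product, kept in log10 terms to avoid overflow.
log10_comparison = (math.log10(humans_ever_lived)
                    + math.log10(grains_of_sand)
                    + math.log10(atoms_in_universe))       # ~110

print(f"20^300 is roughly 10^{log10_sequence_space:.0f}")
print(f"humans x sand grains x atoms is roughly 10^{log10_comparison:.0f}")
```

Even with generous estimates, the comparison product comes to around 10^110, while 20^300 is on the order of 10^390 – so the quoted claim, if anything, understates the size of the design space.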
As noted, lysozymes are one example of a food ingredient (Salesforce started with egg proteins), and the use of artificial intelligence to design new synthetic protein alternatives for processed food markets fits neatly with the ambitions of investors who claim that high-tech artificial proteins can be a profitable “climate solution” (replacing animal proteins) or a biodiversity solution (replacing environmentally harmful sourcing of ingredients such as palm oil or fish oil). There are many problems with these simplistic green claims, but it is worth noting that one of the factors really driving these new “alternative protein markets” is the commercial opportunity opened up by such protein engineering tools – including artificial intelligence. Alt-protein enthusiasts, such as George Monbiot of the Guardian, like to refer to this type of synthetic bioproduction as “precision fermentation.” The “precision” part of the shorthand is mostly promotional, but it also increasingly refers to AI-based design.
Food ingredients are just one piece of the future market for artificial proteins that companies like Salesforce or Ginkgo are pursuing. (Ginkgo has its own alt-protein food ingredient company, Motif Foodworks.) Syn Bio companies are also developing artificial alt-proteins as coatings, sweeteners, pharmaceuticals, and so on – including many uses that would involve releasing these new protein entities into the environment or having humans and animals consume them.

Planetary boundaries for new entities
This new capacity of artificial intelligence to generate an ever-wider range of new proteins for industrial use, at an accelerating pace, should be regarded as a potentially significant shift in industrial production. It is a transition that could have enormous long-term impacts on health, economies, and biodiversity as a greater variety of artificial proteins enters the market, our bodies, and the biosphere. It should prompt strong demands for monitoring, assessment, and reporting, as well as for systems of recall, clean-up, and liability in case problems arise. None of this infrastructure currently exists. A historical point of comparison is the emergence of synthetic chemistry and of the accompanying petrochemical-based synthetic chemical industry in the late 19th and early 20th centuries, which grew out of new “pyrolysis” techniques for hydrocarbons. The production of a range of commercially valuable new molecules before adequate oversight and regulation led to the rapid spread of thousands of different synthetic chemicals through the biosphere long before substantive toxics laws were established. Many of these synthetic chemicals are now the subject of complex and difficult global clean-up or mitigation efforts, owing to the unexpected biological and health impacts of synthetic compounds interacting with the natural world. It is estimated that between 140,000 and 350,000 different industrial chemicals are today released into the biosphere, at a rate of approximately 220 billion tons annually, and that the US alone adds about 15,000 new synthetic chemicals to the registry every year. Most of these are new to nature, and many are toxic at some concentration. In early 2022, scientists reported that humanity had breached the safe “planetary boundary” for new chemical entities in the biosphere.
Consider now the prospect of unleashing a new synthetic protein industry, backed by massive speculative capital aiming to artificially produce a range of unprecedented proteins for rapid returns on investment. Once again, this is an industry seeking to realize profits in the market before deliberate international discussion and the establishment of regulation. That should raise major red flags. The fact that this rollout is riding the current wave of marketing and investment hype around artificial intelligence makes it doubly concerning. Recall that proteins have been described as complex nanomachines whose interactions govern most life processes at the molecular level. Synthetic proteins, as a category of complex molecules, may therefore be more likely to be biologically active (and disruptive) than simple synthetic chemical compounds – indeed, they may be deliberately designed, for industrial purposes, to accelerate, slow down, transform, or otherwise alter the molecular biological processes at the basis of life – and thus require more complex safety assessments. Researchers have observed, for example, that synthetically constructed proteins appear more stable than naturally evolved proteins, which invites comparison with the persistence problems of certain categories of synthetic chemicals.
The immense challenge of addressing the negative impacts of countless, inadequately understood synthetic chemicals is what first led to the establishment of the Precautionary Principle in environmental governance. This principle broadly states that it is appropriate and prudent to take timely action to prevent, regulate, and control an emerging threat even before we have all the data to determine its exact nature. The precautionary approach is enshrined in the preamble of the Convention on Biological Diversity, in the first objective (Article 1) of the Cartagena Protocol on Biosafety, and in Principle 15 of the Rio Declaration from the 1992 Earth Summit. The precautionary principle was shaped precisely to prevent massively disruptive technological developments from being widely deployed before proper oversight and governance are in place. This time, we have the opportunity to apply it before the number of new protein entities entering the biosphere begins to mimic the toxic trajectory of synthetic chemicals. If synthetic proteins become a rapidly proliferating, structurally diverse, and widely distributed category of new synthetic entities entering the biosphere, they will demand new forms of biosafety assessment and oversight. Left unchecked, they will further worsen the transgression of the planetary boundary for new entities, adding both new protein entities and new genomic entities to the biospheric load.

Text to protein
Even more concerning, the AI-driven de novo industrial production of proteins via Syn Bio could spread more rapidly, become more automated, and prove harder to manage. The spread of industrial chemistry was slowed somewhat by the need for large and expensive production facilities; new AI protein-engineering tools could result in a much broader distribution of syn-bio entities. Just as ChatGPT quickly enabled millions of ordinary users with a simple web browser to type natural-language text descriptions and generate synthetic media, new foundation models are being developed for natural-language text-to-protein production. In a system such as ProteinDT, which describes itself as a “text-guided protein design framework,” the user writes in natural language (such as English) the general characteristics they want in a synthetic protein (e.g., high thermal stability or luminescence). The generative AI model then produces multiple viable synthetic protein sequences. These can be selected and produced from synthetic RNA strands (e.g., expressed by a modified microbe or in a cell-free system). The equipment needed to turn such designs into reality is itself becoming increasingly widespread.
This “text-to-protein” model could make oversight even more difficult. For example, a paper on text-to-protein production acknowledges that “although text-based protein design has many potential positive applications in agriculture, biotechnology, and therapeutics, it can be considered dual-use technology. Like generative models for small molecules, ProteinDT could be applied to create toxic or harmful protein sequences. Although acting on these designs would require a laboratory, synthesizing custom amino acid sequences is usually straightforward compared to synthesizing new small molecules.” The paper also notes that the authors’ own model allows for the creation of poisonous and dangerous proteins, and that “future efforts to expand the training dataset and modeling improvements could increase the dual-use risk.”
It is not necessary to bio-synthesize snake venom for genetically engineered products to harm human welfare. Synthetic biology companies such as Ginkgo claim that, by enabling the rapid design of new materials, it will be possible to replace existing petrochemical-based production with faster and lighter biological production methods. Replacing chemicals derived from oil may indeed be one outcome, but it will be only one of many commercial incentives driving the technology. Other commercial players may seek to replace valuable natural products currently cultivated by small farmers, or to displace products sourced from forests and oceans – changing patterns of land and ocean use and affecting the livelihoods of farmers and fishers.
We have already seen that the initial commercial targets for synthetic biology production were precisely those high-value natural flavors, fragrances, cosmetic ingredients, oils, spices, and textiles that are cultivated, collected, and managed by small farmers and indigenous peoples. When I worked for the ETC Group, my former colleagues and I spent years documenting examples of synthetic biology companies biosynthesizing (or, as some now say, “precision fermenting”) natural products. These included vanilla, stevia, silk, saffron, artemisinin, coconut oil, orange oil, and many other significant, culturally sensitive goods on which the world’s most vulnerable people depend for their livelihoods and cultural practices. The report “Synthetic Biology, Biodiversity & Farmers” [https://www.etcgroup.org/content/synthetic-biology-biodiversity-farmers], which we published in 2016, highlights 13 case studies of natural products that could be disrupted by synthetic bio-production – disruption that could also have serious implications for biodiversity conservation, since small farmers and indigenous communities often form the backbone of on-the-ground community conservation efforts. We found that of the 200 to 250 botanical crops used by the food and flavor industry, 95% come from small-scale farmers and agricultural workers, mainly in the global South. Overall, an estimated 20 million small-scale farmers and growers depend on botanical crops cultivated for natural flavors and fragrances – and this is a conservative estimate. The trade associations of the flavors and fragrances industry themselves acknowledge that these botanicals are “extremely important in terms of their socioeconomic impact on rural populations and may also have significant environmental benefits within agricultural systems.”
In 2016, when I was researching these biological applications, I increasingly observed that the disruption of traditional natural products did not come only from tanks of fermenting microbes producing alternative compounds. It also came from new industrial enzymes that could convert a low-value product into something else of high value. For example, a key goal of synthetic biology companies at the time was to coax genetically modified yeast, fed on sugar, into producing large quantities of the sweet compounds found in stevia, called rebaudiosides. Most commercial stevia has a high content of a rebaudioside called Reb A, which gives natural stevia its characteristic metallic taste as a sweetener ingredient. Two other rebaudiosides, Reb D and Reb M, are much sweeter but occur only in small amounts in natural stevia leaves. While some companies, such as Cargill and Evolva, were busy engineering yeasts to produce Reb D and Reb M in fermentation tanks, others produced a genetically engineered enzyme that could convert the large quantities of Reb A already being grown in fields into Reb M. At first glance, this enzymatic “bioconversion” might seem like a good thing: stevia farmers could continue to grow stevia leaf, and consumers would end up with a sweeter product. That was certainly the argument put forward by PureCircle, the leading stevia company, for the engineered-enzyme approach.
But the same enzymatic approach can also be used to design engineered enzymes and proteins that convert entirely unrelated raw materials into high-value ingredients. For example, Ambrosia Bio, a synthetic biology company, is collaborating with Ginkgo Bioworks, whose AI protein-design tools create engineered enzymes that convert low-cost sugars and starches into allulose, a low-calorie sweetener that competes with stevia and is naturally found in small quantities in figs, raisins, and maple syrup. Allulose has the unusual property of not raising blood glucose and insulin levels, while tasting very much like sugar. One analyst report provocatively suggests that the allulose industry is likely to be “the fastest-growing industry by 2031.” In reality, Ambrosia Bio is simply replicating what most allulose producers already do – including the sugar giant Tate & Lyle, which likewise uses genetically modified enzymes to convert corn and wheat starch into this new low-calorie super-ingredient. In a different world, allulose could have been an opportunity for raisin, fig, and maple producers to build new low-calorie sweetener markets. Instead, syn-bio protein production means the opportunity has been captured entirely by large agricultural and biotech interests.
AI Piracy
But the loss of these economic opportunities [for local communities] is not only about artificial intelligence and genomic innovation. It also rests on a massive theft of genomic data. Remember those hundreds of millions of genomes and millions of protein sequences on which Nvidia, Salesforce, Ginkgo and others trained their large language models? They were not collected by Nvidia or Salesforce through mutual agreements with farmers, indigenous peoples and others. They are the result of decades of surreptitious bioprospecting and biopiracy – sampling and sequencing genomic material from thousands of places and communities and uploading those digital sequences to databases.
Biological materials constitute the common heritage of humanity, and they originate from specific communities that have managed, developed, protected, nurtured, and co-evolved with biological diversity – communities that can be considered to hold inherent rights and relationships regarding the use of these genetic resources. Over the past quarter-century, indigenous movements, farmers, and Southern governments have waged continuous battles through the corridors of the Convention on Biological Diversity, the Food and Agriculture Organization, the World Intellectual Property Organization, the World Health Organization, and other bodies to assert their inherent rights over genetic material and to stop its theft by biotechnology interests. That struggle has given rise to a global “Seed Treaty” for plant genetic resources and to the Nagoya Protocol on access and benefit-sharing of genetic resources. In recent years, the existence of large digital genomic databases has escalated the struggle into a North-versus-South fight over the fair and equitable governance of Digital Sequence Information (DSI) on genetic resources – that is, the very digital DNA sequences on which artificial intelligence giants now base their business plans.
It is safe to say that at no point, when bioprospectors or academic biotechnology researchers took DNA sequences from Southern communities, was the local community ever asked how it felt about its genetic heritage being converted into digital code, uploaded into large language models, and used by generative AI to construct synthetic alternatives that could be sold for private profit. That prerequisite act of free, prior and informed consent simply never took place. Yet it is the common resources taken from these communities that are now being used to fuel the anticipated windfall profits of AI technology companies. The hailing of AI-driven automated genetic engineering as an industrial opportunity is the culmination of why biotechnology companies and industrialized nations fought so hard against granting communities rights over their own traditional biological resources.
There is an almost exact parallel with the legal and ethical battles now erupting around commercial AI-generated art, text, and video built on the stolen, unpaid work of real artists, writers, actors, and others. When DALL-E or Midjourney “produces” a new work of art, what it actually does is remix elements from the millions of existing artworks fed into its machine-learning training data. Yet no attribution or commercial compensation is offered to those whose artistic work is being exploited; the original artists whose work has been taken, used, mixed, and incorporated into the new AI “work of art” are not even acknowledged.
[…] There are now online tools, such as haveibeentrained.com, that allow artists to search the largest artificial intelligence datasets to see whether their works are included in them and to insist on their removal. There is no similar process for communities to search DNA datasets to see whether their stolen genetic material is being used by artificial intelligence to create synthetic biology organisms, proteins, and other commercial products. Existing processes under the Convention on Biological Diversity and the Seed Treaty must urgently take account of the impacts of generative AI on the already bitter politics of digital sequence information, in order to defend the rights and economies of the world’s poorest and most vulnerable people. For example, this month [January ’24] a process is being launched under the Convention on Biological Diversity to create a multilateral mechanism for sharing the benefits from the use of digital sequence information (DSI) on genetic resources. If this mechanism does not address the concentration and use of digital DNA by artificial intelligence models (and does not require DNA-AI companies and AI protein producers to pay), it will have completely missed the point.
Despite the daily flood of stories about artificial intelligence, media coverage and analysis of the AI/Syn Bio convergence is still quite thin. Where it does exist, it usually comes from the business and investment press, focusing on the economic promise and the hopes of visionary startup founders and their clever algorithmic science. But that is mostly looking in the wrong place. The real impact of the convergence of artificial life with artificial intelligence will be felt elsewhere: in the fields of farmers in the global South, and even in the agricultural production zones of the North. If we reach the point where synthetic biology companies or their industrial customers can reliably type “design me a protein like silk” into a text-to-protein command line as easily as graphic designers can now type “draw me a picture of silk” into Midjourney, then global production economies may shift harmfully, and even further away from justice.
Original title: The Artificial Intelligence / Artificial Life convergence. Part 1: When AI Bots do Genetic Engineering, 11/01/2024
Author: Jim Thomas
Source: https://www.scanthehorizon.org/p/dnai-the-artificial-intelligence
Translation: Harry Tuttle
1. For more: Synthetic biology and molecular programming: cells as a chain of (re)production, Cyborg no. 5, February 2016; Synthetic biology, the genealogy: life is cheap!, Cyborg no. 20, February 2021.
