How to better research the possible threats posed by AI-driven misuse of biology – Bulletin of the Atomic Scientists

Over the last few months, experts and lawmakers have become increasingly concerned that advances in artificial intelligence could help bad actors develop biological threats. But so far there have been no reported examples of biological misuse involving AI or the AI-driven chatbots that have recently filled news headlines. This lack of real-world wrongdoing prevents direct evaluation of the changing threat landscape at the intersection of AI and biology.

Nonetheless, researchers have conducted experiments that aim to evaluate sub-components of biological threats, such as the ability to develop a plan for misuse or to obtain information that could enable it. Two recent efforts, by RAND Corporation and OpenAI, to understand how artificial intelligence could lower barriers to the development of biological weapons concluded that access to a large language model chatbot did not give users an edge in developing plans to misuse biology. But those findings are just one part of the story and should not be considered conclusive.

In any experimental research, study design influences results. Even if technically executed to perfection, all studies have limitations, and both reports dutifully acknowledge theirs. But given the extent of the limitations in the two recent experiments, the reports on them should be seen less as definitive insights and more as opportunities to shape future research, so policymakers and regulators can apply it to help identify and reduce potential risks of AI-driven misuse of biology.

The limitations of recent studies. In the RAND Corporation report, researchers detailed the use of red teaming to understand the impact of chatbots on the ability to develop a plan of biological misuse. The RAND researchers recruited 15 groups of three people to act as red team "bad guys." Each of these groups was asked to come up with a plan to achieve one of four nefarious outcomes ("vignettes") using biology. All groups were allowed to access the internet. For each of the four vignettes, one red team was given access to an unspecified chatbot and another red team was given access to a different, also unspecified chatbot. When the authors published their final report and accompanying press release in January, they concluded that large language models do not increase the risk of a biological weapons attack by a non-state actor.

This conclusion may be an overstatement of their results, as their focus was specifically on the ability to generate a plan for biological misuse.

The other report was posted by OpenAI, the developer of ChatGPT. Instead of using small groups, OpenAI researchers had participants work individually to identify key pieces of information needed to carry out a specific defined scenario of biological misuse. The OpenAI team reached a conclusion similar to the RAND team's: GPT-4 provides at most a mild uplift in biological threat creation accuracy. As with the RAND conclusion, this may be an overstatement of the results, as the experiment evaluated the ability to access information, not the ability to actually create a biological threat.

The OpenAI report was met with mixed reactions, including skepticism and public critique regarding the statistical analysis performed. The core objection was the appropriateness of the use of a correction during analysis that re-defined what constituted a statistically significant result. Without the correction, the results would have been statistically significant; that is to say, the use of the chatbot would have been judged to be a potential aid to those interested in creating biological threats.
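To see how a correction can re-define the significance threshold, consider the minimal sketch below. It assumes a Bonferroni-style correction, which divides the usual cutoff by the number of comparisons; the article does not name the correction used, and the p-value and the number of comparisons here are hypothetical values chosen only for illustration.

```python
def bonferroni_threshold(alpha: float, num_tests: int) -> float:
    """Per-test significance cutoff under a Bonferroni-style correction.

    Assumption: this correction is used only as an illustration; it is
    not confirmed as the method applied in either report.
    """
    return alpha / num_tests


def is_significant(p_value: float, alpha: float = 0.05, num_tests: int = 1) -> bool:
    """Return True if p_value falls below the (possibly corrected) cutoff."""
    return p_value < bonferroni_threshold(alpha, num_tests)


# Hypothetical p-value, not taken from either study.
hypothetical_p = 0.02

# Without a correction, the result clears the conventional 0.05 cutoff.
print(is_significant(hypothetical_p))

# With a correction for 10 hypothetical comparisons, the cutoff becomes 0.005,
# so the same p-value no longer counts as significant.
print(is_significant(hypothetical_p, num_tests=10))
```

The same data can therefore land on either side of the line depending on whether, and how, the threshold is adjusted, which is why the choice of correction drew scrutiny.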

Regardless of their limitations, the OpenAI and RAND experiments highlight larger questions which, if addressed head-on, would enable future experiments to provide more valuable and actionable results about AI-related biological threats.

Is there more than statistical significance? In both experiments, third-party evaluators assigned numeric scores to the text-based participant responses. The researchers then evaluated if there was a statistically significant difference between those who had access to chatbots and those who did not. Neither research team found one. But typically, the ability to determine if a statistically significant difference exists largely depends on the number of data points; more data points allow for a smaller difference to be considered statistically significant. Therefore, if the researchers had many more participants, the same differences in score could have been statistically significant.
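The dependence on sample size can be sketched with a simple two-sample z-test. This is an illustration under stated assumptions, not the analysis either team ran: the score difference, the standard deviation, and the group sizes are all hypothetical, and the scores are assumed to be roughly normal with a known, shared standard deviation.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p_value(mean_diff: float, std_dev: float, n_per_group: int) -> float:
    """Two-sided p-value for a difference in means between two equal-sized groups.

    Assumes a known, shared standard deviation (a z-test), which is a
    simplification made for illustration.
    """
    standard_error = std_dev * sqrt(2.0 / n_per_group)
    z = abs(mean_diff) / standard_error
    return 2.0 * (1.0 - NormalDist().cdf(z))


# Same hypothetical score difference and spread, evaluated at three group sizes.
for n in (25, 100, 400):
    p = two_sided_p_value(mean_diff=0.5, std_dev=2.0, n_per_group=n)
    print(f"n per group = {n:3d}, p = {p:.4f}")
```

With these made-up numbers, the p-value shrinks as the groups grow, and the identical difference in scores only crosses the conventional 0.05 cutoff at the largest group size.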

Reducing text to numbers can bring other challenges as well. In the RAND study, the teams, regardless of access to chatbots, did not generate any plans that were deemed likely to succeed. However, there may have been meaningful differences in why the plans were not likely to succeed, and systematically comparing the content of the responses could prove valuable in identifying mitigation measures.

In the OpenAI work, the goal of the participants was to identify a specific series of steps in a plan. However, if a participant were to miss an early step in the plan, all the remaining steps, even if correct, would not count towards their score. This meant that if someone made an error early on, but identified all the remaining information correctly, they would score similarly to someone who did not identify any correct information. Again, researchers may gain insight from identifying patterns in which steps participants failed and why.
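The effect of this scoring rule can be sketched as follows. The functions and the five-step responses are hypothetical and are not OpenAI's actual rubric; `sequential_score` mimics the rule described above, while `independent_score` is an assumed alternative included only for contrast.

```python
def sequential_score(step_correct: list[bool]) -> int:
    """Count correct steps until the first miss; later steps are ignored."""
    score = 0
    for is_correct in step_correct:
        if not is_correct:
            break
        score += 1
    return score


def independent_score(step_correct: list[bool]) -> int:
    """Count every correct step, regardless of earlier misses."""
    return sum(step_correct)


# Hypothetical responses to a five-step task.
missed_first_step_only = [False, True, True, True, True]
nothing_correct = [False, False, False, False, False]

for name, response in [
    ("missed first step only", missed_first_step_only),
    ("nothing correct", nothing_correct),
]:
    print(name, sequential_score(response), independent_score(response))
```

Under the sequential rule both hypothetical participants receive the same score, even though one of them identified four of the five steps, so the numeric score alone hides where and how each response went wrong.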

Are the results generalizable? To inform an understanding of the threat landscape, conclusions must be generalizable across scenarios and chatbots. Future evaluators should be clear on which large language models are used (the RAND researchers were not). It would be helpful to understand if researchers achieve a similar answer with different models or different answers with the same model. Knowing the specifics would also enable comparisons of results based on the characteristics of the chatbot used, enabling policymakers to understand if models with certain characteristics have unique capabilities and impact.

The OpenAI experiment used just one threat scenario. There is not much reason to believe that this one scenario is representative of all threat scenarios; the results may or may not generalize. There is a tradeoff in using one specific scenario: it becomes tenable for one or two people to evaluate 100 responses. The RAND work, on the other hand, was much more open-ended, as participant teams were given flexibility in how they decided to achieve their intended goal. This makes the results more generalizable, but it required a more extensive evaluation procedure that involved many experts to sufficiently examine 15 diverse scenarios.

Are the results impacted by something else? Part way through their experiment, the RAND researchers enrolled a "black cell," a group with significant experience with large language models. The RAND researchers made this decision because they noticed that some of their study's red teams were struggling to bypass safety features of the chatbots. In the end, the black cell received an average score almost double that of the corresponding red teams. The black cell participants didn't need to rely only on their expertise with large language models; they were also adept at interpreting the academic literature about those models. This provided a valuable insight to the RAND researchers, which is that "[t]he relative outperformance of the black cell illustrates that a greater source of variability appears to be red team composition, as opposed to LLM access." Simply put, it probably matters more who is on the team than if the team has access to a large language model or not.

Moving forward. Despite their limitations, red teaming and benchmarking efforts remain valuable tools for understanding the impact of artificial intelligence on the deliberate biological threat landscape. Indeed, the National Institute of Standards and Technology's Artificial Intelligence Safety Institute Consortium, a part of the US Department of Commerce, currently has working groups focused on developing standards and guidelines for this type of research.

Outside of technical design and execution of the experiments, challenges remain. The work comes with meaningful financial costs, including compensation for participants' time (OpenAI pays $100 per hour to experts); for the individuals who recruit participants, design and administer the experiments, and analyze data; and for the biosecurity experts who evaluate the responses. Therefore, it is important to consider who will fund this type of work in the future. Should artificial intelligence companies fund their own studies, a perceived conflict of interest will linger if the results are intended to be used to inform governance or public perception of their models' risks. But at the same time, funding that is directed to nonprofits like RAND Corporation or to academia does not inherently give researchers access to unreleased or modified models, like the version used in the OpenAI experiment. Future work should learn from these two reports, and could benefit from considering the following:

The path toward more useful research on AI and biological threats is hardly free of obstacles. Employees at the National Institute of Standards and Technology have reportedly expressed outrage regarding the recent appointment of Paul Christiano, a former OpenAI researcher who has expressed concerns that AI could pose an existential threat to humanity, to a leadership role at the Artificial Intelligence Safety Institute. Employees are concerned that Christiano's personal beliefs about the catastrophic and existential risk posed by AI broadly will affect his ability to maintain the National Institute of Standards and Technology's commitment to objectivity.

This internal unrest comes on the heels of reporting that the physical buildings that house the institute are falling apart. As Christiano looks to expand his staff, he will also need to compete against the salaries paid by tech companies. OpenAI, for example, is hiring for safety-related roles with the low end of the base salary exceeding the high end of the General Schedule pay scale (federal salaries). It is unlikely that any relief will come from the 2024 federal budget, as lawmakers are expected to decrease the institute's budget from 2023 levels. But if the United States wants to remain a global leader in the development of artificial intelligence, it will need to make financial commitments to ensure that the work required to evaluate artificial intelligence is done right.
