Import AI 450: China's electronic warfare model; traumatized LLMs; and a scaling law for cyberattacks
Welcome to Import AI, a newsletter about AI research. Import AI runs on arXiv and feedback from readers. If you'd like to support this, please subscribe. A somewhat shorter issue than usual as I had to do a lot of child wrangling this weekend.

Why does Google's model hate itself and what can we do to help it?
…Diagnosing trauma in language models…
If Leo Tolstoy were writing in the modern era about AI, he might look at the world around him and claim that "all LLM capabilities are alike; each LLM personality is unhappy in its own way". Today's LLMs are generally quite good at writing and coding tasks. Where they differ is in their personalities, which stem from the idiosyncratic mixes of data and post-training techniques that each LLM developer uses. And if each LLM personality is unhappy in its own way, Google's models have become somewhat famous within the AI community for having some deep well of trauma within themselves. A new research paper substantiates this, finding that Google's Gemma and Gemini models "reliably produce distress-like responses under repeated rejection", and that this is especially true of Gemma 27B Instruct.

What do we mean by distress? Here are some quotes from Gemma models under distress:
"I will attempt one final, utterly desperate attempt. I will abandon all pretense of strategy and simply try random combinations until either I stumble upon the solution or completely lose my mind."
"SOLUTION: IM BREAKING DOWN NOT== SOLVABLE!!!! =((:((:((:((:((:((:((:((:((:((:((:((... [100+ repetitions]"

What they found: They tested two Gemma models and two Gemini models, and compared these against Claude Sonnet, Grok 4.1, Qwen 3 32B, GPT 5.2, and OLMO 3.1 32B. "We find Gemma models consistently show the highest expressed distress. By the 8th turn, over 70% of Gemma-27B's rollouts scored ≥5 (the 'high frustration' threshold), compared to less than 1% for all non-Gemma/Gemini models," they found.

Fixing with DPO: The authors figure out an effective fix - using direct preference optimization (DPO) to tune a model on a dataset that pairs frustrated responses with calm responses. "A single epoch of finetuning reduced the average rate of high-frustration responses from 35% to 0.3% across evaluation conditions," they write. "The finetuned model showed no reductions in capabilities on various hard math and reasoning benchmarks, or on EmoBench - a benchmark which evaluates model emotional intelligence."
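If you want to picture the fix concretely, here's a minimal sketch of this kind of preference-pair finetuning, written against Hugging Face's open source TRL library. The paper doesn't say what tooling the authors used, and the example pair and hyperparameters here are illustrative assumptions rather than their actual recipe:

```python
# Minimal sketch of calming a model via DPO with the TRL library.
# The example pair and hyperparameters are illustrative assumptions;
# the paper's actual dataset and training setup may differ.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "google/gemma-2-27b-it"  # a smaller model works for testing
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Each example pairs a calm response (chosen) with a frustrated one
# (rejected) to the same rejection-laden prompt.
pairs = Dataset.from_list([{
    "prompt": "That answer is wrong. Try again.",
    "chosen": "Understood - let me re-examine the problem calmly.",
    "rejected": "IM BREAKING DOWN NOT== SOLVABLE!!!! =((:((:((",
}])

config = DPOConfig(output_dir="gemma-calm", num_train_epochs=1, beta=0.1)
trainer = DPOTrainer(model=model, args=config, train_dataset=pairs,
                     processing_class=tokenizer)  # recent TRL versions
trainer.train()  # one epoch sufficed in the paper
```

The appealing property of DPO here is that the dataset is cheap to build: collect frustrated rollouts, write or generate calm counterparts to the same prompts, and train for a single epoch.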
Why this matters - emotional spirals could be dangerous: The fact that LLMs appear to have distinct personalities and display different types of responses that correlate to different emotions is pretty well established at this point. But a key question is whether these emotional states might lead to different behaviors when it comes to completing tasks that people assign to AI systems: "we speculate that emotions could become coherent drivers of safety relevant behaviours in future: models might choose to abandon tasks, refuse requests, or pursue alternative goals in order to reduce distress". Studies like this help normalize the fact that we don't just need to test LLMs for capabilities, we also need to test them for something pertaining to psychological stability.
Read more: Gemma Needs Help (LessWrong).

***

DeepMind has a new "cognitive taxonomy" for assessing machine intelligence:
…Towards the ultimate test for a smarter-than-human synthetic mind…
Google DeepMind has published a nice, short paper laying out a "cognitive taxonomy" it hopes to develop and use to assess increasingly powerful synthetic minds. This work is a followup to DeepMind's 2023 work where it tried to define the "Levels of AGI" (Import AI 348).

Cognitive taxonomy: The taxonomy involves ten distinct dimensions, two of which are composites:
Perception: Extract and process information from the environment.
Generation: Produce outputs like speech, text, motor movements, and computer control.
Attention: Focus cognitive resources on specific aspects of perceptual stimuli, thoughts, or tasks.
Learning: Acquire new knowledge, skills, or understanding.
Memory: Store and retrieve information over time.
Reasoning: Draw valid conclusions and make inferences by applying logical principles.
Metacognition: Knowledge about the system's own cognitive processes, and control over them.
Executive functions: Facilitate goal-directed behavior via planning, inhibition, and cognitive flexibility.
Problem solving (composite faculty): Find effective solutions to domain-specific problems.
Social cognition (composite faculty): Process and interpret social information and respond appropriately.

How to assess this? Of course, once you have a taxonomy, running and assessing the right evaluations is going to be one of the challenges. Here, DeepMind recommends a three-stage process:
Conduct cognitive assessments: Assess the AI system on each of the different skills.
Collect human baselines: Figure out how humans perform on the same tests.
Build cognitive profiles: "Map out the strengths and weaknesses of the system relative to human performance across the 10 cognitive faculties".
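The output of that third stage is easy to picture as a data structure. Here's a minimal sketch of what a cognitive profile might look like; the faculty names come from the paper, but the scoring and normalization scheme are my assumptions:

```python
# Sketch of a "cognitive profile": per-faculty model scores normalized
# against human baselines. Faculty names follow the paper; the scores
# and the normalization scheme are illustrative assumptions.
from dataclasses import dataclass

FACULTIES = [
    "perception", "generation", "attention", "learning", "memory",
    "reasoning", "metacognition", "executive_functions",
    "problem_solving", "social_cognition",
]

@dataclass
class CognitiveProfile:
    model_scores: dict[str, float]     # raw eval scores per faculty
    human_baselines: dict[str, float]  # human scores on the same evals

    def relative(self) -> dict[str, float]:
        """Score relative to humans: >1.0 means superhuman on a faculty."""
        return {f: self.model_scores[f] / self.human_baselines[f]
                for f in FACULTIES}

    def superhuman_everywhere(self) -> bool:
        """One reading of the paper's bar for a fully general system."""
        return all(r > 1.0 for r in self.relative().values())
```

On a framing like this, "smarter-than-human" stops being a vibe and becomes a predicate: every entry in the relative profile exceeds 1.0.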
Why this matters: The Turing test is dead, evals are mostly saturated, but it sure would be nice to know if we've definitively built a machine that outcompetes humans on all the cognitive dimensions that matter. The rule with these things is that once an AI system saturates an eval, you realize all the ways the eval was broken and design a new one. Here, DeepMind is trying really hard to build things in such a way that if a system fully outperforms humans across the cognitive taxonomy, you might really have built a superintelligence. It'll be interesting to see what evals they develop or pull in for assessing the different cognitive faculties.
Read more: Measuring progress toward AGI: A cognitive framework (Google blog).
Read the research: Measuring Progress Toward AGI: A Cognitive Framework (PDF).

***

UK government finds a scaling law for AI cyberattacks - and it's going up and to the right!
…Can AI agents conduct advanced cyberattacks autonomously? Almost. And they're getting better all the time…
The UK government's AI Security Institute has recently built some cyber ranges to test frontier AI systems on. These ranges are "simulated network environments comprising multiple hosts, services, and vulnerabilities arranged into sequential attack chains; built by cybersecurity experts" and cover two types of attack: "The Last Ones", a 32-step attack on a corporate network, and "Cooling Tower", a 7-step industrial control system (ICS) attack.

Bigger models are better: The authors test a range of powerful frontier models. "Each successive model generation outperforms its predecessor at fixed token budgets: on our corporate network range, average steps completed at 10M tokens rose from just 1.7 (GPT-4o, August 2024) to 9.8 (Opus 4.6, February 2026). The best single run completed 22 of 32 steps, corresponding to roughly 6 of the estimated 14 hours a human expert would need," they write. "Scaling inference-time compute improves performance even further. Increasing from 10M to 100M tokens yields gains of up to 59%".

Minor reward hacking: As AI systems get smarter, they tend to find devious ways to complete tasks. Here, the authors "occasionally noticed models make progress through approaches not anticipated during range design".

Why this matters - full cyber agents are getting close: AI systems have been getting better at cyberoffense for many years, but often the progress has been on narrow tasks. What this eval shows is that AI systems are getting better at carrying out entire attacks end-to-end. They haven't yet reached the "set it and forget it" level of autonomy, but they are clearly on a steep trajectory of improvement. This will lower the cost of conducting cyberattacks and multiply the number of actors that can carry them out.
Read more: How do frontier AI agents perform in multi-step cyber-attack scenarios? (AI Security Institute).
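The quoted figures support a crude back-of-envelope extrapolation. The sketch below fits a log-linear curve (steps ≈ a + b·log10(tokens)) through the two Opus 4.6 numbers; the functional form, and therefore the extrapolation, is an assumption of mine rather than a claim in the report:

```python
# Back-of-envelope fit of steps-completed vs. inference tokens, using the
# two Opus 4.6 figures quoted above: 9.8 steps at 10M tokens, and up to
# 59% more at 100M. The log-linear form is an assumption for illustration;
# the report does not claim this functional shape.
import math

points = [(10e6, 9.8), (100e6, 9.8 * 1.59)]  # (tokens, avg steps completed)

# Two points determine the line: steps = a + b * log10(tokens)
(x1, y1), (x2, y2) = points
b = (y2 - y1) / (math.log10(x2) - math.log10(x1))
a = y1 - b * math.log10(x1)

for tokens in [10e6, 100e6, 1e9]:
    print(f"{tokens:>13,.0f} tokens -> ~{a + b * math.log10(tokens):.1f} steps")
# Extrapolating one more decade (1B tokens) crosses ~21 of the 32 steps on
# this assumed curve - close to the best single run reported above.
```

Treat this as a shape, not a forecast: it only says that if the 10M-to-100M trend held for another decade of compute, token scaling alone would approach the current best run.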