My cheatsheet for a clean context
Hey folks. Boarding my flight to SF very shortly, and I got an email to let me know: no WiFi today. Uh oh. I was kinda hoping my 11 uninterrupted hours without the kids would be productive for once (I'm usually a very OOO long-hauler, no internet). But I still have some work to polish this talk I'm giving on Tuesday. I'm also in town looking to deploy $100k cheques to dev tools and infra founders, plus see some of my wonderful LPs and meet new ones. Ben's Bites Fund II has already started investing.

So, my flight… I've had to hurriedly download a few local models so I can use my agents offline, and so far I think Gemma 4 26B is going to be my choice. We're so spoiled today with fast intelligence at our fingertips, and it's funny how used we get to the new intelligence levels. Local models are slow to boot up (you've got to be more mindful of what context is being loaded on startup, so I'm running with no skills to make it go faster; I can call the skills when I want. Maybe I'd actually prefer to do that generally 🤔). And they feel pretty slow to do work, but only because of said spoils.

I've been in the weeds of context management recently because of the course I'm working on. It's been useful to remind myself just how prickly it can be:

- If an agent runs web searches, presumably you didn't read them, so it's gobbling up context from content you don't know is 1. right, 2. not AI slop, and 3. by a source you'd recommend. Little (or big) lines of slop, misdirection, and misinformation slip into the context and compound over time.
- Reaching ~60% of a context window is probably the limit of where you want to be.
- Use other sessions as context-gathering sessions. If there are lots of documents, create one summary file with the information (and try to read, or at least skim, it! I am trying, promise).
- I don't trust 1M context windows. There's a great post by Thariq from Anthropic below about this.
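As a rough sketch of that ~60% budget rule, here's how I think about it in code. This is my own illustration: the window size, the 0.60 threshold, and the chars-per-token estimate are all assumptions, not a real tokenizer or any tool's actual behaviour.

```python
# Rough context-budget check: flag once estimated usage passes a
# fraction of the model's context window. The len(text) // 4 token
# estimate is a crude approximation for English prose, not a tokenizer.

WINDOW_TOKENS = 200_000   # assumed context window size
BUDGET_FRACTION = 0.60    # the ~60% rule of thumb

def estimate_tokens(text: str) -> int:
    """Very rough: ~4 characters per token for English prose."""
    return len(text) // 4

def over_budget(texts: list[str]) -> bool:
    """True if the combined estimated tokens exceed the budget."""
    used = sum(estimate_tokens(t) for t in texts)
    return used > WINDOW_TOKENS * BUDGET_FRACTION

print(over_budget(["hello world"]))    # prints False (tiny input)
print(over_budget(["x" * 1_000_000]))  # prints True (~250k est. tokens)
```

In practice you'd swap the heuristic for a real token counter, but the shape is the same: measure, compare against a budget well below the full window, and summarise or start a fresh session once you cross it.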
My tasks shouldn't need perfect recall beyond ~150k tokens of context; that's a lot of words. That only holds until 1M context windows are the norm, the models don't forget anything, and they help clean polluted context along the way!

Anyway, got to head to the gate! This was a slightly different intro; let me know if you liked it. I need to share more as I'm learning (or diving deeper).

Ben's Bites is brought to you by Attio, the AI CRM

Honestly, no one gets excited about a CRM. But then they try Attio. It connects to Claude Code and n8n through its MCP server, completely bridging the gap between my customer data and apps. Wait, there's more, like flagging churn risk and turning customer feedback into Linear projects. Try it now.

Claude Code's desktop app got a redesign. It brings many CLI-only features and more (like split windows for multiple sessions) to the desktop app. Big improvement, but a lot is still missing: it picks up some CLI sessions but not all, opening/editing files isn't obvious, and it keeps asking for permission even with "bypass" settings on.

Gemini also has a native Mac app now. But it's light on features (no Gems, no notebooks) and the design feels rough, to say the least.

New models: GPT-5.4-Cyber from OpenAI, fine-tuned for cybersecurity, with limited access for trusted partners. And Gemini 3.1 Flash TTS from Google, with better voices, audio tags for controlling tone and pacing, and 70 languages.

Routines in Claude Code are now in research preview: set up a prompt, a repo, and your connectors once, then run it on a schedule (or via API/GitHub trigger). They run on Anthropic's infra, so you don't need your laptop open. Basically, extended cron jobs. OpenClaw calls these heartbeats.

With the latest update to OpenAI's Agents SDK, you can run Codex-style agents in production without building the whole harness yourself. You get sandboxed execution, computer use, skills, memory, and compaction built in.
Most RAG systems return wrong answers with complete confidence. Gauntlet's free Night School covers how production AI engineers actually fix that: setup, evaluation, the full loop. Wednesday, April 22. Register free*

Skills in Chrome let you save prompts as reusable one-click workflows that run on whatever page you're viewing.

Cursor can now respond with interactive canvases: dashboards and custom interfaces instead of just text.

Resend shipped a new email editor with BYOA (bring your own agent). There's a built-in LLM, but you can also MCP into the editor with your own setup.

Sparkle v4 from Every lets AI organise your filesystem like you would. Daniel pointed an agent at 5 years of home-building emails (511 events, 690 documents, 170 finance records) and got back a full project timeline for ~$500 of Opus tokens.

Impeccable v2, the design skill for coding agents. v2 adds a CLI scanner (works without an LLM), a Chrome extension, and a /shape command that runs a design interview before writing any code.

Using Claude Code: a guide on session management, compaction, and the 1M context window.

A 30-minute tutorial on building software with agents in Cursor.

Lindy AI's founder says GLM 5.1 will likely become their default over closed-source models for most use cases, saving them a bunch on inference (their biggest cost, more than payroll).

OpenRouter now offers video generation models with one universal API across all video models.

Copilot in Word now tracks changes and leaves comments.

Windsurf 2.0: manage all your agents from one place and delegate work to the cloud with Devin.

Gradient Bang: a fun multiplayer game with subagents in space. Built with Pipecat, Supabase, and open source.

Read about me and Ben's Bites

📷 thumbnail by @keshavatearth

* sponsors who make this newsletter possible :)

Wanna partner with us for the next quarter? Email us at shanice@bensbites.com or k@bensbites.com
Send this story to anyone, or drop the embed into a blog post, Substack, or Notion page. Every play sends rev-share back to Ben's Bites.