Selectron

Open-sourcing a tool for AI-automated web parsing

May 05, 2025

Last time, I mentioned “Brocc: a BRowser-Observer Content-Collection Liquid-Interface”. I made some headway on this front, but the big update is that we’ve aligned on an idea! After several months in pivot mode, independently exploring ideas, we’ve decided to join forces on an exciting thing that Rob came up with. (More on that next time, or just ask me – we’re tentatively calling it Z.)

That means I’m setting aside Brocc. But the soul of Brocc lives on – over the weekend I turned the core into an open-source library + terminal application.

Introducing Selectron:

Brocc was a desktop app built around Selectron, with AI features (semantic search + chat), and a focus on data privacy (local storage and local AI models). It worked, but local inference is still too slow to be practical in 2025 – I was planning on ditching the local-first focus. I think there’s still something there, but RIP for now.

How did this all start? I’ve generally been curious about use cases for ad-hoc codegen. I’ve also been doing a lot of web scraping lately – scraping is one of those mundane chores that every programmer runs into at some point, but I think it’s especially common when you’re in the idea maze (it’s an essential tool for research and prototyping). Scraping is much easier now than it used to be, because you can vibe code scrapers (just inspect element, copy the HTML, drop it in Cursor). There’s also a lot of buzz about AI browser tools. Briefly, my thoughts on the space:

- AI browsers: Arc, Deta Surf, Perplexity Comet, etc.
  - Building an “AI browser” has always seemed to me like a fool’s errand – the Chromium family of browsers (Chrome, Edge, Opera, etc.) makes up 78% of global browser market share. So why not just connect to Chromium browsers over their shared protocol, the Chrome DevTools Protocol (CDP)?
- AI browser automation frameworks: Browser Use, Stagehand, etc.
  - Most frameworks essentially wrap Playwright with LLM calls. The basic pattern is simple: send some stripped-down version of the DOM to an LLM (plus maybe a screenshot), and prompt it to return page actions or extract data.
- Stealth browser fleets: Browserbase, Steel, etc.
  - This is an evolution of the old world of rotating proxies like BrightData (which have better economics, but presumably worse ergonomics).
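To make the “stripped-down version of the DOM” step concrete, here is a minimal sketch using only Python’s standard library. It is not how Browser Use, Stagehand, or Selectron do it; the tag and attribute lists and the `strip_dom` name are my own illustrative assumptions.

```python
from html.parser import HTMLParser

# Tags whose contents are mostly noise for an LLM (illustrative choice, not a standard list).
SKIP_TAGS = {"script", "style", "svg", "noscript"}
# Attributes kept because selectors and links tend to depend on them (also illustrative).
KEEP_ATTRS = {"id", "class", "href", "role", "aria-label"}
# Void elements never get a closing tag.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class DomStripper(HTMLParser):
    """Re-emit HTML with noisy tags, attributes, and extra whitespace removed."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a skipped tag

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
            return
        if self.skip_depth:
            return
        kept = "".join(f' {k}="{v}"' for k, v in attrs if k in KEEP_ATTRS and v)
        self.out.append(f"<{tag}{kept}>")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if not self.skip_depth and tag not in VOID_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse runs of whitespace
        if text and not self.skip_depth:
            self.out.append(text)


def strip_dom(html: str) -> str:
    """Return a compact version of `html` suitable for pasting into a prompt."""
    parser = DomStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    page = '<div class="post" style="color:red"><script>var a = 1;</script><a href="/p/1">  Hello   world </a></div>'
    print(strip_dom(page))  # <div class="post"><a href="/p/1">Hello world</a></div>
```

The output is what would go into the prompt. A real framework would likely also truncate long pages and may attach a screenshot, which this sketch leaves out.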

Selectron evolved from me testing a few ideas in this space:

- An external app that connects to your existing Chrome instance is a pretty good format. You’ll have to walk the user through relaunching their browser, but it’s not too bad – I think this is an underutilized form factor.
- Sometimes, I really just want to turn a list of content in my browser (feeds, listings, etc.) into an “ad-hoc API”, stored in a database on my computer, queryable with SQL and vector search.
- If Browser Use and Stagehand are “interpreter” approaches (using LLMs at runtime to parse websites and take actions), what would the “compiler” approach look like (using LLMs ahead-of-time to generate parsers)? It’s certainly a lot cheaper (~free).
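The “compiler” idea can be sketched as a cache: ask the LLM for parser source code once per site, then reuse that code on every later page with no further LLM calls. This is a rough illustration, not Selectron’s actual design. `fake_llm_generate_parser` is a hypothetical stand-in for a real model call, and `ParserCache` is a name I made up for this example.

```python
from typing import Callable


def fake_llm_generate_parser(sample_html: str) -> str:
    """Hypothetical stand-in for an LLM call; returns Python source defining parse(html)."""
    return (
        "import re\n"
        "def parse(html):\n"
        "    return [{'title': t} for t in re.findall(r'<h2>(.*?)</h2>', html)]\n"
    )


class ParserCache:
    """Generate a parser once per site ("compile"), then reuse it for free."""

    def __init__(self, generate: Callable[[str], str]):
        self.generate = generate
        self.sources = {}   # site -> generated parser source
        self.llm_calls = 0  # how many times we paid for generation

    def parser_for(self, site: str, sample_html: str) -> Callable[[str], list]:
        if site not in self.sources:
            # The only step that costs anything: one LLM call per site.
            self.llm_calls += 1
            self.sources[site] = self.generate(sample_html)
        namespace = {}
        # Generated code is untrusted; a real tool should review or sandbox it before exec.
        exec(self.sources[site], namespace)
        return namespace["parse"]


if __name__ == "__main__":
    cache = ParserCache(fake_llm_generate_parser)
    first = "<h2>A</h2><h2>B</h2>"
    second = "<h2>C</h2>"
    print(cache.parser_for("example.com", first)(first))    # [{'title': 'A'}, {'title': 'B'}]
    print(cache.parser_for("example.com", second)(second))  # [{'title': 'C'}]
    print(cache.llm_calls)                                  # 1
```

An “interpreter” framework would pay for an LLM call on both pages; here the second page is parsed by plain Python.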

The end result is, I think, pretty cool. Even if you don’t have any immediate scraping needs, you might find it useful – sometimes I’ll fire up the Selectron CLI as I browse Twitter, so I can easily query what I’ve seen in DuckDB.
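As a rough sketch of the “ad-hoc API” idea, the snippet below stores parsed items in a local database and queries them with SQL. It uses the standard library’s `sqlite3` as a stand-in for DuckDB so it runs without extra dependencies. The `items` table, its columns, and the sample rows are made up for illustration and are not Selectron’s schema; vector search is left out.

```python
import sqlite3


def save_items(conn: sqlite3.Connection, items: list) -> None:
    """Upsert parsed items, keyed by URL so re-seen posts are not duplicated."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items (url TEXT PRIMARY KEY, author TEXT, text TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO items VALUES (:url, :author, :text)", items
    )
    conn.commit()


def search(conn: sqlite3.Connection, term: str) -> list:
    """Return (author, text) rows whose text contains `term`."""
    rows = conn.execute(
        "SELECT author, text FROM items WHERE text LIKE ? ORDER BY author",
        (f"%{term}%",),
    )
    return rows.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # use a file path to persist between sessions
    save_items(conn, [
        {"url": "https://example.com/1", "author": "alice", "text": "scraping is a chore"},
        {"url": "https://example.com/2", "author": "bob", "text": "lunch was good"},
    ])
    print(search(conn, "scraping"))  # [('alice', 'scraping is a chore')]
```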

Steve Krouse

May 6

Cool! I was just asking for something like this on twitter: https://x.com/stevekrouse/status/1917568850343641522

Rob Cobb

May 5

somehow reminiscent of Omar Rizwan's TabFS -- https://omar.website/tabfs/

"let me get at my browser with my own programs" really does feel underexplored. browser extensions feel underpowered, scraping is a chore... excited to see where this goes!

Softerware
