Alpha is open · sign in to start

Is Claude Cowork actually working
for your users and clients?

Argus captures every session of your users across your team or organization, then reads across thousands of them at once — surfacing where skills are holding, where they're quietly breaking, and which patterns are pointing at the next thing you should build.

Magic-link sign in. No password. Free during alpha.

arguslab.co/northwind/dashboard

AArgus/Nnorthwind

Sessions captured

142+12%

Cost

$58.41+8%

Tokens

18.4M+14%

Active users

8flat

Errors

3−47%

SessionsErrors

Spot checks

Cache hit rate86%

Avg session cost$0.41

P50 latency78 ms

Tools / session14.2

Recent sessions

workspace

Refactor the Stripe webhook handler to retry on idempotency conflicts…runningsara@northwind.io$0.84

Sync Linear backlog with the GitHub release milestone…completedjames@northwind.io$1.42

Draft the weekly customer-success digest from HubSpot + Slack threads…flaggedpriya@northwind.io$4.21

Live preview · Argus dashboard, one workspace

The problem

Once it ships, you go blind.

Claude Cowork's built-in telemetry tells you a skill was invoked. It can't tell you whether it worked, whether the user took the answer, whether you should ship a fix tomorrow.

№ 01

The counter that says nothing.

Cowork's built-in telemetry logs your skill as Skill: 3. Three invocations. Across 412 sessions for one client this month, you have a single number per skill — invocation count. Nothing about which versions ran, what they returned, whether the user accepted the answer or had to ask twice.

observed: 412 sessions
surfaced: 1 aggregate counter

№ 02

The skill that broke quietly.

You shipped weekly-review four weeks ago. Across the first 38 sessions, users accepted the answer on the first turn. Across the next 9, they re-asked, rephrased, switched tools. Something started failing on session 39. Nobody noticed — the cost line stayed flat and no single session looked broken on its own.

skill: weekly-review · v1.2.0
pattern: first-turn 96% → 22%

№ 03

The skill that wasn't there yet.

Across the team's traffic this month, “turn this Linear ticket into a release-notes entry” came up fourteen times in eight different phrasings. Each one got a different ad-hoc answer; one user gave up. A skill is waiting to be written there. No telemetry surface — yours or anyone else's — will ever find it.

pattern: “linear → release notes”
sessions: 14 · 8 phrasings · no skill

The instrument

What the counters can't see.

Argus runs as a Claude Cowork plugin. It captures every session — prompts, assistant responses, every tool call, every follow-up — in plain text, stitched back into the conversation the user actually had. The qualitative layer that makes everything else possible.

№ 01

Did the user accept the answer?

A skill that works ends the conversation. A skill that doesn't gets re-prompted, rephrased, abandoned. Argus captures the user's exact follow-ups so the Agent can see, at a glance across hundreds of sessions, which versions of which skill are landing on the first turn — and which aren't.

Captured · used in: first-turn acceptance, follow-up patterns

№ 02

Did the assistant ask the user to do its job?

Skills should answer questions, not ask new ones. When a skill is under-specified, the assistant stalls — “could you clarify…”, “which one did you mean…” — and the user does the work the skill was meant to do. Argus captures the stalls so the Agent can show where they cluster.

Captured · used in: stall frequency, under-specified prompts

№ 03

What did the tool actually return?

The MCP call succeeded. Status 200. But the payload was empty, or a 400-row dump, or a JSON that didn't match what the skill asked for. Argus captures the tool's plain-text output beside the assistant's response, so the Agent can flag the sessions where the skill kept going on bad input.

Captured · used in: tool-output mismatches, silent failures

№ 04

What did users ask for that no skill could handle?

The prompts your customisations don't yet cover. Same export, same lookup, same wrangle — captured verbatim, even when nothing answered them. The Agent reads across the unmet-prompts corpus and surfaces patterns ready to become the next skill.

Captured · used in: unmet-prompt clustering, skill candidates

The loop

From every session, a catalogue that improves itself.

Five moves, read bottom-up — raw work at the foundation, refined knowledge on top. Each layer rests on the one beneath it; the loop settles new and refined skills back into the next session.

Refined
knowledge

Raw
work

№ 04

Layer 5

Refine

Rate the work; an agent refines weak skills and drafts the ones your usage is asking for.

Agent · planned

rate · refine · draft

№ 03

Layer 4

Review

Replay grouped by skill; analyse every invocation across the org.

Org catalogue

SkillsMCPsAgentsPlugins

№ 02

Layer 3

Clean & structure

Redacted and tenant-isolated, then each session is rebuilt into a complete, structured record — the data model that makes the detail possible.

private · fully modelled

№ 01

Layer 2

Capture

Plugin hooks and OpenTelemetry capture every prompt, tool call, reply and metric — at the source.

plugin + OTEL

№ 00

Layer 1

Work

People run their everyday Cowork sessions — the foundation everything rests on.

where value is made

↻ Refined & new skills settle back into the work at the foundation.You stay in control/private /tag /scope /rate

The agent

Your work, made legible.

The Argus Agent reads across thousands of sessions of the same skill, MCP, or agent — and tells you the next move. Available in the Argus web app. Coming as a Claude Cowork plugin you can invoke during a working session.

№ 01

Sessions, organised.

Every Cowork session captured and indexed. Filter by user, project, skill, or status. Annotate any turn. Diff version-by-version what a skill produced.

The qualitative spine

№ 02

Usage, per skill and per MCP.

First-turn acceptance · follow-up rate · stall frequency · tool-output mismatches, per version, over time. The week something starts drifting, the chart bends before any single session looks broken.

Per skill · per version · per client

№ 03

Quality across versions. soon

The Agent reads across a skill's failed and refined sessions and proposes concrete edits to the SKILL.md — sharper trigger, missing tool, an example that would have caught the failure. You approve the patch; the new version lands in the marketplace as a pull request.

Patch · review · merge

№ 04

User needs, at scale. soon

The same Agent clusters the unmet prompts across the workspace and drafts candidate skills — the recurring asks your team didn't realise were systematic. Less guesswork. More shipped.

From recurring asks to draft skills

The Argus Agent is itself a Claude Cowork plugin

It runs the same way every other skill on the platform runs. It captures itself. It reviews itself. It refines itself. The thing the consultant is shipping is running on the rails the consultant is shipping.

The workshop uses the workshop.

A sample exhibit

A session, as it is filed.

One real session from a forward-deployed engineer's portfolio. Names redacted. This is what the Agent reads.

№ 0042revenue-ops · churn write-upflagged12m 04s · 18 turns · 52 tools · $4.21

Upriya@northwind.io·14:22:07

Pull Q2 churn from HubSpot and Stripe, cross-reference with NPS scores from [redacted-customer] in Intercom, and write up the top three churn drivers in our internal weekly-review voice.

Aopus-4-7·14:22:12$0.18

I'll pull the cancellation reasons from Stripe and the closed-lost notes from HubSpot in parallel, then join them against the NPS responses Intercom logged in the same window. Kicking off three Explore subagents — Stripe's rate-limited so it'll iterate.

subagentTask · explore14 tools · 241K cache · $0.43

mcp · stripestripe_list_canceled_subscriptionsstatus=200 · 1m 39s

skill/weekly-reviewv1.2.014.8K tokens · 8.2s

flagqa.flagcohort missing from HubSpot export · 3 customers

Tokens

Input26K

Output8.1K

Cache (read)1.34M

Cache (write)28K

Total1.40M

Cost breakdown

Main thread$3.78

Subagent · explore$0.43

Annotations · 2

matt · 18m ago

Churn write-up is solid, but the missing HubSpot cohort should have been caught at the join step. Track for workflow review.

Data & privacy

Where the data goes.

Argus captures the conversations your customisations run. We have to be careful with them. Four commitments we won't move on.

№ 01

Secrets never leave the machine.

The capture plugin scrubs API keys, OAuth tokens, Bearer headers, and common password patterns at the source — before the envelope leaves the user's computer. Anthropic, OpenAI, Supabase, GitHub, AWS, Slack patterns are caught by default; you can add your own.

Built · plugin-side · pre-transit

№ 02

One word makes a session private.

Type /private in Cowork at any point and the plugin stops capturing that session — and deletes anything already shipped. The escape hatch is the user's, not the agency's. No support ticket, no admin approval.

Built · the /private command

№ 03

Redact names before review.

Per-workspace patterns for emails, names, and custom regex run on every captured envelope before it's persisted. Reviewers see [redacted-customer], not the company name.

Coming · per-workspace rules

№ 04

Encrypted, isolated, never trained on.

TLS to the ingestion worker, AES-256 at rest in the database, workspace-isolated by row-level security on every query. Argus never uses your captured sessions to train any model — ours, Anthropic's, or anyone else's.

Standing policy · enforced at the database

Audience

Three rooms, one Argus.

The same captured session is read three ways. Each room sees what it needs to and nothing it doesn't.

The forward-deployed engineer

The agency or solo consultant

You ship custom skills to several clients. You need to know which versions are working at which client, where you're about to get a support ticket, and which user prompts are pointing at the next thing to build.

· Per-client dashboards
· Skill-version diffing
· Annotate any session
· Draft the next skill from unmet prompts

Primary user

The internal IT lead

Rolling Claude Cowork out at scale

You're standardising your org's MCP servers, you've published your first internal skills, and you need to know — across 200 engineers — which patterns work and which don't, before the leadership review.

· Per-team usage
· MCP health rollups
· Pattern detection at scale
· QA gates for new skill versions

Secondary user

The client stakeholder

The team paying for the work

You want to know your team's Cowork install is being used, that the skills you commissioned are landing, and you'd like to see one well-organised summary instead of a Slack thread.

· Read-only access
· Monthly quality brief
· Scoped to your projects only
· No raw session bodies

Read-only · scoped

Progress

Where we are in the work.

Where Argus stands as of June 2026. Numbers update as the private beta rolls forward.

Pilot workspaces

Two agencies, one internal IT team, one solo consultant. Coverage of all three audience types.

Sessions captured

142K+

From the live plugin running across pilot teams.

Public opening

June 26

Stable plugin, MCP-friendly skill catalog, version-diff QA — that's the bar.

Start

The alpha is open.

Sign in, create your workspace, drop the plugin into Claude Cowork — five minutes from a cold tab to your first captured session. Free during alpha, no card needed.

Step 1

Sign in with your email

Magic-link auth. We send a one-time link, you click it, you're in. No password to remember, no signup form to fill.

What happens next

Pick a workspace name, choose privacy defaults.
Download a pre-configured plugin bundle, drop it into Cowork.
Open a fresh Cowork session, say hi — the first capture lands in seconds.

We don't sell your data, we don't train models on it, and you can opt any session out at any time.

Is Claude Cowork actually working for your users and clients?

Once it ships, you go blind.

The counter that says nothing.

The skill that broke quietly.

The skill that wasn't there yet.

What the counters can't see.

Did the user accept the answer?

Did the assistant ask the user to do its job?

What did the tool actually return?

What did users ask for that no skill could handle?

From every session, a catalogue that improves itself.

Your work, made legible.

Sessions, organised.

Usage, per skill and per MCP.

Quality across versions. soon

User needs, at scale. soon

A session, as it is filed.

Where the data goes.

Secrets never leave the machine.

One word makes a session private.

Redact names before review.

Encrypted, isolated, never trained on.

Three rooms, one Argus.

The agency or solo consultant

Rolling Claude Cowork out at scale

The team paying for the work

Where we are in the work.

The alpha is open.

Sign in with your email

Is Claude Cowork actually working
for your users and clients?