Compliance as Code: How We Built Our Own Automated ISMS and Pen-Testing Framework
Every software vendor says it takes security seriously; the harder question is how you prove it. The usual answers - a certificate, an annual external penetration test - are point-in-time snapshots. We built something we think can be stronger and more honest: continuous, evidence-based proof, captured and explained so it can stand on its own. This is the framework, why we built it instead of buying one, and where it is taking us.
How do you actually prove software is secure?
Every software vendor says it takes security seriously. The honest question - the one a careful customer eventually asks - is simpler and harder: how do you prove it?
The traditional answers are a certification and an annual external penetration test. Both are genuinely valuable and worth pursuing. But it is worth being clear-eyed about what they are: point-in-time snapshots. A penetration tester spends a week, writes a report, and that report begins to go stale the moment it is handed over - the next dependency bump, the next feature, the next deploy is already outside its scope. A certificate attests to the state of things on an audit date. That is real evidence, but it is evidence about one particular Tuesday.
We think there is a stronger way to demonstrate security - one that complements those snapshots and, done properly, can stand alongside them: continuous, evidence-based testing. Your controls exercised on every build. The results captured as real, inspectable artifacts rather than assertions. And, crucially, explained openly enough that a reviewer can see not just that you tested, but what you tested, what you found, and what you fixed. Properly evidenced and explained, that continuous proof can tell a security team more than a year-old report ever could.
That is the bet we made. Rather than wait for an annual external test to tell us where we stood, we built the capability to prove our security - to ourselves and to our customers - continuously. Underneath every framework, every certification, and every questionnaire is the same thing: evidence. So we built an engine that produces it on every build.
Our journey: we looked at the platforms first
When we first set out to prove our security properly, the obvious move was to buy a compliance platform. We evaluated the well-known ones - Vanta, Drata, Secureframe. They are genuinely good products, and for a great many companies - especially ones racing to a first SOC 2 without spare engineering capacity - they are the right answer. We would recommend them without hesitation in that situation.
But the more closely we looked, the less they fit us specifically. Three things stood out.
- Cost. They are ongoing subscriptions whose price scales with your company size and the frameworks you pursue - and that spend grows at exactly the moment you would rather be putting the money into actual security engineering.
- Depth. They are built to collect and monitor the evidence an integration can reach - cloud configuration, access and MFA status, policy attestations and the like. That is real, useful, and broad. But it is a different focus from running your own deep security testing: running your own penetration tests, mapping your test suite to the controls it proves, and tying dependency and code-level security into your own build. We wanted that code-level depth, and it is not what these platforms are built around.
- Ownership and grounding. The evidence lives in someone else's platform, on a subscription, and tends to be presented as a control status rather than the underlying artifact. We have written before about refusing to let our AI make claims it cannot ground in real data, and we were not about to hold our security evidence to a lower bar than our product - we wanted the proof to be the artifact itself, owned in our own repository.
We are a security- and engineering-led company with the capacity to build. So we made a different call: build our own evidence engine, and treat compliance the way we treat everything else - as code.
There was a deeper reason than cost and fit, though. A platform would have tracked our progress toward certifications; it would not have made us one bit harder to attack. We wanted the opposite order of priorities: become genuinely more secure first, and let the evidence - the kind that also earns certifications - fall out of that work. So we built tooling that does both at once: it hardens the product by continuously testing it, and it produces the proof that customers, and certifications, ask for. The hardening is the real thing, and we wanted to lead with the real thing.
The real problem is evidence, not controls
Anyone can write a control. The hard question - the one an external assessor or a customer's security team actually asks - is "show me it is true, last week, not last year." A control you cannot continuously evidence is a control you cannot really claim.
Here is what most engineering-led companies miss: the evidence usually already exists. Your test suite proves your tenant-isolation logic holds. Your CI proves your dependencies are patched. Your scans prove your cloud configuration is hardened. The gap between "we run more than twenty thousand automated tests" and "here is the evidence that secure development operates as a control" is not a gap in security - it is a gap in collection and mapping. Closing that gap by hand, right before a deadline, is what a compliance programme spends most of its time doing. We decided that if the evidence already lived in our pipeline, the collection should live there too.
What we built
The engine has four parts, all of them version-controlled and grounded in real artifacts.
An evidence tree. A committed, date-stamped, classified directory structure that mirrors our ISMS. Application-security regression runs, dependency-vulnerability scans, internal assessment reports, a reserved slot for external penetration-test reports, and remediation retests all live here, in the repository. Filing a report is a commit, not an upload to a portal.
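To make "filing a report is a commit" concrete, here is a minimal Python sketch of writing an artifact into a date-stamped, classified tree. The category names, the path layout, and the `evidence_path` and `file_evidence` helpers are all hypothetical, chosen for illustration rather than taken from the actual repository.

```python
from datetime import date
from pathlib import Path
import tempfile

# Hypothetical category names, loosely following the kinds of evidence listed above.
CATEGORIES = {
    "appsec-regression",
    "dependency-scans",
    "internal-assessments",
    "external-pentests",
    "remediation-retests",
}


def evidence_path(root: Path, category: str, run_date: date,
                  classification: str, name: str) -> Path:
    """Build a date-stamped, classified path inside the evidence tree."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown evidence category: {category}")
    return root / category / run_date.isoformat() / classification / name


def file_evidence(root: Path, category: str, run_date: date,
                  classification: str, name: str, content: str) -> Path:
    """Write the artifact into the tree; committing it is left to git."""
    path = evidence_path(root, category, run_date, classification, name)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(content, encoding="utf-8")
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        written = file_evidence(Path(tmp), "dependency-scans", date(2024, 1, 15),
                                "internal", "report.json", "{}")
        print(written.relative_to(tmp))
```

Because the path is derived from the category, the date, and the classification, two runs on different days never overwrite each other, and the tree itself becomes a readable index of what was tested and when.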
A findings register. Every security finding - from a scan, an internal assessment, or an external test - gets a stable identifier, a severity, an owner, a status, a link to its evidence, and a remediation and retest trail. It is a structured file in the repository, so a finding's entire lifecycle (found, fixed, retested, verified) is visible in git history. Nothing falls through the cracks, because the cracks are version-controlled.
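One way to picture a register entry is as a small record with an enforced lifecycle. The sketch below is an assumption-laden illustration: the field names, the `Finding` class, and the rule that a finding may only advance one step at a time are hypothetical, not a description of the actual register format.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    FOUND = "found"
    FIXED = "fixed"
    RETESTED = "retested"
    VERIFIED = "verified"


# Lifecycle order as described in the text: found, fixed, retested, verified.
_ORDER = [Status.FOUND, Status.FIXED, Status.RETESTED, Status.VERIFIED]


@dataclass
class Finding:
    finding_id: str          # stable identifier
    severity: str
    owner: str
    evidence: str            # path to the artifact in the evidence tree
    status: Status = Status.FOUND
    history: list = field(default_factory=list)

    def advance(self, new_status: Status, note: str) -> None:
        """Move exactly one step forward (an assumed rule); skipping is rejected."""
        if _ORDER.index(new_status) != _ORDER.index(self.status) + 1:
            raise ValueError(
                f"{self.finding_id}: cannot go from "
                f"{self.status.value} to {new_status.value}"
            )
        self.history.append((self.status.value, new_status.value, note))
        self.status = new_status
```

Serialised to a structured file and committed, each call to `advance` would show up as a small diff, which is what makes the trail readable in git history.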
The other two parts - the penetration-testing toolkit and the assurance layer - each deserve a section.
Automated penetration testing, honestly scoped
The centrepiece is an in-repo dynamic-analysis (DAST) toolkit that runs a battery of test lanes against our own platform: reconnaissance and TLS posture, injection and server-side template injection, file-upload abuse, API security, authenticated and cross-tenant authorization, AI and LLM red-teaming, secret scanning, known-CVE dependency analysis, and cloud-configuration posture. One run exercises the breadth of the OWASP testing surface and writes the results out as structured evidence.
Two design choices matter more than the lane list.
It tests for real. We stand up dedicated, data-less, resettable copies of the production stack as throwaway test tenants. Because they hold no customer data, we can run genuinely destructive payloads against them - the things you would never dare aim at production - and then reset. That is a far sharper test than the "safe mode" most scanners are limited to against a live system.
It is scope-fenced and honest. The toolkit fails closed: it refuses to run against anything outside an explicitly authorised target. Every report records exactly what was tested - the lanes that actually ran, a written coverage note explaining what was not tested and why, and findings de-noised so only real production secrets surface rather than the test and mock keys that clutter a naive scanner. When a control legitimately blocks our own tooling - our login bot-protection refusing automated logins, for instance - that is recorded as a verified positive control, not quietly skipped. An evidence artifact that over-claims its own coverage is worse than no artifact at all, so ours is built to under-promise.
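The fail-closed idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the toolkit's real code: `assert_in_scope`, `OutOfScopeError`, and the example hostnames are hypothetical, and the sketch uses exact hostname matching with no wildcards.

```python
from urllib.parse import urlsplit


class OutOfScopeError(Exception):
    """Raised when a target is not on the explicit allowlist."""


def assert_in_scope(target_url: str, authorised_hosts: frozenset) -> str:
    """Return the hostname if it is authorised; raise otherwise.

    Fails closed: an empty allowlist authorises nothing, and a URL whose
    hostname cannot be parsed is refused rather than guessed at.
    """
    host = urlsplit(target_url).hostname
    if host is None or host.lower() not in authorised_hosts:
        raise OutOfScopeError(
            f"refusing to test {target_url!r}: not an authorised target"
        )
    return host.lower()
```

The design choice that matters is the default: anything the check cannot positively match, including a malformed URL or a lookalike domain that merely starts with an authorised name, is treated as out of scope.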
A point-in-time penetration test is a photograph. Continuous, evidence-based testing is a live feed. Both have their place - but the live feed is the part most security programmes are missing, and, properly evidenced and explained, it can carry far more of the assurance load than a once-a-year report.
The toolkit produces machine-readable JSON that feeds the register and our automation, and a presentable HTML report for auditors and customers. We are not claiming an automated scanner out-thinks a skilled human tester - a person still brings creative, business-logic exploit-chaining that tooling does not. But that human depth is most valuable layered on top of a continuous evidence base, not instead of one. Run on every build, captured as real evidence, and explained openly, the continuous layer can supplement an external test and, for much of what teams actually rely on a once-a-year assessment to provide, stand in for it.
Turning the test suite into control evidence
The part we are most pleased with is the assurance layer. It maps our existing unit and integration tests to the security controls they exercise, and reports them as evidence. Instead of asserting "we isolate tenants," the report can state: multi-tenant isolation - evidenced by 31 tests across six classes, all passing. The AI safety chain - 82 tests, backed by a dedicated OWASP LLM Top 10 benchmark. Alongside these sit named, passing tests for PII redaction, cryptographic and device-trust controls, and audit-log scrubbing, and line coverage is enforced as a hard gate on every build.
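The mapping itself can be very small. The Python sketch below assumes test results arrive as a dictionary of test name to pass or fail and that the control map is maintained by hand; `summarise_controls`, the control names, and the rule for when a control counts as evidenced are hypothetical illustrations rather than the real assurance layer.

```python
def summarise_controls(test_results: dict, control_map: dict) -> dict:
    """Roll test outcomes up into per-control evidence.

    test_results: test name -> True (passed) or False (failed)
    control_map:  control name -> list of test names claimed as evidence
    """
    report = {}
    for control, tests in control_map.items():
        missing = [t for t in tests if t not in test_results]
        failed = [t for t in tests if test_results.get(t) is False]
        passed = sum(1 for t in tests if test_results.get(t) is True)
        report[control] = {
            "tests": len(tests),
            "passed": passed,
            "failed": failed,
            "missing": missing,
            # Assumed rule: a control is evidenced only if it maps to at least
            # one test and every mapped test actually ran and passed.
            "evidenced": bool(tests) and not failed and not missing,
        }
    return report
```

Treating a mapped test that never ran as missing evidence, rather than silently ignoring it, keeps the report from over-claiming when a test is renamed or deleted.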
That is the real unlock. Writing good tests is work we would do anyway. The assurance layer makes that work double as proof that a control is both implemented and verified. We are not manufacturing compliance theatre; we are collecting proof that already existed and was simply never mapped to the framework that needed it.
Running it in the pipeline
Some of this already runs in CI. Our build pipeline runs OWASP Dependency-Check across the whole dependency tree on every change, generates a software bill of materials, and flags new CVEs. Crucially, false positives are handled with documented, auditable suppressions rather than silent exclusions - when we patch a critical dependency vulnerability, or suppress a provable mismatch between a CVE and the library it was matched against, the change and its full justification live in version control, where an assessor can read the reasoning, not just the result.
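A guard like the following could keep suppressions auditable by rejecting any entry that lacks a written justification. It is a sketch under assumptions: the `suppress`, `notes`, and `cve` element names follow the OWASP Dependency-Check suppression file format as commonly documented and should be checked against the schema version in use; `unjustified_suppressions` and the 20-character threshold are hypothetical.

```python
import xml.etree.ElementTree as ET


def local(tag: str) -> str:
    """Strip an XML namespace, e.g. '{ns}suppress' -> 'suppress'."""
    return tag.rsplit("}", 1)[-1]


def unjustified_suppressions(xml_text: str, min_chars: int = 20) -> list:
    """Return (index, cve_ids) for each suppression with no real justification.

    min_chars is an arbitrary, assumed threshold for "a real explanation".
    """
    root = ET.fromstring(xml_text)
    entries = [e for e in root.iter() if local(e.tag) == "suppress"]
    problems = []
    for index, entry in enumerate(entries):
        notes = next(
            (child.text or "" for child in entry if local(child.tag) == "notes"),
            "",
        )
        if len(notes.strip()) < min_chars:
            cves = [child.text for child in entry if local(child.tag) == "cve"]
            problems.append((index, cves))
    return problems
```

Run as a build step, a non-empty result would fail the build, so a suppression cannot land without its reasoning sitting next to it in version control.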
The direction is to run the whole engine continuously. The offline lanes - secret scanning, dependency CVEs, cloud-posture review, and the test-to-control assurance mapping - need no live target and run on every build. The live DAST against the throwaway tenants runs on a cadence: nightly, and before each release. We built the collector so it can run the dependency scan itself, exactly as the CI pipeline does, so the same evidence is produced whether a developer runs it locally or the pipeline runs it at three in the morning.
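The split between offline and live lanes amounts to a small scheduling rule, sketched here in Python. The lane identifiers, the `Trigger` enum, and `lanes_for` are hypothetical names, and the sketch assumes that nightly and pre-release runs include the offline lanes as well as the live DAST.

```python
from enum import Enum


class Trigger(Enum):
    BUILD = "build"
    NIGHTLY = "nightly"
    PRE_RELEASE = "pre-release"


# Offline lanes need no live target, so they can run on every build.
OFFLINE_LANES = ["secret-scan", "dependency-cves", "cloud-posture", "assurance-mapping"]
# Live lanes need a throwaway tenant, so they run on a cadence instead.
LIVE_LANES = ["dast"]


def lanes_for(trigger: Trigger) -> list:
    """Select which lanes to run for a given pipeline trigger."""
    lanes = list(OFFLINE_LANES)
    if trigger in (Trigger.NIGHTLY, Trigger.PRE_RELEASE):
        lanes += LIVE_LANES
    return lanes
```

Keeping the rule in one function means a developer's local run and the scheduled pipeline run resolve to the same lane list for the same trigger.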
The payoff is the one thing screenshots can never give you: evidence that is never stale. When an auditor or a customer's security team asks for proof, the latest run is from the last build - not a screenshot from two quarters ago. Compliance stops being a periodic scramble and becomes a by-product of shipping.
Real security beats a wall of tickboxes
Here is the belief underneath all of this, and it is worth saying plainly: real, hardened security matters more than a wall of certification tickboxes. The certificates are valuable - they are independent validation, and worth pursuing deliberately - but a badge is a proxy. It is awarded at a moment in time, and a proxy can lag the truth in either direction: certified companies get breached, and careful teams run genuinely tight operations well before the certificate arrives. What actually protects a customer's data is not the certificate on the wall; it is hardened systems kept under continuous test.
So we built the substance first. We find and fix our own weaknesses, scan every dependency on every build, red-team our own AI, and prove our tenant isolation on every commit - and the certifications follow that work, in the right order, as its natural validation rather than its substitute.
That order is what makes this genuinely useful to a customer's security team. A certificate tells them a box was ticked on some date; our framework lets them see the real depth - what we test, how often, what we found, what we fixed, and the controls our own test suite proves on every commit. For a team doing real diligence, that substance is the very thing the badge is meant to stand for, shown directly.
Where we are today - and whether it could work for you
Where we are today: the evidence engine runs in production behind Graylark LRM, feeding a committed, version-controlled ISMS, and we are working toward formal certification from a position of real, demonstrable security rather than the other way around. The engine earns its keep on every enterprise security review we go through - it lets us answer "prove your controls work" with real, current evidence rather than assurances, and it holds our own claims to the same standard it holds its reports to: say what is true, and show it.
Could the approach work for you? If you need a first SOC 2 yesterday and have no engineering to spare, buy the platform; it will get you there faster than building anything. But if you are a product company where security is a genuine differentiator, your data is sensitive, and you have engineers who would rather own their tooling than rent a dashboard, compliance-as-code compounds in a way a subscription cannot. The evidence is the real artifact rather than an abstraction of it. It lives with your code, versioned and auditable forever, and you own it outright. It is grounded in what you actually built. And because it runs in your pipeline, it is never stale.
We built this engine for ourselves, but the pattern generalises, and we think it is the better answer for a certain kind of company. If you are wrestling with the same problem - weighing the same build-versus-buy decision - we are happy to compare notes. The mindset shift underneath it is small but total: stop treating evidence as something you gather before an audit, and start treating it as something your pipeline emits every day. Once the proof is a continuous output of how you build, the audit is mostly a matter of reading the logs.
Closing view
This is how we run security at Graylark today. The engine backs Graylark LRM, our multi-tenant labour-relations platform, where multinational customers trust us with some of their most sensitive employee-relations data - and proving our security continuously, with real evidence rather than once-a-year assurances, is a standard we hold ourselves to because it is the right way to run, not because a box demands it.
That is the pattern we believe in: compliance evidence generated by the same pipeline that ships the product, grounded in real artifacts, owned outright, and never out of date. For broader platform context, visit Graylark Technologies.