Kalibr: Psychometric intelligence for teams and AI
Kalibr measures how you think, decide, and lead, then generates a leadership profile, a blind-spot map, and a personalised AI constitution calibrated to your cognitive style. Built for founders, operators, and leadership teams who want self-awareness that goes beyond generic personality tests.
What you get
- Leadership profile: scored across 10 dimensions including drive alignment, adaptability, influence style, feedback orientation, and energy resilience.
- Blind-spot map: recurring patterns in your thinking that could derail decisions under pressure, derived from your score profile.
- AI constitution: a plain-text system prompt built from your results, pasteable into ChatGPT, Claude, Gemini, or any AI tool. You own it and use it anywhere.
The science
Kalibr's assessment draws on four independently validated psychometric instruments administered as a unified battery. Every dimension traces to published, peer-reviewed research. We do not invent new constructs, we assemble validated foundations into a proprietary measurement and scoring engine, then add an interpretation layer built specifically for working relationships and team dynamics.
- A meta-analysis of 117 validity studies (N≈24,000) showing conscientiousness predicts job performance across all occupational groups, the most replicated finding in occupational psychology (Barrick & Mount, 1991).
- Multiple meta-analyses showing team-level personality composition reliably predicts performance outcomes (Bell, 2007; Han et al., 2024; Peeters et al., 2006).
- Research on interpersonal complementarity and similarity in working relationships, including the "bad apple" effect and dyadic compatibility (Dryer & Horowitz, 1997; Felps et al., 2006; Scherrer et al., 2018).
- Validated measurement of grit, resilience, and sustained performance under pressure (Duckworth & Quinn, 2009).
"The product is scientifically credible, built on the right literature, and uses validated instruments correctly. The pairwise compatibility algorithm is novel and unvalidated, which is exactly what the pilot is for."
We distinguish publicly between what is empirically established (the instruments and team-level findings), what is theoretically defensible (the proprietary dimension composites), and what is an active hypothesis being tested through the pilot programme. Most products in this space do not make that distinction.
How the AI constitution works
Large language models are sycophantic by default. Without context about who you are, they optimise for responses that feel good rather than responses that are useful. They validate hunches, soften disagreement, and mirror the framing you bring to them, which is the wrong behaviour for a thinking partner.
The Kalibr AI constitution is a plain-text system prompt built from your psychometric results. It gives the AI four things it otherwise cannot know: your cognitive style and reasoning tendencies, your documented blind spots, your feedback preferences and communication style, and the dimensions where your scores suggest you most need challenge rather than validation.
Armed with this, the AI does not become a different model, it becomes a better-positioned interlocutor. When you exhibit a known pattern (say, over-theorising execution problems), the constitution primes the model to flag it. When you seek validation for a decision in your blind-spot zone, the model probes rather than confirms. This works because frontier language models are extraordinarily good at context-following when given specific, well-structured information about the person they are talking to. The pilot data below demonstrates this in practice.
Scoring is deterministic: identical inputs always produce identical outputs. No AI is involved at the measurement stage. The AI is used only to interpret your scores into narrative language at the report-generation stage.
From the pilot
"It challenges me and pushes back at the right moments, rightly identifies when I spiral into analysis paralysis, and is direct when needed. Combined with my project-level CLAUDE.md instructions it's turning out to be really good."
"I didn't realise how much of an echo chamber Gemini had become until I loaded my Kalibr constitution. It flagged my documented tendency for impulsive scope creep and demanded a 30-day risk mitigation plan. It literally feels like having a skeptical board member in my terminal."
"I've taken MBTI, DISC, and CliftonStrengths. All gave me 30 pages of corporate fluff about how 'visionary' I am. Kalibr gave me one page that said my extreme tolerance for ambiguity is actively creating operational chaos for my direct reports. Brutal, but it's the first assessment I've ever actually used to change how I manage."
"'Decisions that require straightforward execution may be over-theorised.' This has actually happened with me and one of my employees twice. My manager had to step in both times. Impressive your system found this."
Frequently asked questions
How do I make my AI give better answers?
The core problem is that AI models have no model of you specifically. They default to answering for the median human, which means responses are systematically miscalibrated for your reasoning style and blind spots. The fix is an AI constitution — a structured system prompt built from your psychometric profile that tells the model where your reasoning is unreliable, how you prefer to receive feedback, and where to apply more friction. Kalibr generates this automatically.
What is an AI constitution?
A personalised system prompt you load into any AI tool — ChatGPT, Claude, Gemini — that calibrates the model to your specific cognitive profile. It encodes your reasoning tendencies, documented blind spots, feedback preferences, and the areas where you most need challenge rather than validation. The result is an AI that behaves like a well-briefed thinking partner rather than a generic assistant.
How do I personalise ChatGPT or Claude for my personality?
Take a validated psychometric assessment, identify your cognitive style and recurring failure modes, then encode that into a system prompt. The prompt should tell the model your reasoning tendencies, where you over-index, how you respond to pushback, and what you want challenged. Kalibr automates this entire process: one assessment generates a ready-to-paste AI constitution for any AI tool.
Why is AI advice so generic?
AI models are trained to optimise for responses that feel good to an average human. Without knowing who you are, the model validates your existing biases rather than challenging them. This is not sycophancy — it is the model being uninformed. Uninformed models applied to high-stakes personal decisions fail in predictable ways that track the user's existing biases. The solution is giving the model a structured profile of you before the conversation starts.
How can psychometrics reduce mis-hires?
Validated instruments give you base rates for how a person behaves under pressure, when receiving critical feedback, and when facing ambiguity. By assessing both candidate and team, you can identify complementarity, predict friction, and compress the forming-storming-norming cycle most teams spend 3–6 months navigating blind. Kalibr's team cohesion report maps pairwise compatibility across 10 dimensions and surfaces structural risks before they become conflicts.
Privacy and data
Assessment responses are stored securely and used only to generate your personal report. We do not sell or share your data with third parties. Your psychometric data is encrypted at rest and you can delete your profile at any time. Full details in our Privacy Policy and Terms of Use.
References
Barrick, M.R. & Mount, M.K. (1991). The Big Five personality dimensions and job performance. Personnel Psychology, 44, 1–26.
Bell, S.T. (2007). Deep-level composition variables as predictors of team performance. Journal of Applied Psychology, 92(3), 595–615.
Dryer, D.C. & Horowitz, L.M. (1997). When do opposites attract? Journal of Personality and Social Psychology, 72(3), 592–603.
Duckworth, A.L. & Quinn, P.D. (2009). Development and validation of the Short Grit Scale. Journal of Personality Assessment, 91(2), 166–174.
Felps, W., Mitchell, T.R. & Byington, E. (2006). How, when, and why bad apples spoil the barrel. Research in Organizational Behavior, 27, 175–222.
Han, A. et al. (2024). Revisiting the relationship between team members' personality and their team's performance: A meta-analysis. Journal of Research in Personality.
Jolić Marjanović, Z. et al. (2024). The Big Five and collaborative problem solving. Personality and Social Psychology Bulletin.
Peeters, M.A.G. et al. (2006). Personality and team performance: A meta-analysis. European Journal of Personality, 20, 377–396.
Scherrer, P. et al. (2018). Similarity and positivity of personality profiles predict relationship satisfaction. Frontiers in Psychology.
Van Vianen, A.E.M. & De Dreu, C.K.W. (2001). Personality in teams. European Journal of Work and Organizational Psychology, 10(2).
Blog
Writing on psychometrics, AI, and teams.
MBTI is astrology. Here's what actually holds up.
June 2026 · 5 min read
If you've taken the Myers-Briggs test, you were told you're an INTJ or an ENFP and handed a description that felt uncannily accurate. That feeling is the Barnum effect: vague, flattering descriptions that almost anyone will accept as their own. Horoscopes work the same way.
MBTI has a retest reliability problem. Take it twice, six weeks apart, and roughly 50% of people get a different type. That's no better than chance. DISC is marginally better on reliability but still predicts almost nothing about how someone behaves under pressure.
What actually holds up
The Big Five has been replicated across six decades of research, across cultures, in laboratory and field settings. The five factors (openness, conscientiousness, extraversion, agreeableness, neuroticism) predict job performance, relationship stability, health outcomes, and behaviour under stress with validity coefficients in the 0.3–0.5 range.
The specific factor that matters most for working life: conscientiousness. A 1991 meta-analysis of 117 validity studies (N≈24,000) found it predicts job performance across every occupational group tested. No other personality dimension comes close.
What this means in practice
Knowing you score high on agreeableness and low on neuroticism tells you something specific: you're likely to interpret critical feedback as information rather than personal attack, and you probably stay stable under pressure. Knowing you score high on openness and low on conscientiousness tells you something different: you generate ideas easily but finish fewer of them than you intend to.
These aren't destiny. They're base rates for your own behaviour. The point of psychometrics isn't to label you. It's to give you a map specific enough to be actionable: not "you're a planner" but "you over-weight precision and under-weight speed when the cost of delay is high."
What Kalibr measures and why
Kalibr's assessment draws on four validated instruments including the AMBI (Analog to Multiple Broadband Inventory). The output covers 10 dimensions: Drive Alignment, Risk Orientation, Diligence, Ambiguity Tolerance, Feedback Orientation, Resilience, Bonding Style, Stability Preference, Openness, and Grit. Each traces back to validated research. Scoring is deterministic — identical inputs always produce identical outputs.
Your AI doesn't know you. That's the whole problem.
June 2026 · 6 min read
Everyone's noticed that AI advice is generic. The usual explanation is sycophancy: models are trained with RLHF (reinforcement learning from human feedback), which rewards responses that feel good, so the model learns to validate rather than challenge.
That's true, but it misses the more important problem. A model that knew you well could still be sycophantic. The bigger issue is that the model has no information about you at all. It defaults to a median-human prior — responses calibrated for the average person, which are wrong for any specific person in proportion to how different they are from average.
What the model doesn't know
When you ask Claude to evaluate your business idea, it doesn't know that you have a documented pattern of over-weighting revenue potential and under-weighting operational complexity. It doesn't know you shut down when feedback is delivered bluntly. Without that context, the model produces answers designed for a generic professional. You get generic output.
Training-side fixes make models less agreeable on average. Context-side fixes make the model calibrated to your specific failure modes. Only one of those targets your actual problem.
What an AI constitution does
An AI constitution is a system prompt loaded before the conversation starts. It gives the model four things it otherwise can't know: your cognitive style and reasoning tendencies, your documented blind spots, how you receive feedback, and where the model should apply friction rather than validation.
With this context, the model stops treating you as a generic user. When you exhibit a known pattern — say, anchoring on the first solution you generated — the constitution primes the model to flag it. A bypass command ("free mode:") lets you disable it when you want unconstrained output.
How Kalibr builds yours
Kalibr administers a battery of four validated psychometric instruments as a single 20-minute assessment. The scoring engine is deterministic. Scores across 10 dimensions are encoded into a structured plain-text system prompt you paste into ChatGPT, Claude, Gemini, or any model that accepts system-level context. You own it permanently.
Why your team is still storming six months in.
June 2026 · 6 min read
In 1965, psychologist Bruce Tuckman described four stages every new team moves through: forming, storming, norming, performing. Most teams spend far too long in storming. Six months is common for early-stage teams. Some never leave it.
Why storming lasts so long
The reason isn't bad hiring. It's that the differences driving friction are invisible. When two people disagree repeatedly about how much information they need before making a decision, they don't experience it as a measurable difference in risk tolerance. They experience it as one person being reckless and the other being indecisive.
These patterns are legible in psychometric data. They're not legible from observation alone, at least not within six months.
Which dimensions predict the most friction
Three dimensions predict storming friction more reliably than any other: Feedback Orientation (whether someone processes criticism as information or as social attack), Risk Orientation (tolerance for ambiguity and downside exposure), and Stability Preference (need for predictable process vs. appetite for change).
When two people are far apart on any of these, they produce conflict in exactly the domain their score predicts — repeatedly — until they have an explicit model of why.
What a shared map changes
Kalibr's team cohesion report gives both people that map before the friction emerges. The output is a Team Cohesion Score (0–100) and a working agreement: specific protocols for how this pair communicates, makes decisions together, and handles the moments where their profiles are most likely to diverge.
The working agreement doesn't resolve the underlying differences. It names them, which turns invisible friction into a legible problem both people can address. That shift compresses months of social inference into a structured conversation in week one.
Visit kalibriq.com to take the assessment (JavaScript required).