When we talk about AI memory, we often focus on storage and retrieval. Can the AI remember what I told it yesterday? Can it find the right context when I ask a question? These are important problems, but they're only half the story.
The other half is learning. Not just remembering facts, but learning which facts matter in which contexts. Learning that some combinations of memories lead to better conclusions than others.
That's why we built CoT Feedback.
What is Chain of Thought Feedback?
Chain of Thought (CoT) is a technique where AI agents show their work—they explain their reasoning step by step before giving a final answer. It's powerful because it makes the AI's thinking process visible and verifiable.
But there's a gap. When an AI uses multiple memories to solve a problem, how does it know which memories helped and which didn't? Without feedback, every recall is equally weighted, regardless of whether the resulting reasoning was sound.
CoT Feedback closes this loop. After a Chain of Thought session, the AI (or its operator) can provide feedback on how effective the retrieved memories were:
- Which memories were used in the reasoning chain
- How effective they were (0-1 score)
- Why they worked or didn't work (optional notes)
Over time, this creates a learning signal. Memories that consistently contribute to successful reasoning get boosted in future recalls. Memories that lead to errors get deprioritized.
How It Works
The implementation has three parts:
1. Session Tracking
When an AI starts a reasoning task, we create a cot_session record with the context query. Every memory retrieved during that session is linked to it via cot_session_memories.
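The record and table names above come straight from the post; everything else in this in-memory sketch (function names, ID format, field names) is an assumption made for illustration:

```javascript
// Minimal in-memory sketch of session tracking. The names cot_session and
// cot_session_memories mirror the post; the store itself is hypothetical.
const cotSessions = new Map();
const cotSessionMemories = []; // link rows: { sessionId, memoryId }

function startCotSession(contextQuery) {
  // Create the cot_session record with the context query
  const sessionId = `cot_${cotSessions.size + 1}`;
  cotSessions.set(sessionId, { contextQuery, startedAt: Date.now() });
  return sessionId;
}

function linkRetrievedMemory(sessionId, memoryId) {
  // Every memory retrieved during the session gets linked to it
  cotSessionMemories.push({ sessionId, memoryId });
}
```

The link table is what later lets feedback be attributed to the exact set of memories used in a given reasoning chain.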
2. Effectiveness Scoring
After the reasoning completes, feedback is submitted via the cot_feedback tool. The score (0-1) represents how effective the memory combination was for this specific context.
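The tool name cot_feedback and the 0-1 score come from the post; the payload field names and the helper below are assumptions sketched for illustration:

```javascript
// Hypothetical helper that assembles a cot_feedback payload and enforces
// the 0-1 score range described in the post.
function buildCotFeedback(sessionId, memoryIds, score, notes) {
  if (score < 0 || score > 1) {
    throw new RangeError("score must be in [0, 1]");
  }
  return { session_id: sessionId, memory_ids: memoryIds, score, notes };
}

// Example: the reasoning went well, and one memory was the deciding factor
const feedback = buildCotFeedback(
  "cot_123", // hypothetical session ID
  ["mem_42", "mem_7"],
  0.85,
  "Index-tuning memory was decisive; caching memory was tangential."
);
// The payload would then be submitted via the cot_feedback tool.
```

The optional notes field is worth filling in: the score drives ranking, but the notes are what make the feedback auditable later.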
3. Recall Boosting
When the AI recalls memories with boost_by_cot: true, we combine semantic similarity with the historical effectiveness scores. The result is a boosted score that prioritizes memories with proven track records.
```javascript
// Without CoT boost
await recall({ query: "How to optimize this database?" })
// Returns: memories ranked by pure vector similarity

// With CoT boost
await recall({
  query: "How to optimize this database?",
  boost_by_cot: true,
  cot_boost_weight: 0.3 // 30% CoT score, 70% similarity
})
// Returns: memories ranked by combined effectiveness
```
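The cot_boost_weight comment above implies a weighted split between the two signals. A simple linear blend is one way to realize that split; the post specifies the weight, not the exact formula, so treat this as a sketch:

```javascript
// Sketch of a boosted ranking score, assuming a linear blend of semantic
// similarity and historical CoT effectiveness (formula is an assumption).
function boostedScore(similarity, cotEffectiveness, cotBoostWeight = 0.3) {
  return (1 - cotBoostWeight) * similarity + cotBoostWeight * cotEffectiveness;
}

// A memory with a proven track record can outrank a slightly more
// similar memory that has never helped before:
boostedScore(0.80, 0.90, 0.3); // ≈ 0.83
boostedScore(0.82, 0.20, 0.3); // ≈ 0.63
```

With the weight at 0.3, similarity still dominates; raising the weight shifts rankings further toward memories with strong feedback histories.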
The Decay Factor
Effectiveness isn't static. A memory that was helpful six months ago might not be relevant today. That's why CoT feedback has its own decay mechanism (0.02/day, slower than memory decay's 0.05/day).
This means recent feedback has more weight than old feedback. The system adapts to changing contexts without forgetting hard-won lessons.
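The post gives the decay rate (0.02/day) but not the curve; assuming exponential decay, a common choice for this kind of weighting, the effect looks like this:

```javascript
// Sketch of feedback-weight decay, assuming an exponential curve at the
// stated rate (the rate is from the post; the curve is an assumption).
const COT_DECAY_RATE = 0.02; // per day; memory decay uses 0.05/day

function decayedWeight(score, ageDays, rate = COT_DECAY_RATE) {
  return score * Math.exp(-rate * ageDays);
}

// Day-old feedback keeps ~98% of its weight; six-month-old feedback
// keeps under 3%, so it fades without ever being hard-deleted.
decayedWeight(1.0, 1);   // ≈ 0.98
decayedWeight(1.0, 180); // ≈ 0.027
```

The slower rate relative to memory decay means a memory's feedback history outlives its raw recency signal, which is what lets "hard-won lessons" persist.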
Real-World Impact
We're already seeing CoT Feedback make a difference in how agents use takizen:
- Support agents learn which troubleshooting memories lead to resolved tickets
- Code assistants discover which pattern memories produce working code
- Research agents identify which source combinations yield accurate summaries
The key insight is that context matters. A memory about React hooks might be crucial for a frontend task and irrelevant to a database optimization task. CoT Feedback learns these distinctions from experience.
Getting Started
CoT Feedback is available now for all takizen users. To use it:
- Enable boost_by_cot in your recall calls
- After Chain of Thought reasoning, submit feedback via cot_feedback
- Adjust cot_boost_weight based on your use case (higher for established domains, lower for novel problems)
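Put together, a full loop looks roughly like the sketch below. The recall and cot_feedback calls stand in for the takizen tools; here they are stubbed so the flow is self-contained, and the session ID and payload fields are assumptions:

```javascript
// Stubs standing in for the takizen tools, so the flow below runs as-is.
async function recall({ query, boost_by_cot = false, cot_boost_weight = 0.3 }) {
  return [{ id: "mem_42", text: "Add an index on the slow column" }];
}
async function cot_feedback(payload) { /* submit feedback */ }

async function troubleshoot() {
  // 1. Recall with boosting enabled
  const memories = await recall({
    query: "How to optimize this database?",
    boost_by_cot: true,
    cot_boost_weight: 0.3,
  });

  // 2. ...Chain of Thought reasoning using `memories`...

  // 3. Close the loop with feedback on how the memories performed
  await cot_feedback({
    session_id: "cot_123", // hypothetical session ID
    memory_ids: memories.map((m) => m.id),
    score: 0.9,
    notes: "Index suggestion fixed the slow query.",
  });
  return memories;
}
```

The important part is step 3: without it, the boost in step 1 has no signal to work from.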
We're excited to see how the community uses this. AI memory isn't just about storage anymore—it's about learning what works.