[proposal] Add Augmentation capability for context enrichment #142

Open
wants to merge 5 commits into base: main

@PederHP PederHP commented Jan 15, 2025

Motivation and Context

This PR introduces an Augmentation capability to MCP, addressing a fundamental need in AI applications: Application-controlled context transformation. While MCP currently supports user-controlled context through Prompts, model-controlled actions through Tools, and static/semi-static data through Resources, there's no standardized way for applications to dynamically modify context based on their own logic and timing.

Key benefits of adding Augmentation as a distinct capability:

  1. Application Control: Enables immediate, efficient context modification when and how the application determines it's needed, without requiring model decisions or tool invocations.

  2. Standardization: Provides a protocol-level solution for common patterns like:

    • Real-time UI action tracking for context-aware assistance
    • Dynamic environment data injection (time, location, system state)
    • Content transformation and enrichment
    • Retrieval augmented generation
    • Pre/post processing of context
  3. Protocol Evolution: Rather than adding multiple feature-specific capabilities, this provides a flexible, generic mechanism for Application-controlled context transformation, keeping the protocol focused on fundamental capabilities.

The capability is designed to be:

  • Generic enough to support diverse use cases
  • Efficient (avoiding unnecessary model invocations)
  • Extensible through schemas and metadata
  • Easy to implement and understand

This approach allows applications to leverage standardized context modification patterns while maintaining flexibility in how they implement specific features.

The hint system provides powerful flexibility in how augmented context is applied:

  • Replace: Indicates the augmented content should replace the original (useful for PII masking, content moderation)
  • Append/Prepend: For traditional RAG systems adding retrieved context
  • Safe/Unsafe: For moderation systems indicating content status
  • Custom: Servers can define additional hints for specialized use cases

This approach avoids the need for complex delta protocols while maintaining flexibility. Each server can implement its own hint semantics, and clients can choose how to handle different hint types.
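
As a rough illustration of how a client might act on these hints, here is a minimal sketch. The `AugmentResult` shape is modeled on the JSON examples in this PR rather than on the final schema, and `applyHint` is a hypothetical helper name. Ignoring unknown hints is just one possible client policy.

```typescript
// Hypothetical shapes modeled on the JSON examples in this PR; not the final schema.
type TextContent = { type: "text"; text: string };

interface AugmentResult {
  contents: { content: TextContent; properties?: Record<string, unknown> }[];
  hint?: string; // e.g. "replace", "append", "prepend", or a server-defined value
}

// Combine the original context with the augmented content according to the hint.
// Missing or unrecognized hints leave the original untouched (a client-side choice).
function applyHint(original: string, result: AugmentResult): string {
  const augmented = result.contents.map((c) => c.content.text).join("\n");
  switch (result.hint) {
    case "replace":
      return augmented;
    case "prepend":
      return `${augmented}\n${original}`;
    case "append":
      return `${original}\n${augmented}`;
    default:
      return original;
  }
}
```

A client that understands server-specific hints would add its own cases ahead of the default branch.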

How Has This Been Tested?

It has not, as this is a starting point for discussion.

Breaking Changes

None.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update

Checklist

  • I have read the MCP Documentation
  • My code follows the repository's style guidelines
  • New and existing tests pass locally
  • I have added appropriate error handling
  • I have added or updated documentation as needed

Additional context

Note: While document upload capabilities (e.g., for PDF processing) are a common requirement in RAG systems, this is a broader protocol consideration that affects multiple capabilities and would need to be addressed separately.

This PR is submitted in response to the discussion at https://github.com/orgs/modelcontextprotocol/discussions/138 . While schema changes are complete, proposed documentation will be added if there is agreement on the approach and implementation details.

Here are a number of examples showing the flexibility of this capability. It can basically be used for any kind of application-controlled content processing, filtering, transformation, etc.

Example 1: RAG

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "augmentation/augment",
  "params": {
    "name": "retrieve",
    "context": {
      "type": "text",
      "text": "What are the key features of our new electric vehicle model?"
    },
    "arguments": {
      "maxResults": 3,
      "minRelevance": 0.7
    }
  }
}

And the corresponding response could be:

{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "contents": [
      {
        "content": {
          "type": "text",
          "text": "The 2025 Model A features a 400-mile range on a single charge and supports ultra-fast charging..."
        },
        "properties": {
          "relevance": 0.92
        }
      },
      {
        "content": {
          "type": "text",
          "text": "Safety features include advanced driver assistance with 360-degree sensor coverage..."
        },
        "properties": {
          "relevance": 0.85
        }
      },
      {
        "content": {
          "type": "text",
          "text": "The dual motor configuration delivers 0-60mph acceleration in 3.8 seconds..."
        },
        "properties": {
          "relevance": 0.78
        }
      }
    ],
    "hint": "prepend"
  }
}
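
On the client side, the exchange above could be driven by something like the following sketch. The type and function names (`AugmentRequest`, `RetrievedItem`, `buildRetrieveRequest`, `prependPassages`) are assumptions made for illustration, and the transport that actually sends the request is left out.

```typescript
// Hypothetical helper types mirroring the example messages above.
interface AugmentRequest {
  jsonrpc: "2.0";
  id: number;
  method: "augmentation/augment";
  params: {
    name: string;
    context: { type: "text"; text: string };
    arguments?: Record<string, unknown>;
  };
}

interface RetrievedItem {
  content: { type: "text"; text: string };
  properties?: { relevance?: number };
}

// Build the JSON-RPC request for the "retrieve" augmentation.
function buildRetrieveRequest(
  id: number,
  query: string,
  maxResults: number,
  minRelevance: number,
): AugmentRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "augmentation/augment",
    params: {
      name: "retrieve",
      context: { type: "text", text: query },
      arguments: { maxResults, minRelevance },
    },
  };
}

// Honor a "prepend" hint: order passages by descending relevance
// (a missing relevance is treated as 0) and place them before the query.
function prependPassages(query: string, items: RetrievedItem[]): string {
  const ranked = items
    .slice()
    .sort((a, b) => (b.properties?.relevance ?? 0) - (a.properties?.relevance ?? 0));
  return ranked.map((i) => i.content.text).concat([query]).join("\n\n");
}
```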

Example 2: Moderation

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "augmentation/augment",
  "params": {
    "name": "moderate",
    "context": {
      "type": "audio",
      "data": "base64ABC...snip...",
      "mimeType": "audio/wav"
    },
    "arguments": {
      "categories": ["hate", "violence", "explicit"],
      "thresholds": {
        "hate": 0.7,
        "violence": 0.8,
        "explicit": 0.6
      }
    }
  }
}

Example response with clean content:

{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "contents": [
      {
        "content": { 
          "type": "text",
          "text": ""
        },
        "properties": {
          "status": "ok",
          "scores": {
            "hate": 0.1,
            "violence": 0.05,
            "explicit": 0.15
          }
        }
      }
    ]
  }
}

Example response with problematic content:

{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "contents": [
      {
        "content": {
          "type": "text",
          "text": "I apologize, but I cannot process this audio as it appears to contain content that violates our content policies regarding explicit language."
        },
        "properties": {
          "status": "rejected",
          "scores": {
            "hate": 0.2,
            "violence": 0.3,
            "explicit": 0.85
          }
        }
      }
    ]
  }
}
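
A client could interpret the two moderation responses above along these lines. This is only a sketch: `interpretModeration` and the `ModerationOutcome` type are hypothetical, and treating every status other than "ok" as a rejection is an assumption, since the proposal does not enumerate status values.

```typescript
// Hypothetical shape of one entry in "contents" for the moderation example.
interface ModerationItem {
  content: { type: "text"; text: string };
  properties?: { status?: string; scores?: Record<string, number> };
}

type ModerationOutcome =
  | { allowed: true }
  | { allowed: false; message: string; flagged: string[] };

// Assumption: any status other than "ok" is a rejection. The flagged list holds
// the categories whose score met or exceeded the threshold sent in the request.
function interpretModeration(
  item: ModerationItem,
  thresholds: Record<string, number>,
): ModerationOutcome {
  if (item.properties?.status === "ok") {
    return { allowed: true };
  }
  const scores = item.properties?.scores ?? {};
  const flagged: string[] = [];
  for (const category of Object.keys(thresholds)) {
    const score = scores[category];
    const limit = thresholds[category];
    if (score !== undefined && limit !== undefined && score >= limit) {
      flagged.push(category);
    }
  }
  return { allowed: false, message: item.content.text, flagged };
}
```

With the thresholds from the request above, the rejected response would yield a single flagged category, "explicit" (0.85 against a threshold of 0.6).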

Example 3: Tool Relevance Filtering

An example request that reduces the number of tools sent to an LLM by analyzing the user's request before it is sent. A server could implement this via embeddings or a fast, cheap LLM.

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "augmentation/augment",
  "params": {
    "name": "toolRelevance",
    "context": {
      "type": "text",
      "text": "Can you help me figure out why the performance of my investment portfolio has suddenly changed?"
    },
    "arguments": {
      "tools": [
        {
          "name": "search_news",
          "description": "Search recent news articles",
          "inputSchema": {
            "type": "object",
            "properties": {
              "query": { "type": "string" },
              "days": { "type": "number" }
            }
          }
        },
        {
          "name": "get_stock_data",
          "description": "Fetch stock market performance data",
          "inputSchema": {
            "type": "object",
            "properties": {
              "symbol": { "type": "string" },
              "period": { "type": "string" }
            }
          }
        },
        {
          "name": "weather_forecast",
          "description": "Get weather forecast for a location",
          "inputSchema": {
            "type": "object",
            "properties": {
              "location": { "type": "string" },
              "days": { "type": "number" }
            }
          }
        }
      ]
    }
  }
}

Response showing tool relevance scoring:

{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "contents": [
      {
        "content": {
          "type": "text",
          "text": "get_stock_data"
        },
        "properties": {
          "relevance": 0.95
        }
      },
      {
        "content": {
          "type": "text",
          "text": "search_news"
        },
        "properties": {
          "relevance": 0.82
        }
      },
      {
        "content": {
          "type": "text",
          "text": "weather_forecast"
        },
        "properties": {
          "relevance": 0.12
        }
      }
    ],
    "hint": "filter"
  }
}
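
One way a client might apply the "filter" hint is sketched below. The `filterTools` helper, its types, and the idea of a client-chosen relevance cutoff are assumptions for illustration; the proposal itself does not say how the scores should be used.

```typescript
// Hypothetical types: a tool definition and one scored entry from "contents".
interface Tool {
  name: string;
  description?: string;
  inputSchema: object;
}

interface RelevanceItem {
  content: { type: "text"; text: string };
  properties?: { relevance?: number };
}

// Keep only the tools whose reported relevance meets the cutoff.
// Tools the server did not score are dropped; the original tool order is preserved.
function filterTools(tools: Tool[], items: RelevanceItem[], minRelevance: number): Tool[] {
  const scoreByName: Record<string, number> = {};
  for (const item of items) {
    scoreByName[item.content.text] = item.properties?.relevance ?? 0;
  }
  return tools.filter((tool) => {
    const score = scoreByName[tool.name];
    return score !== undefined && score >= minRelevance;
  });
}
```

With the scores in the response above and a cutoff of 0.5, search_news and get_stock_data would be kept and weather_forecast dropped.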

Example 4: User App Activity Injection

(TODO)

Example 5: Vehicle Assistance App

(TODO)

@PederHP PederHP changed the title from "Add Augmentation capability for context enrichment" to "[proposal] Add Augmentation capability for context enrichment" Jan 15, 2025
Contributor

@gsabran gsabran left a comment

This looks promising! Is paginating the augmentation response a concern? I don't have an opinion but it seemed worth asking.

schema/schema.ts (outdated)
@PederHP
Author

PederHP commented Jan 16, 2025

> This looks promising! Is paginating the augmentation response a concern? I don't have an opinion but it seemed worth asking.

It should match Sampling I think. I can't recall if those can be paginated. My intuition is that augmentations which need pagination can be batched instead.

I guess an augmentation which added a ton of context would benefit from pagination. I can't really think what that would actually be in practice. But I suppose message history could conceivably be a prepend augmentation that could get rather large. But on the other hand it will get sent as context to an LLM in a single request which speaks in favor of not paginating.

Comment on lines +777 to +780
/**
* Optional additional properties.
*/
[key: string]: unknown;
Contributor

I don't think it's good for specs to have those catch-all properties. The point of the spec is to specify the interaction model. I know this exists already in MCP, but mostly in places where the schema is specified at runtime. For instance, this freeform type exists in CallToolRequest, but it should match what the server describes in list tools.

Contributor

If that sounds like a good change, we can remove the properties wrapper and directly inline the few that are specified

Author

I actually think it's important for this capability to have it, for the same reason that Tools has it. And there is a list augments method. I think it is important to support a wide variety of Application-controlled context modifications/injections. Even within RAG there are multiple variants. One might want to provide maxTokens, another might have an argument for whether to include keyword search, there could be reranker parameters, etc.

I realize that others might prefer a more strictly RAG aligned capability, and for that it would make sense to avoid complicating the schema. But I think there is a strong case for generalizing non-Model controlled context modification. Otherwise it might end up with a succession of capabilities being added for each new kind of context injection. And personally, I think the Tools paradigm is awesome, and something to try and emulate on an Application level. Because a capability which is that flexible would encompass so many things.

Yes, it does mean that a RAG MCP server needs to provide a schema and the Application needs to have logic in place that matches it. But wouldn't that always be the case for any kind of Application-controlled context modification? There are already so many different RAG variants. I think having a protocol that is as flexible as possible increases the chances it will be useful for any kind of RAG. Essentially a (context, properties) to (context, metadata) function. That fits so many things.
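
To make the "(context, properties) to (context, metadata) function" framing concrete, here is a minimal sketch of what such a signature could look like. All names here (`Content`, `Augmented`, `Augmentation`, `injectTime`) are hypothetical and not taken from the PR's schema. The example instance covers the "dynamic environment data injection" case from the description, with the clock passed in as an argument so the function stays deterministic.

```typescript
// Hypothetical content union, limited to the two content types used in the examples.
type Content =
  | { type: "text"; text: string }
  | { type: "audio"; data: string; mimeType: string };

interface Augmented<Meta> {
  content: Content;
  properties: Meta;
}

// (context, arguments) -> list of (content, metadata)
type Augmentation<Args, Meta> = (context: Content, args: Args) => Augmented<Meta>[];

// Example instance: inject the current time as extra context.
const injectTime: Augmentation<{ now: Date }, { source: string }> = (_context, args) => [
  {
    content: { type: "text", text: `Current time: ${args.now.toISOString()}` },
    properties: { source: "clock" },
  },
];
```

Retrieval, moderation, and tool filtering would then all be instances of the same `Augmentation` type, differing only in `Args` and `Meta`.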

/**
* Relevance of the content, as often used in RAG systems.
*/
relevance?: number;
Contributor

Another optional property worth considering is uri, which would allow linking to the augmented content, as is a common pattern, e.g.:

[Screenshot 2025-01-20 at 7 03 42 PM] [Screenshot 2025-01-20 at 7 04 42 PM]

Author

For some RAG systems yes, this can make good sense. Not opposed to adding it, but I think it also plays into the whole generic vs RAG-specific discussion. If this is a more generic capability, I think there might be a point where too many convenience properties are added, because they could already be added through the property dictionary, and thus suit the individual RAG system.

Not opposed to this, just want to see what others think about the whole generic vs more RAG-specific question before actually adding it to the PR. This is still a starting point for discussion, more than a very focused spec change request. I still have to convince the maintainers that additional Application-controlled capabilities are needed, for one thing, and comments elsewhere indicate that at least some of them aren't entirely on board with that yet.

2 participants