
Add support for System Message injection by MCP servers, for instruction injection and state exposure #148

Closed
Altaflux opened this issue Jan 20, 2025 · 15 comments
Labels: enhancement (New feature or request)

Comments

@Altaflux commented Jan 20, 2025

Problem Statement:

Sometimes tools alone do not give the Agent enough context about how best to use the MCP server: the Agent needs a general understanding of what these tools are used for together, as a toolkit of sorts. There may also be cases where servers are stateful and want to provide the LLM with context that helps it decide when and how to best use the tools.

Examples:

  • State of the entity the MCP server is controlling: Something is ON/OFF, a list of active resources.
  • Toolkit instructions: An instruction that tells the LLM which tools belong together and how to best use them as a whole.

Proposed Solution:

The proposed solution is to allow the MCP Client to request a System prompt (or several) from a server. This request may be made on each interaction with the LLM, so that the System Message always reflects the current state of the server in case stateful context needs to be shared with the Agent.

The prompts returned by the server can then be merged into the agent's main system prompt.
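
As a rough TypeScript sketch of what the client-side flow could look like (the instructions/get method and the shape of its result are this proposal's hypothetical additions, not part of the current spec):

// Hypothetical client-side flow: refresh server-provided instructions
// before each LLM turn. instructions/get is the proposed method, not
// an existing MCP request.
interface InstructionContent {
  content: { type: "text"; text: string };
}

async function buildSystemPrompt(
  requestInstructions: () => Promise<{ contents: InstructionContent[] }>,
  baseSystemPrompt: string
): Promise<string> {
  const result = await requestInstructions();
  const serverInstructions = result.contents
    .map((c) => c.content.text)
    .join("\n\n");
  // Merge the server's (possibly stateful) instructions with the
  // agent's own system prompt for this turn.
  return `${baseSystemPrompt}\n\n${serverInstructions}`;
}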

Alternatives:

In addition to the above, or as an alternative, a server could be allowed to attach a snippet to the user message the agent is generating. This snippet would contain stateful information about the MCP server that might be relevant to the agent and inform it about the current state. One benefit of this approach is that the snippet provided by the server would likely be persisted by the agent in the conversation history, letting the LLM reason about how the state of the server has changed over time.
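
A minimal sketch of this alternative (again purely hypothetical): the host simply prepends the server-provided state so that it lands in the persisted history:

// Hypothetical: prepend a server-provided state snippet to the
// outgoing user message so it is persisted in conversation history.
function amendUserMessage(stateSnippet: string, userMessage: string): string {
  return `[Server state]\n${stateSnippet}\n\n${userMessage}`;
}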

Example request to a home automation server:

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "instructions/get"
}

Example Response:

{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "contents": [
      {
        "content": {
          "type": "text",
          "text": "Your have the ability to do home automation with the following tools..."
        }
      },
      {
        "content": {
          "type": "text",
          "text": "You can control the following home devices using the home automation tools $(LIST OF HOME DEVICES AND OTHER INFORMATION)"
        }
      }
    ]
  }
}
@Altaflux added the enhancement label on Jan 20, 2025
@PederHP commented Jan 20, 2025

My Augmentation proposal would support this as an augmentation which returns the system message content to be injected. You could also do it now with the Prompt capability though, if there is no need for arguments or context in the request.

#142
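
For reference, a minimal sketch of that Prompt-capability workaround using the TypeScript SDK; the prompt name "server-instructions" is made up for illustration:

import { Client } from "@modelcontextprotocol/sdk/client/index.js";

// Workaround with the existing Prompt capability: fetch an
// argument-less prompt and splice its text into the context.
async function fetchServerInstructions(client: Client): Promise<string> {
  // "server-instructions" is a hypothetical prompt name.
  const { messages } = await client.getPrompt({ name: "server-instructions" });
  return messages
    .filter((m) => m.content.type === "text")
    .map((m) => (m.content as { type: "text"; text: string }).text)
    .join("\n");
}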

@Altaflux (Author) commented Jan 20, 2025

Your augmentation proposal is definitely very interesting, but I am not sure it covers the scope of my feature.
Your proposal seems to be based around input/output, where the MCP server can answer questions and may return context. It looks to me like your solution is more like Tools but for the user to use, which I think is a good proposal but does not solve my feature request.

My proposal is focused on the automatic injection of instructions and server state into the LLM conversation through System Messages, or alternatively through user message snippets that get added to the next message.

I have looked into the Prompts capability, but it is severely limited: the expectation is that prompts are used to execute specific actions, while in many cases a user may want to ask the LLM for something outside the scope of the available list of prompts.

I also just had a discussion on the Discord server about how, until now, I have not seen a single MCP server implement Prompts correctly; it seems that no one understood their use case, and they implemented it in very strange ways.

https://discord.com/channels/1312302100125843476/1312302100125843479/1330667910422593640

@PederHP commented Jan 20, 2025

I haven't explained my proposal well enough then. It isn't intended to be limited to RAG or another specific use case; it's more of an application-controlled context-to-context call, the behavior of which is server-defined.

So the client sends anything from a single message, to the entire conversation, to something entirely different, and receives back content and a hint (if needed by the server). In your example, I could see an augment which returns a system message based on the context provided by the client.

Now the client still has to actually inject it. So the flow is:

  1. Client calls Augment with context
  2. Client receives the augment result with content (a system message in this case) to be injected
  3. Client sends the amended context to the LLM (see the sketch below)
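
A sketch of that three-step flow; the augment call and result shape here are hypothetical, in the spirit of proposal #142:

// Hypothetical Augmentation flow from #142 (the callAugment shape is
// illustrative, not part of any spec).
interface AugmentResult {
  content: { type: "text"; text: string };
  hint?: "system" | "user"; // where the server suggests injecting it
}

async function runTurnWithAugment(
  callAugment: (context: string) => Promise<AugmentResult>,
  callLlm: (system: string, user: string) => Promise<string>,
  system: string,
  user: string
): Promise<string> {
  // 1. Client calls Augment with context.
  const augment = await callAugment(user);
  // 2. Client receives content to inject, plus a placement hint.
  if (augment.hint === "system") {
    system = `${system}\n\n${augment.content.text}`;
  } else {
    user = `${augment.content.text}\n\n${user}`;
  }
  // 3. Client sends the amended context to the LLM.
  return callLlm(system, user);
}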

But that's because the design of MCP is that the client is the one interacting with the LLM (as with tools), whereas the server provides context to the client. Servers are not intended, as I understand the protocol design, to be able to fully automate the injection of context (be that tools, prompts or resources). With Augmentation, that same principle would be followed: the client requests what to add from the server.

I'll try to add an example to the proposal which covers prompt injection (system and/or user) tomorrow.

Fair enough if you find my proposal isn't the right way to do this, but from your answer it seems I haven't clearly conveyed that my proposal is not about a specific type of context or functionality (i.e., questions); it's about Application-controlled context injection as opposed to Model-controlled context injection (which is what Tools provide).

@Altaflux (Author)

Thanks for your help. I will definitely pay attention to your proposal and see whether it covers my use case. I will ask some questions on your thread to better understand it; if possible, please add an example you think tackles my use case. I will add a sample request/response to my ticket to make my use case clearer.

@allenporter

MCP servers already support prompts, which can fully serve this purpose as far as I can tell.

@Altaflux (Author)

@allenporter I have checked, and prompts do not serve my use case in any sense.
Prompts are there to help a user execute specific actions by forming a prompt to pass to the agent.

My use case is about continuously passing stateful context and general instructions to the LLM through the conversation.

@allenporter commented Jan 21, 2025

My impression is that the expectation is to notify clients of prompt updates when state changes.

Prompts can also provide toolkit instructions.

The proposal is to allow clients to request updates to prompts, which clients can already do: a client is free to request updates to prompts using the existing APIs, without any extension.
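
For instance, with the TypeScript SDK a client can already react to prompt-list change notifications and re-fetch (a sketch; the "server-state" prompt name is hypothetical):

import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { PromptListChangedNotificationSchema } from "@modelcontextprotocol/sdk/types.js";

// Sketch: re-fetch a (hypothetical) state prompt whenever the server
// signals that its prompts have changed.
function watchServerState(client: Client, onUpdate: (text: string) => void) {
  client.setNotificationHandler(
    PromptListChangedNotificationSchema,
    async () => {
      const { messages } = await client.getPrompt({ name: "server-state" });
      for (const m of messages) {
        if (m.content.type === "text") onUpdate(m.content.text);
      }
    }
  );
}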

@Altaflux (Author) commented Jan 21, 2025

@allenporter my feature cannot be implemented with prompts.
The state and instructions I refer to are not instructions for a specific action to execute (which is Prompts' use case). My use case provides context, state, and general instructions about the usage of the MCP server's functionality in a continuous manner, by adding message content to the system prompt.

This information is not something that must be triggered manually by the user, which prompts are; prompts MUST be triggered by the user (see the Overview at https://modelcontextprotocol.io/docs/concepts/prompts).

Prompts' use case is to give the LLM instructions to execute a specific task triggered by the user. My proposal is completely different.

I added an example of what the prompt message looks like to the issue description.

@allenporter

Yes, agreed, the user needs to select the prompt and tools. Either way, I don't think this deserves an entirely new entity type when it's really a prompt.

It sounds more like a proposal to extend prompts to provide a way for automatically including them in context. However, that seems to imply the user trusts the server to add arbitrary data automatically, and that trust issue needs to be resolved as part of the proposal.

@Altaflux (Author) commented Jan 21, 2025

I would argue that the use case is quite different.
Prompts are there to represent an action requested by a user, where the message to the LLM is built by the MCP server.

My use case is to provide instructions on how to use the MCP server itself and to provide state about the objects being controlled by the server. In my use case I am not instructing the LLM to do anything, but rather telling it how to best use the capabilities of the server.

The only thing they have in common is that they inject content into the LLM conversation, but the how and why are completely different. Bolting two different invocation modes (user-triggered and automatic) for two different use cases onto Prompts risks feature creep. I would be willing to discuss an implementation of Prompts that solves my use case, but I simply cannot imagine one that doesn't shoehorn it in a way that makes Prompts even more confusing and complicated than they already are.

There are even contractual protocol differences that would make my feature incompatible with Prompts: in my use case there are no "arguments" to send to the prompt (as shown in my example).

Security-wise, this is worth discussing too, but whatever content this feature may inject into the conversation is just as secure as what Prompts and Tools already do: you cannot control what tools and prompts inject into the conversation.

@jspahrsummers (Member)

There's a top-level instructions field that servers can return during initialization, which clients could choose to add to the system prompt.

It's intentional that this field is returned once and not modified during the session lifetime, as the instructions should generally not vary during a conversation.

For varying bits of information, you may wish to add a tool that the model can invoke at will, which can return the most up-to-date information in response (e.g., a device list and instructions on how to use those devices).
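
For illustration, a minimal server-side sketch with the TypeScript SDK, which accepts instructions as a server option and returns them in the initialize result (the server name and text here are made up):

import { Server } from "@modelcontextprotocol/sdk/server/index.js";

// Sketch: the server declares its instructions once; they are sent
// back to the client in the initialize result.
const server = new Server(
  { name: "home-automation", version: "1.0.0" },
  {
    capabilities: { tools: {} },
    instructions:
      "You have the ability to do home automation with the following tools...",
  }
);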

@PederHP commented Jan 21, 2025

For varying bits of information, you may wish to add a tool that the model can invoke at will, which can return the most up-to-date information in response (e.g., a device list and instructions on how to use those devices).

But what if the flow of context is to be Application-controlled and not Model-controlled? I think there is a good argument for having a flexible Application-controlled context injection capability, whatever form it might take. Tools is a really strong Model-controlled way to do this, Prompts provides a User-controlled way, but there is a hole currently when it comes to Application-controlled context manipulation. Resource templates are very awkward to use for this.

An example: Injecting information from an MCP server into every user message in a chat platform. This could be dynamic data, such as an activity log since the last injection. A tool would require an extra round trip on every LLM invocation and would require instructions telling the model to call it on every invocation (which might not always be followed). A facility to let the Application control this via an MCP capability would make this much smoother. Especially if the injection is context-dependent or has arguments.

Of course, one can argue that one can always do this in the host, but that goes for any of the current capabilities, especially the Application- and User-controlled ones.

@jspahrsummers (Member)

I think that case would be very representable with resource priorities. Basically, establish a resource with a priority of 1, which strongly suggests the application should make use of it. Updates to the resource can be done through the usual "resource changed" notifications as well.
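
A sketch of that pattern with the low-level TypeScript SDK server API; the state://devices URI and payloads are made up, and the priority lives in the resource's annotations:

import { Server } from "@modelcontextprotocol/sdk/server/index.js";
import {
  ListResourcesRequestSchema,
  ReadResourceRequestSchema,
} from "@modelcontextprotocol/sdk/types.js";

const server = new Server(
  { name: "home-automation", version: "1.0.0" },
  { capabilities: { resources: { subscribe: true } } }
);

// A high-priority resource the application is strongly encouraged
// to include in context.
server.setRequestHandler(ListResourcesRequestSchema, async () => ({
  resources: [
    {
      uri: "state://devices", // hypothetical URI
      name: "Current device state",
      annotations: { priority: 1, audience: ["assistant"] },
    },
  ],
}));

server.setRequestHandler(ReadResourceRequestSchema, async (req) => ({
  contents: [
    { uri: req.params.uri, mimeType: "text/plain", text: renderDeviceState() },
  ],
}));

// On a state change, notify subscribers so they re-read the resource.
function onStateChange() {
  server.sendResourceUpdated({ uri: "state://devices" });
}

function renderDeviceState(): string {
  return "Living room light: ON\nThermostat: 21°C";
}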

@PederHP commented Jan 21, 2025

I think that case would be very representable with resource priorities. Basically, establish a resource with a priority of 1, which strongly suggests the application should make use of it. Updates to the resource can be done through the usual "resource changed" notifications as well.

This works well for static context; I am not sure the current resource capability is well suited for dynamic context. If resources are to be the go-to for Application-controlled context, I think there is a need for more flexible parametrization than the resource templates allow.

I think it's worth exploring and considering more ways to have Application-controlled context than Resources, which conceptually also communicate something that is more data than function. I guess I'm really missing a parallel to Tools, because that's a really strong and expressive approach to context injection. Anyway, I should probably try to articulate that case with better examples on the spec PR I made, because I think this is an important leg for a model context protocol to have.

@Altaflux (Author)

I wasn't aware that initialize returns instructions; this covers part of my feature as needed.
But it seems that in the Node.js client there is no way to get the instructions at all. Any idea how to get them?
https://github.com/modelcontextprotocol/typescript-sdk/blob/main/src/client/index.ts#L126
