| 1 | +# Introducing NPi |
| 2 | + |
| 3 | +## Background |
| 4 | + |
| 5 | +Since ChatGPT's release, there has been a surge in AI applications designed for natural conversations. However, their |
| 6 | +practical usefulness is often limited by a lack of automatic action-taking capabilities.
| 7 | + |
| 8 | +The evolving concept of the AI Agent addresses this gap. Agent AI is a class of interactive systems that can perceive visual
| 9 | +stimuli, language inputs, and other environmentally grounded data, and can produce meaningful embodied actions<sub>[1, Agent AI Li Fei-Fei]</sub>. |
| 10 | + |
| 11 | +> Agents are not only going to change how everyone interacts with computers. They’re also going to upend the software |
| 12 | +industry, bringing about the biggest revolution in computing since we went from typing commands to tapping on icons. |
| 13 | +> |
| 14 | +> **The future of Agents, Bill Gates**. |
| 15 | +
| 16 | +A major advantage humans hold over other animals is the ability to use tools. Tool use is also one of the key abilities of AI Agents, as noted by Andrew Ng<sub>[3]</sub>.
| 17 | + |
| 18 | +However, building an AI agent with a robust ability to use tools is challenging due to the diversity of tools and the |
| 19 | +operational overhead involved. |
| 20 | + |
| 21 | +- Low-level primitives, such as HTTP APIs or SDKs, lead organizations to repeatedly write similar code to integrate LLMs with
| 22 | +different applications.
| 23 | +- Maintaining non-business-critical features, such as state management, availability, and authorization flows, incurs
| 24 | +significant overhead. |
| 25 | +- Ensuring the security of AI Agents — making their actions controllable, predictable, and explainable — poses |
| 26 | +substantial challenges. |
| 27 | + |
| 28 | +This is why NPi was created: to offer AI Agent developers an easy-to-use and reliable platform that gives their
| 29 | +agents robust tool-use capabilities.
| 30 | + |
| 31 | +## What is NPi? |
| 32 | + |
| 33 | +On April 25, we launched NPi (`v0.0.1`) on [GitHub](https://github.com/npi-ai/npi), a free, open-source platform. |
| 34 | +NPi provides **Tool Use** APIs that empower AI agents to operate and interact with various software tools and applications. |
| 35 | + |
| 36 | +The primary goal of NPi is to offer a unified interface that allows Large Language Models to seamlessly integrate with |
| 37 | +the existing software and applications ecosystem through function calls. NPi serves as a gateway for these models to |
| 38 | +access the virtual world. |
| 39 | + |
| 40 | +The core principle of NPi is to focus on `in-app planning`. This requires users to break down tasks into `single-app` |
| 41 | +sub-tasks, meaning **each task is confined to one application**. NPi then interprets these sub-tasks into a series of
| 42 | +function calls, executing them in a rule-based manner to ensure precise control. |
| 43 | + |
| 44 | +This method, known as divide-and-conquer, is a common strategy for solving complex problems and is central to NPi's design.
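
The divide-and-conquer idea above can be illustrated with a minimal, self-contained sketch. It does not use the NPi SDK: `SubTask`, `run_plan`, and the lambda handlers are hypothetical names invented for illustration, standing in for real application integrations.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SubTask:
    app: str          # the single application this sub-task is confined to
    instruction: str  # natural-language instruction for that application


def run_plan(plan: List[SubTask], handlers: Dict[str, Callable[[str], str]]) -> List[str]:
    """Run single-app sub-tasks in order, dispatching each to its app handler."""
    results = []
    for sub_task in plan:
        if sub_task.app not in handlers:
            raise KeyError(f"no handler registered for app: {sub_task.app}")
        results.append(handlers[sub_task.app](sub_task.instruction))
    return results


# Hypothetical stand-ins for real application integrations.
handlers = {
    "calendar": lambda text: f"[calendar] {text}",
    "gmail": lambda text: f"[gmail] {text}",
}

# A cross-application task, already broken down into single-app sub-tasks.
plan = [
    SubTask(app="calendar", instruction="find a free 30-minute slot tomorrow"),
    SubTask(app="gmail", instruction="email the proposed slot to the team"),
]

for line in run_plan(plan, handlers):
    print(line)
```

Because every sub-task names exactly one application, the dispatcher stays a plain lookup, and any planning happens inside the handler for that application.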
| 45 | + |
| 46 | +To date, we have implemented the core functionalities of NPi, including: |
| 47 | + |
| 48 | +### Out-of-the-box multimodal Tool Use APIs
| 49 | + |
| 50 | +We provide ready-to-use APIs that allow large language models to interact with applications, demonstrated in the |
| 51 | +following examples: |
| 52 | + |
| 53 | +```python |
| 54 | +from npiai.app.google import Calendar |
| 55 | +from npiai.app.github import GitHub |
| 56 | +from npiai.app.twitter import Twitter |
| 57 | + |
| 58 | +calendar = Calendar() |
| 59 | +calendar.chat("...") |
| 60 | + |
| 61 | +github = GitHub() |
| 62 | +github.chat("...") |
| 63 | + |
| 64 | +# For non-API friendly cases, a visual-based approach leverages the web browser. |
| 65 | +twitter = Twitter(visual=True) |
| 66 | +twitter.chat("What is @wellswfwang's latest post?")
| 67 | +``` |
| 68 | + |
| 69 | +Under the hood, NPi is pre-integrated with specific applications' SDKs or APIs, interpreting the given task into a |
| 70 | +sequence of function calls. |
| 71 | + |
| 72 | +We continuously monitor changes in these SDKs and APIs to stay aligned with them. This ensures NPi remains up-to-date, |
| 73 | +relieving you of the burden of tracking these changes yourself. |
| 74 | + |
| 75 | +### Multi-agent collaboration |
| 76 | + |
| 77 | +A clean and easy-to-use interface for building multi-agent applications.
| 78 | + |
| 79 | +```python |
| 80 | +from npiai.core import Agent |
| 81 | + |
| 82 | +agent1 = Agent(prompt="...") |
| 83 | +agent1.use(Gmail(), Calendar()) |
| 84 | + |
| 85 | +agent2 = Agent(prompt="...") |
| 86 | +agent2.use(GitHub()) |
| 87 | + |
| 88 | +agent3 = Agent.collaborate(agent1, agent2) |
| 89 | +agent3.run(task="...") |
| 90 | +``` |
| 91 | + |
| 92 | +`agent3` acts as a coordinator, orchestrating the operations of `agent1` and `agent2`. |
| 93 | + |
| 94 | +### Human-in-the-loop (HITL)
| 95 | + |
| 96 | + |
| 97 | +A simple and effective way to ensure human involvement in handling sensitive operations appropriately. |
| 98 | + |
| 99 | +> This "human in the loop" approach is an essential step in ensuring that language models behave responsibly, generate accurate responses, and align with ethical and safety standards.
| 100 | +> |
| 101 | +> Large Language Model: Data, Human in the Loop for Fine-Tuning, [4] |
| 102 | +
| 103 | + |
| 104 | +For example, in the `Gmail` app, the send-email action is preset as sensitive, so every call to it requires human
| 105 | +approval before the email is sent.
| 106 | + |
| 107 | +```python |
| 108 | +# These HITL APIs will be released in v0.0.2 |
| 109 | +from npiai.core.hitl import HITLRequest, HITLResponse, RequestApproved, RequestDenied, Console |
| 110 | +from npiai_proto import api_pb2 |
| 111 | + |
| 112 | +def human_assist(req: api_pb2.HITLRequest) -> HITLResponse:
| 113 | +    console = Console(req)  # you can integrate your own workflow
| 114 | +    if req.type == api_pb2.ActionType.SAFEGUARD:
| 115 | +        result = console.wait()
| 116 | +        if result.is_approved():
| 117 | +            return RequestApproved
| 118 | +    if req.type == api_pb2.ActionType.MORE_INFORMATION:
| 119 | +        result = console.wait()
| 120 | +        return result.human_message()
| 121 | +    return RequestDenied
| 122 | + |
| 123 | +gmail = Gmail() |
| 124 | +gmail.hitl_handler(human_assist) |
| 125 | +``` |
| 126 | + |
| 127 | +Additionally, you can easily change this behavior by providing a customized configuration.
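
To make the idea of a configurable approval policy concrete, here is a minimal sketch in plain Python. It is not the NPi configuration format, which this post does not show. `SENSITIVE_ACTIONS`, `execute`, and the action names are assumptions made up for illustration.

```python
from typing import Callable, Dict

# Hypothetical policy: True means the action needs human approval first.
SENSITIVE_ACTIONS: Dict[str, bool] = {"send_email": True, "search_email": False}


def execute(
    action: str,
    run: Callable[[], str],
    approve: Callable[[str], bool],
    policy: Dict[str, bool] = SENSITIVE_ACTIONS,
) -> str:
    """Run an action, asking for approval first if the policy marks it sensitive."""
    # Actions missing from the policy are treated as sensitive, to fail safe.
    if policy.get(action, True) and not approve(action):
        return "denied"
    return run()


def always_deny(action: str) -> bool:
    """Stand-in for a human reviewer who rejects every request."""
    return False


# Default policy: sending is gated, searching is not.
print(execute("send_email", lambda: "sent", approve=always_deny))
print(execute("search_email", lambda: "3 results", approve=always_deny))

# A customized policy that no longer treats sending as sensitive.
print(execute("send_email", lambda: "sent", approve=always_deny,
              policy={"send_email": False}))
```

In this sketch, changing the behavior only means passing a different policy mapping. The approval callback stays the same.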
| 128 | + |
| 129 | +### Minimize operational overhead |
| 130 | + |
| 131 | +Imagine developing an AI Agent to negotiate meeting times with stakeholders. Initially, the Agent proposes several time |
| 132 | +slots and emails stakeholders for confirmation. While the Agent awaits responses, which do not always arrive immediately, various
| 133 | +low-probability issues such as unexpected shutdowns or network errors may occur. |
| 134 | + |
| 135 | +NPi is designed to manage these states, ensuring recovery despite such disruptions, freeing developers from handling |
| 136 | +these annoying edge cases themselves. |
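
One common way to survive such disruptions is to checkpoint progress after every completed step. The sketch below illustrates that general technique only and is not NPi's actual implementation. `run_with_checkpoints`, the step functions, and the JSON checkpoint file are all hypothetical.

```python
import json
import os
import tempfile
from typing import Callable, List


def run_with_checkpoints(steps: List[Callable[[], None]], path: str) -> int:
    """Run steps in order, saving the index of the next step after each one.

    Returns the number of steps executed during this call.
    """
    next_step = 0
    if os.path.exists(path):
        with open(path) as f:
            next_step = json.load(f)["next_step"]

    executed = 0
    for index in range(next_step, len(steps)):
        steps[index]()
        with open(path, "w") as f:
            json.dump({"next_step": index + 1}, f)
        executed += 1
    return executed


log = []
failed_once = {"value": False}


def propose_slots() -> None:
    log.append("propose")


def email_stakeholders() -> None:
    # Fail on the first attempt to simulate a network error.
    if not failed_once["value"]:
        failed_once["value"] = True
        raise ConnectionError("simulated network error")
    log.append("email")


def confirm_meeting() -> None:
    log.append("confirm")


steps = [propose_slots, email_stakeholders, confirm_meeting]
checkpoint = os.path.join(tempfile.mkdtemp(), "checkpoint.json")

try:
    run_with_checkpoints(steps, checkpoint)
except ConnectionError:
    pass  # the run stops here; the first step is already checkpointed

# A restart resumes from the failed step instead of starting over.
resumed = run_with_checkpoints(steps, checkpoint)
print(log, resumed)
```

The sketch writes the checkpoint file non-atomically and assumes each step is safe to retry. A production system would need stronger guarantees on both points.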
| 137 | + |
| 138 | +Beyond these, fine-tuning, evaluation, and cost-effectiveness are also on our roadmap.
| 139 | + |
| 140 | +## How does NPi work? |
| 141 | + |
| 142 | +NPi has an architecture that consists of two primary components: the **Server** and the **Toolkits**. Developers use
| 143 | +the NPi SDKs to build their AI Agents and use the CLI or Web Console to customize NPi.
| 144 | + |
| 145 | + |
| 146 | + |
| 147 | +The Server has two main responsibilities: |
| 148 | + |
| 149 | +- Management Functionalities: This includes App API management, authorization, and advanced options such as fine-tuning |
| 150 | +and evaluating function calls. |
| 151 | +- Function Calling Runtime: The server interprets tasks into a sequence of function calls based on in-app planning and |
| 152 | +executes them. It also handles the persistence of function call states if necessary and manages communications with the |
| 153 | +client side. This is particularly crucial for cross-agent communication and incorporating human input.
| 154 | + |
| 155 | +The Toolkits are a collection of tools designed to enhance the developer experience:
| 156 | + |
| 157 | +- **Multi-language SDKs**: We provide SDKs in various programming languages, making it easy for developers to integrate |
| 158 | +NPi with their agents and enhance their tool-using capabilities. |
| 159 | +- **CLI**: Our command-line tool offers a straightforward interface for managing apps, authorizing, and exploring NPi’s |
| 160 | +core functionalities. |
| 161 | +- **Web Console**: A web-based interface that enables more intricate interactions with NPi, including calling-memory |
| 162 | +management, fine-tuning, evaluation, and observability. (Note: This feature has not been released yet.) |
| 163 | + |
| 164 | +## What's next? |
| 165 | + |
| 166 | +We are excited to announce the release of NPi, a platform still in its early stages. We are actively working on
| 167 | +implementing the features mentioned in the previous sections.
| 168 | + |
| 169 | +The mission of NPi is to act as the limbs for large language models, enabling AI Agents to interact seamlessly with |
| 170 | +the virtual world. Although AI Agents, particularly in utilizing the Tool Use pattern, have not yet been widely adopted, |
| 171 | +we are optimistic that NPi will accelerate this adoption and bring us closer to Artificial General Intelligence (AGI). |
| 172 | + |
| 173 | +To realize this vision, we are eager to involve more AI Agent developers and build a vibrant community to collaboratively |
| 174 | +shape the future of NPi. |
| 175 | + |
| 176 | +Explore our development plans on the [NPi Roadmap](https://docs.npi.ai/roadmap). Connect with us on [GitHub](https://github.com/npi-ai/npi), |
| 177 | +[X.com](https://x.com/npi_ai), or join our [Discord Community](https://discord.gg/MQTuXtbj). You can also reach out directly |
| 178 | +to [our CEO](https://twitter.com/wellswfwang) via X.com. |
| 179 | + |
| 180 | +Your support, insights, and feedback are highly valued and greatly appreciated! |
| 181 | + |
| 182 | +We look forward to meeting you in the NPi community and exploring this exciting future together. |
| 183 | + |
| 184 | +## References
| 185 | + |
| 186 | +1. [Agent AI: Surveying the Horizons of Multimodal Interaction, Stanford](https://arxiv.org/abs/2401.03568) |
| 187 | +2. [The future of Agents, Bill Gates](https://www.gatesnotes.com/AI-agents) |
| 188 | +3. [Tool use, a key design pattern of AI agentic workflows, Andrew Ng](https://twitter.com/AndrewYNg/status/1775951610059141147)
| 189 | +4. [Large Language Model: Data, Human in the Loop for Fine-Tuning](https://www.futurebeeai.com/blog/large-language-model-data-human-in-the-loop-for-fine-tuning) |