Commit 75013e3

docs: add blog of Introducing NPi (#30)
1 parent 6139639 commit 75013e3

4 files changed

+193
-1
lines changed

docs/assets/npi-arch.png

208 KB

docs/pages/_meta.json

+1-1
@@ -7,7 +7,7 @@
 "browser-apps": "",
 "cli": "CLI Reference",
 "python": "Python SDK Reference",
-
+"blog": "Blog",
 "contact": {
 "title": "Contact ↗",
 "type": "page",

docs/pages/blog/_meta.json

+3
@@ -0,0 +1,3 @@
{
  "introducing-npi": "Introducing NPi"
}

docs/pages/blog/introducing-npi.mdx

+189
@@ -0,0 +1,189 @@
# Introducing NPi

## Background

Since ChatGPT's release, there has been a surge in AI applications designed for natural conversation. However, their practical usefulness is often limited by a lack of automatic action-taking capabilities.

The evolving concept of the Agent addresses this gap. Agent AI is a class of interactive systems that can perceive visual stimuli, language inputs, and other environmentally grounded data, and can produce meaningful embodied actions<sub>[1, Agent AI, Li Fei-Fei]</sub>.

> Agents are not only going to change how everyone interacts with computers. They’re also going to upend the software
> industry, bringing about the biggest revolution in computing since we went from typing commands to tapping on icons.
>
> **The future of Agents, Bill Gates**.

A major advantage humans hold over other animals is the ability to use tools. As Andrew Ng notes<sub>[3]</sub>, tool use is also one of the key abilities of AI Agents.

However, building an AI agent with a robust ability to use tools is challenging because of the diversity of tools and the operational overhead involved.

- Low-level primitives, such as HTTP APIs or SDKs, force organizations to repeatedly write similar code to integrate LLMs with different applications.
- Maintaining features that are not business-critical, such as state management, availability, and authorization flows, incurs significant overhead.
- Ensuring the security of AI Agents, which means making their actions controllable, predictable, and explainable, poses substantial challenges.

This is why NPi was created: to offer AI Agent developers an easy-to-use and reliable platform that gives their agents robust tool-use capabilities.

## What is NPi?

On April 25, we launched NPi (`v0.0.1`), a free, open-source platform, on [GitHub](https://github.com/npi-ai/npi).
NPi provides **Tool Use** APIs that empower AI agents to operate and interact with various software tools and applications.

The primary goal of NPi is to offer a unified interface that allows Large Language Models to seamlessly integrate with the existing ecosystem of software and applications through function calls. NPi serves as a gateway for these models to access the virtual world.

The core principle of NPi is to focus on `in-app planning`. This requires users to break down tasks into `single-app` sub-tasks, meaning **each sub-task is confined to one application**. NPi then translates each sub-task into a series of function calls and executes them in a rule-based manner to ensure precise control.

This divide-and-conquer approach is a common strategy for solving complex problems and is central to NPi's design.

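To make the divide-and-conquer idea concrete, here is a minimal Python sketch in which single-app sub-tasks are routed to per-app handlers. It is an illustration only, not NPi's implementation: `SubTask`, `run_plan`, and the stand-in handlers are hypothetical names, and a real handler would delegate to an NPi app instead of formatting a string.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class SubTask:
    app: str          # the single application this sub-task is confined to
    instruction: str  # natural-language instruction for that application


def run_plan(subtasks: List[SubTask], handlers: Dict[str, Callable[[str], str]]) -> List[str]:
    """Run each single-app sub-task with the handler registered for its app."""
    results = []
    for subtask in subtasks:
        if subtask.app not in handlers:
            raise KeyError(f"no handler registered for app {subtask.app!r}")
        results.append(handlers[subtask.app](subtask.instruction))
    return results


# Stand-in handlers (hypothetical); a real one would call into an NPi app.
handlers = {
    "calendar": lambda text: f"[calendar] {text}",
    "gmail": lambda text: f"[gmail] {text}",
}

# One cross-app task, broken down by the user into two single-app sub-tasks.
plan = [
    SubTask("calendar", "find a free 30-minute slot tomorrow"),
    SubTask("gmail", "email the proposed slot to the team"),
]
plan_results = run_plan(plan, handlers)
```

Keeping the app name explicit on every sub-task is what lets each step be planned and checked inside one application at a time.
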
To date, we have implemented the core functionalities of NPi, including:

### Out-of-the-box multimodal Tool Use APIs

We provide ready-to-use APIs that allow large language models to interact with applications, as the following examples demonstrate:

```python
from npiai.app.google import Calendar
from npiai.app.github import GitHub
from npiai.app.twitter import Twitter

# API-backed apps: pass a natural-language instruction to chat().
calendar = Calendar()
calendar.chat("...")

github = GitHub()
github.chat("...")

# For apps without a friendly API, a visual approach drives the web browser instead.
twitter = Twitter(visual=True)
twitter.chat("What is the latest post from @wellswfwang?")
```

Under the hood, NPi is pre-integrated with the SDKs or APIs of specific applications and translates a given task into a sequence of function calls.

We continuously monitor changes in these SDKs and APIs to stay aligned with them. This keeps NPi up to date and relieves you of the burden of tracking these changes yourself.

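As a rough illustration of what "a sequence of function calls" can look like, the sketch below registers plain Python functions under tool names and executes a planned list of calls against that registry. The registry, the `tool` decorator, and the `calendar.*` functions are assumptions made for this example; they do not reflect NPi's internal API.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical registry mapping tool names to the functions that implement them.
REGISTRY: Dict[str, Callable[..., Any]] = {}


def tool(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Register a function under a tool name so a planner can refer to it."""
    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        REGISTRY[name] = fn
        return fn
    return decorator


@tool("calendar.find_slot")
def find_slot(day: str) -> str:
    # Stand-in for a real calendar lookup.
    return f"{day} 10:00"


@tool("calendar.create_event")
def create_event(title: str, start: str) -> Dict[str, str]:
    # Stand-in for a real calendar write.
    return {"title": title, "start": start}


def execute(calls: List[Tuple[str, Dict[str, Any]]]) -> List[Any]:
    """Run the planned calls in order, rejecting any tool that is not registered."""
    results = []
    for name, kwargs in calls:
        if name not in REGISTRY:
            raise ValueError(f"unknown tool: {name}")
        results.append(REGISTRY[name](**kwargs))
    return results


# A plan such as a model might produce for "schedule a sync on May 1".
planned_calls = [
    ("calendar.find_slot", {"day": "2024-05-01"}),
    ("calendar.create_event", {"title": "Sync", "start": "2024-05-01 10:00"}),
]
call_results = execute(planned_calls)
```

Rejecting unregistered names is one simple form of rule-based control: the model can only trigger functions that were explicitly exposed.
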
### Multi-agent collaboration

NPi offers a clean, easy-to-use interface for building multi-agent applications.

```python
from npiai.core import Agent
# Gmail's import path is assumed, by analogy with Calendar.
from npiai.app.google import Calendar, Gmail
from npiai.app.github import GitHub

agent1 = Agent(prompt="...")
agent1.use(Gmail(), Calendar())

agent2 = Agent(prompt="...")
agent2.use(GitHub())

agent3 = Agent.collaborate(agent1, agent2)
agent3.run(task="...")
```

`agent3` acts as a coordinator, orchestrating the operations of `agent1` and `agent2`.

### Human-in-the-loop (HITL)

HITL is a simple and effective way to ensure that a human is involved whenever a sensitive operation is handled.

> This "human in the loop" approach is an essential step in ensuring that language models behave responsibly, generate accurate responses, and align with ethical and safety standards.
>
> Large Language Model: Data, Human in the Loop for Fine-Tuning, [4]

For example, in the `Gmail` app, the send-email action is preset as sensitive, so every call to it requires human approval before the email is sent.

```python
# These HITL APIs will be released in v0.0.2
from npiai.app.google import Gmail  # import path assumed, by analogy with Calendar
from npiai.core.hitl import HITLResponse, RequestApproved, RequestDenied, Console
from npiai_proto import api_pb2


def human_assist(req: api_pb2.HITLRequest) -> HITLResponse:
    console = Console(req)  # you can integrate your own workflow here
    if req.type == api_pb2.ActionType.SAFEGUARD:
        # Sensitive action: block until a human approves or rejects it.
        result = console.wait()
        if result.is_approved():
            return RequestApproved
    elif req.type == api_pb2.ActionType.MORE_INFORMATION:
        # The agent needs more input: pass the human's reply back.
        result = console.wait()
        return result.human_message()
    # Anything that was not explicitly approved is denied.
    return RequestDenied


gmail = Gmail()
gmail.hitl_handler(human_assist)
```

You can also change this default behavior by providing a custom configuration.

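Independently of NPi's own HITL API, the safeguard idea can be sketched in plain Python: a sensitive action runs only when an approver callback agrees, and swapping the approver changes the policy. Every name here (`sensitive`, `ActionDenied`, `console_approver`, `allow_internal_only`, `send_email`) is hypothetical and chosen only for this illustration.

```python
import functools
from typing import Any, Callable


class ActionDenied(Exception):
    """Raised when a human (or policy) rejects a sensitive action."""


def sensitive(approver: Callable[[str], bool]) -> Callable:
    """Mark a function as sensitive: it runs only if `approver` returns True."""
    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            description = f"{fn.__name__}(args={args}, kwargs={kwargs})"
            if not approver(description):
                raise ActionDenied(description)
            return fn(*args, **kwargs)
        return wrapper
    return decorator


def console_approver(description: str) -> bool:
    """Ask a person on the terminal; any answer other than 'y' denies."""
    return input(f"Allow {description}? [y/N] ").strip().lower() == "y"


def allow_internal_only(description: str) -> bool:
    """Non-interactive policy used below: approve only example.com recipients."""
    return "@example.com" in description


@sensitive(allow_internal_only)  # swap in console_approver for a real prompt
def send_email(to: str, subject: str) -> str:
    # Stand-in for actually sending an email.
    return f"sent {subject!r} to {to}"
```

With this policy, `send_email("alice@example.com", "Hi")` goes through, while a call addressed to any other domain raises `ActionDenied` before the function body runs.
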
### Minimize operational overhead

Imagine developing an AI Agent to negotiate meeting times with stakeholders. Initially, the Agent proposes several time slots and emails the stakeholders for confirmation. Responses are not always immediate, and while the Agent waits, low-probability problems such as unexpected shutdowns or network errors may occur.

NPi is designed to manage this state and to recover from such disruptions, freeing developers from handling these annoying edge cases themselves.

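One common way to get this kind of recovery is to checkpoint progress after every completed step and skip finished steps on restart. The sketch below shows that pattern with a JSON file; it is an assumption about the general technique, not a description of how NPi persists state, and `run_with_checkpoints` and the step names are made up for the example.

```python
import json
import os
import tempfile
from typing import Callable, Dict, List


def run_with_checkpoints(
    steps: List[str], actions: Dict[str, Callable[[], None]], path: str
) -> List[str]:
    """Run steps in order, skipping those already recorded in `path`.

    Returns the steps that were executed during this call.
    """
    done: List[str] = []
    if os.path.exists(path):
        with open(path) as f:
            done = json.load(f)
    executed = []
    for step in steps:
        if step in done:
            continue  # finished before the interruption
        actions[step]()
        done.append(step)
        with open(path, "w") as f:  # persist progress after every step
            json.dump(done, f)
        executed.append(step)
    return executed


log: List[str] = []
fail_once = {"armed": True}


def await_replies() -> None:
    # Simulate a network error the first time this step runs.
    if fail_once["armed"]:
        fail_once["armed"] = False
        raise ConnectionError("simulated network error")
    log.append("await_replies")


actions = {
    "propose_slots": lambda: log.append("propose_slots"),
    "await_replies": await_replies,
    "book_meeting": lambda: log.append("book_meeting"),
}
steps = ["propose_slots", "await_replies", "book_meeting"]
checkpoint_path = os.path.join(tempfile.mkdtemp(), "meeting.json")

try:
    run_with_checkpoints(steps, actions, checkpoint_path)
except ConnectionError:
    pass  # the first run dies after "propose_slots" was checkpointed

resumed = run_with_checkpoints(steps, actions, checkpoint_path)
```

The second call resumes at `await_replies` without proposing time slots again. A production system would also need atomic writes and idempotent steps, which this sketch leaves out.
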
Beyond these, fine-tuning, evaluation, and cost-effectiveness are also on our roadmap.

## How does NPi work?

NPi's architecture consists of two primary components: the **Server** and the **Toolkits**. Developers use the NPi SDKs to build their AI Agents, and use the CLI or Web Console to customize NPi.

![NPi architecture](../../assets/npi-arch.png)

The Server has two main responsibilities:

- Management Functionalities: This includes App API management, authorization, and advanced options such as fine-tuning and evaluating function calls.
- Function Calling Runtime: The Server interprets tasks into a sequence of function calls based on in-app planning and executes them. It also persists function call states when necessary and manages communication with the client side. This is particularly crucial for cross-agent communication and for incorporating human input.

The Toolkits are a collection of tools designed to enhance the developer experience:

- **Multi-language SDKs**: We provide SDKs in various programming languages, making it easy for developers to integrate NPi with their agents and enhance their tool-using capabilities.
- **CLI**: Our command-line tool offers a straightforward interface for managing apps, authorizing, and exploring NPi’s core functionalities.
- **Web Console**: A web-based interface that enables more intricate interactions with NPi, including calling-memory management, fine-tuning, evaluation, and observability. (Note: This feature has not been released yet.)

## What's next?

We are excited to announce the release of NPi. The platform is still in its early stages, and we are actively working on implementing the features mentioned in the previous sections.

The mission of NPi is to act as the limbs for large language models, enabling AI Agents to interact seamlessly with
the virtual world. Although AI Agents, particularly in utilizing the Tool Use pattern, have not yet been widely adopted,
we are optimistic that NPi will accelerate this adoption and bring us closer to Artificial General Intelligence (AGI).

To realize this vision, we are eager to involve more AI Agent developers and build a vibrant community to collaboratively
shape the future of NPi.

Explore our development plans on the [NPi Roadmap](https://docs.npi.ai/roadmap). Connect with us on [GitHub](https://github.com/npi-ai/npi),
[X.com](https://x.com/npi_ai), or join our [Discord Community](https://discord.gg/MQTuXtbj). You can also reach out directly
to [our CEO](https://twitter.com/wellswfwang) via X.com.

Your support, insights, and feedback are highly valued and greatly appreciated!

We look forward to meeting you in the NPi community and exploring this exciting future together.

## Reference

1. [Agent AI: Surveying the Horizons of Multimodal Interaction, Stanford](https://arxiv.org/abs/2401.03568)
2. [The future of Agents, Bill Gates](https://www.gatesnotes.com/AI-agents)
3. [Tool use, a key design pattern of AI agentic workflows, Andrew Ng](https://twitter.com/AndrewYNg/status/1775951610059141147)
4. [Large Language Model: Data, Human in the Loop for Fine-Tuning](https://www.futurebeeai.com/blog/large-language-model-data-human-in-the-loop-for-fine-tuning)
