diff --git a/.gitignore b/.gitignore
index b32a9ab..18d4938 100755
--- a/.gitignore
+++ b/.gitignore
@@ -10,3 +10,4 @@ temp.py
 *.png
 !/experiment/input/**/*.png
 *.pyc
+!/docs/**/*.png
\ No newline at end of file
diff --git a/docs/figs/accuracy_vs_round_bias.png b/docs/figs/accuracy_vs_round_bias.png
new file mode 100644
index 0000000..e14c121
Binary files /dev/null and b/docs/figs/accuracy_vs_round_bias.png differ
diff --git a/docs/figs/accuracy_vs_round_structure.png b/docs/figs/accuracy_vs_round_structure.png
new file mode 100644
index 0000000..f49267e
Binary files /dev/null and b/docs/figs/accuracy_vs_round_structure.png differ
diff --git a/docs/figs/architecture.png b/docs/figs/architecture.png
new file mode 100644
index 0000000..5f8da58
Binary files /dev/null and b/docs/figs/architecture.png differ
diff --git a/docs/figs/correct_prop_vs_network_type.png b/docs/figs/correct_prop_vs_network_type.png
new file mode 100644
index 0000000..6bf0a7c
Binary files /dev/null and b/docs/figs/correct_prop_vs_network_type.png differ
diff --git a/docs/figs/neighbours_accuracy.png b/docs/figs/neighbours_accuracy.png
new file mode 100644
index 0000000..6893481
Binary files /dev/null and b/docs/figs/neighbours_accuracy.png differ
diff --git a/docs/figs/opinion_changes_fully_connected.png b/docs/figs/opinion_changes_fully_connected.png
new file mode 100644
index 0000000..a4c45dd
Binary files /dev/null and b/docs/figs/opinion_changes_fully_connected.png differ
diff --git a/docs/figs/opinion_changes_fully_disconnected.png b/docs/figs/opinion_changes_fully_disconnected.png
new file mode 100644
index 0000000..db51669
Binary files /dev/null and b/docs/figs/opinion_changes_fully_disconnected.png differ
diff --git a/docs/figs/opinion_changes_random.png b/docs/figs/opinion_changes_random.png
new file mode 100644
index 0000000..9b4782f
Binary files /dev/null and b/docs/figs/opinion_changes_random.png differ
diff --git a/docs/figs/opinion_changes_scale_free_correct_edge.png b/docs/figs/opinion_changes_scale_free_correct_edge.png
new file mode 100644
index 0000000..4f84e15
Binary files /dev/null and b/docs/figs/opinion_changes_scale_free_correct_edge.png differ
diff --git a/docs/figs/opinion_changes_scale_free_correct_hub.png b/docs/figs/opinion_changes_scale_free_correct_hub.png
new file mode 100644
index 0000000..03f455b
Binary files /dev/null and b/docs/figs/opinion_changes_scale_free_correct_hub.png differ
diff --git a/docs/figs/opinion_changes_scale_free_incorrect_edge.png b/docs/figs/opinion_changes_scale_free_incorrect_edge.png
new file mode 100644
index 0000000..4861197
Binary files /dev/null and b/docs/figs/opinion_changes_scale_free_incorrect_edge.png differ
diff --git a/docs/figs/opinion_changes_scale_free_incorrect_hub.png b/docs/figs/opinion_changes_scale_free_incorrect_hub.png
new file mode 100644
index 0000000..01208b7
Binary files /dev/null and b/docs/figs/opinion_changes_scale_free_incorrect_hub.png differ
diff --git a/docs/figs/opinion_changes_scale_free_unbiased.png b/docs/figs/opinion_changes_scale_free_unbiased.png
new file mode 100644
index 0000000..d965f9f
Binary files /dev/null and b/docs/figs/opinion_changes_scale_free_unbiased.png differ
diff --git a/docs/figs/random_networks.png b/docs/figs/random_networks.png
new file mode 100644
index 0000000..eb2b4cb
Binary files /dev/null and b/docs/figs/random_networks.png differ
diff --git a/docs/figs/sf_networks.png b/docs/figs/sf_networks.png
new file mode 100644
index 0000000..778d9f4
Binary files /dev/null and b/docs/figs/sf_networks.png differ
diff --git a/docs/figs/simpson_fully_connected.png b/docs/figs/simpson_fully_connected.png
new file mode 100644
index 0000000..2566f06
Binary files /dev/null and b/docs/figs/simpson_fully_connected.png differ
diff --git a/docs/figs/simpson_fully_disconnected.png b/docs/figs/simpson_fully_disconnected.png
new file mode 100644
index 0000000..663d458
Binary files /dev/null and b/docs/figs/simpson_fully_disconnected.png differ
diff --git a/docs/figs/simpson_random.png b/docs/figs/simpson_random.png
new file mode 100644
index 0000000..c7039b9
Binary files /dev/null and b/docs/figs/simpson_random.png differ
diff --git a/docs/figs/simpson_scale_free_unbiased.png b/docs/figs/simpson_scale_free_unbiased.png
new file mode 100644
index 0000000..f5aa04c
Binary files /dev/null and b/docs/figs/simpson_scale_free_unbiased.png differ
diff --git a/docs/index.html b/docs/index.html
index a3c48e9..07ac436 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -1,44 +1,239 @@
-
-To improve the reasoning and question-answering capabilities of Large Language Models (LLMs), several multi-agent approaches have been introduced. While these methods enhance performance, the application of collective intelligence-based approaches to complex network structures and the dynamics of agent interactions remain underexplored. This work extends the concept of multi-agent debate to more general network topologies, measuring the question-answering accuracy, influence, consensus, and the effects of bias on the collective. The results show that random networks perform similarly to fully connected networks despite using significantly fewer tokens. Furthermore, a strong consensus among agents correlates with correct answers, whereas divided responses typically indicate incorrect answers. Analyzing the influence of the agents reveals a balance between self-reflection and interconnectedness; self-reflection aids when local interactions are incorrect, and local interactions aid when the agent itself is incorrect. Additionally, bias plays a strong role in system performance, with correctly biased hub nodes boosting performance. These insights suggest that using random networks or scale-free networks with knowledgeable agents placed in central positions can enhance the overall performance of multi-agent systems.
+
+Large Language Models (LLMs) have demonstrated impressive performance on a variety of tasks but still struggle with hallucinations and incorrect answers. Multi-agent approaches, inspired by human problem-solving, have been introduced to address these issues. Techniques like ReAct and Reflexion enable LLMs to engage in iterative reasoning and self-reflection. This work explores multi-agent systems on scale-free networks to understand how agents influence one another and how network topology affects performance, extending the concept of multi-agent debate to these more complex networks.
+
+LLM agents are represented as nodes in a network, with edges indicating communication channels. In multi-agent debate, agents first solve problems individually, then reconsider their answers based on their neighbours' responses and their own previous answer. This process repeats for several rounds, culminating in a majority vote that determines the collective answer. We introduce bias by providing certain agents with a correct or incorrect answer and analyze their influence based on their network position (hubs or edges).
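To make the protocol concrete, here is a minimal sketch of one debate on a single question. The `query_agent` function is a hypothetical placeholder for the underlying LLM call; the actual prompting and answer parsing used in this project will differ.

```python
import random
from collections import Counter

import networkx as nx


def query_agent(question, own_answer, neighbour_answers):
    """Placeholder for the LLM call: return the agent's (possibly revised) answer.

    The real system would prompt the model with the question, its own previous
    answer, and its neighbours' answers; here we simply pick at random.
    """
    return random.choice(["A", "B", "C", "D"])


def debate(graph: nx.Graph, question: str, rounds: int = 4) -> str:
    """Run multi-agent debate on `graph` and return the majority-vote answer."""
    answers = {}
    for r in range(rounds):
        new_answers = {}
        for agent in graph.nodes:
            neighbour_answers = [answers[n] for n in graph.neighbors(agent)] if r else []
            new_answers[agent] = query_agent(question, answers.get(agent), neighbour_answers)
        answers = new_answers
    # The collective answer is the most common final-round answer (majority vote).
    return Counter(answers.values()).most_common(1)[0][0]
```

In this sketch, `debate(nx.complete_graph(25), question)` would correspond to one question on the fully connected topology, with other topologies supplied as different graphs.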
+
+Experiments were conducted on three scale-free networks, each with 25 GPT-3.5-Turbo agents engaging in four rounds of debate to answer 100 high-school mathematics questions from the MMLU dataset. Each experiment was repeated three times to assess statistical variability. Bias was introduced into hub or edge nodes, and the resulting performance was compared with that of unbiased networks to observe how biased nodes influence the spread of information and the overall accuracy of the system.
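As an illustration of this setup, the sketch below builds a 25-node scale-free network and picks candidate hub and edge nodes for bias injection. The Barabási–Albert generator, its attachment parameter `m=2`, and the choice of two biased nodes are assumptions for illustration only; the text does not state how the networks were generated.

```python
import networkx as nx

N_AGENTS = 25
N_NETWORKS = 3

# Assumed generator: Barabási–Albert preferential attachment with m = 2.
graphs = [nx.barabasi_albert_graph(N_AGENTS, 2, seed=s) for s in range(N_NETWORKS)]

for g in graphs:
    # Rank nodes by degree: the highest-degree nodes act as hubs, while the
    # lowest-degree nodes sit on the periphery ("edge" nodes).
    by_degree = sorted(g.nodes, key=g.degree, reverse=True)
    hub_nodes, edge_nodes = by_degree[:2], by_degree[-2:]
    print(f"hubs: {hub_nodes}, edge nodes: {edge_nodes}")
```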
+
+Comparing the QA performance of the different network types shows that structure plays a role in accuracy, as shown in Table 1. In particular, random networks achieve performance similar to fully connected networks while using less than a quarter of the input tokens per round of debate (28,600 versus 125,000). Scale-free networks, which use a similar number of input tokens per round to random networks, perform worse than random networks, suggesting that the random topology is better suited to these problem-solving tasks. Fully disconnected networks demonstrate the lowest performance, highlighting the importance of collaborative problem-solving. (A sketch of how these accuracy figures are aggregated is given after Table 1.)
+| Network            | Input Tokens per Round | Accuracy    |
+|--------------------|------------------------|-------------|
+| Fully Connected    | 125,000                | 67.7 ± 1.1% |
+| Fully Disconnected | 5,000                  | 63.9 ± 0.4% |
+| Random             | 28,600                 | 68.2 ± 0.5% |
+| Scale-Free         | 21,800                 | 64.8 ± 1.0% |
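The accuracy values in Table 1 are presumably the fraction of the 100 questions answered correctly by majority vote, averaged over the repeated runs, with the quoted ± giving the spread across runs. The exact uncertainty estimate (standard deviation versus standard error) is not stated, so the sketch below is an assumption.

```python
from statistics import mean, stdev


def run_accuracy(collective_answers, answer_key):
    """Fraction of questions whose majority-vote answer matches the key."""
    return mean(c == k for c, k in zip(collective_answers, answer_key))


def summarise(run_accuracies):
    """Mean ± standard deviation over repeated runs, formatted as in Table 1."""
    return f"{100 * mean(run_accuracies):.1f} ± {100 * stdev(run_accuracies):.1f}%"
```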
+Comparing the performance of biased and unbiased systems shows that bias also plays a role in QA accuracy, as shown in Table 2. Networks with correctly (incorrectly) biased nodes at their hubs perform significantly better (worse) than their unbiased counterpart. In particular, networks with correctly biased hub nodes achieved roughly twice the accuracy of networks with incorrectly biased hubs: 88.1 ± 0.5% versus 43.8 ± 1.5%. Although bias is expected to affect performance, the sharp drop in accuracy for incorrectly biased networks shows that it takes only a few biased and well-connected agents, two in this case, to impair the results significantly. Moreover, the stronger performance of the unbiased system demonstrates that although agents may be capable of solving problems correctly, they are easily influenced by incorrect agents. When bias is instead inserted at the edge of the network, it has little effect on QA performance.
+| Network                   | Accuracy    |
+|---------------------------|-------------|
+| Unbiased                  | 64.8 ± 1.0% |
+| Correctly Biased (Hub)    | 88.1 ± 0.5% |
+| Incorrectly Biased (Hub)  | 43.8 ± 1.5% |
+| Correctly Biased (Edge)   | 65.7 ± 1.1% |
+| Incorrectly Biased (Edge) | 64.9 ± 1.3% |
+To understand how an agent may be influenced, the probability of an agent being correct in round n, given its own response and those of its neighbours in round n-1, is shown in Figure 5. As the number of correct neighbours increases, so too does the probability of the agent being correct. Furthermore, the tendency for green points to lie above red points highlights the positive impact of self-reflection: regardless of its neighbours' responses, an agent is more likely to answer correctly if it was correct in the previous round. These findings highlight the importance of both individuality and collective thinking in multi-agent systems. Collaborative problem-solving improves the overall performance of the collective, while self-reflection improves performance when local interactions are misguided.
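The quantity plotted in Figure 5 can be estimated from the debate logs as a simple conditional frequency. The log format below (per-round correctness flags for each agent) is an assumption for illustration.

```python
from collections import defaultdict

import networkx as nx


def correctness_given_history(graph: nx.Graph, correct: list) -> dict:
    """Estimate P(correct in round n | own correctness and number of correct
    neighbours in round n-1).

    `correct[r][agent]` is True if `agent` answered correctly in round `r`
    (an assumed log format).
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for r in range(1, len(correct)):
        for agent in graph.nodes:
            was_correct = correct[r - 1][agent]
            n_correct_neighbours = sum(correct[r - 1][n] for n in graph.neighbors(agent))
            key = (was_correct, n_correct_neighbours)
            totals[key] += 1
            hits[key] += int(correct[r][agent])
    return {key: hits[key] / totals[key] for key in totals}
```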
+To further understand the dynamics of these systems, the way in which agents change their answers between rounds is shown in Figure 6. In fully connected, scale-free, and random networks, the number of agents selecting and remaining on the correct answer increases with each round of debate. For fully disconnected networks, on the other hand, the number of agents remaining correct or incorrect is near-constant, with agents continuing to switch between correct and incorrect answers. Considering bias, networks correctly biased at their hubs exhibit a large number of agents switching from incorrect to correct answers after the first round, in agreement with Figure 4, and these agents tend to keep the correct response throughout the remaining rounds of debate. Networks incorrectly biased at their hubs, by contrast, show an increasing number of agents switching from correct to incorrect after round two. This is a significant result: agents may hold the correct answer yet be convinced to switch by the influence of their biased neighbours.
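The transition counts underlying Figure 6 amount to a tally over consecutive rounds; a minimal sketch, using the same assumed log format as above:

```python
from collections import Counter


def answer_transitions(correct: list) -> list:
    """Count correct/incorrect transitions between consecutive rounds.

    Returns one Counter per round pair, with keys such as
    ("correct", "incorrect") for an agent switching from a correct answer
    to an incorrect one.
    """
    transitions = []
    for prev, curr in zip(correct, correct[1:]):
        labels = (
            (
                "correct" if prev[a] else "incorrect",
                "correct" if curr[a] else "incorrect",
            )
            for a in prev
        )
        transitions.append(Counter(labels))
    return transitions
```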
+We investigate multi-agent approaches to enhance the reasoning and question-answering capabilities of Large Language Models (LLMs). Our study extends the concept of multi-agent debate to more complex network structures, specifically scale-free networks. We measure the question-answering performance, the strength of the consensus formed, and the impact of bias within the network. Results indicate that correctly biased hub nodes significantly improve overall system performance, suggesting that strategically placing knowledgeable agents can boost collective intelligence.
-
-Large Language Models (LLMs) have shown remarkable abilities in various tasks, yet they still struggle with hallucinations and incorrect answers. To address these issues, multi-agent approaches inspired by human problem-solving have been introduced. Techniques like ReAct and Reflexion enable LLMs to engage in iterative reasoning and self-reflection, improving their decision-making. However, these methods primarily use single agents. Our work explores multi-agent systems on scale-free networks, aiming to understand how agents influence each other and how network topology affects performance. We extend the concept of multi-agent debate to these complex networks to analyze their dynamics and effectiveness.
-
-We represent LLM agents as nodes in a network, with edges indicating communication channels. In multi-agent debate, agents first solve problems individually, then reconsider their answers based on their neighbors' responses and their previous answers. This process repeats for several rounds, culminating in a majority vote to determine the collective answer. We introduce bias by providing certain agents with correct or incorrect answers and study the influence of these biased agents based on their network position (hubs or edges). The impact of these biases on the overall performance and the consensus within the network is analyzed.
-
-We conducted experiments using three scale-free networks, each with 25 GPT-3.5-Turbo powered agents. These agents engaged in four rounds of debate to answer 100 high-school mathematics questions from the MMLU dataset. The experiment was repeated three times to ensure statistical significance. To study the effect of bias, we introduced correct and incorrect answers into either the hub or edge nodes and compared the performance with unbiased networks. The goal was to observe how biased nodes influenced the spread of information and the overall accuracy of the system.
-
-The introduction of bias into hub nodes had a significant impact on performance. Correctly biased hubs increased the system's accuracy from 64% to 86%, while incorrectly biased hubs reduced it to 42%. This shows that agents are strongly influenced by their neighbors' responses. Networks with biased edge nodes showed little change in performance, indicating that influence is more significant when the biased nodes are centrally located. Our analysis revealed that agents tend to form a consensus when the system answers correctly, but responses are split when the system is incorrect. The presence of bias reduced consensus in incorrect answers, increasing the variability in responses.
-
-Our study demonstrates that the strategic placement of knowledgeable agents in central network positions can enhance the overall performance of multi-agent systems. This finding suggests that future multi-agent systems should leverage network topology to optimize collective intelligence. By placing larger, more capable models at network hubs and smaller models at the periphery, it is possible to improve performance without a significant increase in computational cost. Future research should explore different network structures and larger systems to generalize these findings further.
-
-This study has important implications for designing future multi-agent systems. However, it is limited by the number of agents, questions, and rounds used due to computational constraints. Future work should explore a broader range of network structures, including random and small-world networks, and increase the number of agents to better understand the dynamics and performance of these systems. Despite these limitations, our findings provide valuable insights into how bias and network topology influence collective problem-solving and consensus formation in multi-agent systems.
-While the accuracy gives an insight into the average QA performance of the system, it provides little information on how the answers are distributed inside the network during any given round and whether or not the agents agree. In fact, due to majority voting, the network can be correct with fewer than half of its agents giving the correct answer. This section explores how and under which conditions a consensus is formed. The percentage of agents in the network that answered the question correctly in the final round is shown in Figure 7. Although this metric highlights the relationship between the consensus towards the correct answer and the overall QA performance, which captures the total number of questions answered correctly, it still says little about how the answers are distributed.
+To gain insight into the distribution of answers, the Simpson index is used to estimate the level of consensus within the collective. The Simpson index, commonly used to quantify diversity, here measures the probability that two randomly selected agents give the same answer in the final round of the experiment. High values for fully connected networks, followed by random and then scale-free networks, indicate a relationship between network connectivity and agreement among agents. The results show that a high degree of consensus among agents correlates with correct answers, indicating greater certainty, whereas lower consensus means the system is more likely to be incorrect.
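For reference, a minimal sketch of the Simpson index as described here: the probability that two agents drawn at random gave the same final-round answer. Drawing without replacement is an assumption; the simpler sum of squared answer proportions behaves similarly.

```python
from collections import Counter


def simpson_index(final_answers: list) -> float:
    """Probability that two randomly chosen agents (without replacement) gave
    the same answer, e.g. simpson_index(["A", "A", "B"]) == 1/3."""
    n = len(final_answers)
    counts = Counter(final_answers)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))
```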
+Strategic placement of knowledgeable agents in central network positions can enhance the performance of multi-agent systems. Future systems should leverage network topology to optimize collective intelligence by placing larger, more capable models at network hubs and smaller models at the periphery, improving performance without a significant increase in computational cost. Further research should explore different network structures and larger systems to generalize these findings.
+
+This study has important implications for designing future multi-agent systems, but it is limited by the number of agents, questions, and rounds used due to computational constraints. Future work should explore a broader range of network structures, such as small-world networks, and increase the number of agents to better understand the dynamics and performance of these systems. Despite these limitations, our findings provide valuable insights into how bias and network topology influence collective problem-solving and consensus formation in multi-agent systems.
+