TODOs

Support other verifiers in addition to Dafny:
- Coq
  - The current system struggles with Coq proofs.
Support other LLM infrastructures in addition to Hugging Face:
- Ollama
Support test cases.
Design a steerable interaction to give human or tool feedback to the LLM.
- Confirm completion at each step, and add a comment to steer.
- Edit entire text, e.g. to add imports.
- Give the verifier feedback.
  - Rudimentary display of goal context in case of error:
    - Coq works better than Dafny.
Design higher level steerable schemes:
- Synthesize Coq proofs.
The LLM can get stuck repeating the same completion over and over again, if it doesn't have a diverse set of outputs.
Design a reinforcement learning scheme, whereas the LLM learns from trial.
- Evaluate whether the model after PPO suffers degradation for some tasks, even unrelated.
- Force the PPO solution to converge to an optimal known one, using it entirely for training rather than discovery.
Get wandb to work.
Maybe add a per language blacklist, so that we can rule out uses of Admitted in Coq.
Refactor the shared arguments between llm.py and ppo.py.
Support progressive proofs: proving with weaker consequences, and strengthening them after succeeding in steps.
Improve the syntax guidance.
Glitch: sometimes, the code is not in triple quotes, causing long running completion generations.
Why Coq is much harder than Dafny:
- Dafny has more automation.
- Coq can have idempotent tactics (like simpl) which can be repeated uselessly indefinitely.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

TODO.md

TODO.md

TODOs

Files

TODO.md

Latest commit

History

TODO.md

File metadata and controls

TODOs