
Patch VLLM when building Docker #92

Closed · wants to merge 1 commit
17 changes: 7 additions & 10 deletions benchmarking/README.md
@@ -51,20 +51,17 @@ python utils/prompt_client_cli.py \
```

### using vllm/benchmarking/benchmark_serving.py
-Within the Docker container, use the benchmark_serving.patch file:
+Within the Docker container, in one shell start the server:
 ```
-cd ~/app/src
-python run_vllm_api_server.py
+source $PYTHON_ENV_DIR/bin/activate  # activate python env
+python /home/$CONTAINER_APP_USERNAME/app/src/run_vllm_api_server.py
 ```
-This simply stops the benchmarking script from sending the `best_of` arg which is not supported and causes issues.
 
-To run the benchmarks, in another shell into the Docker container:
+Then in another shell run the benchmarks:
 
 ```
-cd ~/vllm
-git apply ~/app/benchmarking/benchmark_serving.patch
-cd ~/app
-export PYTHONPATH=$PYTHONPATH:$PWD
-python benchmarking/vllm_online_benchmark.py
+source $PYTHON_ENV_DIR/bin/activate  # activate python env as well
+python /home/$CONTAINER_APP_USERNAME/app/benchmarking/vllm_online_benchmark.py
 ```

The output will be available for each input/output sequence length defined and time stamped.
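The patch mentioned in the removed README text exists only to stop the benchmark script from sending the unsupported `best_of` argument. A minimal sketch of that idea, assuming an illustrative payload-building helper (the function and field names here are invented, not the actual vLLM benchmark code):

```shell
# Hypothetical sketch: only include `best_of` in the request payload
# when the server is known to support it. Names are illustrative.
build_payload() {
  local prompt=$1 max_tokens=$2 best_of=${3:-}
  local payload="{\"prompt\": \"$prompt\", \"max_tokens\": $max_tokens"
  # gate the unsupported option behind an explicit opt-in flag
  if [ -n "$best_of" ] && [ "${SUPPORTS_BEST_OF:-0}" = "1" ]; then
    payload="$payload, \"best_of\": $best_of"
  fi
  echo "$payload}"
}

build_payload "hello" 16 2   # -> {"prompt": "hello", "max_tokens": 16}
```

With `SUPPORTS_BEST_OF=1` the same call would emit the `best_of` field; the patch effectively hard-codes the "unsupported" branch.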
4 changes: 4 additions & 0 deletions vllm-tt-metal-llama3/vllm.llama3.src.dev.Dockerfile
@@ -115,6 +115,10 @@ COPY --chown=${CONTAINER_APP_USERNAME}:${CONTAINER_APP_USERNAME} "locust" "${APP
RUN /bin/bash -c "source ${PYTHON_ENV_DIR}/bin/activate \
&& pip install --default-timeout=240 --no-cache-dir -r requirements.txt"

+# apply patch to remove best-of argument from vllm
+WORKDIR ${vllm_dir}
+RUN git apply ${APP_DIR}/benchmarking/benchmark_serving.patch
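The `git apply` step above will fail the Docker build if the patch no longer matches the checked-out vLLM revision. A hedged, self-contained illustration of validating a patch with `git apply --check` before applying it (every path, file, and patch here is invented for the demo, not taken from this repository):

```shell
# Demo: create a throwaway repo, generate a patch, verify it, then apply it.
set -e
tmpdir=$(mktemp -d)
cd "$tmpdir"
git init -q demo
cd demo
printf 'old line\n' > file.txt
git add file.txt
git -c user.name=demo -c user.email=demo@example.com commit -qm init
printf 'new line\n' > file.txt
git diff > ../demo.patch        # capture the change as a patch file
git checkout -q -- file.txt     # restore the original content
git apply --check ../demo.patch # exits non-zero if the patch would not apply
git apply ../demo.patch         # actually apply it
grep -q 'new line' file.txt && echo "patch applied cleanly"
```

Running `git apply --check` in a build stage (or CI) surfaces a stale patch as a clear error before it silently breaks the image.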
Contributor:

The patch file is meant to be temporary. If we're going to be using it longer, we should either add the patch to our vLLM fork or obviate it with proper support for the options it removes.

Collaborator (Author):
I added this intending to streamline the user experience, since running the benchmarks currently requires one manual step. Because the patch lives in this repository, I assumed its purpose was to work around a vLLM change.

All good! I'll reject this PR.
I created an issue as well: #93


WORKDIR "${APP_DIR}/src"

# Switch back to root for entrypoint