[dicp][ascend] Optimization for dynamic shape code logic. #791

Merged

Conversation

@pdx1989 pdx1989 (Collaborator) commented Apr 24, 2024

Optimize dynamic shape handling:

  1. Adjust the relationship between SymInt and InputArgs.
  2. Refine variable replacement in the in/out shape structures.
  3. Enable full expression evaluation during SymInt replacement (see the sketch below).
  4. Refactor code to merge duplicated execution paths.
  5. Support the lightllm llama 7B dynamic shape version.
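
As a rough illustration of items 1 and 3, here is a minimal sketch of the SymInt-to-input-args idea. The helper names (`build_symint_to_args`, `resolve_shape`) are hypothetical and are not the actual dicp API; the point is only that each free SymInt symbol is traced back to the input tensor and dimension it originates from, and symbolic in/out shapes are then resolved at run time by evaluating the whole sympy expression (e.g. `s0 * 2 + 1`), not by a single-symbol lookup.

```python
# Hypothetical sketch, not the dicp implementation.
import sympy
import torch


def build_symint_to_args(example_inputs):
    """Record where each free SymInt symbol comes from: name -> (input index, dim index)."""
    sym_to_arg = {}
    for arg_idx, tensor in enumerate(example_inputs):
        for dim_idx, size in enumerate(tensor.shape):
            if isinstance(size, torch.SymInt):
                for sym in size.node.expr.free_symbols:
                    sym_to_arg.setdefault(str(sym), (arg_idx, dim_idx))
    return sym_to_arg


def resolve_shape(symbolic_shape, sym_to_arg, runtime_inputs):
    """Substitute concrete dim values and evaluate the full expression
    (e.g. 's0 * 2 + 1') carried in the shape metadata."""
    values = {sympy.Symbol(name): int(runtime_inputs[idx].shape[dim])
              for name, (idx, dim) in sym_to_arg.items()}
    resolved = []
    for dim in symbolic_shape:
        if isinstance(dim, int):
            resolved.append(dim)
        else:  # a sympy expression, or its string form, from the traced graph
            resolved.append(int(sympy.sympify(str(dim)).subs(values)))
    return resolved
```

In this sketch the mapping is built once at compile time from the traced example inputs, and `resolve_shape` runs per call against the real tensors, so every in/out shape reduces to a list of concrete integers just before execution.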

@pdx1989 pdx1989 requested a review from jinminxi104 as a code owner April 24, 2024 07:08
@pdx1989 pdx1989 changed the title from "[WIP][dicp][ascend] Optimization for dynamic shape code logic." to "[dicp][ascend] Optimization for dynamic shape code logic." May 14, 2024
@jinminxi104 jinminxi104 merged commit 2947a6a into DeepLink-org:main May 29, 2024
7 of 8 checks passed
caikun-pjlab pushed a commit to DeepLink-org/deeplink.framework.dev that referenced this pull request Jun 11, 2024
…rg#791)

* Refine code structure of dynamic shape handling.

* Adjust symint_to_args relationship code logic.

* Remove redundant code.

* Enable 70B get_qkv stage dynamic shape.

* Fix complex size append.

* Change load_and_run in/out shape assignment.

* Refine variable replacement in in/out shape structure.

* Fix merge bugs.

* Change one comment and variable name.

* Fix an array assignment change.

* Code refinement including:

  1. Remove redundant Cast operator.

  2. Change logic of Expand shape proxy.

  3. Merge output stride execution path.

* Get a clear idea of the expand Cast situation.

* Apply some idea from Gpt AI.

* Revert "Apply some idea from Gpt AI."

This reverts commit 9025019.

* Remove dead use, replace const proxy.

* Support 7B dynamic shape version.

* Pass 1st dynamic graph model.

* Pass both graph models for the 7B dynamic shape version.

* Fix ci case incre_flash_attention.

* Change split execution path for both shape modes.

* Add execution path for copy_with_offset.

* Merge copy_with_offset shape path mode.

* Add const proxy for int split_size.

* Move some common functions into util.

* Add path for flash_attention to pass head, kvhead and dim in.

* Cancel path split for slice start proxy form.

* Add sequenceAt & triu unit test cases.

* Return several pieces of code logic back to the original design, and fix the flash_incre_attention unit test.

* Modify the logic of split implementation.

* Add split dynamic case.

* Remove identity additional logic, wrap convert into promote_dtype.

* Pass ge unit test.

* Modify logic of lt dtype, and prompt_attention fp16 conversion.

* Add promote_dtype priority logic.

* Fix promote_dtype bug.

* Cast fa back to float32 if dtypes are not consistent.

* Change to return q_dtype tensor.

* Improve promote_dtype logic (see the dtype-promotion sketch after this commit list).

* Add const proxy logic for promote_dtype.

* Fix flash_attention declaration.

* Remove Symint & Proxy from 7B static path.

* Change the const check method.

---------

Co-authored-by: chenchiyu <chenchiyu@pjlab.org.cn>
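
Several of the commits above revolve around dtype promotion (the promote_dtype priority logic, and casting the flash-attention output back to q_dtype when dtypes are inconsistent). The sketch below is only an assumed illustration of that pattern, not the dicp implementation; the `promote_dtype` signature, the priority table, and `attention_like` are hypothetical.

```python
# Hypothetical sketch of a promote_dtype-style helper, not the dicp code.
import torch

# Assumed priority order: the higher value wins when operand dtypes disagree.
_DTYPE_PRIORITY = {torch.bool: 0, torch.int32: 1, torch.int64: 2,
                   torch.float16: 3, torch.float32: 4}


def promote_dtype(*tensors, target=None):
    """Cast every tensor to `target` if given, otherwise to the highest-priority
    dtype among the operands; return the casted tensors and the dtype used."""
    if target is None:
        target = max((t.dtype for t in tensors), key=lambda d: _DTYPE_PRIORITY[d])
    return [t if t.dtype == target else t.to(target) for t in tensors], target


def attention_like(q, k, v, attn_fp16):
    # Run the attention kernel in fp16, then cast the result back so the caller
    # still sees q's original dtype (the "return q_dtype tensor" idea above).
    (q16, k16, v16), _ = promote_dtype(q, k, v, target=torch.float16)
    out = attn_fp16(q16, k16, v16)
    return out if out.dtype == q.dtype else out.to(q.dtype)
```
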
Wrench-Git pushed a commit to DeepLink-org/deeplink.framework.dev that referenced this pull request Jul 16, 2024