Releases: sophgo/tpu-mlir
v1.15-beta.0
feat: make lmem assignment stage more analyzable
- define some commonly used LOG macros (Logger.h)
- define some stringify functions to show lmem type and timestep mode (LayerGroupDefs.h)
- add show_timestep_table to print a readable timestep table (BasicTimeStep.h/BasicTimeStep.cpp)
- add many DEBUG_WITH_TYPE logs and comments in the lmem assignment stage (BasicTimeStep.cpp/LmemAllocator.cpp/TimeStepMethod.cpp/SwPipeline.cpp)
- rename some variables and functions to better represent the process (gen_all_mem_buffer_ts/tgt_min_address/...)
- reduce assignLmemAddr cyclomatic complexity (LmemAllocator.cpp:989)
Change-Id: I31dadb9424be334da481f9dfbd45985ca89dc058
v1.14
[doc] refine user interface Change-Id: I887ad481b2a3b1f7dce4fe993399ec2afa093bb4
v1.14-beta.0
fix a bug in the PPL build Change-Id: Ib93341da7fa6b420f9fb9cd9e4b61dc21aeaf001
v1.13
add a16 matmul multi_core Change-Id: I10a9097ee52e324555f4a505ce18d7fe9b665803
v1.13-beta.0
[doc] layergroup opt intro Change-Id: I0797b73e4d020e9556da29d1c1a743b8c80a83ad
v1.12
Features
- Support for backend operators implemented using PPL.
- TPUv7-runtime CModel integrated with TPU-MLIR for BM1690 model CModel inference.
- Optimized inference speed for BM1690 Stable Diffusion 3.0 at 512 resolution to 0.72 img/s (MAC utilization: 41.9%).
- Support for training graph compilation of ResNet50-v1 through FxGraphConverter.
Bug Fixes
- Performance: Fixed the issue of performance degradation in SegNet.
- Functionality: Resolved the compilation comparison issue for BM1688 DeepLabv3P.
Known Issues
- Performance: Slight performance degradation observed in BM1690 YOLOv5-6 with batch size 4 (INT8) on eight cores.
v1.12-beta.0
combine Slice and Concat into a new Rope op (ConcatToRope) Change-Id: Ib15b12fe97117b96c6fe7267c96c3f714aac6ec4
v1.11
[python] distinguish data path model-zoo from regression Change-Id: I98fa0df1524f0b38d91cda02ab5d49876f7caee8 (cherry picked from commit fa082d0b29df8a82af77839df86349aabab86949)
v1.11-beta.0
[soc_dump] add doc Change-Id: Icaf313113415a9bf0ad9c75abdcb609d661c815b
TPU-MLIR v1.10 Release
Release Note
Enhancements:
- Added CUDA support for various operations like conv2d, MatMul, dwconv, pool2d, and more.
- Improved performance for operations like MeanStdScale and softmax.
- Enhanced multi-core batch matmul and added CUDA support for bm168x.
- Refined CUDA code style and adjusted interfaces for various operations.
Bug Fixes:
- Fixed matmul issues, calibration failures, conv padding problems, and various performance problems.
- Addressed bugs in model transformations, calibration, and various pattern issues.
- Resolved bugs affecting models such as ssd, vit, detr, and yolov5.
New Features:
- Added support for new models like resnet50, mobilenet_v2, shufflenet_v2, and yolox_s/alphapose_res50.
- Introduced new operations like RequantIntAxisOp and Depth2Space with CUDA support.
- Implemented new functionalities for better model inference and compilation.
Documentation Updates:
- Updated weight.md, calibration sections, and user interface details.
- Improved documentation for quick start, developer manual, and various tpulang interfaces.
- Enhanced documentation for model transformation parameters and tensor data arrangements.
Miscellaneous:
- Added new npz tools, modelzoo regression, and support for bmodel encryption.
- Fixed issues with model performance, shape inference, and CUDA backend optimization.
- Restored performance for models such as yolov5s-6 and bm1690 Swin (multi-core).