
Releases: sophgo/tpu-mlir

Technical Preview

21 Aug 14:23
Pre-release

New Features and Enhancements

  • Support for Various Operations: Added support for exp, erf, gelu, loopop, and other operations for specific platforms.
  • Tooling and Visualization: Renamed profile.py, added visual tools for weights, and enhanced debugging capabilities.
  • Model Support and Adjustments: Added daily release models, scripts, and support for specific model types like yolov8, yolov4s.
  • Distribution and Multicore Support: Implemented distribution steps, multicore support, and group convolution transformation.

Bug Fixes and Resolutions

  • Model and Parsing Fixes: Resolved issues in emvd models, parsing errors, slice bugs, and fixed specific issues in bm1684 and bm1686.
  • Codegen and Canonicalization Fixes: Addressed type errors, canonicalization failures, and operand kind checks.
  • Inference and Optimization Fixes: Fixed inference issues in max, where, and slice operations, and refined canonicalization.

Documentation and Cleanup

  • Documentation Updates: Refined the tpu-mlir docs, added the supported-ops document, and updated specific documents.
  • Code Cleanup and Refactoring: Removed unnecessary code, reconstructed permute move canonicalization, and prepared for LLVM upgrade.

Other Changes

  • Testing and Calibration: Added test cases, calibration updates, and support for regression and tag in TDB.
  • Backend and Runtime Adjustments: Updated backend, added support for auto-increase op, and fixed minor bugs.

Technical Preview

26 Jul 09:25

Features:
BM1686: supported the post-handle op, provided ParallelOp codegen, and added DivOp for f16/bf16.
BM1684: supported loading tensors from L2mem during dynamic compilation, implemented the GROUP_3D local-layer function, and supported more dynamic ops (such as MinConst, MaxConst, and Lut) and static ops (such as deform_conv2d).
CV18XX: supported more ops, such as equalOp.
Supported IfOp in f16/bf16/int8 modes.
Implemented the post-processing function of the sensitive layer, supported unranked and dynamic tensors in the frontend, and added empty and baddbmm Torch converters/interpreters.
Supported weight splitting during layer group when the op is BroadcastBinary, supported parsing the ops of each layer in top.mlir, and supported int32-to-i8/u8 inference for model_runner.py.
Removed onnx-sim and used unranked_type for all ops.
Implemented more graph optimizations: merging matmul + add into a single matmul for float types, a pass that fuses identical operations, and weight transformation for the permute + add pattern.
Supported more Torch ops, such as rmsnorm, ceil, and remainder.
Other new operations: lowering of GatherElements and multi-input Add.
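The matmul + add merge mentioned above relies on a standard algebraic equivalence. As a hedged illustration only (a numpy sketch of the idea, not the compiler's actual pattern code), a bias Add following a MatMul can be absorbed into a single matmul by augmenting the operands:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 3)).astype(np.float32)
w = rng.standard_normal((3, 5)).astype(np.float32)
b = rng.standard_normal(5).astype(np.float32)

# Unfused graph: a MatMul op followed by an elementwise Add of a bias.
unfused = x @ w + b

# Equivalent single-matmul form: fold the bias into the weight by
# appending it as an extra row of w and a column of ones to x.
x_aug = np.concatenate([x, np.ones((4, 1), np.float32)], axis=1)
w_aug = np.concatenate([w, b[None, :]], axis=0)
fused = x_aug @ w_aug

assert np.allclose(unfused, fused, atol=1e-5)
```

This equivalence only holds when the Add operand is a bias broadcast over the last dimension, which matches the "if float type" condition in the note.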

Bug Fixes:
Fixed the untransformed chatglm2 rmsnorm problem, a ScaleOp inference error, bmodel_dis bin formatting, MatMul shape inference, and a subnet output-order mismatch that caused errors in the dynamic runtime.
Avoided duplicate names for inserted CastOps and distinguished Caffe MatMul shapes.

Code Refactoring:
Used llvm::md5 and llvm::sha256.
Used Clang to speed up compilation.
Removed some unused header files.
Used rewriter.eraseOp instead of op->erase, and used strings to define the padding mode.
Refined the disassembler and refactored mix_precision.

Documentation Updates:
Updated the document version and changed some model-zoo requirements.
Revised the English sections and updated the developer_manual doc for the visual.py part.

Testing and Verification:
Updated the list of test models supported by BM1684X.

Technical Preview

19 Jun 03:40

Features:
Supported 'Conv3D', 'Pool3D', 'Pow2(n^x)', 'Softplus', 'GRU', and 'Scale' for BM1684, making more models available, such as wenet-encoder.
Supported operations such as 'DictConstruct', 'Sub', 'Ones_like', 'Zeros_like', 'ChannelShuffle', 'Activation', 'Conv3d', 'Compare', 'GroupNorm', 'InstanceNorm', and 'Clamp' in PyTorchConverter.
Added new ONNX operations in OnnxConverter, such as 'GridSample' and 'CompareCst'.
Supported more dynamic operations, such as 'Arg', 'Active', 'Reduce', 'Min', and 'Max', for BM1684.
Added depth2space to the backward pass, 1684x yolov5 post-processing, and the CopyMultiUseWeight pattern before shape_infer.
Improved the type-check logic of previous subnets and added some parallelism in learning quant.

Bug Fixes:
Fixed the weight-display problem in the visual tool and the case where model_deploy --test_reference is none.
BM1684: fixed fix8b weight reorder with large dilation, the MulConst, AddConst, and SubConst local buffer sizes, and the mulshift local buffer.
BM1684X: fixed 5-dim broadcast add, an attention and utest bug, made scatternd support 5 dims, fixed a YoloDetection inference bug, and added the begin_mask/end_mask that the strideslice op needs for dynamic shapes.
CV18XX: fixed gray fuse-preprocess and the TgScaleLutKernel pass.
OnnxConverter: fixed convert_add_op broadcast channel when r_dim is 1, inferred subgraphs to get shapes, and fixed the case where attr 'axes' is missing in Squeeze.
Others: fixed an SDK demo problem, a hang caused by an assert in cmodel, a group-overlap tensor-id error, and Python arrays filled with random data.

Code Refactoring:
Redesigned subnet splitting, sorting, merging, and running order.
Refined 18xx codegen, conv quantization, gather lowering, and the debugger's dictionary structure.
Renamed bdc to tiu.
Reset the pattern of the ONNX subconst op.
Simplified layernorm to a single output.
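For context on the single-output layernorm change: some layernorm definitions expose mean and variance as extra graph outputs. A minimal numpy sketch of the single-output form (illustrative only, not the TPU-MLIR implementation) keeps those as internal intermediates:

```python
import numpy as np

def layernorm(x, gamma, beta, eps=1e-5):
    # Single-output form: mean and variance stay internal
    # intermediates instead of becoming extra graph outputs.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps) * gamma + beta

x = np.arange(8, dtype=np.float32).reshape(2, 4)
y = layernorm(x, np.ones(4, np.float32), np.zeros(4, np.float32))
```

Each normalized row then has approximately zero mean and unit variance, which is the only output downstream ops consume.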

Documentation Updates:
Fixed a quick_start typo.
Updated yolov3_tiny output_names.
Refined the yolov5 post-processing chapter and the cv18xx quick-start doc.

Testing and Verification:
Updated the yolov3 regression test, the bayer2RGB model sample, and squeezenet_v1.1_cf.
Saved a copy of the bert_base 2.11 config for calibration.
Added timeout checks and model-test timeouts.
Added many cv18xx model regressions.
Aligned the cv18xx detect samples with the YOLODetection func.

Technical Preview

29 May 04:33
Pre-release

This beta version of TPU-MLIR is for testing purposes only—do not use it in production.

Features:

Added a feature called "bmodel_checker", which aids in checking the correctness and functionality of BModels.
Supported LSTM (Long Short-Term Memory) for bm1684, indicating improved capabilities for handling sequence data.
Added support for the ONNX Loop operation, expanding the range of operations that can be performed using the ONNX format.
Implemented support for operations like 'stack', 'new_zeros', 'new_ones' in PyTorch.
Added a new visual tool for analyzing the parameters or operation of the models.
Added support for TensorFlow's MobileBert model.

Bug Fixes:

Fixed a bug related to 'decode lmem address', which might have caused issues in decoding addresses.
Addressed the 'incomplete onnx shape info' bug, improving the reliability of using ONNX format models.
Resolved an issue with 'single thread of int4 regression test', enhancing the testing suite.
Fixed the 'group deconv' and 'deconv1d' issues, optimizing the performance of deconvolution operations.
Resolved an error in the ArgError[18xx] case in 'test_onnx.py'.
Corrected an issue causing MulConst overflow in certain cases.
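The MulConst overflow fix above concerns multiplying an integer tensor by a constant. As a hedged sketch of the general hazard (hypothetical values, not the actual kernel code), naively narrowing the product wraps around, while widening and saturating keeps it in range:

```python
import numpy as np

x = np.array([100, -100, 7], dtype=np.int8)
const = 3

# Naive narrowing wraps on overflow: 100 * 3 = 300 -> 44 (mod 256).
wrapped = (x.astype(np.int64) * const).astype(np.int8)

# Safe handling widens first, then saturates to the int8 range.
saturated = np.clip(x.astype(np.int32) * const, -128, 127).astype(np.int8)
```

Here `wrapped` comes out as [44, -44, 21] while `saturated` gives the intended [127, -128, 21].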

Code Refactoring:

Refactored BModel_dis for efficiency and readability.
Unified the codegen pass to simplify the code generation process.
Revised the argument structure of bmodel_checker for more logical and intuitive use.
Modified the PermutePadSwap function to accommodate more situations.
Refined memory usage for large models, improving efficiency and performance.
Removed unused files and refactored main_entry, run_model, and cfg files for more streamlined execution.

Documentation Updates:

Updated the README file to provide up-to-date information.
Synced with model-zoo to maintain the relevance of documentation.
Added a description for the visual tool parameter.
Added information on mlir precision test and target in the documentation.
Updated the quick start guide for PyTorch.
Added more detailed information about the new bmodel_checker tool and Tensor Location in the documentation.

Testing and Verification:

Added an inference test for 'stable diffusion.'
Added regression tests for ONNX on the 1684 chip.
Fixed an issue in the ArgError[18xx] case in 'test_onnx.py', improving the ONNX testing suite.
Added an operation regression test for Athena2.

Technical Preview

20 May 18:18
Pre-release

This beta version of TPU-MLIR is for testing purposes only—do not use it in production.

Features:

  • Added a feature called "bmodel_checker", which aids in checking the correctness and functionality of BModels.
  • Supported LSTM (Long Short-Term Memory) for bm1684, indicating improved capabilities for handling sequence data.
  • Added support for the ONNX Loop operation, expanding the range of operations that can be performed using the ONNX format.
  • Implemented support for operations like 'stack', 'new_zeros', 'new_ones' in PyTorch.
  • Added a new visual tool for analyzing the parameters or operation of the models.
  • Added support for TensorFlow's MobileBert model.

Bug Fixes:

  • Fixed a bug related to 'decode lmem address', which might have caused issues in decoding addresses.
  • Addressed the 'incomplete onnx shape info' bug, likely improving the reliability of using ONNX format models.
  • Resolved an issue with 'single thread of int4 regression test', enhancing the testing suite.
  • Fixed the 'group deconv' and 'deconv1d' issues, optimizing the performance of deconvolution operations.
  • Resolved an error in the ArgError[18xx] case in 'test_onnx.py'.
  • Corrected an issue causing MulConst overflow in certain cases.

Code Refactoring:

  • Refactored BModel_dis for efficiency and readability.
  • Unified the codegen pass to simplify the code generation process.
  • Revised the argument structure of bmodel_checker for more logical and intuitive use.
  • Modified the PermutePadSwap function to accommodate more situations.
  • Refined memory usage for large models, improving efficiency and performance.
  • Removed unused files and refactored main_entry, run_model, and cfg files for more streamlined execution.

Documentation Updates:

  • Updated the README file to provide up-to-date information.
  • Synced with model-zoo to maintain the relevance of documentation.
  • Added a description for the visual tool parameter.
  • Added information on mlir precision test and target in the documentation.
  • Updated the quick start guide for PyTorch.
  • Added more detailed information about the new bmodel_checker tool and Tensor Location in the documentation.

Testing and Verification:

  • Added an inference test for 'stable diffusion.'
  • Added regression tests for ONNX on the 1684 chip.
  • Fixed an issue in the ArgError[18xx] case in 'test_onnx.py', improving the ONNX testing suite.
  • Added an operation regression test for Athena2.
  • Fixed the issue with the daily build test, ensuring a more reliable continuous integration pipeline.

Technical Preview

02 Apr 10:13
Pre-release

This beta version of TPU-MLIR is for testing purposes only—do not use it in production.

Notable changes:

  1. Many bug fixes and performance improvements.
  2. TPU-MLIR supports importing PyTorch models directly (no need to convert to ONNX).
  3. Unified pre-processing for bm168x and cv18xx chips.
  4. Support for the bm1684 chip is underway.
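The unified pre-processing in item 3 means one normalization path is shared across chip families. A minimal sketch of the usual mean/scale step (illustrative only; the parameter values here are placeholders, not the tool's defaults):

```python
import numpy as np

def preprocess(img, mean, scale):
    # One shared pre-processing path: per-channel mean subtraction
    # and scaling, regardless of the target chip family.
    return (img.astype(np.float32) - np.asarray(mean, np.float32)) \
        * np.asarray(scale, np.float32)

img = np.full((2, 2, 3), 128, dtype=np.uint8)   # dummy HWC image
out = preprocess(img, mean=[127.5] * 3, scale=[1 / 127.5] * 3)
```

With these placeholder parameters the pixel value 128 maps to roughly 0.0039, i.e. just above the center of the [-1, 1] range.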

Technical Preview

20 Mar 08:29
Pre-release

This beta version of TPU-MLIR is for testing purposes only—do not use it in production.

Notable changes:

  • Resolved pre-processing performance issues.
  • Added shape inference for dynamic input shapes.
  • Implemented constant folding to simplify the graph.
  • Improved performance; further optimizations are in progress.
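The constant-folding item above replaces subgraphs whose inputs are all constants with a single constant. A toy sketch of the idea (TPU-MLIR's actual pass operates on MLIR operations, not tuples):

```python
# Toy expression graph: a node is ("const", value) or ("add", lhs, rhs).
def fold(node):
    # Constant folding: evaluate any subtree whose inputs are all
    # constants and replace it with a single constant node.
    if node[0] == "const":
        return node
    lhs, rhs = fold(node[1]), fold(node[2])
    if lhs[0] == "const" and rhs[0] == "const":
        return ("const", lhs[1] + rhs[1])
    return ("add", lhs, rhs)

g = ("add", ("const", 2), ("add", ("const", 3), ("const", 4)))
folded = fold(g)
```

Here the three-op graph `g` collapses to the single node `("const", 9)`, which is the simplification the release note refers to.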

Technical Preview

08 Mar 09:31
Pre-release

This beta version of TPU-MLIR is for testing purposes only—do not use it in production.

Notable changes:

  1. Image pre-processing is now offloaded to the TPU, improving performance.
  2. Many bug fixes allow TPU-MLIR to support more neural networks.
  • Fixed the pool sign error in v0.8-beta.3.

Technical Preview

07 Mar 03:06
Pre-release

This beta version of TPU-MLIR is for testing purposes only—do not use it in production.

Notable changes:

  1. Image pre-processing is now offloaded to the TPU, improving performance.
  2. Many bug fixes allow TPU-MLIR to support more neural networks.
  • Fixed a pre-processing conversion bug in v0.8-beta.2.

Technical Preview

02 Mar 08:43
Pre-release

This beta version of TPU-MLIR is for testing purposes only—do not use it in production.

Notable changes:

  1. Image pre-processing is now offloaded to the TPU, improving performance.
  2. Many bug fixes allow TPU-MLIR to support more neural networks.

  • Fixed a bug reading the pre-processing configuration in v0.8-beta.1.