List of todos

- A lot of the code creates new copies of tensors, which is not memory-efficient. We should avoid this where possible.
- Add documentation.
- Add tests.
- fp16/bf16 support.
- Moment matching support.
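
For the todo about tensor copies, here is a minimal sketch of the difference between out-of-place and in-place updates. The tensor library used by this project is not named here, so the example uses NumPy arrays as a stand-in. The function names `scale_and_shift_copy` and `scale_and_shift_inplace` are hypothetical and not part of this codebase.

```python
import numpy as np

def scale_and_shift_copy(x, scale, shift):
    # Out-of-place: `x * scale` allocates a temporary array and
    # `+ shift` allocates the result, so x itself is left untouched.
    return x * scale + shift

def scale_and_shift_inplace(x, scale, shift):
    # In-place: write each result back into x's own buffer via `out=`.
    # No new array is allocated, but the original contents of x are lost.
    np.multiply(x, scale, out=x)
    np.add(x, shift, out=x)
    return x

x = np.ones(4, dtype=np.float32)
y = scale_and_shift_copy(x, 2.0, 1.0)     # new array, x unchanged
z = scale_and_shift_inplace(x, 2.0, 1.0)  # same object as x, now overwritten
```

The in-place variant is only safe when no other code still needs the original values. In frameworks with automatic differentiation, in-place updates can also interfere with gradient computation, so each call site would need to be checked individually.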