NPU (Ascend) Quick Start Guide
This document describes how to install and use the Twinkle framework in Huawei Ascend NPU environments.
Environment Requirements
Before getting started, please ensure your system meets the following requirements:
| Component | Version Requirement | Description |
|---|---|---|
| Python | >= 3.11, < 3.13 | Twinkle framework requirement |
| Ascend Firmware Driver (HDK) | Latest version recommended | Hardware driver and firmware |
| CANN Toolkit | 8.5.1 or higher | Heterogeneous Computing Architecture |
| PyTorch | 2.7.1 | Deep learning framework |
| torch_npu | 2.7.1 | Ascend PyTorch adapter plugin |
Important Notes:
torch and torch_npu versions must be exactly the same (e.g., both 2.7.1)
Python 3.11 is recommended for best compatibility
The CANN toolkit requires roughly 10 GB or more of free disk space
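The constraints above can be checked programmatically before going further. The following is a minimal sketch and not part of Twinkle: the helper names `python_supported`, `release_tuple`, and `releases_match` are hypothetical. The comparison assumes version strings begin with a dotted numeric release such as `2.7.1` and ignores suffixes such as `+cpu` or `.post1`, which is looser than a strict string comparison.

```python
import re
import sys


def python_supported(version_info=sys.version_info):
    """Return True if the interpreter satisfies >= 3.11, < 3.13."""
    return (3, 11) <= tuple(version_info[:2]) < (3, 13)


def release_tuple(version):
    """Extract the leading major.minor.patch numbers from a version string."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def releases_match(torch_version, torch_npu_version):
    """Hypothetical helper: True if both share the same major.minor.patch release."""
    return release_tuple(torch_version) == release_tuple(torch_npu_version)


if __name__ == "__main__":
    print(f"Python supported: {python_supported()}")
    try:
        import torch
        import torch_npu
    except ImportError as exc:
        print(f"torch / torch_npu not importable yet: {exc}")
    else:
        same = releases_match(torch.__version__, torch_npu.__version__)
        print(f"torch and torch_npu releases match: {same}")
```

Running the script before installation only reports the Python check. After installation it also compares the two package versions.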
Supported Hardware
Twinkle currently supports the following Ascend NPU devices:
Ascend 910 series
Other compatible Ascend accelerator cards
Installation Steps
1. Install NPU Environment (Driver, CANN, torch_npu)
NPU environment installation includes Ascend driver, CANN toolkit, PyTorch, and torch_npu.
📖 Complete Installation Tutorial: torch_npu Official Installation Guide
This documentation includes:
Ascend driver (HDK) installation steps
CANN toolkit installation steps
PyTorch and torch_npu installation steps
Version compatibility instructions
Recommended Version Configuration:
Python: 3.11
PyTorch: 2.7.1
torch_npu: 2.7.1
CANN: 8.5.1 or higher
2. Install Twinkle
After NPU environment configuration is complete, install the Twinkle framework from source:
git clone https://github.com/modelscope/twinkle.git
cd twinkle
pip install -e ".[transformers,ray]"
3. Install vLLM and vLLM-Ascend (Optional)
If you need to use vLLMSampler for efficient inference, you can install vLLM and vLLM-Ascend.
Installation Steps:
# Step 1: Install vLLM
pip install vllm==0.14.0
# Step 2: Install vLLM-Ascend
pip install vllm-ascend==0.14.0rc1
Notes:
Install in the order shown above; dependency conflict warnings from pip during these steps can be ignored
Ensure CANN environment is activated before installation:
source /usr/local/Ascend/ascend-toolkit/set_env.sh
Recommended versions are vLLM 0.14.0 and vLLM-Ascend 0.14.0rc1
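To confirm that the pinned versions were actually installed, the standard library's `importlib.metadata` can report them. This is a minimal sketch: `installed_version` and `find_mismatches` are hypothetical helper names, and the exact string comparison is an assumption (a build that reports a local suffix would show up as a mismatch even if it works).

```python
from importlib import metadata

# Pins taken from the installation steps above.
EXPECTED = {"vllm": "0.14.0", "vllm-ascend": "0.14.0rc1"}


def installed_version(name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(expected, lookup=installed_version):
    """Hypothetical helper: map each off-pin package to (installed, pinned)."""
    mismatches = {}
    for name, pinned in expected.items():
        found = lookup(name)
        if found != pinned:
            mismatches[name] = (found, pinned)
    return mismatches


if __name__ == "__main__":
    problems = find_mismatches(EXPECTED)
    if not problems:
        print("vLLM and vLLM-Ascend match the recommended versions")
    for name, (found, pinned) in problems.items():
        print(f"{name}: installed {found}, recommended {pinned}")
```

The `lookup` parameter exists only so the comparison logic can be exercised without the packages being installed.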
4. Verify Installation
Create test script verify_npu.py:
import torch
import torch_npu
print(f"PyTorch version: {torch.__version__}")
print(f"torch_npu version: {torch_npu.__version__}")
print(f"NPU available: {torch.npu.is_available()}")
print(f"NPU device count: {torch.npu.device_count()}")
if torch.npu.is_available():
    print(f"Current NPU device: {torch.npu.current_device()}")
    print(f"NPU device name: {torch.npu.get_device_name(0)}")
    # Simple test
    x = torch.randn(3, 3).npu()
    y = torch.randn(3, 3).npu()
    z = x + y
    print(f"NPU computation test passed: {z.shape}")
Run verification:
python verify_npu.py
If the output shows NPU available: True and no errors, installation is successful!
Note: Twinkle does not currently provide NPU Docker images. Manual installation is recommended. For containerized deployment, please refer to official images from the Ascend community.
5. Install Megatron Backend Dependencies
Recommended versions:
Megatron-LM: v0.15.3
MindSpeed: core_r0.15.3
mcore-bridge: main branch or the version already validated in your Twinkle checkout
Installation steps:
# 1. Clone Megatron-LM and pin the compatible version
git clone https://github.com/NVIDIA/Megatron-LM.git
cd Megatron-LM
git checkout v0.15.3
cd ..
# 2. Clone and install MindSpeed
git clone https://gitcode.com/Ascend/MindSpeed.git
cd MindSpeed
git checkout core_r0.15.3
pip install -e .
cd ..
# 3. Clone and install mcore-bridge
git clone https://github.com/modelscope/mcore-bridge.git
cd mcore-bridge
pip install -e .
cd ..
# 4. Install Twinkle if needed
cd twinkle
pip install -e ".[transformers,ray]"
Runtime environment variables:
export PYTHONPATH=$PYTHONPATH:</path/to/Megatron-LM>
export MEGATRON_LM_PATH=</path/to/Megatron-LM>
export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
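A quick consistency check of these variables can catch a typo in a path or device list before a long job starts. The sketch below is illustrative only: `check_megatron_env` is a hypothetical helper, not a Twinkle API, and it assumes that an unset `ASCEND_RT_VISIBLE_DEVICES` is acceptable while a set value should be a comma-separated list of distinct integer IDs.

```python
import os


def check_megatron_env(env):
    """Hypothetical helper: return a list of problems found in an environment mapping."""
    problems = []

    megatron_path = env.get("MEGATRON_LM_PATH", "")
    if not megatron_path:
        problems.append("MEGATRON_LM_PATH is not set")
    else:
        if not os.path.isdir(megatron_path):
            problems.append(f"MEGATRON_LM_PATH is not a directory: {megatron_path}")
        # Compare normalized paths so a trailing slash does not cause a false alarm.
        entries = [
            os.path.normpath(entry)
            for entry in env.get("PYTHONPATH", "").split(os.pathsep)
            if entry
        ]
        if os.path.normpath(megatron_path) not in entries:
            problems.append("MEGATRON_LM_PATH is not listed in PYTHONPATH")

    devices = env.get("ASCEND_RT_VISIBLE_DEVICES")
    if devices is not None:
        ids = [item.strip() for item in devices.split(",")]
        if not all(item.isdigit() for item in ids):
            problems.append(f"ASCEND_RT_VISIBLE_DEVICES has non-numeric entries: {devices}")
        elif len(set(ids)) != len(ids):
            problems.append(f"ASCEND_RT_VISIBLE_DEVICES has duplicate entries: {devices}")

    return problems


if __name__ == "__main__":
    for line in check_megatron_env(os.environ) or ["No problems found"]:
        print(line)
```

Passing the environment as a mapping rather than reading `os.environ` inside the function keeps the check easy to test with made-up values.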
Verification:
First run a minimal import check to make sure the current environment can resolve MindSpeed and Megatron-LM:
python -c "import mindspeed.megatron_adaptor; from twinkle.model.megatron._mindspeed_runtime import ensure_mindspeed_adaptor_patched; ensure_mindspeed_adaptor_patched(); print('✓ Megatron backend imports are ready')"
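If the one-line check above fails, it can help to find out which package is missing without importing anything. The sketch below uses `importlib.util.find_spec`, which only locates top-level modules. The `REQUIRED` tuple is an assumption based on the packages installed in the previous steps (`megatron` is the top-level package of Megatron-LM), and `missing_modules` is a hypothetical helper name.

```python
import importlib.util

# Assumed top-level import names for the packages installed above.
REQUIRED = ("megatron", "mindspeed", "twinkle")


def missing_modules(names):
    """Hypothetical helper: return the names that cannot be located on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Not found on the import path: " + ", ".join(missing))
    else:
        print("All required packages can be located")
```

A package reported as missing usually points to a skipped `pip install -e .` step or, for Megatron-LM, to a `PYTHONPATH` that does not include the clone directory.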
Quick Start
Important Notice: The following examples are from the cookbook/ directory and have been verified in actual NPU environments. It is recommended to run scripts directly from the cookbook rather than copying and pasting code snippets.
SFT LoRA Fine-tuning
This guide does not currently include an NPU cookbook example for SFT LoRA fine-tuning. The capability should be documented here together with a runnable cookbook example or NPU script once one is available.
GRPO Reinforcement Learning Training
This guide does not currently include an NPU cookbook example for GRPO training. The capability should be documented here together with a runnable cookbook example or NPU script once one is available.
More Examples
Check the cookbook/remote/tinker/ascend/ directory for remote training server-side configuration.
Parallelization Strategies
Twinkle currently supports the following parallelization strategies on NPU. The Verification Status column shows which of them have a runnable cookbook example:
| Parallel Type | Description | NPU Support | Verification Status |
|---|---|---|---|
| DP (Data Parallel) | Data parallelism | ✅ | No corresponding cookbook example |
| FSDP (Fully Sharded Data Parallel) | Fully sharded data parallelism | ✅ | No corresponding cookbook example |
| TP (Tensor Parallel) | Tensor parallelism (Megatron) | ✅ | Verified (see cookbook/megatron/ascend/tp_npu.py) |
| PP (Pipeline Parallel) | Pipeline parallelism (Megatron) | ✅ | Verified (see cookbook/megatron/ascend/tp_npu.py) |
| CP (Context Parallel) | Context parallelism | ✅ | Verified (see cookbook/megatron/ascend/tp_moe_cp_npu.py) |
| EP (Expert Parallel) | Expert parallelism (MoE) | ✅ | Verified (see cookbook/megatron/ascend/tp_moe_npu.py) |
Legend:
✅ Verified: Has actual running example code
🚧 To be verified: Theoretically supported but no NPU verification example yet
❌ Not supported: Not available in current version
DP + FSDP Example
This guide does not currently provide a cookbook code snippet for DP + FSDP on NPU.
Megatron backend note: Twinkle now provides runnable NPU smoke scripts for the Megatron backend. Please follow the installation section above before running the cookbook examples, and start with cookbook/megatron/ascend/tp_npu.py before moving on to cookbook/megatron/ascend/tp_moe_npu.py and cookbook/megatron/ascend/tp_moe_cp_npu.py.
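When combining the Megatron strategies listed in the table, the parallel sizes must divide the number of devices. The sketch below follows the usual Megatron-LM convention that the world size equals TP × PP × CP × DP, so the data-parallel size is whatever remains after the other three are chosen. `data_parallel_size` is a hypothetical helper for illustration. It does not reflect Twinkle's configuration names, and it does not model expert parallelism.

```python
def data_parallel_size(world_size, tp=1, pp=1, cp=1):
    """Hypothetical helper: DP size implied by world_size = tp * pp * cp * dp."""
    if min(world_size, tp, pp, cp) < 1:
        raise ValueError("all sizes must be positive integers")
    model_parallel = tp * pp * cp
    if world_size % model_parallel != 0:
        raise ValueError(
            f"world_size={world_size} is not divisible by tp*pp*cp={model_parallel}"
        )
    return world_size // model_parallel


if __name__ == "__main__":
    # Eight devices, matching ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7.
    for tp, pp, cp in [(2, 2, 1), (4, 1, 2), (8, 1, 1)]:
        dp = data_parallel_size(8, tp=tp, pp=pp, cp=cp)
        print(f"tp={tp} pp={pp} cp={cp} -> dp={dp}")
```

A combination that raises `ValueError` here, such as `tp=3` on eight devices, cannot be laid out evenly and would be rejected at startup.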
Common Issues
1. torch_npu Version Mismatch
Problem: Version incompatibility warnings or errors after installing torch_npu.
Solution:
Ensure torch and torch_npu versions are exactly the same
Check if CANN version is compatible with torch_npu
# Check current versions
python -c "import torch; import torch_npu; print(torch.__version__, torch_npu.__version__)"
# Reinstall matching versions
pip uninstall torch torch_npu -y
pip install torch==2.7.1
pip install torch_npu-2.7.1-cp311-cp311-linux_aarch64.whl
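The wheel filename in the last command encodes the Python version (`cp311`) and CPU architecture (`aarch64`) it was built for, so a wheel that does not match the current interpreter will fail to install. Wheel filenames end in `<python tag>-<abi tag>-<platform tag>.whl`. The sketch below checks a filename against the running interpreter. `wheel_matches` is a hypothetical helper, and it only handles CPython tags of the form `cpXY`.

```python
import platform
import sys


def wheel_matches(filename, version_info=sys.version_info, machine=None):
    """Hypothetical helper: True if the wheel's Python and platform tags fit this system."""
    if not filename.endswith(".whl"):
        return False
    parts = filename[: -len(".whl")].split("-")
    if len(parts) < 5:
        # A wheel name needs at least name, version, python, abi, and platform fields.
        return False
    python_tag, _abi_tag, platform_tag = parts[-3:]
    machine = machine or platform.machine()
    expected_python = f"cp{version_info[0]}{version_info[1]}"
    return python_tag == expected_python and platform_tag.endswith(f"_{machine}")


if __name__ == "__main__":
    name = "torch_npu-2.7.1-cp311-cp311-linux_aarch64.whl"
    print(f"{name} fits this interpreter: {wheel_matches(name)}")
```

If the check returns False, download the wheel built for your Python version and architecture instead of forcing the install.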
2. CANN Toolkit Version Issue
Problem: CANN version incompatible with torch_npu.
Solution:
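Install the CANN toolkit version that matches your torch_npu release (CANN 8.5.1 or higher for torch_npu 2.7.1, per the Environment Requirements table)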
Feature Support Status
Feature support matrix based on actual code verification:
| Feature | GPU | NPU | Verification Example | Description |
|---|---|---|---|---|
| SFT + LoRA | ✅ | ✅ | - | No corresponding cookbook example |
| GRPO | ✅ | ✅ | - | No corresponding cookbook example |
| DP Parallelism | ✅ | ✅ | - | No corresponding cookbook example |
| FSDP Parallelism | ✅ | ✅ | - | No corresponding cookbook example |
| Ray Distributed | ✅ | ✅ | - | No corresponding cookbook example |
| TorchSampler | ✅ | ✅ | - | No corresponding cookbook example |
| vLLMSampler | ✅ | ✅ | - | No corresponding cookbook example |
| Full Fine-tuning | ✅ | ✅ | - | Verified available |
| QLoRA | ✅ | ❌ | - | Quantization operators not yet supported |
| DPO | ✅ | 🚧 | - | Theoretically supported, to be verified |
| Megatron TP/PP | ✅ | ✅ | cookbook/megatron/ascend/tp_npu.py | Verified with the Megatron smoke script |
| Flash Attention | ✅ | ⚠️ | - | Some operators not supported |
Legend:
✅ Verified: Has actual running example, confirmed available
🚧 To be verified: Theoretically supported but no NPU environment verification yet
⚠️ Partial support: Available but with limitations or performance differences
❌ Not supported: Not available in current version
Usage Recommendations:
Prioritize features marked as “Verified” for guaranteed stability
“To be verified” features can be attempted but may encounter compatibility issues
Refer to corresponding example code when encountering problems
Example Code
Twinkle’s verified NPU examples currently focus on the Megatron smoke path; the SFT and GRPO cookbook examples do not have corresponding files yet.
Remote Training (Tinker Protocol)
Server Configuration: cookbook/remote/tinker/ascend/
Provides HTTP API interface
Supports remote training and inference
Suitable for production environment deployment
Running Examples: No corresponding command examples are provided yet.
Reference Resources
Getting Help
If you encounter issues during use:
Check Logs: Set environment variable ASCEND_GLOBAL_LOG_LEVEL=1 for detailed logs
Submit Issue: Twinkle GitHub Issues
Community Discussion: Ascend Community Forum
Next Steps
📖 Read Quick Start for more training examples
📖 Read Installation Guide for other platform installations
🚀 Browse the cookbook/ directory for complete example code
💡 Check Twinkle Documentation for advanced features