Overview
Twinkle provides a complete HTTP Server/Client architecture: models are deployed as services, and clients call them remotely to run training, inference, and other tasks. Because model hosting (Server side) is decoupled from training logic (Client side), multiple users can share the same base model for training.
Core Concepts
Server side: Deployed on Ray Serve; hosts the model weights and runs the inference and training computation. The Server manages model loading, forward/backward propagation, weight saving, and sampling inference. A single Server supports both Twinkle Client and Tinker Client connections at the same time.
Client side: Runs locally and handles data preparation, training loop orchestration, and hyperparameter configuration. The Client communicates with the Server over HTTP, sending data and commands.
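The split described above can be illustrated with a toy sketch built only on the Python standard library. It is not Twinkle's implementation and does not use Ray Serve: the `/train_step` endpoint, the `ToyModelHandler` class, and the one-parameter "model" are all hypothetical, chosen only to show the Server owning the weights and computation while the Client owns the data and the training loop.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class ToyModelHandler(BaseHTTPRequestHandler):
    """Server side: owns the 'weights' and performs the computation."""

    def do_POST(self):
        if self.path != "/train_step":  # hypothetical endpoint name
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        batch, lr = payload["batch"], payload["lr"]
        with self.server.lock:
            state = self.server.state
            # One SGD step on mean squared error for the model y = w * x.
            grad = sum(2 * (state["weight"] * x - y) * x for x, y in batch) / len(batch)
            state["weight"] -= lr * grad
            state["steps"] += 1
            body = json.dumps(state).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass


def start_server():
    """Start the toy server on a free local port; returns (server, base_url)."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), ToyModelHandler)
    server.state = {"weight": 0.0, "steps": 0}  # stand-in for model weights
    server.lock = threading.Lock()
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, f"http://127.0.0.1:{server.server_port}"


# Bypass any proxy settings from the environment for local requests.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def train_step(base_url, batch, lr):
    """Client side: sends data and hyperparameters to the server over HTTP."""
    request = urllib.request.Request(
        f"{base_url}/train_step",
        data=json.dumps({"batch": batch, "lr": lr}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with _OPENER.open(request) as response:
        return json.loads(response.read())


if __name__ == "__main__":
    server, url = start_server()
    data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]  # samples of y = 2x
    for _ in range(50):  # the training loop lives on the client
        result = train_step(url, data, lr=0.05)
    print(result)
    server.shutdown()
    server.server_close()
```

Because the state lives on the server object, several clients pointed at the same URL would update the same weights, which is the sharing property the architecture is designed around.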
Two Model Backends
Model loading supports two backends:
| Backend | use_megatron | Description |
|---|---|---|
| Transformers | false | Based on HuggingFace Transformers, suitable for most scenarios |
| Megatron | true | Based on Megatron-LM, suitable for ultra-large-scale model training, supports more efficient parallelization strategies |
Two Client Modes
| Client | Initialization Method | Description |
|---|---|---|
| Twinkle Client | init_twinkle_client | Native client; change `from twinkle import` to `from twinkle_client import` to migrate local training code to remote calls |
| Tinker Client | init_tinker_client | Patches the Tinker SDK so that existing Tinker training code can be reused directly |
How to Choose
Client Mode Selection
| Scenario | Recommendation |
|---|---|
| Existing Twinkle local training code, want to switch to remote | Twinkle Client — only need to change import paths |
| Existing Tinker training code, want to reuse | Tinker Client — only need to initialize patch |
| New project | Twinkle Client — simpler API |
Model Backend Selection
| Scenario | Recommendation |
|---|---|
| 7B/14B and other medium-small scale models | Transformers backend (use_megatron: false) |
| Ultra-large-scale models requiring advanced parallelization strategies | Megatron backend (use_megatron: true) |
| Rapid experimentation and prototype verification | Transformers backend (use_megatron: false) |
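The two selection tables can be condensed into a small decision helper. This is only a sketch of the guidance above: the `recommend` function, the `Recommendation` class, and the `existing_code` and `needs_advanced_parallelism` parameters are hypothetical names, not part of Twinkle. Only the client names, the initialization method names, and the `use_megatron` flag come from the tables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recommendation:
    client: str          # "Twinkle Client" or "Tinker Client"
    init_function: str   # initialization method named in the client table
    use_megatron: bool   # value for the use_megatron setting


def recommend(existing_code: str = "none",
              needs_advanced_parallelism: bool = False) -> Recommendation:
    """Hypothetical helper mirroring the two selection tables.

    existing_code: "twinkle", "tinker", or "none" (a new project).
    needs_advanced_parallelism: True for ultra-large-scale models that need
        the Megatron backend's parallelization strategies.
    """
    if existing_code not in {"twinkle", "tinker", "none"}:
        raise ValueError(f"unknown existing_code: {existing_code!r}")
    if existing_code == "tinker":
        client, init = "Tinker Client", "init_tinker_client"
    else:
        # Existing Twinkle code and new projects both map to the Twinkle Client.
        client, init = "Twinkle Client", "init_twinkle_client"
    return Recommendation(client, init, use_megatron=needs_advanced_parallelism)


if __name__ == "__main__":
    print(recommend("tinker"))
    print(recommend("none", needs_advanced_parallelism=True))
```

The helper deliberately avoids a parameter-count threshold, since the tables give 7B/14B as examples of medium-small models but do not state a cutoff.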
Cookbook Reference
Complete runnable examples are located in the cookbook/client/ directory:
cookbook/client/
├── server/                        # Server startup configuration
│   ├── transformer/               # Transformers backend
│   │   ├── run.sh                 # Startup script
│   │   ├── server.py              # Server entry point
│   │   └── server_config.yaml     # Configuration file
│   └── megatron/                  # Megatron backend
│       ├── run.sh
│       ├── server.py
│       ├── server_config.yaml
│       └── server_config_4b.yaml
├── twinkle/                       # Twinkle Client examples
│   ├── self_host/                 # Self-hosted Server
│   │   ├── dpo.py                 # DPO training client
│   │   ├── multi_modal.py         # Multi-modal training client
│   │   ├── sample.py              # Inference sampling client
│   │   ├── self_congnition.py     # Self-cognition training client
│   │   └── short_math_grpo.py     # GRPO math training client
│   └── modelscope/                # ModelScope managed service
│       ├── dpo.py
│       ├── multi_modal.py
│       └── self_congnition.py
└── tinker/                        # Tinker Client examples
    ├── self_host/                 # Self-hosted Server
    │   ├── dpo.py                 # DPO training client
    │   ├── lora.py                # LoRA training client
    │   ├── multi_modal.py         # Multi-modal training client
    │   ├── sample.py              # Inference sampling client
    │   ├── self_cognition.py      # Self-cognition training client
    │   └── short_math_grpo.py     # GRPO math training client
    └── modelscope/                # ModelScope managed service
        ├── dpo.py
        ├── sample.py
        ├── self_cognition.py
        └── short_math_grpo.py
Running steps:
# 1. Start Server first
python cookbook/client/server/megatron/server.py
# 2. Run Client in another terminal (Tinker Client example)
python cookbook/client/tinker/self_host/self_cognition.py
# Or use Twinkle Client
python cookbook/client/twinkle/self_host/self_cognition.py
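For scripted runs, the two steps above can be wrapped in a single launcher instead of two terminals. The sketch below is an illustration, not part of the cookbook: `run_with_server` is a hypothetical helper, and the fixed `startup_wait` is an assumption, since how long the Server takes to become ready (and whether it exposes a readiness check) is not stated here.

```python
import subprocess
import sys
import time

# Commands taken from the running steps; run from the repository root.
SERVER_CMD = [sys.executable, "cookbook/client/server/megatron/server.py"]
CLIENT_CMD = [sys.executable, "cookbook/client/tinker/self_host/self_cognition.py"]


def run_with_server(server_cmd, client_cmd, startup_wait=0.0):
    """Start the server, run the client, then stop the server.

    startup_wait is a fixed delay in seconds before the client starts.
    It is an assumption standing in for a real readiness check.
    Returns the client's exit code.
    """
    server = subprocess.Popen(server_cmd)
    try:
        time.sleep(startup_wait)
        # Blocks until the client script finishes.
        return subprocess.run(client_cmd).returncode
    finally:
        # Always stop the server, even if the client fails or is interrupted.
        server.terminate()
        server.wait()
```

Calling `run_with_server(SERVER_CMD, CLIENT_CMD, startup_wait=...)` with a wait long enough for your model to load reproduces the manual sequence; swapping `CLIENT_CMD` for a script under `cookbook/client/twinkle/` runs a Twinkle Client example instead.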