Model Architecture
Causal Attention
GQA
Grouped Query Attention (the GQA mechanism)
SparseMoE
LoRA
LLM | LoRA (Low-Rank Adaptation of Large Language Models)
Perceiver
Paper: Perceiver: General Perception with Iterative Attention
ZeRO
Zero Redundancy Optimizer (ZeRO): a memory optimization technique
Engineering Architecture
Triton
Triton: a language and compiler framework developed by OpenAI for writing efficient GPU kernels; Triton-based kernels