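Blog

RoPE (Rotary Position Embedding) Extrapolation

[Recommender Systems] Embeddings and Embedding Solutions for Multi-Valued Discrete Features

Tree-structured Parzen Estimator (TPE): An Efficient Hyperparameter Optimization Algorithm Based on Bayesian Optimization

Hyperparameter optimization

Optuna: A Powerful Open-Source Hyperparameter Optimization Framework

Hyperparameter optimization

Common Approaches to Fine-Tuning Large Models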

LLM

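SMOTE (Synthetic Minority Over-sampling Technique): A Widely Used Oversampling Method

Sampling

RFECV: A Feature Selection Algorithm Combining Recursive Feature Elimination (RFE) with Cross-Validation

RetroMAE: A Retrieval-Oriented Pre-training Framework Based on Masked Auto-Encoders (MAE)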

LLM

BGE (BAAI General Embedding)

LLM

Approaches to Optimizing Sequence Modeling in Recommender Systems and Common Models (DIN, DIEN, etc.)

Recommender systems Sequence modeling

Symbolic Reasoning

LLM

Cross Attention

Attention mechanism

Adding Dependency Packages in Python

python

ByteDance LONGER: Scaling Up Long Sequence Modeling in Industrial Recommenders

To the Globe (TTG): Towards Language-Driven Guaranteed Travel Planning

Paper Travel planning

Differences Between `-` (Single Dash) and `--` (Double Dash) on the Command Line

Command line

Cambricon CNToolkit

Cambricon

tf.nn.batch_normalization

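TensorFlow TensorFlow functions

TensorFlow TensorFlow functions

The tf.matmul Function

TensorFlow TensorFlow functions

043 tensorflow | Batch Normalization - BN Layer

How Should We Understand Alibaba's Qwen3 Release? Does It Mean the Large-Model Race Is Entering a New Phase of Change?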

LLM

[Roundup] Work Related to Multimodality

Multimodal

TensorFlow functions

Python Library: Google's ABSL (Abseil) Library

Python libraries

Normalized Entropy (NE)

Metrics

How to Choose Server Memory and CPU

Architecture

Template for Recording TensorFlow Model Iteration Experiments

TensorFlow

A Survey of Large-Model Inference Acceleration

LLM

google

JAX in Depth: A Next-Generation Framework for Scientific Computing and Machine Learning

Python libraries

AdaGrad (Adaptive Gradient Algorithm)

Optimizers

python

Symbol not found: ***

python Common issues

Cambricon MLU (Machine Learning Unit)

Cambricon

PPL (Perplexity)

Metrics

Efficient Streaming Language Models with Attention Sinks

Asynchronous Stochastic Gradient Descent with Delay Compensation

Paper Optimizers

TensorFlow

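Does a Human Learn No More Than 4 GB in a Lifetime? A Caltech Study in a Top Journal Reveals the Brain's "Speed Trap" and the Challenges Ahead for AI

Trending

Tuning the Learning-Rate Hyperparameter

Implementing Learning-Rate Warmup with tf.train.polynomial_decay

TensorFlow TensorFlow functions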

Model Context Protocol (MCP)

LLM

Paper: Perceiver - General Perception with Iterative Attention

Paper Transformer Google DeepMind

Balance Loss in MoE (Mixture of Experts) Models

AdaF2M2 : Comprehensive Learning and Responsive Leveraging Features in Recommendation System

NCCL (NVIDIA Collective Communications Library) AllReduce

nccl Development NVIDIA

os.path.dirname

python

NVIDIA L20 and NVIDIA A30

GPU

Detailed Steps and Caveats for Installing Anaconda on macOS

Anaconda
