Tianhao Gao

Email: gaotianhao@pku.edu.cn

Publications

Tianhao Gao, Jun Fang, Hanyu Liu, et al. (COLING 2022)
📄 Paper | 📽️ Slides | 📝 Introduction
LEGO-ABSA

Jiaoyang Li, Jun Fang, Tianhao Gao, et al. (AAAI 2026)
📄 Paper | 📝 Introduction
FANoise

Haochen Li, Tianhao Gao, Weiping Li, et al. (Preprint, 2020)
📄 arXiv | 📝 Introduction
STE

Tianhao Gao, et al. “An Iterative Training Framework and Hierarchical Contrastive Loss for Graph-Text Matching” (Submitted)

multimodal-loss

🔹 Selected the base model and architecture after comparing and evaluating the candidate alternatives.
🔹 Built high-quality training data with PySpark and Hive SQL.
🔹 Trained a multimodal tagging model and developed a recursive algorithm to select its decision threshold, reaching 90% accuracy.
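The threshold search is described only as recursive. One plausible reading, sketched here as an assumption rather than the actual method: a coarse-to-fine grid search that measures accuracy on a labelled validation set, keeps the best threshold, and recurses into a narrower interval around it. `search_threshold`, `accuracy_at`, and the `steps`/`depth` parameters are hypothetical names.

```python
from typing import Sequence


def accuracy_at(threshold: float, scores: Sequence[float], labels: Sequence[int]) -> float:
    """Fraction of samples where (score >= threshold) matches the 0/1 label."""
    correct = sum(int((s >= threshold) == bool(y)) for s, y in zip(scores, labels))
    return correct / len(labels)


def search_threshold(
    scores: Sequence[float],
    labels: Sequence[int],
    lo: float = 0.0,
    hi: float = 1.0,
    steps: int = 10,
    depth: int = 4,
) -> tuple[float, float]:
    """Recursively refine the best threshold in [lo, hi]; returns (threshold, accuracy).

    Each level evaluates steps + 1 evenly spaced thresholds, keeps the best,
    then recurses into the neighbouring interval with a finer grid.
    """
    width = (hi - lo) / steps
    grid = [lo + i * width for i in range(steps + 1)]
    best_t = max(grid, key=lambda t: accuracy_at(t, scores, labels))
    best_acc = accuracy_at(best_t, scores, labels)
    if depth <= 1:
        return best_t, best_acc
    # Zoom into one grid cell on each side of the current best threshold.
    sub_t, sub_acc = search_threshold(
        scores, labels, max(lo, best_t - width), min(hi, best_t + width), steps, depth - 1
    )
    return (sub_t, sub_acc) if sub_acc > best_acc else (best_t, best_acc)
```

Because each level only zooms in around the current best point, the search can miss a better threshold elsewhere if accuracy is not unimodal in the threshold; a finer first-level grid reduces that risk.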

(Image to be added)

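🔹 Completed SFT of the Qwen2.5-7B model for the underwear category, reaching 90.5% accuracy in business acceptance testing. Ran the model over the full underwear-category dataset, building up 120 million records of data assets. Preliminary A/B test results show an absolute gain of 0.25 points in the experience score, in line with business expectations.
🔹 Upgraded the model-iteration scheme for front-end category recognition; the new scheme no longer depends on manual data annotation and mines hard examples automatically.
🔹 Mined product terms for the top-level apparel category from 180 days of full search queries, yielding about 10,000 product terms.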
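The source does not say how hard examples are mined without annotation. A common label-free option, shown here as a swapped-in technique and not as the actual method, is margin-based uncertainty sampling: flag samples whose top-1 and top-2 predicted class probabilities are close. The function names and the 0.2 cutoff are hypothetical.

```python
from typing import Sequence


def margin(probs: Sequence[float]) -> float:
    """Gap between the two largest class probabilities (non-empty input assumed)."""
    top = sorted(probs, reverse=True)
    return top[0] - top[1] if len(top) > 1 else top[0]


def mine_hard_examples(predictions: dict[str, Sequence[float]], max_margin: float = 0.2) -> list[str]:
    """Return ids of samples whose margin is <= max_margin, most ambiguous first."""
    scored = [(margin(probs), sample_id) for sample_id, probs in predictions.items()]
    hard = sorted(item for item in scored if item[0] <= max_margin)
    return [sample_id for _, sample_id in hard]
```

A small margin means the model is torn between two categories, so these samples are natural candidates for the next training round without anyone labelling the full pool first.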

(Image to be added)

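🔹 Built an SFT dataset for product-detail question answering with GPT-4, along with a test set.
🔹 Built a QA-enhanced Bloom model and an end-to-end QA pipeline with LangChain, reaching 88% accuracy in internal algorithm testing; the service has been deployed for three top-level categories.
🔹 Surveyed mainstream RAG approaches. Used the product-detail QA LLM to generate supervision data and fed it back to optimize the vector retrieval model, raising its top-1 accuracy from 76.24% to 77.29% (+1.05 percentage points) and bringing bge-small up to the level of stella-large.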
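The top-1 figures for the retrieval model come from evaluating it on held-out queries. A minimal sketch of that metric, assuming each query has exactly one gold passage and that embeddings have already been computed (the bge-small and stella-large encoders themselves are not shown); `cosine` and `top1_accuracy` are hypothetical helpers.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def top1_accuracy(queries: Sequence[Vector], docs: Sequence[Vector], gold: Sequence[int]) -> float:
    """Share of queries whose highest-scoring document is the gold document index."""
    hits = 0
    for q, g in zip(queries, gold):
        scores = [cosine(q, d) for d in docs]
        best = max(range(len(docs)), key=scores.__getitem__)
        hits += int(best == g)
    return hits / len(queries)
```

Running the same function on the same query set before and after fine-tuning gives the two accuracies whose difference is the reported gain.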

(Image to be added)

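🔹 The RAG module uses a GLM3 model, with 85%+ accuracy in internal testing. In end-to-end UAT, strict accuracy was 81.1%, rising to 89.5% when partially correct answers are counted. The service is live and under business A/B testing.
🔹 Upgraded the RAG module's baseline model: completed training and evaluation of the Qwen1.5 and Qwen2 series on the Apple category; rollout of the new iteration is in progress.
🔹 Built an LLM-based information-extraction pipeline over customer-service conversations that extracts the informative dialogues, and tested it on the Apple Watch category. Data this capability can produce includes:
- Informative customer-service dialogues
- Question-answer pairs (for training the encoder model)
- A structured product-information graph
🔹 Built training data for the Mac category and incrementally trained the Apple QA model; it passed regression testing and launched for Mac, meeting the 85%+ accuracy target.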
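The extraction pipeline is not described in detail. A hypothetical sketch of its question-answer step, assuming the LLM is exposed as a plain `llm(prompt) -> str` callable and is asked to reply in a JSON schema invented here (`valid`, `qa_pairs`, `q`, `a`); the prompt wording and all names are illustrative.

```python
import json
from typing import Callable

# Hypothetical prompt and JSON schema; not taken from the original pipeline.
PROMPT = (
    "Decide whether the customer-service dialogue below is informative and extract "
    "its question-answer pairs. Reply with JSON only, in the form "
    '{"valid": true, "qa_pairs": [{"q": "...", "a": "..."}]}.\n\n'
)


def extract_qa_pairs(dialogue: list[tuple[str, str]], llm: Callable[[str], str]) -> list[dict[str, str]]:
    """Ask the LLM for QA pairs and keep only well-formed, non-empty ones."""
    transcript = "\n".join(f"{role}: {utterance}" for role, utterance in dialogue)
    try:
        parsed = json.loads(llm(PROMPT + transcript))
    except json.JSONDecodeError:
        return []  # malformed model output: drop the dialogue
    if not isinstance(parsed, dict) or not parsed.get("valid"):
        return []  # model judged the dialogue uninformative
    pairs = []
    for item in parsed.get("qa_pairs") or []:
        if not isinstance(item, dict):
            continue
        q = str(item.get("q", "")).strip()
        a = str(item.get("a", "")).strip()
        if q and a:
            pairs.append({"q": q, "a": a})
    return pairs
```

Passing the model in as a callable keeps the parsing and filtering logic testable with a stub and independent of whichever LLM serves the request.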

(Image to be added)

(Image to be added; content to be completed)

(Image to be added)

🔹 Analyzed the problem end to end and proposed a modular system architecture that decouples the individual challenges so each can be solved separately.
🔹 Applied classical algorithms, including edit distance, minimum window substring, and recursive algorithms, combined with the OpenCV (cv2) library, for counterfeit detection.
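Two of the classical building blocks named above, sketched as textbook implementations in plain Python. How they are wired into the counterfeit detector is not described; comparing OCR text from a product image against a genuine brand string is an assumed use case, and the recursive and OpenCV parts are omitted.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit-cost insertion, deletion, and substitution."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute or match
        prev = curr
    return prev[-1]


def min_window(text: str, pattern: str) -> str:
    """Shortest substring of text containing every char of pattern (with counts); "" if none."""
    if not text or not pattern:
        return ""
    need = Counter(pattern)
    missing = len(pattern)
    left, best_start, best_len = 0, 0, len(text) + 1
    for right, ch in enumerate(text):
        if need[ch] > 0:
            missing -= 1
        need[ch] -= 1
        # Shrink from the left while the window still covers the pattern.
        while missing == 0:
            if right - left + 1 < best_len:
                best_start, best_len = left, right - left + 1
            need[text[left]] += 1
            if need[text[left]] > 0:
                missing += 1
            left += 1
    return "" if best_len > len(text) else text[best_start:best_start + best_len]
```

For instance, a small but nonzero `edit_distance` between extracted text and an official brand name may flag a look-alike spelling, while `min_window` locates the tightest text span that contains all required characters.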