[2401.07013] Knowledge Distillation of Black-Box Large Language Models

[Submitted on 13 Jan 2024 (v1), last revised 9 Nov 2024 (this version, v2)]

Authors: Hongzhan Chen and 5 other authors

Abstract: Given the exceptional performance of proprietary large language models (LLMs) like GPT-4, recent research has increasingly focused on boosting the capabilities of smaller models through knowledge distillation (KD) from these powerful yet black-box teachers. While leveraging the high-quality outputs of these teachers is advantageous, the inaccessibility of their internal states often limits effective knowledge transfer. To overcome this limitation, we introduce Proxy-KD, a novel method that uses a proxy model to facilitate the efficient transfer of knowledge from black-box LLMs to smaller models. Our experiments show that Proxy-KD not only enhances the performance of KD from black-box teacher models but also surpasses traditional white-box KD techniques. This approach presents a compelling new avenue for distilling knowledge from advanced LLMs.

Submission history

From: Hongzhan Chen
[v1] Sat, 13 Jan 2024 08:43:32 UTC (359 KB)
[v2] Sat, 9 Nov 2024 01:35:32 UTC (8,288 KB)

🕒 **Posted on**: 1782689532
