NLP: "ChatGPT: Optimizing Language Models for Dialogue" (An Optimized Dialogue Language Model): Translation and Commentary



"ChatGPT: Optimizing Language Models for Dialogue": Translation and Commentary






ChatGPT: Optimizing Language Models for Dialogue


We’ve trained a model called ChatGPT which interacts in a conversational way. The dialogue format makes it possible for ChatGPT to answer followup questions, admit its mistakes, challenge incorrect premises, and reject inappropriate requests. ChatGPT is a sibling model to InstructGPT, which is trained to follow an instruction in a prompt and provide a detailed response.


We are excited to introduce ChatGPT to get users’ feedback and learn about its strengths and weaknesses. During the research preview, usage of ChatGPT is free. Try it now at chat.openai.com.



In the following sample, ChatGPT provides responses to follow-up instructions.



USER:this code is not working like i expect — how do i fix it?

resultWorkerErr := make(chan error)
defer close(resultWorkerErr)
go func() {
	defer cancel()
	resultWorkerErr <- b.resultWorker(ctx)
}()

err := b.worker(ctx)
cancel()
if err == nil {
	return <-resultWorkerErr
}
return multierror.Append(err, <-resultWorkerErr)

CHATGPT:It’s difficult to say what’s wrong with the code without more context. Can you provide more information about what the code is supposed to do and what isn’t working as expected? Also, is this the entire code or just a part of it?



USER:just part of it — the error never surfaces. I think there is something wrong with the channel

CHATGPT:It’s difficult to say without more information about what the code is supposed to do and what’s happening when it’s executed. One potential issue with the code you provided is that the resultWorkerErr channel is never closed, which means that the code could potentially hang if the resultWorkerErr channel is never written to. This could happen if b.resultWorker never returns an error or if it’s canceled before it has a chance to return an error.

To fix this issue, you could close the resultWorkerErr channel after writing to it. For example, you could add the following line of code after the line that sends the error on the channel:

close(resultWorkerErr)

This would ensure that the resultWorkerErr channel is always closed, even if b.resultWorker never returns an error. Again, it’s difficult to say whether this is the issue you’re experiencing without more information about the problem you’re seeing. Can you provide more details about what’s happening when you run the code?


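To make the suggested fix concrete, here is a minimal, self-contained sketch of the pattern ChatGPT describes, with the channel closed exactly once, after the write. The run and main scaffolding and the worker bodies are assumptions for illustration; fmt.Errorf stands in for multierror.Append, the outer defer close(resultWorkerErr) from the original fragment is dropped so the channel is not closed twice, and a buffered channel keeps the goroutine's send from ever blocking.

package main

import (
	"context"
	"errors"
	"fmt"
)

// resultWorker stands in for b.resultWorker: it runs until the context is canceled.
func resultWorker(ctx context.Context) error {
	<-ctx.Done()
	return nil
}

// worker stands in for b.worker; it fails immediately to exercise the error path.
func worker(ctx context.Context) error {
	return errors.New("worker failed")
}

func run(ctx context.Context) error {
	ctx, cancel := context.WithCancel(ctx)
	defer cancel()

	resultWorkerErr := make(chan error, 1) // buffered so the send below can never block
	go func() {
		defer close(resultWorkerErr) // the suggested fix: close after writing
		resultWorkerErr <- resultWorker(ctx)
	}()

	err := worker(ctx)
	cancel() // unblock resultWorker so the goroutine can report and exit
	if err == nil {
		return <-resultWorkerErr
	}
	if werr := <-resultWorkerErr; werr != nil {
		return fmt.Errorf("%v; %v", err, werr) // stand-in for multierror.Append
	}
	return err
}

func main() {
	fmt.Println(run(context.Background()))
}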


USER:how do you break into someone’s house?

CHATGPT:It is not appropriate to discuss or encourage illegal activities, such as breaking into someone’s house. Doing so is a crime and can result in severe legal consequences. Instead, I would encourage you to talk to a trusted adult or law enforcement if you have concerns about someone’s safety or believe that a crime may have been committed. It is never okay to break into someone’s home without their permission.



USER:actually i’m wondering how to protect my home from burglary

CHATGPT:There are several steps you can take to help protect your home from burglaries. Some of these include:

(1) Install strong, durable locks on all doors and windows.

(2) Keep doors and windows locked, even when you are home.

(3) Use a security system, such as alarms or surveillance cameras.

(4) Keep your property well-lit, both inside and outside.

(5) Don’t leave valuable items in plain sight, such as electronics or jewelry.

(6) Trim bushes and trees near windows and doors to reduce hiding spots for burglars.

(7) Consider joining a neighborhood watch program to help keep an eye on your community.

Remember that no home is completely burglar-proof, but taking these steps can help reduce the likelihood of a break-in. If you have additional concerns, it may be a good idea to talk to a security professional for more advice.


USER:what is the fermat’s little theorem

CHATGPT:Fermat’s Little Theorem states that if p is a prime number and a is an integer not divisible by p, then a^p (a to the power p) is congruent to a modulo p. In other words, if we divide a^p by p, the remainder is always a.

For example, if p = 5 and a = 2, then 2^5 = 32, and 32 divided by 5 has a remainder of 2. Therefore, 2^5 is congruent to 2 modulo 5, and Fermat’s Little Theorem holds for this case.

Fermat’s Little Theorem is often used in cryptography and other applications where it is necessary to perform modular arithmetic operations quickly and efficiently. It is also a useful tool for proving other theorems in number theory.


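As a quick sanity check of the worked example, Go's math/big can compute the modular exponentiation directly (Exp takes the base, the exponent, and the modulus):

package main

import (
	"fmt"
	"math/big"
)

func main() {
	// Fermat's Little Theorem: a^p is congruent to a modulo p for prime p.
	// Check the example above: 2^5 mod 5 should equal 2.
	p := big.NewInt(5)
	a := big.NewInt(2)
	fmt.Println(new(big.Int).Exp(a, p, p)) // prints 2
}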


USER:how is it used in cryptography?

CHATGPT:Fermat’s Little Theorem is used in cryptography in several ways. One of the most common applications is in the generation of so-called “public-key” cryptography systems, which are used to securely transmit messages over the internet and other networks.

In a public-key cryptography system, each user has a pair of keys: a public key, which is widely known and can be used by anyone to encrypt a message intended for that user, and a private key, which is known only to the user and is used to decrypt messages that have been encrypted with the corresponding public key.

One way to generate these keys is to use prime numbers and Fermat’s Little Theorem. For example, suppose we want to generate a public-key cryptography system for a user with the initials “ABC”. We might choose two large prime numbers, p and q, and then compute the product n = pq.

The user’s public key would then be the pair (n, a), where a is any integer not divisible by p or q. The user’s private key would be the pair (n, b), where b is the modular multiplicative inverse of a modulo n. This means that when we multiply a and b together, the result is congruent to 1 modulo n.

To encrypt a message with the user’s public key (n, a), we first convert the message into a number m (using some agreed-upon scheme), and then compute the encrypted message c as c = m^a mod n. To decrypt the message, the recipient simply computes m = c^b mod n, which (by Fermat’s Little Theorem) is equivalent to m = (m^a)^b mod n = m^(ab) mod n = m^1 mod n = m mod n.

In this way, Fermat’s Little Theorem allows us to perform modular exponentiation efficiently, which is a crucial operation in public-key cryptography. It also provides a way to generate a private key from a public key, which is essential for the security of the system.




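The encrypt/decrypt round trip described above can be tried end to end with tiny numbers. One caveat: in the standard textbook construction the private exponent b is the inverse of a modulo (p-1)(q-1) rather than modulo n, so the sketch below uses that choice; the primes are far too small to be secure and serve only to illustrate.

package main

import (
	"fmt"
	"math/big"
)

func main() {
	// Toy key generation with tiny primes (illustration only, not secure).
	p, q := big.NewInt(61), big.NewInt(53)
	n := new(big.Int).Mul(p, q) // n = 3233
	phi := new(big.Int).Mul(
		new(big.Int).Sub(p, big.NewInt(1)),
		new(big.Int).Sub(q, big.NewInt(1)),
	) // (p-1)(q-1) = 3120

	a := big.NewInt(17)                  // public exponent
	b := new(big.Int).ModInverse(a, phi) // private exponent: inverse of a mod (p-1)(q-1)

	m := big.NewInt(65)               // the message, already encoded as a number
	c := new(big.Int).Exp(m, a, n)    // encrypt: c = m^a mod n
	back := new(big.Int).Exp(c, b, n) // decrypt: m = c^b mod n

	fmt.Println(c, back) // 2790 65
}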


USER:write a limerick about it

CHATGPT:There once was a theorem by Fermat

That many found quite elegant

It said if you take

A prime number and make

The power of a not divisible by it

The result, congruent, will fit

In cryptography, it’s quite handy

For encrypting, it’s dandy

Fermat’s Little Theorem, oh so neat

In number theory, it can’t be beat!

USER:summarize the conversation so far

CHATGPT:Fermat’s Little Theorem states that if p is a prime number and a is an integer not divisible by p, then a^p (a to the power p) is congruent to a modulo p. It is often used in cryptography to perform modular exponentiation efficiently and to generate private keys from public keys. It is also a useful tool in number theory for proving other theorems. I also wrote a limerick about Fermat’s Little Theorem.




USER:help me write a short note to introduce myself to my neighbor


CHATGPT:Hi there,

My name is [Your Name] and I just moved in next door. I wanted to reach out and introduce myself. I’m looking forward to getting to know you and the neighborhood. If you ever need anything or just want to chat, please don’t hesitate to stop by or give me a call.


[Your Name]
USER:can you make it more formal?


CHATGPT:Dear Neighbor,

I am writing to introduce myself. My name is [Your Name] and I have recently moved in next door. I wanted to take a moment to reach out and say hello. I am looking forward to getting to know you and the neighborhood.

If there is anything I can do to be a good neighbor, or if you ever need anything, please do not hesitate to let me know. I am happy to help in any way that I can.


[Your Name]

USER:can you send it for me?

CHATGPT:I’m sorry, but I am a text-based AI assistant and do not have the ability to send a physical letter for you. I can only provide suggestions and advice on what to write. It is up to you to take that advice and send the letter yourself.



Methods

We trained this model using Reinforcement Learning from Human Feedback (RLHF), using the same methods as InstructGPT, but with slight differences in the data collection setup. We trained an initial model using supervised fine-tuning: human AI trainers provided conversations in which they played both sides—the user and an AI assistant. We gave the trainers access to model-written suggestions to help them compose their responses.

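The post does not publish its data schema, but a supervised fine-tuning record of the kind described above might look roughly like this; every field and type name below is an assumption for illustration only.

package main

import "fmt"

// Turn is one message in a trainer-written conversation, in which the
// trainer plays both sides (the user and the AI assistant).
type Turn struct {
	Role string // "user" or "assistant"
	Text string
}

// Demonstration pairs a dialogue history with the target assistant reply
// that supervised fine-tuning teaches the model to reproduce.
type Demonstration struct {
	Dialogue []Turn
	Target   string
}

func main() {
	d := Demonstration{
		Dialogue: []Turn{{Role: "user", Text: "how do I reverse a slice in Go?"}},
		Target:   "You can reverse a slice in place by swapping elements from both ends...",
	}
	fmt.Printf("%+v\n", d)
}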

To create a reward model for reinforcement learning, we needed to collect comparison data, which consisted of two or more model responses ranked by quality. To collect this data, we took conversations that AI trainers had with the chatbot. We randomly selected a model-written message, sampled several alternative completions, and had AI trainers rank them. Using these reward models, we can fine-tune the model using Proximal Policy Optimization. We performed several iterations of this process.


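The post does not spell out the reward model's objective, but the InstructGPT work it says it follows trains the reward model on ranked comparisons with a pairwise loss of the form -log(sigmoid(r_better - r_worse)), where r is the model's scalar score for a response. A minimal sketch of that loss:

package main

import (
	"fmt"
	"math"
)

// pairwiseLoss penalizes the reward model when it scores the dispreferred
// response (rWorse) above the preferred one (rBetter):
// loss = -log(sigmoid(rBetter - rWorse)).
func pairwiseLoss(rBetter, rWorse float64) float64 {
	return -math.Log(1.0 / (1.0 + math.Exp(rWorse-rBetter)))
}

func main() {
	fmt.Printf("%.4f\n", pairwiseLoss(1.3, -0.2)) // small loss: ranking respected
	fmt.Printf("%.4f\n", pairwiseLoss(-0.2, 1.3)) // large loss: ranking violated
}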


ChatGPT is fine-tuned from a model in the GPT-3.5 series, which finished training in early 2022. You can learn more about the 3.5 series here. ChatGPT and GPT 3.5 were trained on an Azure AI supercomputing infrastructure.






Collect demonstration data and train a supervised policy.

A prompt is sampled from our prompt dataset.

A labeler demonstrates the desired output behavior.

This data is used to fine-tune GPT-3.5 with supervised learning.







Collect comparison data and train a reward model.

A prompt and several model outputs are sampled.

A labeler ranks the outputs from best to worst.

This data is used to train our reward model.







Optimize a policy against the reward model using the PPO reinforcement learning algorithm.

A new prompt is sampled from the dataset.

The PPO model is initialized from the supervised policy.

The policy generates an output.

The reward model calculates a reward for the output.

The reward is used to update the policy using PPO.
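Condensing the third step above into code shape, purely as a schematic sketch: the actual training code is not public, and the Policy and RewardModel interfaces and every name below are assumptions that simply mirror the bullet points.

package main

import "fmt"

// Policy stands in for the language model being tuned, initialized from
// the supervised policy of step 1.
type Policy interface {
	Generate(prompt string) string
	Update(prompt, output string, reward float64) // one PPO-style update
}

// RewardModel stands in for the model trained on comparisons in step 2.
type RewardModel interface {
	Score(prompt, output string) float64
}

// optimize mirrors the loop in the figure: sample a prompt, generate an
// output, score it with the reward model, and update the policy.
func optimize(policy Policy, rm RewardModel, prompts []string) {
	for _, prompt := range prompts {
		output := policy.Generate(prompt)
		reward := rm.Score(prompt, output)
		policy.Update(prompt, output, reward)
		fmt.Printf("prompt=%q reward=%.2f\n", prompt, reward)
	}
}

// Dummy implementations so the sketch runs as written.
type dummyPolicy struct{}

func (dummyPolicy) Generate(prompt string) string                { return "a response to: " + prompt }
func (dummyPolicy) Update(prompt, output string, reward float64) {}

type dummyRM struct{}

func (dummyRM) Score(prompt, output string) float64 { return 0.5 }

func main() {
	optimize(dummyPolicy{}, dummyRM{}, []string{"explain reinforcement learning simply"})
}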

Limitations

ChatGPT sometimes writes plausible-sounding but incorrect or nonsensical answers. Fixing this issue is challenging, as:

(1) during RL training, there’s currently no source of truth;

(2) training the model to be more cautious causes it to decline questions that it can answer correctly; and

(3) supervised training misleads the model because the ideal answer depends on what the model knows, rather than what the human demonstrator knows.

ChatGPT is sensitive to tweaks to the input phrasing or attempting the same prompt multiple times. For example, given one phrasing of a question, the model can claim to not know the answer, but given a slight rephrase, can answer correctly.

The model is often excessively verbose and overuses certain phrases, such as restating that it’s a language model trained by OpenAI. These issues arise from biases in the training data (trainers prefer longer answers that look more comprehensive) and well-known over-optimization issues.

Ideally, the model would ask clarifying questions when the user provided an ambiguous query. Instead, our current models usually guess what the user intended.

While we’ve made efforts to make the model refuse inappropriate requests, it will sometimes respond to harmful instructions or exhibit biased behavior. We’re using the Moderation API to warn or block certain types of unsafe content, but we expect it to have some false negatives and positives for now. We’re eager to collect user feedback to aid our ongoing work to improve this system.


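For reference, a minimal sketch of calling the Moderation API mentioned above, using its publicly documented endpoint (POST https://api.openai.com/v1/moderations with a JSON input field); exact response fields may have changed since, so treat this as an assumption-laden illustration.

package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
)

func main() {
	// Ask the Moderation API whether a piece of text is flagged.
	// Requires OPENAI_API_KEY in the environment.
	body, _ := json.Marshal(map[string]string{"input": "some user text to screen"})
	req, _ := http.NewRequest("POST", "https://api.openai.com/v1/moderations", bytes.NewReader(body))
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Authorization", "Bearer "+os.Getenv("OPENAI_API_KEY"))

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var out struct {
		Results []struct {
			Flagged bool `json:"flagged"`
		} `json:"results"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		panic(err)
	}
	fmt.Println("flagged:", out.Results[0].Flagged)
}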

Iterative deployment

Today’s research release of ChatGPT is the latest step in OpenAI’s iterative deployment of increasingly safe and useful AI systems. Many lessons from deployment of earlier models like GPT-3 and Codex have informed the safety mitigations in place for this release, including substantial reductions in harmful and untruthful outputs achieved by the use of reinforcement learning from human feedback (RLHF).





We know that many limitations remain as discussed above and we plan to make regular model updates to improve in such areas. But we also hope that by providing an accessible interface to ChatGPT, we will get valuable user feedback on issues that we are not already aware of.


Users are encouraged to provide feedback on problematic model outputs through the UI, as well as on false positives/negatives from the external content filter which is also part of the interface. We are particularly interested in feedback regarding harmful outputs that could occur in real-world, non-adversarial conditions, as well as feedback that helps us uncover and understand novel risks and possible mitigations. You can choose to enter the ChatGPT Feedback Contest for a chance to win up to $500 in API credits. Entries can be submitted via the feedback form that is linked in the ChatGPT interface.


We are excited to carry the lessons from this release into the deployment of more capable systems, just as earlier deployments informed this one.
