Several Simple Methods for Text Generation (MACKEI)


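1. Using the Chinese GPT2 model

    from transformers import BertTokenizer, GPT2LMHeadModel, TextGenerationPipeline

    # This checkpoint ships a BERT-style vocabulary, so it is loaded with BertTokenizer.
    tokenizer = BertTokenizer.from_pretrained("uer/gpt2-chinese-cluecorpussmall")
    model = GPT2LMHeadModel.from_pretrained("uer/gpt2-chinese-cluecorpussmall")

    # Wrap the model and tokenizer in a text-generation pipeline.
    text = TextGenerationPipeline(model, tokenizer)

    # Sample a continuation of the prompt, up to 100 tokens in total.
    generate = text("2022年8月7日星期天，今天我收到了秋天的第一杯奶茶", max_length=100, do_sample=True)
    print(generate)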

Output:

uer/gpt2-chinese-cluecorpussmall · Hugging Face
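When reading the result, note that TextGenerationPipeline returns a list of dicts, each with a `generated_text` key. Because this checkpoint uses BertTokenizer, the decoded Chinese text typically comes back with spaces between tokens. Below is a minimal post-processing sketch, assuming all whitespace can simply be dropped (this would also merge any English words in the output). The helper name `clean_generated` and the sample list are hypothetical, not real model output.

```python
def clean_generated(pipeline_output):
    """Extract the generated strings and drop token-separating whitespace."""
    cleaned = []
    for item in pipeline_output:
        text = item["generated_text"]
        # str.split() with no arguments splits on any run of whitespace.
        cleaned.append("".join(text.split()))
    return cleaned


# Hypothetical sample in the shape the pipeline returns (not real model output).
sample = [{"generated_text": "今 天 我 收 到 了 奶 茶 ， 很 开 心"}]
print(clean_generated(sample))
```

With the pipeline above, the call would be `clean_generated(generate)`.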

2. Using the general transformers approach

    import os

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    # import pandas as pd  # only needed for the commented-out Excel loading below

    os.environ["CUDA_VISIBLE_DEVICES"] = "0"

    tokenizer = AutoTokenizer.from_pretrained("uer/gpt2-chinese-cluecorpussmall")
    # Decoder-only models should be left-padded for batched generation.
    tokenizer.padding_side = "left"
    model = AutoModelForCausalLM.from_pretrained("uer/gpt2-chinese-cluecorpussmall")
    config = model.config
    print(config)

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.to(device)

    # data = pd.read_excel('D:/pthon.1/NLP/bigbird/data/殖装_处理后.xlsx')
    # texts = data['content'].tolist()
    texts = [
        "用我三生烟火，换你一世迷离只缘感君一回顾，使我思君朝与暮",
        "一座城一个人，一盏花灯一场烟火，一棵古树一地雪玉残香，一人回眸凝望，一世繁华无殇",
        "纵然万劫不复，纵然相思入骨，我也待你眉眼如初，岁月如故",
        "于我虽一眼惊鸿，于你却似指尖清风",
    ]

    encoding = tokenizer(texts, return_tensors="pt", padding=True).to(device)
    with torch.no_grad():
        generated_ids = model.generate(
            **encoding,
            max_length=200,          # maximum length of the generated sequence
            do_sample=True,          # enable sampling; the default False means greedy decoding
            top_k=20,                # number of highest-probability tokens kept by top-k filtering (default 50)
            repetition_penalty=1.0,  # repetition penalty; 1.0 means no penalty
            temperature=1.0,         # below 1 favors high-probability tokens, above 1 flattens the distribution; 1.0 changes nothing
        )
    generated_texts = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
    for txt in generated_texts:
        print(txt)

Output:

Diverse text generation with the transformers generate() method: parameter meanings and algorithm principles (木尧大兄弟's blog, CSDN)
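To make the `top_k` and `temperature` arguments of generate() more concrete, here is a simplified sketch of temperature scaling followed by top-k sampling over a single list of logits. It uses only the standard library and is not the transformers implementation. The function name `top_k_sample` and the toy logits are hypothetical.

```python
import math
import random


def top_k_sample(logits, k=20, temperature=1.0, rng=random):
    """Sample one index from logits after temperature scaling and top-k filtering."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    k = max(1, min(k, len(logits)))
    # Temperature: divide logits before softmax; values below 1 sharpen the distribution.
    scaled = [x / temperature for x in logits]
    # Top-k: keep only the k highest-scoring indices.
    top = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:k]
    # Softmax weights over the kept candidates (subtract the max for numerical stability).
    m = max(scaled[i] for i in top)
    weights = [math.exp(scaled[i] - m) for i in top]
    return rng.choices(top, weights=weights, k=1)[0]


toy_logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical scores for a 4-token vocabulary
rng = random.Random(0)
draws = [top_k_sample(toy_logits, k=2, temperature=0.7, rng=rng) for _ in range(1000)]
print(draws.count(0), draws.count(1))  # only indices 0 and 1 can be drawn when k=2
```

Lowering `temperature` or `k` makes the draws concentrate on the highest-scoring tokens; with `k=1` the procedure reduces to greedy decoding.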

3. simpletransformers

https://simpletransformers.ai/
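A minimal sketch of generation with simpletransformers, assuming its `LanguageGenerationModel` class and the English `gpt2` checkpoint. The "gpt2" model type loads a GPT-2 tokenizer, so the BertTokenizer-based uer/gpt2-chinese-cluecorpussmall checkpoint used above may not load this way. The wrapper name `generate_with_simpletransformers` and the example prompt are hypothetical, and the first call downloads the model weights.

```python
def generate_with_simpletransformers(prompt, max_length=100, use_cuda=False):
    """Generate text from a prompt with simpletransformers (assumed to be installed)."""
    # Imported lazily so that defining this helper does not require the library.
    from simpletransformers.language_generation import LanguageGenerationModel

    model = LanguageGenerationModel(
        "gpt2",  # model type
        "gpt2",  # model name: English GPT-2, an assumption for this sketch
        args={"max_length": max_length},
        use_cuda=use_cuda,
    )
    # generate() returns a list of generated strings.
    return model.generate(prompt)


# Example call (downloads weights on first run):
# for text in generate_with_simpletransformers("Today I had the first milk tea of autumn"):
#     print(text)
```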

Text Generation (MACKEI's blog, CSDN)