Journal of Graphics ›› 2024, Vol. 45 ›› Issue (6): 1243-1255.DOI: 10.11996/JG.j.2095-302X.2024061243
Special Topic on “Large Models and Graphics Technology and Applications”
Received: 2024-05-06
Accepted: 2024-07-18
Online: 2024-12-31
Published: 2024-12-24
About author:
First author contact: WANG Changsheng (1995-), Ph.D. candidate. His main research interests include AI painting and artificial intelligence art. E-mail: 137834933@qq.com
WANG Changsheng. Research on prompt engineering for large model art image generation[J]. Journal of Graphics, 2024, 45(6): 1243-1255.
URL: http://www.txxb.com.cn/EN/10.11996/JG.j.2095-302X.2024061243
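Item | Components |
---|---|
Aesthetic quality | Good factor + real factor |
Good factor | Good art - bad art |
Real factor | Real art - fake art |
Prompt similarity | Cosine similarity (prompt embedding, image embedding) |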
Table 1 Algorithm design for aesthetic quality and prompt similarity
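The composition in Table 1 can be sketched in a few lines. This is a minimal illustration under one reading of the table, not the author's implementation: it assumes each term ("good art", "bad art", "real art", "fake art") is the cosine similarity between the image embedding and the embedding of that text anchor. The function names and the use of plain Python lists as embeddings are hypothetical.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("vectors must be non-zero")
    return dot / (norm_a * norm_b)


def prompt_similarity(prompt_emb: Vector, image_emb: Vector) -> float:
    """Table 1, last row: cosine similarity(prompt embedding, image embedding)."""
    return cosine_similarity(prompt_emb, image_emb)


def aesthetic_quality(
    image_emb: Vector,
    good_art_emb: Vector,
    bad_art_emb: Vector,
    real_art_emb: Vector,
    fake_art_emb: Vector,
) -> float:
    """Table 1: aesthetic quality = good factor + real factor.

    Assumption: each anchor term is scored as its cosine similarity
    to the image embedding.
    """
    good_factor = cosine_similarity(image_emb, good_art_emb) - cosine_similarity(
        image_emb, bad_art_emb
    )
    real_factor = cosine_similarity(image_emb, real_art_emb) - cosine_similarity(
        image_emb, fake_art_emb
    )
    return good_factor + real_factor
```

In practice the vectors would come from a CLIP text encoder and image encoder; the sketch only covers how the scores are combined once the embeddings exist.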
Theme | No artist | 1 artist | 2 artists | 3 artists |
---|---|---|---|---|
Portrait | Awise white haired magic wizard portraits | A domineering Caribbean pirate portraits by Hiroshi Yoshida | Pastel painting of a handsome youth tide man portraits by Mucha+Da Vinci | A beautiful girl in the sun portraits by Childe Hassam+Edgar Payne+Sorolla |
Landscape | Florida aesthetic impressionist scenery | Sunny and gentle sun in rural landscape farms by Hiroshi Yoshida | Night on Tokyo street by Mucha+Da Vinci | Chinese fujian fishing villages by Childe Hassam+Edgar Payne+Sorolla |
Still life | Impressionist oil painting of sunny and gentle fruit static still life oil painting | Still life painting of a bowl of fruit by Hiroshi Yoshida | Oil painting of a classical vintage still life by Mucha+Da Vinci | Oil painting of a flower still lifes by Childe Hassam+Edgar Payne+Sorolla |
Abstract painting | Minimalism abstract painting by the sea | Minimalism abstract painting use Gold+black+white by Hiroshi Yoshida | A decorative modern abstract painting by Mucha+Da Vinci | Abstract painting of cityscape Toronto by Childe Hassam+Edgar Payne+Sorolla |
Watercolors | Kerala watercolor landscape painting | A watercolor of an old man who long white hair by Hiroshi Yoshida | Tree ring print watercolor painring by Mucha+Da Vinci | Owl standing on spell books watercolor painting by Childe Hassam+Edgar Payne+Sorolla |
Conceptual roles | Chinese divine dragon vs. Western evil dragon showdown character design | An anime character with blond hair in a purple suit by Hiroshi Yoshida | A cute girl character concept art by Mucha+Da Vinci | Traditional Chinese painting girl character by Childe Hassam+Edgar Payne+Sorolla |
Table 2 Prompts used in the experiment
Fig. 4 Scatter plot using CLIP to evaluate generated images ((a) Portrait; (b) Landscape; (c) Still life; (d) Abstract painting; (e) Watercolors; (f) Conceptual roles)
Version | Prompt similarity (mean) | Prompt similarity (SD) | Aesthetic quality (mean) | Aesthetic quality (SD) |
---|---|---|---|---|
V2 | 0.3197 | 0.0393 | 0.0224 | 0.0144 |
V3 | 0.3108 | 0.0367 | 0.0268 | 0.0118 |
V4 | 0.2828 | 0.0395 | 0.0328 | 0.0171 |
V5 | 0.2859 | 0.0354 | 0.0310 | 0.0157 |
NIJI CU | 0.2410 | 0.0392 | 0.0243 | 0.0140 |
NIJI EX | 0.2615 | 0.0366 | 0.0206 | 0.0160 |
NIJI4 | 0.2766 | 0.0455 | 0.0226 | 0.0161 |
NIJI5 | 0.2643 | 0.0396 | 0.0291 | 0.0130 |
Table 3 Analysis of prompt similarity and aesthetic quality across different versions
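The per-version means and standard deviations reported in Table 3 are the kind of summary that can be produced by grouping per-image scores. The sketch below shows one way to do that. It is illustrative only: the record layout and the `summarize` helper are hypothetical, and the page does not state whether the sample or the population standard deviation was used (the sketch uses the sample form, `statistics.stdev`).

```python
from collections import defaultdict
from statistics import mean, stdev
from typing import Dict, Iterable, List, Tuple

# One record per generated image: (group label, prompt similarity, aesthetic quality).
Record = Tuple[str, float, float]


def summarize(records: Iterable[Record]) -> Dict[str, Dict[str, float]]:
    """Return the mean and sample SD of both scores for each group label."""
    similarities: Dict[str, List[float]] = defaultdict(list)
    aesthetics: Dict[str, List[float]] = defaultdict(list)
    for group, similarity, aesthetic in records:
        similarities[group].append(similarity)
        aesthetics[group].append(aesthetic)

    summary: Dict[str, Dict[str, float]] = {}
    for group in similarities:
        # stdev() needs at least two values per group and raises otherwise.
        summary[group] = {
            "similarity_mean": mean(similarities[group]),
            "similarity_sd": stdev(similarities[group]),
            "aesthetic_mean": mean(aesthetics[group]),
            "aesthetic_sd": stdev(aesthetics[group]),
        }
    return summary
```

Passing the theme instead of the model version as the group label gives a per-theme summary with the same columns.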
Theme | Prompt similarity (mean) | Prompt similarity (SD) | Aesthetic quality (mean) | Aesthetic quality (SD) |
---|---|---|---|---|
Abstract painting | 0.2625 | 0.0415 | 0.0178 | 0.0131 |
Conceptual roles | 0.2868 | 0.0361 | 0.0205 | 0.0156 |
Landscape | 0.2872 | 0.0496 | 0.0290 | 0.0149 |
Portrait | 0.2768 | 0.0448 | 0.0309 | 0.0125 |
Still life | 0.2699 | 0.0449 | 0.0304 | 0.0127 |
Watercolors | 0.2989 | 0.0479 | 0.0287 | 0.0175 |
Table 4 Analysis of prompt similarity and aesthetic quality across different themes
Fig. 6 Scatter plot of expert evaluation of generated images ((a) Portrait; (b) Landscape; (c) Still life; (d) Abstract painting; (e) Watercolors; (f) Conceptual roles)
Version | Prompt similarity (mean) | Prompt similarity (SD) | Aesthetic quality (mean) | Aesthetic quality (SD) |
---|---|---|---|---|
V2 | 2.52 | 0.71 | 2.70 | 0.38 |
V3 | 3.10 | 0.66 | 3.28 | 0.37 |
V4 | 3.95 | 0.36 | 4.23 | 0.29 |
V5 | 3.62 | 0.25 | 4.41 | 0.15 |
NIJI CU | 2.90 | 0.16 | 3.43 | 0.22 |
NIJI EX | 3.69 | 0.24 | 3.89 | 0.11 |
NIJI4 | 3.19 | 0.37 | 3.40 | 0.18 |
NIJI5 | 2.71 | 0.37 | 3.32 | 0.19 |
Table 5 Analysis of prompt similarity and aesthetic quality across different versions
Theme | Prompt similarity (mean) | Prompt similarity (SD) | Aesthetic quality (mean) | Aesthetic quality (SD) |
---|---|---|---|---|
Abstract painting | 3.30 | 0.43 | 3.58 | 0.19 |
Conceptual roles | 3.02 | 0.30 | 3.40 | 0.23 |
Landscape | 3.38 | 0.33 | 3.88 | 0.19 |
Portrait | 3.21 | 0.30 | 3.54 | 0.10 |
Still life | 3.32 | 0.33 | 3.75 | 0.25 |
Watercolors | 2.93 | 0.47 | 3.26 | 0.36 |
Table 6 Analysis of prompt similarity and aesthetic quality across different themes
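Component | Description |
---|---|
Reference image link | Reference image, reference style, reference character (optional) |
Subject/theme words | The main content or theme of the image; determines its core elements |
Detail words | Enrich the details of the subject |
Color words | Control the overall tone of the picture or the color of specific objects |
Function words | Make the image more emotional and random |
Camera words/composition | Constrain the composition and viewpoint of the image, e.g. top view, close-up |
Lighting words | Specify the type and effect of the lighting |
Style words/artist words | Determine the main style or genre of the image, or which artists influence it |
Model version | The Midjourney model version to use, e.g. a specific version of the V series or the NIJI series |
Model parameters | Negative words, seed value, stylize value, chaos value, image and prompt weights (IW), etc. |
Aspect ratio | Specifies the width-to-height ratio of the image |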
Table 7 Prompt formula
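As an illustration of how the components in Table 7 might be assembled into a single Midjourney prompt string, here is a minimal sketch. The `PromptSpec` class, its field names, and the comma-joined ordering are assumptions made for this example, not part of the paper. The flags `--no`, `--seed`, `--stylize`, `--chaos`, `--iw`, and `--ar` are existing Midjourney parameters, and style and character references are left out for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PromptSpec:
    """Hypothetical container mirroring the rows of Table 7."""

    subject: str
    reference_urls: List[str] = field(default_factory=list)
    details: List[str] = field(default_factory=list)
    colors: List[str] = field(default_factory=list)
    function_words: List[str] = field(default_factory=list)
    composition: List[str] = field(default_factory=list)
    lighting: List[str] = field(default_factory=list)
    styles: List[str] = field(default_factory=list)
    version: Optional[str] = None  # e.g. "--v 5" or "--niji 5"
    negative: List[str] = field(default_factory=list)
    seed: Optional[int] = None
    stylize: Optional[int] = None
    chaos: Optional[int] = None
    image_weight: Optional[float] = None
    aspect_ratio: Optional[str] = None  # e.g. "3:2"


def build_prompt(spec: PromptSpec) -> str:
    """Join image links, descriptive text, and parameters in Table 7 order."""
    text_parts = [
        spec.subject,
        *spec.details,
        *spec.colors,
        *spec.function_words,
        *spec.composition,
        *spec.lighting,
        *spec.styles,
    ]
    text = ", ".join(part.strip() for part in text_parts if part.strip())

    params: List[str] = []
    if spec.version:
        params.append(spec.version)
    if spec.negative:
        params.append("--no " + ", ".join(spec.negative))
    if spec.seed is not None:
        params.append(f"--seed {spec.seed}")
    if spec.stylize is not None:
        params.append(f"--stylize {spec.stylize}")
    if spec.chaos is not None:
        params.append(f"--chaos {spec.chaos}")
    if spec.image_weight is not None:
        params.append(f"--iw {spec.image_weight}")
    if spec.aspect_ratio:
        params.append(f"--ar {spec.aspect_ratio}")

    # Image links go first, parameters last.
    return " ".join([*spec.reference_urls, text, *params])
```

For example, `build_prompt(PromptSpec(subject="Still life painting of a bowl of fruit", styles=["by Hiroshi Yoshida"], version="--v 5", aspect_ratio="3:2"))` yields a prompt that pairs one of the Table 2 subjects with a model version and an aspect ratio.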