A little over a year ago, I started learning about AI image generation. There are a lot of tools for creating AI images. Perhaps the most popular is Midjourney. You have to pay for Midjourney, but it produces very high-quality images without the need to know much about what you are doing.
The most powerful tool, however, is ComfyUI. ComfyUI is free, and it has a node-based interface that allows you to have maximum control and freedom to experiment. However, it has a definite learning curve, and you need a powerful computer to run the most recent models.
I started working with these tools a year ago, then got distracted by other matters, but now I’m back at it. One thing I noticed a year ago, and which I still see now, is that Vietnam does not do well when it comes to AI image generation. Let me explain what I mean.
Today, one of the best AI image generation models is called Flux Dev, and when I asked it to produce for me the following:
“A fashionable young Vietnamese man and woman walking side by side down the street in 1960s Saigon. The man is wearing a white suit and sunglasses. The woman is wearing an ao dai with a retro design and sunglasses. The woman has a beehive hair style.”
. . . what Flux produced for me was images of what look more like Chinese couples (well, the guy on the right might be Vietnamese?) walking somewhere in the Chinese world. . .
Then I used another tool (Freepik – but it’s not free) that can create images from sketches. It has an “imagination” slider. As you move the slider to the right, it increasingly “imagines” a realistic image based on a sketch and prompt.
I uploaded an image from an old advertisement of a Vietnamese woman sewing and added the prompt “A Vietnamese woman in the 1930s sewing.”
And as I moved the slider to the right. . . it imagined. . . a Chinese woman in the 1930s sewing!!
Then a year ago, I asked Midjourney to create images of a Vietnamese woman in an ao dai reading in a house in 1930s Hanoi, or something like that.
And this is what I got:
Well, this person is not Chinese, but. . .
So, do you see the problem we have here? Whatever data AI image generation models are trained on evidently does not include much good data on Vietnam and Vietnamese people.
To some extent, this can be addressed by using a LoRA (Low-Rank Adaptation). A LoRA is a small add-on set of weights that has been trained on a specific dataset. It “fine-tunes” a larger model by giving it specific information about something.
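For readers curious about what “low-rank” means here, below is a toy sketch of the general idea in plain Python. It is not how Flux or ComfyUI actually implement it; the matrix sizes, the numbers, and the names `matmul`, `apply_lora`, and `strength` are all made up for illustration. The point is only that a LoRA stores two thin matrices whose product is added on top of the base model's weights, which is why the files are so small compared to the model itself.

```python
# Toy illustration of the low-rank update behind a LoRA.
# All sizes, values, and function names are hypothetical; real models
# have layers with millions of weights and use tensor libraries.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def apply_lora(base, lora_b, lora_a, strength=1.0):
    """Return base + strength * (lora_b @ lora_a) without modifying base."""
    delta = matmul(lora_b, lora_a)
    return [[w + strength * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(base, delta)]

# A 4x4 "base model" layer (an identity matrix, purely for illustration).
base_weights = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

# A rank-1 adapter: a 4x1 matrix times a 1x4 matrix.
lora_b = [[0.5], [0.0], [0.0], [0.0]]
lora_a = [[0.0, 2.0, 0.0, 0.0]]

# The strength value plays a role similar in spirit to a LoRA strength
# setting: lower values blend in less of the adapter.
adapted = apply_lora(base_weights, lora_b, lora_a, strength=0.8)

# The adapter stores far fewer numbers than the layer it modifies.
base_params = sum(len(row) for row in base_weights)
lora_params = sum(len(row) for row in lora_b) + sum(len(row) for row in lora_a)
print(base_params, lora_params)
```

In this toy case the adapter holds 8 numbers against the layer's 16; with realistically sized layers the gap is far larger, and the base model itself is left untouched, so the same model can be combined with different LoRAs.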
In the case of Vietnam, some people have created Vietnam-specific LoRAs, such as the ones below:
What do you see? Women! Why? Because the world of AI image generation is being driven by guys who like to look at pictures of beautiful girls.
That’s ok for our purposes here though, because in our original attempt to create an image of a fashionable couple in 1960s Saigon, that of course included a woman wearing an ao dai. So, I connected an ao dai LoRA to the Flux (Schnell) model and this is what I got:
That is not 1960s Saigon, but damn those ao dais are cool!! Madame Trần Lệ Xuân would have been very impressed!!
If someone made a movie about 1960s Saigon and had actors like these in it. . . I would watch that movie. Sure, it wouldn’t be completely accurate, but damn that couple looks cool!!
As for accuracy, I actually find that some of the older models did a better job, even though the quality was not as good.
Here are some images that I created using some older (SD1 and SDXL) models:
Although the quality is not as good, I think that these images do a better job of capturing Saigon in the 1960s. They do so not because the models know what Saigon looked like, but more because they were trained on generic imagery of cars and Asian buildings (maybe from Bangkok?) from the 1960s.
Meanwhile, the newer Flux models suffer from what we can call the “Midjourney effect,” that is, they have become more cinematic and glossy, like they are coming out of a fashion magazine.
Yes, I know that in my prompt I wrote a “fashionable” young Vietnamese man and woman, but even when I take the word “fashionable” out, with the newer models I still get images like these:
My point here is that if we consider that AI images will become increasingly common, then societies and countries should pay attention to the kinds of images AI produces of their society and country.
And if they don’t like what they see, then they should make the effort to train models or LoRAs that can do a better job.
As an historian, I generally feel unsatisfied with what I see. In my dream world, this is something that a branch in the professional field of History would focus on: visually creating the past by developing models and LoRAs for various historical periods and settings.
In reality, and as is already happening with people who create LoRAs for beautiful girls in ao dais, it will more likely be individuals interested in the past who will create LoRAs in ways that appeal to their tastes.
It will be fascinating to see where this all leads!!
In ten years’ time, actors and cameras will have become redundant.
It looks that way, at least for a lot of them. . .
LMK: “That is not 1960s Saigon, but damn those ao dais are cool!! Madame Trần Lệ Xuân would have been very impressed!!”
Your claim regarding Madame Trần Lệ Xuân may not be accurate. The áo dàis in the AI pictures have collars, whereas she preferred ones without a collar. I remember reading in William Colby’s “Lost Victory” that Madame TLX considered the collar a remnant of Chinese influence, which had to be eradicated. Hence the introduction of the collarless Áo dài bà Nhu. In that sense, it amounted to an abortive attempt at Thoát Trung avant la lettre, given the enduring popularity of the áo dài with a collar.
https://vania.com.vn/products/ao-dai-ba-nhu
Reading this is like reading a review that points to an issue that you knew wasn’t solid but you sent the paper off anyway. . . I forgot about the collar issue, but I also wasn’t sure if she would be pleased. 🙂
I’m an absolute layman in history but somewhat a student of AI. Generative AI is fun, but it bugs me when the output has the right vibe but is inaccurate. Maybe there can be historians well-versed in AI who train models and create datasets of highly accurate generated images, and that becomes the standard so that AI users have something to base their work on. Otherwise they’d have a misinformed view of the past. This could be an interesting development for academia in the near future.
Thanks for the comment. I completely agree with you. I don’t see academia contributing to this (way too slow at adapting to change). My guess is that individual history enthusiasts will be the ones who do this (and if you look at Civitai you can already see that happening, although it’s still very limited), and that in the end we’ll end up with a variety of models to work with (some better than others). I’m planning on getting started on this myself soon.
I think the challenge will be that most people will want to create something with a more Hollywood/TV historical drama feel and that there will be fewer people who will try to create something that is less dramatic and more (at least theoretically) accurate.
On TV and Hollywood, who do you think makes the best historical-feeling dramas? I’ve heard that the Japanese invested a lot in their taiga dramas and tried to make the languages diverse, e.g., bringing in (modern) Mandarin and Mongolian for a tasteful representation of historical figures. I haven’t watched any. I’m a fan of Hong Kong’s Jin Yong wuxia adaptations. These Asian shows might not be accurate, but at least they put in some effort. I have a feeling that Hollywood movies make their historical characters walk, talk, and think like modern Americans. I once watched a Vietnamese historical movie about Quang Trung with a nationalist theme, which I found distasteful.