Thread: AI Art Generation |OT| Midjourney and beyond
Official Thread
FLUX NF4 Install


For those who aren't keen on the spaghetti nature of ComfyUI, or who lack the GPU to meet the stiff requirements of the full FLUX Dev model, a new solution came to the fore over the weekend.

lllyasviel, who developed Web-Forge (an A1111 variant), rolled out a significant update to it that uses a 4-bit quantization format called NF4, and released an NF4 build of the FLUX Dev model that runs pretty fast, even on machines with as little as 6GB of VRAM.
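For the curious, NF4 is the 4-bit "NormalFloat" data type from the bitsandbytes library. As a rough illustration only (not how Forge wires it up internally), this is the same kind of quantization config you'd see in the Hugging Face ecosystem:

Code:
import torch
from transformers import BitsAndBytesConfig

# 4-bit NormalFloat (NF4) quantization from bitsandbytes, roughly the
# technique used to shrink big model weights down to 6-8GB-friendly sizes.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)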

On my rather aged 3070 Ti with 8GB of VRAM I'm able to generate an image of around 1280x720 in under a minute, which isn't too shabby at all.

Sebastian Kamph has a video about it here



Details about how to get and install everything are below


There has been an update since he posted that video: on top of upscaling and inpainting now being supported, LoRA support has been added, though I haven't tested that yet.

Anyway, some Forge-generated pics, some with 1.5x upscaling (click to embiggen).

[example images]
 
FLUX
New Open Source Contender arrives

FLUX (or FLUX.1 [schnell] to give it its full title)



Below are some example outputs I produced on their demo page at Hugging Face (some beauty shots, and then ones trying to emulate the cinematic looks of Alien/Aliens, Blade Runner and The Empire Strikes Back). The site gives you about 40 generations for free before it taps out and asks you to pay; you get 40 generations for $1. Being online, you'll likely hit issues with anything NSFW, I suspect. Plus it seems you can only put in a positive prompt, not a negative, which seems like an oversight to me.

[example images]


However, you can apparently run it locally if you have at least an 8GB Nvidia GPU. Although I have the model downloaded, I haven't yet checked how it runs on my 3070 Ti. Given the model size I suspect it's going to be slow, but at the same time I suspect local prompting can be a lot better.
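If you're comfortable with a bit of Python, the diffusers library is another way to try it locally without ComfyUI. A minimal sketch (my own illustration, assuming the standard FLUX.1 [schnell] weights from Hugging Face and enough disk space for them):

Code:
import torch
from diffusers import FluxPipeline

# Load the schnell (fast) FLUX model and offload layers to system RAM
# so it has a chance of fitting on an 8GB card.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()

image = pipe(
    "cinematic still, rain-soaked neon street, 35mm film look",
    width=1280,
    height=720,
    guidance_scale=0.0,       # schnell is distilled, so no CFG needed
    num_inference_steps=4,    # schnell is designed for very few steps
).images[0]
image.save("flux_schnell_test.png")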

Olivio has a video all about it below



If you do download it, you'll need to use ComfyUI, as the actual model is a whopping 22GB in size, though maybe down the line A1111 will add support for it.

It is available to download on Civitai now, along with some ComfyUI workflows people have created.


Or at Hugging Face, where you can also read up more about it.

 
Stable Diffusion: Rough explanation of In-painting.
Alright, will try to subscribe then. Still, could you answer my question about multiple people in one image?

If you mean having, say, an image with two distinct-looking people in it (for instance Scarlett Johansson and Kirsten Dunst), a baseline AI, no matter the model, cannot reliably produce that natively. It might have an awareness of what the people look like if they are famous, but what it will likely produce is an amalgamation of them.

As you can see below, the output has given me two distinct-looking characters, but the faces are similar looking and we're not seeing much of Kirsten, but a lot of Scarlett.

[image]


To overcome that, you need to send your text-to-image (txt2img) output to image-to-image (img2img) and do some inpainting, which basically involves masking the part of the image you want to change and adjusting the prompt.

So here I've done that, selecting the left-hand face and updating the image. I wouldn't say it's a perfect likeness of Kirsten (this is a stylized model I'm using), but it's definitely a more distinct face, and I also changed the lipstick to pink in the prompt.

[image]


I could then repeat the exercise with the right-hand face, and make it a bit more Scarlett looking.

[image]


However, maybe I decide to change Scarlett to Sasha Grey instead (again, not a great likeness, but this is a very stylized model I'm using).

[image]


The thing to understand with AI is that when you run a prompt, the image is generated by denoising random noise through the model, so it isn't painting with an awareness of where the end result is going; it's just pulling from a learned sense of what things look like. Faces and bodies it is generally good at, as well as objects, buildings and so on, but hands, for instance, it tends to suck at, as they are quite complex.
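For anyone who'd rather script it than click around, here's a minimal sketch of the same mask-and-regenerate idea using the diffusers library (not the A1111 workflow shown above; the model name and file paths are just examples):

Code:
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

# Inpainting: only the white area of the mask gets regenerated,
# everything else is kept from the original render.
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

init_image = Image.open("two_women.png").convert("RGB")       # the txt2img output
mask_image = Image.open("left_face_mask.png").convert("RGB")  # white = repaint this

result = pipe(
    prompt="portrait of Kirsten Dunst, pink lipstick, detailed face",
    image=init_image,
    mask_image=mask_image,
    num_inference_steps=30,
).images[0]
result.save("inpainted.png")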

Hope that helps, but honestly, there are a bunch of YouTube channels I've listed previously that will give you a better understanding of how to do much of this. This post is more demonstrating the flexibility of something like Stable Diffusion.
 
Better Photographic Results in Stable Diffusion
I must admit I tend to noodle more with the semi-real/stylised models than the photorealistic ones. However, I chanced across this video in my YouTube feed today, and the guy has some great prompting tips for getting quality photographic results with some of the better realism models like Absolute Reality and Photon: -



He also has a free (or pay what you want) PDF Guide up at Gumroad which is really good and well worth getting: -

These were done just using his prompt recommendations with an SD1.5 model:

[example images]


His approach should be good for SDXL as well, though obviously minus the negative prompts, as they aren't required in the new model. I'll endeavour to post up some tests over the weekend.
 
In-depth with ComfyUI & InvokeAI 3.0
If you want to get into ComfyUI in a bit more detail, Scott Detweiler, who now works for Stability AI, has put out a couple of videos going into it in more depth to show off all that the software can do: -



As well as a video about SDXL with some prompting advice, given the model is significantly more finetuned than SD1.5 or SD2.1:



Also, Olivio put out a video recently covering the release of InvokeAI 3.0, which is a nice alternative to A1111:



Personally, I think A1111 or InvokeAI are great starting points for image generation, whilst ComfyUI is perhaps more for people who know exactly what they are doing, or for advanced finetuning of an existing prompt. However, they are ultimately all useful in their own way.
 
Quick Guide to Stable Diffusion on your PC
I'm sure people smarter than us will be logging network traffic to be sure so yeah the local installs ought to be ok. Btw any tips on where to start on improving output from Stable Diffusion? Like an idiots guides to prompts or what extensions you need to install and how to do it etc?

That's a rabbit hole and a half. However, I will endeavour to drop some wisdom (Better to answer in this thread as the other one is more about ChatAI)

/Autism mode engaged


I'll assume you have nothing and start from there

Firstly you'll need to download Python & Git.

Make sure you install Automatic1111 through git

I recommend installing on either a dedicated SSD or a chunky HDD. Don't install to your main OS drive, because the models will fill up your drive space in no time once you have a few of them, given they are anywhere from 2-6GB each, and you almost certainly will build up a collection (I have 710GB of models currently, and that's after pruning them down).



Installing through git means you can update easily via git pull. If you don't, it's a case of manually downloading and overwriting files every time you want to update.


With my setup I have two installs on separate drives:

1. For actual generation work (2TB SSD)
2. For testing models, etc., before determining whether they are any good (2TB HDD partition)


I only have the 2nd one set to automatically update; that way, if there is an issue (which usually gets fixed quickly tbh) I'm not fucking up my main install.

My webui-user.bat has these settings on the second install, and the same minus the 'git pull' on the first:


Code:
@echo off

set PYTHON=
set GIT=
set VENV_DIR=
rem xformers speeds up attention and cuts VRAM use; autolaunch opens the browser for you
set COMMANDLINE_ARGS= --xformers --no-half-vae --opt-split-attention --disable-nan-check --autolaunch
rem pull the latest A1111 commits on every launch (second install only)
git pull
call webui.bat


Update!! If you have a pretty capable GPU, look here and check against the CUDA requirements as to whether you can run PyTorch 2.0, and then follow the procedures there instead. No need for xformers with PyTorch 2.0.
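A quick way to check what your local setup is actually running before you decide (just a diagnostic sketch; run it inside the webui's Python environment):

Code:
import torch

# Prints the PyTorch build, CUDA availability and GPU details so you can
# compare against the PyTorch 2.0 / CUDA requirements mentioned above.
print("torch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("CUDA build:", torch.version.cuda)
    print("GPU:", torch.cuda.get_device_name(0))
    print("compute capability:", torch.cuda.get_device_capability(0))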

You can read up on the command-line arguments here: -


Your go-to place for models is here


Though be aware you have to wade through a lot of anime waifus (and the odd furry stuff... *shudders*), so you might want safe search on. However, if you also like to live dangerously: safe mode off and to hell with the consequences.

Unless you are planning on a career in mixing and merging models yourself, don't bother downloading the 7GB versions; you don't need them. Most pruned models are in the 2-4GB range and they are perfectly fine. You'll often find that people upload the big model straight away and then put up the pruned version a day or so later, so sometimes it is worth doing a 'wait and see' on certain models, especially as some creators have a bad habit of uploading a model and then uploading an update to it a day or so later (it's like bruh... seriously).

You can just click on the model to download, however, someone made a handy extension here: -


That has a lot of functionality

The model landscape is always changing, as new mixes and new models appear daily; however, some pretty dependable 1.5 models are these ones: -


One thing to get is a VAE, which, alongside the model, improves outputs.

There are about 3 or 4 different ones, some orientated towards anime; however, the main one is this: -


Often models will say they have a VAE, which you can download along with the model; however, they're generally just the same 3-4 ones. If you want a simple life, just copy it to your models folder, rename it to the model name + .vae.ckpt or .vae.safetensors, and make sure the setting 'Ignore selected VAE for stable diffusion checkpoints that have their own .vae.pt next to them' is enabled.

Hopefully, at some point, some clever bastard will add a script that can automatically link a given model to a given VAE without having to either manually select it or clone the same file, but right now that's not the case.
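In the meantime, a rough DIY sketch of that idea: a little Python script that copies one VAE next to every checkpoint with the matching name so A1111 picks it up automatically. Paths and filenames are examples only; adjust to your install (symlinks would save disk space if your filesystem allows them).

Code:
import shutil
from pathlib import Path

# Example paths: point these at your own VAE file and checkpoints folder.
vae = Path(r"D:\SD\models\VAE\vae-ft-mse-840000-ema-pruned.safetensors")
models_dir = Path(r"D:\SD\models\Stable-diffusion")

for ckpt in list(models_dir.glob("*.safetensors")) + list(models_dir.glob("*.ckpt")):
    if ckpt.stem.endswith(".vae"):
        continue  # skip VAE copies made on a previous run
    # A1111 looks for <model name>.vae.safetensors / .vae.pt next to the checkpoint.
    target = ckpt.parent / (ckpt.stem + ".vae.safetensors")
    if not target.exists():
        shutil.copy2(vae, target)
        print("added VAE for", ckpt.name)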

Anyway, on settings: -

1. I don't bother with grids.
2. PNG always.
3. Save text information about generation parameters as chunks to PNG files (super useful; see the sketch after this list).
4. I like CodeFormer for face restoration; however, models are increasingly better at faces, so face restoration is less necessary, but if you must, CodeFormer at 0.5 or 0.6 is usually sufficient.
5. Ignore selected VAE for stable diffusion checkpoints that have their own .vae.pt next to them (as suggested earlier).
6. Enable quantization in K samplers for sharper and cleaner results (this may change existing seeds and requires a restart to apply).
7. Add model hash to generation information.
8. Add model name to generation information.
9. When reading generation parameters from text into UI (from PNG info or pasted text), do not change the selected model/checkpoint (sometimes you want to try out a prompt with a different model, not auto-switch models).
10. Quicksettings list: paste this into the text box: sd_model_checkpoint, CLIP_stop_at_last_layers, sd_vae. That puts the clip skip & VAE selection on the front end (where they should have been from the off).
11. Any time you move a model, embed or whatever around in the files, there's no need to restart SD; just refresh/reload the UI.
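As a side note, because those generation parameters are stored as a PNG text chunk, you can read them back out of any saved image with a couple of lines of Python (a sketch, assuming the image was saved by A1111 with that setting on; the filename is an example):

Code:
from PIL import Image

# A1111 writes the full generation settings into a "parameters" text chunk,
# the same data the PNG Info tab shows.
im = Image.open("00001-1234567890.png")
print(im.text.get("parameters", "no generation parameters found"))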

Samplers I don't use I hide. I use the following: -

Euler A
The Karras ones
DDIM
PLMS (good for inpainting)
UniPC

None of the above should need to go beyond 40 steps to generate a decent image, and I wouldn't bother with additional steps for hires fix (although do use hires fix; more on that later).

CFG scale: the lower it is, the more random the result; the higher it is, the more it tries to conform to the prompt. 7-10 is a good range; if you want to be a bit looser then go lower (useful for ControlNet). I rarely ever go higher than 10.

Someone said a good rule of thumb is to take the CFG number (say 7) and use multiples of it for your generation steps (so 7, 14, 21, 28, 35), but that could just be BS.

Under Scripts, at the bottom of the UI, you can X/Y plot a lot of stuff, so you can set up test generations to determine how different samplers compare, etc.

I wouldn't waste too much time with that though unless you want to get into finetuning an image to determine the best result.

From the official extensions, I'd recommend the following: -

https://github.com/Klace/stable-diffusion-webui-instruct-pix2pix

This allows you to change small things in an existing image easily without inpainting, though you do need to download the 7GB model for it to work off of.


Basically an in-UI browser that allows you to review and rate your generations and add favourites for easy recall.



Adds more bells and whistles to upscaling, available from the img2img scripts tab.

You might want to look at ControlNet, dynamic prompts, promptgen, wildcards & additional networks, though some of these are installed by default IIRC.


I don't have a bulletproof guide for prompting; you can generally find the prompt info used for the example images at Civitai alongside the models.

I use this


which requires the Tampermonkey browser extension, to quickly zip up and download any model preview images, which I drop into a folder for easy reference; it means I can load up the image via the PNG Info tab if required. All you do is refresh the page when on a Civitai model page and the zip file appears under the downloads tab.

The main things I've learnt are: -

If using a 1.5 model (which are the most popular), keep at least one dimension at 512 pixels. Typically I might generate images at 512x768 (2:3).

If you are using the 2.1 model, then you need at least 768 pixels as the minimum dimension instead.

Don't get obsessed with trying to generate the same result as someone else, even if you have all their generation data to hand. There are myriad factors that might have resulted in them getting the result they did versus yours, from whether they used xformers to what build of A1111 they were on.

Don't get too fussy about trying to massively upscale during the initial generation using hires fix. Unless you have a 4090, your GPU will likely crap out when you try to generate past 2x. However, you can easily upscale images 4x and beyond after generation, either under img2img or the Extras tab.

It took me a long time to realise that. I do use hires fix, as it boosts image quality as well as addressing errors, but I generally set it to 1.25x, and then if I want to work the image up I can always regenerate it using the seed info etc.

There's a good overview of how to upscale effectively here: -


With hires fix, the denoising strength tends to default to 0.7; however, the higher the number, the bigger the regeneration, so you might want to use a lower value like 0.2-0.3, as that way it's not radically remaking everything.
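If it helps to see what that denoising strength number is actually doing, here's a small diffusers sketch (my own illustration, not the hires fix internals; model name and paths are examples) where a low strength barely touches the input and a high one repaints most of it:

Code:
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

init = Image.open("base_render.png").convert("RGB")
prompt = "same prompt as the original render"

# strength ~0.25: gentle cleanup, composition preserved
subtle = pipe(prompt, image=init, strength=0.25).images[0]
# strength ~0.7: large chunks of the image get reinvented
heavy = pipe(prompt, image=init, strength=0.7).images[0]

subtle.save("denoise_025.png")
heavy.save("denoise_070.png")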

As mentioned before, restore faces is not that useful. I generally turn it off now, as the newer community models do a great job with faces, and oftentimes, especially if you are using anime or semi-real models, the outputs are worse for it.

The main thing is that negative prompts are super useful, and you'll find there are a lot of community-generated negative embeddings (textual inversions).


For instance, this guy has enhancement negative embeds for both 1.5 & 2.1 models which work really well, depending on the image type:


When it does come to upscaling, you're not bound by the upscalers that come with SD; you can find a load here: -


I really like

4x_foolhardy_Remacri
4x_NMKD-Superscale-SP_178000_G

For any anime/painterly/vector type stuff, just use R-ESRGAN 4x+ Anime6B, which is installed as standard and works pretty well.


Quick little insight into the UI. The purple tab toggles to your models/additional networks, and if you have the civitai helper installed it will add preview images for your models, embeds etc, as well as links to the civitai page, keywords, prompts, etc.

You can create subfolders within the various model folders, which is what I've done, so it's not one big old list of madness, although there is a search function: -


[UI screenshots]



You'll also find a lot of tutorials at Civitai, and their Discord is pretty good for tips & tricks and the like.

And it's worth keeping tabs on the following YouTubers: -






Anyway...hopefully that will give you some things to think about. Hit me up on Discord though if you have particular questions.


Couple of quick prompt tips

() adds emphasis/weight to the words in a prompt, so (big eyes) or ((big eyes)) or (big eyes:1.2) all carry more weight than big eyes on its own. If you want to add emphasis, select the word then press CTRL+Up arrow, or just add the brackets yourself.

Also

(Thing1|Thing2) in a prompt means the generation will swap between Thing1 and Thing2 with every step to create an amalgam, so (Emma Stone|Scarlett Johansson) would produce a hybrid of the two.

However, for more control, use [] instead: -

[Thing1:Thing2:N], where N is either a number of steps or a decimal fraction of the total steps, generates from the first until it hits that point, whereupon it switches to the second.

So for instance

[Emma Stone:Scarlett Johansson:0.5] would mean that whatever the step count, 50% of the way through the generation it would swap over to Scarlett for the remainder.

The latter approach is going to give you a much more distinct look versus the former, especially as you can adjust it to fine-tune the results.
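To make the difference concrete, three hypothetical prompts side by side (the subjects are just examples):

Code:
portrait of a woman, (big eyes:1.2), soft lighting            <- weighted emphasis
portrait of (Emma Stone|Scarlett Johansson), studio photo     <- alternates every step, blended face
portrait of [Emma Stone:Scarlett Johansson:0.5], studio photo <- Emma for the first half of the steps, then Scarlett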

/Autism mode dis-engaged

Be aware there are other SD applications like InvokeAI and ComfyUI (the latter of which everyone is raving about as it's quite adaptable), however, A1111 is by far the most popular and widely supported currently.

Also, don't get precious over generations. Ultimately SD is about batching en masse, picking out a few winners to work up, and discarding the rest. What you make today might seem like the bee's knees, but 3 months from now you'll be like 'WTF do I still have this BS on my rig?' I have old folders full of thousands of generations that are just lurking around waiting for me to find the desire to give them a good clean-up tbh. Nowadays I delete daily after every generation.
 
SDXL Release & Overview
Stability AI publicly released its latest image AI model, SDXL, yesterday.

It's a big step up from SD1.5 and from the misfire that was SD2.1, which never really hit the ground running.

You can read the blurb about it here: -


However, the main thing to know is that if you want to run it locally you are going to need a GPU with at least 8GB of VRAM (preferably Nvidia, though A1111 might work with AMD).

Some very basic test images are below: -

[example images]


SDXL will run in Automatic1111; however, from my testing the performance isn't great at present in terms of render time, and you have to jump through some hoops to get it to work well, so if you do want to check it out I suggest installing ComfyUI. As an interface it can look a little daunting; however, it's basically an image-processing flowchart system, and once you wrap your head around that aspect it makes sense.

I haven't yet had a chance to test how SDXL performs with InvokeAI, but I suspect it lies somewhere between A1111 & ComfyUI.

Sebastian Kamph has a couple of videos, the first detailing how to install ComfyUI and the second about installing SDXL and getting it up and running (there are a couple of 6GB models to download). Note you can easily link ComfyUI to your Automatic1111/InvokeAI models folders so you don't need to double up: -



If you want to stick with Automatic1111 then Olivio Sarikas has a video about how to get SDXL up and running in that: -



I dare say the performance issues in A1111 will get ironed out in short order. The main problem, however, is that whereas with ComfyUI you can simply load up different workflows depending on what you want to do, A1111 frontloads everything, and as SDXL doesn't play nice if a whole raft of SD1.5 extensions are installed, you have to disable most of them and then reload the UI, which is obviously a huge pain in the ass. On the flip side, A1111 is still the king in terms of user-friendly extensions such as the image browser and extra network previews.

Note that no SD1.5 embeds, hypernetworks, LoRAs or LyCORIS work with SDXL, as it is built on a baseline image size of 1024x1024, whereas SD1.5 was built on 512x512. With that said, you can output images at, say, 1024x768, but at least one dimension should be 1024 or more, otherwise you will run into distortion issues.
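And if you'd rather poke at SDXL from Python than through a UI, here's a minimal diffusers sketch of the base model at its native resolution (the model name and settings are the standard published ones, but treat the specifics as an example rather than gospel):

Code:
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
).to("cuda")

# SDXL is trained around 1024x1024, so keep at least one dimension there.
image = pipe(
    "studio portrait photo of an astronaut, dramatic rim lighting",
    width=1024,
    height=1024,
    num_inference_steps=30,
).images[0]
image.save("sdxl_test.png")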

Don't expect any sexy times from the base model. It is going to be a while before people within the image AI community have fully trained up custom models for that sort of thing; with that said, some early custom models are already up at CivitAI, such as AI legend Lykon's DreamShaper SDXL alpha, as well as a first stab by him at an anime-art SDXL model: