Making Stable Diffusion go faster on old GPUs

Released on: 2023-08-26

When optimising something, you generally want as few things as possible competing for the scarce resource. This is pretty routine in the server world, which is why it surprises me that I haven't seen anyone talking about this. So let's fix that now.

Corrections and updates

  • 2023-08-27 01:24: In the video I mentioned getting 4-5x the performance. I was wrong. It’s 3x. Still very worthwhile, but worth correcting.
  • 2023-08-28 13:49: Added a note about, and link to, a video that experiments with --medvram for SDXL, and my subsequent experimentation with it.

TL;DR

Stopping the GUI frees up enough VRAM to make a meaningful difference to the performance of Stable Diffusion:

  1. Stop the GUI with something like sudo systemctl stop display-manager.
  2. Make sure that the Automatic1111 webui starts without the --lowvram option enabled.
  3. Connect to it remotely.

My setup

  • GTX 970 4GB.
  • 32GB system RAM.
  • AMD FX(tm)-8350 Eight-Core Processor @ 4-4.2GHz.

Steps

Close apps

Close any apps that you have running and don't want to lose data in. Basically, whatever you'd do before shutting down/rebooting the machine.

SSH into the computer

SSH to the computer from another machine. This confirms that you have everything you need to comfortably complete the next steps.

Check the current state

Run nvidia-smi to see what the GPU looks like at the moment. On my system, I see about 700MB used out of 4096MB (4GB).
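
If you prefer a more compact readout, nvidia-smi can report just the memory figures:

nvidia-smi --query-gpu=memory.used,memory.total --format=csv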

Stop the GUI

This step may vary between distributions, so you may need to google it. But on my OpenSUSE system, it's sudo systemctl stop display-manager.
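
If display-manager doesn't exist on your distro, stopping the display manager by its specific name usually works on systemd-based systems. A few common variants (the exact service name varies by distro):

# GNOME-based systems (often gdm or gdm3):
sudo systemctl stop gdm
# KDE-based systems:
sudo systemctl stop sddm
# Distro-agnostic alternative: switch to the non-graphical target entirely:
sudo systemctl isolate multi-user.target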

Check the state again

Run nvidia-smi again, and you should now see that there is 0 VRAM usage.

Disable --lowvram

Edit your startup script that calls webui.sh (or webui.sh itself if you modified it).

Find --lowvram and remove it.
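
In a stock setup, the flag normally lives in the COMMANDLINE_ARGS variable in webui-user.sh. A minimal sketch of the change, assuming that's where yours is:

# webui-user.sh, before:
export COMMANDLINE_ARGS="--lowvram"
# after:
export COMMANDLINE_ARGS=""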

Start the Automatic1111 webui

Start up the webui as you would normally. If you normally add --lowvram here, don’t.
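
Since you're doing this over SSH, it's worth launching the webui inside tmux (or screen) so it survives a dropped SSH session. A minimal example, assuming tmux is installed:

tmux new -s sd    # start a session named "sd", then run ./webui.sh inside it
# Detach with Ctrl-b d, and reattach later with:
tmux attach -t sd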

Make the port accessible to other devices

SSH method

You need to expose the port so that you can reach the webui from another machine. I've mostly been using ssh -L like this:

ssh hostname -vL 7860:localhost:7860

Where hostname is the machine that is running the service that you want to connect to. Note that you don’t need to change localhost.

You will then connect to http://localhost:7860 in the browser on the machine that you are connecting from (ie the computer you SSH’d from).
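
If you'd rather not keep an interactive shell open just for the tunnel, ssh can background itself once the tunnel is up:

ssh -fNL 7860:localhost:7860 hostname
# -f: go to the background after authenticating. -N: don't run a remote command.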

Reverse proxy method

If you'd like to access it on other devices, like your phone, you can do so by setting up a reverse proxy with a tool like nginx or socat.

Here’s a quick command that you can use to run socat:

socat -d TCP4-LISTEN:7861,fork TCP4:localhost:7860

Make sure to open port 7861 on your firewall.
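
On firewalld-based distros (which includes OpenSUSE), that looks something like this; adjust for whatever firewall you run:

sudo firewall-cmd --add-port=7861/tcp
# Add --permanent and then run: sudo firewall-cmd --reload
# if you want the rule to survive a reboot.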

You will then connect to http://hostname:7861 in the browser on the device that you are connecting from (e.g. your phone).

Make sure to change hostname to the name or IP address of the machine that is running the service.

Results

Above: Two screenshots of runs in action.

  • Before: 52-56 seconds per 20-step image -> 2.73s/it.
  • After: 16 seconds per 20-step image -> 1.22s/it.

Update: Interestingly, a day after I published this article, I came across this video where someone went from no manual VRAM optimisations to using --medvram to get an 11x speed boost with SDXL. They have more VRAM than I do, and they are doing it specifically with SDXL, which I am not. Regardless, I was curious to see how --medvram would work for my setup. The answer is that it is close enough to the same, sometimes a second slower than without --medvram. It works a lot of the time without OOMing. This surprised me, because --medvram used to OOM almost instantly for me a few months ago. So the fact that it works at all now is excellent progress.
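
If you want to try the same experiment, webui.sh forwards its arguments, so you can pass the flag for a one-off run rather than editing your startup script:

./webui.sh --medvram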

Why?

Freeing up this VRAM leaves enough space that --lowvram is no longer needed for most use-cases. --lowvram invokes optimisations that prioritise using less memory, at the expense of processing time.

What about Windows and macOS?

I don’t know.

On Windows you might be able to get close to the same results by running Automatic1111 as a Windows service and then logging out. There will probably be several challenges to solve in doing this, and I don’t think that it will get down to 0 usage, but I suspect that you will still free up enough VRAM to be worth pursuing.
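
As a hedged sketch only: a third-party service wrapper like NSSM could be one way to do this. The service name and path below are placeholders, and I haven't tested this:

nssm install sd-webui "C:\path\to\stable-diffusion-webui\webui-user.bat"
nssm start sd-webui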

For macOS, everything I want to say is based on assumptions about how things used to work, which may no longer be true. It would be worth researching further to see what is possible.

If this isn’t viable for you

Generally speaking, even if you can’t use this technique, you want to remove as much competition for the GPU VRAM as possible. Close absolutely every window you can. Even if it’s only a calculator, it’s still eating VRAM.
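
A handy way to see this in practice is to watch the VRAM count drop as you close things:

watch -n 1 nvidia-smi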

Another technique

Another way of achieving this is to have two GPUs in your system: dedicate one of them to running the GUI and applications, and the other to running Stable Diffusion.
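
If you go this route, you can pin Stable Diffusion to the dedicated card via the CUDA_VISIBLE_DEVICES environment variable (the index 1 below is an assumption; check nvidia-smi to see which card is which):

CUDA_VISIBLE_DEVICES=1 ./webui.sh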

Wrapping up

When I first started using Automatic1111, --lowvram was absolutely essential to get it to work at all. The Stable Diffusion community is hard at work, and things are improving rapidly. So it's been really interesting to see just how functional it is now without --lowvram, and how it's able to degrade gracefully in many cases. There are still enough cases where it doesn't degrade gracefully that I wouldn't want to run it without --lowvram unless I had freed up VRAM first. But the progress is impressive nonetheless.

This blog post is about Automatic1111, but the method is applicable to any other AI service.

Have fun.
