Encode Server Farm Optimization

Anything and everything to do with DCP-o-matic.
bputney
Posts: 1
Joined: Fri Apr 19, 2019 6:55 pm

Encode Server Farm Optimization

Post by bputney » Wed May 22, 2019 9:26 pm

OK, a warning: I've only been using DCP-o-matic for about 8 months. I do content production for a Film Festival, and we switched last year from Blu-ray to DCP.

Last year I had a 12-core Mac Pro cheese grater and 120 films to process. Because of scheduling and other stuff, it got tight at times. I decided this year I'd add some horsepower to give me some breathing room.

So I finally found some Dell R820 servers. These things can support four Xeon processors and have a lot of memory and expansion space. The ones I could afford have four 8-core 2.6 GHz Xeons and 128 GB of RAM. I got them with the optional 10 GbE FX NIC and installed a 512 GB SATA3 SSD as a boot drive. I bought a pair of these, so in theory I have 128 cores to encode on. Encoding frames is all these Dells have to do; the 12-core Mac Pro is the front end, running only DCP-o-matic and the Batch Converter. (See the attached drawing.)

I'm trying to figure out how to benchmark this and how to compare it to existing benchmarks. I ran a 1080p MP4 of a theatrical film through it and got something like 55 FPS. I saw that people were using Sintel to benchmark, and despite the ambiguity of there being 6 different resolutions and formats for that one film, I gave it a try. I got about 16 FPS on the DCP-o-matic screen on the Mac Pro. I'm not sure what was happening, since the Encode Server screens were at times showing that they were each cranking out 24 FPS encoding the 2K MP4 of Sintel, but DCP-o-matic never showed higher than 15 or 16 FPS.

These Dell servers have speed-controlled fans, so the clue to how effectively you've loaded them is how loud their fans are. Doing the theatrical film that showed 55 FPS on DCP-o-matic, the fans idled all the way through. When I ran Sintel through, the fans kicked up to about half speed intermittently, so I'm thinking they were working harder but not maxed out.

It looks like a lot of people are running Windows for their Encode Server arrays and I'm not sure if that's been optimized better than the Linux versions.

What I'm looking for is someone who has spent some time optimizing Encode Server arrays and can give me some pointers. I want the fans in these things to sound like a 747 at takeoff thrust continuously, not a Super Cub.

Thanks, Bill
Attachments
DCP-o-matic.pdf
(67.35 KiB) Downloaded 35 times
Bill Putney
Port Townsend Film Festival
bill@ptfilmfest.com

Carsten
Posts: 1358
Joined: Tue Apr 15, 2014 9:11 pm
Location: Germany

Re: Encode Server Farm Optimization

Post by Carsten » Wed May 22, 2019 10:12 pm

You should try to run the encoding master on one of the servers instead of using the MacPro.
I understand the MacPro has two Gigabit network interfaces. You could try to use both, each one connected to one of your encode servers. Very often, network speed is the bottleneck when using fast encode servers.

https://dcpomatic.com/benchmarks/ has the details on how to benchmark and compare your setup with the two 'standard' videos BigBuckBunny and Sintel (Bunny is better suited for slow machines, while Sintel is better for fast setups).

In general, the encode server approach scales nicely, but not endlessly. It is very important to have a 10Gig or faster network if you are using fast machines. Multi-CPU machines are faster than networked single-CPU machines.
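Rough numbers make the network bottleneck obvious. A minimal sketch, assuming the master ships uncompressed 2K frames (2048×1080, 3 colour channels, 16 bits per channel) to the encode servers; the exact wire format is my assumption here, not something taken from the DCP-o-matic source:

```python
# Back-of-the-envelope: network load of shipping uncompressed 2K frames.
# Assumes 2048x1080 pixels, 3 colour channels, 2 bytes (16 bits) per channel.
frame_bytes = 2048 * 1080 * 3 * 2          # ~13.3 MB per frame

def gbits_per_second(fps):
    """Aggregate network throughput needed to feed the farm at a given fps."""
    return frame_bytes * fps * 8 / 1e9

print(round(gbits_per_second(24), 2))   # ~2.55 Gbit/s -- already over GbE at 24 fps
print(round(gbits_per_second(55), 2))   # ~5.84 Gbit/s -- needs the 10G links
```

Under these assumptions, a single Gigabit link saturates well below real-time, which is why the 10G NICs matter for a farm this size.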

Also, the master has to be fast enough to deliver enough frames to the encode servers. That may depend on your source file resolution and codec complexity. A given machine can't decode e.g. 2K ProRes at an arbitrary rate; 50-60 fps may already be maxing out the MacPro. When using many encode servers, it is advisable to have a master with very high single-thread performance, as most of the decoding and prepping the master does is single-threaded.
Obviously, in a setup like this, you want your source content and target on an SSD. SSDs are overrated for standard machines running DCP-o-matic, but for a high-performance setup you'd better use them.

I don't think Linux vs. WIN vs. OS X makes a difference for encode servers.

If you think that your combined CPU performance is not maxed out using the encode server approach, you could split jobs between physical servers instead: run a separate master on each machine and divide the jobs between them. That will usually give better overall performance.

A good estimate of what to expect is the CPU PassMark score.

https://www.cpubenchmark.net/high_end_cpus.html

It scales very well with DCP-o-matic's JPEG2000 encoding. As a rule of thumb, 1000 PassMark units equal one fps of Sintel encoding; e.g. a machine with a CPU PassMark of about 8000-9000 will give you 8 fps encoding Sintel. Add up the CPU PassMark numbers of all your CPUs and you know the theoretical limit. I use a setup of two dual-CPU Xeon machines with a combined CPU PassMark of about 32,000 (four 6-core HT CPUs altogether), and I get about 30 fps encoding Sintel. Your 55 fps would roughly correspond to an aggregated CPU PassMark of about 60,000, which is not bad. In general, live footage, even in Flat or 16:9 resolution, will compress somewhat more slowly than Sintel's Scope-sized animation footage.
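The rule of thumb above is easy to put into a few lines. A sketch, using my own setup's numbers from this post as the example (the function name is mine, and remember this is an estimate of the theoretical limit, not a guaranteed result):

```python
# Rule of thumb from this thread: ~1000 PassMark units ~= 1 fps encoding Sintel.
def estimated_sintel_fps(cpu_passmarks):
    """Sum the PassMark scores of every CPU in the farm, divide by 1000."""
    return sum(cpu_passmarks) / 1000.0

# My setup: four 6-core HT Xeons, combined PassMark ~32,000.
print(estimated_sintel_fps([8000, 8000, 8000, 8000]))  # 32.0 -- I measure ~30 fps
```

Look up each CPU model on cpubenchmark.net, plug the scores in, and compare against what DCP-o-matic actually reports; a large gap points at a network or master-side bottleneck rather than the encode servers themselves.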

What is the exact CPU type on these servers and the Mac Pro?

Also, a still-valid hint: set the number of encoding threads a bit higher than your machine has physical cores. For my 24-core machine, I typically set something like 28-36 encoding threads. You will hear the difference in your fan speed ;-) Don't overdo it, though: way too many threads can actually slow things down, and they also need a lot more memory (especially when doing 4K).

Do you read and write from/to the QNAP during encoding? That will put even more load on the network interfaces. If your MacPro has two separate network interfaces, use one towards the QNAP and the other towards the encode servers, or add a 10G card to it. Oh, I see, there already seems to be a 10G card in the MacPro.

You may also try a local encode on the MacPro SSD, bypassing the NAS traffic entirely. You'll probably need to play through a few scenarios to find the sweet spot for your config.

- Carsten

rlsound
Posts: 2
Joined: Sat Dec 30, 2017 7:12 pm

Re: Encode Server Farm Optimization

Post by rlsound » Sun Jun 09, 2019 11:05 pm

Hi Bill,

With your Xeon rigs, I would check the BIOS to make sure all the cores are enabled. If they are ex-enterprise machines, I've read that cores are sometimes turned off for whatever reason.

Next, make sure you have a fresh install of Ubuntu 18.04.2 LTS. Then update the kernel to 4.19.46 using ukuu; that is the current official LTS kernel for Linux.

Then, check which CPU frequency governor is active, switch it away from power saving, and enable Turbo Boost (if your CPUs support it).
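On Ubuntu, a quick way to inspect and change this (assuming the kernel exposes the cpufreq sysfs interface; exact paths and available governors vary by driver and kernel version):

```shell
# Show the current frequency governor on each core (skips cores without cpufreq):
for f in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
    [ -f "$f" ] && echo "$f: $(cat "$f")"
done

# Switch all cores to 'performance' (needs root; cpupower ships in linux-tools):
# sudo cpupower frequency-set -g performance

# On Intel CPUs with the intel_pstate driver, 0 here means Turbo Boost is allowed:
if [ -f /sys/devices/system/cpu/intel_pstate/no_turbo ]; then
    cat /sys/devices/system/cpu/intel_pstate/no_turbo
fi
```

The governor setting resets on reboot unless you persist it (e.g. via a systemd unit or the cpufrequtils package), so re-check it after kernel updates.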

When I had to redo our Ubuntu box a few weeks ago due to a login problem, I did all of these and managed to squeeze a few extra FPS out of the system compared to before.

You can install htop and monitor which threads are active. If you run dcp-o-matic with the thread count set to maximum, it should show all threads running close to 100%.

Carsten
Posts: 1358
Joined: Tue Apr 15, 2014 9:11 pm
Location: Germany

Re: Encode Server Farm Optimization

Post by Carsten » Mon Jun 10, 2019 10:33 am

Also, the log in the project directory will show whether DCP-o-matic has detected all (local) cores:

Fri 17 May 15:55:37 2019: DCP-o-matic built in optimised mode.
Fri 17 May 15:55:37 2019: libdcp built in optimised mode.
Fri 17 May 15:55:37 2019: Built for 64-bit
Fri 17 May 15:55:37 2019: CPU: Intel(R) Core(TM) i7-3615QM CPU @ 2.30GHz, 8 processors

The 'processors' value counts logical cores, including Hyper-Threading ones; in this case, it is a 4-core HT CPU.

But I guess there are similar options using standard Linux shell commands. Note that the log is only created on a machine running the main DCP-o-matic, not on the encode servers.
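For the encode servers themselves, the standard tools do show the same information the log would (nproc and lscpu ship with coreutils and util-linux on any Ubuntu install):

```shell
# Count logical processors -- this is what DCP-o-matic's log calls 'processors':
nproc

# Show sockets, cores per socket, and threads per core, to confirm
# all four CPUs and all cores are actually visible to the OS:
lscpu | grep -E '^(CPU\(s\)|Socket|Core|Thread)'
```

On Bill's R820s this should report 4 sockets and, with Hyper-Threading on, twice as many logical CPUs as physical cores; anything less points back at the BIOS settings mentioned above.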

I'd still like to learn the exact CPU types from Bill, as that makes it possible to compute the best-case performance of such a setup, so you know whether or not there is room for improvement.

- Carsten
