View Bug Details

ID: 0002636
Project: DCP-o-matic
Category: Bugs
View Status: public
Last Update: 2023-10-22 21:46
Reporter: carl
Assigned To: carl
Priority: normal
Severity: minor
Reproducibility: N/A
Status: acknowledged
Resolution: open
Target Version: 2.16.x
Summary: 0002636: Check vectorisation of rgb_to_xyz in libdcp
Description

Does it get vectorised? If not, can we re-arrange it? And does doing any of that make any difference in a benchmark?

Tags: No tags attached.
Branch:
Estimated weeks required:
Estimated work required: Undecided

Activities

carl

2023-10-20 01:56

administrator   ~0006038

Changed CXXFLAGS to

conf.env.append_value('CXXFLAGS', ['-O3', '-msse2', '-fopt-info-vec-missed', '-ftree-vectorize', '-fno-trapping-math'])

I think only -O3 is necessary? (In GCC, -O3 already enables loop and SLP vectorisation.)
-fopt-info-vec-missed reports which loops were not vectorised, and why.

Typical benchmark runs are in the region of:

carl@shankly:~/src/libdcp$ time run/benchmark rgb_to_xyz

real 0m11,145s
user 0m9,960s
sys 0m1,180s

The loop is not vectorised; the reason given is "control flow in loop".
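If the clamping is part of that control flow, one option is to write it as min/max selects instead of if statements. A minimal sketch, assuming the values sit in a plain array of double (clamp_branchy and clamp_branchless are hypothetical names, not libdcp functions): both give the same results, but the second has no branches in the source, and GCC can usually lower std::min/std::max to min/max instructions. Whether this alone clears the diagnostic here is untested; -fopt-info-vec-missed will say.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Clamp written with if statements: conditional stores like these are the
// kind of thing that can leave "control flow in loop" if GCC cannot
// if-convert them.
void clamp_branchy(double* v, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i) {
        if (v[i] < 0) {
            v[i] = 0;
        }
        if (v[i] > 1) {
            v[i] = 1;
        }
    }
}

// Same clamp expressed as two selects; the loop body is straight-line code
// with one unconditional store per element.
void clamp_branchless(double* v, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i) {
        v[i] = std::min(std::max(v[i], 0.0), 1.0);
    }
}
```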

Comment out the /* Out gamma LUT */ section and the clamping, and:

carl@shankly:~/src/libdcp$ time run/benchmark rgb_to_xyz

real 0m1,708s
user 0m0,684s
sys 0m1,023s

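Roughly 6.5x quicker in wall-clock time (11.1 s down to 1.7 s; user time drops about 14x, from 10.0 s to 0.68 s), and the diagnostics no longer give a reason why it couldn't vectorise.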

Maybe it's better not to use a LUT for that? It's piecewise and requires a ternary
to decide which part to look in...
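A LUT-free version would evaluate the curve directly, computing both pieces and then choosing between the two values, so the ternary becomes a select between values rather than a jump. This is only a sketch: it uses the sRGB encoding curve as a stand-in because the exact curve behind the output LUT isn't given here, and encode_piecewise and encode_all are hypothetical names. Whether GCC vectorises the loop also depends on a vectorised pow being available (for example glibc's libmvec, which GCC generally only uses with -ffast-math), so it would need benchmarking against the LUT.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Stand-in piecewise curve (sRGB encoding); the real output curve may differ.
inline double encode_piecewise(double v)
{
    // Evaluate both pieces unconditionally...
    double const linear = v * 12.92;
    double const power = 1.055 * std::pow(v, 1.0 / 2.4) - 0.055;
    // ...then pick one, so the ternary chooses between two computed values.
    return v <= 0.0031308 ? linear : power;
}

// Apply the curve to n values in place, with no table lookup.
void encode_all(double* v, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i) {
        v[i] = encode_piecewise(v[i]);
    }
}
```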

carl

2023-10-20 01:56

administrator   ~0006039

Does __restrict__ help?
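One way to test that is on a kernel with no control flow at all. A sketch, assuming interleaved RGB double input and a row-major 3x3 matrix (apply_matrix is a hypothetical function, not libdcp's API). __restrict__ is a GCC/Clang extension promising that the pointers don't alias, which can let the vectoriser drop runtime overlap checks; it does not address the "control flow in loop" reason reported above, so any gain would come on top of fixing that.

```cpp
#include <cassert>
#include <cstddef>

// Apply a row-major 3x3 matrix m to n interleaved RGB triples.
// __restrict__ promises that in, out and m never overlap, so the compiler
// need not assume a store to out can change in or m.
void apply_matrix(
    double const* __restrict__ in,
    double* __restrict__ out,
    double const* __restrict__ m,
    std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i) {
        double const r = in[i * 3 + 0];
        double const g = in[i * 3 + 1];
        double const b = in[i * 3 + 2];
        out[i * 3 + 0] = m[0] * r + m[1] * g + m[2] * b;
        out[i * 3 + 1] = m[3] * r + m[4] * g + m[5] * b;
        out[i * 3 + 2] = m[6] * r + m[7] * g + m[8] * b;
    }
}
```

Comparing timings with and without the __restrict__ qualifiers (and checking -fopt-info-vec output for each) should show whether aliasing was a factor.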

Bug History

Date Modified     Username  Field                    Change
2023-10-19 21:37  carl      New Bug
2023-10-19 21:37  carl      Assigned To              => carl
2023-10-19 21:37  carl      Status                   new => acknowledged
2023-10-20 01:56  carl      Note Added: 0006038
2023-10-20 01:56  carl      Note Added: 0006039
2023-10-22 21:46  carl      Target Version           2.16.67 => 2.16.x
2023-10-22 21:46  carl      Estimated work required  => Undecided