View Bug Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0002636 | DCP-o-matic | Bugs | public | 2023-10-19 21:37 | 2023-10-22 21:46 |
Reporter | carl | Assigned To | carl | | |
Priority | normal | Severity | minor | Reproducibility | N/A |
Status | acknowledged | Resolution | open | | |
Target Version | 2.16.x | | | | |
Summary | 0002636: Check vectorisation of rgb_to_xyz in libdcp | | | | |
Description | Does it get vectorised? If not, can we re-arrange it? And does doing any of that make any difference in a benchmark? | | | | |
Tags | No tags attached. | | | | |
Branch | | | | | |
Estimated weeks required | | | | | |
Estimated work required | Undecided | | | | |
Changed CXXFLAGS to

    conf.env.append_value('CXXFLAGS', ['-O3', '-msse2', '-fopt-info-vec-missed', '-ftree-vectorize', '-fno-trapping-math'])

I think only -O3 is necessary..?

Typical runs of the benchmark are in the region of

    carl@shankly:~/src/libdcp$ time run/benchmark rgb_to_xyz
    real 0m11,145s

Not vectorised because "control flow in loop".

Comment out the "Out gamma LUT" step and the clamping and

    carl@shankly:~/src/libdcp$ time run/benchmark rgb_to_xyz
    real 0m1,708s

Roughly 6.5x quicker on these figures (and no mention in the diagnostics of why it couldn't vectorise).

Maybe it's better not to use a LUT for that? It's piecewise and requires a ternary.
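To make the LUT-versus-ternary question concrete, here is a minimal sketch of the two loop shapes being compared. It is not libdcp's code: `piecewise_curve`, `make_lut`, `convert_lut` and `convert_direct` are hypothetical names, and the sRGB-style curve is only a stand-in for whatever piecewise function the real output step uses. Whether either loop actually vectorises depends on the compiler, its version and the flags, so it would need checking with the diagnostics rather than assuming.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical piecewise transfer curve (sRGB-style encode), used only as a
// stand-in; the real curve in rgb_to_xyz may differ.
inline float piecewise_curve(float v)
{
    // A single ternary: a compiler may turn this into a select/blend instead
    // of a branch, but that is not guaranteed.
    return v <= 0.0031308f ? 12.92f * v
                           : 1.055f * std::pow(v, 1.0f / 2.4f) - 0.055f;
}

// Build a table sampling the curve at `size` evenly spaced points in [0, 1].
// Assumes size >= 2.
std::vector<float> make_lut(std::size_t size)
{
    std::vector<float> lut(size);
    for (std::size_t i = 0; i < size; ++i) {
        lut[i] = piecewise_curve(static_cast<float>(i) / static_cast<float>(size - 1));
    }
    return lut;
}

// Variant A: clamp with explicit branches, then do a data-dependent table
// lookup. Both the branches and the gather-style load can get in the way of
// auto-vectorisation.
void convert_lut(const float* in, float* out, std::size_t n, const std::vector<float>& lut)
{
    const float top = static_cast<float>(lut.size() - 1);
    for (std::size_t i = 0; i < n; ++i) {
        float v = in[i];
        if (v < 0.0f) { v = 0.0f; }
        if (v > 1.0f) { v = 1.0f; }
        out[i] = lut[static_cast<std::size_t>(std::lrint(v * top))];
    }
}

// Variant B: clamp with min/max and evaluate the curve directly, so the loop
// body has no table lookup and only the one ternary.
void convert_direct(const float* in, float* out, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i) {
        const float v = std::min(1.0f, std::max(0.0f, in[i]));
        out[i] = piecewise_curve(v);
    }
}
```

Building both variants with GCC at `-O3 -fopt-info-vec-missed` and timing them on the same buffer would show whether dropping the table helps in practice. The `std::pow` call may itself block vectorisation unless a vector maths library is available, so the direct form is not automatically the faster one.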
Does |
Date Modified | Username | Field | Change |
---|---|---|---|
2023-10-19 21:37 | carl | New Bug | |
2023-10-19 21:37 | carl | Assigned To | => carl |
2023-10-19 21:37 | carl | Status | new => acknowledged |
2023-10-20 01:56 | carl | Note Added: 0006038 | |
2023-10-20 01:56 | carl | Note Added: 0006039 | |
2023-10-22 21:46 | carl | Target Version | 2.16.67 => 2.16.x |
2023-10-22 21:46 | carl | Estimated work required | => Undecided |