View Bug Details

IDProjectCategoryView StatusLast Update
0000400DCP-o-maticFeaturespublic2021-04-21 20:09
Reportercarl Assigned Tocarl  
PriorityimmediateSeverityfeatureReproducibilityhave not tried
Status acknowledgedResolutionopen 
Target Version2.16.0 
Summary0000400: Copy to external distribution drive
Description

Reports on 25/2/21 that it's very slow writing to a 500GB spinning disk on macOS (estimates of 53h to write a 175GB DCP).

Tagsdisk, major, next
Branch
Estimated weeks required8
Estimated work requiredMedium

Relationships

parent of 0001730 resolvedcarl Need to build lwext4 static on Linux or rename the .so 
parent of 0001731 resolvedcarl Improve layout of conversational dialogues in _disk 
parent of 0001738 resolvedcarl Installing the macOS package does not kill an existing _writer process 
parent of 0001739 resolvedcarl macOS _writer needs to start on demand 
parent of 0001740 resolvedcarl Some uninstaller needed for macOS package 
parent of 0001741 resolvedcarl Test macOS with a mounted USB stick, trying to write to a second stick 
parent of 0001742 new Does disk fail with Paragon installed? 
parent of 0001743 resolvedcarl Improve progress estimation in disk 
parent of 0001744 resolvedcarl Unmount fails on Linux when a drive's name has spaces 
parent of 0001745 resolvedcarl disk automated testing 
parent of 0001746 resolvedcarl Progress reporting for disk writes on Linux is bad 
parent of 0001747 new Consider security of disk / disk_writer / nanomsg setup with malicious actors 
parent of 0001916 resolvedcarl disk writer fails to start writing if the disk is asleep 
parent of 0001970 resolvedcarl polkit policy file is not installed for Linux 
parent of 0001969 resolvedcarl If you manually unmount a drive on Linux the disk writer doesn't notice 
parent of 0001968 resolvedcarl Disk writer doesn't escalate privileges to umount drives on Linux 
parent of 0001971 acknowledgedcarl Remove deadly warning from startup of disk writer, or stop distributing disk writer 
child of 0000624 acknowledged Wizard mode 
Not all the children of this issue are yet resolved or closed.

Activities

andrew.levine

2015-02-22 09:16

reporter   ~0000511

Nice idea :-)

carl

2019-03-12 22:19

administrator   ~0003139

https://www.ext2fsd.com/?tag=git
https://github.com/gkostka/lwext4

carl

2019-06-10 22:50

administrator   ~0003373

make mingw
cd build_mingw
make

builds lwext4 for Windows.

carl

2019-06-11 02:01

administrator   ~0003374

Last edited: 2019-12-02 15:36

Code that can make the partition table, filesystem & write a file in git:dcpdist. lwext4 pretty much works on Mac except that the call to ftello(dev_file) in file_dev_open returns 0 rather than the correct partition size - hardcoding that value seems to let it make a valid ext2 filesystem.

mhm

2019-10-25 00:13

reporter   ~0003502

The on disk filesystem should probably be ext3 for maximum compatibility. Please see:
https://www.isdcf.com/ISDCF_Public/HDD_Recommendations.html

Maybe it is already?

carl

2019-11-30 23:32

administrator   ~0003617

Last edited: 2019-12-01 14:30

Unknowns:

  • enumerating available drives (some work in hacks/drives.cc)
  • getting privileges

carl

2019-12-02 15:36

administrator   ~0003622

DoM branch is git:dist

carl

2020-01-20 00:14

administrator   ~0003716

Things to do:

  • Some UI for unmounting filesystems so they can be written to.
  • macOS / Windows support in the UI (getting available drives, and probably some other stuff).

carl

2020-03-16 23:37

administrator   ~0003752

Last edited: 2020-03-17 00:12

Improving the windows DE; exploring

cd ~/src
cdist -p dcpomatic -c dist --environment-version 2.15.x -t windows-64 -w dcpomatic-windows shell
cd dcpomatic-windows/src/dcpomatic
bash platform/windows/copy_deps.sh

Then on a VM with /home mounted as Z:

z:
cd \src\dcpomatic-windows\src\dcpomatic
platform\windows\set_paths.bat
build\src\tools\dcpomatic_dist.exe

carl

2020-04-28 22:05

administrator   ~0003786

Last edited: 2020-04-28 22:09

Seems to work but performance is terrible. Some notes...

The first slow thing is write_bgroups. It uses the block cache to write (I think) the inode and block bitmaps and the group descriptions into each block group. I don't think the block cache helps much here (BICBW). You can make the block cache larger (e.g. 65536) to make write_bgroups instantaneous but that just means you are delaying the write until ext4_block_cache_flush.

The second slow thing is init_bgs(), specifically the code which runs if ext4_bg_has_flag(bg, EXT4_BLOCK_GROUP_BLOCK_UNINIT) and ext4_bg_has_flag(bg, EXT4_BLOCK_GROUP_INODE_UNINIT). I'm not clear exactly what this is about or whether it's necessary.

I think the underlying problem is that writes are randomly spread over the disk and are done in small blocks. For example, the write_bgroups writes block group #0's bitmaps and then the group descriptor for block group #0 in block group 0000001, 0000002, 0000003, .... Then it does the same for group 0000001.

This might be mitigated by the block cache but that never seems to coalesce writes. So when you ext4_block_cache_flush it just dumps the blocks in any old order and does 512-byte fwrites. Sorting and coalescing a large cache seems like it might be slow. Doing that in ext4_block_cache_flush might be the easiest way to test this theory though.

It seems like the cleanest solution might be to remove the block cache altogether and just make sure that the whole of the block group "header" (i.e. group descriptions, block bitmap, inode bitmap, inode table) is written at the same time.

Or maybe ext4_block_cache_flush could walk lba_root, coalesecing and writing?

carl

2020-04-29 00:21

administrator   ~0003787

Last edited: 2020-04-29 00:23

Exploring the walk of lba_root but something is causing validation to fail; now running the same code with the old _flush to see if that fixes it. /dev/disk looks like it might be quicker than /dev/rdisk with the ordered writes.

carl

2020-05-02 18:33

administrator   ~0003793

Strangely, simply selecting a 4096 byte block size speeds up formatting a lot (hours to minutes). I tried most of the things suggested above:

  1. Walk lba_root rather than the other list, so that writes are ordered.
  2. Coalescing writes while doing this
  3. Using /dev/rdisk

and all of them either made no difference or made things worse.

carl

2021-03-03 21:03

administrator   ~0004146

Last edited: 2021-03-05 21:24

Testing again on Mac with a 1TB external (spinning) USB drive, formatting is again very slow. Disk read/write use of about 150KB reported by activity monitor. Stuck in write_bgroups.
The call to write_bgroup_block (every 128 times round the loop) takes a long time.
Each write_bgroup_block call's loop runs 4623 times, each loop is quite quick.
So it looks like O(n^2) here, 4623/128 calls to write_bgroup_block, each does 4623 internal loops.
Every internal loop iteration calls set_direct
carl2-opt branch improves speed of write_bgroups; using /dev/rdisk along with this makes the write_bgroups much faster.
Now init_bgs is slow. Looks like it's walking the disk again initialising block and inode bitmaps - can we not do this in write_bgroups?

carl

2021-03-08 00:19

administrator   ~0004147

Tried to add this to write_bgroups in 6313cdffcc56683effb4d568b274e4ebbbab7353 but it sems to reduce the speed back to where it was; need to check the disk access patterns of the new write_bgroups.

carl

2021-03-08 20:54

administrator   ~0004148

Last edited: 2021-03-08 20:57

Initialising the inode tables requires a lot of writes, and mke2fs does not do it unless lazy_itable_init is switched off (on by default); perhaps this is safe for us not to do also (though it does seem that kernel versions before 2.6.37 don't support lazy_itable_init [the kernel does the initialisation on first mount].

Another option might be to write the empty blocks using ext4_block_writebytes in ext4_fs_init_inode_table so that they're written in one lump. Not sure if this will improve things much.

carl

2021-03-08 22:49

administrator   ~0004149

Another possible optimisation: ext4_block_get_noread's call to ext4_block_cache_shake seems to cause a seek to 4096 and write for each block group when doing a mkfs.

carl

2021-03-09 00:13

administrator   ~0004150

Last edited: 2021-03-09 00:45

The carl2-opt and disk-opt branches do:

  • progress reporting during format
  • don't initialise inode tables
  • use writebytes in a couple of places
  • write block group headers in more sequential/disk-friendly ways

Test case of copying 1.1Gb DCP from internal (spinning) disk to external (spinnnig) 2TB disk via USB on my old mac mini running Catalina.
Total time: 12:04
Format time: 6:09
Write time: 5:16
Verify time: 0:38

Activity monitor reports about 4MB/s data write during write part.
About 33MB/s data read during verify.

Extrapolating to 175GB DCP: 14h
Extrapolating to 75GB DCP: 6h

Should test writing speed of this drive when mac formatted / throughput described by activity monitor; odd that we're seeing 10 times worse write speed. Also need to look again at what blocks are actually getting read/written during the write part.

carl

2021-03-11 23:33

administrator   ~0004151

Did a bit of this looking and made ed2692cd03672bc11e4b6644378f07ce7fc02fc0 which needs testing. On each block we update things like the inode and superblock free blocks count because we turn off caching around sections where we write direct to the disk. Unless I am missing something, this seems unnecessary; it's right that the cache be used just for updating inode descriptors/superblocks etc. without being cleaned out by a large data block write. But I think this is achieved by the metadata-manipulation methods using via-cache write calls and the others using direct calls. BICBW.

carl

2021-03-12 19:32

administrator   ~0004153

That patch seems to make no real difference.

carl

2021-03-12 19:45

administrator   ~0004154

Needless to say macOS can copy the DCP to the save drive (macOS formatted) in about 10s with >100MB write speed reported.

carl

2021-03-12 21:10

administrator   ~0004155

Doing a simple stdio fwrite() in 4k blocks, 1GB, takes 10m57s with /dev/disk and 4m55s with /dev/rdisk suggesting that the ext2 code isn't doing so badly.

Using posix write() to /dev/rdisk, otherwise the same test takes 1m8s ~4x quicker.

POSIX: block_size=4096 blocks=262144 gap=0 rdisk=0 nocache=0 time=64.286583
POSIX: block_size=8192 blocks=131072 gap=0 rdisk=0 nocache=0 time=64.255241
POSIX: block_size=16384 blocks=65536 gap=0 rdisk=0 nocache=0 time=68.197258
POSIX: block_size=32768 blocks=32768 gap=0 rdisk=0 nocache=0 time=68.389603
POSIX: block_size=4096 blocks=262144 gap=0 rdisk=0 nocache=1 time=66.222176
POSIX: block_size=8192 blocks=131072 gap=0 rdisk=0 nocache=1 time=64.539406
POSIX: block_size=16384 blocks=65536 gap=0 rdisk=0 nocache=1 time=64.662483
POSIX: block_size=32768 blocks=32768 gap=0 rdisk=0 nocache=1 time=66.905128
POSIX: block_size=4096 blocks=262144 gap=0 rdisk=1 nocache=0 time=69.026535
POSIX: block_size=8192 blocks=131072 gap=0 rdisk=1 nocache=0 time=35.261715
POSIX: block_size=16384 blocks=65536 gap=0 rdisk=1 nocache=0 time=23.672134
POSIX: block_size=32768 blocks=32768 gap=0 rdisk=1 nocache=0 time=11.774684
POSIX: block_size=4096 blocks=262144 gap=0 rdisk=1 nocache=1 time=69.525856
POSIX: block_size=8192 blocks=131072 gap=0 rdisk=1 nocache=1 time=35.356491
POSIX: block_size=16384 blocks=65536 gap=0 rdisk=1 nocache=1 time=23.697132
POSIX: block_size=32768 blocks=32768 gap=0 rdisk=1 nocache=1 time=11.814719
POSIX: block_size=65536 blocks=16384 gap=0 rdisk=1 nocache=1 time=8.268696
POSIX: block_size=131072 blocks=8192 gap=0 rdisk=1 nocache=1 time=6.767652
POSIX: block_size=262144 blocks=4096 gap=0 rdisk=1 nocache=1 time=7.698212
POSIX: block_size=524288 blocks=2048 gap=0 rdisk=1 nocache=1 time=6.834740

  • nocache seems to make no difference
  • rdisk is much faster, larger blocks are much faster until you get to around 128k then it bottoms out

carl

2021-03-12 21:20

administrator   ~0004156

Format time: 03:07
Copy time: 00:21
Verify time: 00:55

Total 04:21

Bug History

Date Modified Username Field Change
2014-08-11 18:24 carl New Bug
2014-09-13 00:04 carl Assigned To => carl
2014-09-13 00:04 carl Status new => acknowledged
2015-02-22 09:16 andrew.levine Note Added: 0000511
2015-06-12 12:20 carl Severity minor => feature
2015-06-12 16:32 carl Estimated work required => Major
2015-08-27 16:56 carl Target Version => 2.x
2017-05-12 23:11 carl Relationship added parent of 0000624
2019-01-30 00:56 carl Target Version => 2.16.0
2019-03-12 22:19 carl Note Added: 0003139
2019-06-10 22:50 carl Note Added: 0003373
2019-06-11 02:01 carl Note Added: 0003374
2019-10-24 22:57 carl Priority normal => urgent
2019-10-25 00:13 mhm Note Added: 0003502
2019-11-02 22:57 carl Tag Attached: next-2
2019-11-27 16:21 carl Tag Attached: major
2019-11-27 22:01 carl Estimated weeks required => 8
2019-11-30 23:25 carl Note Edited: 0003374
2019-11-30 23:32 carl Note Added: 0003617
2019-12-01 14:30 carl Note Edited: 0003617
2019-12-02 15:29 carl Note Edited: 0003374
2019-12-02 15:36 carl Note Edited: 0003374
2019-12-02 15:36 carl Note Added: 0003622
2019-12-17 20:57 carl Tag Detached: next-2
2019-12-17 20:57 carl Tag Attached: next
2020-01-20 00:14 carl Note Added: 0003716
2020-03-16 23:37 carl Note Added: 0003752
2020-03-17 00:12 carl Note Edited: 0003752
2020-04-12 23:11 carl Relationship added parent of 0001730
2020-04-12 23:12 carl Relationship added child of 0001731
2020-04-12 23:15 carl Relationship replaced parent of 0001731
2020-04-28 22:05 carl Note Added: 0003786
2020-04-28 22:09 carl Note Edited: 0003786
2020-04-29 00:21 carl Note Added: 0003787
2020-04-29 00:23 carl Note Edited: 0003787
2020-05-02 18:33 carl Note Added: 0003793
2020-05-02 18:39 carl Relationship deleted parent of 0000624
2020-05-02 18:39 carl Relationship added child of 0000624
2020-05-02 18:39 carl Relationship added parent of 0001738
2020-05-02 18:40 carl Relationship added parent of 0001739
2020-05-02 18:41 carl Relationship added parent of 0001740
2020-05-02 18:42 carl Relationship added parent of 0001741
2020-05-02 18:43 carl Relationship added parent of 0001742
2020-05-02 20:38 carl Relationship added parent of 0001743
2020-05-02 23:18 carl Relationship added parent of 0001744
2020-05-02 23:23 carl Relationship added parent of 0001745
2020-05-02 23:38 carl Relationship added parent of 0001746
2020-05-13 13:36 carl Relationship added parent of 0001747
2020-05-17 20:47 carl Tag Attached: required
2020-05-17 20:59 carl Tag Detached: required
2020-05-17 20:59 carl Priority urgent => immediate
2020-07-05 23:32 carl Tag Attached: disk
2021-01-07 23:55 carl Estimated work required Major => Medium
2021-02-25 23:00 carl Description Updated
2021-03-03 21:03 carl Note Added: 0004146
2021-03-03 21:03 carl Note Edited: 0004146
2021-03-03 21:08 carl Note Edited: 0004146
2021-03-03 21:11 carl Note Edited: 0004146
2021-03-03 21:12 carl Note Edited: 0004146
2021-03-03 21:18 carl Note Edited: 0004146
2021-03-04 00:36 carl Note Edited: 0004146
2021-03-05 21:12 carl Note Edited: 0004146
2021-03-05 21:13 carl Note Edited: 0004146
2021-03-05 21:24 carl Note Edited: 0004146
2021-03-08 00:19 carl Note Added: 0004147
2021-03-08 20:54 carl Note Added: 0004148
2021-03-08 20:56 carl Note Edited: 0004148
2021-03-08 20:57 carl Note Edited: 0004148
2021-03-08 21:45 carl Relationship added parent of 0001916
2021-03-08 22:49 carl Note Added: 0004149
2021-03-09 00:13 carl Note Added: 0004150
2021-03-09 00:15 carl Note Edited: 0004150
2021-03-09 00:44 carl Note Edited: 0004150
2021-03-09 00:44 carl Note Edited: 0004150
2021-03-09 00:45 carl Note Edited: 0004150
2021-03-11 23:33 carl Note Added: 0004151
2021-03-12 19:32 carl Note Added: 0004153
2021-03-12 19:45 carl Note Added: 0004154
2021-03-12 21:10 carl Note Added: 0004155
2021-03-12 21:20 carl Note Added: 0004156
2021-04-21 20:08 carl Relationship added parent of 0001970
2021-04-21 20:08 carl Relationship added parent of 0001969
2021-04-21 20:08 carl Relationship added parent of 0001968
2021-04-21 20:09 carl Relationship added parent of 0001971