Removing duplicate video file?
-
- Posts: 83
- Joined: Tue Apr 22, 2014 6:08 pm
Removing duplicate video file?
I've just performed an MD5 hash of my dcp-o-matic output. It's showing that I have a duplicate file. One copy is in the 'video' sub-folder - it has a long seemingly randomly generated file name ending in ".mxf". The other is in the sub-folder that goes to the screening room and it's the name of my film with "video.mxf" suffixed. I think it's obvious I have to keep the latter, but I'm uncertain about the other one. The file is 165 GB and so if I don't need both, then it would be good to have that disk space back. Is it safe to delete? TIA
-
- Site Admin
- Posts: 2548
- Joined: Thu Nov 14, 2013 2:53 pm
-
- Posts: 83
- Joined: Tue Apr 22, 2014 6:08 pm
Re: Removing duplicate video file?
Yes it does, Carl. Apologies for being thick - I didn't check the FAQs... I promise I looked in the online manual (in the Generated Files section) before posting, and I searched the forum. I've just re-read chapter 11, and I think the manual text could be expressed differently, if I may be so bold?
Currently, the manual says:
DCP-o-matic generates some working files as it goes along. These are as follows:
log is a list of notes that DCP-o-matic makes as it goes along. This can be useful for debugging purposes if something goes wrong.
metadata stores the settings that you have made for this film: things like cropping, output format and so on.
video is where DCP-o-matic writes the DCP's video data as it encodes it.
analysis is used to keep the results of audio analysis runs.
info contains details of each video frame that DCP-o-matic has written so far. This is used when an encoding operation is interrupted and DCP-o-matic must resume it.
Following this is the DCP itself: DCP-TEST_EN-XX_UK-U_51_2K_CSY_20130218_CSY_OV. This contains some small XML files, which describe the DCP, and two large MXF files, which contain the DCP's audio and video data. This folder (DCP-TEST_EN-XX_...) is what you should ingest, or pass to the cinema which is showing your DCP.
This suggests (to me anyway) that the video folder content and that of the DCP itself (where the 'second' copy of the data is stored) are two different files, one the finished video and the other a working file. May I suggest that the FAQ text is incorporated into the manual page?
Can I also check that when I copy the 'whole' DCP directory to another drive, say for backup purposes, most operating systems will in effect copy the same file over twice (overwriting the first copy)? I use Linux and Windows, so I potentially need to exclude one of the 'duplicates' from being copied if I'm to speed up copying. I copy my original file using my Linux system from an Ext4 disk to an NTFS-formatted drive, and I've come across an interesting issue if this is the case: my MD5 software has flagged one of the duplicates as not being identical to its corresponding version on my backup, but it reports the other 'version' of the same file to be identical on both the original and the backup. Of course there are potentially many reasons for this, including the MD5 software not working properly, a read error on one read and not on the second iteration, or the files being stored twice (once incorrectly on the original or the backup). This mismatch happens quite often, though, on large files involving different source and target drives (and I've used different checking software, so I think I can discount software issues). Although it is not a dcp-o-matic issue, it really concerns me that the OS does not seem to spot the issue when copying, if indeed it exists at the time.
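For anyone wanting to cross-check a copy independently of a particular MD5 tool, here is a minimal Python sketch that hashes the source and the backup in chunks and compares the digests. The function names `file_digest` and `verify_copy` are made up for illustration and are not part of DCP-o-matic. Bear in mind that a check run immediately after copying may read the destination from the operating system's cache rather than from the disk, so re-running it after remounting the drive is a more convincing test.

```python
import hashlib


def file_digest(path, algorithm="md5", chunk_size=1024 * 1024):
    """Hash a file in fixed-size chunks so a large MXF never has to fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until read() returns b"" (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_copy(source, backup, algorithm="md5"):
    """Return True when the source and backup files have the same digest."""
    return file_digest(source, algorithm) == file_digest(backup, algorithm)
```

Passing `algorithm="sha256"` gives a second, independent digest if a mismatch needs confirming.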
I did find the FAQs useful, having now looked at them. As I understand it, in respect of a completed DCP, the contents of the info and the j2c folders can be deleted (in my case, the former contains 144,000+ small .md5 files). If I can delete these, then copying time, and also data security, will vastly improve. I find both NTFS and Ext4 not great at handling that number of files in a single directory. Even if I were only able to tar.gz them, I think copying time would be vastly improved.
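As a rough illustration of the tar.gz idea, the sketch below packs a directory of many small files into one compressed archive using Python's standard `tarfile` module. The function name `archive_directory` is hypothetical, and whether the info folder is worth keeping at all is a separate question.

```python
import tarfile
from pathlib import Path


def archive_directory(directory, archive_path):
    """Pack a directory into one gzip-compressed tar file.

    Returns the number of archive members (the directory itself plus
    everything inside it), which is handy as a quick sanity check.
    """
    directory = Path(directory)
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname stores paths relative to the directory name, not the full path.
        tar.add(directory, arcname=directory.name)
    with tarfile.open(archive_path, "r:gz") as tar:
        return len(tar.getmembers())
```

Copying one archive instead of many thousands of tiny files avoids most of the per-file overhead on the target filesystem.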
My understanding is that, to generate a KDM, dcp-o-matic only needs access to the metadata.xml file in the directory immediately above the directory in which the DCP itself is stored, so can I also delete the ffprobe.log and the log files?
-
- Posts: 2804
- Joined: Tue Apr 15, 2014 9:11 pm
- Location: Germany
Re: Removing duplicate video file?
A lot of beginners think that the DCP is the whole directory structure that is created by DCP-o-matic. A while ago we made the DCP easier to find with the 'Show DCP' drop-down menu entry. It opens the project folder and highlights the most recent DCP folder. You only need to copy this folder to the distribution media.
When using a filesystem with hard link capability, the video-mxf in the video directory and the one in the DCP directory are essentially the same file, to save space.
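To check whether two names really are hard links to one file rather than two separate copies, something like the following Python sketch can be used. The helper names `same_underlying_file` and `link_count` are made up for illustration; they only wrap standard-library calls.

```python
import os


def same_underlying_file(path_a, path_b):
    """True when both names refer to the same file on disk (e.g. hard links)."""
    return os.path.samefile(path_a, path_b)


def link_count(path):
    """Number of directory entries (hard links) that point at this file."""
    return os.stat(path).st_nlink
```

If `same_underlying_file` returns True for the MXF in the video directory and the one in the DCP folder, the data is stored only once, so a duplicate finder that compares content will still report a match, and deleting one of the two names frees no space while the other still exists.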
Whether you want to keep the remaining files for archive purposes or not is up to you. In most cases it makes no sense, at least not after you have created a complete version of your DCP. I usually copy only the DCP folder to a transport disk or stick, then delete everything except the metadata XML file, so I can quickly create a new version without setting up everything from scratch.
- Carsten
-
- Posts: 1
- Joined: Wed Apr 19, 2017 9:14 am
Re: Removing duplicate video file?
Try Duplicate Files Deleter as it can remove any problems that you might have.