General¶
Object Processing¶
Commits¶
- Web users provide the system with metadata on objects as information basis for the distribution of money
- The system needs to balance the need of web users to change the metadata and the stability of the information basis
- Artists, Releases, Creations have to be committed by the web users as a conscious act
- No data is published before a commit
- Web users are able to distinguish public/private fields after a commit
- A commit of a set of objects (all fields, public fields) is saved as rendered text for evidence
- Creations do not need to have content to be committable
- Content with a relation to a Creation is autocommitted, when the creation is committed
- Before a commit, web users may edit the object freely
- Committing an object implies committing all objects down the hierarchy (see Cascades)
- After a commit, web users may
- edit only data not relevant for the distribution or subject to frequent changes:
- adding/removing members of an Artist
- adding a Release to a Creation
- changing Release metadata
- trigger a dispute request for changing relevant data:
- deleting an object (implies creation of duplicate for references)
- adding/removing a contributor to a Creation
- adding/removing an original/derivative Creation
- adding/removing licenses to Releases of Creations
- adding/removing the Content assigned to a Creation
- removing a Release from a Creation
- After a commit, administrators may
- revise the object(s)
- reject the object(s)
- A successful claim dispute (see Workflows) causes the object to be uncommitted
- Uncommitted objects should be recommitted as soon as possible by the web user
- if the artist name or creation title is changed on recommit, all admins of referencing objects are informed via email/webinterface
- if a referenced object is deleted before recommit, a duplicate for all references is created and referenced instead
- Created / Uncommitted objects
- are promoted on the web user dashboard
- are excluded from searches (via add/list/etc), except
- web users can add own uncommitted objects to other own uncommitted objects
- web users can see but not add own uncommitted objects to other committed objects
Disputes¶
- A dispute is the process of mediation in a conflict between web users
- A dispute has a
- code: Sequence
- state: Selection (requested, assigned, resolved)
- case: Selection (list of usecases?)
- object: Reference (to the disputed objects)
- assignee: User
- request_party: Party
- request_text: Text (user statement)
- request_time: DateTime
- resolved_time: DateTime
- comments: Comment (many comments by administrators)
- A dispute comment has a
- dispute: Dispute
- text: Text
- time: DateTime
- A dispute may be triggered by a web user on several occasions
- A dispute can be requested, assigned and resolved (see Workflows)
- A dispute is requested via a dispute form including
- The disputed object
- The issue category
- The user motivation
- A dispute is handled by an assignee
- The assignee can add many comments
- The assignee can mark the dispute as resolved
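The fields and states listed above could be modelled roughly as follows. This is a minimal sketch, not the actual model: the plain string references (`obj`, `assignee`, `request_party`), the `TRANSITIONS` table and the rule that only an assigned dispute accepts comments are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional

# Allowed state changes, derived from the states requested -> assigned -> resolved.
TRANSITIONS = {"requested": "assigned", "assigned": "resolved"}


@dataclass
class DisputeComment:
    text: str
    time: datetime


@dataclass
class Dispute:
    code: str
    case: str
    obj: str  # reference to the disputed object (hypothetical string id)
    request_party: str
    request_text: str
    request_time: datetime
    state: str = "requested"
    assignee: Optional[str] = None
    resolved_time: Optional[datetime] = None
    comments: List[DisputeComment] = field(default_factory=list)

    def _advance(self, target: str) -> None:
        # Refuse any state change that is not the next step in TRANSITIONS.
        if TRANSITIONS.get(self.state) != target:
            raise ValueError(f"cannot go from {self.state} to {target}")
        self.state = target

    def assign(self, user: str) -> None:
        self._advance("assigned")
        self.assignee = user

    def comment(self, text: str, now: datetime) -> None:
        # Assumption: comments are only added while the dispute is assigned.
        if self.state != "assigned":
            raise ValueError("only an assigned dispute can be commented")
        self.comments.append(DisputeComment(text, now))

    def resolve(self, now: datetime) -> None:
        self._advance("resolved")
        self.resolved_time = now
```

Keeping the transitions in one table means an invalid jump (for example resolving a dispute that was never assigned) fails loudly instead of silently changing the state.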
Licensing¶
Tariff System¶
- A creation can have several tariff categories represented by different collecting societies
Collection¶
Allocation¶
- Invoice
Declaration¶
Utilisation¶
- Tariff
- UtilisationCreationList (Usecases)
- UtilisationIndicator (Base, Relevance, Adjustments)
- ContextIndicator
- ...
Indicators¶
- UtilisationIndicator
- ContextIndicators (for each tariff)
Repertoire¶
Rights Management¶
- A rightsholder
- must hold a right
- Copyright
- Ancillary Copyright
- must have an object to which the right belongs
- Creation
- Release
- must have a contribution depending on the object and right
- Creation
- Copyright
- Lyrics
- Composition
- Ancillary Copyright:
- Instrument
- Production
- Mixing
- Mastering
- Release
- Copyright
- Artwork
- Text
- Layout
- Ancillary Copyright
- Production
- Mixing
- Mastering
- may have a start and end date
- may be restricted to a territory
- may have a successor
- may be represented by a collecting society
- may have a list of instruments for Creation -> Ancillary Copyright -> Instrument
- Rightsholder subjects for creations and releases are Artists
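The dependency of a contribution on the object and the right can be written down as a lookup table. The sketch below only restates the combinations listed above; the names `CONTRIBUTIONS` and `is_valid_contribution` are hypothetical.

```python
# Allowed contributions per (object type, right), as listed in this section.
CONTRIBUTIONS = {
    ("Creation", "Copyright"): {"Lyrics", "Composition"},
    ("Creation", "Ancillary Copyright"): {
        "Instrument", "Production", "Mixing", "Mastering"},
    ("Release", "Copyright"): {"Artwork", "Text", "Layout"},
    ("Release", "Ancillary Copyright"): {"Production", "Mixing", "Mastering"},
}


def is_valid_contribution(object_type, right, contribution):
    """Return True if the contribution is allowed for this object and right."""
    return contribution in CONTRIBUTIONS.get((object_type, right), set())
```

For example, `is_valid_contribution("Creation", "Copyright", "Lyrics")` is accepted, while "Lyrics" on a Release is not.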
Object processing¶
Claims¶
- Artists, Releases, and Creations are unclaimed, offered or claimed
- Unclaimed objects are objects that do not belong to any web user
- Offered objects are objects that might belong to a web user and are offered to be claimed by that web user (e.g. on registration)
- Claimed objects are objects that "belong" to at least one web user
- Web users can (see Workflows) claim
- unclaimed and offered objects in general
- a solo artist for the current web user
- a group artist for a solo artist
- a compilation release for a solo artist (role: producer)
- a split/artist release for an artist
- a creation for an artist
- A claim implies a request for admin rights, where applicable
- Revised objects are visually promoted on the object list/details
- Unclaiming/Claiming/Revising an object implies unclaiming/claiming/revising all objects down the hierarchy (see Cascades)
- Claiming an unclaimed object results in an uncommitted object
- Claiming a claimed object
- results in a dispute
- may result in an uncommitted object, if the disputing party proves to be right
Foreign objects¶
- During object creation, a web user may create several new, unclaimed foreign objects
- When a web user creates a new Artist, he may create many member Artists specified by a name and an email
- When a web user creates a new Creation, he may create
- many contributor Artists specified by group (yes/no), a name and an email
- many original/derivative Creations specified by the creation and the artist name, resulting in
- a new/referenced Artist object: artist name
- a new Creation object: Artist
- many track Creations specified by the creation and the artist name, resulting in
- a new/referenced Artist object: artist name
- a new Creation object: Artist
- Foreign objects are autocommitted
- Foreign objects may be added by others
- resulting in a duplicate for information separation
- referencing the duplicated foreign object for deduplication
- Foreign objects are editable
- by the web user who created the foreign object, if
- the object was created by the web user
- the object is not claimed
- the object has not been part of a distribution yet
- by the object admin of the object for which the foreign object was created
File Processing¶
Intermediate storage for archive¶
- Files are stored on an intermediate file server
- For every stage of a file there is a corresponding folder STAGE
- For each change of processing state, all files are moved into the next STAGE folder
- For each folder STAGE there is a folder per USER
- Folder structure for optimized filesystem access
- Semantic information for manual administrative interventions
- The filenames of the files are
- the HASH of the filename for temporary uploads
- the UUID of the content in the database for all other stages
- For each file [UUID|HASH] there is a file [UUID|HASH].checksums
- Content: Checksums for each single upload chunk
- Format: CSV (begin, end, algorithm, checksum)
- For each file [UUID|HASH] there is a file UUID.checksum
- Content: Checksum of the whole file
- Format: Plain text
- The full syntax for a file is
./STAGE/USER/UUID(.checksum(s))
./temporary/USER/HASH(.checksum(s))
- Examples of file paths
./temporary/4/82d5582443e9f8d35d3ec798662255e46e9e8138c290b626a74a3bee9382d430
./temporary/4/82d5582443e9f8d35d3ec798662255e46e9e8138c290b626a74a3bee9382d430.checksums
./uploaded/4/35f0c169-6594-4bf3-b285-451d2aa8c61e
./uploaded/4/35f0c169-6594-4bf3-b285-451d2aa8c61e.checksums
./previewed/4/35f0c169-6594-4bf3-b285-451d2aa8c61e
./checksummed/4/35f0c169-6594-4bf3-b285-451d2aa8c61e
./checksummed/4/35f0c169-6594-4bf3-b285-451d2aa8c61e.checksum
./checksummed/4/35f0c169-6594-4bf3-b285-451d2aa8c61e.checksums
./fingerprinted/4/35f0c169-6594-4bf3-b285-451d2aa8c61e
./dropped/4/35f0c169-6594-4bf3-b285-451d2aa8c61e
./rejected/4/35f0c169-6594-4bf3-b285-451d2aa8c61e
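The path syntax above can be expressed as a small helper. This is a sketch under the assumption that paths are built as plain strings relative to the storage root; the function name `intermediate_path` is hypothetical.

```python
ALLOWED_SUFFIXES = ("", ".checksum", ".checksums")


def intermediate_path(stage, user, name, suffix=""):
    """Build ./STAGE/USER/NAME(.checksum(s)) for the intermediate storage.

    `name` is the HASH for the temporary stage and the UUID for all others.
    """
    if suffix not in ALLOWED_SUFFIXES:
        raise ValueError(f"unexpected suffix: {suffix!r}")
    return f"./{stage}/{user}/{name}{suffix}"
```

Called with `("checksummed", 4, "35f0c169-6594-4bf3-b285-451d2aa8c61e", ".checksum")`, it reproduces the corresponding example path listed above.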
Permanent storage for user content¶
- User content is stored on a permanent file server
- For every content type of a file there is a corresponding folder CONTENTTYPE (e.g. 'previews')
- For each folder CONTENTTYPE there is a folder per USER
- Folder structure for optimized filesystem access
- Semantic information for manual administrative interventions
- The filenames of the files are
- the UUID of the content in the database
- The full syntax for a file is
./CONTENTTYPE/USER/UUID
- Examples of file paths
./previews/4/35f0c169-6594-4bf3-b285-451d2aa8c61e
./excerpts/4/35f0c169-6594-4bf3-b285-451d2aa8c61e
Stages¶
- For an overview of the workflow, see Workflow
- Each stage of file processing results in a corresponding processing state
- Upload (uploaded)
- Process
- Preview (previewed)
- Checksum (checksummed)
- Fingerprint (fingerprinted)
- Drop (dropped)
- Archive (archived)
- There is a special state for user requested deletions (tobedeleted)
- Before a file is being processed at any stage, a file .lock will be created to signal other processes to skip the file. The lockfile will be deleted after the file has been moved to the next stage folder.
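One common way to implement such a lock file is an exclusive create, which fails if the lock already exists. The sketch below assumes the lock is named after the processed file with a `.lock` suffix and that the caller moves the file to the next stage folder inside the `with` block; `file_lock` is a hypothetical helper.

```python
import os
from contextlib import contextmanager


@contextmanager
def file_lock(path):
    """Yield True if the lock was acquired, False if the file must be skipped."""
    lock_path = path + ".lock"  # assumed naming convention
    try:
        # O_EXCL makes the creation fail if another process holds the lock.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        yield False
        return
    os.close(fd)
    try:
        yield True
    finally:
        # Released only after the caller has finished, e.g. after the move.
        os.remove(lock_path)
```

A processing script would wrap each file in `with file_lock(path) as acquired:` and continue with the next file when `acquired` is false.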
Upload a file¶
- Users are only allowed to upload content that belongs to them
- Upload of a file in chunks
- The chunk size is 1 MiB (1024*1024 bytes)
- The chunk position is given by the header Content-Range (chunk start, chunk end, total size)
- A HASH of the user-given filename is used as temporary filename
- The file is stored in
./storage/temporary/USER/HASH
- For each uploaded chunk
- A checksum of the chunk is calculated, while the chunk is still in RAM
- The checksum is appended to ./storage/temporary/USER/HASH.checksums (CSV: begin, end, algorithm, checksum)
- The database is queried for a duplicate of the checksum for early abuse detection
- The chunk checksum collisions are tracked in the user session
- Certain checksums are whitelisted (e.g. silence in different formats with/without headers)
- Above a configurable threshold (e.g. 150), the user upload is restricted temporarily
- The threshold violations are tracked in the database
- The chunk is appended to
./storage/temporary/USER/HASH
- When the upload is finished
- A Uuid is generated
- If the validation of the file extension or MIME type fails, further processing is aborted
- The files are moved to
./storage/rejected/USER/UUID(.checksums)
- The Content is saved to the database (processing state: rejected, rejection reason: format_error, path)
- The files are moved to
./storage/uploaded/USER/UUID(.checksums)
- The Content is saved to the database (processing state: uploaded, path, storage_hostname)
- The Checksums are saved to the database
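The per-chunk steps above (parse `Content-Range`, checksum the chunk in memory, append the CSV row, append the chunk) might look roughly like this. It is a sketch only: the function names, the choice of SHA-256 and the standard HTTP header form `bytes begin-end/total` with an inclusive end are assumptions, and the database lookup for duplicate checksums is left out.

```python
import csv
import hashlib
import re

CHUNK_SIZE = 1024 * 1024  # 1 MiB, as stated above
_RANGE = re.compile(r"^bytes (\d+)-(\d+)/(\d+)$")


def parse_content_range(header):
    """Return (begin, end, total) from a Content-Range header value."""
    match = _RANGE.match(header)
    if not match:
        raise ValueError(f"malformed Content-Range: {header!r}")
    begin, end, total = (int(value) for value in match.groups())
    if begin > end or end >= total:
        raise ValueError(f"inconsistent Content-Range: {header!r}")
    return begin, end, total


def store_chunk(temp_path, header, chunk, algorithm="sha256"):
    """Checksum one chunk, log it as CSV and append it to the temporary file.

    Returns True when this chunk completes the upload.
    """
    begin, end, total = parse_content_range(header)
    if len(chunk) != end - begin + 1 or len(chunk) > CHUNK_SIZE:
        raise ValueError("chunk length does not match Content-Range")
    # The checksum is calculated while the chunk is still in memory.
    checksum = hashlib.new(algorithm, chunk).hexdigest()
    with open(temp_path + ".checksums", "a", newline="") as handle:
        csv.writer(handle).writerow([begin, end, algorithm, checksum])
    with open(temp_path, "ab") as handle:
        handle.write(chunk)
    return end + 1 == total
```

Writing the checksum row before appending the chunk keeps the order of the steps listed above; a real implementation would also have to decide how to recover from a failure between the two writes.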
Create a preview¶
- For each file in
./storage/uploaded
- The file is locked during processing
- If the associated Content (processing state, processing hostname, storage hostname) is not valid, further processing is aborted
- The files are moved to
./storage/unknown/(USER/)UUID(.checksums)
- The Content is updated, if possible (processing state: unknown, path)
- An excerpt for analysis and statistics is taken and stored in
./content/excerpt/UUID[1]/UUID[2]/UUID
- Length: 60 seconds out of the middle of the file
- Quality: Minimum for fingerprint recognition (11025 Hz, 16 bit, mono)
- A preview is created and stored in
./content/previews/UUID[1]/UUID[2]/UUID
- Quality: Minimum for acceptable user experience (12bit, mono, 16kHz, ogg)
- Configuration: fade in, fade out, segment interval, segment length, segment crossfade
- If preview or excerpt creation fails, further processing is aborted
- The files are moved to
./storage/rejected/USER/UUID(.checksums)
- The Content is rejected (rejection reason: format_error, path)
- The files are moved to
./storage/previewed/USER/UUID(.checksums)
- The audio properties are saved to the database (length, channels, sample rate, sample width)
- The Content is updated (processing state: previewed, path)
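Taking 60 seconds out of the middle of a file comes down to simple arithmetic on the duration. The sketch below shows one way to compute the window; the handling of files shorter than the excerpt length (use the whole file) is an assumption, as is the name `excerpt_window`.

```python
def excerpt_window(duration, length=60.0):
    """Return (start, end) in seconds of an excerpt centred in the file."""
    if duration <= length:
        # Assumption: short files are used in full.
        return 0.0, duration
    start = (duration - length) / 2
    return start, start + length
```

For a file of 200 seconds this yields the window from second 70 to second 130.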
Calculate a checksum¶
- For each file in
./storage/previewed
- The file is locked during processing
- If the associated Content (processing state, processing hostname, storage hostname) is not valid, further processing is aborted
- The files are moved to
./storage/unknown/(USER/)UUID(.checksums)
- The Content is updated, if possible (processing state: unknown, path)
- Checksums for each chunk of 1MiB (1024*1024 Byte) are calculated and saved to the database, if not present
- A checksum for the whole file is calculated and stored in
./storage/previewed/USER/UUID.checksum
- If the checksum is already present in the database, further processing is aborted
- The preview and excerpt are deleted
- The files are moved to
./storage/rejected/USER/UUID(.checksum(s))
- The Content is rejected (rejection reason: checksum_collision, duplicate of: Content, path)
- The files are moved to
./storage/checksummed/USER/UUID(.checksum(s))
- The Checksum is saved to the database
- The Content is updated (processing state: checksummed, path)
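Both kinds of checksums can be produced in a single pass over the file. This is a minimal sketch assuming SHA-256 and inclusive byte offsets matching the CSV columns (begin, end, algorithm, checksum); `file_checksums` is a hypothetical name, and saving the results to the database is not shown.

```python
import hashlib

CHUNK_SIZE = 1024 * 1024  # 1 MiB


def file_checksums(path, algorithm="sha256", chunk_size=CHUNK_SIZE):
    """Return (whole_file_checksum, [(begin, end, algorithm, checksum), ...])."""
    whole = hashlib.new(algorithm)
    chunks = []
    begin = 0
    with open(path, "rb") as handle:
        while True:
            data = handle.read(chunk_size)
            if not data:
                break
            whole.update(data)  # running checksum of the whole file
            end = begin + len(data) - 1  # inclusive offset of the last byte
            digest = hashlib.new(algorithm, data).hexdigest()
            chunks.append((begin, end, algorithm, digest))
            begin = end + 1
    return whole.hexdigest(), chunks
```

Reading the file once keeps memory use bounded by the chunk size, regardless of the file size.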
Calculate a fingerprint¶
- For each file in
./storage/checksummed
- The file is locked during processing
- If the associated Content (processing state, processing hostname, storage hostname) is not valid, further processing is aborted
- The files are moved to
./storage/unknown/(USER/)UUID(.checksums)
- The Content is updated, if possible (processing state: unknown, path)
- The fingerprint is created
- If the fingerprint is already present in the database, further processing is aborted
- The preview and excerpt are deleted
- The files are moved to
./storage/rejected/USER/UUID(.checksum(s))
- The Content is rejected (rejection reason: fingerprint_collision, duplicate of: Content, path)
- The fingerprint is ingested into the database (primary key: Content Uuid)
- A FingerprintLog is saved to the database (timestamp, user, algorithm, version)
- The files are moved to
./storage/fingerprinted/USER/UUID(.checksum(s))
- The Content is updated (processing state: fingerprinted, path)
Drop a file¶
- For each file in
./storage/fingerprinted
- The file is locked during processing
- If the associated Content (processing state, processing hostname, storage hostname) is not valid, further processing is aborted
- The files are moved to
./storage/unknown/(USER/)UUID(.checksums)
- The Content is updated, if possible (processing state: unknown, path)
- The files are moved to
./storage/dropped/USER/UUID(.checksum(s))
- The Content is updated (processing state: dropped, path)
Archive a file¶
- For further details, see
- For each file in
./storage/dropped
- The file is locked during processing
- If the associated Content (processing state, processing hostname, storage hostname) is not valid, further processing is aborted
- The files are moved to
./storage/unknown/(USER/)UUID(.checksum(s))
- The Content is updated, if possible (processing state: unknown)
- The files are moved to
./storage/dropped.closed/UUID(.checksum(s))
- The Storehouse target LOCATIONs are copied from ./storage/targets/ to ./storage/dropped.closed.targets/
- Until there is no LOCATION in ./storage/dropped.closed.targets/ left
- Until the checksums of the whole files are valid on the target machine
- The files UUID(.checksum(s)) in ./storage/dropped.closed/ are copied to LOCATION:./UUID[1]/UUID[2]/
- The target location ./storage/dropped.closed.targets/LOCATION is deleted
- The target location folder ./storage/dropped.closed.targets/ is deleted
- The Content is updated (processing state: archived, archive: Archive, path)
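The two nested "until" loops above amount to a copy that is repeated until the checksum on the target matches, once per target location. The sketch below illustrates this with local directories standing in for the remote storehouse machines; the retry limit, SHA-256 and the function names are assumptions, and the real transfer mechanism is not specified here.

```python
import hashlib
import os
import shutil


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(block)
    return digest.hexdigest()


def copy_verified(source, target_dir, max_attempts=3):
    """Copy a file and repeat until the checksum on the target is valid."""
    os.makedirs(target_dir, exist_ok=True)
    target = os.path.join(target_dir, os.path.basename(source))
    expected = sha256_of(source)
    for _ in range(max_attempts):
        shutil.copyfile(source, target)
        if sha256_of(target) == expected:
            return target
    raise OSError(f"checksum mismatch after {max_attempts} attempts: {target}")


def archive(source, locations):
    """Copy the file to every target location, removing each one when done."""
    pending = list(locations)
    archived = []
    while pending:  # until no LOCATION is left
        archived.append(copy_verified(source, pending[0]))
        pending.pop(0)  # stands in for deleting the target LOCATION file
    return archived
```

Bounding the retries is a deliberate deviation from a literal "until valid" loop, so that a permanently failing target surfaces as an error instead of blocking the archiving run.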
Delete a file¶
- A user may request the deletion of uncommitted Content
- The corresponding files may only be deleted if the Content is in the state 'uploaded' or 'rejected'
- Furthermore, the deletion request needs to trigger the deletion of
- the Content object
- the preview and excerpt file
- the entry in the echoprint server referencing the deleted Content object (maybe decentralized)
Archiving¶
Objects¶
Archiving¶
- The archiving process is coordinated via objects in the Database shown here as diagram
- Physical: Storehouse, Harddisk, Filesystem, Content
- Logical: HarddiskLabel, FilesystemLabel
- Integrity: Checksum, HarddiskTest
- The archiving objects are administered via the Tryton client and scripts
Storehouse¶
Physical storage location
- has a code
- has an admin user
- may have a detailed description
- may have many Harddisks
Harddisk¶
Physical harddisks
- has uuids (host, harddisk)
- has checksums (harddisk)
- has a HarddiskLabel
- has a Storehouse
- has a version (for Harddisks with the same HarddiskLabel per Storehouse)
- has a closed state
- has an on-/offline state
- has a usage state
- has a creator (tryton user)
- has a function to generate a label sticker
- may have a local position (e.g. "Shelf1")
- may have many Filesystems
- may have many HarddiskTests
- may have a health state (result of the tests)
Filesystem¶
Filesystems on a harddisk
- has uuids (partition, raid, raid sub, crypto, lvm, filesystem)
- has checksums (partition, raid, raid sub, crypto, lvm, filesystem)
- has a FilesystemLabel
- has a Harddisk
- has a closed state
- has partitioning information (partition number)
- has raid information (raid type, raid number, raid total)
Content¶
Contents on a filesystem
- has a file
- may have one FilesystemLabel
HarddiskLabel¶
Label for Harddisks containing the same Filesystems
- has a code
- may have many Harddisks
FilesystemLabel¶
Label for Filesystems containing the same Contents
- has a code
- may have many Filesystems
- may have many Contents
Checksum¶
Checksum, e.g. sha256
- has a timestamp
- has a begin (first Byte)
- has an end (last Byte)
- has an algorithm
HarddiskTest¶
Integrity tests of harddisks
- has a timestamp
- has a user, which performed the test
- has a health state (sane + error for each checksum of Harddisk and Filesystem)
Identification with uuids¶
- A uuid is a Universally Unique Identifier
- Harddisks, Filesystems and Content are identified with uuids
- A Harddisk is identified with a combination of the uuids
- Host
- Harddisk
- A Filesystem is identified with a combination of the uuids
- Partition
- Raid
- Raid Sub
- Crypto
- LVM
- Filesystem
- The uuids of a Harddisk and a contained Filesystem are strictly hierarchical:
- Host > Harddisk > Partition > Raid > Raid Sub > Crypto > LVM > Filesystem
- A Content is identified with exactly one uuid
Integrity tests with checksums¶
- When a Harddisk is finalized, the following Checksums are saved into the database
- Checksum of the Harddisk: Harddisk
- Checksums of the Filesystem: Partition, Raid, Raid Sub, Crypto, LVM, Filesystem
- For each Content the following Checksums are saved to database
- Checksum for the whole file
- Checksums for each upload chunk
- All checksums are additionally saved on the harddisk as a second independent source
- Harddisk/Filesystem: on a metadata partition
- File/Chunks: on the filesystem
- A checksum of the metadata partition is saved only on the harddisk
- Regularly scheduled (e.g. biannual) integrity tests ensure the long-term integrity of all Harddisks
- For each test
- Check of uuids
- Check of checksum of metadata partition
- Activation of crypto and raid
- Check of checksums of filesystems /dev/disk/by-uuid/...
- On error:
- Feedback Admin
- Write HarddiskTest (state: error_filesystem)
- Check of checksums of all files
- Write HarddiskTest (state: sane)
- An error might be tracked down to the resolution of the upload chunks, if necessary
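Tracking an error down to the upload chunks can be done by re-checking every row of the chunk checksum file against the archived file. This is a sketch assuming the CSV format (begin, end, algorithm, checksum) with inclusive byte offsets; `corrupted_chunks` is a hypothetical helper and not part of the described scripts.

```python
import csv
import hashlib


def corrupted_chunks(path, checksums_path):
    """Return the (begin, end) ranges whose stored checksum no longer matches."""
    bad = []
    with open(path, "rb") as data, open(checksums_path, newline="") as handle:
        for begin, end, algorithm, checksum in csv.reader(handle):
            begin, end = int(begin), int(end)
            data.seek(begin)
            chunk = data.read(end - begin + 1)  # end offset is inclusive
            if hashlib.new(algorithm, chunk).hexdigest() != checksum:
                bad.append((begin, end))
    return bad
```

An empty result means every chunk is intact; otherwise the returned byte ranges narrow the damage down to 1 MiB blocks.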
Archiving of the files¶
- The files associated with a Content are stored on a filesystem
- Each file is named after the Content Uuid:
UUID
- Each file is associated with two other files by convention:
- UUID.checksum: checksum of the whole file
- UUID.checksums: checksums of all upload chunks
- Each file is stored in the folder
./UUID[0]/UUID[1]/
- Folder structure for optimized filesystem access
- No semantic information (e.g. USER) to avoid the need for corrections
- The full syntax for a file is
./UUID[0]/UUID[1]/UUID(.checksum(s))
- Examples of file paths
./3/5/35f0c169-6594-4bf3-b285-451d2aa8c61e
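The archive layout can be derived from the Content uuid alone. The following sketch takes the first two characters of the uuid as folder names, which matches the example path above; `archive_paths` is a hypothetical helper.

```python
def archive_paths(uuid):
    """Return the paths of the file, its .checksum and its .checksums file."""
    base = f"./{uuid[0]}/{uuid[1]}/{uuid}"
    return base, base + ".checksum", base + ".checksums"
```

Because no USER component is part of the path, a file never has to be moved when ownership information changes.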
Orchestration of the archiving¶
- Files on an intermediate storage may be archived in many storehouses
- The URLs to the storehouse machines are stored in STOREHOUSECODE target files
- The mappings of all intermediate storages to storehouses are administered on a central orchestration machine
- Syntax:
./STORAGE/STOREHOUSECODE
- Example:
./storage001/DÜ1
- The mapping of a single intermediate storage to storehouses is mirrored to this intermediate storage
- Syntax:
./storage/targets/STOREHOUSECODE
- Example:
./storage/targets/DÜ1
- The files on the intermediate storage are synchronized with the files on the orchestration machine for orchestration
Human readable label sticker¶
- For each Harddisk, a label sticker may be generated
- Header:
PURPOSE-STOREHOUSECODE-CONTAINERLABELCODE (RAIDTYPE: RAIDNUMBER/RAIDTOTAL)
- PURPOSE: "UCR" ("User Content Repertoire")
- STOREHOUSECODE: Code of the Storehouse, e.g. "DÜ1" for the first storehouse in Düsseldorf
- CONTAINERLABELCODE: Incremental, padded to 5 digits, e.g. 00001
- RAIDTYPE: Type of the raid
- RAIDNUMBER: Number of the harddisk in the raid
- RAIDTOTAL: Total number of harddisks in the raid
- Details: List of ARCHIVELABELCODEs
- ARCHIVELABELCODE: Label of the Archive
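The header line of the sticker follows directly from the syntax above. A minimal sketch, assuming the container label is passed as an integer and the raid type as a free-form string such as "RAID1" (both are illustrative choices, as is the name `label_header`):

```python
def label_header(storehouse_code, container_number, raid_type,
                 raid_number, raid_total, purpose="UCR"):
    """Format PURPOSE-STOREHOUSECODE-CONTAINERLABELCODE (RAIDTYPE: N/TOTAL)."""
    # The container label code is zero-padded to 5 digits.
    return (f"{purpose}-{storehouse_code}-{container_number:05d} "
            f"({raid_type}: {raid_number}/{raid_total})")
```

For the first container in storehouse "DÜ1" on the first of two disks, this gives `UCR-DÜ1-00001 (RAID1: 1/2)`.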
Sort¶
- Only more liberal licenses should be allowed.