Specification » History » Version 4

Version 3 (Alexander Blum, 12/10/2019 07:15 PM) → Version 4/6 (Alexander Blum, 12/10/2019 07:16 PM)

{{toc}}

# General

## Object Processing

### Commits

* Web users provide the system with metadata on objects, which serves as the information basis for the distribution of money
* The system needs to balance the web users' need to change metadata against the stability of this information basis
* Artists, Releases, and Creations have to be committed by web users as a conscious act
* No data is published before a commit
* Web users are able to distinguish public/private fields after a commit
* A commit of a set of objects (all fields, public fields) is saved as rendered text for evidence
* Creations do not need to have content to be committable
* Content with a relation to a Creation is auto-committed when the Creation is committed
* Before a commit, web users may edit the object freely
* Committing an object implies committing all objects down the hierarchy (see [[Workflows#Licenser-Cascades|Cascades]])
* After a commit, web users may
* edit only data that is not relevant for the distribution or is subject to frequent changes:
* adding/removing members of an Artist
* adding a Release to a Creation
* changing Release metadata
* trigger a dispute request for changing relevant data:
* deleting an object (implies creation of a duplicate for references)
* adding/removing a contributor to a Creation
* adding/removing an original/derivative Creation
* adding/removing licenses to Releases of Creations
* adding/removing the Content assigned to a Creation
* removing a Release from a Creation
* After a commit, administrators may
* revise the object(s)
* reject the object(s)
* A successful claim dispute (see [[Workflows#Commit]]) causes the object to be uncommitted
* Uncommitted objects should be recommitted by the web user as soon as possible
* if the artist name or creation title is changed on recommit, all admins of referencing objects are informed via email/web interface
* if a referenced object is deleted before recommit, a duplicate for all references is created and referenced instead
* Created / Uncommitted objects
* are promoted on the web user dashboard
* are excluded from searches (via add/list/etc.), except:
* web users can add their own uncommitted objects to other own uncommitted objects
* web users can see, but not add, their own uncommitted objects to other committed objects

### Disputes

* A dispute is the process of mediation in a conflict between web users
* A dispute has a
* code: Sequence
* state: Selection (requested, assigned, resolved)
* case: Selection (list of use cases?)
* object: Reference (to the disputed objects)
* assignee: User
* request_party: Party
* request_text: Text (user statement)
* request_time: DateTime
* resolved_time: DateTime
* comments: Comment (many comments by administrators)
* A dispute comment has a
* dispute: Dispute
* text: Text
* time: DateTime
* A dispute may be triggered by a web user on several occasions
* The web user [[Specification#Claims|claims]] ownership of an already claimed object
* The web user claims authorship of a Content marked as a duplicate
* The web user requests a change/deletion of [[Specification#Commits|committed]] content
* A dispute can be requested, assigned and resolved (see [[Workflows#Dispute]])
* A dispute is requested via a dispute form including
* The disputed object
* The issue category
* The user motivation
* A dispute is handled by an assignee
* The assignee can add many comments
* The assignee can mark the dispute as resolved
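The dispute data model and its state transitions can be sketched as follows. This is a minimal illustration of the fields and lifecycle listed above, not the actual Tryton model; the method names `assign` and `resolve` are assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

@dataclass
class DisputeComment:
    dispute: "Dispute"                      # back-reference to the dispute
    text: str                               # administrator comment
    time: datetime = None

@dataclass
class Dispute:
    code: int                               # sequence number
    state: str = "requested"                # requested | assigned | resolved
    case: str = ""                          # issue category
    object_ref: Optional[object] = None     # reference to the disputed object
    assignee: Optional[str] = None          # handling administrator
    request_party: Optional[str] = None
    request_text: str = ""                  # user motivation
    request_time: Optional[datetime] = None
    resolved_time: Optional[datetime] = None
    comments: list = field(default_factory=list)

    def assign(self, user: str) -> None:
        """An administrator takes over the dispute."""
        self.assignee = user
        self.state = "assigned"

    def resolve(self) -> None:
        """The assignee marks the dispute as resolved."""
        self.state = "resolved"
        self.resolved_time = datetime.now()
```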



# Licensing

## Tariff System

* A Creation can have several tariff categories represented by different collecting societies


## API

# Repertoire

## Rights Management

* A rightsholder
* must hold a right
* Copyright
* Ancillary Copyright
* must have an object to which the right belongs
* Creation
* Release
* must have a contribution depending on the object and right
* Creation
* Copyright
* Lyrics
* Composition
* Ancillary Copyright:
* Instrument
* Production
* Mixing
* Mastering
* Release
* Copyright
* Artwork
* Text
* Layout
* Ancillary Copyright
* Production
* Mixing
* Mastering
* may have a start and end date
* may be restricted to a territory
* may have a successor
* may be represented by a collecting society
* may have a list of instruments for Creation -> Ancillary Copyright -> Instrument
* Rightsholder subjects for creations and releases are Artists
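The allowed combinations of object, right, and contribution listed above can be captured as a lookup table. A minimal validation sketch (the table and function are illustrative, not part of the system):

```python
# Allowed (object, right) -> contributions, as listed in the specification.
ALLOWED_CONTRIBUTIONS = {
    ("Creation", "Copyright"): {"Lyrics", "Composition"},
    ("Creation", "Ancillary Copyright"): {
        "Instrument", "Production", "Mixing", "Mastering"},
    ("Release", "Copyright"): {"Artwork", "Text", "Layout"},
    ("Release", "Ancillary Copyright"): {
        "Production", "Mixing", "Mastering"},
}

def is_valid_right(obj_type: str, right: str, contribution: str) -> bool:
    """Check whether a contribution is allowed for an object type and right."""
    return contribution in ALLOWED_CONTRIBUTIONS.get((obj_type, right), set())
```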

## Object processing

### Claims

* Artists, Releases, and Creations are either unclaimed, offered, or claimed
* Unclaimed objects are objects that don't belong to a web user
* Offered objects are objects that might belong to a web user and are offered to be claimed by the web user (e.g. on registration)
* Claimed objects are objects that "belong" to at least one web user
* Web users can (see [[Workflows#Claim]]) claim
* unclaimed and offered objects in general
* a solo artist for the current web user
* a group artist for a solo artist
* a compilation release for a solo artist (role: producer)
* a split/artist release for an artist
* a creation for an artist
* A claim implies a request for admin rights, where applicable
* Revised objects are visually promoted on the object list/details
* Unclaiming/Claiming/Revising an object implies unclaiming/claiming/revising all objects down the hierarchy (see [[Workflows#Creative-Cascades|Cascades]])
* Claiming an unclaimed object results in an uncommitted object
* Claiming a claimed object
* results in a [[Specification#Disputes|dispute]]
* may result in an uncommitted object, if the disputing party proves to be right

### Foreign objects

* During object creation, a web user may create several new, [[Specification#Claims|unclaimed]] foreign objects
* When a web user creates a new Artist, they may create many member Artists specified by a name and an email
* When a web user creates a new Creation, they may create
* many contributor Artists specified by group (yes/no), a name and an email
* many original/derivative Creations specified by the creation and the artist name, resulting in
* a new/referenced Artist object: artist name
* a new Creation object: Artist
* many track Creations specified by the creation and the artist name, resulting in
* a new/referenced Artist object: artist name
* a new Creation object: Artist
* Foreign objects are automatically [[Specification#Commits|committed]]
* Foreign objects may be added by others
* resulting in a duplicate for information separation
* referencing the duplicated foreign object for deduplication
* Foreign objects are editable
* by the web user who created the foreign object, if
* the object is not [[Specification#Claims|claimed]]
* the object was not part of a distribution yet
* by the object admin of the object which the foreign object was created for

## File Processing

### Intermediate storage for archive

* Files are stored on an intermediate file server
* For every stage of a file there is a corresponding folder `STAGE`
* For each change of processing state, all files are moved into the next `STAGE` folder
* For each folder `STAGE` there is a folder per `USER`
* Folder structure for optimized filesystem access
* Semantic information for manual administrative interventions
* The filenames of the files are
* the `HASH` of the filename for temporary uploads
* the `UUID` of the content in the database for all other stages
* For each file `[UUID|HASH]` there is a file `[UUID|HASH].checksums`
* Content: Checksums for each single upload chunk
* Format: CSV (begin, end, algorithm, checksum)
* For each file `UUID` (from the checksummed stage on) there is a file `UUID.checksum`
* Content: Checksum of the whole file
* Format: Plain text
* The full syntax for a file is
* `./STAGE/USER/UUID(.checksum(s))`
* `./temporary/USER/HASH(.checksum(s))`
* Examples of file paths
* `./temporary/4/82d5582443e9f8d35d3ec798662255e46e9e8138c290b626a74a3bee9382d430`
* `./temporary/4/82d5582443e9f8d35d3ec798662255e46e9e8138c290b626a74a3bee9382d430.checksums`
* `./uploaded/4/35f0c169-6594-4bf3-b285-451d2aa8c61e`
* `./uploaded/4/35f0c169-6594-4bf3-b285-451d2aa8c61e.checksums`
* `./previewed/4/35f0c169-6594-4bf3-b285-451d2aa8c61e`
* `./checksummed/4/35f0c169-6594-4bf3-b285-451d2aa8c61e`
* `./checksummed/4/35f0c169-6594-4bf3-b285-451d2aa8c61e.checksum`
* `./checksummed/4/35f0c169-6594-4bf3-b285-451d2aa8c61e.checksums`
* `./fingerprinted/4/35f0c169-6594-4bf3-b285-451d2aa8c61e`
* `./dropped/4/35f0c169-6594-4bf3-b285-451d2aa8c61e`
* `./rejected/4/35f0c169-6594-4bf3-b285-451d2aa8c61e`
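The path convention above can be sketched as a small helper. The stage names and the `HASH`/`UUID` distinction follow the specification; the `./storage` base directory is an assumption taken from the upload section below.

```python
import os

BASE = "./storage"  # assumed base directory of the intermediate storage

def file_path(stage: str, user: str, name: str, suffix: str = "") -> str:
    """Build ./STAGE/USER/NAME(.checksum(s)) for an intermediate stage folder.

    `name` is the filename HASH for the 'temporary' stage and the content
    UUID for all other stages; `suffix` may be '.checksum' or '.checksums'.
    """
    return os.path.join(BASE, stage, user, name + suffix)
```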

### Permanent storage for user content

* User content is stored on a permanent file server
* For every content type of a file there is a corresponding folder `CONTENTTYPE` (e.g. 'previews')
* For each folder `CONTENTTYPE` there is a folder per `USER`
* Folder structure for optimized filesystem access
* Semantic information for manual administrative interventions
* The filenames of the files are
* the `UUID` of the content in the database
* The full syntax for a file is
* `./CONTENTTYPE/USER/UUID`
* Examples of file paths
* `./previews/4/35f0c169-6594-4bf3-b285-451d2aa8c61e`
* `./excerpts/4/35f0c169-6594-4bf3-b285-451d2aa8c61e`

### Stages

* For an overview of the workflow, see [[Workflows#File-Processing|Workflow]]
* Each stage of file processing results in a corresponding processing state
1. Upload *(uploaded)*
2. Process
* Preview *(previewed)*
* Checksum *(checksummed)*
* Fingerprint *(fingerprinted)*
5. Drop *(dropped)*
6. Archive *(archived)*
* There is a special state for user requested deletions *(tobedeleted)*
* Before a file is processed at any stage, a file `UUID.lock` is created to signal other processes to skip the file. The lock file is deleted after the file `UUID` has been moved to the next stage folder.
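The lock file convention can be sketched as follows. Using `O_CREAT | O_EXCL` makes the lock creation atomic, so two workers cannot acquire the same lock; the function names are illustrative.

```python
import os

def try_lock(path: str) -> bool:
    """Create PATH.lock atomically; return False if another process holds it."""
    try:
        fd = os.open(path + ".lock", os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        os.close(fd)
        return True
    except FileExistsError:
        return False

def unlock(path: str) -> None:
    """Remove the lock file after the file was moved to the next stage folder."""
    os.remove(path + ".lock")
```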

#### Upload a file

* Users are only allowed to upload content that belongs to them
* Upload of a file in chunks
* The chunk size is 1 MiB (1024 × 1024 bytes)
* The chunk position is given by the header `Content-Range` (chunk start, chunk end, total size)
* A `HASH` of the user-given filename is used as the temporary filename
* The file is stored in `./storage/temporary/USER/HASH`
* For each uploaded chunk
* A checksum of the chunk is calculated, while the chunk is still in RAM
* The checksum is appended to `./storage/temporary/USER/HASH.checksums` (CSV: begin, end, algorithm, checksum)
* The database is queried for a duplicate of the checksum for early abuse detection
* The chunk checksum collisions are tracked in the user session
* Certain checksums are whitelisted (e.g. silence in different formats with/without headers)
* Above a configurable threshold (e.g. 150), the user upload is restricted temporarily
* The threshold violations are tracked in the database
* The chunk is appended to `./storage/temporary/USER/HASH`
* When the upload is finished
* A `UUID` is generated
* If validation of the file extension or MIME type fails, further processing is aborted
* The files are moved to `./storage/rejected/USER/UUID(.checksums)`
* The Content is saved to the database (processing state: rejected, rejection reason: format_error, path)
* The files are moved to `./storage/uploaded/USER/UUID(.checksums)`
* The Content is saved to the database (processing state: uploaded, path, storage_hostname)
* The Checksums are saved to the database
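The per-chunk handling above can be sketched as follows: the chunk is checksummed while still in RAM, the checksum is appended as a CSV line to the `.checksums` file, and the chunk is appended to the temporary file. sha256 is an assumed algorithm; the specification only requires recording which algorithm was used.

```python
import hashlib

CHUNK_SIZE = 1024 * 1024  # 1 MiB, as specified

def append_chunk(datafile, checksumfile, chunk: bytes, begin: int) -> str:
    """Checksum a chunk in RAM, record it as CSV, then append it to the file.

    Writes one line (begin, end, algorithm, checksum) to the .checksums file
    and appends the chunk bytes to the temporary upload file.
    """
    digest = hashlib.sha256(chunk).hexdigest()
    end = begin + len(chunk) - 1
    checksumfile.write(f"{begin},{end},sha256,{digest}\n")
    datafile.write(chunk)
    return digest
```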

#### Create a preview

* For each file in `./storage/uploaded`
* The file is locked during processing
* If the associated Content (processing state, processing hostname, storage hostname) is not valid, further processing is aborted
* The files are moved to `./storage/unknown/(USER/)UUID(.checksums)`
* The Content is updated, if possible (processing state: unknown, path)
* An excerpt for analysis and statistics is taken and stored in `./content/excerpt/UUID[1]/UUID[2]/UUID`
* Length: 60 seconds out of the middle of the file
* Quality: Minimum for fingerprint recognition (11025 Hz, 16 bit, mono)
* A preview is created and stored in `./content/previews/UUID[1]/UUID[2]/UUID`
* Quality: Minimum for acceptable user experience (12 bit, mono, 16 kHz, Ogg)
* Configuration: fade in, fade out, segment interval, segment length, segment crossfade
* If preview or excerpt creation fails, further processing is aborted
* The files are moved to `./storage/rejected/USER/UUID(.checksums)`
* The Content is rejected (rejection reason: format_error, path)
* The files are moved to `./storage/previewed/USER/UUID(.checksums)`
* The audio properties are saved to the database (length, channels, sample rate, sample width)
* The Content is updated (processing state: previewed, path)

#### Calculate a checksum

* For each file in `./storage/previewed`
* The file is locked during processing
* If the associated Content (processing state, processing hostname, storage hostname) is not valid, further processing is aborted
* The files are moved to `./storage/unknown/(USER/)UUID(.checksums)`
* The Content is updated, if possible (processing state: unknown, path)
* Checksums for each chunk of 1 MiB (1024 × 1024 bytes) are calculated and saved to the database, if not present
* A checksum for the whole file is calculated and stored in `./storage/previewed/USER/UUID.checksum`
* If the checksum is already present in the database, further processing is aborted
* The preview and excerpt are deleted
* The files are moved to `./storage/rejected/USER/UUID(.checksum(s))`
* The Content is rejected (rejection reason: checksum_collision, duplicate of: Content, path)
* The files are moved to `./storage/checksummed/USER/UUID(.checksum(s))`
* The Checksum is saved to the database
* The Content is updated (processing state: checksummed, path)
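The whole-file checksum step above can be sketched as follows; the file is read in the same 1 MiB chunks the upload uses, and sha256 is again an assumed algorithm.

```python
import hashlib

CHUNK_SIZE = 1024 * 1024  # 1 MiB, as specified

def file_checksum(path: str) -> str:
    """Calculate the whole-file checksum by reading the file in 1 MiB chunks.

    The result is what would be stored in the UUID.checksum file and
    compared against the database for collision detection.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        while chunk := handle.read(CHUNK_SIZE):
            digest.update(chunk)
    return digest.hexdigest()
```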

#### Calculate a fingerprint

* For each file in `./storage/checksummed`
* The file is locked during processing
* If the associated Content (processing state, processing hostname, storage hostname) is not valid, further processing is aborted
* The files are moved to `./storage/unknown/(USER/)UUID(.checksums)`
* The Content is updated, if possible (processing state: unknown, path)
* The fingerprint is created
* If the fingerprint is already present in the database, further processing is aborted
* The preview and excerpt are deleted
* The files are moved to `./storage/rejected/USER/UUID(.checksum(s))`
* The Content is rejected (rejection reason: fingerprint_collision, duplicate of: Content, path)
* The fingerprint is ingested into the database (primary key: Content Uuid)
* A FingerprintLog is saved to the database (timestamp, user, algorithm, version)
* The files are moved to `./storage/fingerprinted/USER/UUID(.checksum(s))`
* The Content is updated (processing state: fingerprinted, path)

#### Drop a file

* For each file in `./storage/fingerprinted`
* The file is locked during processing
* If the associated Content (processing state, processing hostname, storage hostname) is not valid, further processing is aborted
* The files are moved to `./storage/unknown/(USER/)UUID(.checksums)`
* The Content is updated, if possible (processing state: unknown, path)
* The files are moved to `./storage/dropped/USER/UUID(.checksum(s))`
* The Content is updated (processing state: dropped, path)

#### Archive a file

* For further details, see
* [[Specification#Archiving-of-the-files|Archiving of the files]]
* [[Specification#Orchestration-of-the-archiving|Orchestration of the archiving]]
* For each file in `./storage/dropped`
* The file is locked during processing
* If the associated Content (processing state, processing hostname, storage hostname) is not valid, further processing is aborted
* The files are moved to `./storage/unknown/(USER/)UUID(.checksum(s))`
* The Content is updated, if possible (processing state: unknown)
* The files are moved to `./storage/dropped.closed/UUID(.checksum(s))`
* The Storehouse target `LOCATION`s are copied from `./storage/targets/` to `./storage/dropped.closed.targets/`
* Until there is no `LOCATION` in `./storage/dropped.closed.targets/` left
* Until the checksums of the whole files are valid on the target machine
* The files `UUID(.checksum(s))` in `./storage/dropped.closed/` are copied to `LOCATION:./UUID[0]/UUID[1]/`
* The target location `./storage/dropped.closed.targets/LOCATION` is deleted
* The target location folder `./storage/dropped.closed.targets/` is deleted
* The Content is updated (processing state: archived, archive: Archive, path)
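The copy-until-verified loop above can be sketched as follows. The transfer and verification steps are passed in as callables because the actual mechanism for reaching a `LOCATION` is not fixed by the specification; the retry bound is an assumption to keep the sketch terminating.

```python
def archive_to_targets(targets: list, copy, verify, max_attempts: int = 3) -> list:
    """Copy the closed drop folder to every target until its checksums verify.

    `copy(location)` transfers the UUID(.checksum(s)) files to the target;
    `verify(location)` recomputes the whole-file checksums on the target
    machine. Returns the targets that could not be verified within
    `max_attempts`; a verified target corresponds to deleting its
    ./storage/dropped.closed.targets/LOCATION entry.
    """
    failed = []
    for location in targets:
        for _ in range(max_attempts):
            copy(location)
            if verify(location):
                break  # checksums valid on target: this LOCATION is done
        else:
            failed.append(location)
    return failed
```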

#### Delete a file

* A user may request the deletion of uncommitted Content
* The corresponding files may only be deleted if the Content is in the state 'uploaded' or 'rejected'
* Furthermore, the deletion request needs to trigger the deletion of
* the Content object
* the preview and excerpt file
* the entry in the echoprint server referencing the deleted Content object (maybe decentralized)

## Archiving

### Objects

#### Archiving

* The archiving process is coordinated via objects in the Database shown here [[Databasemodels#Archiving|as diagram]]
* Physical: [[Specification#Storehouse|Storehouse]], [[Specification#Harddisk|Harddisk]], [[Specification#Filesystem|Filesystem]], [[Specification#Content|Content]]
* Logical: [[Specification#HarddiskLabel|HarddiskLabel]], [[Specification#FilesystemLabel|FilesystemLabel]]
* Integrity: [[Specification#Checksum|Checksum]], [[Specification#HarddiskTest|HarddiskTest]]
* The archiving objects are administered via tryton client and [[Scripts]]

#### Storehouse

Physical storage location

* has a code
* has an admin user
* may have a detailed description
* may have many Harddisks

#### Harddisk

Physical harddisks

* has uuids (host, harddisk)
* has checksums (harddisk)
* has a HarddiskLabel
* has a Storehouse
* has a version (for Harddisks with the same HarddiskLabel per Storehouse)
* has a closed state
* has an on-/offline state
* has a usage state
* has a creator (tryton user)
* has a function to generate a label sticker
* may have a local position (e.g. "Shelf1")
* may have many Filesystems
* may have many HarddiskTests
* may have a health state (result of the tests)

#### Filesystem

Filesystems on a harddisk

* has uuids (partition, raid, raid sub, crypto, lvm, filesystem)
* has checksums (partition, raid, raid sub, crypto, lvm, filesystem)
* has a FilesystemLabel
* has a Harddisk
* has a closed state
* has partitioning information (partition number)
* has raid information (raid type, raid number, raid total)

#### Content

Contents on a filesystem

* has a file
* may have one FilesystemLabel

#### HarddiskLabel

Label for Harddisks containing the same Filesystems

* has a code
* may have many Harddisks

#### FilesystemLabel

Label for Filesystems containing the same Contents

* has a code
* may have many Filesystems
* may have many Contents

#### Checksum

Checksum, e.g. sha256

* has a timestamp
* has a begin (first Byte)
* has an end (last Byte)
* has an algorithm

#### HarddiskTest

Integrity tests of harddisks

* has a timestamp
* has a user, which performed the test
* has a health state (sane + error for each checksum of Harddisk and Filesystem)

### Identification with uuids

* A uuid is a Universally Unique Identifier
* Harddisks, Filesystems and Content are identified with uuids
* A Harddisk is identified with a combination of the uuids
* Host
* Harddisk
* A Filesystem is identified with a combination of the uuids
* Partition
* Raid
* Raid Sub
* Crypto
* LVM
* Filesystem
* The uuids of a Harddisk and a contained Filesystem are strictly hierarchical:
* Host > Harddisk > Partition > Raid > Raid Sub > Crypto > LVM > Filesystem
* A Content is identified with exactly one uuid

### Integrity tests with checksums

* When a Harddisk is finalized, the following Checksums are saved into the database
* Checksum of the Harddisk: Harddisk
* Checksums of the Filesystem: Partition, Raid, Raid Sub, Crypto, LVM, Filesystem
* For each Content the following Checksums are saved to database
* Checksum for the whole file
* Checksums for each upload chunk
* All checksums are additionally saved on the harddisk as a second independent source
* Harddisk/Filesystem: on a metadata partition
* File/Chunks: on the filesystem
* A checksum of the metadata partition is saved only on the harddisk
* Regularly scheduled (e.g. biannual) integrity tests ensure the long-term integrity of all Harddisks
* For each test
* Check of uuids
* Check of checksum of metadata partition
* Activation of crypto and raid
* Check of checksums of filesystems (`/dev/disk/by-uuid/...`)
* On error:
* Give feedback to the admin
* Write HarddiskTest (state: error_filesystem)
* Check of checksums of all files
* Write HarddiskTest (state: sane)
* An error can be tracked down to the resolution of the upload chunks, if necessary

### Archiving of the files

* The files associated with a Content are stored on a filesystem
* Each file is named after the Content Uuid: `UUID`
* Each file is associated with two other files by convention:
* `UUID.checksum`: checksum of the whole file
* `UUID.checksums`: checksums of all upload chunks
* Each file is stored in the folder `./UUID[0]/UUID[1]/`
* Folder structure for optimized filesystem access
* No semantic information (e.g. USER) to avoid the need for corrections
* The full syntax for a file is
* `./UUID[0]/UUID[1]/UUID(.checksum(s))`
* Examples of file paths
* `./3/5/35f0c169-6594-4bf3-b285-451d2aa8c61e`
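The sharding convention above can be sketched as a small helper: the first two characters of the content UUID select the folder, keeping directory sizes small without encoding semantic information such as `USER`.

```python
def archive_path(uuid: str, suffix: str = "") -> str:
    """Build ./UUID[0]/UUID[1]/UUID(.checksum(s)) for an archived file."""
    return f"./{uuid[0]}/{uuid[1]}/{uuid}{suffix}"
```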

### Orchestration of the archiving

* Files on an [[Specification#Intermediate-storage-for-archive|intermediate storage]] may be archived in many storehouses
* The URLs to the storehouse machines are stored in `STOREHOUSECODE` target files
* The mappings of all intermediate storages to storehouses are administered on a central orchestration machine
* Syntax: `./STORAGE/STOREHOUSECODE`
* Example: `./storage001/DÜ1`
* The mapping of a single intermediate storage to storehouses is mirrored to this intermediate storage
* Syntax: `./storage/targets/STOREHOUSECODE`
* Example: `./storage/targets/DÜ1`
* The files on the intermediate storage are synchronized with the files on the orchestration machine for orchestration

### Human readable label sticker

* For each Harddisk, a label sticker may be generated
* Header: `PURPOSE-STOREHOUSECODE-CONTAINERLABELCODE (RAIDTYPE: RAIDNUMBER/RAIDTOTAL)`
* `PURPOSE`: "UCR" ("User Content Repertoire")
* `STOREHOUSECODE`: Code of the Storehouse, e.g. "DÜ1" for the first storehouse in Düsseldorf
* `CONTAINERLABELCODE`: Incremental, padded to 5 digits, e.g. 00001
* `RAIDTYPE`: Type of the RAID
* `RAIDNUMBER`: number of the harddisk in the raid
* `RAIDTOTAL`: total number of harddisks in the raid
* Details: List of `ARCHIVELABELCODE`s
* `ARCHIVELABELCODE`: Label of the Archive
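The header format above can be sketched as a format string; the example values are illustrative.

```python
def label_header(purpose: str, storehouse: str, label_code: int,
                 raid_type: str, raid_number: int, raid_total: int) -> str:
    """Format PURPOSE-STOREHOUSECODE-CONTAINERLABELCODE (RAIDTYPE: N/TOTAL).

    The container label code is incremental and padded to 5 digits,
    as specified.
    """
    return (f"{purpose}-{storehouse}-{label_code:05d} "
            f"({raid_type}: {raid_number}/{raid_total})")
```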

# Sort

* Only more liberal licenses should be allowed.