General

Object Processing

Commits

  • Web users provide the system with metadata on objects as information basis for the distribution of money
  • The system needs to balance the need of web users to change the metadata and the stability of the information basis
  • Artists, Releases, Creations have to be committed by the web users as a conscious act
    • No data is published before a commit
    • Web users are able to distinguish public/private fields after a commit
    • A commit of a set of objects (all fields, public fields) is saved as rendered text for evidence
  • Creations do not need to have content to be committable
  • Content with a relation to a Creation is autocommitted when the Creation is committed
  • Before a commit, web users may edit the object freely
  • Committing an object implies committing all objects down the hierarchy (see Cascades)
  • After a commit, web users may
    • edit only data not relevant for the distribution or subject to frequent changes:
      • adding/removing members of an Artist
      • adding a Release to a Creation
      • changing Release metadata
    • trigger a dispute request for changing relevant data:
      • deleting an object (implies creation of duplicate for references)
      • adding/removing a contributor to a Creation
      • adding/removing an original/derivative Creation
      • adding/removing licenses to Releases of Creations
      • adding/removing the Content assigned to a Creation
      • removing a Release from a Creation
  • After a commit, administrators may
    • revise the object(s)
    • reject the object(s)
  • A successful claim dispute (see Workflows) causes the object to be uncommitted
  • Uncommitted objects should be recommitted as soon as possible by the web user
    • if the artist name or creation title is changed on recommit, all admins of referencing objects are informed via email/web interface
    • if a referenced object is deleted before recommit, a duplicate for all references is created and referenced instead
  • Created / Uncommitted objects
    • are promoted on the web user dashboard
    • are excluded from searches (via add/list/etc.), except
      • web users can add own uncommitted objects to other own uncommitted objects
      • web users can see but not add own uncommitted objects to other committed objects
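
The post-commit rules above can be sketched as a small lookup that decides whether a requested change may be applied directly or has to go through a dispute. The change identifiers and the function name route_change are hypothetical and chosen only for illustration; the source specifies the rules in prose only.

```python
# Hypothetical change identifiers; the specification lists these rules in prose.
DIRECT_EDITS = {
    "artist.member.add",
    "artist.member.remove",
    "creation.release.add",
    "release.metadata.change",
}
DISPUTE_EDITS = {
    "object.delete",
    "creation.contributor.add",
    "creation.contributor.remove",
    "creation.original.add",
    "creation.original.remove",
    "release.license.add",
    "release.license.remove",
    "creation.content.add",
    "creation.content.remove",
    "creation.release.remove",
}


def route_change(committed: bool, change: str) -> str:
    """Return 'edit' if the change may be applied directly, else 'dispute'."""
    if change not in DIRECT_EDITS | DISPUTE_EDITS:
        raise ValueError(f"unknown change: {change}")
    if not committed:
        return "edit"  # before a commit, objects may be edited freely
    return "edit" if change in DIRECT_EDITS else "dispute"
```

For example, removing a Release from a committed Creation is routed to a dispute, while adding one is a direct edit.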

Disputes

  • A dispute is the process of mediation in a conflict between web users
  • A dispute has a
    • code: Sequence
    • state: Selection (requested, assigned, resolved)
    • case: Selection (list of usecases?)
    • object: Reference (to the disputed objects)
    • assignee: User
    • request_party: Party
    • request_text: Text (user statement)
    • request_time: DateTime
    • resolved_time: DateTime
    • comments: Comment (many comments by administrators)
  • A dispute comment has a
    • dispute: Dispute
    • text: Text
    • time: DateTime
  • A dispute may be triggered by a web user on several occasions
    • The web user claims ownership of an already claimed object
    • The web user claims authorship of a Content marked as a duplicate
    • The web user requests a change/deletion of committed content
  • A dispute can be requested, assigned and resolved (see Workflows)
  • A dispute is requested via a dispute form including
    • The disputed object
    • The issue category
    • The user motivation
  • A dispute is handled by an assignee
  • The assignee can add many comments
  • The assignee can mark the dispute as resolved
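
The dispute fields and the requested/assigned/resolved lifecycle described above could be modelled roughly as follows. This is a minimal sketch using plain dataclasses; the method names assign, comment and resolve are assumptions, and references (object, party, assignee) are represented as plain strings instead of database records.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class DisputeComment:
    text: str
    time: datetime


@dataclass
class Dispute:
    code: str
    case: str            # issue category chosen in the dispute form
    obj: str             # reference to the disputed object
    request_party: str
    request_text: str    # user statement
    request_time: datetime
    state: str = "requested"
    assignee: Optional[str] = None
    resolved_time: Optional[datetime] = None
    comments: List[DisputeComment] = field(default_factory=list)

    def assign(self, user: str) -> None:
        if self.state != "requested":
            raise ValueError("only requested disputes can be assigned")
        self.assignee, self.state = user, "assigned"

    def comment(self, text: str, now: datetime) -> None:
        self.comments.append(DisputeComment(text, now))

    def resolve(self, now: datetime) -> None:
        if self.state != "assigned":
            raise ValueError("only assigned disputes can be resolved")
        self.state, self.resolved_time = "resolved", now
```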

Licensing

Tariff System

  • A creation can have several tariff categories represented by different collecting societies

API

Repertoire

Rights Management

  • A rightsholder
    • must hold a right
      • Copyright
      • Ancillary Copyright
    • must have an object to which the right belongs
      • Creation
      • Release
    • must have a contribution depending on the object and right
      • Creation
        • Copyright
          • Lyrics
          • Composition
        • Ancillary Copyright:
          • Instrument
          • Production
          • Mixing
          • Mastering
      • Release
        • Copyright
          • Artwork
          • Text
          • Layout
        • Ancillary Copyright
          • Production
          • Mixing
          • Mastering
    • may have a start and end date
    • may be restricted to a territory
    • may have a successor
    • may be represented by a collecting society
    • may have a list of instruments for Creation -> Ancillary Copyright -> Instrument
  • Rightsholder subjects for creations and releases are Artists
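
The dependency of the contribution on object and right, as listed above, can be expressed as a lookup table. The following is a sketch only; the name is_valid_contribution and the string spellings of the keys are assumptions for illustration.

```python
# Allowed contributions per (object, right), taken from the list above.
CONTRIBUTIONS = {
    ("Creation", "Copyright"): {"Lyrics", "Composition"},
    ("Creation", "Ancillary Copyright"): {
        "Instrument", "Production", "Mixing", "Mastering"},
    ("Release", "Copyright"): {"Artwork", "Text", "Layout"},
    ("Release", "Ancillary Copyright"): {"Production", "Mixing", "Mastering"},
}


def is_valid_contribution(obj, right, contribution, instruments=()):
    """Check a rightsholder entry against the allowed combinations."""
    if contribution not in CONTRIBUTIONS.get((obj, right), set()):
        return False
    # A list of instruments is only meaningful for
    # Creation -> Ancillary Copyright -> Instrument.
    is_instrument = (obj, right, contribution) == (
        "Creation", "Ancillary Copyright", "Instrument")
    return is_instrument or not instruments
```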

Object processing

Claims

  • Artists, Releases, and Creations are unclaimed, offered, or claimed
  • Unclaimed objects are objects that do not belong to a web user
  • Offered objects are objects that might belong to a web user and are offered to be claimed by the web user (e.g. on registration)
  • Claimed objects are objects that "belong" to at least one web user
  • Web users can (see Workflows) claim
    • unclaimed and offered objects in general
    • a solo artist for the current web user
    • a group artist for a solo artist
    • a compilation release for a solo artist (role: producer)
    • a split/artist release for an artist
    • a creation for an artist
  • A claim implies a request for admin rights, where applicable
  • Revised objects are visually promoted on the object list/details
  • Unclaiming/Claiming/Revising an object implies unclaiming/claiming/revising all objects down the hierarchy (see Cascades)
  • Claiming an unclaimed object results in an uncommitted object
  • Claiming a claimed object
    • results in a dispute
    • may result in an uncommitted object, if the disputing party proves to be right

Foreign objects

  • During object creation, a web user may create several new, unclaimed foreign objects
  • When a web user creates a new Artist, he may create many member Artists specified by a name and an email
  • When a web user creates a new Creation, he may create
    • many contributor Artists specified by group (yes/no), a name and an email
    • many original/derivative Creations specified by the creation and the artist name, resulting in
      • a new/referenced Artist object: artist name
      • a new Creation object: Artist
    • many track Creations specified by the creation and the artist name, resulting in
      • a new/referenced Artist object: artist name
      • a new Creation object: Artist
  • Foreign objects are autocommitted
  • Foreign objects may be added by others
    • resulting in a duplicate for information separation
    • referencing the duplicated foreign object for deduplication
  • Foreign objects are editable
    • by the web user who created the foreign object, if
      • the object was created by the web user
      • the object is not claimed
      • the object was not part of a distribution yet
    • by the object admin of the object for which the foreign object was created

File Processing

Intermediate storage for archive

  • Files are stored on an intermediate file server
  • For every stage of a file there is a corresponding folder STAGE
  • For each change of processing state, all files are moved into the next STAGE folder
  • For each folder STAGE there is a folder per USER
    • Folder structure for optimized filesystem access
    • Semantic information for manual administrative interventions
  • The filenames of the files are
    • the HASH of the filename for temporary uploads
    • the UUID of the content in the database for all other stages
  • For each file [UUID|HASH] there is a file [UUID|HASH].checksums
    • Content: Checksums for each single upload chunk
    • Format: CSV (begin, end, algorithm, checksum)
  • For each file [UUID|HASH] there is a file UUID.checksum
    • Content: Checksum of the whole file
    • Format: Plain text
  • The full syntax for a file is
    • ./STAGE/USER/UUID(.checksum(s))
    • ./temporary/USER/HASH(.checksum(s))
  • Examples of file paths
    • ./temporary/4/82d5582443e9f8d35d3ec798662255e46e9e8138c290b626a74a3bee9382d430
    • ./temporary/4/82d5582443e9f8d35d3ec798662255e46e9e8138c290b626a74a3bee9382d430.checksums
    • ./uploaded/4/35f0c169-6594-4bf3-b285-451d2aa8c61e
    • ./uploaded/4/35f0c169-6594-4bf3-b285-451d2aa8c61e.checksums
    • ./previewed/4/35f0c169-6594-4bf3-b285-451d2aa8c61e
    • ./checksummed/4/35f0c169-6594-4bf3-b285-451d2aa8c61e
    • ./checksummed/4/35f0c169-6594-4bf3-b285-451d2aa8c61e.checksum
    • ./checksummed/4/35f0c169-6594-4bf3-b285-451d2aa8c61e.checksums
    • ./fingerprinted/4/35f0c169-6594-4bf3-b285-451d2aa8c61e
    • ./dropped/4/35f0c169-6594-4bf3-b285-451d2aa8c61e
    • ./rejected/4/35f0c169-6594-4bf3-b285-451d2aa8c61e
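
The path syntax above can be illustrated with two small helpers. This is a sketch under assumptions: the hash algorithm for temporary filenames is not named in this document, so SHA-256 is used here only because the example hash has 64 hexadecimal characters; the function names are hypothetical.

```python
import hashlib
from pathlib import PurePosixPath

SUFFIXES = ("", ".checksum", ".checksums")


def temporary_name(filename: str) -> str:
    """Hash the user-given filename (SHA-256 is an assumption)."""
    return hashlib.sha256(filename.encode("utf-8")).hexdigest()


def intermediate_path(stage: str, user: int, name: str, suffix: str = "") -> str:
    """Build ./STAGE/USER/NAME(.checksum(s)) for the intermediate storage."""
    if suffix not in SUFFIXES:
        raise ValueError(f"unexpected suffix: {suffix!r}")
    return "./" + str(PurePosixPath(stage) / str(user) / (name + suffix))
```

Calling intermediate_path("uploaded", 4, "35f0c169-6594-4bf3-b285-451d2aa8c61e", ".checksums") reproduces the corresponding example path from the list.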

Permanent storage for user content

  • User content is stored on a permanent file server
  • For every content type of a file there is a corresponding folder CONTENTTYPE (e.g. 'previews')
  • For each folder CONTENTTYPE there is a folder per USER
    • Folder structure for optimized filesystem access
    • Semantic information for manual administrative interventions
  • The filenames of the files are
    • the UUID of the content in the database
  • The full syntax for a file is
    • ./CONTENTTYPE/USER/UUID
  • Examples of file paths
    • ./previews/4/35f0c169-6594-4bf3-b285-451d2aa8c61e
    • ./excerpts/4/35f0c169-6594-4bf3-b285-451d2aa8c61e

Stages

  • For an overview of the workflow, see Workflows
  • Each stage of file processing results in a corresponding processing state
    1. Upload (uploaded)
    2. Process
      • Preview (previewed)
      • Checksum (checksummed)
      • Fingerprint (fingerprinted)
    3. Drop (dropped)
    4. Archive (archived)
  • There is a special state for user requested deletions (tobedeleted)
  • Before a file is being processed at any stage, a file .lock will be created to signal other processes to skip the file. The lockfile will be deleted after the file has been moved to the next stage folder.
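
The lock file convention can be sketched with an atomic create-if-absent call. The lock file name (the processed file's path plus ".lock") and the helper name file_lock are assumptions; the document only states that a .lock file is created and later removed.

```python
import os
from contextlib import contextmanager


@contextmanager
def file_lock(path: str):
    """Yield True if the lock was acquired, False if another process holds it."""
    lock_path = path + ".lock"  # naming is an assumption
    try:
        # O_CREAT | O_EXCL fails if the lock file already exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        yield False  # signal the caller to skip this file
        return
    try:
        os.close(fd)
        yield True
    finally:
        os.remove(lock_path)  # released after the caller has moved the file
```

A worker would wrap the processing and the move to the next stage folder in "with file_lock(path) as acquired:" and skip the file when acquired is False.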

Upload a file

  • Users are allowed to upload content only if it belongs to them
  • Upload of a file in chunks
    • The chunk size is 1MiB (1024*1024 Byte)
    • The chunk position is given by the header Content-Range (chunk start, chunk end, total size)
    • A HASH of the user given filename is used as temporary filename
    • The file is stored in ./storage/temporary/USER/HASH
  • For each uploaded chunk
    • A checksum of the chunk is calculated, while the chunk is still in RAM
    • The checksum is appended to ./storage/temporary/USER/HASH.checksums (CSV: begin, end, algorithm, checksum)
    • The database is queried for a duplicate of the checksum for early abuse detection
      • The chunk checksum collisions are tracked in the user session
      • Certain checksums are whitelisted (e.g. silence in different formats with/without headers)
      • Above a configurable threshold (e.g. 150), the user upload is restricted temporarily
      • The threshold violations are tracked in the database
    • The chunk is appended to ./storage/temporary/USER/HASH
  • When the upload is finished
    • A Uuid is generated
    • If the validation of the file extension or MIME type fails, further processing is aborted
      • The files are moved to ./storage/rejected/USER/UUID(.checksums)
      • The Content is saved to the database (processing state: rejected, rejection reason: format_error, path)
    • The files are moved to ./storage/uploaded/USER/UUID(.checksums)
    • The Content is saved to the database (processing state: uploaded, path, storage_hostname)
    • The Checksums are saved to the database
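
The per-chunk steps above (read the position from Content-Range, checksum the chunk in memory, append a CSV row, append the chunk) might look like the following sketch. It assumes the standard HTTP header form "bytes start-end/total" with an inclusive end, and SHA-256 as the algorithm; neither is fixed by this document. The function names are hypothetical, and the duplicate lookup in the database is left out.

```python
import csv
import hashlib
import re

CHUNK_SIZE = 1024 * 1024  # 1 MiB
_RANGE = re.compile(r"^bytes (\d+)-(\d+)/(\d+)$")


def parse_content_range(header: str):
    """Return (begin, end, total) from a Content-Range header value."""
    match = _RANGE.match(header)
    if not match:
        raise ValueError(f"malformed Content-Range: {header!r}")
    begin, end, total = map(int, match.groups())
    if begin > end or end >= total:
        raise ValueError(f"inconsistent Content-Range: {header!r}")
    return begin, end, total


def append_chunk(data_file, checksums_file, chunk: bytes, header: str,
                 algorithm: str = "sha256"):
    """Checksum the in-memory chunk, log it as CSV, then append the chunk."""
    begin, end, _total = parse_content_range(header)
    if len(chunk) != end - begin + 1 or len(chunk) > CHUNK_SIZE:
        raise ValueError("chunk length does not match Content-Range")
    row = [begin, end, algorithm, hashlib.new(algorithm, chunk).hexdigest()]
    csv.writer(checksums_file).writerow(row)
    data_file.write(chunk)
    return row
```

Here data_file stands for the temporary file opened in binary append mode and checksums_file for the .checksums file opened in text append mode.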

Create a preview

  • For each file in ./storage/uploaded
    • The file is locked during processing
    • If the associated Content (processing state, processing hostname, storage hostname) is not valid, further processing is aborted
      • The files are moved to ./storage/unknown/(USER/)UUID(.checksums)
      • The Content is updated, if possible (processing state: unknown, path)
    • An excerpt for analysis and statistics is taken and stored in ./content/excerpt/UUID[1]/UUID[2]/UUID
      • Length: 60 seconds out of the middle of the file
      • Quality: Minimum for fingerprint recognition (11025 Hz, 16 bit, mono)
    • A preview is created and stored in ./content/previews/UUID[1]/UUID[2]/UUID
      • Quality: Minimum for acceptable user experience (12bit, mono, 16kHz, ogg)
      • Configuration: fade in, fade out, segment interval, segment length, segment crossfade
    • If preview or excerpt creation fails, further processing is aborted
      • The files are moved to ./storage/rejected/USER/UUID(.checksums)
      • The Content is rejected (rejection reason: format_error, path)
    • The files are moved to ./storage/previewed/USER/UUID(.checksums)
    • The audio properties are saved to the database (length, channels, sample rate, sample width)
    • The Content is updated (processing state: previewed, path)
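
Taking the excerpt "out of the middle of the file" amounts to centering a 60-second window on the track. A minimal sketch of that arithmetic follows; how files shorter than 60 seconds are handled is not specified here, so returning the whole file in that case is an assumption.

```python
def excerpt_window(duration: float, length: float = 60.0):
    """Return (start, end) in seconds of a window centered in the file."""
    if duration <= 0:
        raise ValueError("duration must be positive")
    if duration <= length:
        return 0.0, duration  # assumption: short files are used in full
    start = (duration - length) / 2
    return start, start + length
```

For a track of 200 seconds this yields the window from second 70 to second 130.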

Calculate a checksum

  • For each file in ./storage/previewed
    • The file is locked during processing
    • If the associated Content (processing state, processing hostname, storage hostname) is not valid, further processing is aborted
      • The files are moved to ./storage/unknown/(USER/)UUID(.checksums)
      • The Content is updated, if possible (processing state: unknown, path)
    • Checksums for each chunk of 1MiB (1024*1024 Byte) are calculated and saved to the database, if not present
    • A checksum for the whole file is calculated and stored in ./storage/previewed/USER/UUID.checksum
    • If the checksum is already present in the database, further processing is aborted
      • The preview and excerpt are deleted
      • The files are moved to ./storage/rejected/USER/UUID(.checksum(s))
      • The Content is rejected (rejection reason: checksum_collision, duplicate of: Content, path)
    • The files are moved to ./storage/checksummed/USER/UUID(.checksum(s))
    • The Checksum is saved to the database
    • The Content is updated (processing state: checksummed, path)
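
Both the whole-file checksum and the per-chunk checksums can be computed in a single pass over the file. The sketch below assumes SHA-256 and a hypothetical function name file_checksums; it returns the rows in the same (begin, end, algorithm, checksum) layout used for the .checksums file.

```python
import hashlib

CHUNK_SIZE = 1024 * 1024  # 1 MiB


def file_checksums(fileobj, algorithm: str = "sha256",
                   chunk_size: int = CHUNK_SIZE):
    """Return (whole_file_digest, [(begin, end, algorithm, digest), ...])."""
    whole = hashlib.new(algorithm)
    rows = []
    begin = 0
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        whole.update(chunk)
        end = begin + len(chunk) - 1  # inclusive last byte
        rows.append((begin, end, algorithm,
                     hashlib.new(algorithm, chunk).hexdigest()))
        begin = end + 1
    return whole.hexdigest(), rows
```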

Calculate a fingerprint

  • For each file in ./storage/checksummed
    • The file is locked during processing
    • If the associated Content (processing state, processing hostname, storage hostname) is not valid, further processing is aborted
      • The files are moved to ./storage/unknown/(USER/)UUID(.checksums)
      • The Content is updated, if possible (processing state: unknown, path)
    • The fingerprint is created
    • If the fingerprint is already present in the database, further processing is aborted
      • The preview and excerpt are deleted
      • The files are moved to ./storage/rejected/USER/UUID(.checksum(s))
      • The Content is rejected (rejection reason: fingerprint_collision, duplicate of: Content, path)
    • The fingerprint is ingested into the database (primary key: Content Uuid)
    • A FingerprintLog is saved to the database (timestamp, user, algorithm, version)
    • The files are moved to ./storage/fingerprinted/USER/UUID(.checksum(s))
    • The Content is updated (processing state: fingerprinted, path)

Drop a file

  • For each file in ./storage/fingerprinted
    • The file is locked during processing
    • If the associated Content (processing state, processing hostname, storage hostname) is not valid, further processing is aborted
      • The files are moved to ./storage/unknown/(USER/)UUID(.checksums)
      • The Content is updated, if possible (processing state: unknown, path)
    • The files are moved to ./storage/dropped/USER/UUID(.checksum(s))
    • The Content is updated (processing state: dropped, path)

Archive a file

  • For further details, see
  • For each file in ./storage/dropped
    • The file is locked during processing
    • If the associated Content (processing state, processing hostname, storage hostname) is not valid, further processing is aborted
      • The files are moved to ./storage/unknown/(USER/)UUID(.checksum(s))
      • The Content is updated, if possible (processing state: unknown)
    • The files are moved to ./storage/dropped.closed/UUID(.checksum(s))
    • The Storehouse target LOCATIONs are copied from ./storage/targets/ to ./storage/dropped.closed.targets/
    • Until there is no LOCATION in ./storage/dropped.closed.targets/ left
      • Until the checksums of the whole files are valid on the target machine
        • The files UUID(.checksum(s)) in ./storage/dropped.closed/ are copied to LOCATION:./UUID[1]/UUID[2]/
      • The target location ./storage/dropped.closed.targets/LOCATION is deleted
    • The target location folder ./storage/dropped.closed.targets/ is deleted
    • The Content is updated (processing state: archived, archive: Archive, path)
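
The nested "until" loops above (for every target location, copy until the checksums verify on the target) can be sketched as follows. The copy and verify callables are placeholders for the real transfer and remote checksum check, and the max_attempts limit is an assumption added so the sketch cannot loop forever; the document itself does not define a retry limit.

```python
def archive(files, targets, copy, verify, max_attempts: int = 3):
    """Copy files to each target until verified; return the finished targets."""
    pending = list(targets)   # corresponds to ./storage/dropped.closed.targets/
    done = []
    while pending:
        location = pending[0]
        for _ in range(max_attempts):
            copy(files, location)
            if verify(files, location):
                break
        else:
            raise RuntimeError(f"could not verify files on {location}")
        pending.pop(0)        # the target location entry is deleted
        done.append(location)
    return done
```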

Delete a file

  • A user may request the deletion of uncommitted Content
  • The corresponding files may only be deleted if the Content is in the state 'uploaded' or 'rejected'
  • Furthermore, the request for deletion needs to trigger the deletion of
    • the Content object
    • the preview and excerpt file
    • the entry in the echoprint server referencing the deleted Content object (maybe decentralized)

Archiving

Objects

Archiving

Storehouse

Physical storage location

  • has a code
  • has an admin user
  • may have a detailed description
  • may have many Harddisks

Harddisk

Physical harddisks

  • has uuids (host, harddisk)
  • has checksums (harddisk)
  • has a HarddiskLabel
  • has a Storehouse
  • has a version (for Harddisks with the same HarddiskLabel per Storehouse)
  • has a closed state
  • has an on-/offline state
  • has a usage state
  • has a creator (tryton user)
  • has a function to generate a label sticker
  • may have a local position (e.g. "Shelf1")
  • may have many Filesystems
  • may have many HarddiskTests
  • may have a health state (result of the tests)

Filesystem

Filesystems on a harddisk

  • has uuids (partition, raid, raid sub, crypto, lvm, filesystem)
  • has checksums (partition, raid, raid sub, crypto, lvm, filesystem)
  • has a FilesystemLabel
  • has a Harddisk
  • has a closed state
  • has partitioning information (partition number)
  • has raid information (raid type, raid number, raid total)

Content

Contents on a filesystem

  • has a file
  • may have one FilesystemLabel

HarddiskLabel

Label for Harddisks containing the same Filesystems

  • has a code
  • may have many Harddisks

FilesystemLabel

Label for Filesystems containing the same Contents

  • has a code
  • may have many Filesystems
  • may have many Contents

Checksum

Checksum, e.g. sha256

  • has a timestamp
  • has a begin (first Byte)
  • has an end (last Byte)
  • has an algorithm

HarddiskTest

Integrity tests of harddisks

  • has a timestamp
  • has a user who performed the test
  • has a health state (sane + error for each checksum of Harddisk and Filesystem)

Identification with uuids

  • A uuid is a Universally Unique Identifier
  • Harddisks, Filesystems and Content are identified with uuids
  • A Harddisk is identified with a combination of the uuids
    • Host
    • Harddisk
  • A Filesystem is identified with a combination of the uuids
    • Partition
    • Raid
    • Raid Sub
    • Crypto
    • LVM
    • Filesystem
  • The uuids of a Harddisk and a contained Filesystem are strictly hierarchical:
    • Host > Harddisk > Partition > Raid > Raid Sub > Crypto > LVM > Filesystem
  • A Content is identified with exactly one uuid

Integrity tests with checksums

  • When a Harddisk is finalized, the following Checksums are saved into the database
    • Checksum of the Harddisk: Harddisk
    • Checksums of the Filesystem: Partition, Raid, Raid Sub, Crypto, LVM, Filesystem
  • For each Content the following Checksums are saved to database
    • Checksum for the whole file
    • Checksums for each upload chunk
  • All checksums are additionally saved on the harddisk as a second independent source
    • Harddisk/Filesystem: on a metadata partition
    • File/Chunks: on the filesystem
  • A checksum of the metadata partition is saved only on the harddisk
  • Regularly scheduled (e.g. biannual) integrity tests ensure the long-term integrity of all Harddisks
    • For each test
      • Check of uuids
      • Check of checksum of metadata partition
      • Activation of crypto and raid
      • Check of checksums of filesystems /dev/disk/by-uuid/...
      • On error:
        • Feedback Admin
        • Write HarddiskTest (state: error_filesystem)
        • Check of Checksums of all files
      • Write HarddiskTest (state: sane)
    • An error might be tracked down to the resolution of the upload chunks, if necessary
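
Tracking an error down to the resolution of the upload chunks could work by re-checking every row of the UUID.checksums file against the archived file. This is a sketch under the assumption that the CSV rows have the layout (begin, end, algorithm, checksum) with an inclusive end byte; the function name corrupted_chunks is hypothetical.

```python
import csv
import hashlib


def corrupted_chunks(fileobj, checksums_csv):
    """Return the (begin, end) ranges whose checksum no longer matches."""
    bad = []
    for row in csv.reader(checksums_csv):
        if not row:
            continue  # tolerate blank lines
        begin, end, algorithm, expected = row
        begin, end = int(begin), int(end)
        fileobj.seek(begin)
        chunk = fileobj.read(end - begin + 1)
        if hashlib.new(algorithm, chunk).hexdigest() != expected:
            bad.append((begin, end))
    return bad
```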

Archiving of the files

  • The files associated with a Content are stored on a filesystem
  • Each file is named after the Content Uuid: UUID
  • Each file is associated with two other files by convention:
    • UUID.checksum: checksum of the whole file
    • UUID.checksums: checksums of all upload chunks
  • Each file is stored in the folder ./UUID[0]/UUID[1]/
    • Folder structure for optimized filesystem access
    • No semantic information (e.g. USER) to avoid the need for corrections
  • The full syntax for a file is
    • ./UUID[0]/UUID[1]/UUID(.checksum(s))
  • Examples of file paths
    • ./3/5/35f0c169-6594-4bf3-b285-451d2aa8c61e
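
The archive path is derived from the first two characters of the Content Uuid. A small sketch, with the hypothetical helper name archive_path:

```python
import uuid


def archive_path(content_uuid: str, suffix: str = "") -> str:
    """Build ./UUID[0]/UUID[1]/UUID(.checksum(s)) for an archived file."""
    if suffix not in ("", ".checksum", ".checksums"):
        raise ValueError(f"unexpected suffix: {suffix!r}")
    uuid.UUID(content_uuid)  # raises ValueError if the string is not a UUID
    return f"./{content_uuid[0]}/{content_uuid[1]}/{content_uuid}{suffix}"
```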

Orchestration of the archiving

  • Files on an intermediate storage may be archived in many storehouses
  • The URLs to the storehouse machines are stored in STOREHOUSECODE target files
  • The mappings of all intermediate storages to storehouses are administered on a central orchestration machine
    • Syntax: ./STORAGE/STOREHOUSECODE
    • Example: ./storage001/DÜ1
  • The mapping of a single intermediate storage to storehouses is mirrored to this intermediate storage
    • Syntax: ./storage/targets/STOREHOUSECODE
    • Example: ./storage/targets/DÜ1
  • The files on the intermediate storage are synchronized with the files on the orchestration machine for orchestration

Human readable label sticker

  • For each Harddisk, a label sticker may be generated
  • Header: PURPOSE-STOREHOUSECODE-CONTAINERLABELCODE (RAIDTYPE: RAIDNUMBER/RAIDTOTAL)
    • PURPOSE: "UCR" ("User Content Repertoire")
    • STOREHOUSECODE: Code of the Storehouse, e.g. "DÜ1" for the first storehouse in Düsseldorf
    • CONTAINERLABELCODE: Incremental, padded to 5 digits, e.g. 00001
    • RAIDTYPE: Type of the raid
    • RAIDNUMBER: number of the harddisk in the raid
    • RAIDTOTAL: total number of harddisks in the raid
  • Details: List of ARCHIVELABELCODEs
    • ARCHIVELABELCODE: Label of the Archive
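
The header line of the sticker can be composed from its parts as sketched below. The function name sticker_header and the raid type value "RAID1" in the usage note are illustrative assumptions; only the layout and the 5-digit padding come from the list above.

```python
def sticker_header(storehouse_code: str, container_number: int,
                   raid_type: str, raid_number: int, raid_total: int,
                   purpose: str = "UCR") -> str:
    """Format PURPOSE-STOREHOUSECODE-CONTAINERLABELCODE (RAIDTYPE: N/TOTAL)."""
    if not 1 <= raid_number <= raid_total:
        raise ValueError("raid_number must be between 1 and raid_total")
    label_code = f"{container_number:05d}"  # incremental, padded to 5 digits
    return (f"{purpose}-{storehouse_code}-{label_code} "
            f"({raid_type}: {raid_number}/{raid_total})")
```

For the first container in the storehouse "DÜ1" on the first of two disks, sticker_header("DÜ1", 1, "RAID1", 1, 2) gives "UCR-DÜ1-00001 (RAID1: 1/2)".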

Sort

  • Only more liberal licenses should be allowed.