Database #339

Add field 'uniqueness', 'most_similar_content', 'most_similar_track_artist', and 'most_similar_track_title' to table content

Added by Alexander Blum over 4 years ago. Updated about 2 years ago.

Status: Done
Start date:
Priority: High
Due date:
Assignee: Alexander Blum
% Done: 100%
Category: -
Estimated time: 0.50 h
Target version: Repertoire 1) Testing phase I

Description

  • Decimal
  • max: 100
  • Description: Ratio of match scores after and before ingestion

Related issues

Blocks collecting_society - Processing #273: Fingerprint test after ingest Done 04/24/2017 05/08/2017

History

#1 Updated by Alexander Blum over 4 years ago

#2 Updated by Thomas Mielke over 4 years ago

  • Subject changed from Add field 'uniqueness' for content to Add field 'uniqueness' and 'most_similar_track' for content
  • Assignee set to Alexander Blum
  • Priority changed from Normal to High

uniqueness as float
most_similar_track as track_id uuid string

#3 Updated by Alexander Blum over 4 years ago

  • Status changed from New to Feedback
  • Assignee changed from Alexander Blum to Thomas Mielke

most_similar_track as uuid string? not as relation to another content?

#4 Updated by Thomas Mielke over 4 years ago

  • Assignee changed from Thomas Mielke to Alexander Blum

I don't mind, doesn't make a big difference to me. If it makes sense, let it be a relation.

#5 Updated by Alexander Blum over 4 years ago

Thomas Mielke wrote:

I don't mind, doesn't make a big difference to me. If it makes sense, let it be a relation.

It makes a difference for the Tryton interface. Admins should not have to
handle or interpret numbers; they should see the semantic relationship to other contents.

#6 Updated by Alexander Blum over 4 years ago

  • Estimated time set to 0.50

#7 Updated by Thomas Mielke over 4 years ago

I'd still opt for the string. This would allow us to include EchoNest track IDs.

#8 Updated by Alexander Blum over 4 years ago

  • Assignee changed from Alexander Blum to Thomas Mielke

Thomas Mielke wrote:

EchoNest track IDs.

What is the exact purpose of the field?

I assume the information about the most similar track could be useful for the admins if there's a dispute. They wouldn't know what to do with that string.

#9 Updated by Thomas Mielke over 4 years ago

Having a free string field allows more freedom when referring to other tracks. The information can then be processed intelligently: for example, a value of the form "TR" + 16 alphanumeric characters can be queried at an EchoNest server, while a UUID without '-' can be interpreted as a C3S content ID.
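The format-based dispatch described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's code: the function name `classify_track_reference` and the returned labels are hypothetical, and the patterns assume exactly what the comment states ("TR" + 16 alphanumeric characters, or a UUID written as 32 hexadecimal digits without '-').

```python
import re

# Assumed formats, taken from the comment above.
ECHONEST_TRACK_ID = re.compile(r"TR[A-Za-z0-9]{16}")
C3S_CONTENT_UUID = re.compile(r"[0-9a-fA-F]{32}")  # UUID without '-'


def classify_track_reference(value: str) -> str:
    """Guess what kind of track a free-form reference string points to.

    Returns "echonest", "c3s_content", or "unknown" (hypothetical labels).
    """
    if ECHONEST_TRACK_ID.fullmatch(value):
        return "echonest"
    if C3S_CONTENT_UUID.fullmatch(value):
        return "c3s_content"
    # Anything else cannot be interpreted from its shape alone.
    return "unknown"
```

Any value that matches neither pattern ends up as "unknown", so a caller would still need a rule for strings that were mistyped or come from a third source.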

#10 Updated by Thomas Mielke over 4 years ago

  • Assignee changed from Thomas Mielke to Alexander Blum

#11 Updated by Alexander Blum over 4 years ago

  • Assignee changed from Alexander Blum to Thomas Mielke

most similar track

I understand that, but it is also good practice that all fields can be handled via the Tryton client. A string that implicitly carries semantics would be highly error-prone. You can't switch the relationship by a prefix. Either it's a relation to a Content and Tryton will offer search and everything, or it is a dumb string. Only a semantic relation to Content would be useful for humans.

If there are EchoNest tracks not covered by our database, wouldn't it be more reasonable to create the Creation in our database?
What is the EchoNest track ID anyway? Is this ID stable? I would not want to mix external information with our internal database "directly".
We had in mind to add other identifiers for a Content via a special table (including e.g. MusicBrainz ID / Discogs ID / etc.).

uniqueness

Another question about uniqueness: do you use the "excerpt" to calculate this field? If yes, shouldn't we also defer the calculation of the ratio "before/after" to the external statistics module and just save the "before" value? As I understand it, the "after" value might change after ingestion of other tracks. Via the statistics module, the current value could then be computed on a regular basis.

#12 Updated by Alexander Blum over 4 years ago

Concerning uniqueness: never mind my previous question. The 'before' value also depends on the set of ingested tracks, so the only useful ratio is the one taken just before and just after ingestion.

#13 Updated by Alexander Blum over 4 years ago

Alexander Blum wrote:

If there are EchoNest tracks not covered by our database, wouldn't it be more reasonable to create the Creation in our database?

Ok, this would not help, as most_similar_track is an attribute of Content.

#14 Updated by Thomas Mielke over 4 years ago

Well, then instead of 'most_similar_track', add other fields 'most_similar_track_artist' and 'most_similar_track_title', which I will fill, if it's an EchoNest ID, and 'most_similar_content' if it's a C3S content uuid.
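A minimal sketch of how the three fields might be filled, assuming the matcher reports either a C3S content UUID or an EchoNest artist/title pair. The function name and its parameters are hypothetical; only the three field names come from this ticket. The sketch also assumes that a C3S match takes precedence if both kinds of information are present.

```python
from typing import Optional


def similarity_fields(
    content_uuid: Optional[str] = None,
    echonest_artist: Optional[str] = None,
    echonest_title: Optional[str] = None,
) -> dict:
    """Return values for the three proposed fields of table content."""
    fields = {
        "most_similar_content": None,
        "most_similar_track_artist": None,
        "most_similar_track_title": None,
    }
    if content_uuid is not None:
        # Match against a C3S content: store only the reference.
        fields["most_similar_content"] = content_uuid
    else:
        # Match against an EchoNest track: store the descriptive strings.
        fields["most_similar_track_artist"] = echonest_artist
        fields["most_similar_track_title"] = echonest_title
    return fields
```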

#15 Updated by Alexander Blum over 4 years ago

'most_similar_track_artist' and 'most_similar_track_title'

You haven't answered this question: What is the exact purpose of the
field(s)?

The only purpose I can imagine is using this information in disputes
about fingerprint recognition.
Names provided as strings may not be unique and are also
error-prone if filled out manually.
And it would take another step to search manually for this creation, if
needed for disputes.

If the field is only for technical evaluation, wouldn't it be better
placed in the statistical evaluation part?
Should this field be updated regularly, as it might change after
ingestion of other songs?

#16 Updated by Alexander Blum over 4 years ago

Thomas Mielke wrote:

Well, then instead of 'most_similar_track', add other fields 'most_similar_track_artist' and 'most_similar_track_title', which I will fill, if it's an EchoNest ID, and 'most_similar_content' if it's a C3S content uuid.

Ah, I misread your proposal (in terms of not reading until the end ;) ).
To add both field types would be ok, I guess.

Nevertheless I want to be clear about the purpose.

#17 Updated by Thomas Mielke over 4 years ago

  • Assignee changed from Thomas Mielke to Alexander Blum

The purpose is to show both artist and admin whether the creation has already been uploaded, either by another artist or by the same one. If most_similar_content is None but the fields most_similar_track_artist and most_similar_track_title are filled, it may not be necessary to show this to the artist, but it can act as a random sample to check whether the artist and track title were provided correctly.

For the web interface, creation form, the logic could be: "if similar_content is not None and uniqueness <= possible_dupe_threshold then display 'sounds like artist - track title' with a preview link"; furthermore: "if artist == similar_artist then print 'are you sure you haven't uploaded this track before?' else handle_possible_artist_dispute()"
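The creation-form logic quoted above might look roughly like this in Python. It is a sketch under assumptions: `POSSIBLE_DUPE_THRESHOLD` has a made-up value (the ticket names the threshold but gives no number), the `SimilarContent` and `Notice` classes and the function name are hypothetical, and instead of calling handle_possible_artist_dispute() the sketch only returns a marker string.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional

# Hypothetical value; the ticket does not define the threshold.
POSSIBLE_DUPE_THRESHOLD = Decimal("50")


@dataclass
class SimilarContent:
    artist: str
    title: str
    preview_link: str


@dataclass
class Notice:
    message: str
    preview_link: str
    follow_up: str


def creation_form_notice(
    artist: str,
    uniqueness: Decimal,
    similar_content: Optional[SimilarContent],
) -> Optional[Notice]:
    """Decide what, if anything, to show on the creation form."""
    # No similar content, or the upload is unique enough: show nothing.
    if similar_content is None or uniqueness > POSSIBLE_DUPE_THRESHOLD:
        return None
    message = f"sounds like {similar_content.artist} - {similar_content.title}"
    if artist == similar_content.artist:
        follow_up = "are you sure you haven't uploaded this track before?"
    else:
        # Stand-in for handle_possible_artist_dispute() from the comment above.
        follow_up = "possible artist dispute"
    return Notice(message, similar_content.preview_link, follow_up)
```

Returning a value rather than printing keeps the decision separate from how the web interface renders it.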

#18 Updated by Thomas Mielke over 4 years ago

  • Subject changed from Add field 'uniqueness' and 'most_similar_track' for content to Add field 'uniqueness', 'most_similar_content', 'most_similar_track_artist', and 'most_similar_track_title' to table content

#19 Updated by Alexander Blum over 4 years ago

purpose is

  1. a temporary playing field
  2. statistics for analyses
  3. information for admin on e.g. disputes

#20 Updated by Alexander Blum over 4 years ago

  • Status changed from Feedback to Done
  • % Done changed from 0 to 100

#21 Updated by Alexander Blum about 2 years ago

  • Target version changed from 1) Testing phase I to Repertoire 1) Testing phase I

#22 Updated by Alexander Blum about 2 years ago

  • Project changed from repertoire to collecting_society
