> So, I've got this new definition of edit distance and I'm thinking about music comparison, am I stupid?

Edit distance on strings can be useful, and it can be computed per-character in a generic way on any string.
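To make that concrete, here's a minimal sketch of the classic character-level (Levenshtein) edit distance in Python. Nothing in it is string-specific; it works on any pair of sequences, which matters for the next point:

```python
def edit_distance(a, b):
    """Classic Levenshtein distance via dynamic programming.
    Works on any two sequences whose elements support ==,
    so a and b can be strings, lists of tokens, etc."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, x in enumerate(a, 1):
        curr = [i]  # distance from a[:i] to ""
        for j, y in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]

print(edit_distance("kitten", "sitting"))  # 3
```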

However, if you know the string is something like a Java class, you could "tokenize" it, and instead of doing an edit distance on two strings in a generic fashion, do it at the "token level"; that might get you more useful data.
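Because the function above compares sequence elements generically, the exact same code does token-level comparison if you hand it token lists instead of raw strings. A sketch, with a deliberately crude regex as a stand-in for a real Java lexer:

```python
import re

def tokenize(src):
    # Hypothetical toy lexer: runs of word characters, else single symbols.
    return re.findall(r"\w+|\S", src)

a = "int count = count + 1;"
b = "int total = total + 1;"
print(edit_distance(tokenize(a), tokenize(b)))  # 2 (one identifier, renamed twice)
print(edit_distance(a, b))                      # larger: every differing character counts
```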

Then again, if you take two Java classes that are exactly the same when compiled but use different names for local variables in their methods, the compiled output would be identical while the source-level edit distance could be large. Does that matter, or would you then do an edit distance at the "abstract syntax tree" level, or wait until the "compiled bytecode" level? You get different results based on what you choose to infer.
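You asked about Java, but the same effect is easy to demonstrate in CPython, where local variable names live in a code object's metadata (`co_varnames`) rather than in the instruction stream itself; a sketch:

```python
def f():
    total = 0
    for i in range(10):
        total += i
    return total

def g():
    acc = 0
    for j in range(10):
        acc += j
    return acc

# Source-level edit distance is nonzero, yet the compiled instruction
# stream is byte-for-byte identical; only the name metadata differs.
print(f.__code__.co_code == g.__code__.co_code)        # True
print(f.__code__.co_varnames, g.__code__.co_varnames)  # ('total', 'i') ('acc', 'j')
```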

With regard to MP3s, you're looking to infer "note data" from them, but some MP3s are just spoken word, some are just sound effects, and of course lots are "music" but with accompanying vocals.

The way I see it, MP3s are just collections of sound waves / sine waves, and though they may sound like words or music or whatever to us, the information about how they were produced is fundamentally lost.

That said, there are ways to get, say, "music data" from MP3s: things like beats-per-minute calculations will work, but they only make sense on music-based MP3s, and probably work best on songs with a constant BPM.
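For instance, here's a rough sketch of a tempo estimate using the librosa library (the filename is a placeholder, and decoding MP3 needs ffmpeg or a similar backend installed):

```python
import librosa

y, sr = librosa.load("song.mp3")  # placeholder path

# Beat tracking: estimates a global tempo plus the beat positions.
tempo, beats = librosa.beat.beat_track(y=y, sr=sr)
print("Estimated tempo:", tempo, "BPM, over", len(beats), "detected beats")
```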

I wouldn't be surprised if by now there were algorithms or machine-learning techniques that attempt to reproduce MIDI files from songs, but I'd be curious about their success.
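There are; "automatic music transcription" is an active research area, and audio-to-MIDI tools exist. For monophonic material you can get surprisingly far with classic pitch tracking. A rough sketch using librosa's pYIN pitch tracker (placeholder filename, and this breaks down badly on polyphonic or noisy audio):

```python
import numpy as np
import librosa

y, sr = librosa.load("melody.mp3")  # placeholder path, ideally a solo melody

# pYIN fundamental-frequency tracking; f0 is in Hz, NaN where unvoiced.
f0, voiced, voiced_prob = librosa.pyin(
    y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr
)

# Quantize the voiced frames to the nearest MIDI note numbers.
midi_notes = np.round(librosa.hz_to_midi(f0[voiced])).astype(int)
print(midi_notes[:20])  # e.g. a run of 60s for a sustained middle C
```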

I think what you're looking at is the equivalent of saying, "I have a bunch of PNGs, and I want to compute an edit distance with regard to how each was drawn."

PNGs could be photos, and thus weren't drawn at all. If instead they're scanned paintings, they were drawn, but you can't necessarily infer the order of the strokes correctly. And even if they're simple line drawings, you've already lost the information about the order in which the lines were drawn.

As humans we think of MP3s as music, and when we think of music we think of "notes in an order," like in MIDI. But actual played music (and the MP3s made from it) is, I think, more complicated than that.

As far as I know you can't represent reverb or echo in a MIDI file (beyond a coarse "reverb depth" controller value), nor does MIDI have a list of every instrument that could potentially be used; General MIDI defines only a small fixed set.
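To make the contrast concrete, here is roughly everything a MIDI file stores, sketched with the mido library: timed note-on/note-off events, a program (instrument) number drawn from that fixed set, and coarse controller values; no waveform, no room acoustics:

```python
import mido

mid = mido.MidiFile()
track = mido.MidiTrack()
mid.tracks.append(track)

# Program 0 is "Acoustic Grand Piano" in the fixed 128-entry General MIDI set.
track.append(mido.Message("program_change", program=0, time=0))
track.append(mido.Message("note_on", note=60, velocity=64, time=0))     # middle C starts
track.append(mido.Message("note_off", note=60, velocity=64, time=480))  # ends one beat later

mid.save("one_note.mid")  # placeholder output path
```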

tl;dr: There's high-level data that can be inferred, and it'd be great to see how far you get. Generally speaking, though, what I think you want to do is equivalent to trying to un-cook a steak to see what the cow looked like: there might be a way to do some of it, but information has already been lost, and the assumptions you make when reversing it might be incorrect.

/r/compsci Thread