Work in progress: correcting tag capitalisation with regular expressions
March 11, 2014 in bliss by Dan Gravell
I just committed a new change to the bliss codebase:
noncompliant because "album name should start with an upper case character" when album.name/m/"^[a-z]"/ { fixedby Set name to album.name/s/"^([a-z])"/"\U$1\E"/ }
To some that might look like gibberish. To others who have used regular expressions some of it might make some sense. To me, however, there are three reasons why this is exciting... and why it promises interesting new avenues for bliss.
The most immediate reason this is exciting is that this piece of code can change album names so that they start with an upper-case character. If your album name is erroneously tagged thriller, it can change it to Thriller. This is the top rated feature on our ideas forum, so clearly lots of people want to see this!
But for a geek like me, that's not the interesting bit. In fact, the above taken on its own is mundane... it only fixes album names, and it only changes to upper case? Boring!
It gets interesting when you consider the second reason: that this is bliss's own internal rule language. I've been developing this for a while, and it's used for the track artist consolidation and canonical artist rules. What I've done here is add support for regular expressions ("regex(es)") to these "rule scripts".
The final reason this is exciting is that these scripts are written in a format that bliss can interpret and execute, which leaves open the possibility of writing custom scripts... Let me know if you're interested in playing around with this.
On regular expressions
So how do these "regular expressions" work here? It's all about the highlighted sections:
noncompliant because "album name should start with an upper case character" when album.name/m/"^[a-z]"/ { fixedby Set name to album.name/s/"^([a-z])"/"\U$1\E"/ }
The rule script is executed for each album. The script states that the album is non-compliant when the regex ^[a-z] matches. That is, when the first character in the album name is a lower case character. It then gives "album name should start with an upper case character" as the reason for the non-compliance.
bliss is a constructive-kind-of-app though, so it doesn't just point out your tagging deficiencies from the sidelines like a software Russell Brand. It wants to suggest solutions. And in this case album.name/s/"^([a-z])"/"\U$1\E"/ basically says "replace the first character of the album name with the upper case equivalent". So there you have it: assessment and fix.
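For readers who think in Java, here is a minimal sketch of the same assessment and fix done directly with java.util.regex. It is only an illustration of what the rule expresses: the class and method names (AlbumNameRule, isNonCompliant, fix) are made up for this example and are not bliss's actual code.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: AlbumNameRule is a hypothetical class, not part of bliss.
class AlbumNameRule {
    // Same pattern as the rule script: one lower-case ASCII letter at the start.
    private static final Pattern LOWER_START = Pattern.compile("^([a-z])");

    // Assessment, the /m/ half of the rule: does the pattern match?
    static boolean isNonCompliant(String albumName) {
        return LOWER_START.matcher(albumName).find();
    }

    // Fix, the /s/ half of the rule: upper-case the captured first letter.
    static String fix(String albumName) {
        Matcher m = LOWER_START.matcher(albumName);
        if (!m.find()) {
            return albumName; // already compliant, nothing to change
        }
        return m.group(1).toUpperCase(Locale.ROOT) + albumName.substring(m.end());
    }

    public static void main(String[] args) {
        System.out.println(isNonCompliant("thriller")); // true
        System.out.println(fix("thriller"));            // Thriller
        System.out.println(isNonCompliant("Thriller")); // false
    }
}
```

Because the pattern is [a-z], names that start with a digit or a non-ASCII letter are left alone, exactly as the regex in the rule implies.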
Regex aficionados: you may or may not recognise the syntax of these expressions. The regexes themselves are hopefully understandable, but the surrounding expressions may not be. I've taken inspiration for the /s/ or /m/ syntax meaning "substitute" or "match" from sed. The \U is non-standard syntax ripped from Perl, meaning convert the contained string to upper case. Finally, denoting the groups using $1 is done because underneath all this it's just the Java regex engine processing these rules.
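One wrinkle worth spelling out: Java's own replacement strings understand $1 but have no case-conversion escapes, so a backslash simply makes the next character literal and \U comes out as a plain U. Something therefore has to interpret \U and \E before or around the Java engine. The sketch below shows one way that could be done; it is an assumption about the general approach, not how bliss actually implements it, and PerlStyleReplace, substituteFirst and expand are hypothetical names. It handles only \U, \E and single-digit groups, and replaces only the first match (like sed's s without the g flag).

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: a hypothetical helper, not bliss's implementation.
class PerlStyleReplace {
    // Replace the first match of regex in input using a template that may
    // contain $1..$9, \U (start upper-casing) and \E (stop upper-casing).
    static String substituteFirst(String input, String regex, String template) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return input;
        }
        return input.substring(0, m.start()) + expand(template, m) + input.substring(m.end());
    }

    // Expand the template against the current match.
    static String expand(String template, Matcher m) {
        StringBuilder out = new StringBuilder();
        boolean upper = false;
        for (int i = 0; i < template.length(); i++) {
            char c = template.charAt(i);
            String piece;
            if (c == '\\' && i + 1 < template.length()) {
                char next = template.charAt(++i);
                if (next == 'U') { upper = true; continue; }
                if (next == 'E') { upper = false; continue; }
                piece = String.valueOf(next); // any other escape: literal character
            } else if (c == '$' && i + 1 < template.length()
                    && Character.isDigit(template.charAt(i + 1))) {
                String group = m.group(template.charAt(++i) - '0');
                piece = (group == null) ? "" : group; // group did not participate
            } else {
                piece = String.valueOf(c);
            }
            out.append(upper ? piece.toUpperCase(Locale.ROOT) : piece);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Plain Java treats \U and \E as the literal letters U and E:
        System.out.println("thriller".replaceFirst("^([a-z])", "\\U$1\\E")); // UtEhriller
        // With the Perl-style escapes interpreted by the helper:
        System.out.println(substituteFirst("thriller", "^([a-z])", "\\U$1\\E")); // Thriller
    }
}
```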
To fully satisfy the idea, the intention is that these rule scripts will be generated on the fly for each tag that should be corrected, with the desired regular expression plugged in.
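As a rough sketch of what "generated on the fly" could look like, the following Java builds the text of a rule script from a template. Everything here is hypothetical (RuleScriptTemplate, generate and its parameters are invented for illustration), and the only rule shape assumed is the one shown in this post; the example simply regenerates the album name rule.

```java
// Illustrative only: a hypothetical generator, not bliss's implementation.
class RuleScriptTemplate {
    // Template modelled on the single rule shown in this post.
    private static final String TEMPLATE =
        "noncompliant because \"%s\" when %s.%s/m/\"%s\"/ "
        + "{ fixedby Set %s to %s.%s/s/\"%s\"/\"%s\"/ }";

    // entity and field identify the tag (for example "album" and "name");
    // the three regex parts are plugged in unchanged.
    static String generate(String reason, String entity, String field,
                           String matchRegex, String substituteRegex, String replacement) {
        return String.format(TEMPLATE,
            reason, entity, field, matchRegex,
            field, entity, field, substituteRegex, replacement);
    }

    public static void main(String[] args) {
        // Regenerates the album name rule from this post.
        System.out.println(generate(
            "album name should start with an upper case character",
            "album", "name", "^[a-z]", "^([a-z])", "\\U$1\\E"));
    }
}
```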
Personally, I've long used regular expressions for various tasks, but their applicability really stood out when it was suggested on the ideas thread to take a look at Grammartron. This is an enormous range of grammar rules for the popular tagger MP3Tag. I realised how they could be used in bliss, and how regexes enabled this, so I set about supporting them in rule scripts.
Supporting regexes in rule scripts really opens up the possibilities for all sorts of textual rules to be developed! I'm hoping to release case standardisation within bliss to beta in the coming weeks, so let me know if you want to try it out ahead of full release.
Thanks to Chris Corwin for the image above.