Dynamic Range Database Extraction Script

UPDATED (23/02/2014): Modified the script to work with the new DR site CSS design.

The Dynamic Range Database is a collection of albums and their associated dynamic ranges, measured using software available here.

I thought it would be interesting to look at the entire data set for trends. The script below ‘scrapes’ the salient contents of the database (artist, album, year, codec, source and the average, maximum and minimum DR values) one record at a time and appends each record as a comma-separated line to a file, output_DR.txt.
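Each record ends up as a single comma-separated line. Album and artist names often contain commas or quotes, so the text fields need quoting. Below is a minimal sketch of formatting one record; the helper names `q` and `csv_record` and the sample values are hypothetical, chosen only for illustration, and the three DR arguments are simply passed through in the order the script reads them.

```shell
#!/bin/bash
# Sketch: format one scraped record as a CSV line.
# Text fields are wrapped in double quotes; any embedded double quote is
# doubled, which is the usual CSV escaping rule.

# q TEXT -> prints TEXT as a quoted CSV field
q() {
    local dq='"'
    local s="${1//$dq/$dq$dq}"   # double every embedded quote
    printf '"%s"' "$s"
}

# csv_record ID ARTIST ALBUM YEAR CODEC SOURCE DR1 DR2 DR3 (hypothetical helper)
csv_record() {
    printf '%s,%s,%s,%s,%s,%s,%s,%s,%s\n' \
        "$1" "$(q "$2")" "$(q "$3")" "$4" "$(q "$5")" "$(q "$6")" "$7" "$8" "$9"
}

# Made-up values, not real DR database data
csv_record 5 'Some Artist' 'An Album, Remastered' 1999 FLAC CD 9 12 6
# prints: 5,"Some Artist","An Album, Remastered",1999,"FLAC","CD",9,12,6
```

Quoting the fields keeps a title such as "An Album, Remastered" in one column when the file is loaded into a spreadsheet.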

I’m sure there is a better, more efficient way to write this script, but it worked OK for me. YMMV.

#!/bin/bash
# Needs bash rather than plain sh: "let" and "((count++))" are not POSIX sh.

# Tool to extract DR values in bulk from the DR database
# Total number of submissions as of 20th Nov 2013 is 50131
# Total number as of 23rd Feb 2014 is 56410
# Script updated due to DR site cosmetic changes 23/02/2014

let count=4

while [ $count -lt 56410 ]; do
((count++))

#echo $count

#curl -# -s http://www.dr.loudness-war.info/details.php?id=$count > temp.html
curl -s "http://dr.loudness-war.info/album/view/$count" > temp.html

#Artist Name - line 20, now 73
#a=$(sed '20q;d' temp.html | cut -d "'" -f 8)
a=$(sed '73q;d' temp.html | cut -d "<" -f 5 | cut -c 4-)

#Album Name - line 25, now 74
#b=$(sed '25q;d' temp.html | cut -d "'" -f 8)
b=$(sed '74q;d' temp.html | cut -d "<" -f 5 | cut -c 4-)

#Year - line 30, now 75
#c=$(sed '30q;d' temp.html | cut -d "'" -f 8)
c=$(sed '75q;d' temp.html | cut -d "<" -f 5 | cut -c 4-)

#Codec NEW FOR 23/02/2014
d=$(awk '/<tr><th>Codec<\/th><td>/' temp.html | cut -d "<" -f 5 | cut -c 4-)

#Source
#e=$(tail -9 temp.html | head -1 | cut -d ">" -f 3 | cut -d "<" -f 1)
e=$(awk '/<tr><th>Source<\/th><td>/' temp.html | cut -d "<" -f 5 | cut -c 4-)

#DR values, now lines 76-78
#e=$(awk '/dr...dr/' temp.html | cut -d " " -f 2- | cut -c10-11 | awk 'BEGIN {FS="\n";RS="";ORS=""} { x=1; while (x<NF) {print $x","; x++} print $NF "\n"}')
f=$(sed '76q;d' temp.html | cut -d ">" -f 5 | cut -d "\"" -f 2 | cut -c 9-)
g=$(sed '77q;d' temp.html | cut -d ">" -f 5 | cut -d "\"" -f 2 | cut -c 9-)
h=$(sed '78q;d' temp.html | cut -d ">" -f 5 | cut -d "\"" -f 2 | cut -c 9-)


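# Show progress, then append the record as a CSV line. The variables are
# quoted so spaces and wildcard characters in titles are kept intact.
echo "$count $a $b $c $d $e $f $g $h"
printf '%s,"%s","%s",%s,"%s","%s",%s,%s,%s\n' "$count" "$a" "$b" "$c" "$d" "$e" "$f" "$g" "$h" >> output_DR.txt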

rm temp.html

done
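The redesign moved the artist row from line 20 to line 73, which is why every fixed `sed 'Nq;d'` line number had to change (`sed 'Nq;d'` deletes each line until line N, where `q` prints it and quits). The Codec and Source lookups already sidestep this by matching the row's label. A minimal sketch of generalizing that idea follows, using a hypothetical helper `field`. It assumes rows shaped like `<tr><th>Label</th><td>Value</td></tr>`, the shape the script's Codec/Source awk patterns match; whether the Artist, Album and Year rows use the same shape, and what their header text is, has not been verified here.

```shell
#!/bin/bash
# Sketch: look a value up by its row label rather than by a fixed line number.
# ASSUMPTION: rows look like <tr><th>Label</th><td>Value</td></tr>.
# Not checked against the live site.

# field LABEL FILE -> prints the text between <td> and </td> on the first
# row whose header cell is LABEL (prints nothing if no such row exists)
field() {
    awk -v label="$1" '
        index($0, "<tr><th>" label "</th><td>") {
            s = substr($0, index($0, "<td>") + 4)   # text after the first <td>
            print substr(s, 1, index(s, "</td>") - 1)
            exit
        }' "$2"
}

# Demo on a hypothetical two-row fragment
sample=$(mktemp)
cat > "$sample" <<'EOF'
<tr><th>Codec</th><td>FLAC</td></tr>
<tr><th>Source</th><td>CD</td></tr>
EOF

field Codec "$sample"    # prints FLAC
field Source "$sample"   # prints CD

# The script's fixed-line approach gives the same value for line 1:
# cut on "<" makes field 5 "td>FLAC", and cut -c 4- drops the "td>"
sed '1q;d' "$sample" | cut -d "<" -f 5 | cut -c 4-   # prints FLAC

rm -f "$sample"
```

Because the lookup keys on the label, a future cosmetic change that only shifts rows up or down would not require new line numbers, although a change to the row markup itself still would.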