These forums are archived

See this post for further info

get_iplayer forums

Forum archived. Posting disabled.

Screen Scraper Example

user-619

See attachment in post below:

/topic...post-10225

user-2

@boisemick: Please replace the script. I tried to wrap it in a code block but WordPress destroyed it instead. Sorry about that.

user-2

Sorry again - you won't be able to edit it, so please repost in a new entry and I'll try to fix.

user-2

Actually, please post it as a .txt attachment so I see what it is actually supposed to contain before I do any editing.

user-619

Sure thing, but I have done a lot more work on it today. It now takes a show PID, scrapes the site, and creates a multidimensional array containing Show, EpisodePID, Series number, Episode number and a suggested filename in the format 'Doctor Who.s08e09'. It then takes that array, loops through it, runs get_iplayer against the PIDs and then renames the show. There are a few bugs right now. I will post the whole thing sometime tomorrow.

Mick
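For anyone following along, the suggested-filename step described above could be sketched in bash roughly like this. This is not the attached script; the function name suggest_filename is made up for illustration, and it only shows one way to zero-pad the series and episode numbers into the 'Doctor Who.s08e09' format.

```shell
#!/bin/bash

# Hypothetical helper (not from the attached script):
# build a name like 'Doctor Who.s08e09'.
# Arguments: show name, series number, episode number.
suggest_filename()
{
    local show="$1" series="$2" episode="$3"
    # 10# forces base 10 so inputs such as "08" are not read as octal.
    printf '%s.s%02de%02d\n' "$show" "$((10#$series))" "$((10#$episode))"
}

suggest_filename "Doctor Who" 8 9
```

The result could then be handed to get_iplayer's --file-prefix option, which the bash script later in this thread already uses.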

user-619

Here you go (attached)

Work remaining -
1 - Read a CSV that contains two columns (Show PID and Show Name)
2 - I have the filename built in the array but I am not renaming yet. I did notice that if my get_iplayer installation cached the metadata for a show, then get_iplayer is still doing a great job of renaming the file, but I assume this data comes from the same feeds that are now dead? Perhaps you could comment.
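
Item 1 in the list above (reading a two-column CSV) might look something like the following in bash. This is only a sketch under assumptions: the function name read_shows is invented, the file is assumed to have no header row, and the PID column is assumed to contain no commas.

```shell
#!/bin/bash

# Hypothetical helper: read "Show PID,Show Name" rows from a CSV file
# and print one "pid<TAB>name" line per show.
# Assumes no header row and no commas inside the PID column.
read_shows()
{
    local csv_file="$1" pid name
    # The '|| [ -n "$pid" ]' part keeps a final line that lacks a newline.
    while IFS=, read -r pid name || [ -n "$pid" ]; do
        # Skip blank lines.
        [ -n "$pid" ] || continue
        # Drop a trailing carriage return from files saved on Windows.
        name="${name%$'\r'}"
        printf '%s\t%s\n' "$pid" "$name"
    done < "$csv_file"
}
```

Calling it as read_shows shows.csv (shows.csv being a placeholder file name) would give one line per show, ready to loop over.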

Other than that, this seems to work. Of course - let the madness begin on screen-scraping - I already have to check for various formats on the web pages. For example, in the HTML for Doctor Who the word 'episode' is missing, but for Russell Howard's Good News it is there.
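
One way to cope with the word 'episode' being optional is a single pattern that tolerates both forms. The sketch below is an assumption-heavy illustration: the function name parse_series_episode is invented, and the label formats it handles ('Series 8: 9' and 'Series 8: Episode 9') are made-up stand-ins, not text copied from the real pages.

```shell
#!/bin/bash

# Hypothetical helper: pull series and episode numbers out of a label,
# whether or not the word 'Episode' is present.
# The label formats handled here are invented examples, not real page markup.
parse_series_episode()
{
    local label="$1"
    # Series number, a separator, an optional 'Episode ' word, episode number.
    local re='[Ss]eries ([0-9]+)[: ,-]+([Ee]pisode )?([0-9]+)'
    if [[ $label =~ $re ]]; then
        printf '%s %s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[3]}"
    else
        # No recognisable series/episode pair in this label.
        return 1
    fi
}
```

Each new page layout would still need checking by hand, but keeping the pattern in one place makes it easier to extend.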

user-619

Attached as TXT this time

user-2

Thanks. I've replaced the original with directions to the attached script.

user-695

If anybody is interested, here is a bash script that has some of the same functionality. Just give it a series pid as an argument and it will output some get_iplayer download commands. It attempts to find a usable name for the episode from the final episode url (after redirection), but I haven't tried to format it for tvdb or the like. So far it's only been tested on Doctor Who, QI XL, and Only Connect.



Code:
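#!/bin/bash

# Print a get_iplayer command for every episode listed on a series page.
queue_series()
{
    # Fetch the episode listing page for the series pid.
    series_page="$(wget -qO - "http://www.bbc.co.uk/iplayer/episodes/$1")"
    # Extract the unique 8-character episode pids from the episode links.
    episode_pids="$(echo "$series_page" | grep -oP '/iplayer/episode/\K[a-z0-9]{8}' | sort -u)"

    while read -r pid; do
        # Skip the empty line produced when no pids were found.
        [ -n "$pid" ] || continue
        # Follow redirects; the last path segment of the final url is the name.
        episode_url="$(curl -Ls -o /dev/null -w '%{url_effective}' "http://www.bbc.co.uk/iplayer/episode/$pid")"
        title="${episode_url##*/}"
        command="get_iplayer --pid=$pid --file-prefix=$title"
        echo "$command"
    done <<< "$episode_pids"
}

series_pid=$1
queue_series "$series_pid"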
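
The title in the script above comes from a bash parameter expansion that keeps everything after the last slash of the final url. A slightly more defensive version of that step is sketched below. The function name title_from_url is made up, and the handling of trailing slashes and query strings is an added assumption rather than something the original script does.

```shell
#!/bin/bash

# Hypothetical helper: derive a file prefix from the final episode url.
title_from_url()
{
    local url="$1"
    url="${url%%[?#]*}"          # drop any query string or fragment
    url="${url%/}"               # drop a single trailing slash
    printf '%s\n' "${url##*/}"   # keep everything after the last slash
}
```

Given a made-up url such as http://www.bbc.co.uk/iplayer/episode/p0000001/example-show-episode-1, this prints the last path segment, example-show-episode-1.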

user-695

I'll try that again. Here's an attachment.

user-695

You can also use the iPlayer search function to scrape the series pid with something like:

Code:
curl -s --get --data-urlencode "q=QI XL" http://www.bbc.co.uk/iplayer/search | grep -oP '/episodes/\K[0-9a-z]{8}'

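(Using curl because it includes url encoding as standard functionality.) Feed the output of that into the previous script (for example with xargs, since the script takes the series pid as an argument rather than reading stdin) and you have something close to the old pvr functionality.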
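
Joining the search step to the series script could be sketched as follows. The function names are invented, ./queue_series.sh is a placeholder name for the script posted earlier in this thread, and xargs -r is a GNU extension (as is grep -P), so this assumes a GNU userland.

```shell
#!/bin/bash

# Hypothetical helper: read search-results HTML on stdin and print the
# unique 8-character series pids found in /episodes/ links.
extract_series_pids()
{
    grep -oP '/episodes/\K[0-9a-z]{8}' | sort -u
}

# Hypothetical wrapper: search for a show name, then run the series script
# once per pid. "./queue_series.sh" is a placeholder for the earlier script.
search_and_queue()
{
    curl -s --get --data-urlencode "q=$1" http://www.bbc.co.uk/iplayer/search \
        | extract_series_pids \
        | xargs -r -n 1 ./queue_series.sh
}
```

Running search_and_queue "QI XL" would then print the get_iplayer commands for every series the search turns up, which may include more than the one you wanted.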

If I get the time I'll try to put it all together in a "nice" script with option-parsing and documentation, but don't all hold your breath :-)
