These forums are archived

See this post for further info

get_iplayer forums

Forum archived. Posting disabled.

Webscraping TV/Radio Cache

user-649

Hi people,

It seems that lots of people are interested in generating these caches by web scraping (right term?) the iPlayer website and then using the beloved get_iplayer as before. I was wondering if we could concentrate the posts here and also share experience of how to do it.

I'm particularly interested in radio feeds, so I would be willing to write a Python (then Perl) scraper. My only issue is that I need the template for the radio.cache file. Can you help?

Another question relates to changing the way the cache works. Downloading the whole BBC iPlayer listing worked really well with the feeds, but it is now quite slow via other methods. Would it be easier to use the PVR search terms with the iPlayer search page instead (http://www.bbc.co.uk/iplayer/search?q=Doctor%who)?
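
For illustration, here is a minimal Python sketch of how a PVR search term could be turned into a search-page URL. The base URL is the one quoted above; whether the page accepts exactly this query format is an assumption, and `build_search_url` is a hypothetical helper name. Spaces and punctuation in the term need to be URL-encoded, which `urlencode` takes care of.

```python
from urllib.parse import urlencode

# Base URL taken from the post above; treat the query format as an assumption.
SEARCH_BASE = "http://www.bbc.co.uk/iplayer/search"


def build_search_url(term):
    """Return a search URL for one PVR search term (hypothetical helper)."""
    # urlencode escapes spaces and punctuation so the term survives in a URL.
    return SEARCH_BASE + "?" + urlencode({"q": term})


print(build_search_url("Doctor who"))
```

A scraper could then loop over the saved PVR search terms and fetch one small results page per term instead of the whole listing.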

Thanks for all your work

Andy

user-637

Hello,

I think that I can help a little here. Firstly, I have made improvements to the "Refresh Cache" software so now it looks almost the same as before the changes. There are still some limitations, for example only 20 episodes of each show are cached so Bargain Hunt fans will be unhappy! ;) The information is mostly complete for each episode including the jpg image. There are still some occasional problems that need fixing, I am sorry to say.

After recording with the PVR manager the name is almost the same as before, but the beginning of the name is always "BBC_iPlayer_Feeds", which is something I cannot change. Since a new update is expected today, it may soon be a thing of the past anyway.

I guess that this only works with a Windows PC, but I did read that it can also work on other operating systems?? I have two files that I copy into the get_iplayer directory which is “C:\Program Files (x86)\get_iplayer” (on this machine). I then remove the “.txt” from the end of both files so that they are simply named “Refresh_Cache_V0_3.pl” and “Refresh_Cache_V0_3.pl.cmd”. I will try to upload the two files with this message.

I then double-click the “Refresh_Cache_V0_3.pl.cmd” file to open a command window, which just starts running the Perl code (“Refresh_Cache_V0_3.pl”). My computer then sits there for a very long time scraping all the data from the BBC iPlayer website and storing the results on the hard disk. This refreshes the cache that get_iplayer uses to store all the data from the iPlayer system. You can watch the progress on the screen, but it needed 16 minutes to complete on my PC, as there are currently 3820 videos on BBC iPlayer! After this, I can start the “PVR Manager” and use get_iplayer exactly as before.

I worry about the huge amount of data that is unnecessarily downloaded just to make a simple list of episodes on the BBC iPlayer. It would be much better to use one server to compile a list and allow users to download the very small text file, rather than all users downloading huge numbers of large HTML pages from the BBC!!!

I will post the format of the Radio Cache soon...

Thank you to all who have worked on this software!

Tiiveni.
Refresh_Cache_V0_3.pl_.txt
Refresh_Cache_V0_3.pl_.cmd_.txt

user-637

Howdy,

I have tried to upload the format of the "radio.cache" file for you to see.

#index|type|name|pid|available|episode|seriesnum|episodenum|versions|duration|desc|channel|categories|thumbnail|timeadded|guidance|web
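
As a rough illustration of how a scraper could read or write records in this layout, here is a short Python sketch. It assumes each cache line holds the fields in the order listed above, separated by `|`, and that field values do not themselves contain `|`. The sample record (including its PID) is invented for the example, and `parse_line` and `format_line` are hypothetical helper names.

```python
# Field order copied from the header posted above.
FIELDS = ("index|type|name|pid|available|episode|seriesnum|episodenum|"
          "versions|duration|desc|channel|categories|thumbnail|timeadded|"
          "guidance|web").split("|")


def parse_line(line):
    """Split one cache line into a dict keyed by field name."""
    values = line.rstrip("\n").split("|")
    if len(values) != len(FIELDS):
        raise ValueError("expected %d fields, got %d" % (len(FIELDS), len(values)))
    return dict(zip(FIELDS, values))


def format_line(record):
    """Join a record back into one cache line (missing fields become empty)."""
    return "|".join(str(record.get(name, "")) for name in FIELDS)


# Invented sample record, for illustration only.
sample = {"index": 1, "type": "radio", "name": "Example Show", "pid": "b0000000"}
line = format_line(sample)
print(line)
print(parse_line(line)["name"])
```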


Many thanks.

Tiiveni.
radio.cache_.txt

user-637

Hi There!

Since "Scraping" the whole of the BBC iPlayer website takes so long and uses so much bandwidth, I have tried to upload a "tv.cache" file made here on my own computer. The "Refresh_Cache_V0_3" code still has some bad bugs!! So, it makes more sense to just copy and paste this file for now until a fix is released later today.

The file should be renamed as "tv.cache" and copied into the correct location, which is "C:\Users\<my USERNAME>\.get_iplayer\" on this Windows PC. If you are not sure of the location, open a command prompt, type "set" and hit Enter, then look for "USERPROFILE". The value next to it, with "\.get_iplayer\" added on the end, is the correct location for the "tv.cache" file on your machine.
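
If Python is installed, the location can also be worked out with a few lines. This is only a sketch: it assumes the cache lives in a ".get_iplayer" folder directly under the user's profile (Windows) or home (Linux) directory, as described in this thread, and `cache_path` is a hypothetical helper name.

```python
from pathlib import Path


def cache_path(kind="tv", home=None):
    """Return the expected location of tv.cache or radio.cache.

    Assumes the ".get_iplayer" folder sits directly under the user's
    profile/home directory, as described in the posts above.
    """
    # Path.home() follows USERPROFILE on Windows and HOME on Linux.
    base = Path(home) if home is not None else Path.home()
    return base / ".get_iplayer" / (kind + ".cache")


print(cache_path("tv"))
```

Passing `home` explicitly is only there to make the function easy to try out; in normal use `cache_path("tv")` or `cache_path("radio")` is enough.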

I don't know if any of this is clear or explained properly, I am not a computer guy! LOL!!

Thank you everyone!

Tiiveni.
tv.cache_.2014_11_02-08.56.zip

user-649

When I use this awesome cache file, I get this error:

Can't call method "new" on an undefined value at /usr/bin/get_iplayer line 1772.


I'm running linux ...

user-651

vancheese, if you delete the first line of the tv.cache it works ok for me, on linux.

agreed, awesome cache file!

user-649

Thanks for the first line help - It does the trick

If you want to program that into a bash script, this works very nicely

Code:
sed -i 1d tv.cache
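
For anyone without sed (for example on Windows), the same fix can be sketched in Python. It only assumes, as reported above, that removing the first line is enough. `drop_first_line` and `fix_cache` are hypothetical helper names, and `fix_cache` overwrites the file, so keep a backup copy.

```python
def drop_first_line(data):
    """Return the bytes of data with the first line removed."""
    # partition splits at the first newline; everything after it is kept.
    _, _, rest = data.partition(b"\n")
    return rest


def fix_cache(path):
    """Rewrite the file at path in place, minus its first line."""
    # Binary mode avoids any guessing about the file's text encoding.
    with open(path, "rb") as f:
        data = f.read()
    with open(path, "wb") as f:
        f.write(drop_first_line(data))
```

Usage would be a single call such as `fix_cache("tv.cache")`, run from the folder that holds the cache file.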

user-637

Hello,

I have fixed some problems with the earlier "Refresh_Cache" software, now I have version V0_4.

There are still some limitations. For example, a maximum of 20 episodes of each show is cached (sorry, Bargain Hunt fans ;) ). After recording with the PVR manager the file name is almost the same as it should be, but the beginning of the file name is always “BBC_iPlayer_Feeds”, which is something I cannot change!!!

Since a new update is expected today it may soon be a thing of the past anyway.


INSTRUCTIONS:

I guess that this only works with a Windows PC, but I did read that it can also work on other operating systems?? I have two files that I copy into the get_iplayer program files directory which is “C:\Program Files (x86)\get_iplayer” (on this machine). I then remove the “.txt” from the end of both files so that they are simply named “Refresh_Cache_V0_4.pl” and “Refresh_Cache_V0_4.pl.cmd”. I will try to upload the two files with this message.

I then double-click the “Refresh_Cache_V0_4.pl.cmd” file to open a command window, which just starts running the Perl code (“Refresh_Cache_V0_4.pl”). My computer then sits there for a very long time (about 20 minutes) scraping all the data from the BBC iPlayer website and storing the results on the hard disk. This refreshes the cache that get_iplayer uses to store all the data from the BBC iPlayer system. You can watch the progress on the screen, but it needed just over 16 minutes to complete on my PC, as there are currently 3820 videos on BBC iPlayer! After this, I can start the “PVR Manager” and use get_iplayer exactly as before.


***********

Big thanks to all those who helped to make get_iPlayer!!!!! :)

Tiiveni.
Refresh_Cache_V0_4.pl_.txt
Refresh_Cache_V0_4.pl_.cmd_.txt

user-30

Great effort on this. It's great to see such input from the community.

For reference - You can zip up files and upload the zip here, thereby retaining the original file names on the zipped content. For security reasons only a small set of file types are allowed here but zip will allow you to present exactly what you want to users.

EDIT - Topic made sticky.

user-585

Thanks Tiiveni. That is great.

Is there a way this can also be used to update the radio cache too?

user-649

Here is my quick and nasty Radio ripping program. It uses the search page of the bbc to get the listing.
Sorry but I used python instead of perl, and the only dependency outside a vanilla python install is beautifulsoup4

Code:
import os
import subprocess
import urllib.request

import bs4

# target = {"local folder": "search term"}
target = {"Radcliffe": "The+Radcliffe+%26+Maconie+Show"}
for show, term in target.items():
    pids = set()
    file_path = "/home/andy/Radio/" + show
    url = "http://www.bbc.co.uk/search?q=" + term
    print(url)
    page = urllib.request.urlopen(url)
    soup = bs4.BeautifulSoup(page.read(), "html.parser")
    # Collect the PID (last path component) of every programme link.
    for link in soup.find_all("a"):
        href = link.get("href") or ""  # some <a> tags have no href
        if href.startswith("http://www.bbc.co.uk/programmes/"):
            pids.add(os.path.split(href)[-1])
    # Hand each unique PID to get_iplayer for download.
    for pid in pids:
        subprocess.call(["get_iplayer", "--type=radio",
                         "--pid=" + pid, "--output=" + file_path])

user-626

Nice work you two - this is real Spirit Of The Blitz stuff!

user-21

thanks Tiiveni
are you planning a similar thing for the radio cache?

user-680

I'm getting close under Debian. I put the new tv.cache file given above into /home/zippy/.get_iplayer
Then I edited the first few lines of the Refresh file (V0_4) as follows...

Quote:
# Parameters
$showPID = 'b006q2x0';
$episodePID = 'b04n1wqm';

#$ProgramFiles = "/usr/bin";
print "Directory for Program Files: ".$ProgramFiles."\n";

$Get_iPlayerPath = "/home/zippy/.get_iplayer";
print "Directory for Get_iPlayer files: ".$Get_iPlayerPath."\n";

$PerlPath = "/usr/bin/perl";
print "Directory for Perl.exe: ".$PerlPath."\n";

$UserProfile = "/home/zippy/.get_iplayer";
print "Directory for User Files: ".$UserProfile."\n";

$Get_iPlayer_Cache_Path = "/home/zippy/.get_iplayer";
print "Directory for iPlayer Cache File: ".$Get_iPlayer_Cache_Path."\n";

$Refreshed_Cache = "/home/zippy/.get_iplayer/tv.cache";
print "Directory for Refreshed Cache File: ".$Refreshed_Cache."\n";

I then ran the perl script with

Quote:
perl ./Refresh_Cache_V0_4.pl > /home/zippy/data/list.txt

This gives me a text file with all the updated pics in it.
Since I don't use the PVR, this works great for me.
One warning though......it's not quick :(

Thanks guys for the fix :)

Zippy

user-637

Hello,

This is really great all the work that you are all doing! I do not know if I have enough time, but I did take a look already at making the "Refresh Cache" Perl script to update both the "tv.cache" file and also the "radio.cache" file. The biggest problem is the huge number of episodes available on the iPlayer Radio system!!

Thanks again to all here!

Tiiveni.

user-21

Quote:
Hello,

This is really great all the work that you are all doing! I do not know if I have enough time, but I did take a look already at making the “Refresh Cache” Perl script to update both the “tv.cache” file and also the “radio.cache” file. The biggest problem is the huge number of episodes available on the iPlayer Radio system!!

Thanks again to all here!

Tiiveni.

If you can get a radio one working you will be my hero.
Would it be possible to limit the search to specific radio stations rather than all of them?

user-550

Quote:
The biggest problem is the huge number of episodes available on the iPlayer Radio system!!

I can appreciate that, Tiiveni, but that update only really needs to be run once a week to catch everything, although some people might want it every 3 days to make sure nothing they are collecting is missed!

Sadly I am more of a hardware and network person than a coder otherwise I would have a crack, I can follow the logic but I wouldn't know how to start a new thread.

Neil

user-637

Hello,

Here is a "radio.cache" file that I have made here on my PC. It is incomplete as there are no "episodes" from Radio Series, only the radio programs that don't belong to a series.

The new version of get iPlayer is out now, so I hope that this "radio.cache" file will not be needed. Fingers crossed!

INSTRUCTIONS
The file should be renamed as “radio.cache” and copied into the correct location, which is “C:\Users\<my USERNAME>\.get_iplayer\” on this Windows PC. If you are not sure of the location, open a command prompt, type “set” and hit Enter, then look for “USERPROFILE”. The value next to it, with “\.get_iplayer\” added on the end, is the correct location for the “radio.cache” file on your machine.

Thanks again to all who work on this project!

Tiiveni.
radio.cache_.2014_11_03_12.30.zip

user-250

Excellent !
In recent versions of Windows you can test for 32 or 64 bit architecture quite easily

Code:
--cut
rem Query the first CPU entry; its Identifier is expected to contain "x86" on 32-bit Windows
Set RegQry=HKLM\Hardware\Description\System\CentralProcessor\0
REG.exe Query %RegQry% > CheckOS.txt
Find /i "x86" < CheckOS.txt > StringCheck.txt
If %ERRORLEVEL% == 0 (
    Echo "This is 32 Bit Operating system"
    set arch=32
) ELSE (
    Echo "This is 64 Bit Operating System"
    set arch=64
)
echo %arch%
-- cut
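
A cross-platform variant of the same check can be sketched in Python, which may be handy for a scraper that also runs on Linux. The list of 64-bit identifiers below is an assumption and not exhaustive, `os_bits` is a hypothetical helper name, and an unrecognised or empty machine string falls back to 32.

```python
import platform


def os_bits(machine=None):
    """Guess 32 or 64 from a machine string such as 'AMD64' or 'x86'."""
    name = (machine or platform.machine()).lower()
    # Common 64-bit identifiers; anything else is treated as 32-bit.
    return 64 if name in ("amd64", "x86_64", "arm64", "aarch64", "ia64") else 32


print(os_bits())
```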






user-550

Thanks for the radio.cache, Tiiveni. It certainly beats manual searching, downloading and renaming!

I wish I could save you the trouble of zipping and posting it.

Neil