These forums are archived

See this post for further info

get_iplayer forums

Forum archived. Posting disabled.

BBC data sources

user-1102

I'm a little surprised to see get_iplayer resort to web-scraping at this point. Isn't enough data to populate a radio cache available from the still-running aod feeds?

Similarly, it looks as if enough data is available via the ibl API (which is accessible without an API key) to populate a tv cache.

Both of these sources are efficient, populating caches in just a few seconds, even on my slow Windows system.

For anyone interested, there is a proof-of-concept at https://github.com/mermade/bbcparse (node.js >=4.x is required). More work would be required to maintain a rolling 30-day cache.

Usage:

node gip_ibl > /path/to/tv.cache
node gip_aod > /path/to/radio.cache

It is probably wise to back up your cache files first, especially if they take a long time to generate.
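
As a rough illustration of the rolling 30-day cache mentioned above, here is a minimal Python sketch. It is not taken from bbcparse or get_iplayer: the entry layout (a dict keyed by PID with a "seen" timestamp) and the function name merge_rolling are assumptions made purely for the example, and the real cache file format is not modelled.

```python
from datetime import datetime, timedelta, timezone


def merge_rolling(existing, fresh, now, max_age_days=30):
    """Merge freshly fetched entries into a cache and drop stale ones.

    existing, fresh: dicts mapping a PID string to an entry dict that
    carries a timezone-aware "seen" datetime (hypothetical layout).
    """
    cutoff = now - timedelta(days=max_age_days)
    merged = dict(existing)
    # Entries from the latest fetch replace older copies with the same PID.
    merged.update(fresh)
    # Keep only entries seen within the rolling window.
    return {pid: entry for pid, entry in merged.items() if entry["seen"] >= cutoff}


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    old_cache = {"p0000001": {"name": "Example A", "seen": now - timedelta(days=45)}}
    new_feed = {"p0000002": {"name": "Example B", "seen": now}}
    print(sorted(merge_rolling(old_cache, new_feed, now)))
```

Each run would merge the latest feed into the saved cache, so programmes that have dropped out of the feed remain listed until they age out of the window.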

user-1687

Hi Meic,

I think the approach was first and foremost to get it up and going.
Now there is some time to look at ways to optimise it.
I only say this because I'm a software developer/systems analyst and know that this is what matters most when there is a massive outage.
Get it up and going, and then you can look at ways to optimise it.

I appreciate all the developers' work on this, and your way sounds really good, but I personally would not have integrated it into a release so quickly. HTML scraping was proven to work, even if only as an interim measure until a proven, more optimised method could be applied.

I actually think HTML scraping (well, for me) is working faster than the old XML way. Don't ask me why: same hardware, same connection, same VPN, but it seems to work faster.

Now that there is the option of looking at optimisation, we as a community can do it.

Cheers
Tel

user-1286

I would also be interested in the idea of creating a community server, which makes available the information get_iplayer needs, in a format optimised for get_iplayer to use, and which can employ whatever mechanism is needed to keep it up to date.

For example, if web-scraping is required, the site could do the scraping once, and then make the data available to the community without further impact on the BBC's servers. If eventually necessary, it could even be kept up to date using a completely manual process by community volunteers. This is how TiVo users in Australia keep their machines running, for example (see oztivo.net).
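
To make the "scrape once, serve many" idea concrete, here is a minimal Python sketch of the caching layer such a server might use. Everything in it is hypothetical: the class name SharedListing, the fetch callable (standing in for whatever scraper or feed reader is used) and the refresh interval are assumptions for illustration, not part of any existing service.

```python
import time


class SharedListing:
    """Serve one cached copy of the programme listing to every client.

    fetch is a callable that performs the expensive upstream work
    (hypothetical here); it runs at most once per ttl_seconds no matter
    how many clients call get().
    """

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._data = None
        self._fetched_at = None  # None means nothing has been fetched yet

    def get(self):
        now = self._clock()
        if self._fetched_at is None or now - self._fetched_at >= self._ttl:
            # Only this branch touches the upstream source.
            self._data = self._fetch()
            self._fetched_at = now
        return self._data
```

With this arrangement the load on the upstream site depends on the refresh interval rather than on the number of get_iplayer users, which is the point of the proposal.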

If anyone has any estimates of how many active users of get_iplayer there are I would be interested. I would also be interested in co-operating on creating a suitable API format for get_iplayer to use. There is no need for it to be a copy of any BBC API, or even to use the same concepts (for example the PID hierarchy) as long as we could abstract the notion of "what PID does get_iplayer have to specify to download this programme". Things like pidrecursive could be replaced by a concept of programme grouping defined and implemented on the server any way we want.
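
One way to picture such an API format is the Python sketch below. It is only an illustration of the abstraction described above: the Programme record, its field names and the pids_for_group helper are all hypothetical, and the PIDs are made-up placeholders. The only thing a client needs from each record is the PID to hand to get_iplayer, while grouping is a plain server-defined label rather than a copy of the BBC's PID hierarchy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Programme:
    download_pid: str  # the PID get_iplayer would be asked to download
    title: str
    group: str  # server-defined grouping label (hypothetical)


def pids_for_group(programmes, group):
    """Return the download PIDs of every programme in the given group."""
    return [p.download_pid for p in programmes if p.group == group]


if __name__ == "__main__":
    catalogue = [
        Programme("p0000001", "Example Drama, Episode 1", "example-drama"),
        Programme("p0000002", "Example Drama, Episode 2", "example-drama"),
        Programme("p0000003", "Example News", "example-news"),
    ]
    print(pids_for_group(catalogue, "example-drama"))
```

A lookup like pids_for_group would cover the use case that pidrecursive serves today, with the grouping rules living entirely on the server.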
