Nitro Interface
#1
At the moment, Nitro API keys are only available to internal BBC staff, according to the header on the developer page: https://developer.bbc.co.uk/nitro
"The BBC Developer site is currently open for registration to BBC Employees. Account requests from other users are not currently being activated. Please check back soon for more info."

We should start developing a Nitro interface for get_iplayer so that it is ready when the API is opened up.

The API documents are available: https://developer.bbc.co.uk/sites/defaul...lients.pdf but the schema at http://nitro.api.bbci.co.uk/nitro/api/schema is not available without an API key. There is not enough info in the For Clients document to do a complete design, but enough to map out a structure.

When the API schema is open we can progress.
#2
(05-04-2016, 11:26 AM)xplora1a Wrote: but the schema at http://nitro.api.bbci.co.uk/nitro/api/schema is not available without an API key.

Is it not?

The latest description of the Nitro query format is always available here:

https://raw.githubusercontent.com/Mermad...pi/api.xml

And the schema describing the output format here:

https://raw.githubusercontent.com/Mermad...schema.xsd
#3
Nitro is not really an option for get_iplayer. Nitro terms preclude get_iplayer from being granted access. Moreover, Nitro isn't made for the sort of bulk indexing that get_iplayer currently relies upon. It works, but it's much slower than the data feeds in use up to now, and it produces a lot of unnecessary (for get_iplayer) information, while also leaving some holes. On the plus side, it can provide the searchable version/genre info that get_iplayer lost some time ago. 

As has been discussed several times before in get_iplayer circles, the likely place for Nitro would be in providing a central programme index, completely separate from get_iplayer, though that may also fall foul of Nitro terms.  But, even if someone was willing to provide that service, and was permitted to do so, it still remains to be seen whether Nitro will ever be available for such use - it's nearly 2 years and counting. 

get_iplayer was written for the BBC and iPlayer ecosystem of 8 years ago, when there was complete and easy-to-use XML programme data available. If the last vestige of that is removed by the BBC, it will probably be easier to change the fundamentals of get_iplayer than to continue working around the BBC.
#4
(05-04-2016, 01:52 PM)dinky Wrote: Nitro is not really an option for get_iplayer. Nitro terms preclude get_iplayer from being granted access. Moreover, Nitro isn't made for the sort of bulk indexing that get_iplayer currently relies upon. It works, but it's much slower than the data feeds in use up to now, and it produces a lot of unnecessary (for get_iplayer) information, while also leaving some holes. On the plus side, it can provide the searchable version/genre info that get_iplayer lost some time ago. 

As has been discussed several times before in get_iplayer circles, the likely place for Nitro would be in providing a central programme index, completely separate from get_iplayer, though that may also fall foul of Nitro terms.  But, even if someone was willing to provide that service, and was permitted to do so, it still remains to be seen whether Nitro will ever be available for such use - it's nearly 2 years and counting. 

Granted, Nitro is slow if you don't parallelize the queries. You can run a reasonable number in parallel without hitting the rate limit. It isn't currently a great advert for MarkLogic's NoSQL database though.
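
As a rough illustration of the parallel approach, here is a minimal Python sketch that caps the number of queries in flight and backs off when one is rate limited. The `fetch_page` placeholder, the `RateLimited` exception and the pool size of 4 are assumptions for illustration only; a real client would send HTTP requests to Nitro with an API key, and the actual limits are not known here.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class RateLimited(Exception):
    """Hypothetical signal that the server rejected a query for rate limiting."""


def fetch_page(page):
    # Placeholder for a real Nitro request; returns fake programme records.
    return [{"pid": f"p{page:04d}{i}", "page": page} for i in range(3)]


def fetch_with_retry(fetch, page, retries=3, delay=0.01):
    # Retry a rate-limited query, doubling the pause after each attempt.
    for attempt in range(retries):
        try:
            return fetch(page)
        except RateLimited:
            time.sleep(delay * (2 ** attempt))
    raise RateLimited(f"page {page} still rate limited after {retries} tries")


def fetch_all(pages, fetch=fetch_page, max_workers=4):
    # The pool size bounds how many queries run at the same time.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda p: fetch_with_retry(fetch, p), pages)
        return [record for page_records in results for record in page_records]


if __name__ == "__main__":
    print(len(fetch_all(range(1, 11))))
```

Raising `max_workers` trades speed against the risk of tripping the limit, so it is probably best tuned by experiment.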

I've knocked together a proof of concept that can index all 80-odd thousand available programmes (i.e. including those available for longer than 28 days, plus podcasts) in around 40 minutes* and produce an sqlite3 database or CSV output. I still need to add a couple of fields.
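
For anyone curious what the output stage of such an indexer might look like, here is a small Python sketch that stores records in sqlite3 and renders the same records as CSV. It is not the code behind the proof of concept; the `programmes` table and the `pid`, `title` and `available_from` columns are hypothetical names chosen for the example.

```python
import csv
import io
import sqlite3

# Hypothetical columns; the real index may store different fields.
FIELDS = ("pid", "title", "available_from")


def save_sqlite(conn, records):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS programmes ("
        "pid TEXT PRIMARY KEY, title TEXT, available_from TEXT)"
    )
    # INSERT OR REPLACE stops a re-run from duplicating a programme.
    conn.executemany(
        "INSERT OR REPLACE INTO programmes "
        "VALUES (:pid, :title, :available_from)",
        records,
    )
    conn.commit()


def to_csv(records):
    # Build the CSV in memory; write the string to a file if needed.
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()


if __name__ == "__main__":
    sample = [
        {"pid": "b0000001", "title": "Example One",
         "available_from": "2016-04-01T00:00:00Z"},
        {"pid": "b0000002", "title": "Example Two",
         "available_from": "2016-04-10T00:00:00Z"},
    ]
    conn = sqlite3.connect(":memory:")
    save_sqlite(conn, sample)
    print(conn.execute("SELECT COUNT(*) FROM programmes").fetchone()[0])
    print(to_csv(sample))
```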

https://morph.io/MikeRalphson/Nitro9

A next step might be to seed the database over BitTorrent each day, or every 4 hours, etc. Though at about 29MB it isn't that large.

The purpose of the demo is purely academic at the moment, and to provide an SQL queryable dataset for research purposes.

* That's how long it takes to run, not how long it took to write. ;)
#5
(16-04-2016, 02:14 PM)meic Wrote: I've knocked together a proof of concept that can index all 80-odd thousand available programmes (i.e. including those available for longer than 28 days, plus podcasts) in around 40 minutes* and produce an sqlite3 database or CSV output. I still need to add a couple of fields.
That agrees pretty well with my own tests, though I expect Node (assumed to be your weapon of choice) would be faster at this sort of thing in general. However, I never ran more than 4 queries in parallel and I still hit the rate limit several times, though it was simple enough to restart and pick up where I left off.  Anyway, it's good to see confirmation that it's possible.
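
For illustration, restarting and picking up where a run left off could be sketched in Python like this. It is not the code used in those tests: the JSON checkpoint file and the `fetch` callable are assumptions made for the example.

```python
import json
import os


def load_done(path):
    # Pages fetched in an earlier run; empty on the first run.
    if not os.path.exists(path):
        return set()
    with open(path) as fh:
        return set(json.load(fh))


def run(pages, fetch, path):
    done = load_done(path)
    for page in pages:
        if page in done:
            continue  # already fetched before the last interruption
        fetch(page)  # may raise if the rate limit is hit
        done.add(page)
        # Record progress after every page so a restart loses nothing.
        with open(path, "w") as fh:
            json.dump(sorted(done), fh)
    return done
```

Calling `run` again with the same checkpoint path after a failure skips every page that was already recorded.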
(16-04-2016, 02:14 PM)meic Wrote: A next step might be to seed the database over BitTorrent each day, or every 4 hours, etc. Though at about 29MB it isn't that large.
I had the same thought re: BitTorrent, but ultimately decided I wouldn't be willing to take on the potential support headaches. And as you say, the download isn't that large. It could be even smaller in practice since get_iplayer would only ever want programmes that have become available within the previous 30 days.
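
A 30-day cut like that could be a single query against the index. The Python sketch below assumes a hypothetical `programmes` table with an `available_from` column holding ISO 8601 UTC timestamps; neither name comes from the actual database.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def recent_programmes(conn, now=None, days=30):
    now = now or datetime.now(timezone.utc)
    cutoff = (now - timedelta(days=days)).strftime("%Y-%m-%dT%H:%M:%SZ")
    # ISO 8601 UTC strings in one fixed format sort chronologically,
    # so a plain string comparison is enough here.
    rows = conn.execute(
        "SELECT pid, title FROM programmes "
        "WHERE available_from >= ? ORDER BY available_from",
        (cutoff,),
    )
    return rows.fetchall()
```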

Thanks, btw, for sharing your Nitro and IBL API documentation at GitHub. It is really helpful to have that information in a concise, useful format.
#6
(16-04-2016, 06:07 PM)dinky Wrote: I had the same thought re: BitTorrent, but ultimately decided I wouldn't be willing to take on the potential support headaches.

Ah, I hadn't thought of the networking problems some users might experience.

(16-04-2016, 06:07 PM)dinky Wrote: And as you say, the download isn't that large. It could be even smaller in practice since get_iplayer would only ever want programmes that have become available within the previous 30 days.

Morph.io has an API which allows you to run remote SQL queries against the output and extract what you want in JSON, CSV or Atom. I'm thinking of trying to run PVR-style queries against it.
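
As a sketch of what such a remote query might look like from Python: the URL shape below (`data.<format>` with `key` and `query` parameters) is my understanding of the morph.io API and should be checked against its documentation, and the `data` table, the `pid` and `title` columns and the search term are assumptions. The snippet only builds the URL and does not contact the service.

```python
from urllib.parse import urlencode


def morph_query_url(scraper, sql, api_key, fmt="json"):
    # Assumed URL shape: https://api.morph.io/<scraper>/data.<fmt>?key=...&query=...
    params = urlencode({"key": api_key, "query": sql})
    return f"https://api.morph.io/{scraper}/data.{fmt}?{params}"


# Hypothetical PVR-style search; table and column names are assumptions.
sql = "SELECT pid, title FROM data WHERE title LIKE '%Archers%' LIMIT 10"
url = morph_query_url("MikeRalphson/Nitro9", sql, "YOUR_API_KEY")

if __name__ == "__main__":
    print(url)
```

Fetching that URL with a valid key, for example via `urllib.request.urlopen`, should return the matching rows in the chosen format.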

The latest database is about 39MB, as I've added non-default versions, categories, guidance and a 'vpids' column.

(16-04-2016, 06:07 PM)dinky Wrote: Thanks, btw, for sharing your Nitro and IBL API documentation at GitHub. It is really helpful to have that information in a concise, useful format.

No problem, glad it's useful to someone else.