These forums are archived

See this post for further info

get_iplayer forums

Forum archived. Posting disabled.

limitation on --info matches

user-667

Is there a way of getting around the limit on the amount of programme metadata available?

"--info, -i    Show full programme metadata and availability of modes and subtitles (max 50 matches)"

(Or is there a better way of getting all "desclong" metadata for BBC Radio 3 - which is what I am trying to do?)

user-2

Hmm, that help text is incorrect: the limit is 40. The limit is hard-coded in the script, so you would need to change the code. But don't do that. Be nice to the BBC and don't thrash their servers. You only need 1 file per episode instead of the 8-10 per episode that get_iplayer retrieves for --info listings. Besides, get_iplayer won't give you anywhere near all the programmes available on Radio 3. You will need to scrape the relevant pages yourself. There are many tools available, but the scraping may be a bit tricky since iPlayer content is only partly organised by channel.

user-667

Yes, I'd noticed the discrepancy. -i doesn't seem to put any significant load on the BBC servers: a couple of kbytes per programme. A typical programme set for Radio 3 is about 80 programmes, so not too many. I wonder why a limit - 40 or 50 - was ever set for the -i option? It is giving me grief! My first effort is to try to use --exclude to exclude programmes near the start of the alphabet. Then a straight use of -i will capture the first 40, and an --exclude version will capture the last 40. It's a nuisance that the actual limit is 40 rather than the documented 50, as 40 plus 40 leaves no overlap tolerance with this crude-ish technique, whereas 50+50 would work out better. Damn, damn, damn!

Any other ideas would be very welcome.

user-2

Your arithmetic is off by an order of magnitude. The --info display for each programme requires ~10 downloads (only one of which you need), roughly 20kB+ total. Not a lot of data, but a lot of useless downloads. Since the R3 programme names won't vary much, you may find it easier to use inclusive search binning rather than using --exclude. I presume the limit was included to prevent users from filling their screens with useless spew if they forgot to enter search criteria, and perhaps to prevent casual users from using get_iplayer as a site scraper. Of course, when get_iplayer was written its data sources were a bit different. As for why the limit was set at 40, I have no idea, but there it shall remain.
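
For what it's worth, the "inclusive search binning" idea can be sketched in a few lines of Python. This is only an illustration, not anything get_iplayer does itself: the function name bin_search_terms and the per-term match counts are made up, and it assumes the search terms do not overlap (no episode matches two terms). It packs terms into groups whose combined match count stays within the 40-match limit, so each group can go into one --info search.

```python
def bin_search_terms(match_counts, limit=40):
    """Group search terms so each group's total match count is <= limit.

    match_counts maps a search term to the number of programmes it matches.
    Uses a simple first-fit-decreasing packing; assumes terms do not overlap.
    """
    bins = []  # each bin is a dict: {"terms": [...], "total": int}
    # Place the largest terms first; they are the hardest to fit.
    for term, count in sorted(match_counts.items(), key=lambda kv: -kv[1]):
        if count > limit:
            raise ValueError(f"{term!r} alone matches {count} > {limit}")
        for b in bins:
            if b["total"] + count <= limit:
                b["terms"].append(term)
                b["total"] += count
                break
        else:
            # No existing bin has room, so start a new one.
            bins.append({"terms": [term], "total": count})
    return bins


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    counts = {"Breakfast": 7, "Composer": 10, "In Tune": 5, "Radio 3": 20}
    for b in bin_search_terms(counts):
        print(b["total"], b["terms"])
```

Each resulting group corresponds to one search, so every episode is still fetched only once; it just avoids having to juggle --exclude lists.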

user-667

I can only speak as I find, which is that approximately 80 programmes result in about 300kbytes of metadata. The following three DOS batch file commands appear to do the job:

get_iplayer --type=radio --channel="Radio 3" "Afternoon on 3" "Breakfast" "Composer" "Essential" -i > R3Metadata.txt

get_iplayer --type=radio --channel="Radio 3" "In Tune" "Radio 3" "The Essay" "Through"  -i >> R3Metadata.txt

get_iplayer --type=radio --channel="Radio 3" --exclude="Afternoon on 3","Breakfast","Composer","Essential","In Tune","Radio 3","The Essay","Through"  -i >> R3Metadata.txt

... and then a Word macro that selectively converts newlines to tabs, converts the result to a table, and deletes the unwanted columns and rows, leaving me with Programme Name, Desclong and pid columns.
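
For anyone without Word, the same clean-up could probably be done with a short Python script. This is a rough sketch only. It assumes R3Metadata.txt consists of "field: value" lines, one field per line, with a blank line between programmes, and that the fields wanted are called name, desclong and pid; check that against your own output first. The function names are made up for the example.

```python
import csv
import io
import re

# Assumed line shape: a field name, a colon, optional padding, then the value.
FIELD_RE = re.compile(r"^(\w+):\s*(.*)$")


def parse_info_blocks(text, wanted=("name", "desclong", "pid")):
    """Collect the wanted fields from each blank-line-separated block."""
    records, current = [], {}
    for line in text.splitlines():
        match = FIELD_RE.match(line)
        if match:
            key, value = match.groups()
            if key in wanted:
                current[key] = value.strip()
        elif not line.strip() and current:
            # A blank line closes the current programme's block.
            records.append(current)
            current = {}
    if current:
        records.append(current)
    return records


def to_tsv(records, columns=("name", "desclong", "pid")):
    """Render the records as tab-separated text with a header row."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)
    for rec in records:
        writer.writerow([rec.get(col, "") for col in columns])
    return out.getvalue()


if __name__ == "__main__":
    with open("R3Metadata.txt", encoding="utf-8", errors="replace") as fh:
        print(to_tsv(parse_info_blocks(fh.read())), end="")
```

The tab-separated output pastes straight into a spreadsheet. A description that wraps onto a second line would be cut short by this sketch, so it would need extending in that case.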

user-2

That isn't all the data that comes over the wire - it has been digested by get_iplayer.
