These forums are archived

See this post for further info

get_iplayer forums


Not quite (yet) feature request - more like research - Welsh Subtitles for S4C

user-978

First let me repeat my thanks to the get_iplayer team for providing such a great resource, and then for keeping it going in the face of recurring serious obstacles.

Now I may not express this very well, but please bear with me: 

As of approximately December last year, S4C programmes started to become available on BBC iPlayer. This had been announced earlier in the year and it was quite exciting because it potentially provided a way of downloading S4C programmes, which hitherto had not been possible via "Clic" (S4C's equivalent of iPlayer).

For various reasons I had not had the need to try downloading any S4C programmes until recently.
My use of get_iplayer had mostly been for radio, and if I wanted to watch S4C I usually used "Clic".

However, I'd had some problems connecting to Clic, so I thought I'd try iPlayer. I watched one of my regular programmes via the iPlayer website (i.e. not downloading), and that was fine. However, as I suspected, this programme which has both English and Welsh subtitles on Clic only had English subtitles on iPlayer.

Just to compare, I downloaded it using the iPlayer downloader. It downloaded and played all right, but again, no Welsh subtitles. (It had also lost the s/t text colour changes used to differentiate between characters).

So I thought now's the time to try get_iplayer. I was afraid it might not work because of some format difference between S4C programmes and original BBC ones, and at first, my fears seemed to be confirmed because it failed (with the scalar error that I've seen mentioned in other threads, and didn't download anything).

I then wondered if it was because I was not using the latest version, checked with this site, and was pleased to find a recent upgrade (2.94 - I was on 2.92). I installed that, and, bingo, all worked fine.

Now, I was half hoping that perhaps get_iplayer might have found the Welsh subtitles (even if the BBC iPlayer interface wasn't doing anything with them). I played the MP4 file with my usual player (Daum PotPlayer - I find this a bit more "lightweight" (in a good way) than VLC), and was able to switch the subtitles on and off, and that was fine as far as it went - still only English subtitles. I used the facility in PotPlayer to browse all the subtitles, and indeed there were only English ones there. PotPlayer has the ability to switch on a 2nd set of subtitle files (or maybe multiple sets). By default both 1st and 2nd s/t set options were pointing to the same .srt file that get_iplayer had created, and I could display them simultaneously at the bottom and the top of the screen.

Anyway, it looks like if the Welsh subtitles were available, they would have to go into a second .srt file.
Hold that thought for a minute while I go into some more background.

I had written to the BBC to ask if it were possible for them to show the Welsh subtitles for S4C programmes that had them, when they appeared on iPlayer. I got the reply that I half expected, even though it's not actually logical, that they had no control over S4C content and this was a matter for S4C and I should write to them...

Well, ok, I thought, I'll go along with this for now, and see what S4C say. I also asked them whether they had any objections to making the Welsh subtitle data available to the BBC.

I got a courteous reply today from S4C saying yes they were perfectly happy to make this data available, but the "BBC technical infrastructure" was not capable of making use of it (or similar words). By which I assumed they meant the BBC would have to tweak iPlayer to make use of it, which is clearly true, but I don't believe it would be a very major piece of work.

Well, I will wait a few more days to see if I get any further communications from either "side". However, I got to wondering whether the Welsh subtitle data might actually be already buried in the data content of the S4C programmes on the iPlayer servers.

May I ask the technically knowledgeable people here if that is at all likely?

Is it at all possible for the get_iplayer technical wizards to look at one or two relevant programmes and see if it looks like there could actually be multiple subtitles there? Since I have no idea how they would be stored, I am not sure if what I am asking is realistic. I know that get_iplayer puts subtitles in a .srt file, but I don't know if they are stored in the same way on the iPlayer site.

If it is at all helpful, here are some URLs of S4C programmes that appear on iPlayer, which originally had Welsh and English subtitles on S4C's "Clic": 

http://www.bbc.co.uk/iplayer/episode/p02...-pennod-55

http://www.bbc.co.uk/iplayer/episode/p02...-pennod-56

http://www.bbc.co.uk/iplayer/episode/p02...-pennod-57

http://www.bbc.co.uk/iplayer/episode/p02...-pennod-58

I just noticed the "--info" option - trying this on a couple of the above, I notice it includes "..modes=subtitles1", for what it's worth.

Thanks in advance for any help, suggestions, or comments.

p.s. I suppose if I could just talk S4C into sending me .srt files with them in for the programmes I'm interested in, I'd be home and dry ... :-)  Don't think they will though, somehow.

user-2

Quote: I just noticed the "--info" option - trying this on a couple of the above, I notice it includes "..modes=subtitles1", for what it's worth.

You've answered your own question.  There is only one set of subtitles (English) available from iPlayer.  If multiple subtitles were ever available from iPlayer, get_iplayer would have to be amended to support them.

You can find Welsh subtitles from Clic without the help of S4C.  View the source of the page where the programme is playing in your browser and you should see the subtitle files specified on a line that looks like the following (search for captionsFile):

Code:
captionsFile: "Isdeitlau Saesneg:/sami/E_A522218798.xml,Isdeitlau Cymraeg:/sami/C_A522218798.xml",

Prefix the path with the base URL of the S4C site to get the complete URL for the subtitles file, e.g.,

http://www.s4c.cymru/sami/C_A522218798.xml
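To make the prefixing step concrete, here is a minimal sketch in Python. It assumes the captionsFile value is always a comma-separated list of "label:path" pairs as in the line above; the function name parse_captions_file is made up for illustration and is not part of get_iplayer or the S4C site.

```python
from urllib.parse import urljoin

# Base URL taken from the example above.
BASE_URL = "http://www.s4c.cymru/"


def parse_captions_file(value, base_url=BASE_URL):
    """Split a captionsFile value into {label: absolute URL}.

    Hypothetical helper: assumes comma-separated "label:path" entries.
    """
    tracks = {}
    for entry in value.split(","):
        # Split on the first colon only, so the path is kept whole.
        label, _, path = entry.partition(":")
        tracks[label.strip()] = urljoin(base_url, path.strip())
    return tracks


captions = (
    "Isdeitlau Saesneg:/sami/E_A522218798.xml,"
    "Isdeitlau Cymraeg:/sami/C_A522218798.xml"
)
for label, url in parse_captions_file(captions).items():
    print(label, url)
```

"Isdeitlau Cymraeg" is the Welsh track and "Isdeitlau Saesneg" the English one.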

The subtitles are in TTML format (same format used by iPlayer), so you will probably need to find a tool to convert them to SRT or another format that can be used by your media player.
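If you would rather roll your own than hunt for a tool, the conversion can be roughed out with the Python standard library. This is only a sketch under assumptions: it expects each subtitle to be a p element with begin and end attributes written as HH:MM:SS or HH:MM:SS.fff, it ignores styling and colours, and the sample document is invented for illustration rather than taken from S4C. Real files may use other time formats or need more care.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the XML namespace prefix from a tag name."""
    return tag.rsplit("}", 1)[-1]


def srt_time(clock):
    """Turn 'HH:MM:SS' or 'HH:MM:SS.fff' into SRT's 'HH:MM:SS,mmm'."""
    hms, _, frac = clock.partition(".")
    return f"{hms},{(frac + '000')[:3]}"


def cue_text(p):
    """Collect the text of a <p>, turning <br/> into line breaks."""
    parts = []

    def walk(node):
        if local_name(node.tag) == "br":
            parts.append("\n")
        if node.text:
            parts.append(node.text)
        for child in node:
            walk(child)
            if child.tail:
                parts.append(child.tail)

    walk(p)
    # Collapse stray whitespace on each line and drop empty lines.
    lines = [" ".join(line.split()) for line in "".join(parts).split("\n")]
    return "\n".join(line for line in lines if line)


def ttml_to_srt(ttml):
    """Convert a TTML string to SRT text (timing and plain text only)."""
    root = ET.fromstring(ttml)
    cues = [
        (el.get("begin"), el.get("end"), cue_text(el))
        for el in root.iter()
        if local_name(el.tag) == "p" and el.get("begin") and el.get("end")
    ]
    blocks = [
        f"{n}\n{srt_time(begin)} --> {srt_time(end)}\n{text}\n"
        for n, (begin, end, text) in enumerate(cues, 1)
    ]
    return "\n".join(blocks)


# Invented sample, not a real S4C file.
SAMPLE = """<tt xmlns="http://www.w3.org/ns/ttml">
  <body><div>
    <p begin="00:00:01.000" end="00:00:03.500">Bore da.<br/>   Sut wyt ti?</p>
    <p begin="00:00:04" end="00:00:05.2">Da iawn, diolch.</p>
  </div></body>
</tt>"""

print(ttml_to_srt(SAMPLE))
```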

user-978

Quote: (17-07-2015, 11:24 AM) You can find Welsh subtitles from Clic without the help of S4C. [...] The subtitles are in TTML format (same format used by iPlayer), so you will probably need to find a tool to convert them to SRT or another format that can be used by your media player.

Wow! You are a star Dinky; thank you!

As you say, I will have to find a tool for the conversion, but at least now I can see what has to be done.
And if necessary, I could even hack them by hand or try scraping off my rusty coding skills.
Fan-blydi-tastic. :-)

...

My only worry would be if S4C one day pull "Clic" since "it's all on iPlayer now" (this underlying worry was one reason why I'd hoped they would have carried the Welsh s/t's into iPlayer from the beginning).
But that's probably not worth worrying about until it happens or seems very likely to happen imminently. There would then at least be more of a potential argument for getting the BBC to do what would be needed.

user-978

Update: found a python script converter which looks like it has come out of the get_iplayer stable. :-) 

Had to install python, but that was painless with ninite, and it worked first time, and the resulting .srt plays fine too.

I don't know about other players, but with potplayer, I can now choose between no subtitles, English-only subtitles, Welsh-only subtitles, or both at once (top and bottom), so this is actually better than both normal iPlayer and S4C Clic.

Thanks again people!

...

While googling around, I noticed that there had been calls for VLC to support TTML directly. Evidently this hasn't been implemented so far. I made sure I was on the latest one and tried to use the TTML, but it just ignored it (as does potplayer).
Well, one day, maybe.

user-2

If you're talking about the converter that I wrote, you will find some of the SRT files it generates to be indigestible to VLC. It doesn't clean up all the leading white space found in S4C subtitles. Potplayer is more forgiving.

user-978

Quote: (17-07-2015, 03:52 PM) If you're talking about the converter that I wrote, you will find some of the SRT files it generates to be indigestible to VLC. It doesn't clean up all the leading white space found in S4C subtitles. Potplayer is more forgiving.

Ah, thanks for the warning. Well, I just googled, and this was the most straightforward thing that I found so far: 

http://get-iplayer-automator.googlecode....tml2srt.py

There is no author name, but the get_iplayer in the title is suggestive. :-)

If that's yours, then thank you once again.

I will watch out for the white space issue.

user-2

That script was written to work with ITV subtitles, so it may not travel well. Since you now have python, take a look at pycaption and pycaption-cli. pycaption is a bit heavier than my script, with additional dependencies, but it does a more rigorous conversion.

user-978

Quote: (17-07-2015, 05:27 PM) That script was written to work with ITV subtitles, so it may not travel well. Since you now have python, take a look at pycaption and pycaption-cli. pycaption is a bit heavier than my script, with additional dependencies, but it does a more rigorous conversion.

Thanks again!  

Might have to do a bit more thinking to get that to work, but I'll investigate. :-)

user-2

I was motivated to try to fix up the whitespace handling so that the output makes VLC happy.  An updated script is here:

https://raw.githubusercontent.com/GetiPl...tml2srt.py

I should have mentioned before: Don't pull that script from Google Code.  That site is long dead.

user-2

I've made another update to preserve subtitle colours. URL as above.
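For anyone curious how colours can be carried over, the general idea is sketched below. This is an illustration only, not the code in the script: it assumes colours appear as a tts:color attribute directly on span elements and that the styling namespace is the one shown. Real files may use an older namespace or refer to named styles instead. The function name coloured_text and the sample are invented.

```python
import xml.etree.ElementTree as ET

# Assumed styling namespace; real files may use a different one.
TTS_NS = "http://www.w3.org/ns/ttml#styling"
COLOR_ATTR = f"{{{TTS_NS}}}color"


def coloured_text(p):
    """Render a TTML <p>, wrapping coloured spans in SRT <font> tags."""
    out = [p.text or ""]
    for child in p:
        if child.tag.endswith("}br") or child.tag == "br":
            out.append("\n")
        else:
            text = "".join(child.itertext())
            colour = child.get(COLOR_ATTR)
            if colour:
                out.append(f'<font color="{colour}">{text}</font>')
            else:
                out.append(text)
        out.append(child.tail or "")
    return "".join(out).strip()


# Invented sample cue, not taken from a real subtitle file.
SAMPLE = (
    '<p xmlns="http://www.w3.org/ns/ttml" '
    'xmlns:tts="http://www.w3.org/ns/ttml#styling" '
    'begin="00:00:01.000" end="00:00:03.000">'
    '<span tts:color="yellow">Helo!</span><br/>'
    '<span tts:color="cyan">Sut mae?</span></p>'
)

print(coloured_text(ET.fromstring(SAMPLE)))
```

Whether the font tags are honoured then depends on the media player.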

user-978

@user-2:

Belated many thanks for this! Haven't checked back here for a week or two.
Haven't tried the above, but will do.

I'd played around with manually edited SRT to see if either potplayer or VLC supported the <font color=> tag which is apparently part of basic SRT, but I couldn't get them to. (Have mentioned it on their respective forums).

However, I've now tried KMplayer and GOM, and they both (sort of) seem to support it, so it's definitely worth having the colour info preserved.

Ironically, S4C have recently made a change to their site, and their player is not currently supporting the colours!
But BBC's iPlayer does! (for the same programmes/episodes). And the colour info is there in the TTML .xml files when I download them from the S4C site. I've had a moan to S4C about the sudden lack of colour support in their player, but nothing more than a polite routine acknowledgement so far. But at least the actual subtitle source is still ok (all those I have checked so far anyway).

(for some reason can't get quote-reply to work at the moment, hence this is a "new reply").

Many thanks again!

user-2

Both PotPlayer and VLC support colours with <font color="">, so have another go. S4C has switched to a new video player (looks like it is based on video.js), which apparently doesn't do colours in subtitles.

user-978

Quote: (01-08-2015, 01:49 AM) Both PotPlayer and VLC support colours with <font color="">, so have another go. S4C has switched to a new video player (looks like it is based on video.js), which apparently doesn't do colours in subtitles.

Sorry to necro-post, although hopefully forgivable as I started the thread.

Well, one small piece of good news is that S4C Clic now does do colours in subtitles again.

However, and this is bad news as far as I'm concerned, the most recent change they made to their site means that it doesn't appear possible to download subtitles from there any more (i.e. the S4C "Clic" website).

It may still be possible, but if so, I have no idea how.
@user-2, have you possibly investigated this at all?
I think the change may have occurred towards the end of 2017.

I fear that S4C took exception to people downloading their subtitles and have now hidden them. But perhaps you can discern something from the source of one of their programme web pages.

...

And sadly, BBC iPlayer still doesn't support Welsh subtitles.
However, I felt vaguely optimistic when I was looking through the release notes (I think) of recent releases.
(Not having used get_iplayer for a while, I needed to upgrade, and am now at 3.12).

I thought I saw a reference somewhere in there to downloads for S4C programmes downloading any subtitle streams there were (implying that there may be more than one). I may well have misunderstood that, and for the moment, cannot find it again.

However, having now successfully downloaded an S4C programme ("Rownd a Rownd", Tues 13.02.2018), and requesting subtitles, only the English ones were downloaded. (This is a programme that does have Welsh ones on Clic). So it looks like I was being a bit too hopeful.

FWIW, I've made another request to the BBC iPlayer people for support of Welsh subtitles for S4C programmes, and got the routine automatic reply. I'm not wildly hopeful, but thought it was worth trying.


Many thanks in advance if you can help on either aspect of this question.

user-2

If you want to download Welsh subtitles, sniff the URLs during playback on the S4C site. They are WebVTT files (.vtt extension). If you don't know how to do that, Google is your friend. If Welsh subtitles are ever offered via iPlayer, get_iplayer will endeavour to support them. Until such time, this topic is irrelevant to get_iplayer.
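Once you have a .vtt file, turning it into SRT is mostly a matter of renumbering the cues and changing the timestamp punctuation. A rough sketch follows, on the assumption that the file is plain WebVTT with simple cues; it drops cue settings and any blocks without a timing line, and does not touch inline markup. The function names and the sample are invented for illustration.

```python
import re


def srt_time(stamp):
    """Convert a WebVTT timestamp to SRT form, adding hours if omitted."""
    stamp = stamp.strip()
    if stamp.count(":") == 1:
        stamp = "00:" + stamp
    return stamp.replace(".", ",")


def vtt_to_srt(vtt):
    """Convert WebVTT text to SRT text (simple cues only)."""
    blocks = re.split(r"\n{2,}", vtt.replace("\r\n", "\n").strip())
    cues = []
    for block in blocks:
        lines = block.split("\n")
        # The header and NOTE/STYLE blocks have no timing line: skip them.
        idx = next((i for i, line in enumerate(lines) if "-->" in line), None)
        if idx is None:
            continue
        start, _, rest = lines[idx].partition("-->")
        end = rest.split()[0]  # anything after the end time is cue settings
        text = "\n".join(lines[idx + 1:])
        cues.append(f"{srt_time(start)} --> {srt_time(end)}\n{text}\n")
    return "\n".join(f"{n}\n{cue}" for n, cue in enumerate(cues, 1))


# Invented sample, not a real S4C file.
SAMPLE = """WEBVTT

1
00:00:01.000 --> 00:00:03.500 line:90%
Bore da.

00:05.000 --> 00:07.250
Sut wyt ti?
"""

print(vtt_to_srt(SAMPLE))
```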

user-978

Fair enough! Many thanks again.
