From time to time I'm struck by entries in a cache file appearing to split what I would think of as a programme title with a subtitle, into a shorter title and an episode name...
For example, if one looks at:

  http://www.bbc.co.uk/tv/programmes/a-z/by/h/all?page=3

you can see what appear to be three programme groups named:

  Hidden Histories (BBC2, Welsh history)
  Hidden Histories: Britain's Oldest Family Businesses (BBC4, three programmes about family businesses)
  Hidden Histories: WW1's Forgotten Photographs (BBC4, a single programme I think)

When they were available some time ago I watched the Family Businesses programmes; one of the lines in my download history says:

  b03qlp97|Hidden Histories|Britain's Oldest Family Businesses: 2. Toye the Medal Maker|tv|...

That is, the programme name then was "Hidden Histories" and that episode was "Britain's Oldest Family Businesses: 2. Toye the Medal Maker".

Similarly, the WW1 photos programme in today's tv cache appears as:

  |tv|Hidden Histories|b03xsrvv|Unknown|WW1's Forgotten Photographs|||...

which means that, as far as my own computer programs (which process these cache and history files) are concerned, these entries all seem to refer to the same overall programme. And I suppose the Welsh history "Hidden Histories" programmes would also look like the same thing.

A while ago I looked at get_iplayer's perl source code, but I'm not at all fluent in perl. I had the impression, though, that maybe get_iplayer concatenates various possible fragments of a programme's name, episode name etc into one long string and then tries to chop it up again. If it assumes that a string like "ABC DEF: GHI JKL" should be split on the first colon (which is sensible IF that was "Programme: Episode"), then it will make a mistake if the string contains "Prog: Ramme: Episode"...

As DP has had to sweat blood, or juice (do pumpkins have blood?), on parsing metadata recently, I wondered if any of the newer sources of metadata allow better discrimination between programme and episode titles? If it's possible to tell from the metadata sources that something is a Programme title, even if it contains a colon, surely it shouldn't be split there?
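To make the suspected behaviour concrete, here is a minimal Python sketch. It is not get_iplayer's actual perl code, only an illustration of the guess described above. The function names and the idea of a set of "known" programme titles are hypothetical, standing in for whatever a metadata source might provide.

```python
def split_on_first_colon(full_name):
    """Naive guess: everything before the first ': ' is the programme title."""
    title, sep, episode = full_name.partition(": ")
    return (title, episode) if sep else (full_name, "")


def split_with_known_titles(full_name, known_titles):
    """Prefer the longest known programme title that prefixes the string.

    known_titles is a hypothetical set of titles taken from metadata;
    if none matches, fall back to the naive first-colon split.
    """
    for title in sorted(known_titles, key=len, reverse=True):
        if full_name == title:
            return title, ""
        if full_name.startswith(title + ": "):
            return title, full_name[len(title) + 2:]
    return split_on_first_colon(full_name)


name = ("Hidden Histories: Britain's Oldest Family Businesses: "
        "2. Toye the Medal Maker")
known = {
    "Hidden Histories",
    "Hidden Histories: Britain's Oldest Family Businesses",
}

# The naive split gives the short title seen in the download history line.
print(split_on_first_colon(name))
# With the longer title known in advance, the colon inside it is kept.
print(split_with_known_titles(name, known))
```

With the first function the programme comes out as "Hidden Histories"; with the second, the full "Hidden Histories: Britain's Oldest Family Businesses" survives as the programme title and only "2. Toye the Medal Maker" is treated as the episode.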
And yet, my own code shows a few examples (seen over months, not necessarily recent) where programme names do have embedded colons in them, e.g.:

  "Doctor Finlay: The Further Adventures of a Black Bag"
  "Hamish and Dougal: You'll Have Had Your Tea"
  "Tim FitzHigham: The Gambler"
  "Hinterland: Series 1 (full length)"
  "MasterChef: The Professionals: Series 7"
  "The Choir: Sing While You Work: Series 2"
  "The Cruise: A Life at Sea"
  "Vets: Gach Creutair Beo"

Why does it work some of the time and not others? Is it because get_iplayer assembles descriptions etc from a bunch of different sources (web pages, RDF pages ... whatever), or is it down to inconsistency in the way the BBC list their programmes?

Or is there also the BBC 'brand' to take into account? Maybe all of these programmes are part of the same overall brand?

--
Jeremy Nicoll - my opinions are my own.