Better name aggregation across different programs

Program names are currently taken from the URL, but in the current implementation I underestimated how messy these URLs are. Some shows have been indexed incorrectly, for instance with the names of the program's guests as the program title. Other shows, such as EenVandaag, use various names for different subprograms (like EenVandaag Politicus van het jaar). All these different names make filtering by show name more difficult.
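A minimal sketch of how subprogram names could be collapsed onto one show name. Everything here is an assumption for illustration: the URL layout (show name as the last path segment), the `CANONICAL_SHOWS` list and the function names are hypothetical, and a simple prefix match will not fix episodes that were indexed under a guest's name.

```python
import re
from urllib.parse import urlparse

# Hypothetical, hand-maintained list of parent shows.
CANONICAL_SHOWS = ["EenVandaag"]


def slug_from_url(url: str) -> str:
    """Return the last non-empty path segment of the URL (assumed to hold the name)."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    return segments[-1] if segments else ""


def canonical_name(raw_name: str) -> str:
    """Map a raw name onto its parent show when a known show name is its prefix."""
    # Treat hyphens and underscores from URL slugs as spaces.
    cleaned = re.sub(r"[-_]+", " ", raw_name).strip()
    for show in CANONICAL_SHOWS:
        if cleaned.lower().startswith(show.lower()):
            return show
    # Unknown names are returned cleaned but otherwise unchanged.
    return cleaned
```

With this, both "EenVandaag Politicus van het jaar" and the slug "eenvandaag-politicus-van-het-jaar" would be filed under "EenVandaag", so a filter on show name only has to know the parent show.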

Accented character issues

Accented characters should be ignored in the search query (e.g. querying coördinator or coordinator should make no difference).
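One possible way to do this is to fold accents on both the indexed text and the query before comparing them. The sketch below uses Python's standard `unicodedata` module; the function name `fold_accents` is my own, and whether the search backend offers a built-in equivalent is something I have not checked.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Strip combining marks and case so accented and plain spellings compare equal."""
    # NFKD splits a character like "ö" into "o" plus a combining diaeresis.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks, keep the base characters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()
```

The folding only helps if it is applied on both sides: once when the sentences are indexed and once on every incoming query.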

Some accented characters are not recognized and are replaced by a �. I have to find out whether this already happens at the NPO or is introduced during crawling or indexing.
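A rough diagnostic sketch for narrowing this down, assuming the raw bytes of a subtitle file can be captured exactly as they were received and that the files are supposed to be UTF-8. The function name and the three labels are hypothetical.

```python
REPLACEMENT_CHAR = "\ufffd"  # the character that renders as �


def diagnose(raw: bytes) -> str:
    """Classify a raw payload by where a replacement character could come from."""
    # The replacement character is already encoded in the received bytes,
    # so the damage happened before the crawler got the file.
    if REPLACEMENT_CHAR.encode("utf-8") in raw:
        return "upstream"
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        # The bytes are in another encoding; decoding them as UTF-8 with
        # errors="replace" would introduce the � on our side.
        return "not-utf8"
    return "clean"
```

If the files come back as "not-utf8", the fix is probably on the crawling/indexing side (decode with the right encoding); if they come back as "upstream", the character was already lost in the source.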

Adding more context

At the moment it can be difficult to immediately understand what a sentence is about. It is unclear what is said before and after the sentence, and whether it is said by a host, a narrator or an interviewee.

— Adding a before and after sentence

This could provide quite a bit of context. I still have to think of a proper technical implementation that keeps server load to a minimum.
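A minimal sketch of the lookup itself, assuming the sentences of one episode are available as an ordered list and the position of the matching sentence is known. The names `SentenceContext` and `with_context` are hypothetical.

```python
from typing import NamedTuple, Optional, Sequence


class SentenceContext(NamedTuple):
    before: Optional[str]
    match: str
    after: Optional[str]


def with_context(sentences: Sequence[str], index: int) -> SentenceContext:
    """Return the sentence at `index` together with its direct neighbours."""
    if not 0 <= index < len(sentences):
        raise IndexError(f"sentence index {index} out of range")
    # The first and last sentence of an episode have only one neighbour.
    before = sentences[index - 1] if index > 0 else None
    after = sentences[index + 1] if index + 1 < len(sentences) else None
    return SentenceContext(before, sentences[index], after)
```

One option for limiting server load would be to run this once at indexing time and store the neighbours next to each sentence, so a search result needs no extra lookup. That trades storage for query speed and is only one of several possible designs.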

— Adding stills

One relatively expensive but interesting way to provide context would be to extract still images from the different segments. For instance, from a segment that lasts 5 seconds you could extract 5 stills (one frame per second). When you hover over the sentence, the corresponding still would pop up. See the video below for an impression.

https://s3-us-west-2.amazonaws.com/secure.notion-static.com/b8cff85d-af2e-4ea9-8314-b6afa061f6c3/hq_9_h264.mp4
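A sketch of the "one still per second" idea, assuming each segment has a start and end time in seconds and that ffmpeg is used for the extraction. The function names are hypothetical; the second function only builds the argument list and does not run anything.

```python
import math
from typing import List


def still_timestamps(start: float, end: float) -> List[float]:
    """One timestamp per whole second of the segment, beginning at `start`."""
    if end <= start:
        raise ValueError("segment end must be after its start")
    # Segments shorter than one second still get a single still.
    count = max(1, math.floor(end - start))
    return [start + i for i in range(count)]


def ffmpeg_still_command(video_path: str, timestamp: float, out_path: str) -> List[str]:
    """Build an ffmpeg argument list that grabs a single frame at `timestamp`."""
    return [
        "ffmpeg",
        "-ss", f"{timestamp:.3f}",  # seek to the timestamp before reading input
        "-i", video_path,
        "-frames:v", "1",           # write exactly one video frame
        out_path,
    ]
```

A 5-second segment from 10.0 to 15.0 gives the timestamps 10.0 through 14.0, matching the five stills mentioned above. The resulting argument list could be passed to `subprocess.run`; the expensive part is that this has to happen for every segment of every episode.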

Crawler

Refactoring...

Better overview of what is being indexed and what isn't

Statistics page. A better overview of which shows have subtitles and which do not.
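A small sketch of the numbers such a page could show, assuming the crawler can produce one `(show name, has subtitles)` pair per episode. The function name and the input shape are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def subtitle_coverage(episodes: Iterable[Tuple[str, bool]]) -> Dict[str, Tuple[int, int]]:
    """Per show: (number of episodes with subtitles, total number of episodes)."""
    with_subs: Counter = Counter()
    total: Counter = Counter()
    for show, has_subs in episodes:
        total[show] += 1
        if has_subs:
            with_subs[show] += 1
    # Shows without any subtitled episode are reported with a count of 0.
    return {show: (with_subs[show], total[show]) for show in total}
```

Shows where the first number is 0 are the ones that never make it into the index, which is the overview that is currently missing.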