Wowza Community

Prevent Googlebot-Video from crawling - robots.txt / X-Robots-Tag?

Our VOD servers are being crawled by Googlebot (user agent “Googlebot-Video/1.0”).

We have many terabytes of data there and we want to avoid that traffic.

I would like to add a robots.txt to stop Googlebot from continuing.
Wowza runs on its own subdomain, so the robots.txt would need to be served by Wowza itself.
Is there a good practice for that?

Or does Wowza support setting the X-Robots-Tag in the HTTP headers?

Hi @Elio Wahlen, robots.txt is normally added to a web server, and WSE is not really a web server. You can, however, add custom headers to your HLS chunklist.

https://www.wowza.com/docs/how-to-add-custom-playlist-headers-to-apple-hls-manifests

Thanks for this good and helpful answer.

Is there also a way to do this for MPEG-DASH and HDS (I know HDS is dying, but we still need to support it for a while for our customers)?

I just saw in our logs that Google was using the HDS manifest to access the VOD streams…

Thanks. I had the same issue. This helped me a lot. :slight_smile: @arquiteto

Dear @Rose Power-Wowza Community Manager

Unfortunately I now realize that you got me wrong.
The chunklist headers are not equivalent to the HTTP headers that I meant. Specifically, I am talking about the X-Robots-Tag HTTP response header, which seems to be a standard.
Please see here: https://developers.google.com/search/reference/robots_meta_tag

Is there any way to send custom HTTP response headers across all HTTP-based connections?
If not - what do you recommend? Is reverse-proxying, e.g. with nginx, the way to go? What would that look like?

I can imagine there are a lot of users who don’t want Googlebot and possibly other crawlers to download all their video content, for various reasons (traffic, performance, legal circumstances, etc.). It would be wise to have a plan here.

Best wishes, Elio

Apologies @Elio Wahlen. You can add custom HTTP headers to HLS/DASH/HDS by adding the httpUserHTTPHeaders property to your application (this is similar to the Access-Control-Allow-Origin CORS HTTP headers).

Here’s an example of how this can be added:
https://www.wowza.com/docs/how-to-stream-from-an-android-device-to-the-google-chromecast-device

The value for the property in your case would be, for example:

X-Robots-Tag: noarchive

It’s a pipe-delimited list, so you can add multiple headers.
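
To illustrate the pipe-delimited format, here is a minimal Python sketch that splits such a property value into individual headers. The function name `split_header_list` is hypothetical, and the second header in the sample value is only an assumption added for illustration; this does not claim to mirror how WSE parses the value internally.

```python
def split_header_list(value):
    """Split a pipe-delimited 'Name: value|Name: value' string into a dict."""
    headers = {}
    for item in value.split("|"):
        item = item.strip()
        if not item:
            continue  # ignore empty segments such as a trailing pipe
        name, sep, header_value = item.partition(":")
        if not sep:
            raise ValueError("missing ':' in header entry: %r" % item)
        headers[name.strip()] = header_value.strip()
    return headers


# Sample value: the X-Robots-Tag entry is from this thread,
# the CORS entry is an assumed second header for illustration.
sample = "X-Robots-Tag: noarchive|Access-Control-Allow-Origin: *"
parsed = split_header_list(sample)
for name, header_value in parsed.items():
    print(name + ": " + header_value)
```

Each entry keeps the usual `Name: value` form, and the pipe only separates one header from the next.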

If you prefer to host a robots.txt file, you would need a custom HTTPProvider that handles requests for robots.txt; this is similar to how WSE handles
http://:1935/crossdomain.xml

https://www.wowza.com/docs/how-to-create-an-http-provider
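
Whichever way the file ends up being served, it may help to check the rules before deploying them. The following is a rough Python sketch using the standard library's `urllib.robotparser`; the robots.txt content, the host name `vod.example.com`, and the stream path are all assumptions for illustration, not values from a real setup.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content: block only Google's video crawler.
ROBOTS_TXT = """\
User-agent: Googlebot-Video
Disallow: /
"""

# Hypothetical VOD URL on the Wowza subdomain.
TEST_URL = "http://vod.example.com:1935/vod/sample.mp4/manifest.f4m"

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# can_fetch() returns False when the rules disallow the URL for that agent.
video_bot_allowed = parser.can_fetch("Googlebot-Video/1.0", TEST_URL)
other_bot_allowed = parser.can_fetch("SomeOtherBot/1.0", TEST_URL)

print("Googlebot-Video allowed:", video_bot_allowed)
print("Other crawler allowed:", other_bot_allowed)
```

This only checks how a standards-following parser reads the rules; whether a given crawler honors them is up to the crawler.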

@Rose Power-Wowza Community Manager thanks very much.

httpUserHTTPHeaders works nicely!

I think it would help people to have a general tutorial or manual entry about adding these super useful custom HTTP headers. For now it seems to be hidden in two more specialized tutorials.

All the best, Elio

Fabulous idea! Thank you so much for the feedback.