PIERRE FAR: My name is Pierre. I work in web search. ILYA GRIGORIK: And
I'm Ilya, a developer advocate for the Chrome team. PIERRE FAR: We're here
today to talk to you about why HTTPS matters. And we want to convince you
that all communication should be secure by default. I know there may be
some skeptics here. We're going to talk
about the motivation to explain to you
why that's the case. We're going to talk to you
about some common questions and misconceptions
that we've seen. And we're going to give
you some hands on tips and best practices about
how to go about doing that. ILYA GRIGORIK: So
Pierre, I'm glad that you mentioned those skeptics. Because I think most
of us will agree that secure by default
and HTTPS matters for things like
e-commerce and banking. But I think what
we're saying here is actually much, much stronger.
Right? There's a reason why
the title of our talk is "HTTPS Everywhere,"
and that is, we should be using secure
by default communication for all communication,
everywhere. And that includes my playlists,
the news articles I read, and where I read them, and all
other things that I do online. Because while it seems
like, individually, the metadata that is
available that you can gather by looking at these
unencrypted sites is benign, when you actually
put it all together, it reveals a lot
about my intent.
It can actually
compromise my privacy. PIERRE FAR: So that's actually
a very interesting point. I see my website there. I didn't ask for
that, by the way. And I write about
tech news stuff. It doesn't sound sensitive. Does that mean I really
need to encrypt it, too? ILYA GRIGORIK: Well,
I think you know my answer. The answer is yes. And there are actually
two reasons for that. The first reason is
you want to protect the privacy of the visitors
coming to your site. So as we said, it's
not just your site. It's also others' sites. And second is you want to
protect your site as well. So all of us are
developers here. We're building
sites and services. And we want to make sure
that we don't give out access to malicious participants. For example, maybe you have
an admin section of your site, or you want to make sure
that your service is not compromised by a
malicious attacker. So when we talk about
security, we actually mean three specific things. We're talking about
authentication, data integrity, and encryption. Authentication makes
sure that the servers that we are talking to
are who they claim to be.
So for example, if you're
talking to your bank, you want to make
sure that you're talking to the right entity. Data integrity means
that the data is not modified while it's in
transit between your client and the servers. And encryption, of
course, protects the actual communication
from eavesdroppers such that they can't figure
out what's going on here. PIERRE FAR: And this is
actually an important point, because when I'm talking with
people, when we say HTTPS, people immediately think
it's only about encryption. But in reality,
you need all three. They're equally critical. Because just to give
you a simple example is that you can actually
have an encrypted channel to an attacker. And you wouldn't
know any better. And that's not actually
protecting you. That's why authentication
and data integrity kick in. And all three of them are
what form a secure website. ILYA GRIGORIK: Right. So the good news is, this is
actually all taken care of for us by Transport
Layer Security.
So for those that
are not familiar, transport layer
security is a protocol. And when we talk about
HTTPS, we actually are just talking about
HTTP running on top of TLS. And of course, TLS
is actually used for a variety of other
applications as well. For example, the Gmail team uses
TLS to secure mail delivery. But in this particular
session, we're going to focus on HTTPS, and
on the web in particular. So I think a lot of this
sounds very abstract. Authentication, data
integrity, and all the things. So let's actually take a
look at a hands-on example. So we've been scheming
this kind of crazy idea. We want to launch a new
service, tasktip.com, where you can come and
create guides and find guides for everyday things.
Anything from how do I make
a great omelette to how do I configure my Wi-Fi router. So let's actually step
through the exact steps of how we would make
it secure by default. PIERRE FAR: So before we get
into this– like tasktip.com, what we're saying– one
of the examples we had is that we're going to tell people
how to make a good omelette. And that doesn't
sound that sensitive. So I wanted to go
back to what we said earlier– do we need
to bother with HTTPS here? ILYA GRIGORIK: Well, once again,
I think you know my answer. But I'm curious. What do you guys think? A show of hands–
how many people think that we should
encrypt and deliver all of the content via HTTPS? All right.
So there's definitely
some skeptics here. So let's work
through some examples and see how things can go wrong. So first of all, Pierre,
I know that you're coming from the London office. So you're probably a
little bit jet lagged. You came here for
I/O. And I'm guessing you had to pay a
couple of visits to your local coffee shop. PIERRE FAR: That's true, yes. ILYA GRIGORIK: OK. You're also coming from
London, which tells me that you probably have
roaming charges on your phone. And let's face it, those
are pretty high still. So you're probably pretty happy
to use their Wi-Fi, as well. PIERRE FAR: That's
true, too, yes. ILYA GRIGORIK: OK. Well, when you use
unauthenticated Wi-Fi or unsecure Wi-Fi,
passive attackers can actually just listen
in and gather data about what you're doing.
So they can see which
sites you're browsing to, what you're doing,
how much time you're spending on each article. And that includes all
guides on tasktip.com. So that's a problem. So the combination of
all of these signals actually reveals quite a
bit about what you're up to. And that's kind of
scary, isn't it? PIERRE FAR: Yes. That is very scary. But the other thing is
that– the key point is what Ilya was saying, is
that any individual visit is not interesting on its own. It's when you start looking
at the aggregate– all the behavior visible to
somebody eavesdropping on that unsecure connection–
that you can start to build a bigger
picture and a better idea about what that person is doing.
And this is why you need
to secure everything, because you don't
want to be part of that pile of information that
your users are inadvertently showing to the world. ILYA GRIGORIK: Right. So as owners of the
tasktip.com site, we want to protect the
data of our visitors, right, to protect them. But also, as good
citizens of the web, you want to make sure that
the web is getting better, is getting more secure. So that's a good thing, too. Now compare this
scenario if we had HTTPS deployed on
all of the sites. So instead, when you're
using this public Wi-Fi, all of the communication would
be done over encrypted channels with each of the servers. So the attacker couldn't
just listen in and figure out what we're up to. So this sort of passive
attack, where you just sit back and just
listen to what's happening, is basically impossible, which
makes the web more secure.
Now this is great. Right? This protects our users. But HTTPS also provides
important features for us as site owners to
protect our site. So let's talk about
another example. Instead of just being
a passive attacker and just listening in
on what's happening, an attacker can actually
target your site. Right? So perhaps you have
a particular user that the attacker
is interested in. They could actually
trick the user into visiting a site that is
hosted by them– at which point all bets are off as
to what they could do. PIERRE FAR: Yes. And this is actually
quite scary. If somebody's actively attacking
your users like that, what that means is that they get to
change every single bit about your website, and you
wouldn't know any better. They could change the text. They can steal passwords. They can do whatever they want. ILYA GRIGORIK: Right. And that's why we have
server authentication. So when you install an HTTPS
certificate, what you're actually doing is
providing a guarantee that when the client actually
connects to your server, they are talking to
the right server.
Right? So somebody else
can't come along. They can get your
public certificate, but they don't have
your private key. So they can't complete
that handshake. And that is why, when
you get that green lock icon in your browser, that you
get that warm fuzzy feeling, because you know you're
talking to the right entity. So that's very important. PIERRE FAR: And this is
why we have certificates, because they add this extra
authentication that you know who you're talking
to, and your users would know if
something is wrong, if somebody's trying to actively
intercept their connection. ILYA GRIGORIK: Right. And that's why we need
the three properties that we mentioned
earlier, right? It's not just encryption.
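To make the authentication property concrete, here is a minimal sketch of what a client has to insist on before it trusts an encrypted channel. It assumes Python's standard `ssl` module; the function name `strict_client_context` is made up for illustration and is not something from the talk.

```python
import ssl


def strict_client_context() -> ssl.SSLContext:
    """Return a client-side TLS context that authenticates the server."""
    context = ssl.create_default_context()
    # Both settings are already the defaults of create_default_context();
    # they are spelled out to show what authentication depends on.
    context.check_hostname = True            # certificate must match the host name
    context.verify_mode = ssl.CERT_REQUIRED  # certificate must chain to a trusted CA
    return context


context = strict_client_context()
print(context.check_hostname, context.verify_mode == ssl.CERT_REQUIRED)
```

If either check is switched off, the channel is still encrypted, but the client no longer knows who is on the other end, which is the "encrypted channel to an attacker" case mentioned earlier.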
So as a summary
here, TLS provides three important properties. First of all, attackers
can't listen in, because we're encrypting
the data that's in flight. The attackers can't
tamper with the data, because we're
checksumming all the data as it's being transferred. And the attackers can't
impersonate the site or service, and they
can't use our site as an attack vector against our
site, and our users as well. PIERRE FAR: So I'm convinced. We're going to make tasktip.com
to be completely encrypted. What we'd like to
do now is to look at some practical tips about
how to go about doing that. Between Ilya and
I, when we talked with developers and
webmasters, we've seen a pattern about what
kind of questions people ask and what kind of
concerns they have. And so what we've seen
is this uncertainty, and people are afraid. In reality, once you
know what you need to do, it's actually quite simple.
So to help you think
through these things, we've come up with
this list of tips about what you need to be doing. We've broken it into two parts. One is about the
operational stuff. How do you buy a certificate? How do you set it
up on your website? And the other one is that
most content right now is not secure on the web. So how do you migrate
to a secure website? Ilya's going to talk about
the operations stuff. I'm going to talk about how
you'd make it HTTPS friendly, and how do you migrate there.
And I'm also going to talk
about how you make sure that Google's algorithms
will index that and show it in search results,
so that you don't lose any search
referred traffic. ILYA GRIGORIK: Right. So this actually is a
fairly simple process, once you know what you're doing. So we're going to give
you some checklists that you can follow along later,
and make your site be secure by default. So here's my
sys-admin checklist. We'll go through
these step by step, so don't worry about
writing them down. I think the most
important takeaway about this particular checklist
is that you should actually follow it in order. The biggest mistake I find is
that people just skip a step.
So for example, they
get the certificate, but then they don't
really verify their server configuration. They're just going to
drop it in and let it go. Or they do their
verification, but they forgot to test for
performance and other things. So go in order. But having said that, let's
start at the beginning. So first things first. You should be using
2,048-bit certificates. If you already have HTTPS on
your site and you're using a 1,024-bit certificate, that
is no longer strong enough. So please upgrade and
get a new certificate. And speaking of getting
new certificates, perhaps the biggest and the
first question that I get is, well, what about cost? Aren't these things
super, super expensive? And the answer is, it
depends on the use case.
If you have a non-commercial
use case– for example, I run a personal
tech blog, that's a non-commercial site– you
can actually get a free TLS certificate from a number
of different providers. If you have an open
source project, you can also get
a free certificate from a variety of providers. Use your favorite search
engine to look for those, and you'll find them.
Now, if you have a
commercial use case, then you need to think about
what type of certificate you actually need for your site. PIERRE FAR: And that's
actually something that I've always
been confused about. How do I know which
one I need? And so, can you help me there? ILYA GRIGORIK: Right. So it's actually pretty simple. The single host certificate
is the cheapest one, and it's exactly as it sounds. You have a fairly simple
site, where all your content resides at example.com,
for example. Right? And you serve all your
content from there, and that's all you need,
then you need a single host certificate.
That'll cost you $10. And you're off to the races. Now if you have a slightly
more advanced use case where, for example, you have
the site localized– you have a .com and a .co.uk. Or maybe you have a
couple of subdomains, because you're using a
CDN to serve your content, you will need a
multi-domain certificate, or the multi-domain is
probably the best one that you should use. And the key observation
here is that you know all of the subdomains. Right? So you can list
them out, and you can get a certificate that
will cover multiple domains. And then finally,
the wild card one is definitely an
advanced use case, where you perhaps don't
know all of the subdomains or all of the origins
that you will need. And that one can be quite
a bit more expensive. So it's up and north of $100. But basically,
it's not that bad. PIERRE FAR: Yeah. And that's the thing. It's not that bad for
the most common use cases that most of the
developers here would face. And the amounts we're
talking about money wise are not that expensive.
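The difference between the three certificate types can be sketched as a question of which host names a certificate's name list covers. This is a simplified model, not the full matching rules browsers apply; the helper names and the example name lists below are hypothetical.

```python
from typing import Sequence


def name_matches(pattern: str, hostname: str) -> bool:
    """Check one certificate name (possibly a wildcard) against a host name."""
    pattern_labels = pattern.lower().split(".")
    host_labels = hostname.lower().split(".")
    if len(pattern_labels) != len(host_labels):
        return False  # a wildcard stands in for exactly one label
    if pattern_labels[0] == "*":
        return pattern_labels[1:] == host_labels[1:]
    return pattern_labels == host_labels


def certificate_covers(cert_names: Sequence[str], hostname: str) -> bool:
    """True if any name listed in the certificate matches the host name."""
    return any(name_matches(name, hostname) for name in cert_names)


# Hypothetical name lists for the three certificate types
single_host = ["example.com"]
multi_domain = ["example.com", "www.example.com", "example.co.uk", "cdn.example.com"]
wildcard = ["example.com", "*.example.com"]

print(certificate_covers(single_host, "www.example.com"))
print(certificate_covers(multi_domain, "example.co.uk"))
print(certificate_covers(wildcard, "anything.example.com"))
```

In this model the multi-domain case only works because every name is known up front and listed, while the wildcard case covers subdomains that were never enumerated, though only one level deep.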
ILYA GRIGORIK: Right. So somewhere
between $10 and $30, and even for a commercial site,
you should be up and running. So that's step number one. Once you've got
your certificate, you need to configure
your server. And let's be
honest, for somebody who has not spent a lot of time
kind of looking and getting familiarized with
all the options, it can be a little bit daunting. Because first of
all, let's face it, security can be complicated. As an example,
this cryptic string here is actually the
recommended cipher suite list that your server should be
advertising as of today. That's the requirement
list today. But please don't try
and write it down, because the biggest
problem that I find is that a lot of developers
try to piece together their configuration from a
variety of different sources.
Some of them may be outdated. Some of them maybe are not
recommending the best things. So instead, please go to
one canonical source that is up to date, and use
that as a starting point. And the good news is, there
is such a destination. The Mozilla guys actually
have a great wiki page, which they keep up to date
with all of the best practices. And that includes
configuration tips for all of the most
popular servers. So just follow that link,
and that's a great place to get started. So yeah. PIERRE FAR: I'm glad you
didn't read this, by the way. So it's still a bit scary. But how do I know that
I've done it correctly? How do I know that I've
actually followed the Mozilla recommendations correctly? ILYA GRIGORIK: Right. So once you've enabled
all the 55 flags and configured
your cipher suite, and all the rest– right? I have the same question. And the tool that I use that
I like to use, and recommend that you guys try, is
the Qualys SSL Labs test.
So they actually provide a
service where you can go in and type in the
name of your site or the IP address
of your server. And they will scan the
server, and they'll run a whole battery of tests,
and verify all the latest settings, all the
recommendations, and all the rest. They'll give you a score. And they'll give you
hands on tips for things like, hey, we
found this problem, you should look into this. So this is a great
tool to use anytime you update your
server configuration. Anytime you change it,
please use this tool. PIERRE FAR: Yep.
So this is a very
nice tool, by the way. You can use it to
test any websites. So for example, if you're
trying to buy from a website, and you're not sure that
the security's that great, you can use it to test that. Also, you can use it
to test your own server and see what you've done. And because it tells
you what you're missing or things that
could be improved, you know what you can
go about doing next. ILYA GRIGORIK: Right. That's actually a
great point, right? So we're talking about
configuring your own server.
But of course you
can use this tool and some other similar tools to
also test your favorite sites. And perhaps if you find
an issue, talk to them. Send a bug. Get them to fix it. So next, once you've got
your server configured, the next question
I get is, well, what about the CPU
resources on the server? I heard that all this
cryptography work adds a lot of overhead
and all other stuff. So the thing to
recognize here is there are two steps to
establishing the TLS tunnel. There is the asymmetric
cryptography, which is the step where we
verify the certificate and do the public key crypto. And then there's the
symmetric cryptography, which is where and how we
encrypt the actual application data. The first part is
the expensive part. And as Jacob points out
here, it is very important that you leverage things
like keepalives and session resumption to optimize
for this step.
So keepalives allow you to
reuse the same connection between different requests. Which means that you will only
need to do that handshake once, and then the actual
overhead is much, much less. And session resumption
actually means that we can reuse the
negotiated parameters of our secure session
from a previous session when we establish a new one. So once again, we can
skip that handshake, and significantly
reduce the overhead. So as you can see, the
experience at Twitter is that it basically added
negligible CPU overhead when they deployed it correctly. And the emphasis, of
course, is on correctly. PIERRE FAR: Correctly. Exactly– yeah. ILYA GRIGORIK: So after that,
the next question is, well, OK, great. But I'm still not
convinced, because there's a bunch of vendors, and
there's a lot of literature that I've read about things
like dedicated hardware, and TLS offloading, and all
of this other stuff. And the answer
there is, it is true that about a decade ago, these
handshakes were so expensive that we actually needed extra
and dedicated hardware that you had to buy.
That is no longer the case. Modern CPUs are well optimized
for the kinds of work that we do in the TLS stack,
such that Facebook, Google, Twitter, and others all
run TLS purely in software without any
specialized hardware. This is all just
commodity hardware. And that should work
for you guys as well. PIERRE FAR: So that
makes things easier. Basically, get a
2,048-bit certificate. Configure and test your servers. And you get to use
the current hardware that you're already
using now anyway. Anything else? ILYA GRIGORIK: Right. So yes, there is, actually. And one of the most exciting
things to me personally that's happened in
the last couple years is the development of
protocols like SPDY and HTTP 2. So for those that are not
familiar with SPDY and HTTP 2, it's a new initiative to
address some of the performance bottlenecks in the
existing HTTP 1.1 protocol.
And when we enable
SPDY– so first of all, it turns out that you
need TLS in practice to deploy these new protocols,
for a variety of reasons. But once you have TLS, this is
just basically a config flag away. Most of the popular servers–
Apache, Nginx, Jetty– all support SPDY so that it's
something you can enable. And you can see
some numbers here. This is data for some of the
most popular Google services after they've enabled SPDY. And this is compared to HTTPS. So we're making the
performance much, much better for the actual page load time
by enabling these things. So that's pretty awesome. But it's not just making
performance better for the users who
are visiting the site and improving the
page load time. It's actually also improving
performance on the server. Because one of the key
features of SPDY and HTTP 2 is that its goal is actually
to use a single connection, instead of having many
connections open to the server. And by having a
single connection, it means that there
are fewer handshakes, there are fewer
sockets, there are fewer buffers that
we need to allocate.
And all that means that we
consume less amount of memory, CPU usage, and other
things on the server. So for example, in this
study– this was actually done with mod_spdy
and Apache– there was a server workload that was
run comparing HTTPS and SPDY. And they found that, when
they ran it over SPDY, because there's
fewer connections, the load on the server
was actually much less. Because we're just
much more efficient about reusing the stuff. So truth of the matter
is, enabling TLS and SPDY may actually decrease
your ops cost– which by itself is a really
nice selling point.
PIERRE FAR: Yeah. Exactly. Just do it. ILYA GRIGORIK: Yeah. So as a quick summary, right? Certificates are
not that expensive. It really depends
on your use case. So make sure you pick
the right use case. Don't just assume
that you need to get the most expensive
certificate out there. You don't really
benefit from that. There are great
tools that will help you verify your configuration. I mentioned Qualys. There's a bunch of others. They're all very good. You don't need
dedicated hardware. And there's a lot
of optimization that you can do to
optimize your stack. Now, unfortunately, we
don't have the full day that we really need to really
go into the details of how to tweak and optimize
each and every bit. But I've put together a
site– isTLSfastyet.com. By the way, the answer is yes. And you can go there
and you can just follow the links to
learn more about how to configure your servers. So if you're a
sys-admin, or if you need to talk to your sys-admin,
point him to that resource.
And hopefully it'll help
you to make it fast. PIERRE FAR: Right. So I see we've
figured out everything you need to do on the server. How do you make it be as
friendly for both your visitors and for search engines? There are a lot of
misconceptions in this space. And I hope I'm going to
clear up quite a few of them. So the first I want to
start with is actually the most basic one. We don't treat, currently,
HTTPS sites any differently in search. They get indexed and ranked
like any other website. OK? But what we're seeing
is that webmasters can break their secure websites
in many different ways. Doing it right is easy. And I'm going to show
you how to do it now.
The general pattern of
breakages that we're seeing is that the indexing
signals that we see on the website– both
technically on the pages and in servers– they
are inconsistent. OK? So the one thing you need to
do to make your websites be indexed correctly
as secure websites– make sure that all
the signals you are sending us are
consistent, and all of them point to the fact that you
want the secure website to be indexed.
Now a lot of the signals
that we talk about, you're going to need to be doing
them for your users anyway. So let's talk about
them in detail. This is the checklist. It's actually very simple. This is roughly in the order
that you need to do it anyway. The first thing is that you set
it up on your server correctly. And then you set up all the
signals, in terms of– we'll talk about these in a second. About the page resources,
like JavaScript and CSS, how you link to other
content on your website. Fix those up. Then you give us some signals
that are not visible directly to the user. But our algorithms will
help them decide and pick the right one. And once you have
it all done, we'll talk about how Webmaster
Tools can actually help you to figure out– keep
monitoring that everything is humming along nicely, and
also some error reporting that we see there.
So first things first. Make sure that the
certificate is not broken. Now what do we mean by that? There are quite a few
ways they can break. The most common one is that
we see that the server returns the incorrect host name for
the certificate– there's a mismatch. OK? So verify that the
server's not doing that. Verify that what
you're trying to secure is actually what the certificate claims
to secure. The other thing
is that we've seen that the certificate itself
has an incomplete certificate chain. OK? Don't do that. The way you do it–
when I set up HTTPS on my website, where I bought
the certificate made it very easy to actually tell me
this is the certificate chain that you need to do.
The other one is that
certificates expire. When you buy a
certificate, it's actually for a certain period of time. One of the things that
we see is that people forget to renew them. So here's a tip. Put it in your calendar
a week in advance, so you can renew your
certificate before it expires, so you get the continuity. And the other thing is that
keep an eye on Webmaster Tools, because we do try to
send you a message when we see a problem with these,
like the certificate expired, or there's a mismatch in the
host name with the certificate. We try to alert you for these.
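The calendar reminder can also be automated. The sketch below assumes the expiry date is available in the `notAfter` text format that Python's `ssl` module reports for a certificate; the dates, the seven-day threshold, and the function names are illustrative assumptions.

```python
import ssl

SECONDS_PER_DAY = 24 * 60 * 60


def days_until_expiry(not_after: str, now: float) -> int:
    """Whole days between `now` (epoch seconds) and the certificate's notAfter."""
    expires_at = ssl.cert_time_to_seconds(not_after)
    return int((expires_at - now) // SECONDS_PER_DAY)


def needs_renewal(not_after: str, now: float, warn_days: int = 7) -> bool:
    """True once the certificate is within `warn_days` of expiring."""
    return days_until_expiry(not_after, now) <= warn_days


# Hypothetical dates, fixed so the example is reproducible
not_after = "May 08 00:00:00 2030 GMT"
today = ssl.cert_time_to_seconds("May 01 00:00:00 2030 GMT")
print(days_until_expiry(not_after, today), needs_renewal(not_after, today))
```

In real use `now` would be `time.time()`, and the check would run on a schedule rather than once.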
So we'll talk a little bit
more about Webmaster Tools. Right? ILYA GRIGORIK: Right. And tools like Qualys would
actually catch some of those as well. PIERRE FAR: Exactly, yeah. ILYA GRIGORIK: So
please use that. PIERRE FAR: So
the first thing is that– this example
is JavaScript, but it applies also to CSS– is
that we've seen some websites actually hard code HTTP
resources on secure pages. That's a problem. And the reason for
that is two-fold. Because sometimes browsers
will block insecure resources from being downloaded. So your web page will
not work as intended. And the other thing is that,
if the browser doesn't block that download– actually
downloads it insecurely– you're opening up
a security hole in your website for your users. So fixing that is
very, very important. And the other thing is that it's
one of those inconsistencies that we can pick up
on in our indexing. If we see that a secure
website doesn't quite know what it's doing with
their page resources, we might be indexing your
website inconsistently.
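A quick way to look for hard-coded HTTP resources is to scan the page's markup. This is a rough sketch using Python's standard `html.parser`; the class name, the small set of tags it inspects, and the sample page are assumptions, and a real audit would cover more tags and attributes.

```python
from html.parser import HTMLParser


class InsecureResourceFinder(HTMLParser):
    """Collect src/href values on resource tags that are hard-coded to http://."""

    # Which attribute carries the resource URL for each tag we inspect
    RESOURCE_ATTRS = {"script": "src", "link": "href", "img": "src"}

    def __init__(self):
        super().__init__()
        self.insecure = []

    def handle_starttag(self, tag, attrs):
        wanted = self.RESOURCE_ATTRS.get(tag)
        for name, value in attrs:
            if name == wanted and value and value.lower().startswith("http://"):
                self.insecure.append(value)


def find_insecure_resources(html: str) -> list:
    """Return every resource URL in `html` that would load over plain HTTP."""
    finder = InsecureResourceFinder()
    finder.feed(html)
    return finder.insecure


# Hypothetical page served over HTTPS
page = """
<link rel="stylesheet" href="http://example.com/style.css">
<script src="https://example.com/app.js"></script>
"""
print(find_insecure_resources(page))
```

Anything this returns is a resource the browser will either block or fetch insecurely.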
ILYA GRIGORIK: So
Pierre, just to clarify. PIERRE FAR: Yeah. ILYA GRIGORIK: That slash
slash kind of looks weird. Is that correct syntax? PIERRE FAR: That's
actually a good point. So look at this. Who knows relative URLs here? You have all done it. This is a special
kind of relative URL. It's a protocol relative URL. Now we're all developers
here, and we all develop on our own workstations. When you have this, that
means you can actually develop and test your website
as you're building it, without having the certificates
installed on your server locally. OK? So when you have it
this way, it will work on your
development machine. And when you deploy
it live, it'll actually be secure and use
the server configuration. ILYA GRIGORIK: Right. So if the page is rendered
on a non-secure origin, that will just say,
well, I want to load the unencrypted
version of script.js. PIERRE FAR: Exactly. Right. So the other thing
that we see is that sometimes we have
a secure web page, but then it has
a hard coded HTTP link to other pages
on the same website.
So again, this is
an inconsistent one. OK? You're securing this
page, but you're linking to the insecure page. And we'll talk about
why that's bad, because it will incur redirects
and it will hurt performance. But also it's something that
our algorithms can pick up on. If you're being inconsistent
about how you're linking, it might actually
be a problem for us. And again, if you use the
protocol relative URLs, it's actually useful for
you when you're developing. So these are the basics.
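How a protocol relative URL resolves can be shown with Python's standard `urllib.parse.urljoin`, which follows the same URL resolution rules browsers use. The host names here are placeholders.

```python
from urllib.parse import urljoin


def resolve(page_url: str, reference: str) -> str:
    """Resolve a reference found on a page against that page's own URL."""
    return urljoin(page_url, reference)


resource = "//example.com/script.js"

# On an insecure development page the resource loads over HTTP...
print(resolve("http://localhost:8000/index.html", resource))
# ...and on the deployed secure page the same markup loads it over HTTPS.
print(resolve("https://example.com/index.html", resource))
```

The reference inherits the scheme of the page it appears on, which is why the same markup works in both environments.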
This is the answer
about how you can get your secure website indexed. If you know about
how this works, there's nothing
magical about it. It simply is two steps. The first one is that you make
sure that any insecure URLs redirect to the
secure counterparts. OK? And you use that with
the 301 redirect. And the other thing
is that you emphasize that signal to our
algorithms by having a rel canonical on
that page itself– the secure page itself–
the rel canonical will point to the page itself. So it's a self-reference
rel canonical, which emphasizes the
signal that you're sending us with the redirect. OK? And remember here, both
Googlebot and the users are getting this redirect.
Treat them the same way,
and everything will be fine. Redirects are very,
very important. OK? Because that means when
the page is being shared, when it's being emailed,
when it's being bookmarked, that it's the secure
version of the page that is being used here. The other thing is that the
redirect is an important signal for our algorithms that say,
don't index the insecure URL, index the secure URL. OK? And again, treat Googlebot
and users in the same way. ILYA GRIGORIK: Right. And this is also important
because chances are, you've had your content
linked by many other sites. And you want to make sure that
those links, or the users that are following those links,
end up on the secure version. Which is once again why
redirects are so important. PIERRE FAR: Exactly. And this is especially true
for established websites. It's easier when you're starting
from scratch being secure. But established websites
that are migrating, that will not be the case.
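The two steps, a 301 from every insecure URL plus a self-referencing rel canonical on the secure page, can be sketched as one response function. This is only an outline under simplifying assumptions: `respond` and `secure_url` are hypothetical names, and a real site would do this in its web server or framework configuration.

```python
from urllib.parse import urlsplit, urlunsplit


def secure_url(url: str) -> str:
    """Return the HTTPS counterpart of a URL, keeping host, path and query."""
    parts = urlsplit(url)
    return urlunsplit(("https", parts.netloc, parts.path, parts.query, parts.fragment))


def respond(url: str):
    """Return (status, headers, body) for a request to `url`.

    There is no user-agent check: Googlebot and users get the same answer.
    """
    if urlsplit(url).scheme == "http":
        return 301, {"Location": secure_url(url)}, ""
    # Secure page: the canonical link points at the page itself.
    body = '<link rel="canonical" href="%s">' % url
    return 200, {}, body


print(respond("http://example.com/omelette?x=1"))
print(respond("https://example.com/omelette"))
```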
OK? Here's the other thing. There's a tip for redirects. We've seen this, and I was
personally guilty of this, is that sometimes we see
the insecure redirects go in a chain. So in this example
here, so the insecure– the HTTP URL without
subdomain WWW– actually redirects to the
insecure WWW subdomain, which, in turn, redirects to
the secure version of that. That's extra latency
for your users. The browser and Googlebot are
OK with short redirect chains. That's fine. But you're actually incurring
extra latency for your users. What do you do about this? You already know the answer. You know where they're
going to end up in. OK? So just send them
there directly, and then you don't
incur any extra latency.
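The cost of a chain is easy to see by counting hops. The sketch below models redirects as a plain dictionary rather than making network requests; the two redirect tables are hypothetical versions of the chained and direct setups just described.

```python
def follow(redirects: dict, url: str, max_hops: int = 10) -> list:
    """Return the URLs visited, starting at `url`, following `redirects`."""
    visited = [url]
    while url in redirects and len(visited) <= max_hops:
        url = redirects[url]
        visited.append(url)
    return visited


# Chained: insecure bare domain -> insecure www -> secure www
chained = {
    "http://example.com/": "http://www.example.com/",
    "http://www.example.com/": "https://www.example.com/",
}
# Direct: every insecure URL goes straight to its final destination
direct = {
    "http://example.com/": "https://www.example.com/",
    "http://www.example.com/": "https://www.example.com/",
}

print(len(follow(chained, "http://example.com/")) - 1)  # number of redirects
print(len(follow(direct, "http://example.com/")) - 1)
```

Each extra hop is another round trip before the user sees anything.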
And this is doubly
important, especially for your mobile users. ILYA GRIGORIK: Right. And I was actually
just going to say, for mobile users
is where I actually find this to be the
biggest problem, because you end up with
the mobile redirect that goes to the– first you
have the domain, that goes to the mobile site, that
then takes you to the HTTPS site, right? And this is extremely
expensive, especially in mobile, because the latency is so high. So I have one tip for you
guys that would actually help you eliminate a
lot of that latency. And that's HSTS, or the HTTP
Strict Transport Security. So this is a header
that your server can return when it returns
the page which actually sets a policy on the browser.
And it consists of two parts. First there's the
max-age, and then there's the includeSubDomains. The max-age basically
says, remember this policy for this amount of seconds. So this is just like
a caching header. And the includeSubDomains
does exactly what it implies. It just means that also apply
this policy to subdomains. When this policy
is set, the browser will remember that you want this
site to be accessed over HTTPS. And every time the user
tries to request your site, it will automatically take
them to the HTTPS site. So it will actually
skip that redirect. It'll apply that
rewrite on the client before it even
sends the request. So this is important
for performance, because obviously it
eliminates the redirects, which is what Pierre is describing. But also, it eliminates
an entire class of downgrade attacks
against your user. So those two things combined is
what makes HSTS so important.
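For reference, the header is named `Strict-Transport-Security`, and its value is a short list of directives. The sketch below builds and parses that value; the function names and the one-year `max-age` are illustrative choices, not a recommendation from the talk.

```python
def build_hsts(max_age_seconds: int, include_subdomains: bool = True) -> str:
    """Build the value of a Strict-Transport-Security response header."""
    value = "max-age=%d" % max_age_seconds
    if include_subdomains:
        value += "; includeSubDomains"
    return value


def parse_hsts(value: str) -> dict:
    """Extract the two directives discussed here from a header value."""
    policy = {"max_age": None, "include_subdomains": False}
    for directive in value.split(";"):
        directive = directive.strip()
        if directive.lower().startswith("max-age="):
            policy["max_age"] = int(directive.split("=", 1)[1].strip('"'))
        elif directive.lower() == "includesubdomains":
            policy["include_subdomains"] = True
    return policy


one_year = 365 * 24 * 60 * 60  # max-age is expressed in seconds
header_value = build_hsts(one_year)
print("Strict-Transport-Security: " + header_value)
print(parse_hsts(header_value))
```

Browsers only honor this header when it arrives over a secure connection, so it belongs on the HTTPS responses, not on the redirect from HTTP.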
PIERRE FAR: Yep. And not just that. I mean, serving HSTS is another
signal that our algorithms can potentially look to, to
see that you actually really, really want us
to index the secure page. ILYA GRIGORIK: Right. PIERRE FAR: So here's
the other thing. So the signals
we've talked about so far are kind of visible to users. Now redirects for
Googlebot– Googlebot obeys the robots exclusion
protocol, which in this case, we're talking about
the robots.txt file. You could actually
be– because browsers don't look at the
robots.txt file, you could actually be
inadvertently blocking Googlebot from
accessing either or both of your insecure
and secure website. OK? These are directives. So we would follow them once we
discover those rules that say, do not access the
secure or insecure page. This can lead to
serious inconsistencies about the indexing
of your website. So the first thing to look
at is whether Googlebot is allowed to crawl both the
insecure and the secure URL.
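
A quick local version of that check can be sketched with Python's standard `urllib.robotparser`. The robots.txt contents and example.com URLs below are made up for illustration, and this only approximates how a crawler reads the rules; Fetch as Google remains the authoritative check. Note that the HTTP and HTTPS sites each serve their own robots.txt.

```python
from urllib.robotparser import RobotFileParser


def googlebot_allowed(robots_txt, url):
    """Return True if the given robots.txt text lets Googlebot fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("Googlebot", url)


# Hypothetical files: the insecure site blocks one directory,
# while the secure site accidentally blocks everything.
http_robots = "User-agent: *\nDisallow: /private/\n"
https_robots = "User-agent: *\nDisallow: /\n"

http_ok = googlebot_allowed(http_robots, "http://example.com/page")
https_ok = googlebot_allowed(https_robots, "https://example.com/page")
```

In this made-up case the insecure URL is crawlable but the secure one is not, which is exactly the kind of inconsistency to look for.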
So that means we are able to
discover the redirect and also index the secure content. OK? And you can use that
with– in Webmaster Tools, there's a tool called
Fetch as Google. And if you give it a
URL, it will tell you whether Googlebot
is able to access it or not, and what
content is returned. OK? So that's the first problem
you need to check out. The other thing is that,
suppose you've done all this, and you've allowed Googlebot
to crawl the pages. We've seen webmasters
inadvertently add the noindex that directs us
to not index the secure page.
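
To hunt for a stray noindex, a page can be scanned for a robots meta tag and its response headers for `X-Robots-Tag`. The sketch below uses only the standard library; the `has_noindex` helper is hypothetical, and the sample markup in the usage lines shows what to look for, not what to deploy.

```python
from html.parser import HTMLParser


class _RobotsMetaFinder(HTMLParser):
    """Records whether a robots (or googlebot) meta tag contains noindex."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() in ("robots", "googlebot"):
            if "noindex" in attributes.get("content", "").lower():
                self.noindex = True


def has_noindex(html, headers):
    """Check both places a noindex can hide: the HTML and the HTTP headers."""
    finder = _RobotsMetaFinder()
    finder.feed(html)
    header_value = ""
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            header_value = value.lower()
    return finder.noindex or "noindex" in header_value


in_html = has_noindex('<head><meta name="robots" content="noindex"></head>', {})
in_header = has_noindex("<p>hello</p>", {"X-Robots-Tag": "noindex"})
clean = has_noindex("<p>hello</p>", {"Content-Type": "text/html"})
```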
So that they've got us to
crawl it, and then they say don't index it. Now, there are two
ways to put a noindex. You can do it
either in the HTML, or you can do it
as an HTTP header. This is what they look like. Remember, this is
not what you want to be doing if you
want the page indexed. OK? So don't copy and paste this. This is what you need
to be looking for to make sure that it's
not there on the page. And again you can use Fetch
as Google in Webmaster Tools to see if that HTML or the
HTTP header is being returned. OK? Now the other thing
is that the rel canonical– we've
talked about that. And we've seen
inconsistencies, again, where the secure page– although
it redirects and everything, it will actually say,
the rel canonical references the
insecure counterpart. So it references the
HTTP version of the URL. And the problem with this is
that, a, it's an inconsistency, and can lead us to index
your content inconsistently.
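
The mismatch described here, a secure page whose rel canonical points at the HTTP version, can be detected with a short standard-library sketch. The `canonical_is_insecure` helper and the example.com URLs are assumptions for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class _CanonicalFinder(HTMLParser):
    """Captures the href of the first <link rel="canonical"> in a page."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        attributes = {name: (value or "") for name, value in attrs}
        rels = attributes.get("rel", "").lower().split()
        if tag == "link" and "canonical" in rels and self.href is None:
            self.href = attributes.get("href")


def canonical_is_insecure(page_url, html):
    """True when an HTTPS page declares an HTTP URL as its canonical."""
    finder = _CanonicalFinder()
    finder.feed(html)
    if finder.href is None:
        return False
    canonical = urljoin(page_url, finder.href)  # resolve relative hrefs
    return (
        urlsplit(page_url).scheme == "https"
        and urlsplit(canonical).scheme == "http"
    )


bad = canonical_is_insecure(
    "https://example.com/a", '<link rel="canonical" href="http://example.com/a">'
)
good = canonical_is_insecure(
    "https://example.com/a", '<link rel="canonical" href="https://example.com/a">'
)
```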
OK? But also, if you do it right,
it emphasizes the signal that you're already
sending with the redirect. ILYA GRIGORIK: Right. So what I found
when I was migrating my site is I had to audit
my past content, because I had examples of
where I hard coded HTTP links and I had hard coded HTTP
resources in other places. And you mentioned Fetch
as Google, right? So using Webmaster
Tools was actually very helpful to
find these problems. And I know that you
guys have been working on a lot of awesome
stuff to improve it. PIERRE FAR: Yep. Thank you. Webmaster Tools is a
developer's friend. Use it. The way it works
is that you verify ownership of your website. And once you verify ownership,
we share with you data that you don't want
others to know, like indexing and ranking
information about your website.
It has a ton of features. I'm not going to go
through all of them. OK? So please verify your
sites and start exploring. For secure websites
in particular, I'd like to highlight
a couple of things and give you a couple of tips. So let's dig in here. The first thing is
that I would like you to verify all variants
of your websites. And we'll talk about what
that means in a second. There's also two reports
that you need to be checking. One is called "Index Status" and
one is called "Crawl Errors." What are these? Let's start with verifying
all versions of your websites.
Now these four here that I've
listed– the HTTP, HTTPS, and WWW, and without the WWW,
are treated as different sites. Now if you're doing
things correctly, you redirect to
just the one version. OK? But you need to
verify all of them. Because what we've
seen sometimes is that the protocol and
subdomain– the protocol and host name
combination– can actually have different problems. And you wouldn't know
about that unless you keep checking all
of them separately. OK? So verify all of them. And yes, by all means, check
the canonical version most, because that's what your
users are interacting with. And that's where you're
redirecting everyone. But keep an eye
on the other ones. ILYA GRIGORIK: So this is
actually very important. I just want to
highlight this one, because this is something that
caught me off guard as well. I didn't realize that each of
these sites is a different one. So when I migrated
my content– you'll see this in the next slide–
and you see my indexing go down.
I was like, oh, my
god, I broke the web. And it turns out
that I just needed to add the other profile, and
I saw everything spike up. So it's a different profile. PIERRE FAR: And speaking
of different profiles, lots of websites have a separate
mobile URL– separate sites. While you're doing all
these verifications, make sure you're also
verifying the mobile site, because, again, it's
a different one. And just to see if there
are any problems with that.
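
To make the list of profiles concrete, here is a small sketch that enumerates the protocol and host name combinations for a site. The `site_variants` helper, the `m.` mobile prefix, and example.com are assumptions; a real site may use a different mobile host or none at all.

```python
from urllib.parse import urlsplit


def site_variants(url, mobile_prefix="m"):
    """List the protocol and host name combinations worth verifying separately."""
    host = urlsplit(url).hostname
    # Strip a leading "www." or mobile prefix to recover the bare domain.
    for prefix in ("www.", f"{mobile_prefix}."):
        if host.startswith(prefix):
            host = host[len(prefix):]
            break
    hosts = [host, f"www.{host}", f"{mobile_prefix}.{host}"]
    return [f"{scheme}://{h}/" for scheme in ("http", "https") for h in hosts]


variants = site_variants("https://www.example.com/blog")
```

Each entry in `variants` corresponds to one profile to add and keep an eye on.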
Now this is what Ilya
was talking about. This is the index status report. And these are actually
two separate reports. I took them from
the same web site. OK? And what you can see is that–
this is the basic report. Index status has
basic and advanced. And what index status
does is that it tells you about the number
of URLs we know, and the number of URLs
that have been indexed. OK? Historically, that
wasn't the best report. So we've improved that recently. And you see that's
still in the graph. There's the vertical
line that says update. Now on the left is
the insecure version– the HTTP profile of the site. Notice that we've
historically said, there are lots of, like,
10, 15 pages indexed, and then it drops to zero. What's happened in this
site is that the site has moved to the HTTPS version
quite a while back, but Webmaster Tools
wasn't reporting it. And so when we did
that update and started reporting the data
accurately, that says, you're looking at the HTTP
URLs instead of the HTTPS URLs. OK? The indexing dropped.
And this is what you
want, because if you're doing everything
right by telling us that we should index
the secure website, and if you're giving us
enough time for the algorithms to update the indexing, the
insecure site has to flatline. OK? You want this insecure
website to go to zero, and not have a heart
attack like Ilya did. OK? But what you want to look at is
the secure version of the site, and see the way that the
pages are being indexed. ILYA GRIGORIK: And
chances are, if you do have– if it's
not flatlining, then you're probably missing a
few redirects or other things that you should
hunt down and fix. PIERRE FAR: Yeah. Or actually, a common one
is also robots.txt blocked, so we actually don't know
that there's a redirect. ILYA GRIGORIK: Right. PIERRE FAR: OK? And this is why you need
to verify both of them and see if there's
a problem or not. Similarly, we have something
called "Crawl Errors." And just like index
status, keep checking and see if there are any
problems that you don't expect.
See if, for the host name
and protocol combinations, there's something
unexpected going on. We break it into site-wide
and URL-specific ones. And we have smartphone and
desktop specific errors. So again, something
to keep an eye on. And the last thing is, a
couple weeks ago, we massively improved our documentation
for how to move a website. And to move to secure websites
from insecure websites is a site move– it's
a type of site move. And so I'm not going to go
through the recommendations right now. If you follow that
link, or if you just search for making
site moves easier, you'll find this documentation. And it's very, very detailed
about all the things you need to be
thinking about when you're migrating to secure URLs.
It covers all of the stuff
we've talked about and quite a bit more. So in summary– ILYA GRIGORIK: Yeah. And that resource is
actually very, very good. It helped me quite a bit
when I was migrating my site. So we talked about a
lot in the session. So just a quick recap. When we talk about
TLS, really, we're talking about encryption,
authentication, and data integrity. TLS is not slow. Right? We've talked about
a lot of things that we can do to optimize it. And TLS can actually be used
to make your sites both faster, and your operations,
actually, use fewer resources. So that's all important. We should be using
HTTPS on all sites. HTTPS everywhere. Right? That is our message here. And there is some
work that you guys need to do to
update your content. You need to audit your
content, fix mixed content, fix your links,
and other things. And make sure that you're
sending the right signals.
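
The content audit mentioned here can start with something as simple as listing every hard-coded `http://` reference in a page. This is a minimal sketch under that assumption; the `insecure_references` helper is hypothetical and only looks at `src` and `href` attributes, so it will not catch URLs inside CSS or scripts.

```python
from html.parser import HTMLParser


class _InsecureRefFinder(HTMLParser):
    """Collects (tag, url) pairs for src/href attributes that use http://."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("src", "href") and value and value.lower().startswith("http://"):
                self.found.append((tag, value))


def insecure_references(html):
    """Return every hard-coded insecure reference found in the markup."""
    finder = _InsecureRefFinder()
    finder.feed(html)
    return finder.found


page = '<img src="http://example.com/a.png"><a href="https://example.com/">home</a>'
hits = insecure_references(page)
```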
Webmaster Tools,
Qualys, and there's all this infrastructure
that is already in place that is available to
you guys to make this work. PIERRE FAR: Right. And there's more
here at I/O. Tomorrow there's a talk at 3:00 PM
about how we deploy security at large scales at Google. So if you're interested in doing
this and see our experience, please come to that. And also at 3:30
PM tomorrow, we're doing a sandbox session– I
think it's on level two here– to talk about some
common mistakes, and talk about if you have
any SEO questions which cover some of this. You can also come and
find us, ask us questions.
And we'd love to have
your feedback, please. If you scan this QR
code, it will take you straight to the feedback
form, or you can just follow that link. And thank you for coming. [APPLAUSE] We have a few minutes. If you have any questions,
you can ask us now or you can come afterwards. AUDIENCE: [INAUDIBLE]. PIERRE FAR: Can you
actually go to the– ILYA GRIGORIK: Can
you use the mic? PIERRE FAR: If you go to
the microphone and then ask. AUDIENCE: Is there any talk
that will cover blacklisting of certificates,
specifically in Android? ILYA GRIGORIK: Can
you elaborate on that? AUDIENCE: Well, I
believe since Jelly Bean, there is a facility
to blacklist CA and EE certificates in Android. ILYA GRIGORIK: So I'm not
familiar with the exact process in Android for blacklisting.
There are things in TLS, things
like revocation checks, that are used to revoke
bad certificates. I'm not sure about denying
access to the entire CA. If you want, we can
follow up later. PIERRE FAR: Eric? Do you know the answer, Eric? ERIC: Yes. In general, one thing we might
add to the recommendations there is to do OCSP stapling,
because online certificate revocation is not reliable. So what we do at
Google is push out changes directly to
Chrome or to Android to try to blacklist some very
particular famous certificates that are actively being
misused in the wild. PIERRE FAR: Thank you. AUDIENCE: What about
those of us who make our money
through advertising, and therefore may have
some significant revenue in advertising that
we can't control that's coming through a
DSP or something like that, and therefore lose
a lot of money? PIERRE FAR: That's
a great question.
So this is something we care
about quite a bit at Google. AUDIENCE: Yeah. You guys have the– PIERRE FAR: And that's in Steam. It's actually enabled–
there's an ad unit that works now with secure pages. So you can use AdSense
for those pages. And I think more
and more networks are starting to support
advertising on secure pages. AUDIENCE: Yeah. But I think the
best policy should be to have your
advertisers make sure they do support secure pages. PIERRE FAR: That's
the other thing, is that if you're not using
an ad network– if you're talking to
advertisers directly– make sure that their
infrastructure supports that. Because otherwise
you'd be including an insecure JavaScript ad unit. AUDIENCE: Or a DSP. PIERRE FAR: Yeah. And then that won't
work, or it might not give you the results
you are looking for.
ILYA GRIGORIK: Right. But just to highlight
what Pierre said, there is now an ad
code that you can install which is TLS friendly,
which was not the case before. So that was released
earlier this year. So please take a look at that. AUDIENCE: Hi. So I've got a large site
with millions of pages. And out of fear of a
search traffic catastrophe, I've decided to move just a
small fraction of the pages at a time to SSL. But I believe the
recommendations say to move the
whole site at once. PIERRE FAR: Yes. AUDIENCE: And I was wondering
if you could comment on that. Is this a bad approach? I don't want to do all of this
at once, and all of a sudden find out there's no
crawling happening, or I've made a huge mistake
and I'm losing money everyday.
PIERRE FAR: So this goes back
to the fear and uncertainty that webmasters go through. And I agree with you. Now the best thing
to do is actually bite the bullet and jump in. OK? Do it all in one go. Now before you do
that, of course, you need to set up and make sure
that the infrastructure does exactly what you need to do. So if you look at
the recommendations, they'll say, set up a separate
copy and test that properly. OK? And make sure that
everything is working on that before you move. But once you move, we've
seen some really badly implemented site moves that
way, that when you move pieces at a time, that doesn't quite
work out as you intended. AUDIENCE: Can you
explain why that is? ILYA GRIGORIK: There are
quite a few edge cases. So if you like, to go
back to the example where you would have–
for example, if you use the protocol relative URLs
to link to other content on your site. If you're on a
secure page, you're actually linking to another
place that's secure.
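
How a protocol-relative link resolves depends entirely on the scheme of the page that contains it. A tiny sketch with the standard `urljoin` shows this; the example.com URLs are made up.

```python
from urllib.parse import urljoin

# A protocol-relative link carries a host and path but no scheme.
link = "//example.com/checkout"

# The same link resolves differently depending on the page it appears on.
from_secure = urljoin("https://example.com/cart", link)
from_insecure = urljoin("http://example.com/cart", link)
```

Here `from_secure` is an `https://` URL and `from_insecure` is an `http://` URL, so on a partially migrated site the same markup can point at pages that are not being served securely yet.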
But if you're not serving
that already as secure, you're basically sending them
to a page that's supposed to be secure, but isn't. And the browser can throw an
error or something like that. AUDIENCE: Right. Well, so in my case, I will have an
excess volume of 301 redirects. I'll have more 301 redirects
than I think I would like. But I'm weighing that
against the fear. PIERRE FAR: So
that's the thing is that, if you follow
the guidelines– the recommendations– and if you
have any questions about them, please come to our forums. And we also have
webmaster office hours. You can come and join and
ask questions directly. And don't have the fear. Just make sure that you
follow the guidelines. And make sure you follow all the
steps– what Ilya was saying.
Make sure you think about every
step that has been highlighted. And you should be fine. AUDIENCE: OK. Thanks. PIERRE FAR: Yeah. Good luck. AUDIENCE: I imagine
that you need to have both HTTP and
HTTPS in your sitemap.xml. PIERRE FAR: No. You'd need only the HTTPS. So does everybody know what a
sitemap is here, by the way? Yeah? OK, so a sitemap is basically
exactly what it says. It's like here's
a list of links, URLs that are on my site.
What we recommend is actually
have just the one version, which is the secure version
of your site, in the sitemaps. Now during the transition,
it's OK to actually have an HTTP one. Just produce one to
monitor indexing. Because if you look at
the recommendations, we report the
number of URLs that are indexed for each sitemap. So if you have the
HTTP one, you would start with like 100% indexed. And you want it, over
time, to drop to zero. And if there are still ones
that are still indexed, it would be good to look at
why they're still indexed and what's going on there. So during the transition,
it's OK to have both. But in the long run, you'd want
to move to just HTTPS URLs. AUDIENCE: And once it flatlines,
then get rid of all the HTTP– PIERRE FAR: Yeah. And that's the thing. It's like you don't
need sitemaps. So if you want to skip the
monitoring using sitemaps with insecure URLs, that's
also perfectly fine. So by all means go ahead
and just submit sitemaps with HTTPS URLs only,
and you'll be good.
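
A sitemap that lists only the secure URLs can be produced with the standard library, roughly as follows. The `build_sitemap` helper and the example.com URLs are hypothetical; the namespace string is the identifier defined by the sitemap protocol and stays `http://` even for a secure site.

```python
import xml.etree.ElementTree as ET

# Namespace identifier from the sitemap protocol; it is a name, not a link to fetch.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Build a sitemap document that keeps only the HTTPS URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        if not url.startswith("https://"):
            continue  # skip insecure URLs so only the secure version is listed
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


sitemap_xml = build_sitemap(["https://example.com/", "http://example.com/old"])
```

During a transition, the skipped HTTP URLs could go into a separate sitemap purely for monitoring, as described above.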
AUDIENCE: Thank you. And just to be clear–
I intend to do this– we switch over to HTTPS
in the canonicals. PIERRE FAR: Yes. AUDIENCE: And we
set up the 301s. We're going to retain all our
indexing in Google search? PIERRE FAR: Yes. So it's a per URL process. It's an organic
process for each URL. So it's not like the site will
move millions of pages in one go. It's as we process– as we
crawl and process each URL. So it's a move over time. It's not like everything
moves in one go. So the indexing signals will
be consolidated and everything. So if you do that
correctly, you'll be fine. OK? You might see a
bit of hiccuping, but you shouldn't be
worried about that.
AUDIENCE: Thank you. ILYA GRIGORIK: So I think
we're almost out of time, so maybe just one more. PIERRE FAR: Yeah. Last question, please. AUDIENCE: Hi. So I just wanted to
know if you guys have any suggestions
for older browsers. Is there something? Like for instance,
Android 2.3.6 browser does not let you install
[INAUDIBLE] certificates and stuff for that.
So is there anything
I can do for that? ILYA GRIGORIK: I'm not
sure about installing the certificates. I thought you were going to
ask about things like SNI. So if you need to
support older clients, you will need a
dedicated IP address. That's something that you need
to look into– so if you're targeting older browsers. But I do believe there
is a way to load– like if you need to test,
you can load and whitelist certificates on Android,
even on the older versions. AUDIENCE: OK. Thanks. PIERRE FAR: Cool. Thank you, guys..