The UserAgent string is, of course, under the control of the requester,
and so a malicious robot can forge a UserAgent string that makes it
look like a human. But most robots want to "play nicely" on the
internet and are quite open about the fact that they are robots. The
UserAgent string therefore provides a good first guess about whether a
request originates from a human or from a robot.

The [/help/auto-hyperlink|auto-hyperlink] setting, shown as
"<b>Enable hyperlinks based on User-Agent and/or Javascript</b>" on
the Robot Defense Settings page, can be set to "UserAgent only",
"UserAgent and Javascript", or "off". If the UserAgent string looks
like it comes from a human and not a robot, then Fossil will enable
hyperlinks even if the <b>Hyperlink</b> capability is omitted from the
user permissions. This setting gives humans easy access to the
hyperlinks while preventing robots from following them.
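
The setting can also be changed from the command line with the
<b>fossil settings</b> command. For example, the following selects the
"UserAgent and Javascript" mode, on the assumption that the numeric
value 2 matches the "(2)" labeling used for that mode in the next
section:

<blockquote><pre>
fossil settings auto-hyperlink 2
</pre></blockquote>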
This extra step of using JavaScript to enable the hyperlink targets
is a security measure against robots that forge a human-looking
UserAgent string. Most robots do not bother to run JavaScript, and so
to a robot the empty anchor tags are useless. But all modern web
browsers implement JavaScript, so hyperlinks show up normally for
human users.

If the [/help/auto-hyperlink|"auto-hyperlink"] setting is set to (2)
"<b>Enable hyperlinks using User-Agent and/or Javascript</b>",
then two additional sub-settings control when hyperlinks are enabled.

The first sub-setting is a delay (in milliseconds) before setting
the "href=" attributes on anchor tags. The default value for this
delay is 10 milliseconds. The idea here is that a robot will try to
process the page as soon as it arrives, without waiting for delayed
scripts to run.
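
To illustrate the general technique, here is a minimal sketch of a
delayed-enablement script. It is not Fossil's actual script, and the
"data-href" attribute that holds the real link target is an assumption
made for this example:

<blockquote><pre>
// Sketch only: anchors are assumed to be emitted with a "data-href"
// attribute and no "href", so non-JavaScript robots see dead links.
window.addEventListener("load", function(){
  setTimeout(function(){
    document.querySelectorAll("a[data-href]").forEach(function(a){
      // Copy the real target into "href", making the link live.
      a.setAttribute("href", a.getAttribute("data-href"));
    });
  }, 10);  // the configurable delay, 10 milliseconds by default
});
</pre></blockquote>

A robot that only parses the raw HTML, or that does not wait for the
timer to fire, never sees a usable "href=" attribute.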
See also [./loadmgmt.md|Managing Server Load] for a description
of how expensive pages can be disabled when the server is under heavy
load.
<h2>Do Not Allow Robot Access To Certain Pages</h2>
The [/help/robot-restrict|robot-restrict setting] is a comma-separated
list of GLOB patterns identifying pages to which robot access is
prohibited. The default value is:
<blockquote><pre>
timelineX,diff,annotate,fileage,file,finfo,reports
</pre></blockquote>
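
The list can be changed from the command line with the
<b>fossil settings</b> command. For example, the following
(a hypothetical variation, not a recommendation) extends the default
with the special "zip" keyword described below:

<blockquote><pre>
fossil settings robot-restrict 'timelineX,diff,annotate,fileage,file,finfo,reports,zip'
</pre></blockquote>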
* <b>zip →</b>
The special "zip" keyword also matches "/tarball/" and "/sqlar/".
* <b>zipX →</b>
This is like "zip" in that it restricts access to "/zip/", "/tarball"/
and "/sqlar/" but with exceptions:<ol type="a">
<li><p> If the [/help/robot-zip-leaf|robot-zip-leaf] setting is
true, then tarballs of leaf check-ins are allowed. This permits
URLs that attempt to download the latest check-in on trunk or
from a named branch, for example.
<li><p> If a check-in has a tag that matches the GLOB list in
[/help/robot-zip-tag|robot-zip-tag], then tarballs of that
check-in are allowed. This allows check-ins tagged with
"release" or "allow-robots" (for example) to be downloaded
without restriction.
</ol>
The "zipX" restriction is not in the default robot-restrict setting.
This is something you might want to add, depending on your needs.
* <b>diff →</b>
This matches /vdiff/ and /fdiff/ and /vpatch/ and any other page that
is primarily about showing the difference between two check-ins or two
file versions.
* <b>annotate →</b>
This also matches /blame/ and /praise/.
Other special keywords may be added in the future.
The default [/help/robot-restrict|robot-restrict]
setting has been shown in practice to do a good job of keeping
robots from consuming all available CPU and bandwidth while
still allowing humans access to the full power of the site without
having to be logged in.

One possible enhancement is to add "zipX" to the
[/help/robot-restrict|robot-restrict] setting,
enable [/help/robot-zip-leaf|robot-zip-leaf],
and configure [/help/robot-zip-tag|robot-zip-tag].
Do this if you find that robots are causing load problems on your
site by downloading lots of obscure tarballs.
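
As a concrete sketch, those three changes might look like this from
the command line. The tag list merely repeats the example tags
mentioned above, and robot-zip-leaf is assumed to take a boolean
value; adjust both to suit your site:

<blockquote><pre>
fossil settings robot-restrict 'timelineX,diff,annotate,fileage,file,finfo,reports,zipX'
fossil settings robot-zip-leaf 1
fossil settings robot-zip-tag 'release,allow-robots'
</pre></blockquote>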
<h2>Anti-robot Exception RegExps</h2>
The [/help/robot-exception|robot-exception setting], labeled
<b>Exceptions to anti-robot restrictions</b>, is a list of
[/re_rules|regular expressions], one per line, that match
URIs that will bypass the captcha and allow robots full access. The
intent of this setting is to allow automated build scripts
to download specific tarballs of project snapshots.
The recommended value for this setting allows robots to use URIs of the