Check-in [cee1af5a37]

Overview
Comment: Only apply the complex-request restriction to pages listed in the robot-restrict setting. Deprecate the robot-limiter and robot-allow settings.
SHA3-256: cee1af5a3731d2c44e35abba2623da6ae570520fde9690ec86bbe127448d0884
User & Date: drh 2024-07-27 14:30:31.593
Context
2024-07-27
17:28
Simplified interaction on the honeypot. Humans can prove themselves with just two simple clicks when the auto-captcha setting is enabled. check-in: 0e675ad32c user: drh tags: trunk
14:30
Only apply the complex-request restriction to pages listed in the robot-restrict setting. Deprecate the robot-limiter and robot-allow settings. check-in: cee1af5a37 user: drh tags: trunk
10:31
In the default skin, disable the 'disc' view of UL/LI elements for the /dir page. Reported in [forum:915412fb92|forum post 915412fb92]. check-in: 61e62c02a1 user: stephan tags: trunk
Changes
Changes to src/login.c.
      fossil_exit(0);
    }
  }
  fossil_free(zDecode);
  return uid;
}

/*
** SETTING: robot-limiter                 boolean default=off
** If enabled, HTTP requests with one or more query parameters and
** without a REFERER string and without a valid login cookie are
** assumed to be hostile robots and are redirected to the honeypot.
** See also the robot-allow and robot-restrict settings which can
** be used to override the value of this setting for specific pages.
*/
/*
** SETTING: robot-allow                   width=40 block-text
** The VALUE of this setting is a list of GLOB patterns which match
** pages for which the robot-limiter is overridden to false.  If this
** setting is missing or an empty string, then it is assumed to match
** nothing.
*/
/*
** SETTING: robot-restrict                width=40 block-text
** The VALUE of this setting is a list of GLOB patterns which match
** pages for which the robot-limiter setting should be enforced.
** In other words, if the robot-limiter is true and this setting either
** does not exist or is empty or matches the current page, then a
** redirect to the honeypot is issued.  If this setting exists
** but does not match the current page, then the robot-limiter setting
** is overridden to false.
*/

/*
** Check to see if the current HTTP request is a complex request that
** is coming from a robot and if access should be restricted for such robots.
** For the purposes of this module, a "complex request" is an HTTP
** request with one or more query parameters.
**
** If this routine determines that robots should be restricted, then
** this routine publishes a redirect to the honeypot and exits without
** returning to the caller.
**
** This routine believes that a complex request is coming from
** a robot if all of the following are true:
**
**    *   The user is "nobody".
**    *   The REFERER field of the HTTP header is missing or empty.
**    *   There are one or more query parameters other than "name".
**
** Robot restrictions are governed by settings.
**
**    robot-limiter     The restrictions implemented by this routine only
**                      apply if this setting exists and is true.
**
**    robot-allow       If this setting exists and the page of the request
**                      matches the comma-separated GLOB list that is the
**                      value of this setting, then no robot restrictions
**                      are applied.
**
**    robot-restrict    If this setting exists then robot restrictions only
**                      apply to pages that match the comma-separated
**                      GLOB list that is the value of this setting.
*/
void login_restrict_robot_access(void){
  const char *zReferer;
  const char *zGlob;
  Glob *pGlob;
  int go = 1;
  if( g.zLogin!=0 ) return;
  zReferer = P("HTTP_REFERER");
  if( zReferer && zReferer[0]!=0 ) return;
  if( !db_get_boolean("robot-limiter",0) ) return;
  if( cgi_qp_count()<1 ) return;
  zGlob = db_get("robot-allow",0);
  if( zGlob && zGlob[0] ){
    pGlob = glob_create(zGlob);
    go = glob_match(pGlob, g.zPath);
    glob_free(pGlob);
    if( go ) return;
  }
  zGlob = db_get("robot-restrict",0);
  if( zGlob && zGlob[0] ){
    pGlob = glob_create(zGlob);
    go = glob_match(pGlob, g.zPath);
    glob_free(pGlob);
    if( !go ) return;
  }

  /* If we reach this point, it means we have a situation where we
  ** want to restrict the activity of a robot.
  */
  cgi_set_cookie("fossil-goto", cgi_reconstruct_original_url(), 0, 600);
  cgi_redirectf("%R/honeypot");
}

      fossil_exit(0);
    }
  }
  fossil_free(zDecode);
  return uid;
}

/*
** SETTING: robot-restrict                width=40 block-text
** The VALUE of this setting is a list of GLOB patterns that match
** pages for which complex HTTP requests from robots should be disallowed.
** The recommended value for this setting is:
**
**      timeline,vdiff,fdiff,annotate,blame
**
*/

/*
** Check to see if the current HTTP request is a complex request that
** is coming from a robot and if access should be restricted for such robots.
** For the purposes of this module, a "complex request" is an HTTP
** request with one or more query parameters other than "name".
**
** If this routine determines that robots should be restricted, then
** this routine publishes a redirect to the honeypot and exits without
** returning to the caller.
**
** This routine believes that a complex request is coming from
** a robot if all of the following are true:
**
**    *   The user is "nobody".
**    *   The REFERER field of the HTTP header is missing or empty.
**    *   There are one or more query parameters other than "name".
**
** Robot restrictions are governed by settings.
**
**    robot-restrict    The value is a list of GLOB patterns for pages
**                      that should restrict robot access.  No restrictions
**                      are applied if this setting is undefined or is
**                      an empty string.
*/
void login_restrict_robot_access(void){
  const char *zReferer;
  const char *zGlob;
  Glob *pGlob;
  int go = 1;
  if( g.zLogin!=0 ) return;
  zReferer = P("HTTP_REFERER");
  if( zReferer && zReferer[0]!=0 ) return;
  zGlob = db_get("robot-restrict",0);
  if( zGlob==0 || zGlob[0]==0 ) return;
  if( cgi_qp_count()<1 ) return;
  pGlob = glob_create(zGlob);
  go = glob_match(pGlob, g.zPath);
  glob_free(pGlob);
  if( !go ) return;

  /* If we reach this point, it means we have a situation where we
  ** want to restrict the activity of a robot.
  */
  cgi_set_cookie("fossil-goto", cgi_reconstruct_original_url(), 0, 600);
  cgi_redirectf("%R/honeypot");
}  
Changes to src/setup.c.
  @ computations here.  Set this to 0.0 to disable the load average limit.
  @ This limit is only enforced on Unix servers.  On Linux systems,
  @ access to the /proc virtual filesystem is required, which means this limit
  @ might not work inside a chroot() jail.
  @ (Property: "max-loadavg")</p>

  @ <hr>
  onoff_attribute("Prohibit robots from issuing complex requests",
                  "robot-limiter", "rlb", 0, 0);
  @ <p> A "complex request" is an HTTP request that has one or more query
  @ parameters. Some robots will spend hours juggling around query parameters
  @ or even forging fake query parameters in an effort to discover new
  @ behavior or to find an SQL injection opportunity or similar.  This can
  @ waste hours of CPU time and gigabytes of bandwidth on the server.  Hence,
  @ it is recommended to turn this feature on to stop such nefarious behavior.
  @ (Property: robot-limiter)
  @
  @ <p> When enabled, complex requests from user "nobody" without a Referer
  @ are redirected to the honeypot.
  @
  @ <p> Additional settings below allow positive and negative overrides of
  @ this complex request limiter. 
  @ <p><b>Allow Robots To See These Pages</b> (Property: robot-allow)<br>
  textarea_attribute("", 4, 80,
      "robot-allow", "rballow", "", 0);
  @ <p><b>Restrict Robots From Seeing Only These Pages</b>
  @ (Property: robot-restrict)<br>
  textarea_attribute("", 4, 80,
      "robot-restrict", "rbrestrict", "", 0);

  @ <hr>
  @ <p><input type="submit"  name="submit" value="Apply Changes"></p>
  @ </div></form>
  db_end_transaction(0);
  style_finish_page();

  @ computations here.  Set this to 0.0 to disable the load average limit.
  @ This limit is only enforced on Unix servers.  On Linux systems,
  @ access to the /proc virtual filesystem is required, which means this limit
  @ might not work inside a chroot() jail.
  @ (Property: "max-loadavg")</p>

  @ <hr>
  @ <p><b>Do not allow robots to make complex requests
  @ against the following pages.</b>
  @ <p> A "complex request" is an HTTP request that has one or more query
  @ parameters. Some robots will spend hours juggling around query parameters
  @ or even forging fake query parameters in an effort to discover new
  @ behavior or to find an SQL injection opportunity or similar.  This can
  @ waste hours of CPU time and gigabytes of bandwidth on the server.  A
  @ suggested value for this setting is:
  @ "<tt>timeline,vdiff,fdiff,annotate,blame</tt>".
  @ (Property: robot-restrict)
  @ <p>
  textarea_attribute("", 2, 80,
      "robot-restrict", "rbrestrict", "", 0);

  @ <hr>
  @ <p><input type="submit"  name="submit" value="Apply Changes"></p>
  @ </div></form>
  db_end_transaction(0);
  style_finish_page();