Fossil

Check-in [1a0b304307]

Overview
Comment: Add the complex-requests-from-robots limiter.
SHA3-256: 1a0b3043073b1f2b9274a247df7c6f777e170043c01483ae006e4a611423422e
User & Date: drh 2024-07-26 17:49:16.726
References
2024-09-11 13:30  Update tests to account for new settings introduced with [1a0b304307] and [cadfcba32c]. ... (check-in: 6ead7d999e user: andybradford tags: trunk)
Context
2024-07-27 10:20  A redirect to the honeypot due to robot complex-request detection also sets the "fossil-goto" cookie with the original URL. If a real user proceeds to log in, then a redirect to the complex request occurs as soon as the login completes. ... (check-in: aa4159f781 user: drh tags: trunk)
2024-07-26 17:49  Add the complex-requests-from-robots limiter. ... (check-in: 1a0b304307 user: drh tags: trunk)
2024-07-26 10:49  When doing a "fossil open URL" such that the repository is first cloned and then opened, leaving the repository as a file in the check-out, make sure the repository pathname in VVAR is relative, so that the entire check-out can be moved without breaking the link to the repository. See [forum:/forumpost/f2f5ff2e35031612|forum thread f2f5ff2e35031612]. ... (check-in: 6e04d9cbd4 user: drh tags: trunk)
Changes
Changes to src/cgi.c.
@@ -895,14 +895,27 @@
       if( i<nUsedQP ){
         memmove(aParamQP+i, aParamQP+i+1, sizeof(*aParamQP)*(nUsedQP-i));
       }
       return;
     }
   }
 }
+
+/*
+** Return the number of query parameters.  Cookies and environment variables
+** do not count.  Also, do not count the special QP "name".
+*/
+int cgi_qp_count(void){
+  int cnt = 0;
+  int i;
+  for(i=0; i<nUsedQP; i++){
+    if( aParamQP[i].isQP && fossil_strcmp(aParamQP[i].zName,"name")!=0 ) cnt++;
+  }
+  return cnt;
+}

 /*
 ** Add an environment variable value to the parameter set.  The zName
 ** portion is fixed but a copy is made of zValue.
 */
 void cgi_setenv(const char *zName, const char *zValue){
   cgi_set_parameter_nocopy(zName, fossil_strdup(zValue), 0);
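
The counting rule used by cgi_qp_count() can be tried outside of Fossil with a small stand-alone sketch. The struct DemoParam type and the demo_* function names below are hypothetical stand-ins invented for illustration; they are not part of src/cgi.c, and plain strcmp() is used in place of fossil_strcmp().

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for one entry of the aParamQP[] array. */
struct DemoParam {
  const char *zName;   /* Parameter name */
  int isQP;            /* True for a query parameter, false for cookie/env */
};

/* Count the entries that are query parameters and are not named "name". */
int demo_qp_count(const struct DemoParam *a, int n){
  int cnt = 0;
  int i;
  for(i=0; i<n; i++){
    if( a[i].isQP && strcmp(a[i].zName,"name")!=0 ) cnt++;
  }
  return cnt;
}

/* Small self-check: "name" and the non-QP entry are both skipped. */
void demo_qp_count_selfcheck(void){
  struct DemoParam a[] = {
    {"name", 1}, {"y", 1}, {"HTTP_REFERER", 0}, {"udc", 1}
  };
  assert( demo_qp_count(a, 4)==2 );
}
```

Under this rule, a request carrying only the "name" parameter counts as having zero query parameters, so it would not be treated as a complex request.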
Changes to src/login.c.
@@ -1248,14 +1248,102 @@
       fossil_exit(0);
     }
   }
   fossil_free(zDecode);
   return uid;
 }

+/*
+** SETTING: robot-limiter                 boolean default=off
+** If enabled, HTTP requests with one or more query parameters and
+** without a REFERER string and without a valid login cookie are
+** assumed to be hostile robots and are redirected to the honeypot.
+** See also the robot-allow and robot-restrict settings which can
+** be used to override the value of this setting for specific pages.
+*/
+/*
+** SETTING: robot-allow                   width=40 block-text
+** The VALUE of this setting is a list of GLOB patterns which match
+** pages for which the robot-limiter is overridden to false.  If this
+** setting is missing or an empty string, then it is assumed to match
+** nothing.
+*/
+/*
+** SETTING: robot-restrict                width=40 block-text
+** The VALUE of this setting is a list of GLOB patterns which match
+** pages for which the robot-limiter setting should be enforced.
+** In other words, if the robot-limiter is true and this setting either
+** does not exist or is empty or matches the current page, then a
+** redirect to the honeypot is issued.  If this setting exists
+** but does not match the current page, then the robot-limiter setting
+** is overridden to false.
+*/
+
+/*
+** Check to see if the current HTTP request is a complex request that
+** is coming from a robot and if access should be restricted for such robots.
+** For the purposes of this module, a "complex request" is an HTTP
+** request with one or more query parameters.
+**
+** If this routine determines that robots should be restricted, then
+** this routine publishes a redirect to the honeypot and exits without
+** returning to the caller.
+**
+** This routine believes that a complex request is coming from
+** a robot if all of the following are true:
+**
+**    *   The user is "nobody".
+**    *   The REFERER field of the HTTP header is missing or empty.
+**    *   There are one or more query parameters other than "name".
+**
+** Robot restrictions are governed by settings.
+**
+**    robot-limiter     The restrictions implemented by this routine only
+**                      apply if this setting exists and is true.
+**
+**    robot-allow       If this setting exists and the page of the request
+**                      matches the comma-separated GLOB list that is the
+**                      value of this setting, then no robot restrictions
+**                      are applied.
+**
+**    robot-restrict    If this setting exists then robot restrictions only
+**                      apply to pages that match the comma-separated
+**                      GLOB list that is the value of this setting.
+*/
+void login_restrict_robot_access(void){
+  const char *zReferer;
+  const char *zGlob;
+  Glob *pGlob;
+  int go = 1;
+  if( g.zLogin!=0 ) return;
+  zReferer = P("HTTP_REFERER");
+  if( zReferer && zReferer[0]!=0 ) return;
+  if( !db_get_boolean("robot-limiter",0) ) return;
+  if( cgi_qp_count()<1 ) return;
+  zGlob = db_get("robot-allow",0);
+  if( zGlob && zGlob[0] ){
+    pGlob = glob_create(zGlob);
+    go = glob_match(pGlob, g.zPath);
+    glob_free(pGlob);
+    if( go ) return;
+  }
+  zGlob = db_get("robot-restrict",0);
+  if( zGlob && zGlob[0] ){
+    pGlob = glob_create(zGlob);
+    go = glob_match(pGlob, g.zPath);
+    glob_free(pGlob);
+    if( !go ) return;
+  }
+
+  /* If we reach this point, it means we have a situation where we
+  ** want to restrict the activity of a robot.
+  */
+  cgi_redirectf("%R/honeypot");
+}
+
 /*
 ** This routine examines the login cookie to see if it exists and
 ** is valid.  If the login cookie checks out, it then sets global
 ** variables appropriately.
 **
 **    g.userUid      Database USER.UID value.  Might be -1 for "nobody"
 **    g.zLogin       Database USER.LOGIN value.  NULL for user "nobody"
@@ -1411,14 +1499,17 @@
       uid = -1;
       zCap = "";
     }
     login_create_csrf_secret("none");
   }

   login_set_uid(uid, zCap);
+
+  /* Maybe restrict access to robots */
+  login_restrict_robot_access();
 }

 /*
 ** Set the current logged in user to be uid.  zCap is precomputed
 ** (override) capabilities.  If zCap==0, then look up the capabilities
 ** in the USER table.
 */
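
The order of checks in login_restrict_robot_access() can be sketched as a pure function that returns a decision instead of issuing a redirect. Everything named demo_* or Demo* below is hypothetical and exists only for this illustration. As a deliberate simplification, the allow and restrict lists are matched as comma-separated exact page names rather than GLOB patterns, so this is not a substitute for Fossil's glob_match().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified view of one request (not a Fossil type). */
struct DemoRequest {
  const char *zLogin;    /* NULL means user "nobody" */
  const char *zReferer;  /* NULL or "" means no Referer header */
  int nQP;               /* Query parameters other than "name" */
  const char *zPath;     /* Page name of the request */
};

/* Hypothetical snapshot of the three settings. */
struct DemoSettings {
  int robotLimiter;      /* robot-limiter */
  const char *zAllow;    /* robot-allow; NULL or "" means unset */
  const char *zRestrict; /* robot-restrict; NULL or "" means unset */
};

/* Return 1 if zPath equals one element of the comma-separated zList.
** Simplification: exact string match, no GLOB wildcards. */
int demo_list_match(const char *zList, const char *zPath){
  size_t nPath = strlen(zPath);
  while( *zList ){
    const char *zEnd = strchr(zList, ',');
    size_t n = zEnd ? (size_t)(zEnd - zList) : strlen(zList);
    if( n==nPath && strncmp(zList, zPath, n)==0 ) return 1;
    zList += n;
    if( *zList==',' ) zList++;
  }
  return 0;
}

/* Return 1 if the request would be sent to the honeypot, else 0.
** The checks run in the same order as in the routine shown above. */
int demo_should_restrict(const struct DemoRequest *r,
                         const struct DemoSettings *s){
  if( r->zLogin!=0 ) return 0;                    /* logged-in user */
  if( r->zReferer && r->zReferer[0] ) return 0;   /* has a Referer */
  if( !s->robotLimiter ) return 0;                /* limiter is off */
  if( r->nQP<1 ) return 0;                        /* not a complex request */
  if( s->zAllow && s->zAllow[0]
   && demo_list_match(s->zAllow, r->zPath) ) return 0;
  if( s->zRestrict && s->zRestrict[0]
   && !demo_list_match(s->zRestrict, r->zPath) ) return 0;
  return 1;
}

/* Small self-check using made-up page names. */
void demo_restrict_selfcheck(void){
  struct DemoSettings s = {1, "timeline", "diff,vdiff"};
  struct DemoRequest bot = {NULL, NULL, 2, "vdiff"};
  struct DemoRequest allowed = {NULL, NULL, 2, "timeline"};
  assert( demo_should_restrict(&bot, &s)==1 );
  assert( demo_should_restrict(&allowed, &s)==0 );
}
```

Note that the allow list is consulted before the restrict list, so a page that matches both is not restricted. When both lists are unset, every complex request from "nobody" without a Referer is restricted.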
Changes to src/main.c.
@@ -2992,33 +2992,36 @@
 ** and the SSH_CONNECTION environment variable is set.  Use the --test
 ** option on interactive sessions to avoid that special processing when
 ** using this command interactively over SSH.  A better solution would be
 ** to use a different command for "ssh" sync, but we cannot do that without
 ** breaking legacy.
 **
 ** Options:
+**   --nobody            Pretend to be user "nobody"
 **   --test              Do not do special "sync" processing when operating
 **                       over an SSH link
 **   --th-trace          Trace TH1 execution (for debugging purposes)
 **   --usercap   CAP     User capability string (Default: "sxy")
 **
 */
 void cmd_test_http(void){
   const char *zIpAddr;    /* IP address of remote client */
   const char *zUserCap;
   int bTest = 0;

   Th_InitTraceLog();
   zUserCap = find_option("usercap",0,1);
-  if( zUserCap==0 ){
-    g.useLocalauth = 1;
-    zUserCap = "sxy";
-  }
-  bTest = find_option("test",0,0)!=0;
-  login_set_capabilities(zUserCap, 0);
+  if( !find_option("nobody",0,0) ){
+    if( zUserCap==0 ){
+      g.useLocalauth = 1;
+      zUserCap = "sxy";
+    }
+    login_set_capabilities(zUserCap, 0);
+  }
+  bTest = find_option("test",0,0)!=0;
   g.httpIn = stdin;
   g.httpOut = stdout;
   fossil_binary_mode(g.httpOut);
   fossil_binary_mode(g.httpIn);
   g.zExtRoot = find_option("extroot",0,1);
   find_server_repository(2, 0);
   g.zReqType = "HTTP";
Changes to src/setup.c.
@@ -488,13 +488,37 @@
   @ computer is too large.  Set the threshold for disallowing expensive
   @ computations here.  Set this to 0.0 to disable the load average limit.
   @ This limit is only enforced on Unix servers.  On Linux systems,
   @ access to the /proc virtual filesystem is required, which means this limit
   @ might not work inside a chroot() jail.
   @ (Property: "max-loadavg")</p>

+  @ <hr>
+  onoff_attribute("Prohibit robots from issuing complex requests",
+                  "robot-limiter", "rlb", 0, 0);
+  @ <p> A "complex request" is an HTTP request that has one or more query
+  @ parameters. Some robots will spend hours juggling around query parameters
+  @ or even forging fake query parameters in an effort to discover new
+  @ behavior or to find an SQL injection opportunity or similar.  This can
+  @ waste hours of CPU time and gigabytes of bandwidth on the server.  Hence,
+  @ it is recommended to turn this feature on to stop such nefarious behavior.
+  @ (Property: robot-limiter)
+  @
+  @ <p> When enabled, complex requests from user "nobody" without a Referer
+  @ redirect to the honeypot.
+  @
+  @ <p> Additional settings below allow positive and negative overrides of
+  @ this complex request limiter.
+  @ <p><b>Allow Robots To See These Pages</b> (Property: robot-allow)<br>
+  textarea_attribute("", 4, 80,
+      "robot-allow", "rballow", "", 0);
+  @ <p><b>Restrict Robots From Seeing Only These Pages</b>
+  @ (Property: robot-restrict)<br>
+  textarea_attribute("", 4, 80,
+      "robot-restrict", "rbrestrict", "", 0);
+
   @ <hr>
   @ <p><input type="submit"  name="submit" value="Apply Changes"></p>
   @ </div></form>
   db_end_transaction(0);
   style_finish_page();
 }