lua-resty-limit-traffic-dynamic

Name

lua-resty-limit-traffic-dynamic - Lua library that dynamically adjusts rate limiting in OpenResty/ngx_lua.

If you’re interested in this private repository, please click on the link https://openresty.com/en/contact/ to contact us for purchase.

Status

This library is production-ready and actively maintained.

Description

This module dynamically adjusts the rate-limiting values of HTTP requests to keep the CPU utilization of Nginx worker processes within a target range.

To use this module, call dynamic.init() in the init_worker_by_lua* phase to complete the initialization, then call dynamic.do_access_phase() in the access_by_lua* phase to perform request rate limiting, and finally call dynamic.do_log_phase() in the log_by_lua* phase to collect traffic statistics.

The module collects statistics on the total size of request responses (including response headers and body) for different URIs and on the CPU usage of the Nginx worker processes. Based on the collected statistics and the configured target CPU utilization, it calculates new rate-limiting values and applies them to incoming requests. This dynamic adjustment of rate limits helps prevent CPU usage from exceeding the desired threshold.

For optimal performance and balanced traffic distribution across CPUs:

  1. The listen directive should enable reuseport to ensure more even distribution of traffic to each CPU.
  2. Enable worker_cpu_affinity auto to bind worker processes to specific CPUs.

Synopsis

If you haven’t installed the private library yet, please click on the link Installation guide to proceed with the installation.

# Put this directive at the top of the config file
load_module /usr/local/openresty/nginx/modules/ngx_http_lua_dymetrics_module.so;

worker_processes  auto;
worker_cpu_affinity auto;

events {
    worker_connections  1024;
}

http {
    include            mime.types;
    default_type       application/octet-stream;
    sendfile           on;
    keepalive_timeout  65;

    # Specify lua_package_path as needed.
    lua_package_path   "/usr/local/openresty/site/lualib/?.ljbc;;";

    # Specify lua_package_cpath as needed.
    lua_package_cpath  "/usr/local/openresty-yajl/lib/?.so;/usr/local/openresty/site/lualib/?.so;;";

    # Define a dictionary that the worker's CPU statistics will use.
    # Note that the dictionary name must be dymetric_cpu_stat.
    lua_shared_dict dymetric_cpu_stat 128k;

    # Define a shared memory of 2M for the dynamic statistics metrics.
    # Assuming the number of worker processes is 32 and the average URI length is 64 bytes,
    # each entry takes (100 + 64) = 164 bytes, which rounds up to 256 bytes.
    # So the required shared memory size is:
    # 256 * (32 + 3) * 200 ~= 1.7 MB
    # So we use 2M here.
    lua_shared_dymetrics  dymetrics 2M;

    # Assuming the URI length is less than 64 bytes, each entry takes (72 + 64 + 16) = 152 bytes, which rounds up to 256 bytes.
    # So 256 * 20 = 5 KB would be enough. However, on some operating systems the minimum shared memory size is 128 KB, so we choose 128 KB.
    lua_shared_dict my_limit_req_store 128K;


    init_worker_by_lua_block {
        local dynamic = require "resty.limit.traffic.dynamic"

        -- set the policy of dynamic rate limit
        dynamic.set_policy("topn_uri_by_accum_upstream_body_size")

        -- If the adjustment period is too short, it may cause oscillations.
        -- Our recommended configuration value for period is 10s or more.
        local period = 10 -- unit: second
        local worker_target_cpu = 50
        local min_rps = 10
        local burst = 100
        dynamic.init(period, worker_target_cpu, min_rps, burst)
    }


    log_by_lua_block {
        require "resty.limit.traffic.dynamic".do_log_phase()
    }

    access_by_lua_block {
        local dynamic = require "resty.limit.traffic.dynamic"
        -- The presence of the burst value allows for a smooth transition between periods of low and high traffic.
        -- It prevents the immediate rejection of a large number of requests when traffic suddenly increases, providing a better user experience.
        -- We set burst to 100 for the current URI.
        local reject, delay, err = dynamic.do_access_phase(100)
        if err ~= nil then
            -- Error logs should be protected by a speed limit to prevent large logs from being printed in the event of a DoS attack.
            -- For the purposes of demonstrating the use of the interface, the logging speed limit has not been added to nginx.conf.
            ngx.log(ngx.ERR, "limit request failed: ", err)
        elseif reject then
            -- you may need to sleep 1s in case the client starts a new request immediately
            -- ngx.sleep(1)

            -- return "429 Too Many Requests" to the client
            ngx.exit(429)

            -- You may choose to close connection directly
            -- ngx.exit(444)
        else
            if delay >= 0.001 then
                ngx.sleep(delay)
            end
        end
    }

    server {
        # Enable reuseport to distribute traffic more evenly across CPUs
        listen 443 ssl reuseport;
        server_name   test.com;
        location / {
            root html/;
            index index.html;
        }

        location /limit-rate-status {
            content_by_lua_block {
                -- Use this interface to view the current rate limit.
                local status = require "resty.limit.traffic.dynamic".get_stats()
                ngx.say(status)
            }
        }
    }
}

Nginx Directive

Directive: lua_shared_dymetrics name size

The name parameter specifies the name of the shared memory, and size specifies its total size. Only one such shared memory block is allowed to be configured.

To calculate the required size of the shared memory, you can use the following formula:

Assuming the number of worker processes is n and the average URI length is L, the required size of the shared memory is roughly (100 + L) * (n + 3) * 200 bytes, with each entry rounded up to a 256-byte allocation unit. For example, with 32 worker processes and an average URI length of 64 bytes, each entry takes 164 bytes, which rounds up to 256 bytes, so the required shared memory size is 256 * 35 * 200 ~= 1.7 MB.

Adjust the size parameter accordingly based on your specific deployment.
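
As a sanity check, the sizing above can be evaluated with a few lines of Lua. The dymetrics_size helper below is ours, not part of the library; the unit parameter is the allocation granularity each entry is rounded up to (pass 1 for the raw formula):

```lua
-- Hypothetical helper: estimate the lua_shared_dymetrics size in bytes
-- from the worker count (n) and the average URI length (len).
local function dymetrics_size(n, len, unit)
    -- round the raw per-entry size up to the allocation unit
    local entry = math.ceil((100 + len) / unit) * unit
    return entry * (n + 3) * 200
end

print(dymetrics_size(32, 64, 1))    -- 1148000 bytes, about 1.09 MB (raw formula)
print(dymetrics_size(32, 64, 256))  -- 1792000 bytes, about 1.7 MB (as in the Synopsis)
```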


Lua API

set_policy

syntax: dynamic.set_policy(policy)

Call this interface in the init_worker_by_lua* phase to initialize the policy for dynamic rate limiting. This interface should be called before any other interfaces of this module.

This interface can also be called in other contexts like timer.at, content_by_lua*, or access_by_lua* to dynamically change the policy at runtime.

The policy parameter sets the policy for dynamic rate limiting. Currently, the supported policies are:

  • topn_uri_by_accum_upstream_body_size: Adjust rate by the statistics of the top N URIs with the highest accumulated upstream body sizes.
  • by_accum_upstream_body_size: Adjust rate by the statistics of total upstream body size.
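
Since set_policy can also be called at runtime, the policy can be switched without reloading Nginx. A minimal sketch, where the one-hour delay is purely illustrative:

```lua
init_worker_by_lua_block {
    local dynamic = require "resty.limit.traffic.dynamic"
    dynamic.set_policy("topn_uri_by_accum_upstream_body_size")
    -- dynamic.init(...) would normally follow here

    -- illustrative: fall back to total-traffic accounting after one hour
    ngx.timer.at(3600, function()
        dynamic.set_policy("by_accum_upstream_body_size")
    end)
}
```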

init

syntax: dynamic.init(period, worker_target_cpu, min_rps, burst)

Call this interface in the init_worker_by_lua* phase to initialize the parameters for dynamic rate limiting. This interface can also be called in other contexts like timer.at, content_by_lua*, or access_by_lua* to dynamically change these parameters at runtime.

The period parameter specifies the time interval in seconds for collecting CPU and traffic statistics; it must be an integer greater than or equal to 2. It determines how frequently the module adjusts the rate-limiting rules based on the collected CPU and traffic data: a smaller period leads to more responsive adjustments but may introduce more overhead, while a larger period results in less frequent adjustments but smoother traffic control. Choose the value according to your specific requirements and traffic patterns. Our recommended value for period is 10 seconds or more.

The worker_target_cpu parameter sets the target maximum CPU utilization percentage for the Nginx worker processes. For example, to limit CPU usage to 50%, pass 50 as the value.

The min_rps parameter defines the minimum allowed requests per second (RPS) for each URI. Dynamic rate limiting will never reduce the RPS below this value, which prevents over-aggressive throttling that could affect service availability. Be cautious when setting min_rps: too high a value allows too much traffic through during high-load periods, which may impact overall system performance. It is recommended to determine this value based on capacity planning and the expected minimum traffic for individual URIs.

The burst parameter represents the number of burst requests allowed within 1 second. Its value must be greater than 0. This value can be overridden in do_access_phase.

For example:

    init_worker_by_lua_block {
        require "resty.limit.traffic.dynamic".init(10, 60, 500)
    }

do_access_phase

syntax: reject, delay, err = dynamic.do_access_phase(burst, uri)

Call this interface in the access_by_lua* phase to perform rate limiting on requests.

It is recommended to call do_access_phase in the access_by_lua directive within the http block. If access_by_lua is also used in multiple places like the server block or location block, remember to call dynamic.do_access_phase in each corresponding access_by_lua directive. Otherwise, the related URIs will not be subject to the rate-limiting actions.

The burst parameter represents the number of burst requests allowed within 1 second; its value must be greater than 0, and it is important to choose a reasonable value. The uri parameter represents the URI corresponding to the current request; if it is empty, ngx.var.uri is used instead.

For example:

    access_by_lua_block {
        local dynamic = require "resty.limit.traffic.dynamic"

        local reject, delay, err = dynamic.do_access_phase(1000)
        if err ~= nil then
            -- Error logs should be protected by a speed limit to prevent large logs from being printed in the event of a DoS attack.
            -- For the purpose of demonstrating the use of the interface, the logging speed limit has not been added to nginx.conf.
            ngx.log(ngx.ERR, "dynamic rate-limit failed: ", err)
        elseif reject then
            -- Here we return 444 to close the connection directly. Alternatively, we can execute sleep before closing the connection to prevent the client from immediately creating a new connection
            -- ngx.sleep(1)
            ngx.exit(444)

        else
            if delay >= 0.001 then
                ngx.sleep(delay)
            end
        end
    }
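
The optional uri argument lets you choose the key under which a request is rate-limited. For example, requests whose paths differ only by a trailing numeric ID could share one key. The normalization rule below is only an illustration of the parameter; whether such aggregated keys suit your statistics depends on your URI layout:

```lua
access_by_lua_block {
    local dynamic = require "resty.limit.traffic.dynamic"

    -- illustrative normalization: /api/exchange/001 and /api/exchange/002
    -- are rate-limited under the single key /api/exchange/{id}
    local key = ngx.var.uri:gsub("/%d+$", "/{id}")

    local reject, delay, err = dynamic.do_access_phase(1000, key)
    if err ~= nil then
        ngx.log(ngx.ERR, "dynamic rate-limit failed: ", err)
    elseif reject then
        ngx.exit(429)
    else
        if delay >= 0.001 then
            ngx.sleep(delay)
        end
    end
}
```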

do_log_phase

syntax: dynamic.do_log_phase()

Call this function in the log_by_lua* phase to record statistics for the response traffic size of requests.

It is important to ensure that do_log_phase is called in all relevant locations. Failing to call this function in some places will lead to inaccurate traffic statistics.

It is recommended to call do_log_phase in the log_by_lua directive within the http block. If log_by_lua* is also used in the server block or location block, remember to call dynamic.do_log_phase in the corresponding log_by_lua* directive as well.

For example:

    log_by_lua_block {
        require "resty.limit.traffic.dynamic".do_log_phase()
    }

enable_detail_stat

syntax: stats, err = dynamic.enable_detail_stat()

By default, the statistics do not include the number of passed and rejected requests per API endpoint. After this interface is called, the number of rejections and passes per API endpoint is counted as well.

This interface is a debugging interface and should not be enabled in production environments.

For example:

    init_worker_by_lua_block {
        require "resty.limit.traffic.dynamic".enable_detail_stat()
    }

get_stats

syntax: stats, err = dynamic.get_stats()

This function returns the statistics of the previous period, including the CPU utilization of each worker process and the total CPU utilization. It also provides the current rate-limiting results and the expected CPU utilization simulated based on the current rate limits.

In case of error, it returns nil with a string describing the error.

Note that the get_stats() function is intended for internal monitoring purposes only and should not be exposed to external users or systems.

For example:

CPU statistics of last cycle:
Total worker's CPU usage: 173%
worker 1: 36%
worker 2: 48%
worker 3: 47%
worker 4: 39%

API statistics of last cycle:
/api/exchange/001: 4501034 bytes/sec, 11366 requests/sec
/api/exchange/002: 4166464 bytes/sec, 10521 requests/sec
/api/exchange/003: 3777572 bytes/sec, 9539 requests/sec
/api/exchange/004: 3401144 bytes/sec, 8589 requests/sec
/api/exchange/005: 3110515 bytes/sec, 7855 requests/sec
/api/exchange/006: 2771935 bytes/sec, 7000 requests/sec
/api/exchange/007: 2380271 bytes/sec, 6011 requests/sec
/api/exchange/008: 2018342 bytes/sec, 5097 requests/sec
/api/exchange/009: 1701745 bytes/sec, 4297 requests/sec
/api/exchange/010: 1032525 bytes/sec, 2607 requests/sec
[others]: 4920764 bytes/sec, 12426 requests/sec

Estimated CPU Utilization: 54%

New rate api value for API (rps):
/api/exchange/001: 13078 requests/sec
/api/exchange/002: 12105 requests/sec
/api/exchange/003: 10975 requests/sec
/api/exchange/004: 9882 requests/sec
/api/exchange/005: 9037 requests/sec
/api/exchange/006: 8054 requests/sec
/api/exchange/007: 6916 requests/sec
/api/exchange/008: 5864 requests/sec
/api/exchange/009: 4944 requests/sec
/api/exchange/010: 3000 requests/sec
[others]: 12426 requests/sec
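
Because get_stats() is meant for internal monitoring only, the status endpoint should not be reachable from the outside. One option is the standard ngx_http_access_module; the addresses below are illustrative:

```nginx
location /limit-rate-status {
    # only loopback (e.g. an internal monitoring agent) may query the stats
    allow 127.0.0.1;
    deny  all;

    content_by_lua_block {
        local status = require "resty.limit.traffic.dynamic".get_stats()
        ngx.say(status)
    }
}
```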

set_burst_multiplier

syntax: dynamic.set_burst_multiplier(m)

If neither init() nor do_access_phase() specifies a burst value, the burst value is calculated automatically as rate * burst_multiplier. The default burst_multiplier is 0.05; it can be changed through this interface.

set_min_burst

syntax: dynamic.set_min_burst(m)

The automatically calculated burst value may be too small, so a minimum value can be specified using this interface. This minimum is applied per worker. The default minimum burst is 0.5.
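
Putting the two knobs together: when burst is omitted in both init() and do_access_phase(), the effective burst is presumably the computed rate * burst_multiplier with the minimum burst acting as a floor. The tuning values below are purely illustrative:

```lua
init_worker_by_lua_block {
    local dynamic = require "resty.limit.traffic.dynamic"
    dynamic.set_policy("topn_uri_by_accum_upstream_body_size")

    -- burst omitted: the library computes it as rate * burst_multiplier
    dynamic.init(10, 50, 10)

    -- illustrative tuning: 10% of the rate, but never below 2 per worker
    dynamic.set_burst_multiplier(0.1)
    dynamic.set_min_burst(2)
}
```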

set_topn

syntax: ok, err = dynamic.set_topn(n)

Call this interface to set the specific value for topn when using the topn_uri_by_accum_upstream_body_size policy. The n parameter specifies the number of top URIs to consider for rate limiting.

The value of n must be an integer between 1 and 100, inclusive. The default is 20.

This interface can be called in the init_worker_by_lua* phase or in other contexts like timer.at, content_by_lua*, or access_by_lua* to dynamically change the topn value at runtime.

For example:

    init_worker_by_lua_block {
        local dynamic = require "resty.limit.traffic.dynamic"
        dynamic.set_policy("topn_uri_by_accum_upstream_body_size")
        dynamic.set_topn(10)  -- set topn to 10
    }

Copyright & License

Copyright (C) 2024 by OpenResty Inc. All rights reserved.

This document is proprietary and contains confidential information. Redistribution of this document without written permission from the copyright holders is prohibited at all times.