This document visualizes the state machine encoded in the llhttp HTTP parser.
High-Level Overview
stateDiagram-v2
[*] --> start
start --> load_type : non_whitespace
start --> start : CR_LF_skip
load_type --> start_req : type_REQUEST
load_type --> start_res : type_RESPONSE
load_type --> start_req_or_res : type_BOTH
state "Request Parsing" as req_group {
start_req --> method_parsing : first_char
method_parsing --> req_first_space_before_url : method_complete
req_first_space_before_url --> req_spaces_before_url : SP
req_spaces_before_url --> url_entry : non_SP
}
state "Response Parsing" as res_group {
start_res --> res_http_major : HTTP_slash
res_http_major --> res_http_dot : digit
res_http_dot --> res_http_minor : dot
res_http_minor --> res_http_end : digit
res_http_end --> res_status_code : SP
res_status_code --> res_status_start : 3_digits
res_status_start --> res_status : SP
res_status --> res_line_almost_done : status_text
res_line_almost_done --> header_field_start : CR_LF
}
state "URL Parsing" as url_group {
url_entry --> url_entry_normal : normal_request
url_entry --> url_entry_connect : CONNECT_method
url_entry_normal --> url_start
url_start --> url_schema : alpha
url_schema --> url_schema_delim : colon
url_schema_delim --> url_server : double_slash
url_server --> url_path : slash_or_SP
url_path --> url_query_or_fragment : question_mark
url_query_or_fragment --> url_query : query_chars
url_query --> url_fragment : hash
url_fragment --> url_to_http : SP
}
url_to_http --> req_http_start : SP
req_http_start --> req_http_major : HTTP_slash
req_http_major --> req_http_dot : digit
req_http_dot --> req_http_minor : dot
req_http_minor --> req_http_complete : digit
req_http_complete --> header_field_start : CR_LF
state "Header Parsing" as header_group {
header_field_start --> span_start_header_field : token_char
header_field_start --> headers_almost_done : CR
span_start_header_field --> header_field : start_span
header_field --> header_field_colon : colon
header_field_colon --> header_value_discard_ws : colon_found
header_value_discard_ws --> span_start_header_value : non_WS
span_start_header_value --> header_value : start_span
header_value --> header_value_almost_done : CR
header_value_almost_done --> header_value_lws : LF
header_value_lws --> header_field_start : non_WS_new_header
header_value_lws --> header_value : WS_continuation
}
headers_almost_done --> after_headers_complete : LF
state "Body Handling" as body_group {
after_headers_complete --> span_start_body : has_body
after_headers_complete --> message_complete : no_body
span_start_body --> consume_content_length : Content_Length
span_start_body --> chunk_size : chunked
span_start_body --> eof : until_close
consume_content_length --> message_complete : length_zero
chunk_size --> chunk_size_digit : hex_digit
chunk_size_digit --> chunk_size_almost_done : CR
chunk_size_almost_done --> chunk_data : LF_size_gt_0
chunk_size_almost_done --> message_complete : LF_size_eq_0
chunk_data --> chunk_data_almost_done : chunk_consumed
chunk_data_almost_done --> chunk_size : CR_LF
}
message_complete --> on_message_complete : callback
on_message_complete --> is_equal_upgrade : check_upgrade
is_equal_upgrade --> after_message_complete : no_upgrade
is_equal_upgrade --> pause_upgrade : upgrade
after_message_complete --> start : keep_alive
after_message_complete --> closed : close
closed --> [*]
pause_upgrade --> [*]
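The body-handling branch of the overview (`no_body`, `Content_Length`, `chunked`, `until_close`) can be read as a small decision function. The sketch below is illustrative Python, not llhttp code: `body_mode`, its lowercase-keyed `headers` dict, and the `is_response` flag are assumptions made for this example, and special cases such as HEAD responses or 204/304 status codes are ignored.

```python
def body_mode(headers: dict, is_response: bool) -> str:
    """Return the diagram state entered once the headers are complete.

    `headers` is assumed to map lowercase header names to string values.
    """
    # Chunked framing takes precedence over Content-Length when both appear.
    encodings = headers.get("transfer-encoding", "").lower().split(",")
    if encodings[-1].strip() == "chunked":
        return "chunk_size"

    length = headers.get("content-length")
    if length is not None:
        # A zero length means there is nothing to consume.
        return "consume_content_length" if int(length) > 0 else "message_complete"

    # No framing information: a response is read until the connection
    # closes, while a request is treated as having no body.
    return "eof" if is_response else "message_complete"
```

The return values reuse the state names from the diagram so the mapping back to the edges stays visible.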
Detailed Method Parsing State Machine
stateDiagram-v2
[*] --> start_req
start_req --> A_branch : A
start_req --> B_branch : B
start_req --> C_branch : C
start_req --> D_branch : D
start_req --> F_branch : F
start_req --> G_branch : G
start_req --> H_branch : H
start_req --> L_branch : L
start_req --> M_branch : M
start_req --> N_branch : N
start_req --> O_branch : O
start_req --> P_branch : P
start_req --> R_branch : R
start_req --> S_branch : S
start_req --> T_branch : T
start_req --> U_branch : U
A_branch --> ACL : CL
A_branch --> ANNOUNCE : NNOUNCE
B_branch --> BIND : IND
C_branch --> CHECKOUT : HECKOUT
C_branch --> CONNECT : ONNECT
C_branch --> COPY : OPY
D_branch --> DELETE : ELETE
D_branch --> DESCRIBE : ESCRIBE
F_branch --> FLUSH : LUSH
G_branch --> GET : ET
G_branch --> GET_PARAMETER : ET_PARAMETER
H_branch --> HEAD : EAD
L_branch --> LOCK : OCK
L_branch --> LINK : INK
M_branch --> MKCOL : KCOL
M_branch --> MKACTIVITY : KACTIVITY
M_branch --> MKCALENDAR : KCALENDAR
M_branch --> MOVE : OVE
M_branch --> MERGE : ERGE
M_branch --> MSEARCH : -SEARCH
N_branch --> NOTIFY : OTIFY
O_branch --> OPTIONS : PTIONS
P_branch --> POST : OST
P_branch --> PUT : UT
P_branch --> PATCH : ATCH
P_branch --> PROPFIND : ROPFIND
P_branch --> PROPPATCH : ROPPATCH
P_branch --> PURGE : URGE
P_branch --> PLAY : LAY
P_branch --> PAUSE_METHOD : AUSE
P_branch --> PRI : RI
R_branch --> REPORT : EPORT
R_branch --> REBIND : EBIND
R_branch --> RECORD : ECORD
R_branch --> REDIRECT : EDIRECT
S_branch --> SEARCH : EARCH
S_branch --> SUBSCRIBE : UBSCRIBE
S_branch --> SOURCE : OURCE
S_branch --> SETUP : ETUP
S_branch --> SET_PARAMETER : ET_PARAMETER
T_branch --> TRACE : RACE
T_branch --> TEARDOWN : EARDOWN
U_branch --> UNLOCK : NLOCK
U_branch --> UNBIND : NBIND
U_branch --> UNSUBSCRIBE : NSUBSCRIBE
U_branch --> UNLINK : NLINK
ACL --> req_first_space_before_url
ANNOUNCE --> req_first_space_before_url
BIND --> req_first_space_before_url
CHECKOUT --> req_first_space_before_url
CONNECT --> req_first_space_before_url
COPY --> req_first_space_before_url
DELETE --> req_first_space_before_url
DESCRIBE --> req_first_space_before_url
FLUSH --> req_first_space_before_url
GET --> req_first_space_before_url
GET_PARAMETER --> req_first_space_before_url
HEAD --> req_first_space_before_url
LOCK --> req_first_space_before_url
LINK --> req_first_space_before_url
MKCOL --> req_first_space_before_url
MKACTIVITY --> req_first_space_before_url
MKCALENDAR --> req_first_space_before_url
MOVE --> req_first_space_before_url
MERGE --> req_first_space_before_url
MSEARCH --> req_first_space_before_url
NOTIFY --> req_first_space_before_url
OPTIONS --> req_first_space_before_url
POST --> req_first_space_before_url
PUT --> req_first_space_before_url
PATCH --> req_first_space_before_url
PROPFIND --> req_first_space_before_url
PROPPATCH --> req_first_space_before_url
PURGE --> req_first_space_before_url
PLAY --> req_first_space_before_url
PAUSE_METHOD --> req_first_space_before_url
PRI --> req_first_space_before_url
REPORT --> req_first_space_before_url
REBIND --> req_first_space_before_url
RECORD --> req_first_space_before_url
REDIRECT --> req_first_space_before_url
SEARCH --> req_first_space_before_url
SUBSCRIBE --> req_first_space_before_url
SOURCE --> req_first_space_before_url
SETUP --> req_first_space_before_url
SET_PARAMETER --> req_first_space_before_url
TRACE --> req_first_space_before_url
TEARDOWN --> req_first_space_before_url
UNLOCK --> req_first_space_before_url
UNBIND --> req_first_space_before_url
UNSUBSCRIBE --> req_first_space_before_url
UNLINK --> req_first_space_before_url
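The branch structure above is a prefix trie: each step consumes input characters and narrows the set of candidate methods until one name remains. A minimal Python sketch of that idea follows. `METHODS` is a small subset picked for the example, and `build_trie` and `match_method` are hypothetical helpers, not part of llhttp.

```python
# Small subset of methods, chosen only to keep the example short.
METHODS = ["GET", "HEAD", "POST", "PUT", "PATCH", "PURGE", "DELETE", "DESCRIBE"]


def build_trie(words):
    """Build a nested-dict trie; the key None marks a complete method name."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[None] = word
    return root


def match_method(trie, data: str):
    """Return (method, index) where index is the delimiter or the bad character."""
    node = trie
    for i, ch in enumerate(data):
        if ch == " ":
            # The space after the method corresponds to req_first_space_before_url.
            return node.get(None), i
        if ch not in node:
            return None, i  # no method continues with this character
        node = node[ch]
    return None, len(data)  # input ended before the delimiter


TRIE = build_trie(METHODS)
```

With this subset, `match_method(TRIE, "GET / HTTP/1.1")` returns `("GET", 3)`, while `match_method(TRIE, "GXT /")` returns `(None, 1)` because no method in the trie continues `G` with `X`.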
Chunked Transfer Encoding State Machine
stateDiagram-v2
[*] --> chunk_size
chunk_size --> chunk_size_digit : hex_digit
chunk_size_digit --> chunk_size_digit : more_hex_digits
chunk_size_digit --> chunk_size_otherwise : non_hex
chunk_size_otherwise --> chunk_parameters : semicolon
chunk_size_otherwise --> chunk_size_almost_done : CR
chunk_parameters --> chunk_parameters : param_chars
chunk_parameters --> chunk_size_almost_done : CR
chunk_size_almost_done --> check_chunk_size : LF
check_chunk_size --> span_start_body : size_gt_0
check_chunk_size --> headers_almost_done_trailers : size_eq_0
span_start_body --> consume_chunk_data
consume_chunk_data --> consume_chunk_data : reading_body
consume_chunk_data --> chunk_data_almost_done : all_bytes_read
chunk_data_almost_done --> invoke_chunk_complete : CR_LF
invoke_chunk_complete --> chunk_size : next_chunk
headers_almost_done_trailers --> trailer_headers : has_trailers
headers_almost_done_trailers --> message_complete : no_trailers
trailer_headers --> message_complete : trailers_done
message_complete --> [*]
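To make the chunked flow concrete, here is a hedged Python sketch that walks the same states over a complete in-memory body. It is a simplified model, not llhttp's implementation: `decode_chunked` is a hypothetical function, `chunk_data_almost_done` is split into explicit CR and LF states, chunk parameters are skipped rather than reported, and only the `no_trailers` path is handled.

```python
HEX_DIGITS = "0123456789abcdefABCDEF"
# State to enter after the CR that follows chunk data or the last chunk.
AFTER_CR = {"chunk_data_cr": "chunk_data_lf", "last_chunk_cr": "last_chunk_lf"}


def decode_chunked(data: bytes):
    """Decode one complete chunked body; return (chunks, bytes_consumed)."""
    state, size, chunks, i = "chunk_size", 0, [], 0
    while i < len(data):
        c = chr(data[i])
        if state in ("chunk_size", "chunk_size_digit"):
            if c in HEX_DIGITS:
                size = size * 16 + int(c, 16)
                state = "chunk_size_digit"
            elif state == "chunk_size":
                raise ValueError("chunk size must start with a hex digit")
            elif c == ";":
                state = "chunk_parameters"
            elif c == "\r":
                state = "chunk_size_almost_done"
            else:
                raise ValueError("invalid character in chunk size")
        elif state == "chunk_parameters":
            # Parameter characters are skipped until the CR.
            if c == "\r":
                state = "chunk_size_almost_done"
        elif state == "chunk_size_almost_done":
            if c != "\n":
                raise ValueError("LF expected after chunk size")
            state = "consume_chunk_data" if size > 0 else "last_chunk_cr"
        elif state == "consume_chunk_data":
            # Take the whole chunk at once instead of byte by byte.
            chunks.append(data[i:i + size])
            i += size
            state = "chunk_data_cr"
            continue
        elif state in AFTER_CR:
            if c != "\r":
                raise ValueError("CR expected")
            state = AFTER_CR[state]
        elif state == "chunk_data_lf":
            if c != "\n":
                raise ValueError("LF expected after chunk data")
            size, state = 0, "chunk_size"  # next_chunk
        elif state == "last_chunk_lf":
            if c != "\n":
                raise ValueError("LF expected after last chunk")
            return chunks, i + 1  # message_complete
        i += 1
    raise ValueError("incomplete chunked body")
```

A real parser is incremental and would keep `state` and `size` between calls instead of raising on short input. The all-at-once form is used here only to keep the transitions easy to follow.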
Error States
stateDiagram-v2
[*] --> parsing
parsing --> error : invalid_input
state error {
HPE_INTERNAL
HPE_STRICT
HPE_LF_EXPECTED
HPE_UNEXPECTED_CONTENT_LENGTH
HPE_CLOSED_CONNECTION
HPE_INVALID_METHOD
HPE_INVALID_URL
HPE_INVALID_CONSTANT
HPE_INVALID_VERSION
HPE_INVALID_HEADER_TOKEN
HPE_INVALID_CONTENT_LENGTH
HPE_INVALID_CHUNK_SIZE
HPE_INVALID_STATUS
HPE_INVALID_EOF_STATE
HPE_INVALID_TRANSFER_ENCODING
}
parsing --> paused : callback_HPE_PAUSED
paused --> parsing : llhttp_resume
parsing --> paused_upgrade : upgrade_detected
paused_upgrade --> parsing : llhttp_resume_after_upgrade
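The control transitions in this diagram are few enough to encode as a lookup table. The Python sketch below models the diagram only. `TRANSITIONS` and `step` are names invented for the example, and the event strings are copied from the edge labels above.

```python
# (current state, event) -> next state, copied from the diagram's edges.
TRANSITIONS = {
    ("parsing", "invalid_input"): "error",
    ("parsing", "callback_HPE_PAUSED"): "paused",
    ("paused", "llhttp_resume"): "parsing",
    ("parsing", "upgrade_detected"): "paused_upgrade",
    ("paused_upgrade", "llhttp_resume_after_upgrade"): "parsing",
}


def step(state: str, event: str) -> str:
    """Follow one edge of the diagram, rejecting edges it does not contain."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition from {state!r} on {event!r}") from None
```

Because `error` has no outgoing edge in the diagram, any event applied to it raises `ValueError` in this model.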
Request/Response Detection (HTTP_BOTH mode)
stateDiagram-v2
[*] --> start_req_or_res
start_req_or_res --> req_or_res_method : H
start_req_or_res --> start_req : other_char
req_or_res_method --> req_or_res_method_1 : H
req_or_res_method_1 --> req_or_res_method_2 : E
req_or_res_method_1 --> req_or_res_method_3 : T
req_or_res_method_2 --> HEAD_method : AD
HEAD_method --> start_req : is_request
req_or_res_method_3 --> start_res : TP_slash
note right of start_req_or_res : Disambiguates between HTTP response and HEAD request
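The disambiguation above amounts to a prefix check on the first bytes of the message. The following Python sketch assumes the whole prefix is already buffered, which a streaming parser cannot assume. `detect_type` is a hypothetical helper, and raising on an unmatched `H` prefix is this example's choice for paths the diagram leaves undefined.

```python
def detect_type(data: bytes) -> str:
    """Classify a message as 'request' or 'response' from its first bytes."""
    if not data:
        raise ValueError("need at least one byte")
    if not data.startswith(b"H"):
        return "request"   # other_char: go straight to start_req
    if data.startswith(b"HTTP/"):
        return "response"  # H, T, then TP_slash
    if data.startswith(b"HEAD"):
        return "request"   # H, E, then AD
    raise ValueError("starts with 'H' but is neither 'HEAD' nor 'HTTP/'")
```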
State Categories
| Category | States | Description |
|---|---|---|
| Initialization | `start`, `load_type` | Entry point, determine parsing mode |
| Method Parsing | `start_req`, `start_req_*` | Parse HTTP method (GET, POST, etc.) |
| URL Parsing | `url_*`, `span_start_stub_*` | Parse request URL components |
| Version Parsing | `req_http_*`, `res_http_*` | Parse HTTP/x.y version |
| Status Parsing | `res_status_*` | Parse response status code and text |
| Header Parsing | `header_field_*`, `header_value_*` | Parse header names and values |
| Body Handling | `consume_content_length`, `chunk_*`, `eof` | Handle message body |
| Completion | `message_complete`, `after_message_complete` | Finalize message parsing |
| Control | `closed`, `pause_*`, `error` | Connection state management |
Callbacks Triggered
| Callback | Description |
|---|---|
| `on_message_begin` | Start of new message |
| `on_url` | URL data span |
| `on_url_complete` | URL parsing complete |
| `on_status` | Status text span (responses) |
| `on_status_complete` | Status parsing complete |
| `on_header_field` | Header name span |
| `on_header_field_complete` | Header name complete |
| `on_header_value` | Header value span |
| `on_header_value_complete` | Header value complete |
| `on_headers_complete` | All headers parsed |
| `on_body` | Body data span |
| `on_chunk_header` | Chunk size parsed |
| `on_chunk_complete` | Chunk data complete |
| `on_message_complete` | Entire message parsed |
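The list distinguishes span callbacks such as `on_header_field` and `on_header_value` from their `_complete` counterparts. A span can arrive in several pieces when a value straddles input buffers, so a consumer typically accumulates pieces until the matching completion callback fires. The Python sketch below shows that pattern. `HeaderCollector` is a hypothetical class whose method names mirror the callback names. It is not an llhttp binding, and the calls at the bottom stand in for a parser.

```python
class HeaderCollector:
    """Accumulate header spans and commit a pair on value completion."""

    def __init__(self):
        self.headers = []
        self._field = bytearray()
        self._value = bytearray()

    def on_header_field(self, piece: bytes):
        self._field += piece  # may be called more than once per name

    def on_header_value(self, piece: bytes):
        self._value += piece  # may be called more than once per value

    def on_header_value_complete(self):
        # The value is finished, so the (name, value) pair is final.
        self.headers.append((bytes(self._field), bytes(self._value)))
        self._field.clear()
        self._value.clear()


# Simulated delivery of "Content-Type: text/plain" split across buffers.
collector = HeaderCollector()
collector.on_header_field(b"Content-")
collector.on_header_field(b"Type")
collector.on_header_value(b"text/")
collector.on_header_value(b"plain")
collector.on_header_value_complete()
```

After these calls `collector.headers` holds the single pair `(b"Content-Type", b"text/plain")`.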