TurboVM Bytecode Engine
TurboVM is an optional high-performance normalization engine for liblognorm. It compiles rulebases into bytecode at startup and executes them through a linear virtual machine with SIMD-accelerated parsing primitives. When enabled, it improves throughput over the default recursive parser, typically by 2x to 10x depending on rulebase complexity (see Performance Notes). The gain matters most on high-volume log streams.
Overview
The default liblognorm normalization engine (the “recursive walker”) traverses the parse DAG node-by-node for each log message, allocating json-c objects as fields are extracted. TurboVM replaces this with:
Bytecode compilation: rulebases are compiled into a compact instruction sequence at load time. Each rule becomes a linear program.
Arena allocation: all per-message memory comes from a single pre-allocated arena (~16 KB), fitting in L1 cache. Zero malloc/free per message.
SIMD parsing: character scanning, delimiter search, whitespace skipping, and IP address parsing use SSE4.2 or NEON intrinsics when available.
Typed field output: extracted fields carry their native type (string, integer, double, boolean) instead of converting everything to JSON strings.
Nested JSON: dotted field names (e.g. source.ip) produce properly nested JSON objects ({"source":{"ip":"..."}}), enabling direct ECS (Elastic Common Schema) output.
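The arena allocation point above can be illustrated with a minimal bump allocator. This is only a sketch under stated assumptions: the names `arena`, `arena_alloc`, and `arena_reset` are hypothetical and not part of liblognorm, the 16 KB size simply mirrors the figure quoted above, and the code assumes a C11 compiler for `_Alignas` and `max_align_t`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ARENA_SIZE (16 * 1024)  /* ~16 KB, mirroring the figure quoted above */

/* Hypothetical per-message arena: one fixed buffer plus a bump offset. */
typedef struct {
    _Alignas(max_align_t) unsigned char buf[ARENA_SIZE];
    size_t used;
} arena;

/* Discard everything allocated for the previous message in O(1). */
static void arena_reset(arena *a)
{
    assert(a != NULL);
    a->used = 0;
}

/* Hand out n bytes from the arena, or NULL if the arena is exhausted.
 * No malloc/free is involved; the offset is just rounded up and advanced. */
static void *arena_alloc(arena *a, size_t n)
{
    const size_t align = _Alignof(max_align_t);
    size_t start;

    assert(a != NULL);
    start = (a->used + align - 1) & ~(align - 1);
    if (start > ARENA_SIZE || n > ARENA_SIZE - start)
        return NULL;
    a->used = start + n;
    return a->buf + start;
}
```

In such a design the caller would call `arena_reset()` once per message and then allocate every extracted field from the arena, so the cost of freeing is a single store regardless of how many fields were extracted.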
Building with TurboVM
TurboVM is an optional build feature, disabled by default:
./configure --enable-turbo
The build system automatically detects the CPU architecture and enables the appropriate SIMD instruction set:
x86-64: SSE4.2 (Intel Nehalem+, AMD Bulldozer+)
ARM64: NEON (all ARMv8-A processors, including Apple M1/M2)
Other: scalar fallback (functional but without SIMD acceleration)
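The pattern of a SIMD path with a scalar fallback can be sketched with compile-time feature macros. This is an illustration, not TurboVM's actual code: `find_delim` is a hypothetical name, the vector path uses SSE2 intrinsics (baseline on x86-64) rather than the SSE4.2 instructions mentioned above, NEON is omitted so ARM builds take the scalar loop, and `__builtin_ctz` assumes GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Return the index of the first `delim` in buf[0..len), or len if absent. */
static size_t find_delim(const char *buf, size_t len, char delim)
{
    size_t i = 0;

    assert(buf != NULL);
#if defined(__SSE2__)
    {
        /* Compare 16 bytes at a time against a vector filled with `delim`. */
        const __m128i needle = _mm_set1_epi8(delim);
        for (; i + 16 <= len; i += 16) {
            __m128i chunk = _mm_loadu_si128((const __m128i *)(buf + i));
            int mask = _mm_movemask_epi8(_mm_cmpeq_epi8(chunk, needle));
            if (mask != 0)  /* lowest set bit marks the first match */
                return i + (size_t)__builtin_ctz((unsigned)mask);
        }
    }
#endif
    /* Scalar tail, and the whole search when no SIMD path is compiled in. */
    for (; i < len; i++)
        if (buf[i] == delim)
            return i;
    return len;
}
```

Both paths return the same result, which is what keeps a scalar-only build functional on other architectures.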
No additional dependencies are required.
Using with lognormalizer
The lognormalizer command-line tool supports turbo mode via the -oturbo option:
$ lognormalizer -r rules.rb -e json -oturbo < messages.log
In turbo mode:
Normalization uses the TurboVM bytecode engine
Output is compact JSON with nested objects for dotted field names
Numeric fields are emitted as native JSON numbers (not strings)
The getline() library function is used for input (more efficient than fgets() for large-scale processing)
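The nested-object and native-number output described in the list above can be sketched for a single field. The types and function here (`field_value`, `emit_field`) are hypothetical stand-ins, not liblognorm's internal representation. The sketch emits one field per call, so merging several fields that share a prefix into one object is not shown, and string escaping is omitted for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical typed field value (illustrative names only). */
typedef enum { FV_STRING, FV_INT, FV_DOUBLE, FV_BOOL } fv_type;

typedef struct {
    fv_type type;
    union {
        const char *s;
        long long i;
        double d;
        bool b;
    } u;
} field_value;

/* Write one field as JSON, opening one object per dot-separated segment.
 * Returns bytes written (excluding the NUL), or -1 if `out` is too small. */
static int emit_field(char *out, size_t cap, const char *name,
                      const field_value *v)
{
    size_t pos = 0;
    size_t depth = 0;
    const char *seg = name;
    int n;

    assert(out != NULL && name != NULL && v != NULL);
    for (;;) {
        const char *dot = strchr(seg, '.');
        size_t seglen = dot ? (size_t)(dot - seg) : strlen(seg);

        n = snprintf(out + pos, cap - pos, "{\"%.*s\":", (int)seglen, seg);
        if (n < 0 || (size_t)n >= cap - pos)
            return -1;
        pos += (size_t)n;
        depth++;
        if (dot == NULL)
            break;
        seg = dot + 1;
    }

    /* Numbers and booleans are written unquoted, as native JSON values. */
    switch (v->type) {
    case FV_STRING: n = snprintf(out + pos, cap - pos, "\"%s\"", v->u.s); break;
    case FV_INT:    n = snprintf(out + pos, cap - pos, "%lld", v->u.i); break;
    case FV_DOUBLE: n = snprintf(out + pos, cap - pos, "%g", v->u.d); break;
    case FV_BOOL:   n = snprintf(out + pos, cap - pos, "%s",
                                 v->u.b ? "true" : "false"); break;
    default:        return -1;
    }
    if (n < 0 || (size_t)n >= cap - pos)
        return -1;
    pos += (size_t)n;

    if (depth >= cap - pos)
        return -1;
    memset(out + pos, '}', depth);  /* close every object that was opened */
    pos += depth;
    out[pos] = '\0';
    return (int)pos;
}
```

Called with the name `source.ip` and a string value, this writes `{"source":{"ip":"..."}}`. An integer value for `source.port` is written without quotes.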
If a rulebase cannot be compiled to bytecode (e.g. it uses unsupported parser types), lognormalizer falls back to standard normalization automatically.
Library API
To enable TurboVM in your application, set the LN_CTXOPT_TURBO option
on the normalization context before loading rules:
#include <liblognorm.h>
ln_ctx ctx = ln_initCtx();
ln_setCtxOpts(ctx, LN_CTXOPT_TURBO);
ln_loadSamples(ctx, "/path/to/rules.rb");
After loading, verify that compilation succeeded:
if (ln_turbo_is_available(ctx)) {
    /* TurboVM ready: ln_normalize() will use the fast path */
}
For direct string output (bypassing json-c entirely):
char *json_str = NULL;
size_t json_len = 0;
int r = ln_normalize_to_str(ctx, msg, msg_len, &json_str, &json_len);
if (r == 0 && json_str) {
    /* json_str contains the normalized JSON string */
    free(json_str);  /* the caller releases the returned buffer */
}
The standard ln_normalize() function also benefits from TurboVM
when it is enabled — the bytecode engine is used internally, with
automatic fallback to the recursive walker if needed.
Supported Parsers
TurboVM supports 32 of the 33 parser types defined in liblognorm v2. The following parsers are compiled to bytecode:
Text: word, alpha, string, rest, char-to, char-separated, string-to, op-quoted-string, quoted-string, literal
Numeric: number, float, hexnumber
Network: ipv4, ipv6, mac48
Date/Time: date-rfc3164, date-rfc5424, date-iso, time-24hr, time-12hr, duration, kernel-timestamp
Structured: json, cee-syslog, cef, v2-iptables, checkpoint-lea, name-value-list
Special: whitespace (as skip), cisco-interface-spec
The following parser type falls back to the legacy recursive engine:
repeat: requires recursive sub-rule invocation, which is outside the scope of the single-pass VM instruction set.
The fallback is automatic and transparent — rulebases using repeat
will still work correctly via the standard engine.
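One way to picture why repeat does not fit is a toy linear VM whose program counter only ever moves forward. The sketch below is an assumption-laden illustration, not TurboVM's instruction set: the opcodes `OP_LITERAL`, `OP_WORD`, `OP_NUMBER`, and `OP_END`, and the function `vm_run`, are invented for this example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical opcodes for a toy linear VM. */
typedef enum { OP_LITERAL, OP_WORD, OP_NUMBER, OP_END } opcode;

typedef struct {
    opcode op;
    const char *arg;  /* literal text for OP_LITERAL, otherwise unused */
} instr;

typedef struct {
    const char *start;  /* points into the original message, no copy */
    size_t len;
} capture;

#define MAX_CAPTURES 8

/* Execute `prog` over `msg` in a single forward pass.
 * Returns the number of captured fields on a match, or -1 on mismatch. */
static int vm_run(const instr *prog, const char *msg, capture *caps)
{
    const char *p = msg;
    int ncap = 0;

    assert(prog != NULL && msg != NULL && caps != NULL);
    for (const instr *pc = prog; ; pc++) {  /* pc never jumps backwards */
        switch (pc->op) {
        case OP_LITERAL: {
            size_t n = strlen(pc->arg);
            if (strncmp(p, pc->arg, n) != 0)
                return -1;
            p += n;
            break;
        }
        case OP_WORD: {
            const char *s = p;
            while (*p != '\0' && *p != ' ')
                p++;
            if (p == s || ncap == MAX_CAPTURES)
                return -1;
            caps[ncap++] = (capture){ s, (size_t)(p - s) };
            break;
        }
        case OP_NUMBER: {
            const char *s = p;
            while (isdigit((unsigned char)*p))
                p++;
            if (p == s || ncap == MAX_CAPTURES)
                return -1;
            caps[ncap++] = (capture){ s, (size_t)(p - s) };
            break;
        }
        case OP_END:
            return *p == '\0' ? ncap : -1;
        default:
            return -1;
        }
    }
}
```

A rule such as "login from WORD port NUMBER" becomes five instructions executed once each. A repeat construct would need either a backward jump or a call into another program, and this loop has neither.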
Performance Notes
Throughput improvements depend on the rulebase complexity and message format. Typical observations:
Simple rulebases (5-10 rules): 2-3x throughput improvement
Complex rulebases (50+ rules with alternatives): 5-10x improvement
The ln_normalize_to_str() path avoids json-c entirely and provides the highest throughput for applications that consume JSON as strings
TurboVM adds no overhead when disabled (--disable-turbo or default).
When enabled but compilation fails for a specific rule, only that rule
falls back to the recursive walker — other rules still use bytecode.
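The per-rule fallback described above amounts to a dispatch decision made once per rule. A minimal sketch, assuming a hypothetical `rule` record that holds an optional bytecode entry point and a mandatory walker entry point (none of these names come from liblognorm, and the two stub engines exist only so the example compiles on its own):

```c
#include <assert.h>
#include <stddef.h>

typedef int (*engine_fn)(const char *msg);

/* Hypothetical per-rule record. */
typedef struct {
    engine_fn bytecode;  /* NULL when this rule could not be compiled */
    engine_fn walker;    /* recursive walker, always available */
} rule;

/* Stand-in engines; each returns a distinct value to show which one ran. */
static int stub_bytecode(const char *msg) { (void)msg; return 1; }
static int stub_walker(const char *msg)   { (void)msg; return 2; }

/* Use the bytecode path when present, otherwise fall back for this rule only. */
static int rule_normalize(const rule *r, const char *msg)
{
    assert(r != NULL && r->walker != NULL);
    if (r->bytecode != NULL)
        return r->bytecode(msg);
    return r->walker(msg);
}
```

Because the choice is stored per rule, one uncompilable rule does not affect how its neighbours are executed.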