1 Definition

A regex (regular expression) is a string of text that defines a pattern used to match, locate, and manage text.
Regex is a powerful and very fast way to extract parameters from a log line. These parameters are used to generate special objects, called “Events” in SGBox, so that the log message can be fully evaluated.

2 Regex generation

Concepts

Matching and Capture

When building a regex, two main actions are possible:

  • Match: use words, numbers, or special combinations to match the line and continue evaluating it.
    Ex. text: This is my IP. Regex: This .*? my IP.
  • Capture: in addition to simply matching a word or another part of the string, you can extract information by wrapping it in round parentheses “()”. The engine then separates the text inside the parentheses from the rest of the line. These pieces of text can be turned into “Parameters” inside the SGBox “Events” objects.

The regex engine is very flexible and permits many more operations, but these two actions are sufficient to build a valid SGBox pattern.
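The difference between matching and capturing can be sketched in a few lines of Python. This is only an illustration: it uses Python's `re` module, which may differ in details from the engine SGBox uses, and the sample log line is made up.

```python
import re

# Hypothetical log line, used only for illustration.
line = "This is my IP: 192.168.1.10"

# Match only: the lazy wildcard skips over "is"; nothing is extracted.
match_only = re.search(r"This .*? my IP", line)
print(match_only is not None)

# Match and capture: the parentheses isolate the address from the rest.
capture = re.search(r"This .*? my IP: ([\d.]+)", line)
print(capture.group(1))
```

The captured group is the piece of text that would become a Parameter of the Event.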

Best Practice

Keep these rules in mind to improve efficiency and reduce the risk of abnormal behavior:

  • The text must be as specific as possible in the first part, because the regex engine parses the log line from the beginning and stops at the first non-match.
  • The match must be as non-greedy as possible.
    E.g. not .* but .*? instead
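A minimal sketch of why the non-greedy form matters, again using Python's `re` module as a stand-in for the SGBox engine (an assumption) and an invented key="value" log line:

```python
import re

# Hypothetical log line with several quoted values.
line = 'user="alice" action="login" result="ok"'

# Greedy: .* runs to the end of the line, then backtracks to the LAST quote.
greedy = re.search(r'user="(.*)"', line).group(1)

# Non-greedy: .*? stops at the FIRST quote that lets the match succeed.
lazy = re.search(r'user="(.*?)"', line).group(1)

print(greedy)
print(lazy)
```

The greedy version swallows the following fields into the captured value, while the non-greedy one returns only the user name.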
Pattern                                       Match
.*?                                           Match any characters (non-greedy)
\s+                                           Match one or more whitespace characters
(\d+)                                         Match and capture a generic number (port, numerical ID, numerical session ID, numerical severity)
(\d{1,5})                                     Match and capture a port number (1 to 5 digits; the 1-65535 range itself is not checked)
(\w+)                                         Match and capture any single word
([\d]{1,3}\.[\d]{1,3}\.[\d]{1,3}\.[\d]{1,3})  Match and capture an IP address (not a hostname)
([\d\.]+)                                     Match and capture an IP address, loosely: any run of digits and dots (not a hostname)
([0-9a-fA-F:-]+)                              Match and capture any type of MAC address
([0-9a-f:]+)                                  Match and capture a lowercase, : separated MAC address
([0-9A-F:]+)                                  Match and capture an uppercase, : separated MAC address
([0-9a-f-]+)                                  Match and capture a lowercase, - separated MAC address
([0-9A-F-]+)                                  Match and capture an uppercase, - separated MAC address
(.*?@.*?\..*?)                                Match and capture a generic email address
(?:alice|bob)                                 Non-capturing OR match (one of the words must match)
(?:myparam)?                                  Non-capturing optional match (matches 0 or 1 time)
((\w+) .*?)                                   Nested match and capture groups: the outer group captures the entire parameter, the inner group its first word
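As an illustration, several of the patterns listed above can be combined into one regex. The firewall-style log line and its field names (src, spt, dst, dpt, mac) are invented for this sketch, and Python's `re` module is assumed to behave like the SGBox engine for these constructs.

```python
import re

# Hypothetical firewall log line.
line = ("Jan 10 12:00:01 fw01 DROP src=10.0.0.5 spt=51234 "
        "dst=192.168.1.20 dpt=443 mac=00:1a:2b:3c:4d:5e")

regex = (
    r"(\w+) src=([\d]{1,3}\.[\d]{1,3}\.[\d]{1,3}\.[\d]{1,3})"  # action, strict IP
    r" spt=(\d{1,5})"                                          # source port
    r" dst=([\d\.]+)"                                          # loose IP
    r" dpt=(\d{1,5})"                                          # destination port
    r" mac=([0-9a-fA-F:-]+)"                                   # any MAC format
)

groups = re.search(regex, line).groups()
print(groups)

# \d{1,5} only limits the number of digits: "99999" still matches,
# so a real range check has to happen outside the regex.
five_digits_ok = re.fullmatch(r"\d{1,5}", "99999") is not None
port_in_range = 1 <= int(groups[2]) <= 65535
print(five_digits_ok, port_in_range)
```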

References

Global Summary Cheat Sheet

Character  What does it do?                                                Example    Matches
^          Matches the beginning of the line                               ^abc       abc, abcdef, abc123
$          Matches the end of the line                                     abc$       my:abc, 123abc, theabc
.          Matches any single character                                    a.c        abc, a-c, a2c
|          OR operator                                                     abc|xyz    abc or xyz
(...)      Captures anything matched                                       (a)b(c)    Captures ‘a’ and ‘c’
(?:...)    Non-capturing group                                             (a)b(?:c)  Captures ‘a’ but only groups ‘c’
[...]      Matches any one character contained in the brackets             [abc]      a, b, or c
[^...]     Matches any one character not contained in the brackets         [^abc]     x, 1, d
[a-z]      Matches any one character between ‘a’ and ‘z’                   [b-z]      b, m, z
{x}        Matches exactly ‘x’ times                                       (abc){2}   abcabc
{x,}       Matches ‘x’ times or more                                       (abc){2,}  abcabc, abcabcabc
{x,y}      Matches between ‘x’ and ‘y’ times                               (a){2,4}   aa, aaa, aaaa
*          Greedy match of the preceding character zero or more times      ab*c       ac, abc, abbc
+          Matches the preceding character one or more times               a+c        ac, aac, aaac
?          Preceding character zero or one time; also non-greedy modifier  ab?c       ac, abc
\          Escapes the next character or creates an escape sequence        a\sc       a c
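The grouping and anchoring entries of the cheat sheet can be checked quickly in Python. This is a sketch under the assumption that Python's `re` module treats these constructs the same way as the SGBox engine; the sample strings are invented.

```python
import re

# (a)b(?:c): only the first group is captured; (?:c) groups without capturing.
only_a = re.search(r"(a)b(?:c)", "abc").groups()
print(only_a)

# (?:alice|bob): alternation that does not create an extra captured value.
port_only = re.search(r"user (?:alice|bob) port (\d+)", "user bob port 22").groups()
print(port_only)

# ^ and $ anchor the match to the whole line; (?: myparam)? is optional text.
optional = re.compile(r"^id=(\d+)(?: myparam)?$")
print(bool(optional.search("id=7")))
print(bool(optional.search("id=7 myparam")))
print(bool(optional.search("id=7 other")))
```

Non-capturing groups are useful when part of the line must be matched but should not become a parameter.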

Tools

Some tools can help you build and test the right regex.

3 SGBox Pattern Creation and Add (Advanced)

Object definition

  • Parameter: a single extracted value that also permits correlation between different patterns/classes.
  • Pattern / Event Name: the name that identifies a specific event extracted from the logs.
  • Pattern / Event: an event, action, or piece of information extracted from one log line. In standard usage, each event corresponds to exactly one log line.
  • Regex definition: the regex syntax that extracts information from the matching log line.
  • Class: a container that groups different Events.

Concepts

  • Parameter name: whenever possible, assign a parameter name that is already present in the dropdown menu. For performance reasons, avoid creating unnecessary parameters.
  • 💭 Capture only the relevant information: turn into parameters only the parts of the log that you really need.

Make a new pattern

To create a new pattern, go to LM -> Configuration -> Pattern and click the “New Pattern” button.

In the first part of the page you can search for the logs you need to parse, filter out the unnecessary ones, and test your regex. The right pane previews the captured group values that will later be turned into parameters.

Here you must:

  1. Select the Hosts from which to retrieve the logs.
  2. Select a suitable time range to find the logs you need (if the regex is correct but nothing is found, try reducing the time range: for performance reasons the search is limited to 100,000 lines).
  3. Enter the search string or the final regex.
  4. Press “Search” to match the regex and extract the results.
  5. In the right pane you can see the captured groups matched by the regex.
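Conceptually, the search step filters the selected log lines with the regex and shows the captured groups of each matching line. The sketch below mimics that behavior in Python. The `preview` function, the sample log lines, and the way the line limit is applied are all assumptions made for illustration; they do not describe how SGBox implements the search.

```python
import re
from itertools import islice

MAX_LINES = 100_000  # mirrors the search limit mentioned in step 2


def preview(lines, regex, limit=MAX_LINES):
    """Yield (line, captured_groups) for matching lines among the first `limit` lines."""
    compiled = re.compile(regex)
    for line in islice(lines, limit):
        match = compiled.search(line)
        if match:
            yield line, match.groups()


# Hypothetical log lines.
logs = [
    "sshd: Accepted password for alice from 10.0.0.5 port 51234",
    "cron: job started",
    "sshd: Accepted password for bob from 10.0.0.9 port 40022",
]

results = list(preview(logs, r"Accepted password for (\w+) from ([\d.]+) port (\d{1,5})"))
for line, groups in results:
    print(groups)
```

Lines that do not match (the cron entry here) are simply skipped, which is why a specific regex also works as a filter.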

Once you are sure that the correct logs are extracted, press the “Create” button to proceed to the Creation Window.


Here you must:

  1. Check, fix or complete the regex
  2. Press the “Test” button to run the extraction on the example log shown in the first box.
  3. Once the Parameters appear, associate each entry in the Value column with a Parameter Name in the second column. For performance reasons, avoid creating new Parameter names unless absolutely essential.
  4. Fill in the Pattern Name and Description so that the pattern is easy to find in the pattern view.
  5. Select “Create” to finalize the pattern creation
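The association between captured values and parameter names in step 3 can be pictured as pairing each captured group, in order, with a name. The following Python sketch is only a mental model: the pattern name, the parameter names, the sample log line, and the `build_event` helper are hypothetical and are not part of SGBox.

```python
import re

# Hypothetical pattern definition: a name, a regex, and one
# parameter name per capturing group, in order.
pattern_name = "SSH login accepted"
regex = r"Accepted password for (\w+) from ([\d.]+) port (\d{1,5})"
parameter_names = ["user", "source_ip", "source_port"]


def build_event(line):
    """Return an event dict for a matching line, or None if the line does not match."""
    match = re.search(regex, line)
    if match is None:
        return None
    return {
        "pattern": pattern_name,
        "parameters": dict(zip(parameter_names, match.groups())),
    }


event = build_event("sshd: Accepted password for alice from 10.0.0.5 port 51234")
print(event)
```

Reusing existing parameter names across patterns is what makes values such as a source IP comparable between different events.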