Fun with Datadog
Recently we started evaluating Datadog as a monitoring tool, and a big part of that is the CloudWatch metrics, log aggregation, and tracing capabilities, so you can have a “single pane of glass”. I’m going to talk about what I had to do to get log collection and APM set up, since we run in an Elastic Beanstalk environment.
Log Aggregation
This required installing the agent so it could forward logs to Datadog. The agent also collects host metrics, so that’s a double win. What I had to do here was add an ebextensions file for the Datadog install and configuration. I install Datadog on every deploy to pick up updates whenever they come out, but I had to check that the config lines I needed were not already there so I don’t end up with a thousand copies of logs_enabled: true. This was easy enough to accomplish with an if as the command:
if ! grep -q "logs_enabled: true" /etc/datadog-agent/datadog.yaml; then echo "logs_enabled: true" >> /etc/datadog-agent/datadog.yaml; fi
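In the ebextensions config, this sits inside a commands block, roughly like this (a minimal sketch; the command name is just a placeholder I picked):

commands:
  01_enable_datadog_logs:
    command: |
      # Append logs_enabled only if it isn't already there, so repeated deploys stay idempotent
      if ! grep -q "logs_enabled: true" /etc/datadog-agent/datadog.yaml; then
        echo "logs_enabled: true" >> /etc/datadog-agent/datadog.yaml
      fi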
The equivalent check was even easier on Windows because there are no comments in the way, so I can just check for what I don’t want and replace it with what I do want:
powershell.exe -Command \"((Get-Content -Path c:\\ProgramData\\Datadog\\datadog.yaml -Raw) -replace 'logs_enabled: false','logs_enabled: true' | Set-Content c:\\ProgramData\\Datadog\\datadog.yaml)\"
I’ve created a new repo for these Datadog ebextensions files so they can live in one place and be pulled in by all the different components. This also allows us to modify that config without requiring a new release. I use a files option to create the log config file, and then each build replaces placeholder values for things that are different, like the service name. There is also a restart in my ebextensions commands to pick up the new config changes, as sketched below.
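As an illustration, here is a rough sketch of that files block plus the restart; the paths and SERVICE_NAME are stand-ins for whatever each build swaps in, the conf.d location is the usual spot for custom log configs on Linux, and the restart command may need adjusting for your platform:

files:
  # Log config the agent will pick up; each build replaces SERVICE_NAME before deploy
  "/etc/datadog-agent/conf.d/SERVICE_NAME.d/conf.yaml":
    mode: "000644"
    owner: root
    group: root
    content: |
      logs:
        - type: file
          path: /var/log/SERVICE_NAME/*.log
          service: SERVICE_NAME
          source: SERVICE_NAME

commands:
  # Restart so the agent picks up the new log config
  99_restart_datadog_agent:
    command: systemctl restart datadog-agent || service datadog-agent restart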
At this point the agent sees the logs and starts forwarding them, and you are ready to do any parsing or other processing on the logs that you need to do.
Tracing (APM)
Since we use Beanstalk with single-container deployments, we could not use the Datadog agent sidecar. Since we already had the agent installed, I thought we could just set up tracing to send spans to the agent on the host. The issue was how to set the IP when the same ebextensions file deploys all 40 hosts. That’s when I found an article talking about using the docker0 network. Their docker0 IP, 172.17.0.1, matched what I saw on the host I was experimenting with. So for now I have hardcoded that as an environment variable in my ebextensions file:
option_settings:
  - option_name: DD_AGENT_HOST
    value: 172.17.0.1  # $(ip -4 addr show docker0 | grep -Po 'inet \K[\d.]+')
The next hurdle was that the agent listens on 127.0.0.1 by default, so I had to change the bind_host key to 0.0.0.0. This is easy to explain but took a while to figure out.
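The change itself follows the same grep-and-append pattern as the logs_enabled setting; a sketch (again, the command name is arbitrary):

commands:
  02_apm_bind_host:
    command: |
      # Only append bind_host if it isn't already set, so repeated deploys stay idempotent
      if ! grep -q "bind_host: 0.0.0.0" /etc/datadog-agent/datadog.yaml; then
        echo "bind_host: 0.0.0.0" >> /etc/datadog-agent/datadog.yaml
      fi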
Log Correlation with Traces
Now I have logs flowing into Datadog and traces flowing into Datadog. The next step was to correlate the two so you can see the logs for a specific span. This is supposed to be as easy as setting the DD_LOGS_INJECTION environment variable to true. But I did not get the dd.trace_id=