What do the variables mean in the input data?

  • group this is commonly known as the treatment arm, and it needs to be at least four characters long. Even in a single-arm study this variable must still exist, as a constant value.
  • subjid short for “subject identification”. A number unique to each patient, subject, or participant in the study.
  • term a text description of the adverse event, e.g. “nausea”. This does not strictly have to be taken from a dictionary (MedDRA, CTCAE), but doing so does improve data quality.
  • soc acronym for “System Organ Class”, which is the top-most level in the hierarchy used to group AEs in the MedDRA dictionary. There is some flexibility here: you can provide either the numerical MedDRA code or the corresponding text.
  • serious a 0/1 value indicating if the AE was serious.
  • related a 0/1 value indicating if the AE was related to treatment.
  • fatal a 0/1 value indicating if the AE resulted in death.
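
As a rough illustration, an input data set with the variables listed above might be built in R as follows. All values are invented for the example, the SOC is supplied as text, and the checks at the end are only a sketch of the constraints described in the list, not part of the tool itself.

```r
# Hypothetical adverse event data: one row per event, values made up.
aes <- data.frame(
  group   = c("Control", "Control", "Experimental", "Experimental"),
  subjid  = c(1001, 1001, 2001, 2002),
  term    = c("Nausea", "Headache", "Nausea", "Vomiting"),
  soc     = c("Gastrointestinal disorders", "Nervous system disorders",
              "Gastrointestinal disorders", "Gastrointestinal disorders"),
  serious = c(0, 0, 1, 0),
  related = c(0, 1, 1, 0),
  fatal   = c(0, 0, 0, 0),
  stringsAsFactors = FALSE
)

# Simple checks mirroring the variable descriptions.
stopifnot(
  all(nchar(aes$group) >= 4),        # group labels at least four characters
  all(aes$serious %in% c(0, 1)),     # 0/1 indicator variables
  all(aes$related %in% c(0, 1)),
  all(aes$fatal   %in% c(0, 1))
)
```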

Why do I need to enter the numbers exposed?

The input data is taken to be the set of adverse events that have been observed: one row of data per event. There may be subjects who did not experience any AEs if the intervention has a good safety profile. So to calculate incidence rates with a correct denominator we need this extra data; one number per group.
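
The effect of the denominator can be sketched with a small made-up example in base R. The data, the group names, and the exposed counts below are all assumptions chosen for illustration; the point is only that subjects without any AE never appear in the event data, so the exposed count has to come from elsewhere.

```r
# Hypothetical AE data: one row per event.
# A third subject in "Active" and a second in "Placebo" had no AEs,
# so they do not appear here at all.
aes <- data.frame(
  group  = c("Active", "Active", "Active", "Placebo"),
  subjid = c(1, 1, 2, 4),
  stringsAsFactors = FALSE
)

# Number of subjects exposed per group, supplied separately (made-up values).
exposed <- c(Active = 3, Placebo = 2)

# Subjects with at least one AE, per group.
affected <- tapply(aes$subjid, aes$group, function(x) length(unique(x)))

# Incidence uses the exposed count, not the number of subjects seen in the AE data.
incidence <- affected[names(exposed)] / exposed
incidence
```

Using only the subjects present in the AE data as the denominator would overstate the incidence in both groups.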

Excess Deaths?

The EudraCT system does allow for the unusual scenario in which not all fatalities are captured within the AE data. If there are any such fatalities, provide counts per group. Generally, though, you can ignore this and use the default value of 0.

SOC coding in more detail

If you use the EudraCT system to enter AE data manually, then for each event you can enter a term using free text, but you must choose the SOC from a drop-down list of 27 values. Behind the scenes this records a numerical code, the eutctId value given in outputs, which the EudraCT system took from sources that are rather hard to track down.

Historically, the values were taken from eutct, and a copy of the codes was taken from there, as per the appendix of the specification document. Currently the SPOR Referentials Management System is the replacement, although access permissions can vary and are outside our control, and there is no option to download the codes in a convenient tabular format.

Hence, as part of the tools provided here, we include a data set that gives the EudraCT code, MedDRA code, and English text, which can be linked to your data. You can link via numerical code or via text; the choice is specified by the argument soc_index, which must be given a value of either meddra for the numerical option or soc_term for the text option.
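
The idea of linking by either column can be sketched in base R as below. This is not the tool's own code: soc_lookup is a two-row stand-in for the included data set, its eutctId values are placeholders rather than real codes, link_soc is a hypothetical helper, and the MedDRA codes shown should be checked against your own dictionary version.

```r
# Stand-in for the lookup data set. eutctId values are PLACEHOLDERS, not real codes.
soc_lookup <- data.frame(
  eutctId  = c("PLACEHOLDER_1", "PLACEHOLDER_2"),
  meddra   = c(10017947, 10029205),
  soc_term = c("Gastrointestinal disorders", "Nervous system disorders"),
  stringsAsFactors = FALSE
)

# Hypothetical AE data with the SOC given as text.
aes <- data.frame(
  term = c("Nausea", "Headache"),
  soc  = c("Gastrointestinal disorders", "Nervous system disorders"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: join on the column named by soc_index,
# "soc_term" for text or "meddra" for numerical codes.
link_soc <- function(data, lookup, soc_index = c("meddra", "soc_term")) {
  soc_index <- match.arg(soc_index)
  merge(data, lookup, by.x = "soc", by.y = soc_index, all.x = TRUE)
}

linked <- link_soc(aes, soc_lookup, soc_index = "soc_term")
linked
```

Any row where eutctId comes back as NA after the join signals a SOC value that did not match the lookup and needs correcting.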

Cleaning free-text AE descriptions

Most trials have free-text descriptions of AEs provided by a clinician as source data. Variant spellings or descriptions of the same term in the data will be misleading, because they split the incidence rate for that term into multiple parts. Unlike the SOC, the EudraCT system will accept free text for terms. It is up to the study team to code free text against an AE dictionary or to find an alternative means of cleaning the data.
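
One possible, very basic cleaning approach is sketched below in base R: normalise case and whitespace, then map known variants to a preferred term. The raw terms and the variants mapping are invented for illustration, and this is no substitute for proper dictionary coding.

```r
# Hypothetical free-text terms as they might arrive from source data.
raw_terms <- c("Nausea", " nausea ", "NAUSEA", "Naussea", "Head ache", "headache")

# Step 1: normalise case and surrounding whitespace.
clean <- tolower(trimws(raw_terms))

# Step 2: map known variants to a preferred term (hypothetical mapping).
variants <- c("naussea" = "nausea", "head ache" = "headache")
is_variant <- clean %in% names(variants)
clean[is_variant] <- unname(variants[clean[is_variant]])

# Each event now counts towards a single term.
table(clean)
```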

Can I edit details after uploading?

Yes, please do review and amend the details by hand using the EudraCT portal. You may want to confirm that study-wide default values (AE dictionary, version number, incidence threshold, …) are accurate.

I want to use my own code to calculate the summary statistics

As described in the Specification this is entirely possible. You can use your own code to produce the counts and statistics as needed, and then use the latter part of the tool to convert them into xml files needed by the EudraCT portal.

You will need to arrange your outputs into three data sets: group-level, serious event by group, and non-serious event by group.

The variable names within those data sets need to be as follows:

Group

  • title
  • deathsResultingFromAdverseEvents
  • subjectsAffectedBySeriousAdverseEvents
  • subjectsAffectedByNonSeriousAdverseEvents
  • subjectsExposed
  • deathsAllCauses

Non Serious

  • groupTitle
  • subjectsAffected
  • occurrences
  • term
  • eutctId

Serious

  • groupTitle
  • subjectsAffected
  • occurrences
  • term
  • eutctId
  • occurrencesCausallyRelatedToTreatment
  • deaths
  • deathsCausallyRelatedToTreatment

R

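You would then use the internal function create.safety_summary:

library(eudract)
# Combine the three data sets into a single safety summary object
safety_statistics <- create.safety_summary(GROUP, NONSERIOUS, SERIOUS)
simple_safety_xml(safety_statistics, file="simple.xml")
# Convert the simple XML into the format needed by the EudraCT portal
eudract_convert(input="simple.xml", output="table_eudract.xml")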

Here the inputs GROUP, NONSERIOUS, and SERIOUS are the data sets produced by your own code.
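
Before passing your own data sets on, it may help to check that they carry the variable names listed above. The following base R sketch does this; required and missing_columns are hypothetical names introduced here, and the one-row NONSERIOUS data set uses invented values with a placeholder eutctId.

```r
# Required variable names for each data set, copied from the lists above.
required <- list(
  GROUP = c("title", "deathsResultingFromAdverseEvents",
            "subjectsAffectedBySeriousAdverseEvents",
            "subjectsAffectedByNonSeriousAdverseEvents",
            "subjectsExposed", "deathsAllCauses"),
  NONSERIOUS = c("groupTitle", "subjectsAffected", "occurrences",
                 "term", "eutctId"),
  SERIOUS = c("groupTitle", "subjectsAffected", "occurrences", "term",
              "eutctId", "occurrencesCausallyRelatedToTreatment",
              "deaths", "deathsCausallyRelatedToTreatment")
)

# Hypothetical helper: which required names are absent from a data set?
missing_columns <- function(data, needed) setdiff(needed, names(data))

# Made-up one-row example; the eutctId is a placeholder, not a real code.
NONSERIOUS <- data.frame(
  groupTitle = "Control", subjectsAffected = 1, occurrences = 2,
  term = "Nausea", eutctId = "PLACEHOLDER",
  stringsAsFactors = FALSE
)

# An empty result means all required names are present.
missing_columns(NONSERIOUS, required$NONSERIOUS)
```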

SAS

You would need to start from around line 272 of the script. One issue is that SAS cannot have variable names as long as those described above. A solution is to give such variables labels that match the names above. For reference, there are pre-saved SAS data sets with no rows but valid column names and labels: a non-serious template and a serious template.

Stata

You would need to save the data sets using xmlsave, for example:

use non_serious
xmlsave "non_serious", legible replace

Then use the last few lines, from line 224, of the script. As with SAS, variable names are limited to 32 characters, but labels can be used instead, e.g.

label variable related "occurrencesCausallyRelatedToTreatment"