Sanitizing BibTeX files
Note: Bibsane is integrated into StepUp RepRep as of version 2.3.
StepUp RepRep can clean up BibTeX files to fix issues
that would otherwise be difficult to spot or require tedious manual edits.
This feature was formerly implemented in an external tool called bibsane,
but is now integrated into StepUp RepRep.
The cleanup must always run after the LaTeX document has been built,
because the .aux file produced by the build is what makes it possible to identify unused entries.
The following is a minimal example of the commands in a plan.py file that will clean up a BibTeX file:
from stepup.core.api import static
from stepup.reprep.api import compile_latex, sanitize_bibtex
static("paper.tex", "references.bib")
compile_latex("paper.tex")
sanitize_bibtex("references.bib", aux="paper.aux")
The sanitize_bibtex() function reads the .bib file and,
when given the corresponding .aux file,
identifies which entries are actually cited in the LaTeX document.
It then cleans up the .bib file by removing unused entries,
fixing common formatting issues, and checking for missing or malformed fields.
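To illustrate how an .aux file can reveal unused entries, here is a minimal sketch. It is not the code used by StepUp RepRep. It assumes a classic BibTeX workflow, in which LaTeX writes `\citation{...}` lines to the .aux file. The function names `cited_keys` and `unused_keys` and the sample data are hypothetical.

```python
import re

# Assumption: a classic BibTeX workflow, where LaTeX writes
# \citation{key1,key2} lines to the .aux file.
_CITATION_RE = re.compile(r"\\citation\{([^}]*)\}")


def cited_keys(aux_text: str) -> set[str]:
    """Return the set of citation keys found in the text of an .aux file."""
    keys = set()
    for match in _CITATION_RE.finditer(aux_text):
        for key in match.group(1).split(","):
            key = key.strip()
            if key:
                keys.add(key)
    return keys


def unused_keys(bib_keys: list[str], aux_text: str) -> list[str]:
    """Return the keys in bib_keys that are never cited, keeping their order."""
    cited = cited_keys(aux_text)
    if "*" in cited:
        # \nocite{*} marks every entry as cited, so nothing is unused.
        return []
    return [key for key in bib_keys if key not in cited]


# Hypothetical .aux content and BibTeX keys, for illustration only.
aux_example = r"""\relax
\citation{knuth1984,lamport1994}
\citation{knuth1984}
\bibdata{references}
"""
print(unused_keys(["knuth1984", "lamport1994", "old2001"], aux_example))
# ['old2001']
```

The sketch also shows why the build has to come first: without an up-to-date .aux file there is no reliable record of which keys the document cites.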
The sanitize_bibtex() function also accepts a path_cfg argument to specify
a YAML configuration file for reprep-bibsane, i.e. the script that actually implements the cleanup.
(Without a configuration file, only a minimal cleanup is performed.)
For example, you can create a bibsane.yml file with the following content
to enable more checks and cleanups:
drop_entry_types: ["control"]
normalize_doi: true
duplicate_id: merge # other options: fail or ignore
duplicate_doi: merge # other options: fail or ignore
preambles_allowed: false
normalize_whitespace: true
fix_page_double_hyphen: true
# PyISO4 can be used to abbreviate journal names.
abbreviate_journal: true
custom_abbreviations:
  CRAZY J0rnAL: Crazy J.
sort: true # sort key = {year}{normalized author list}{title}
citation_policies:
  article:
    author: must
    journal: must
    number: may
    pages: must
    title: must
    volume: must
    year: must
    doi: must
  book:
    author: must
    title: must
    publisher: must
    year: must
    month: must
    isbn: must
  misc.url:
    title: must
    url: must
    urldate: must
  misc.dataset:
    author: must
    title: must
    year: must
    doi: must
    urldate: must
    publisher: must
Everything above citation_policies consists of global settings.
See BibsaneConfig for a full list of available settings.
Under citation_policies you can specify which keys are expected and allowed for each entry type.
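As a rough illustration of how such a policy can be applied to one entry, consider the sketch below. It is not the implementation in StepUp RepRep. It assumes that `must` marks a required field, that `may` marks an optional one, and that any field not listed in the policy is reported. The function `check_entry` and the sample entry are hypothetical.

```python
def check_entry(fields: dict[str, str], policy: dict[str, str]) -> list[str]:
    """Return a list of problems for one BibTeX entry under one policy.

    Assumptions: "must" means required, "may" means optional,
    and fields that the policy does not list are reported.
    """
    problems = []
    # Required fields must be present and non-empty.
    for name, rule in policy.items():
        if rule == "must" and not fields.get(name, "").strip():
            problems.append(f"missing required field: {name}")
    # Fields outside the policy are flagged.
    for name in fields:
        if name not in policy:
            problems.append(f"field not covered by policy: {name}")
    return problems


# The article policy from the configuration above.
article_policy = {
    "author": "must",
    "journal": "must",
    "number": "may",
    "pages": "must",
    "title": "must",
    "volume": "must",
    "year": "must",
    "doi": "must",
}

# Hypothetical entry: it lacks a doi and carries an unlisted abstract field.
entry = {
    "author": "A. Author",
    "title": "An Example Title",
    "journal": "J. Ex.",
    "year": "2020",
    "volume": "1",
    "pages": "1--10",
    "abstract": "Not part of the policy.",
}
print(check_entry(entry, article_policy))
# ['missing required field: doi', 'field not covered by policy: abstract']
```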
The misc entry type is a catchall for diverse citations,
so you can specify subtypes like misc.url and misc.dataset.
In your BibTeX file, you can identify the subtype by adding a bibsane field.
For example: