Sanitizing BibTeX files
Note
Bibsane is integrated into StepUp RepRep as of version 2.3.
StepUp RepRep can clean up BibTeX files to fix issues
that would otherwise be difficult to spot or require tedious manual edits.
This feature was formerly implemented in an external tool called bibsane,
but is now integrated into StepUp RepRep.
The cleanup must always be performed after building the LaTeX document,
so that unused records can be identified.
The following is a minimal example of the commands in a plan.py file that will clean up a BibTeX file:
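from stepup.core.api import static
from stepup.reprep.api import compile_latex, sanitize_bibtex

# Declare the manually maintained input files.
static("paper.tex", "references.bib")
# Build the document first; this produces paper.aux.
compile_latex("paper.tex")
# Clean up the .bib file(s) referenced in paper.aux.
sanitize_bibtex("paper.aux")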
The sanitize_bibtex() function reads the .aux file(s) to identify the .bib files used.
By default, it assumes there is just one .bib file and rewrites it with the cleaned-up content.
If there are multiple .bib files, you can specify an output file with path_out="clean.bib".
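To make the role of the .aux file more concrete, the following is a minimal sketch of how the .bib files and the cited keys could be extracted from it. It is not the code used by rr-bibsane. It assumes a classic BibTeX workflow, in which the .aux file contains \bibdata and \citation lines, and the names scan_aux and unused_keys are hypothetical.

```python
import re

# Assumption: classic BibTeX markers in the .aux file (not biblatex/biber).
BIBDATA_RE = re.compile(r"\\bibdata\{([^}]*)\}")
CITATION_RE = re.compile(r"\\citation\{([^}]*)\}")


def scan_aux(aux_text: str) -> tuple[list[str], set[str]]:
    """Return the .bib file names and the cited keys found in .aux text."""
    bib_files: list[str] = []
    cited: set[str] = set()
    for line in aux_text.splitlines():
        if match := BIBDATA_RE.match(line):
            # \bibdata lists comma-separated names, usually without extension.
            for stem in match.group(1).split(","):
                stem = stem.strip()
                name = stem if stem.endswith(".bib") else stem + ".bib"
                if name not in bib_files:
                    bib_files.append(name)
        elif match := CITATION_RE.match(line):
            # One \citation line may carry several comma-separated keys.
            cited.update(key.strip() for key in match.group(1).split(","))
    return bib_files, cited


def unused_keys(all_keys: set[str], cited: set[str]) -> set[str]:
    """Return the keys present in the .bib file but never cited."""
    return all_keys - cited
```

Because the .aux file only exists after LaTeX has run, a scan like this can only work once the document has been built.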
The sanitize_bibtex() function also accepts a path_cfg argument to specify
a YAML configuration file for rr-bibsane, i.e. the script that actually implements the cleanup.
(Without a configuration file, a minimal cleanup is performed.)
For example, you can create a bibsane.yml file with the following content
to enable more checks and cleanups:
drop_entry_types: ["control"]
normalize_doi: true
duplicate_id: merge # other options: fail or ignore
duplicate_doi: merge # other options: fail or ignore
preambles_allowed: false
normalize_whitespace: true
fix_page_double_hyphen: true
# The json filename is relative to the location of the config file.
abbreviate_journal: 'journal_abbreviation_cache.json'
sort: true # sort key = {year}{first author lowercase normalized name}
citation_policies:
  article:
    author: must
    journal: must
    number: may
    pages: must
    title: must
    volume: must
    year: must
    doi: must
  book:
    author: must
    title: must
    publisher: must
    year: must
    month: must
    isbn: must
  misc.url:
    title: must
    url: must
    urldate: must
  misc.dataset:
    author: must
    title: must
    year: must
    doi: must
    urldate: must
    publisher: must
Everything above citation_policies consists of global settings.
See BibsaneConfig for a full list of available settings.
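As an illustration of one global setting, the comment next to sort describes the sort key as the year followed by the lowercase normalized name of the first author. The sketch below shows one way such a key could be built. It is an assumption that "normalized" means stripping accents, and the function name sort_key is hypothetical; the actual rr-bibsane implementation may differ.

```python
import unicodedata


def sort_key(year: str, first_author: str) -> str:
    """Build a key of the form {year}{first author lowercase normalized name}."""
    # Assumption: normalization decomposes characters and drops accents.
    decomposed = unicodedata.normalize("NFKD", first_author)
    plain = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return f"{year}{plain.lower()}"


# Records given as (year, first author) pairs sort by year, then by name.
records = [("2021", "Zhang"), ("2019", "Öztürk"), ("2021", "Müller")]
ordered = sorted(records, key=lambda rec: sort_key(*rec))
```

Putting the year first means that records are grouped chronologically, with the author name only breaking ties within a year.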
Under citation_policies you can specify which keys are expected and allowed for each entry type.
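The idea of a policy can be sketched in a few lines of Python. The example below is a simplified illustration, not the rr-bibsane implementation. It assumes that must marks a required field and may an optional one, and it only reports fields that are missing or not listed, without deciding what should happen to them. The function name check_entry is hypothetical.

```python
def check_entry(
    entry_type: str,
    fields: dict[str, str],
    policies: dict[str, dict[str, str]],
) -> tuple[list[str], list[str]]:
    """Return (missing required fields, fields not listed in the policy)."""
    policy = policies.get(entry_type)
    if policy is None:
        raise KeyError(f"No citation policy for entry type: {entry_type}")
    # Assumption: "must" means the field has to be present.
    missing = sorted(
        key for key, rule in policy.items() if rule == "must" and key not in fields
    )
    # Anything not named in the policy (neither "must" nor "may") is reported.
    unlisted = sorted(key for key in fields if key not in policy)
    return missing, unlisted


# The article policy below mirrors the example configuration.
policies = {
    "article": {
        "author": "must", "journal": "must", "number": "may", "pages": "must",
        "title": "must", "volume": "must", "year": "must", "doi": "must",
    }
}
entry = {
    "author": "A. Author", "title": "A title", "journal": "A journal",
    "year": "2020", "volume": "1", "number": "2", "abstract": "Some text.",
}
missing, unlisted = check_entry("article", entry, policies)
```

For this entry, missing contains doi and pages, and unlisted contains abstract.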
The misc entry type is a catchall for diverse citations,
so you can specify subtypes like misc.url and misc.dataset.
In your BibTeX file, you can identify the subtype by adding a bibsane field.
For example: