JASPAR familial binding sites

Description

We provide here the track hub corresponding to the JASPAR familial binding profiles, which can be downloaded from the JASPAR database. The track hub contains one track per species and genome assembly.

JASPAR is a regularly maintained open-access database storing manually curated transcription factor (TF) binding profiles as position frequency matrices (PFMs). A PFM summarizes the occurrences of each nucleotide at each position in a set of observed TF-DNA interactions. PFMs can be transformed into probabilistic or energetic models to construct position weight matrices (PWMs) or position-specific scoring matrices (PSSMs), which can be used to scan any DNA sequence to predict TF binding sites (TFBSs). The JASPAR database provides TFBSs predicted using the profiles in the CORE collection.
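The PFM-to-PWM transformation and the scanning step described above can be sketched as follows. This is a minimal illustration, not the JASPAR pipeline: the function names `pfm_to_pwm` and `scan_sequence`, the pseudocount of 0.8, the uniform background, and the toy matrix are all assumptions made for the example.

```python
import math

BASES = "ACGT"

def pfm_to_pwm(pfm, pseudocount=0.8, background=None):
    """Convert a PFM (dict: base -> list of counts) into a log2-odds PWM.

    The pseudocount and the uniform background are illustrative assumptions.
    """
    background = background or {b: 0.25 for b in BASES}
    length = len(pfm["A"])
    pwm = {b: [] for b in BASES}
    for i in range(length):
        # Total count in this column, plus the pseudocount mass.
        total = sum(pfm[b][i] for b in BASES) + pseudocount
        for b in BASES:
            freq = (pfm[b][i] + pseudocount * background[b]) / total
            pwm[b].append(math.log2(freq / background[b]))
    return pwm

def scan_sequence(sequence, pwm):
    """Score every window of the sequence; return (start, score) pairs."""
    length = len(pwm["A"])
    hits = []
    for start in range(len(sequence) - length + 1):
        window = sequence[start:start + length]
        if any(ch not in BASES for ch in window):
            continue  # skip windows containing ambiguous bases such as N
        score = sum(pwm[ch][i] for i, ch in enumerate(window))
        hits.append((start, score))
    return hits

# Toy 3-bp profile that prefers the sequence ACG (hypothetical counts).
toy_pfm = {"A": [10, 0, 0], "C": [0, 10, 0], "G": [0, 0, 10], "T": [0, 0, 0]}
toy_pwm = pfm_to_pwm(toy_pfm)
best_start, best_score = max(scan_sequence("TTACGTT", toy_pwm), key=lambda h: h[1])
```

In practice, a score threshold would be applied to the hits to call predicted TFBSs; the choice of threshold is not covered by this sketch.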

The motifs in JASPAR are collected in two ways:

In both cases, the selected motifs are manually curated. Specifically, our curators assess the quality of each motif and search for an orthogonal publication supporting it as the bona fide motif recognized by the TF of interest (e.g., a motif found in ChIP-seq peaks that looks similar to one found by SELEX-seq). The PubMed ID associated with the orthogonal support is provided in the TF profile metadata.

JASPAR is the only database with this scope whose data can be used with no restrictions (open access). For a comprehensive overview of the models and how they can be used, please see the following reviews:

Display Conventions and Configuration

Boxes represent predicted binding sites for each of the familial binding profiles in JASPAR.

Each familial profile is named with the cluster number it belongs to. Furthermore, each familial profile is displayed using a specific RGB color according to the following table:

Methods

We adapted the methodology from Vierstra et al. (Nature, 2020) to generate familial binding profiles from the clusters found by the matrix clustering tool in this repository. We explicitly aimed to provide distinct familial binding profiles for motifs derived from dimer and monomer TF binding. Except for the motif clustering step, we used the default parameters from Vierstra et al. (Nature, 2020). As a final step, we removed the familial binding sites whose start coordinates were lower than 0 or whose end coordinates were greater than the chromosome size. These sites are artifacts of expanding the original binding sites to fit the familial profile size.
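The final filtering step can be illustrated with a short sketch. It assumes BED-like records of the form (chromosome, start, end, name) and a dictionary of chromosome sizes; the function name `filter_in_bounds`, the record layout, and the choice to drop sites on chromosomes missing from the size table are assumptions for this example, not details taken from the actual pipeline.

```python
def filter_in_bounds(sites, chrom_sizes):
    """Keep only sites that lie fully within their chromosome.

    sites: iterable of (chrom, start, end, name) tuples (BED-like layout).
    chrom_sizes: dict mapping chromosome name to its length in bp.
    """
    kept = []
    for chrom, start, end, name in sites:
        size = chrom_sizes.get(chrom)
        if size is None:
            # Assumption: sites on chromosomes without a known size are dropped.
            continue
        # Remove sites whose start is below 0 or whose end exceeds the size.
        if start >= 0 and end <= size:
            kept.append((chrom, start, end, name))
    return kept

# Hypothetical records: the first and third extend past the chromosome ends.
example_sites = [
    ("chr1", -2, 10, "cluster_1"),
    ("chr1", 5, 20, "cluster_2"),
    ("chr1", 95, 103, "cluster_3"),
]
in_bounds = filter_in_bounds(example_sites, {"chr1": 100})
```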

Data Availability

JASPAR familial binding sites are publicly available at the JASPAR website.

Reference

All the data is publicly available. For further details, please refer to the associated publications:

Contact

If you have questions or comments, please write to: