Deciphering the sex bias in housekeeping gene expression in adipose tissue: a comprehensive meta-analysis of transcriptomic studies
Biology of Sex Differences volume 14, Article number: 20 (2023)
As the housekeeping genes (HKG) generally involved in maintaining essential cell functions are typically assumed to exhibit constant expression levels across cell types, they are commonly employed as internal controls in gene expression studies. Nevertheless, HKG may vary gene expression profile according to different variables introducing systematic errors into experimental results. Sex bias can indeed affect expression display, however, up to date, sex has not been typically considered as a biological variable.
In this study, we evaluate the expression profiles of six classical housekeeping genes (four metabolic: GAPDH, HPRT, PPIA, and UBC, and two ribosomal: 18S and RPL19) to determine expression stability in adipose tissues (AT) of Homo sapiens and Mus musculus and check sex bias and their overall suitability as internal controls. We also assess the expression stability of all genes included in distinct whole-transcriptome microarrays available from the Gene Expression Omnibus database to identify sex-unbiased housekeeping genes (suHKG) suitable for use as internal controls. We perform a novel computational strategy based on meta-analysis techniques to identify any sexual dimorphisms in mRNA expression stability in AT and to properly validate potential candidates.
Just above half of the considered studies informed properly about the sex of the human samples, however, not enough female mouse samples were found to be included in this analysis. We found differences in the HKG expression stability in humans between female and male samples, with females presenting greater instability. We propose a suHKG signature including experimentally validated classical HKG like PPIA and RPL19 and novel potential markers for human AT and discarding others like the extensively used 18S gene due to a sex-based variability display in adipose tissue. Orthologs have also been assayed and proposed for mouse WAT suHKG signature. All results generated during this study are readily available by accessing an open web resource (https://bioinfo.cipf.es/metafun-HKG) for consultation and reuse in further studies.
This sex-based research proves that certain classical housekeeping genes fail to function adequately as controls when analyzing human adipose tissue considering sex as a variable. We confirm RPL19 and PPIA suitability as sex-unbiased human and mouse housekeeping genes derived from sex-specific expression profiles, and propose new ones such as RPS8 and UBB.
Plain English Summary
Housekeeping genes (HKG) are involved in the maintenance of essential cellular functions. They usually present constant expression levels and are relevant because of their usefulness as internal controls in gene expression studies. However, HKG can vary the gene expression profile depending on different variables such as sex, introducing errors in the experimental results. In this study, we have performed an exhaustive systematic review and applied a massive analysis of expression data to check which HKG presents this bias and which do not. The results confirm that certain classical HKG do not perform adequately as controls when analyzing human adipose tissue considering sex as a variable. We further confirm the suitability of RPL19 and PPIA as human and mouse HKG without sex bias derived from sex-specific expression profiles, and propose new ones such as RPS8 and UBB. These results will be of great use in upcoming studies where expression data need to be normalized without the inclusion of sex bias.
A computational strategy based on massive data analysis revealed that an accurate experimental design for adipose tissue requires the adequate selection of a sex-unbiased housekeeping genes (HKG).
The extensively used 18S gene displays sex-based variability in adipose tissue, although PPIA and RPL19 do not, and hence, represent robust HKG.
New sex-unbiased human and mouse candidate HKG: RPS8 and UBB.
metafun-HKG (https://bioinfo.cipf.es/metafun-HKG): a freely available web tool to allow users to review stable expression levels of candidate HKG along the large volume of FAIR data.
Housekeeping genes (HKGs) are a large class of constitutively expressed genes subjected to low levels of regulation under various conditions. They generally perform biological actions fundamental to basic cellular functions such as the cell cycle, translation, metabolism of RNA, and cell transport [1, 2]. Thus, the stable expression of HKGs is assumed in all cells of an organism independent of the tissue, developmental stage, cell cycle state, or presence/absence of external signals [3, 4].
The use of internal controls when performing quantitative gene expression analysis (such as microarrays, RNA-sequencing [RNA-seq], and quantitative reverse transcriptase-polymerase chain reaction [qRT-PCR]) represents the most common strategy to normalize gene expression to correct for intrinsic errors related to sample manipulation and the technical protocol. The gene expression profiles obtained depend significantly on the reference genes employed as internal controls; therefore, inappropriate controls can lead to inaccurate results.
Given their fundamental roles, HKGs tend to display medium-high expression levels; this characteristic makes these genes especially suitable as internal controls/reference genes to normalize gene expression data in quantitative gene expression analysis [2, 5, 6]. Ideally, internal controls should exhibit stable gene expression across most sample types and experimental conditions to minimize undesired experimental variation; however, the literature suggests that the expression of commonly used HKGs varies depending on the experimental conditions and chosen setup and the analyzed tissue [5,6,7,8,9,10,11,12,13]. Importantly, these limitations do not invalidate the use of HKGs as a normalization strategy; instead, they support the need for a deeper understanding of how HKGs behave under different conditions or in distinct tissues. The stability of HKG expression must be validated under the particular conditions of interest of each study as a mandatory step , considering all experimental, biological, or clinical variables [7, 14,15,16]. Importantly, this should include sex as an essential variable.
The role of sex in biomedical studies has often been overlooked, despite evidence of sexually dimorphic effects in biological studies. Karp et al. recently demonstrated how sex phenotypically influenced a substantial proportion of mammalian traits, both in wildtype and mutants . Meanwhile, Oliva et al. reported the impact of sex on gene expression in various human tissues through metadata analysis by the GTEx platform, generating a catalog of sex-based differences in gene expression and the regulatory pathways involved . The authors revealed ubiquitous effects of sex on gene expression; however, they highlighted significant sex-based differences in human visceral and subcutaneous adipose tissue. Sex as an intrinsic variable has not been historically considered of immense importance. In a recent review of more than 600 animal research studies, 22% of publications did not specify animal sex . Of the reports that specified animal sex, 80% of publications included only males and 17% only females, leaving only 3% that considered animals of both sex . An analysis of the number of animal studies revealed a more significant disparity—16,152 males vs. only 3,173 females. Only seven studies (1%) reported sex-based results. Thus, the number of male-only studies and the use of male animals have become more disparate over time [20, 21]. Unfortunately, human counterpart studies do not provide any encouragement; while international institutions now consider sex as a critical variable [22, 23], the male perspective predominates in past studies. The lack of consideration of sex as a variable can accentuate/attenuate gene expression analysis, which has subsequent implications on biological or biomedical interpretations.
The quantitative analysis of gene expression data has allowed assessments of gene expression levels within different tissues and under various conditions, which has identified stable expression profiles/patterns [1, 9, 12, 24,25,26,27,28]. Public repositories of gene expression data have appeared in the last decades. The Gene Expression Omnibus (GEO) , a well-known international public repository, stores and allows access to gene expression data generated by different high-throughput technologies such as microarrays or next-generation sequencing. Exploiting and reusing the vast amount of data in these repositories has become a powerful tool for those searching for gene expression patterns across many diverse types of tissues and conditions.
A survey of 40 studies of human adipose tissue (AT) published since 2001 noted that 70% of papers employed the ACTB, GAPDH, and 18S HKGs as reference genes . Related studies have supported the use of additional HKGs (i.e., PPIA, HPRT, RPS18, or RPL19) in human AT-based studies [16, 30, 31]. Importantly, these studies failed to include sex as a biological variable, suggesting that these HKGs may not be as suitable as anticipated. In short, there exists an important limitation in gene expression studies due to the lack of inclusion of the sex perspective. In response, this study determines the gene expression variability levels of six HKGs commonly used in human and mouse adipose tissue (AT) and genes included in various whole-transcriptome microarrays available at GEO that consider sex as a covariable. Further, we identify novel candidate reference genes that do not display sex bias in human AT. We extend this analysis to experimental analyses of mouse models deposited in the GEO. Our findings reveal that studies generally lack sex specificity or employ mainly male animals; furthermore, certain conventional HKGs fail the requisite of being constitutively expressed in both sexes. Also, we establish new putative sex-unbiased HKGs (suHKGs) for gene expression analysis in male and female human AT, and putative orthologs for mouse AT. We present a general framework for reference gene selection that may be useful in gene expression studies and develop an open web tool to select adequate suHKGs according to customized experimental designs in AT.
Systematic review and data collection
A comprehensive systematic review was conducted to identify all available transcriptomics studies with adipose tissue samples at GEO. The review considered the fields: sample source (adipose), type of study (expression profiling by array), and organism of interest (Homo sapiens or Mus musculus). The search was carried out during the first quarter of 2020, with the review period covering the years 2000-2019. From the returned records, the study GSE ID, the platform GPL ID, and the study type were extracted using the Python 3.0 library Beautiful Soup. The R package GEOmetadb  was then used to identify microarray platforms and samples from adipose tissue. The top 4 and 5 most used platforms in Hsa (Table 1) and Mmu (Table 2), respectively, were selected. Given the complex nature of some of the studies, those with information regarding the sex of samples were manually determined, and the keywords used to annotate them homogenized. Finally, studies not meeting the following predefined inclusion criteria were filtered out: i) include at least 10 adipose tissue samples; ii) use one of the selected microarray platforms to analyze gene expression data; iii) present data in a standardized way; and iv) not include duplicate sample records (as superseries).
Data processing and statistical analyses
The normalized microarray expression data of the selected studies from GEO were downloaded using the GEOQuery R package. All the probe sets of each platform were converted to gene symbols, averaging expression values of multiple probe sets targeting the same gene to the median value.
Three statistical stability indicators were calculated for each gene in each study to determine the relative expression variability: the coefficient of variation (CV), the IQR/median, and the MAD/median. The CV, computed as the standard deviation divided by the mean, is used to compare variation between genes with expression levels at different orders of magnitude; however, extreme values can affect this value. Therefore, the interquartile range (IQR) divided by the median and the median absolute deviation (MAD) divided by the median (two statistics based on the median) were also considered. These measures provide more robustness in skewed distributions . Both statistics were multiplied by a correction factor of 0.75 and 1.4826 to make them comparable to the CV in normal distributions. Lastly, the gene variability scores per platform were expressed as the median of all statistics from the studies analyzed with each platform. These median values were ranked by gene variability value for each platform, with lower ranks corresponding to higher stability levels.
The described analysis pipeline was performed on three different sample groups based on sex and species: female Hsa, male Hsa, and all Mmu samples. The analysis was not performed separately for male and female mice due to the lack of female Mmu samples.
The gene variability ranks for each platform were integrated using the Rank Product (RP) method [35, 36], a non-parametric statistic identifying the elements that systematically occupy higher positions in ranked lists. This approach combines gene ranks rather than variability scores to create platform independence. The RankProd package [37, 38] was used to calculate the RP score for each gene (Eq. 1, where i is the gene, K the number of platforms, and rankij the position of gene i in the ranking of platform j). Three final rankings were obtained (one for each sample group [Mmu, Hsa female, and Hsa male samples]) by sorting the genes in increasing order of RP:
Selection of candidate HKGs
To encounter appropriate sex-unbiased HKG (suHKG) candidates, male and female Hsa samples were randomly selected, and the Mmu group was discarded. Gene functional information was then incorporated to exclude genes involved in metabolic alterations. The AnnotationDbi and org.Hs.eg.db annotation packages converted Gene Symbol to Gene name. After removing pseudogenes and non-coding genes, the associated GO terms of the remaining genes were obtained using the GO.db annotation package. Related information from all three gene ontologies were included (Biological Process, Molecular Function, Cellular component). Genes related to physiopathological conditions were filtered out, and a unique ranking by sex was generated (the male and female MetaRankings), which averages the three statistical rankings Eq. (2):
The difference in the ranking positions occupied by males and females was also calculated to reveal sex-based stability differences at a gene level.
Selecting stable suHKG with high levels of expression, followed several steps—we first (i) downloaded the "GTEx_Analysis_2017-06-05_v8_RNASeQCv1.1.9_gene_median_tpm.gct.gz” file from GTEx; (ii) select the adipose tissue samples; (iii) take the gene median transcript per million (TPM) value in visceral adipose tissue; (iv) filter out from our sex-specific MetaRankings genes with median TPM < 20; (v) select the genes in the top 10% positions of each MetaRanking; and (vi) intersect the two top lists to find stable and highly expressed genes common to both sexes.
Study selection and sample processing
Subjects were recruited by the endocrinology and surgery departments at the University Hospital Joan XXIII (Tarragona, Spain) in accordance with the Helsinki declaration. Human visceral and subcutaneous AT samples were obtained during surgery from lean and obese male and female individuals. Total RNA was extracted from adipose tissue using the RNeasy lipid tissue midi kit (Qiagen Science). One microgram of RNA was reverse transcribed with random primers using the reverse transcription system (Applied Biosystems) .
Mouse AT was obtained from wild type and Irs2−/−  (insulin resistance and type 2 diabetes model) C57BL/6 littermates. According to the criteria outlined in the “Guide for the Care and Use of Laboratory Animals”, all animals received humane care . Total RNA was extracted from abdominal fat using a combined protocol including Trizol (Sigma) and RNeasy Mini Kit (Qiagen) with DNaseI Digestion. First-strand synthesis was performed using EcoDry Premix (Takara).
Gene expression analysis
Quantitative gene expression analysis was performed on 50 ng cDNA template. Real time-PCR was conducted in a LightCycler 480 Instrument IIR (Roche) using SYBR PreMix ExTaqTM (mi RNaseH Plus, Takara). Genes selected as potential HKG in human and mouse WAT were 18 s, PPIA and RPL19. Primers were designed in two consecutive exons, when possible, taking into consideration all reference sequences for mRNA in NCBI (https://www.ncbi.nlm.nih.gov/gene/; https://www.ncbi.nlm.nih.gov/nuccore/)  and aligned to search for common regions with Pairwise Sequence Alignment (https://www.ebi.ac.uk/Tools/psa/) . Alternative transcript variants were analyzed by AceView (https://www.ncbi.nlm.nih.gov/IEB/Research/Acembly/index.html)  and primers (designed either by Primer3 or PrimerBlast) amplifying most represented sequence/s were chosen (Additional file 1). All primers used in this study are noted in Additional file 2: Table S1. Crossing point (Cp) values were analyzed for stability between samples and relative quantification using 2^-ΔCt. Statistical analyses were performed with GraphPad Prism 8 (Graphpad Software V 8.0). The results are expressed as arithmetic mean ± the standard error of the mean (SEM). When two data sets were compared, a Student's t-test was used. The differences observed were considered significant when: p-value < 0.05 (*), p-value < 0.01 (**) and p-value < 0.001 (***).
A freely available web tool, called metafun-HKG (https://bioinfo.cipf.es/metafun-HKG) was created during this study to allow users to review and share the large volume of generated data and results. The front-end was developed using the Bootstrap library. This easy-to-use resource is organized into four sections: (1) a quick summary of the results obtained with the analysis pipeline in each phase. Then, for each of the studies, the detailed results of the (2) exploratory analysis and (3) variability assessment. Finally, all results are integrated and summarized in (4) gene stability meta-analysis by sex and organism. The user can interact with the web tool through graphics and tables and search information for specific genes.
Classic HKG selection
An extensive bibliographic review revealed that reference genes chosen for qRT-PCR-mediated analysis of gene expression in human AT or various types of adipocytes generally included the metabolic genes GAPDH [7, 14,15,16, 39, 44], HPRT [7, 16], PPIA [14, 39, 44], UBC [14, 45] and ribosomal genes 18S [7, 14, 16, 39, 46,47,48] and RPL19 . As these genes have been commonly used to analyze gene expression as reference genes in several experimental conditions (although the sex variable was generally not considered), we selected these six classic human AT HKG genes for evaluation when considering sex as a variable to assess their suitability as sex-unbiased HKG (suHKGs). In the case of 18S, we specifically selected 18S5 for our analysis.
Systematic review and data collection
We searched the GEO by defining the sample tissue, type of study, and organism of interest and obtained a total of 187 and 214 candidate studies for Homo sapiens (Hsa) and Mus musculus (Mmu), respectively. We selected the main microarray platforms for each species that contained the greatest number of studies; this provided 4 and 5 platforms for Hsa (Table 1) and Mmu (Table 2), respectively. We excluded 138 and 171 studies of Hsa and Mmu, respectively, as they failed to meet the inclusion criteria. Finally, we selected 49 Hsa studies and 43 Mmu studies for sex-based evaluations (Fig. 2), which involved 2724 Hsa and 1072 Mmu samples.
In Hsa, 24 (51%) of the 49 selected studies included sample information regarding sex. 10 studies covered both sexes in their analysis, while 11 included females exclusively, and 3 contained only male samples (Fig. 3A). In Mmu, 22 (51%) of the 43 selected studies informed about the sex of samples; only 1 study covered both sexes while 2 included exclusively female samples and 19 contained only male samples (Fig. 3B). Finally, we selected human samples with known sex information (681 male and 875 female samples, Additional file 2: Table S2 and Fig. S1) and all mouse samples (1072 samples, 559 known to be male and 34 from female, Additional file 2: Table S3 and Fig. S2) for analysis. Due to the low number of known female samples in mice, we excluded Mmu studies from this sex-based analysis.
Stability data meta-analysis
After downloading and annotating normalized expression data sets for the selected studies, we calculated three estimators of variability: the coefficient of variation (CV), the interquartile range divided by the median value (IQR/median), and the mean absolute deviation divided by the median value (MAD/median). Additional file 2: Fig. S3, S4, and S5 summarize the levels of variability of the six selected HAT HKGs (UBC, RPL19, RNA18S5, PPIA, HPRT1, and GAPDH) for male and female Hsa and Mmu.
We conducted a meta-analysis based on the Rank Product (RP) method to integrate statistical results from different platforms; this approach combines gene ranks rather than variability scores (creating platform independence) and identifies the elements that systematically occupy higher positions in ranked lists (giving to each element in the ranking an RP score). We calculated the RP score of 41,975 and 47,203 Hsa and Mmu genes, respectively, and then sorted all genes—in this ranking, lower positions indicate higher expression stability. 18S displayed significant variability in Hsa in both males and females; however, this gene represented the second most stable selected HKG in Mmu. Figure 4 depicts the positions occupied by the six selected HAT HKGs in Mmu, Hsa males, and Hsa females. Surprisingly, HKG stability in humans differed between female and male samples, with females displaying greater instability. Accessing the Metafun-HKG webtool provides the whole rankings with the positions and RP scores of all evaluated genes in each experimental condition.
In order to decipher differences in gene expression stability between male and female AT samples, we conducted a deconvolution analysis. Overall, we did not find consistent differences between the sexes in cellular composition across datasets (Additional file 3).
To select sex-unbiased, highly expressed, and stable human AT HKG candidates, we combined the scores of the three statistical approaches in a unique list of positions for each experimental condition (metaRanking) and filtered out genes with low expression (TPM < 20) in the GTEx database. These steps provided a list of 5,315 genes. We next intersected the top 10% (532) most stable genes in the Hsa male and Hsa female metaRankings separately, which resulted in a list of 195 candidate suHKGs (http://bioinfo.cipf.es/metafun-HKG/). This analysis revealed relative stability and expression values high enough for detection by different gene expression analysis technologies in total Hsa samples (Table 3, Fig. 5). From this list, we selected human AT HKGs that included the classical HKGs PPIA, UBC, RPL19, and RPS18 and the additional novel candidate suHKGs RPS8 and UBB. We also detected stable, highly expressed genes in one sex but not in the other (such genes included ANXA2, DDX39B, and PLIN4 in males and DNASE2, NDUFB11, and RARA in females (Additional file 2: Table S4, Fig. 5), which may be used as sex-specific reference genes. We failed to find the expression of the 18S gene in GTEx, although we searched for different aliases (RNA18S5, RNA18S1, RNA18SN1, RNA18SN5, RN18S1).
We selected PPIA, RPL19, and 18S for experimental validation according to our computational assessment of variability. We analyzed human AT mRNA from lean and obese male and female individuals by qPCR to validate the previous computational metadata analysis (Table 3; Fig. 6). Raw crossing point (Cp) value coefficient variation (CV) analysis revealed similar Cp values between male and female samples, with low CV values for PPIA and RPL19 (Fig. 6A); however, 18S exhibited significant differences in Cp values between male and female samples, which displayed high CV values (Fig. 6A). Further, gene expression analysis of multiple experimental targets revealed differing patterns when using PPIA or RPL19 compared to 18S as a HKG (Fig. 6B). We analyzed several genes involved in physiological and metabolic adipose tissue functions (e.g., IRS1, LEPR, and PPARγ) in male and female human AT samples under two different physiological conditions using potential suHKG candidates. Results obtained provided evidence for the suitability of RPL19 and PPIA as suHKGs and disqualified 18S as a HKG when considering sex as a variable (Fig. 6B). Overall, the experimental procedures validate the computational metadata analysis, discarding 18S and selecting PPIA and RPL19 as suHKG for HAT analysis.
To circumnavigate the lack of sex-based Mmu data to compute a Mmu metaRanking, we experimentally evaluated mouse orthologs (Ppia, Rpl19, and 18 s) of validated human suHKGs, in wt and in an insulin resistance, Irs2−/− ko model in male and females. Relative gene expression analysis demonstrated that the internal control affected the relative expression of different experimental targets in different experimental mouse models. 18 s used as HKG alters relative gene expression of InsR, Lepr, and Phb in males and females mouse AT samples, while Ppia and Rpl19 succeed as suHKG in mouse AT samples (Additional file 2: Fig. S6). These results confirm that mouse homologs of suHKG candidates can be used in mouse-based gene expression studies.
Metafun-HKG web tool
We created the open platform web tool Metafun-HKG (https://bioinfo.cipf.es/metafun-HKG) to allow easy access to any information related to this study. This resource contains information related to the study samples, systematic revision, gene variability scores, and stability rankings. The stability indicators for each gene evaluated by platform, species, and sex can be freely explored by users to identify profiles of interest.
Assessment of suHKG candidates
The two main objectives of this work were (i) evaluating the suitability of a group of six classic HKGs acting as human AT suHKGs and (ii) identifying genes with a stable, high expression profile that represent new Human AT suHKG candidates. Our novel strategy has reviewed the role of HKGs by considering sex, species, and platform as variables in evaluated studies.
We performed our analysis on three different sample groups based on sex and species: female Hsa, male Hsa, and all Mmu samples. We did not analyze Mmu female and male samples separately due to the lack of reported female Mmu samples in the selected studies. HKGs displayed platform-dependent variability under all conditions, given that each microarray platform has its probe design and technical protocol. Previous studies on technology dependence concluded that this factor has less determining power than the differences in transcript expression levels caused by varying cell conditions .
Results exhibit considerable differences in gene stability, including stability differences in the six classical selected HKGs between Hsa female and male samples showing higher instability in females in general term. PPIA, UBC, and RPL19 displayed high stability levels for samples from both sexes, while HPRT1 and 18S exhibited low stability levels in both sexes. Interestingly, GAPDH displayed high stability in male samples and low stability in female samples. In apparent contradiction, 18s presents high stability levels in Mmu, but this may be explained by the overwhelming presence of male samples in this group and the fact that this gene suffers a significant sex bias in mouse (Additional file 2: Fig. S6). The common absence of female samples in studies (as further evidenced by our systematic review) could explain the systematic reports of 18s as a stable HKG.
The results of this work showed a different pattern of instability of HKG expression and we wondered whether this might be related to a different distribution of cell types in males and females. To address this relevant question, a deconvolution analysis was performed in each study, which allowed us to compare all male and female participants in each microarray dataset. Deconvolution studies showed a heterogeneous cell panorama characteristic of human adipose biopsies including progenitor and differentiated adipocytes, and immune cells lineages among others. The joint evaluation of the results of all the studies showed that there were no differences in cell composition between males and females, so no relationship was identified between the patterns of instability in the expression of these genes and their cell type distribution (Additional file 3). The analysis of single-cell RNA-seq data in adipose tissue samples may provide complementary information of interest to evaluate these differential patterns by sex. There are currently few datasets of this technology, although its generation is increasing and will be an important and accurate source of information.
We propose a list of 195 suHKG candidates suitable for use as internal controls in HAT-based gene expression studies including male and female samples; these genes exhibit high expression (TPM > 20) and stability levels and a minimal influence of sex on expression patterns. As we could not reproduce the pipeline followed with human samples in mouse studies due to the lack of female mouse samples, we suggest the orthologs of proposed human suHKGs as mouse suHKGs.
We validated a selection of suHKG candidates experimentally to assess the robustness of our computational findings; overall, our gene expression analysis validated the in silico results (Table 3). PPIA, a widely used human AT HKG, and RPL19, used as a HKG in several cell types [30, 31, 50] and occasionally in human AT studies , have been validated as human AT suHKGs; however, experimental validation shows that 18S, which is widely used as human AT HKG [7, 14, 16, 39, 46,47,48], displays significant levels of variability in both male and female samples and sex-specific expression patterns (Fig. 6). These results agree with the findings of other recently published studies  and correlate with those found in mouse adipose tissue. The use of 18 s as a HKG induces apparent differences in the relative expression levels of several genes in males and females and wild type and Irs2−/− samples (Additional file 2: Fig. S6); instead, we suggest Rpl19 and Ppia as more optimal suHKGs in mouse adipose tissue analysis.
We identified several additional genes human AT suHKGs from the computational analysis, including RPS18, RPS8, and UBB (Table 3), that present characteristics such as appropriate stable and high expression levels. We also suggest the mouse orthologs of these human suHKGs as mouse suHKGs. To this end, we designed a web tool to customize the best suHKG for human or mouse adipose tissue experimental design.
Strengths and limitations
Massive data analysis of gene expression represents a pivotal tool for understanding different biological scenarios, which may eventually help elucidate mechanisms affecting basic and biomedical research. Data analyses must be assessed in the laboratory by studying relative gene expression normalized to an adequately chosen HKG. Selection of an ideal HKG remains a challenging process, although this choice will help to ensure an accurate result and must consider all experimental conditions and biological variables. Incorporating sex-based analyses into research will improve reproducibility and experimental efficiency by influencing the outcome of experiments and must be accounted for as a critical biological variable. Sex must be considered to monitor sex-based differences and similarities for all diseases and biological processes that affect both sexes, which may help reduce bias, enable social equality in scientific outcomes, and encourage new opportunities for discovery and innovation, as evidenced by several studies analyzing this issue [20, 22, 52,53,54,55].
Numerous lines of evidence suggest that the current status quo does not address fundamental issues of sex-based differences evident in gene expression. Up to date, many classic HKGs remain unevaluated when including sex as a biological variable; these include those commonly used in human AT studies (e.g., ACTB, GAPDH, and 18S) and additional HKGs such as PPIA, HPRT, RPS18, or RPL19. Using a HKG to normalize samples without assessing their behavior under the specific experimental conditions used in each study (including sex), may lead to a biased outcome. HKGs may remain stable in one sex but not in the other, as in the case of DDX39B and PLIN4 (stable in males) or NDUFB11 and RARA (stable in females), or may have stable yet distinct expression levels in both sexes, such as for 18 s in mouse. Ignoring sex and choosing a non-optimal HKG may introduce confounding variables and the inability to assess whether differences in the data derived from the experimental design or the normalization process. This source of variability in the data would reduce statistical power, thereby making it more difficult to find significant results. In this study, we analyzed the role of six conventional HAT HKG considering sex as a variable for the first time.
Many published studies do not include a sex-based perspective by omitting animal sex from reporting of the animals or performing studies with animals of only one sex (typically males). Our systematic review found that 51% of Hsa studies and 49% of Mmu studies failed to include information regarding the sex of samples, with just 19% of Hsa and a striking 2% of Mmu studies including samples from both sexes. Of note, Mmu studies including only female samples represented just 5% of the total. The small number of Mmu studies, including female sample information, represented a significant limitation of the study and prevented the creation of a Mmu meta-ranking to select highly expressed stable Mmu suHKG candidates as for Hsa. We evaluated the Mmu orthologs of the selected Hsa suHKG candidates experimentally to overcome this limitation, which confirmed their suitability as Mmu suHKGs.
Despite the widespread use of 18S RNA as a HKG, its annotation represents another limiting factor of this study; we failed to encounter this gene in the GTEx platform under any proposed alias from GeneCards. We also noted that identifiers for this gene are unstable or not included in reference assemblies. In addition, the DNA sequence of the RNA18SN5 gene (Accession Number NR_003286.4) has 99–100% identity with other ribosomal RNAs such as RNA18SN1, RNA18SN2, RNA18SN3, RNA18SN4, and RNA18SP3 (Accession Numbers NR_145820.1, NR_146146.1, NR_146152.1, NR_146119.1, NG_054871.1, respectively). Furthermore, 18S rRNA has different copy numbers among individuals and varies with age . Considering all these factors, and integrating experimental data assessing differential expression levels according to sex, makes the 18S gene less suitable as a HAT suHKG than other suHKGs proposed in this study.
Other limitations of the study included the filtering and pre-processing of biological information located in the GEO to identify the published studies with transcriptomic data of adipose tissue, and the classification of the samples depending on the sex. A primary limiting factor involved the absence of standardized vocabulary to tag sex in sample records of the studies. Even though the gene expression data in GEO are presented as a standardized expression matrix, the metadata (including sample source, tissue type, or sample sex) is reported through free-text fields written by the researcher submitting the study. The absence of standardized vocabulary and structured information constrains data mining power on large-scale data, and improvements in this regard could aid the processing of data in public repositories .
For the first time, this study presents a computational strategy that includes a massive data analysis capable to assess the sex bias in expression levels of classical and novel HKGs, over a large volume of studies and samples. This strategy revealed that an accurate experimental design for adipose tissue requires the adequate selection of a suHKG, such as PPIA, RPL19, or new options, such as RPS18 or UBB. In that context, we could finally avoid the common practice of pooling males and females or even discard the only male-presence effect. This study presents the relative expression stability of six commonly used HKGs and the variability levels of other genes covered by the analyzed microarray platforms. This strategy is aligned with the FAIR principles  (Findability, Accessibility, Interoperability, and Reusability) to ensure the further utility and reproducibility of the generated information.
Although limited to adipose tissue, our findings suggest that the sex bias in commonly used HKGs could appear in other tissues, thereby affecting the normalization process of gene expression analysis of any kind. Incorrect normalization may significantly alter gene expression data, as shown in the case of 18S, and lead to erroneous conclusions. This study highlights the importance of considering sex as a variable in biomedical studies and provides evidence that thorough analyses of HKGs as internal controls in all tissues should be promptly addressed.
Perspectives and significance
Our results focus on the importance of taking into consideration sex as a biological variable when choosing the best HKG as reference in HAT gene expression analysis. Our novel computational strategy includes massive data analysis capable to assess the sex bias in expression levels of classical and novel HKGs to select sex-unbiased HKG. Conventionally reported HKG genes include several metabolic and ribosomal genes such as GAPDH, HPRT, PPIA, UBC, 18S and RPL19. However, our novel computational strategy based on meta-analysis techniques has proven that certain classical HKGs, like one of the most extended, 18S, may fail to function adequately as the reference gene as it differentially expressed in males and females, while others like PPIA and RPL19, succeeded as reference genes. Further, following selection criteria, several markers, like RPS8 and UBB are also proposed and an open web resource (https://bioinfo.cipf.es/metafun-HKG) offered for customized experimental design.
All these results provide new useful insight in evaluating gene expression analysis in human adipose tissue under several experimental conditions and with biomedical purposes. Using an incorrect HKG may lead to inappropriate results interpretation and applications, while using a suHKG will always provide a better experimental approach, either when taking into consideration male and females as separate groups, either included in the same experimental group but properly analyzed. This study highlights the importance of considering sex as a variable in gene expression analyses in human AT and provides evidence for future extensive tissues suHKG selection to be hopefully, promptly addressed.
All datasets used in this work are available in the Gene Expression Omnibus public repository.
Chang CW, Cheng WC, Chen CR, et al. Identification of human housekeeping genes and tissue-selective genes by microarray meta-analysis. PLoS ONE. 2011;6(7): e22859. https://doi.org/10.1371/journal.pone.0022859.
Caracausi M, Piovesan A, Antonaros F, Strippoli P, Vitale L, Pelleri MC. Systematic identification of human housekeeping genes possibly useful as references in gene expression studies. Mol Med Rep. 2017;16(3):2397–410. https://doi.org/10.3892/mmr.2017.6944.
Eisenberg E, Levanon EY. Human housekeeping genes, revisited. Trends Genet. 2013;29(10):569–74. https://doi.org/10.1016/j.tig.2013.05.010.
Butte AJ, Dzau VJ, Glueck SB. Further defining housekeeping, or “maintenance,” genes focus on “a compendium of gene expression in normal human tissues.” Physiol Genom. 2001;7(2):95–6. https://doi.org/10.1152/physiolgenomics.2001.7.2.95.
Dheda K, Huggett JF, Bustin SA, Johnson MA, Rook G, Zumla A. Validation of housekeeping genes for normalizing RNA expression in real-time PCR. Biotechniques. 2004;37(1):112–9. https://doi.org/10.2144/04371RR03.
Bustin SA, Benes V, Garson J, et al. The need for transparency and good practices in the qPCR literature. Nat Methods. 2013;10(11):1063–7. https://doi.org/10.1038/nmeth.2697.
Chechi K, Gelinas Y, Mathieu P, Deshaies Y, Richard D. Validation of reference genes for the relative quantification of gene expression in human epicardial adipose tissue. PLoS ONE. 2012;7(4): e32265. https://doi.org/10.1371/journal.pone.0032265.
Stürzenbaum SR, Kille P. Control genes in quantitative molecular biological techniques: the variability of invariance. Comp Biochem Physiol B Biochem Mol Biol. 2001;130(3):281–9. https://doi.org/10.1016/S1096-4959(01)00440-7.
She X, Rohl CA, Castle JC, Kulkarni AV, Johnson JM, Chen R. Definition, conservation and epigenetics of housekeeping and tissue-enriched genes. BMC Genomics. 2009;10(1):269. https://doi.org/10.1186/1471-2164-10-269.
Zhu J, He F, Song S, Wang J, Yu J. How many human genes can be defined as housekeeping with current expression data? BMC Genomics. 2008;9(1):172. https://doi.org/10.1186/1471-2164-9-172.
Suzuki T, Higgins PJ, Crawford DR. Control selection for RNA quantitation. Biotechniques. 2000;29(2):332–7. https://doi.org/10.2144/00292rv02.
Hsiao LL, Dangond F, Yoshida T, et al. A compendium of gene expression in normal human tissues. Physiol Genomics. 2001;7(2):97–104. https://doi.org/10.1152/physiolgenomics.00040.2001.
de Jonge HJM, Fehrmann RSN, de Bont ESJM, et al. Evidence based selection of housekeeping genes. PLoS ONE. 2007;2(9): e898. https://doi.org/10.1371/journal.pone.0000898.
Gabrielsson BG, Olofsson LE, Sjögren A, et al. Evaluation of reference genes for studies of gene expression in human adipose tissue. Obes Res. 2005;13(4):649–52. https://doi.org/10.1038/oby.2005.72.
Heo JS, Choi Y, Kim HS, Kim HO. Comparison of molecular profiles of human mesenchymal stem cells derived from bone marrow, umbilical cord blood, placenta and adipose tissue. Int J Mol Med. 2016;37(1):115–25. https://doi.org/10.3892/ijmm.2015.2413.
White JM, Piron MJ, Rangaraj VR, Hanlon EC, Cohen RN, Brady MJ. Reference gene optimization for circadian gene expression analysis in human adipose tissue. J Biol Rhythms. 2020;35(1):84–97. https://doi.org/10.1177/0748730419883043.
Karp NA, Mason J, Beaudet AL, et al. Prevalence of sexual dimorphism in mammalian phenotypic traits. Nat Commun. 2017;8(1):15475. https://doi.org/10.1038/ncomms15475.
Oliva M, Muñoz-Aguirre M, Kim-Hellmuth S, et al. The impact of sex on gene expression across human tissues. Science. 2020. https://doi.org/10.1126/science.aba3066.
Yoon DY, Mansukhani NA, Stubbs VC, Helenowski IB, Woodruff TK, Kibbe MR. Sex bias exists in basic science and translational surgical research. Surgery. 2014;156(3):508–16. https://doi.org/10.1016/j.surg.2014.07.001.
Tannenbaum C, Ellis RP, Eyssel F, Zou J, Schiebinger L. Sex and gender analysis improves science and engineering. Nature. 2019;575(7781):137–46. https://doi.org/10.1038/s41586-019-1657-6.
Woitowich NC, Beery A, Woodruff T. A 10-year follow-up study of sex inclusion in the biological sciences. Elife. 2020;9: e56344. https://doi.org/10.7554/eLife.56344.
McCullough LD, de Vries GJ, Miller VM, Becker JB, Sandberg K, McCarthy MM. NIH initiative to balance sex of animals in preclinical studies: generative questions to guide policy, implementation, and metrics. Biol Sex Differ. 2014;5:15. https://doi.org/10.1186/s13293-014-0015-5.
Nature. Accounting for sex and gender makes for better science. Nature. 2020;588(7837):196–196. https://doi.org/10.1038/d41586-020-03459-y.
Lee PD, Sladek R, Greenwood CMT, Hudson TJ. Control genes and variability: absence of ubiquitous reference transcripts in diverse mammalian expression studies. Genome Res. 2002;12(2):292–7. https://doi.org/10.1101/gr.217802.
Lee SR, Jo MJ, Lee JE, Koh SS, Kim SY. Identification of novel universal housekeeping genes by statistical analysis of microarray data. BMB Rep. 2007;40(2):226–31. https://doi.org/10.5483/BMBRep.2007.40.2.226.
Popovici V, Goldstein DR, Antonov J, Jaggi R, Delorenzi M, Wirapati P. Selecting control genes for RT-QPCR using public microarray data. BMC Bioinformatics. 2009;10(1):42. https://doi.org/10.1186/1471-2105-10-42.
Pilbrow AP, Ellmers LJ, Black MA, et al. Genomic selection of reference genes for real-time PCR in human myocardium. BMC Med Genomics. 2008;1(1):64. https://doi.org/10.1186/1755-8794-1-64.
Zhang Y, Li D, Sun B. Do housekeeping genes exist? PLoS ONE. 2015. https://doi.org/10.1371/journal.pone.0123691.
Home—GEO—NCBI. https://www.ncbi.nlm.nih.gov/geo/. Accessed 1 Dec 2021.
Galan A, Diaz-Gimeno P, Poo ME, et al. Defining the genomic signature of totipotency and pluripotency during early human development. PLoS ONE. 2013;8(4): e62135. https://doi.org/10.1371/journal.pone.0062135.
Galán A, Simón C. Monitoring stemness in long-term hESC cultures by real-time PCR. In: Turksen K, editor. Human embryonic stem cell protocols. Methods in molecular biology. Totowa: Humana Press; 2010. p. 135–50. https://doi.org/10.1007/978-1-60761-369-5_8.
Team R. Core. R: A language and environment for statistical computing. 2013.:16.
Zhu Y, Davis S, Stephens R, Meltzer PS, Chen Y. GEOmetadb: powerful alternative search engine for the gene expression omnibus. Bioinformatics. 2008;24(23):2798–800. https://doi.org/10.1093/bioinformatics/btn520.
Arachchige C, Prendergast L, Staudte R. Robust analogs to the coefficient of variation. J Appl Stat. 2020. https://doi.org/10.1080/02664763.2020.1808599.
Breitling R, Armengaud P, Amtmann A, Herzyk P. Rank products: a simple, yet powerful, new method to detect differentially regulated genes in replicated microarray experiments. FEBS Lett. 2004;573(1–3):83–92. https://doi.org/10.1016/j.febslet.2004.07.055.
Mitchell L. A parallel implementation of the Rank Product method for R. 2011; 12.
Hong F, Breitling R, McEntee CW, Wittner BS, Nemhauser JL, Chory J. RankProd: a bioconductor package for detecting differentially expressed genes in meta-analysis. Bioinformatics. 2006;22(22):2825–7. https://doi.org/10.1093/bioinformatics/btl476.
Del Carratore F, Jankevics A, Eisinga R, Heskes T, Hong F, Breitling R. RankProd 2.0: a refactored bioconductor package for detecting differentially expressed features in molecular profiling datasets. Bioinformatics. 2017;33(17):2774–5. https://doi.org/10.1093/bioinformatics/btx292.
Ceperuelo-Mallafré V, Duran X, Pachón G, et al. Disruption of GIP/GIPR axis in human adipose tissue is linked to obesity and insulin resistance. J Clin Endocrinol Metab. 2014;99(5):E908-919. https://doi.org/10.1210/jc.2013-3350.
Withers DJ, Gutierrez JS, Towery H, et al. Disruption of IRS-2 causes type 2 diabetes in mice. Nature. 1998;391(6670):900–4. https://doi.org/10.1038/36116.
Sayers EW, Bolton EE, Brister JR, et al. Database resources of the national center for biotechnology information. Nucleic Acids Res. 2022;50(D1):D20–6. https://doi.org/10.1093/nar/gkab1112.
Needleman SB, Wunsch CD. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol. 1970;48(3):443–53. https://doi.org/10.1016/0022-2836(70)90057-4.
Thierry-Mieg D, Thierry-Mieg J. AceView: a comprehensive cDNA-supported gene and transcripts annotation. Genome Biol. 2006;7(1):S12. https://doi.org/10.1186/gb-2006-7-s1-s12.
Amisten S. Quantification of the mRNA expression of G protein-coupled receptors in human adipose tissue. Methods Cell Biol. 2016;132:73–105. https://doi.org/10.1016/bs.mcb.2015.10.004.
Su X, Yao X, Sun Z, Han Q, Zhao RC. Optimization of reference genes for normalization of reverse transcription quantitative real-time polymerase chain reaction results in senescence study of mesenchymal stem cells. Stem Cells Dev. 2016;25(18):1355–65. https://doi.org/10.1089/scd.2016.0031.
Ebrahimi R, Bahiraee A, Alipour NJ, Toolabi K, Emamgholipour S. Evaluation of the housekeeping genes; β-actin, glyceraldehyde-3-phosphate-dehydrogenase, and 18S rRNA for normalization in real-time polymerase chain reaction analysis of gene expression in human adipose tissue. Arch Med Lab Sci. 2018. https://doi.org/10.22037/amls.v4i3.26269.
Gómez-Abellán P, Díez-Noguera A, Madrid JA, Luján JA, Ordovás JM, Garaulet M. Glucocorticoids affect 24 h clock genes expression in human adipose tissue explant cultures. PLoS ONE. 2012;7(12): e50435. https://doi.org/10.1371/journal.pone.0050435.
Petrus P, Mejhert N, Corrales P, et al. Transforming growth factor-β3 regulates adipocyte number in subcutaneous white adipose tissue. Cell Rep. 2018;25(3):551-560.e5. https://doi.org/10.1016/j.celrep.2018.09.069.
Catalano-Iniesta L, Sánchez Robledo V, Iglesias-Osma MC, et al. Evidences for expression and location of ANGPTL8 in human adipose tissue. J Clin Med. 2020;9(2):512. https://doi.org/10.3390/jcm9020512.
Manzano-Núñez F, Arámbul-Anthony MJ, Albiñana AG, et al. Insulin resistance disrupts epithelial repair and niche-progenitor Fgf signaling during chronic liver injury. PLOS Biol. 2019;17(1): e2006972. https://doi.org/10.1371/journal.pbio.2006972.
Cherubini A, Rusconi F, Lazzari L. Identification of the best housekeeping gene for RT-qPCR analysis of human pancreatic organoids. PLoS ONE. 2021;16(12): e0260902. https://doi.org/10.1371/journal.pone.0260902.
López-Cerdán A, Andreu Z, Hidalgo MR, et al. Unveiling sex-based differences in Parkinson’s disease: a comprehensive meta-analysis of transcriptomic studies. Biol Sex Differ. 2022;13:68. https://doi.org/10.1186/s13293-022-00477-5.
Català-Senent JF, Hidalgo MR, Berenguer M, et al. Hepatic steatosis and steatohepatitis: a functional meta-analysis of sex-based differences in transcriptomic studies. Biol Sex Differ. 2021;12(1):29. https://doi.org/10.1186/s13293-021-00368-1.
Pérez-Díez I, Hidalgo MR, Malmierca-Merlo P, et al. Functional signatures in non-small-cell lung cancer: a systematic review and meta-analysis of sex-based differences in transcriptomic studies. Cancers. 2021;13(1):143. https://doi.org/10.3390/cancers13010143.
Casanova Ferrer F, Pascual M, Hidalgo MR, Malmierca-Merlo P, Guerri C, García-García F. Unveiling sex-based differences in the effects of alcohol abuse: a comprehensive functional meta-analysis of transcriptomic studies. Genes (Basel). 2020;11(9):1106. https://doi.org/10.3390/genes11091106.
Hall AN, Turner TN, Queitsch C. Thousands of high-quality sequencing samples fail to show meaningful correlation between 5S and 45S ribosomal DNA arrays in humans. Sci Rep. 2021;11(1):449. https://doi.org/10.1038/s41598-020-80049-y.
Wang Z, Lachmann A, Ma’ayan A. Mining data and metadata from the gene expression omnibus. Biophys Rev. 2019;11(1):103–10. https://doi.org/10.1007/s12551-018-0490-8.
The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data. https://www.nature.com/articles/sdata201618. Accessed 1 Dec 2021.
The authors thank the Principe Felipe Research Center (CIPF) for providing access to the computer cluster. Part of the equipment employed in this work has been funded by Generalitat Valenciana and co-financed with ERDF funds (OP ERDF of Comunitat Valenciana 2014-2020). The authors also thank Stuart P. Atkinson for reviewing the manuscript.
This research was supported by and partially funded by the Institute of Health Carlos III (project IMPaCT-Data, exp. IMP/00019), co-funded by the European Union, European Regional Development Fund (ERDF, “A way to make Europe”), PID2021-124430OA-I00 funded by MCIN/AEI//501100011033/FEDER, UE (“A way to make Europe”), and SAF2017-84708-R grants. M.G. is the recipient of ACIF/2021/196 predoctoral fellowship.
Ethics approval and consent to participate
The study was conducted according to the guidelines of the Declaration of Helsinki, and approved by the Ethics Committee of IISPV (Tarragona, Spain). Informed consent was obtained from all subjects involved in the study. Subjects were recruited by the Endocrinology and Surgery departments at the University Hospital Joan XXIII. The samples were stored in a tissue biobank registered at the National Register of Biobanks (Registration number #C.0003609). Committee Reference Code: CEIm 34p/2016.
Consent for publication
The authors declare no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
HKG primers design process.
List of primers used for the experimental validation. Table S2. Human sample information. Table S3. Mouse sample information. Table S4. Selection of candidate sex-specific HKGs in gene expression analysis. Fig. S1. Summary of the number of female and male samples found in each Hsa study. Fig. S2. Summary of the number of female and male samples found in each Hsa study. Fig. S3. Variability levels for classic HKGs evaluated in Hsa females. Fig. S4. Variability levels for classic HKGs evaluated in Hsa males. Fig. S5. Variability levels for classic HKGs evaluated in all Mmu samples. Fig. S6. Candidate HKG analysis in mouse adipose tissue, using wt and irs2-/- KO male and female samples.
Methods and results of deconvolution analysis.
About this article
Cite this article
Guaita-Cespedes, M., Grillo-Risco, R., Hidalgo, M.R. et al. Deciphering the sex bias in housekeeping gene expression in adipose tissue: a comprehensive meta-analysis of transcriptomic studies. Biol Sex Differ 14, 20 (2023). https://doi.org/10.1186/s13293-023-00506-x