Gene ID Conversion

When comparing gene sets from different sources, genes often fail to match because the same gene can appear under several aliases. To make downstream analysis reliable, we first convert gene names to standardized gene IDs (FlyBase IDs in this example). Here, we use the org.Dm.eg.db annotation database for Drosophila melanogaster together with the AnnotationDbi package, which can list every gene name variant recorded for a given gene ID.

Additionally, we account for case sensitivity by converting gene names to a consistent letter case, ensuring that capitalization differences do not lead to mismatches.
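To see why the case conversion matters, here is a minimal sketch in base R. It uses a small made-up alias table (`alias_table`, with hypothetical IDs and aliases) rather than the real database, and compares a case-sensitive lookup with a case-insensitive one.

```r
# Hypothetical toy alias table; the IDs and aliases are made up for illustration.
alias_table <- data.frame(
  gene_id = c("ID0001", "ID0001", "ID0002"),
  alias   = c("GeneA", "CG0001", "geneB"),
  stringsAsFactors = FALSE
)

# Gene names as they might arrive from different sources, with inconsistent case.
query <- c("genea", "GENEB", "cg0001", "unknownGene")

# Case-sensitive lookup: only exact spellings match.
strict_hits <- alias_table$gene_id[match(query, alias_table$alias)]

# Case-insensitive lookup: lower-case both sides before matching.
loose_hits <- alias_table$gene_id[match(tolower(query), tolower(alias_table$alias))]

# Side-by-side comparison of the two lookups.
data.frame(query, strict_hits, loose_hits)
```

In this toy case the strict lookup finds nothing, while the case-insensitive lookup resolves three of the four names and leaves the unknown one as `NA`.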

Input Data

# ------
gene <- c("Pepck", "CR44430", "CG12163", "CR43855", "CR34335",
          "CR40469", "CG18815", "CG32138", "CG5059", "CG9821",
          "Hsromega", "CG11337", "hipk", "CR42862", "CG32103",
          "CG9836", "CG11486", "Ald", "CG11138", "CG8177",
          "roX2", "CG9911", "CG13624", "CG10082", "roX1")

Gene ID Conversion

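library(org.Dm.eg.db)

# All gene symbols available in the annotation database
gene_name <- AnnotationDbi::keys(org.Dm.eg.db, keytype = "SYMBOL")

# For every symbol, retrieve its FlyBase ID and all recorded aliases
gene2id <- AnnotationDbi::select(org.Dm.eg.db,
                                 keys = gene_name,
                                 columns = c("FLYBASE", "ALIAS"),
                                 keytype = "SYMBOL")

# Number of input gene names that match an alias, ignoring case
length(intersect(tolower(gene2id$ALIAS), tolower(gene)))

# Keep the matching rows, then keep only the first row for each alias
target_gene <- gene2id[tolower(gene2id$ALIAS) %in% tolower(gene), ]
target_gene <- target_gene[!duplicated(tolower(target_gene$ALIAS)), ]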

target_gene
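Because `!duplicated()` keeps only the first row for each alias, it may be worth checking two things before trusting the result: which input names found no alias at all, and which aliases point to more than one gene ID. The following is a rough sketch of such a check, not part of the original workflow. It runs on a made-up stand-in table (`gene2id_toy`) and input vector (`gene_toy`), both hypothetical, so that it works without the database; the same logic should carry over to `gene2id` and `gene`.

```r
# Hypothetical stand-in for `gene2id`; the symbols, IDs, and aliases are made up.
gene2id_toy <- data.frame(
  SYMBOL  = c("geneA", "geneA", "geneB", "geneC"),
  FLYBASE = c("FB0001", "FB0001", "FB0002", "FB0003"),
  ALIAS   = c("geneA", "shared", "Shared", "geneC"),
  stringsAsFactors = FALSE
)
# Hypothetical stand-in for the input vector `gene`.
gene_toy <- c("GENEA", "shared", "notInDb")

alias_lc <- tolower(gene2id_toy$ALIAS)
gene_lc  <- tolower(gene_toy)

# Input names with no alias match at all.
unmatched <- gene_toy[!gene_lc %in% alias_lc]

# Count the distinct FlyBase IDs behind each matched alias (case-insensitive).
hits <- gene2id_toy[alias_lc %in% gene_lc, ]
ids_per_alias <- tapply(hits$FLYBASE, tolower(hits$ALIAS),
                        function(x) length(unique(x)))

# Aliases that map to more than one gene ID and so need manual review.
ambiguous <- names(ids_per_alias)[ids_per_alias > 1]

unmatched
ambiguous
```

Any name listed in `ambiguous` would be resolved arbitrarily by the deduplication step, so those cases are better settled by hand.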

output


https://www.lianganmin.cn/2025/05/11/20250511-Gene-ID-Conversion/

Author

An-min

Posted on

May 11, 2025
