Gene ID Conversion


When comparing gene sets from different sources, genes often appear to be missing from one set simply because the same gene is recorded under different aliases. To make downstream analysis reliable, we first convert gene names to standardized gene IDs. Here we use the org.Dm.eg.db annotation package (Drosophila melanogaster) together with AnnotationDbi, which lets us list every name variant associated with a given gene ID.

Additionally, we account for case sensitivity by converting gene names to a consistent letter case, ensuring that capitalization differences do not lead to mismatches.
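The case-insensitive lookup described above can be sketched in base R as follows. This is only an illustration: the alias table and its IDs (`ID_A`, `ID_B`) are made-up placeholders, not real FlyBase records, and the query names are invented for the example.

```r
# Toy alias table: placeholder IDs and names, NOT real FlyBase records.
alias_table <- data.frame(
  FLYBASE = c("ID_A", "ID_A", "ID_B", "ID_B"),
  ALIAS   = c("GeneA", "aliasA1", "geneB", "aliasB2"),
  stringsAsFactors = FALSE
)

# Query names that differ from the table in capitalization or alias used.
query <- c("genea", "ALIASB2", "unknownGene")

# Lower-case both sides before matching; match() returns the row index of
# the first hit, or NA when the name is not found.
hit <- match(tolower(query), tolower(alias_table$ALIAS))

result <- data.frame(query = query,
                     FLYBASE = alias_table$FLYBASE[hit],
                     stringsAsFactors = FALSE)
print(result)
```

Names without any alias hit keep an `NA` ID, which makes them easy to spot before continuing with the analysis.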


The input data

# Input: gene names from mixed sources (symbols and CG/CR annotation IDs)
gene <- c("Pepck", "CR44430", "CG12163", "CR43855", "CR34335",
          "CR40469", "CG18815", "CG32138", "CG5059", "CG9821",
          "Hsromega", "CG11337", "hipk", "CR42862", "CG32103",
          "CG9836", "CG11486", "Ald", "CG11138", "CG8177",
          "roX2", "CG9911", "CG13624", "CG10082", "roX1")


Gene ID Conversion

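library(org.Dm.eg.db)

# Build a lookup table: every gene symbol with its FlyBase ID and aliases
gene_name <- AnnotationDbi::keys(org.Dm.eg.db, keytype = "SYMBOL")
gene2id <- AnnotationDbi::select(org.Dm.eg.db,
                                 keys = gene_name,
                                 columns = c("FLYBASE", "ALIAS"),
                                 keytype = "SYMBOL")

# Number of input genes that match an alias, ignoring case
length(intersect(tolower(gene2id$ALIAS), tolower(gene)))

# Keep the matching rows, then only the first row for each alias
target_gene <- gene2id[tolower(gene2id$ALIAS) %in% tolower(gene), ]
target_gene <- target_gene[!duplicated(tolower(target_gene$ALIAS)), ]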


target_gene
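Because the filtering step keeps only the first row per alias, it can be useful to check which input names found no match and which aliases point to more than one ID. The sketch below is one possible way to do that in base R; `check_alias_matches` is a hypothetical helper name, and the demo table uses made-up placeholder IDs rather than real FlyBase records. It assumes the table has `FLYBASE` and `ALIAS` columns, as `gene2id` does.

```r
# Hypothetical helper: report query names with no alias hit, and aliases
# (compared in lower case) that map to more than one distinct ID.
check_alias_matches <- function(alias_table, query) {
  key <- tolower(alias_table$ALIAS)
  q   <- tolower(query)

  # Input names that never appear among the aliases
  unmatched <- query[!q %in% key]

  # Distinct (alias, ID) pairs among the matching rows
  hits  <- alias_table[key %in% q, , drop = FALSE]
  pairs <- unique(data.frame(alias = tolower(hits$ALIAS),
                             id    = hits$FLYBASE,
                             stringsAsFactors = FALSE))

  # An alias listed in more than one distinct pair is ambiguous
  ambiguous <- unique(pairs$alias[duplicated(pairs$alias)])

  list(unmatched = unmatched, ambiguous = ambiguous)
}

# Demo on a toy table: placeholder IDs, NOT real FlyBase records.
toy_table <- data.frame(
  FLYBASE = c("ID_A", "ID_A", "ID_B", "ID_C"),
  ALIAS   = c("GeneA", "shared", "geneB", "Shared"),
  stringsAsFactors = FALSE
)
report <- check_alias_matches(toy_table, c("genea", "SHARED", "missing"))
print(report)
```

With the table built above, the equivalent call would be `check_alias_matches(gene2id, gene)`.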

Output


https://www.lianganmin.cn/2025/05/11/20250511-Gene-ID-Conversion/
Author: An-min
Posted on: May 11, 2025