
About pangenome analysis

A post written (for my own understanding).

 

Source: Wikipedia (pan-genome)

 

❗A few concepts needed for pangenome analysis

 

✔️COGs: Clusters of Orthologous Groups of proteins

 - The COG database was created as an attempt to phylogenetically classify the proteins encoded in complete genomes.

 

 

✔️PGfams: Cross-genus families

 - Cross-genus protein families are computed by clustering representative proteins.

 - The representative proteins are taken from the genus-specific families and clustered with a loose criterion (MCL inflation = 1.1).

 - This allows cross-genus or distant homologs to cluster together.

 - Used for the phylogenetic trees drawn on bv-brc.org.
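
To get a feel for what the MCL inflation value controls, here is a toy sketch of Markov clustering in plain Python. It is only an illustration under my own assumptions: the function names, the 6-node similarity graphs, the fixed iteration count, and the `1e-6` cutoff are all hypothetical, and this is not the code that bv-brc.org runs. Expansion (squaring the matrix) spreads flow through the graph, and inflation (raising entries to a power, then renormalizing) sharpens it; a lower inflation generally gives coarser clusters.

```python
def normalize_columns(m):
    """Rescale every column so it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain square-matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(adjacency, inflation=2.0, iterations=50):
    """Toy Markov clustering on a symmetric similarity matrix.

    Returns a sorted list of clusters (each a sorted list of node indices).
    The iteration count is fixed for simplicity; real tools test convergence.
    """
    n = len(adjacency)
    # Add self-loops, then turn similarities into transition probabilities.
    m = [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        m = matmul(m, m)                                   # expansion
        m = normalize_columns([[x ** inflation for x in row] for row in m])  # inflation
    # Each row that still holds flow acts as an attractor; its non-zero
    # columns are the members of that cluster.
    clusters = set()
    for row in m:
        members = frozenset(j for j, x in enumerate(row) if x > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Two separate triangles: nodes 0-2 and nodes 3-5 (hypothetical proteins).
two_triangles = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(mcl(two_triangles, inflation=1.1))  # [[0, 1, 2], [3, 4, 5]]

# Same graph with a weak link between node 2 and node 3.
bridged = [row[:] for row in two_triangles]
bridged[2][3] = bridged[3][2] = 0.1
for inflation in (1.1, 3.0):
    print(inflation, mcl(bridged, inflation=inflation))
```

The disconnected triangles can never merge, so they come out as two clusters whatever the inflation. The bridged graph is the one to play with: compare what the two inflation values print.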

 

 

✔️ SCG: Single-copy core gene

 - A gene that is found in the vast majority of genomes and yet occurs only once within a single genome.

- Single-copy core genes play a central role in phylogenetics.

- Commonly used SCGs can be identified across a set of genomes through sequence homology searches (via BLAST or HMMs).

- SCGs can also be identified de novo through pangenomics for relatively closely related genomes.

- The number of SCGs decreases as the taxonomic resolution becomes coarser (fewer genes are single-copy and shared across a whole phylum than across a single species).
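
The SCG definition above can be sketched in a few lines of Python. This is a simplified illustration, not how any particular tool does it: the genome names, the gene lists, and the `min_fraction` threshold (my stand-in for "the vast majority of genomes") are all made up, and I assume genes have already been grouped into families by a homology search or by pangenomics.

```python
from collections import Counter


def single_copy_core_genes(genomes, min_fraction=0.9):
    """Return gene families present exactly once in at least
    `min_fraction` of the genomes.

    genomes: dict of genome name -> list of gene-family IDs,
             one entry per gene copy found in that genome.
    """
    if not genomes:
        return []
    counts = {name: Counter(genes) for name, genes in genomes.items()}
    families = set().union(*(c.keys() for c in counts.values()))
    needed = min_fraction * len(genomes)
    scgs = []
    for family in sorted(families):
        # Number of genomes that carry exactly one copy of this family.
        single = sum(1 for c in counts.values() if c[family] == 1)
        if single >= needed:
            scgs.append(family)
    return scgs


# Hypothetical annotations for three genomes.
genomes = {
    "genome_A": ["rpoB", "gyrA", "recA", "tnpA", "tnpA"],
    "genome_B": ["rpoB", "gyrA", "recA", "tnpA"],
    "genome_C": ["rpoB", "gyrA", "gyrA", "recA"],
}

print(single_copy_core_genes(genomes, min_fraction=1.0))  # ['recA', 'rpoB']
print(single_copy_core_genes(genomes, min_fraction=0.6))  # ['gyrA', 'recA', 'rpoB']
```

Here `gyrA` drops out at the strict threshold because genome_C has two copies, and `tnpA` is never core because it is duplicated in one genome and missing from another.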

 

 

✔️ HMMs: Hidden Markov Models

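 - A probabilistic model of a sequence: each observation is emitted by a hidden state, and the next state depends only on the current state, so the model can predict (describe) what comes next or which states best explain the observed sequence.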

 - HMMs are widely used for many forms of sequence analysis, such as database searches, gene prediction, and solving pairwise and multiple sequence alignment problems.

 - HMMs have advantages for solving the homology detection problem.

 - In anvi'o, HMMs are used for 16S rRNA profiling, Bacteria_71 profiling, Protista_83 profiling, and so on.
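
To make the hidden-state idea concrete, here is a minimal Viterbi sketch in Python that finds the most likely state path for a DNA string. Everything in it is a toy assumption of mine: the two states ("AT" for AT-rich, "GC" for GC-rich) and all the probabilities are invented, and this is far simpler than the profile HMMs that anvi'o runs for its gene collections.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state path for `observations`.

    Works in log space to avoid underflow. Assumes `observations` is
    non-empty and every probability is non-zero (log(0) would fail).
    """
    # best[t][s]: log-probability of the best path that ends in state s at t.
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
             for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            # Pick the predecessor state that gives the best score for s.
            prev, score = max(
                ((p, best[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + math.log(emit_p[s][observations[t]])
            back[t][s] = prev
    # Trace back from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


# Toy model with invented numbers.
states = ("AT", "GC")
start_p = {"AT": 0.5, "GC": 0.5}
trans_p = {
    "AT": {"AT": 0.9, "GC": 0.1},
    "GC": {"AT": 0.1, "GC": 0.9},
}
emit_p = {
    "AT": {"A": 0.4, "T": 0.4, "C": 0.1, "G": 0.1},
    "GC": {"A": 0.1, "T": 0.1, "C": 0.4, "G": 0.4},
}

print(viterbi("AATTGGCC", states, start_p, trans_p, emit_p))
# ['AT', 'AT', 'AT', 'AT', 'GC', 'GC', 'GC', 'GC']
```

The model only sees the bases; the AT-rich and GC-rich labels are the hidden part that Viterbi recovers. Homology detection with HMMs works on the same principle, with states that stand for positions in a protein family alignment instead of base composition.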
