scgpt.scbank package

scgpt.scbank.data

class scgpt.scbank.data.DataTable(name: str, data: Dataset | None = None)[source]

Bases: object

The data structure for a single-cell data table.

data: Dataset | None = None
property is_loaded: bool
name: str
save(path: Path | str, format: Literal['json', 'parquet'] = 'json') None[source]

class scgpt.scbank.data.MetaInfo(on_disk_path: Path | str | None = None, on_disk_format: Literal['json', 'parquet'] = 'json', main_table_key: str | None = None, gene_vocab_md5: str | None = None, study_ids: List[int] | None = None, cell_ids: List[int] | None = None)[source]

Bases: object

The data structure for meta info of a scBank data directory.

cell_ids: List[int] | None = None
classmethod from_path(path: Path | str) Self[source]

Create a MetaInfo object from a path.

gene_vocab_md5: str | None = None
load(path: Path | str | None = None) None[source]

Load meta info from path. If path is None, loads from on_disk_path.

main_table_key: str | None = None
on_disk_format: Literal['json', 'parquet'] = 'json'
on_disk_path: Path | str | None = None
save(path: Path | str | None = None) None[source]

Save meta info to path. If path is None, saves to on_disk_path.

study_ids: List[int] | None = None
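
The save/load round trip described above can be sketched with the standard library alone. This is a minimal illustration, not the scGPT implementation: MetaInfoSketch is a hypothetical stand-in that mirrors the documented fields, and storing them as a flat JSON object in manifest.json is an assumption about the on-disk layout.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import List, Optional, Union


@dataclass
class MetaInfoSketch:
    """Hypothetical stand-in mirroring the documented MetaInfo fields."""

    on_disk_path: Union[Path, str, None] = None
    on_disk_format: str = "json"
    main_table_key: Optional[str] = None
    gene_vocab_md5: Optional[str] = None
    study_ids: Optional[List[int]] = None
    cell_ids: Optional[List[int]] = None

    def _resolve(self, path: Union[Path, str, None]) -> Path:
        # An explicit path wins; otherwise fall back to on_disk_path.
        chosen = path if path is not None else self.on_disk_path
        if chosen is None:
            raise ValueError("no path given and on_disk_path is unset")
        return Path(chosen)

    def save(self, path: Union[Path, str, None] = None) -> None:
        target = self._resolve(path)
        target.mkdir(parents=True, exist_ok=True)
        record = asdict(self)
        record["on_disk_path"] = str(target)  # Path objects are not JSON-serializable
        # Assumed layout: one flat JSON object in manifest.json.
        (target / "manifest.json").write_text(json.dumps(record, indent=2))

    def load(self, path: Union[Path, str, None] = None) -> None:
        source = self._resolve(path)
        record = json.loads((source / "manifest.json").read_text())
        for key, value in record.items():
            setattr(self, key, value)

    @classmethod
    def from_path(cls, path: Union[Path, str]) -> "MetaInfoSketch":
        info = cls(on_disk_path=path)
        info.load()
        return info


with tempfile.TemporaryDirectory() as tmp:
    original = MetaInfoSketch(on_disk_path=tmp, main_table_key="X", study_ids=[0, 1])
    original.save()                           # path is None, so on_disk_path is used
    restored = MetaInfoSketch.from_path(tmp)  # reads the same manifest back
```

After the round trip, restored carries the same main_table_key and study_ids as original.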

scgpt.scbank.databank

class scgpt.scbank.databank.DataBank(meta_info: MetaInfo | None = None, data_tables: Dict[str, DataTable] = <factory>, gene_vocab: dataclasses.InitVar[GeneVocab] = <property object>, settings: Setting = <factory>)[source]

Bases: object

The data structure for large-scale single cell data containing multiple studies. See https://github.com/subercui/scGPT-release#the-data-structure-for-large-scale-computing.

append_study(study_id: int, study_data: AnnData | DataBank) None[source]

Append a study to the current DataBank.

Parameters:
  • study_id (int) – Study ID.

  • study_data (AnnData or DataBank) – Study data.

classmethod batch_from_anndata(adata: List[AnnData], to: Path | str) Self[source]

custom_filter(field: str, filter_func: callable, inplace: bool = True) Self[source]

Filter the current DataBank by applying a custom filter function to a field.

Parameters:
  • field (str) – Field to filter.

  • filter_func (callable) – Filter function.

  • inplace (bool) – Whether to also apply the filter to the current DataBank in place.

Returns:

Filtered DataBank.

Return type:

DataBank
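
The field-plus-callable pattern can be illustrated on plain Python rows. This is only a sketch: TinyBank is a hypothetical miniature rather than the real DataBank, and it assumes that inplace=False returns a filtered copy and leaves the original untouched.

```python
import copy
from typing import Any, Callable, Dict, List


class TinyBank:
    """Hypothetical miniature holding rows as dicts; not the real DataBank."""

    def __init__(self, rows: List[Dict[str, Any]]) -> None:
        self.rows = rows

    def custom_filter(
        self, field: str, filter_func: Callable[[Any], bool], inplace: bool = True
    ) -> "TinyBank":
        # Keep only the rows whose value in `field` passes the predicate.
        kept = [row for row in self.rows if filter_func(row[field])]
        if inplace:
            self.rows = kept
            return self
        # Assumption: the non-inplace variant returns an independent copy.
        return TinyBank(copy.deepcopy(kept))


bank = TinyBank(
    [
        {"id": 0, "n_genes": 120},
        {"id": 1, "n_genes": 900},
        {"id": 2, "n_genes": 450},
    ]
)
preview = bank.custom_filter("n_genes", lambda n: n >= 400, inplace=False)
rows_after_preview = len(bank.rows)                 # original is unchanged here
bank.custom_filter("n_genes", lambda n: n >= 400)   # now filter in place
```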

data_tables: Dict[str, DataTable]
delete_study(study_id: int) None[source]

Delete a study from the current DataBank.

filter(study_ids: List[int] | None = None, cell_ids: List[int] | None = None, inplace: bool = True) Self[source]

Filter the current DataBank by study ID and cell ID.

Parameters:
  • study_ids (list) – Study IDs to filter.

  • cell_ids (list) – Cell IDs to filter.

  • inplace (bool) – Whether to also apply the filter to the current DataBank in place.

Returns:

Filtered DataBank.

Return type:

DataBank
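
One plausible reading of the two ID arguments is sketched below on plain dicts. The helper select_rows and the row keys study_id and cell_id are hypothetical, and the semantics (None means no restriction, both lists combine with AND) are assumptions that this page does not confirm.

```python
from typing import Any, Dict, List, Optional


def select_rows(
    rows: List[Dict[str, Any]],
    study_ids: Optional[List[int]] = None,
    cell_ids: Optional[List[int]] = None,
) -> List[Dict[str, Any]]:
    """Keep rows matching the given IDs; None places no restriction."""
    # Sets give constant-time membership tests for long ID lists.
    study_set = None if study_ids is None else set(study_ids)
    cell_set = None if cell_ids is None else set(cell_ids)
    return [
        row
        for row in rows
        if (study_set is None or row["study_id"] in study_set)
        and (cell_set is None or row["cell_id"] in cell_set)
    ]


rows = [
    {"study_id": 0, "cell_id": 10},
    {"study_id": 0, "cell_id": 11},
    {"study_id": 1, "cell_id": 12},
]
by_study = select_rows(rows, study_ids=[0])
by_both = select_rows(rows, study_ids=[0], cell_ids=[11, 12])
unrestricted = select_rows(rows)
```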

classmethod from_anndata(adata: AnnData | Path | str, vocab: GeneVocab | Mapping[str, int], to: Path | str, main_table_key: str = 'X', token_col: str = 'gene name', immediate_save: bool = True) Self[source]

Create a DataBank from an AnnData object.

Parameters:
  • adata (AnnData, Path or str) – Annotated data object, or path to an AnnData file.

  • vocab (GeneVocab or Mapping[str, int]) – Gene vocabulary mapping gene tokens to indices.

  • to (Path or str) – Data directory.

  • main_table_key (str) – This layer/obsm in anndata will be used as the main data table.

  • token_col (str) – Column name of the gene token.

  • immediate_save (bool) – Whether to save the data immediately after creation.

Returns:

DataBank instance.

Return type:

DataBank

classmethod from_path(path: Path | str) Self[source]

Create a DataBank from a directory containing scBank data. NOTE: this method automatically checks whether the md5sum record in manifest.json matches the md5sum of the loaded gene vocabulary.

Parameters:

path (Path or str) – Directory path.

Returns:

DataBank instance.

Return type:

DataBank
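
The kind of md5 consistency check mentioned in the note can be sketched as follows. This is an illustration under assumptions, not the library's code: the helper names, the vocabulary file name gene_vocab.json, and hashing the raw file bytes are all hypothetical choices. Only manifest.json and the gene_vocab_md5 field appear in this reference.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Union


def file_md5(path: Union[Path, str]) -> str:
    """Return the hex md5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()


def vocab_matches_manifest(data_dir: Union[Path, str]) -> bool:
    """Compare the recorded md5 with the md5 of the vocabulary file."""
    data_dir = Path(data_dir)
    manifest = json.loads((data_dir / "manifest.json").read_text())
    # "gene_vocab.json" is an assumed file name for this sketch.
    return manifest.get("gene_vocab_md5") == file_md5(data_dir / "gene_vocab.json")


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    vocab_file = root / "gene_vocab.json"
    vocab_file.write_text(json.dumps({"CD3E": 0, "MS4A1": 1}))
    manifest = {"gene_vocab_md5": file_md5(vocab_file)}
    (root / "manifest.json").write_text(json.dumps(manifest))
    ok_before = vocab_matches_manifest(root)

    # Changing the vocabulary without updating the manifest breaks the match.
    vocab_file.write_text(json.dumps({"CD3E": 0, "MS4A1": 1, "NKG7": 2}))
    ok_after = vocab_matches_manifest(root)
```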

property gene_vocab: GeneVocab | None

The gene vocabulary mapping gene tokens to integer ids.

Link to a scBank data directory. This only loads the meta info and performs a validation check; it does not load the data tables. A data table can be loaded later with the .load_table method.

load(path: Path | str) Dataset[source]

Load scBank data from a data directory. Since DataBank is designed to work with large-scale data, this loads only the main data table into memory by default. It also loads the meta info and performs a validation check.

load_all(path: Path | str) Dict[str, Dataset][source]

Load scBank data from a data directory. This will load all the data tables to memory.

load_anndata(adata: AnnData, data_keys: List[str] | None = None, token_col: str = 'gene name') List[DataTable][source]

Load anndata into datatables.

Parameters:
  • adata (AnnData) – Annotated data object to load.

  • data_keys (list of str) – List of data keys to load. If None, all data keys in adata.X, adata.layers and adata.obsm will be loaded.

  • token_col (str) – Column name of the gene token. Tokens will be converted to indices by self.gene_vocab.

Returns:

List of data tables loaded.

Return type:

list of DataTable
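
The token-to-index conversion mentioned for token_col can be sketched with an ordinary Mapping[str, int]. The helper tokens_to_ids and its handling of unknown tokens (collecting them instead of raising) are assumptions for illustration; this page does not say how scGPT treats genes missing from the vocabulary.

```python
from typing import List, Mapping, Sequence, Tuple


def tokens_to_ids(
    tokens: Sequence[str], vocab: Mapping[str, int]
) -> Tuple[List[int], List[str]]:
    """Map gene tokens to vocabulary indices, reporting unknown tokens."""
    ids: List[int] = []
    missing: List[str] = []
    for token in tokens:
        if token in vocab:
            ids.append(vocab[token])
        else:
            # Assumption: unknown tokens are collected rather than raising.
            missing.append(token)
    return ids, missing


vocab = {"<pad>": 0, "CD3E": 1, "MS4A1": 2, "NKG7": 3}
gene_names = ["MS4A1", "CD3E", "UNKNOWN1"]  # e.g. values of the token column
ids, missing = tokens_to_ids(gene_names, vocab)
```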

load_table(table_name: str) Dataset[source]

Load a data table from the current DataBank.

property main_data: DataTable

The main data table.

property main_table_key: str | None

The main data table key.

meta_info: MetaInfo = None
save(path: Path | str | None, replace: bool = False) None[source]

Save scBank data to a data directory.

Parameters:
  • path (Path or str) – Path to save scBank data. If None, saves to the directory at self.meta_info.on_disk_path.

  • replace (bool) – Whether to replace existing data in the directory.

settings: Setting
sync(attr_keys: List[str] | str | None = None) None[source]

Sync the current DataBank to a data directory. This saves the updated data and vocabulary to files, then updates the meta info and saves it as well. NOTE: This will overwrite the existing data directory.

Parameters:

attr_keys (list of str or str) – Attribute keys to sync. If None, syncs all attributes with tracked changes.

track(attr_keys: List[str] | str | None = None) List[source]

Track all changes made to the current DataBank that have not yet been synced to disk. Returns a list of those changes.

Parameters:

attr_keys (list of str or str) – Attribute keys to check for changes. If None, all attributes are checked.
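
The interplay between track and sync can be pictured with a small change-tracking sketch. TrackedStore is hypothetical and much simpler than DataBank: it assumes that a change is detected by comparing each attribute's JSON serialization with the last synced one, and that track returns the names of the changed attributes. The real return format is not documented here.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List, Optional, Union


class TrackedStore:
    """Hypothetical store that remembers what was last written to disk."""

    def __init__(self, directory: Union[Path, str], **attrs: Any) -> None:
        self.directory = Path(directory)
        self.attrs: Dict[str, Any] = dict(attrs)
        self._synced: Dict[str, str] = {}  # attribute name -> last synced JSON

    def _normalize(self, attr_keys: Union[List[str], str, None]) -> List[str]:
        if attr_keys is None:
            return list(self.attrs)
        if isinstance(attr_keys, str):
            return [attr_keys]
        return list(attr_keys)

    def track(self, attr_keys: Union[List[str], str, None] = None) -> List[str]:
        # An attribute counts as changed when its serialization differs
        # from the one recorded at the last sync.
        return [
            key
            for key in self._normalize(attr_keys)
            if self._synced.get(key) != json.dumps(self.attrs[key], sort_keys=True)
        ]

    def sync(self, attr_keys: Union[List[str], str, None] = None) -> None:
        for key in self.track(attr_keys):
            payload = json.dumps(self.attrs[key], sort_keys=True)
            (self.directory / f"{key}.json").write_text(payload)  # overwrites
            self._synced[key] = payload


with tempfile.TemporaryDirectory() as tmp:
    store = TrackedStore(tmp, study_ids=[0], main_table_key="X")
    initial_changes = store.track()     # nothing has been synced yet
    store.sync()
    after_sync = store.track()          # everything is on disk
    store.attrs["study_ids"].append(1)  # mutate one attribute
    pending = store.track()
    pending_other = store.track("main_table_key")
```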

update_datatables(new_tables: List[DataTable], use_names: List[str] | None = None, overwrite: bool = False, immediate_save: bool | None = None) None[source]

Update the data tables in the DataBank with new data tables.

Parameters:
  • new_tables (list of DataTable) – New data tables to update.

  • use_names (list of str) – Names of the new data tables to use. If not provided, will use the names of the new data tables.

  • overwrite (bool) – Whether to overwrite the existing data tables.

  • immediate_save (bool) – Whether to save the data immediately after updating. Saves to self.meta_info.on_disk_path. If not provided, follows self.settings.immediate_save instead. Defaults to None.
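
How use_names, overwrite and the immediate_save fallback might interact is sketched below on a plain dict of tables. Both helpers are hypothetical, tables are reduced to (name, data) tuples, and raising ValueError on a name clash without overwrite is an assumption rather than documented behavior.

```python
from typing import Any, Dict, List, Optional, Tuple


def update_tables(
    existing: Dict[str, Any],
    new_tables: List[Tuple[str, Any]],
    use_names: Optional[List[str]] = None,
    overwrite: bool = False,
) -> None:
    """Insert (name, data) tables into `existing`, optionally renaming them."""
    # Without use_names, each table keeps its own name.
    names = use_names if use_names is not None else [name for name, _ in new_tables]
    if len(names) != len(new_tables):
        raise ValueError("use_names must have one entry per new table")
    clashes = [name for name in names if name in existing]
    if clashes and not overwrite:
        # Assumption: clashes are an error unless overwrite is set.
        raise ValueError(f"tables already exist: {clashes}")
    for name, (_, data) in zip(names, new_tables):
        existing[name] = data


def resolve_immediate_save(immediate_save: Optional[bool], settings_default: bool) -> bool:
    """An explicit argument wins; None defers to the settings value."""
    return settings_default if immediate_save is None else immediate_save


tables: Dict[str, Any] = {"X": [1, 2]}
update_tables(tables, [("counts", [3, 4])])                  # keeps its own name
update_tables(tables, [("raw", [5])], use_names=["X_raw"])   # renamed on insert
update_tables(tables, [("X", [9])], overwrite=True)          # replaces existing "X"
save_now = resolve_immediate_save(None, settings_default=False)
```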

scgpt.scbank.monitor

scgpt.scbank.setting

class scgpt.scbank.setting.Setting(remove_zero_rows: bool = True, max_tokenize_batch_size: int = 1000000.0, immediate_save: bool = False)[source]

Bases: object

The configuration for scBank DataBank.

immediate_save: bool = False
max_tokenize_batch_size: int = 1000000.0
remove_zero_rows: bool = True

Module contents