Skip to content

Single cell performance#1645

Draft
oganm wants to merge 38 commits intohotfix-1.32.7from
single-cell-performance
Draft

Single cell performance#1645
oganm wants to merge 38 commits intohotfix-1.32.7from
single-cell-performance

Conversation

@oganm
Copy link
Copy Markdown
Member

@oganm oganm commented Mar 26, 2026

No description provided.

for ( BioMaterial bm : samplesToRemove ) {
log.debug( "Removing " + bm + "..." );
session.delete( bm );
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's a good idea to delete assays/samples that belong to subsets of the experiment. Those are normally deleted when the subsets are deleted, so the correct thing to do is to delete all the subsets before deleting the experiment

At this stage, if there are still samples that refer to a sample you want to delete, the best thing to do is to detach them (by setting sourceBioMaterial to null) and warn about dangling samples. Deleting could fail if they are still in use.

Copy link
Copy Markdown
Member Author

@oganm oganm Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the current implementation deletion of the subsets do not remove bioassays and biomaterials due to the existence of BioAssayDimensions and SingleCellDimensions by that point.

Both of these are only attempted to be removed here, after the subsets are already removed. I wasn't quite sure what the intended order of operations here was.

throw new IllegalArgumentException( "Term ID is not in the expected '{IDSPACE}:{LOCALID}' format." );
log.warn( "Term ID is not in the expected '{IDSPACE}:{LOCALID}' format." );
return null;
// throw new IllegalArgumentException( "Term ID is not in the expected '{IDSPACE}:{LOCALID}' format." );
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is problematic. An OBO term ID must be in this format. I think there's a method to check if a string is an OBO term ID, you should use that instead of returning null.

@arteymix
Copy link
Copy Markdown
Collaborator

You can add test cases to simulate various deletion scenarios with subsets or with more complex sample hierarchy.

ExternalDatabaseService externalDatabaseService, TaxonService taxonService,
SingleCellDataTransformationFactory singleCellDataTransformationFactory,
@Value("${cellxgene.local.singleCellData.basepath}") Path cellXGeneDownloadPath,
@Value("${gemma.download.path}/singleCellData/CELLxGENE_Transposed") Path cellXGeneTransposedPath,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is also ${gemma.scratch.dir} that can be used for storing large intermediary files.

this.persister = persister;
this.arrayDesignService = arrayDesignService;
this.expressionExperimentService = expressionExperimentService;
this.cellXGeneTransposedPath = cellXGeneTransposedPath;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's best to simply request a path for storing scratch and then creating whatever you need in it. Transposing is not the only needed operation for CELLxGENE.

ArrayDesign platform, String datasetShortName, boolean loadSingleCellData, boolean keepPooledSample, boolean keepUnknownSample, boolean dryRun) throws IOException {
if ( expressionExperimentService.existsByShortName( datasetShortName ) ) {
throw new IllegalArgumentException( "An ExpressionExperiment with short name " + datasetShortName + " already exists in the database." );
//throw new IllegalArgumentException( "An ExpressionExperiment with short name " + datasetShortName + " already exists in the database." );
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a safeguard. What happens if the dataset exists already?


ExpressionExperiment ee;
try ( SingleCellDataLoader dataLoader = new CellXGeneAnnDataSingleCellDataConfigurer( dataPath, singleCellDataTransformationFactory )
try ( SingleCellDataLoader dataLoader = new CellXGeneAnnDataSingleCellDataConfigurer( dataPath, singleCellDataTransformationFactory,cellXGeneTransposedPath )
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned earlier, better to have a general scratchDir instead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants