Single cell performance by oganm · Pull Request #1645 · PavlidisLab/Gemma

oganm · 2026-03-26T01:21:42Z

No description provided.

Fix single cell deletion

...a-core/src/main/java/ubic/gemma/persistence/service/expression/bioAssay/BioAssayDaoImpl.java

arteymix · 2026-03-26T04:08:30Z

...n/java/ubic/gemma/persistence/service/expression/experiment/ExpressionExperimentDaoImpl.java

+                for ( BioMaterial bm : samplesToRemove ) {
+                    log.debug( "Removing " + bm + "..." );
+                    session.delete( bm );
+                }


I don't think it's a good idea to delete assays/samples that belong to subsets of the experiment. Those are normally deleted when the subsets are deleted, so the correct thing to do is to delete all the subsets before deleting the experiment.

At this stage, if there are still samples that refer to a sample you want to delete, the best thing to do is to detach them (by setting sourceBioMaterial to null) and warn about dangling samples. Deleting could fail if they are still in use.
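A minimal sketch of the "detach and warn" approach, under stated assumptions. `SampleNode` is a hypothetical, simplified stand-in for a BioMaterial that models only the `sourceBioMaterial` link. `DanglingSampleDetacher.detachDanglingSamples` is likewise hypothetical and is not Gemma's API. Surviving samples that point at a sample being deleted get their link nulled and a warning logged. They are not deleted.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.logging.Logger;

// Hypothetical, simplified stand-in for a BioMaterial: only the
// sourceBioMaterial link mentioned in the review is modelled.
class SampleNode {
    final String name;
    SampleNode sourceBioMaterial;

    SampleNode( String name, SampleNode sourceBioMaterial ) {
        this.name = name;
        this.sourceBioMaterial = sourceBioMaterial;
    }

    @Override
    public String toString() {
        return name;
    }
}

class DanglingSampleDetacher {
    private static final Logger log = Logger.getLogger( DanglingSampleDetacher.class.getName() );

    // Detaches (rather than deletes) every surviving sample whose source is about to be
    // deleted, logs a warning for each one, and returns the detached samples.
    static List<SampleNode> detachDanglingSamples( Collection<SampleNode> toDelete, Collection<SampleNode> allSamples ) {
        List<SampleNode> detached = new ArrayList<>();
        for ( SampleNode s : allSamples ) {
            if ( toDelete.contains( s ) ) {
                continue; // deleted anyway, nothing to detach
            }
            if ( s.sourceBioMaterial != null && toDelete.contains( s.sourceBioMaterial ) ) {
                log.warning( "Detaching dangling sample " + s + " from " + s.sourceBioMaterial + ", which is being deleted." );
                s.sourceBioMaterial = null;
                detached.add( s );
            }
        }
        return detached;
    }
}
```

Returning the detached samples lets the caller report them, instead of failing the whole deletion on a sample that is still in use.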

In the current implementation, deleting the subsets does not remove their bioassays and biomaterials, because BioAssayDimensions and SingleCellDimensions still exist at that point.

Both of these dimension types are only removed here, after the subsets have already been removed. I wasn't sure what the intended order of operations was.
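The sketch below illustrates why the order matters, assuming a dimension can be reduced to the set of assay IDs it still refers to. `DeletionPlanner`, `Dimension` and `partitionDeletable` are hypothetical names, not Gemma classes. Assays of a removed subset that a remaining dimension still references cannot be deleted yet. Once those dimensions are gone, every assay becomes deletable.

```java
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

class DeletionPlanner {

    // Hypothetical stand-in for a BioAssayDimension or SingleCellDimension:
    // only the IDs of the assays it still refers to.
    static final class Dimension {
        final Set<Long> assayIds;

        Dimension( Set<Long> assayIds ) {
            this.assayIds = assayIds;
        }
    }

    // Splits the assays of a removed subset into those that are safe to delete now (true)
    // and those still referenced by a remaining dimension (false).
    static Map<Boolean, List<Long>> partitionDeletable( Collection<Long> assayIds, Collection<Dimension> remainingDimensions ) {
        Set<Long> referenced = remainingDimensions.stream()
                .flatMap( d -> d.assayIds.stream() )
                .collect( Collectors.toSet() );
        return assayIds.stream()
                .collect( Collectors.partitioningBy( id -> !referenced.contains( id ) ) );
    }
}
```

`Collectors.partitioningBy` always yields both the `true` and `false` keys, so callers can read either list without null checks.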

arteymix · 2026-03-26T04:10:26Z

gemma-core/src/main/java/ubic/gemma/core/ontology/OntologyUtils.java

-            throw new IllegalArgumentException( "Term ID is not in the expected '{IDSPACE}:{LOCALID}' format." );
+            log.warn( "Term ID is not in the expected '{IDSPACE}:{LOCALID}' format." );
+            return null;
+            // throw new IllegalArgumentException( "Term ID is not in the expected '{IDSPACE}:{LOCALID}' format." );


This is problematic: an OBO term ID must be in this format. I think there's a method to check whether a string is an OBO term ID; you should use that instead of returning null.
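A minimal sketch of the suggested pattern, under stated assumptions. The caller checks the ID first and decides what to do with malformed values, such as the CELLxGENE term URIs. The parsing method keeps throwing. `OboTermIds`, `isOboTermId` and `splitTermId` are hypothetical names and do not stand for the existing Gemma method. The regular expression is also an assumed approximation of the `{IDSPACE}:{LOCALID}` shape.

```java
import java.util.regex.Pattern;

class OboTermIds {
    // Assumed approximation of an OBO term ID (e.g. CL:0000540, NCBITaxon:9606):
    // an ID space starting with a letter, a colon, then a local ID.
    private static final Pattern OBO_TERM_ID = Pattern.compile( "[A-Za-z][A-Za-z0-9_]*:[A-Za-z0-9_]+" );

    static boolean isOboTermId( String s ) {
        return s != null && OBO_TERM_ID.matcher( s ).matches();
    }

    // Keeps the strict contract: malformed IDs are rejected, never mapped to null.
    static String[] splitTermId( String termId ) {
        if ( !isOboTermId( termId ) ) {
            throw new IllegalArgumentException( "Term ID is not in the expected '{IDSPACE}:{LOCALID}' format: " + termId );
        }
        return termId.split( ":", 2 );
    }
}
```

With this split, a loader can call `isOboTermId` and log or skip a malformed value itself, and other callers of the parser still get an exception.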

arteymix · 2026-03-26T04:11:47Z

You can add test cases to simulate various deletion scenarios with subsets or with more complex sample hierarchies.


arteymix · 2026-04-03T22:34:25Z

...rc/main/java/ubic/gemma/core/loader/expression/cellxgene/CellXGeneDataLoaderServiceImpl.java

            ExternalDatabaseService externalDatabaseService, TaxonService taxonService,
            SingleCellDataTransformationFactory singleCellDataTransformationFactory,
            @Value("${cellxgene.local.singleCellData.basepath}") Path cellXGeneDownloadPath,
+            @Value("${gemma.download.path}/singleCellData/CELLxGENE_Transposed") Path cellXGeneTransposedPath,


There is also ${gemma.scratch.dir} that can be used for storing large intermediary files.

arteymix · 2026-04-03T22:35:16Z

...rc/main/java/ubic/gemma/core/loader/expression/cellxgene/CellXGeneDataLoaderServiceImpl.java

        this.persister = persister;
        this.arrayDesignService = arrayDesignService;
        this.expressionExperimentService = expressionExperimentService;
+        this.cellXGeneTransposedPath = cellXGeneTransposedPath;


I think it's best to simply request a path for scratch storage and then create whatever you need in it. Transposing is not the only operation needed for CELLxGENE.
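A minimal sketch of the single-scratch-directory approach, under stated assumptions. The loader receives one base path, which could be injected from `${gemma.scratch.dir}` in the same way the existing constructor injects paths with `@Value`, although that wiring is not shown here. Each operation then creates its own subdirectory on demand. `ScratchSpace` and `dirFor` are hypothetical names, and the subdirectory names are illustrative only.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: one scratch base directory is handed to the loader, and each
// operation (transposing, sorting, ...) creates its own subdirectory inside it on demand.
class ScratchSpace {
    private final Path baseDir;

    ScratchSpace( Path baseDir ) {
        this.baseDir = baseDir;
    }

    // Resolves e.g. dirFor("CELLxGENE", "transposed") under the base directory and
    // creates it (and any missing parents) if it does not exist yet.
    Path dirFor( String first, String... more ) throws IOException {
        Path dir = baseDir.resolve( first );
        for ( String part : more ) {
            dir = dir.resolve( part );
        }
        return Files.createDirectories( dir );
    }
}
```

`Files.createDirectories` does not fail if the directory already exists, so repeated loads can reuse the same subdirectory.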

arteymix · 2026-04-03T22:35:48Z

...rc/main/java/ubic/gemma/core/loader/expression/cellxgene/CellXGeneDataLoaderServiceImpl.java

+            ArrayDesign platform, String datasetShortName, boolean loadSingleCellData, boolean keepPooledSample, boolean keepUnknownSample, boolean dryRun) throws IOException {
        if ( expressionExperimentService.existsByShortName( datasetShortName ) ) {
-            throw new IllegalArgumentException( "An ExpressionExperiment with short name " + datasetShortName + " already exists in the database." );
+            //throw new IllegalArgumentException( "An ExpressionExperiment with short name " + datasetShortName + " already exists in the database." );


This is a safeguard. What happens if the dataset exists already?

arteymix · 2026-04-03T22:36:16Z

...rc/main/java/ubic/gemma/core/loader/expression/cellxgene/CellXGeneDataLoaderServiceImpl.java


        ExpressionExperiment ee;
-        try ( SingleCellDataLoader dataLoader = new CellXGeneAnnDataSingleCellDataConfigurer( dataPath, singleCellDataTransformationFactory )
+        try ( SingleCellDataLoader dataLoader = new CellXGeneAnnDataSingleCellDataConfigurer( dataPath, singleCellDataTransformationFactory,cellXGeneTransposedPath )


As mentioned earlier, it's better to have a general scratchDir instead.


oganm and others added 18 commits March 11, 2026 23:26

accept malformed term URIs when processing CellXGene datasets

d8a228c

add URIs to known characteristics

47d7d6c

single cell removal test

6998956

check biomaterial deletion

89d7723

try to delete subset biomaterials

afb3361

clean everything attached to the biomaterials

ae44dd0

findBioAssayDimensions returns distinct dims

6685b7c

test for bioAssayDimension deletion

569dcd3

batch biomaterials when removing

849623e

shorthand

e6c9446

unnest biomaterial/bioassay deletion

cdc306e

remove unused import

000b271

use removeUnusedDimensions to remove the remaining dims

784dc8e

typo and refactoring

cb8d247

Merge pull request #1644 from PavlidisLab/fix-single-cell-deletion

8586282

Fix single cell deletion

typing fix

3f201ff

Merge branch 'fix-single-cell-deletion' into single-cell-performance

d57f731

CellXGene test fix

2a16091

arteymix reviewed Mar 26, 2026

...a-core/src/main/java/ubic/gemma/persistence/service/expression/bioAssay/BioAssayDaoImpl.java (outdated)

arteymix reviewed Mar 26, 2026


oganm and others added 8 commits March 26, 2026 18:07

check subset deletion

7cca15a

use group bys instead of distincts

99941cc

Merge branch 'fix-single-cell-deletion' into single-cell-performance

6073246

temporary dry run

bc897b2

Merge branch 'single-cell-performance' of github.com:PavlidisLab/Gemm…

2fb6239

…a into single-cell-performance

temporary dry run test fix

7a92c11

SingleCellTransformationUtils accessible file extension links

af21fc4

preserve transposed file for cellxgene

dfa08ac

oganm added 4 commits April 2, 2026 14:34

keep the finalSortedFile

c4b09c4

set wasTransposedOnDisk

e5f3b33

pass compositeSequences from designElementMapping since platform comp…

648968a

…osite sequences aren't initialized

cellxgeneconverter test fix

28c890e

arteymix reviewed Apr 3, 2026


oganm added 8 commits April 9, 2026 20:07

dry run rolls back

c1526b2

createSingleCellDataVectors for batched persistence of singleCellData…

a2bd0cb

…Vectors

createSingleCellDataVectors for batched persistence of singleCellData…

37b9633

…Vectors

remove vectors from ee and persist them separately via addSingleCellD…

be0d02b

…ataVectors when processing CellXGene datasets

X is the preferred qt. consider how non cellxgene files are processed

56bbb47

Merge branch 'single-cell-performance' of github.com:PavlidisLab/Gemm…

01be99b

…a into single-cell-performance

minor

2809ec8

refresh ee at the end of createSingleCellDataVectors

4f35c6c

oganm wants to merge 38 commits into hotfix-1.32.7 from single-cell-performance
oganm Mar 27, 2026 • edited
2 participants