Conversation
Fix single cell deletion
...a-core/src/main/java/ubic/gemma/persistence/service/expression/bioAssay/BioAssayDaoImpl.java
Outdated
Show resolved
Hide resolved
| for ( BioMaterial bm : samplesToRemove ) { | ||
| log.debug( "Removing " + bm + "..." ); | ||
| session.delete( bm ); | ||
| } |
There was a problem hiding this comment.
I don't think it's a good idea to delete assays/samples that belong to subsets of the experiment. Those are normally deleted when the subsets are deleted, so the correct thing to do is to delete all the subsets before deleting the experiment
At this stage, if there are still samples that refer to a sample you want to delete, the best thing to do is to detach them (by setting sourceBioMaterial to null) and warn about dangling samples. Deleting could fail if they are still in use.
There was a problem hiding this comment.
in the current implementation deletion of the subsets do not remove bioassays and biomaterials due to the existence of BioAssayDimensions and SingleCellDimensions by that point.
Both of these are only attempted to be removed here, after the subsets are already removed. I wasn't quite sure what the intended order of operations here was.
| throw new IllegalArgumentException( "Term ID is not in the expected '{IDSPACE}:{LOCALID}' format." ); | ||
| log.warn( "Term ID is not in the expected '{IDSPACE}:{LOCALID}' format." ); | ||
| return null; | ||
| // throw new IllegalArgumentException( "Term ID is not in the expected '{IDSPACE}:{LOCALID}' format." ); |
There was a problem hiding this comment.
This is problematic. An OBO term ID must be in this format. I think there's a method to check if a string is an OBO term ID, you should use that instead of returning null.
|
You can add test cases to simulate various deletion scenarios with subsets or with more complex sample hierarchy. |
…a into single-cell-performance
…osite sequences aren't initialized
| ExternalDatabaseService externalDatabaseService, TaxonService taxonService, | ||
| SingleCellDataTransformationFactory singleCellDataTransformationFactory, | ||
| @Value("${cellxgene.local.singleCellData.basepath}") Path cellXGeneDownloadPath, | ||
| @Value("${gemma.download.path}/singleCellData/CELLxGENE_Transposed") Path cellXGeneTransposedPath, |
There was a problem hiding this comment.
There is also ${gemma.scratch.dir} that can be used for storing large intermediary files.
| this.persister = persister; | ||
| this.arrayDesignService = arrayDesignService; | ||
| this.expressionExperimentService = expressionExperimentService; | ||
| this.cellXGeneTransposedPath = cellXGeneTransposedPath; |
There was a problem hiding this comment.
I think it's best to simply request a path for storing scratch and then creating whatever you need in it. Transposing is not the only needed operation for CELLxGENE.
| ArrayDesign platform, String datasetShortName, boolean loadSingleCellData, boolean keepPooledSample, boolean keepUnknownSample, boolean dryRun) throws IOException { | ||
| if ( expressionExperimentService.existsByShortName( datasetShortName ) ) { | ||
| throw new IllegalArgumentException( "An ExpressionExperiment with short name " + datasetShortName + " already exists in the database." ); | ||
| //throw new IllegalArgumentException( "An ExpressionExperiment with short name " + datasetShortName + " already exists in the database." ); |
There was a problem hiding this comment.
This is a safeguard. What happens if the dataset exists already?
|
|
||
| ExpressionExperiment ee; | ||
| try ( SingleCellDataLoader dataLoader = new CellXGeneAnnDataSingleCellDataConfigurer( dataPath, singleCellDataTransformationFactory ) | ||
| try ( SingleCellDataLoader dataLoader = new CellXGeneAnnDataSingleCellDataConfigurer( dataPath, singleCellDataTransformationFactory,cellXGeneTransposedPath ) |
There was a problem hiding this comment.
As mentioned earlier, better to have a general scratchDir instead.
…ataVectors when processing CellXGene datasets
…a into single-cell-performance
No description provided.