HADOOP-1593. [ABFS] Add vectored read support in ABFS driver by anmolanmol1234 · Pull Request #8400 · apache/hadoop

anmolanmol1234 · 2026-04-02T17:41:52Z

This PR introduces vectored read support in the Azure Blob File System (ABFS) driver to improve read performance for workloads that issue multiple small, non-contiguous read requests.

Vectored reads enable batching of multiple read ranges into fewer network calls, reducing request overhead and improving throughput—especially beneficial for analytics engines like Spark.

Current ABFS read implementation performs sequential, independent read operations for each requested range. This leads to:
Increased number of network calls
Higher latency for small/random reads
Inefficient utilization of bandwidth

Vectored I/O addresses these issues by coalescing multiple read requests into a single or fewer backend calls.

… HADOOP-15963_poc

hadoop-yetus · 2026-04-02T19:57:37Z

🎊 +1 overall

Vote	Subsystem	Runtime	Logfile	Comment
+0 🆗	reexec	0m 53s		Docker mode activated.
			_ Prechecks _
+1 💚	dupname	0m 0s		No case conflicting files found.
+0 🆗	codespell	0m 1s		codespell was not available.
+0 🆗	detsecrets	0m 1s		detect-secrets was not available.
+1 💚	@author	0m 0s		The patch does not contain any @author tags.
+1 💚	test4tests	0m 0s		The patch appears to include 2 new or modified test files.
			_ trunk Compile Tests _
+1 💚	mvninstall	48m 13s		trunk passed
+1 💚	compile	1m 1s		trunk passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04
+1 💚	compile	1m 4s		trunk passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1
+1 💚	checkstyle	0m 58s		trunk passed
+1 💚	mvnsite	1m 7s		trunk passed
+1 💚	javadoc	0m 59s		trunk passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04
+1 💚	javadoc	0m 58s		trunk passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1
+1 💚	spotbugs	1m 35s		trunk passed
+1 💚	shadedclient	34m 39s		branch has no errors when building and testing our client artifacts.
-0 ⚠️	patch	35m 11s		Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
			_ Patch Compile Tests _
+1 💚	mvninstall	0m 37s		the patch passed
+1 💚	compile	0m 32s		the patch passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04
+1 💚	javac	0m 32s		the patch passed
+1 💚	compile	0m 35s		the patch passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1
+1 💚	javac	0m 35s		the patch passed
+1 💚	blanks	0m 0s		The patch has no blanks issues.
-0 ⚠️	checkstyle	0m 25s	/results-checkstyle-hadoop-tools_hadoop-azure.txt	hadoop-tools/hadoop-azure: The patch generated 10 new + 5 unchanged - 0 fixed = 15 total (was 5)
+1 💚	mvnsite	0m 39s		the patch passed
+1 💚	javadoc	0m 29s		the patch passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04
+1 💚	javadoc	0m 29s		the patch passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1
+1 💚	spotbugs	1m 17s		the patch passed
+1 💚	shadedclient	32m 56s		patch has no errors when building and testing our client artifacts.
			_ Other Tests _
+1 💚	unit	2m 10s		hadoop-azure in the patch passed.
+1 💚	asflicense	0m 35s		The patch does not generate ASF License warnings.
		134m 18s

Subsystem	Report/Notes
Docker	ClientAPI=1.54 ServerAPI=1.54 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8400/1/artifact/out/Dockerfile
GITHUB PR	#8400
Optional Tests	dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname	Linux bec554755465 5.15.0-173-generic #183-Ubuntu SMP Fri Mar 6 13:29:34 UTC 2026 x86_64 x86_64 x86_64 GNU/Linux
Build tool	maven
Personality	dev-support/bin/hadoop.sh
git revision	trunk / `437ffc8`
Default Java	Ubuntu-17.0.18+8-Ubuntu-124.04.1
Multi-JDK versions	/usr/lib/jvm/java-21-openjdk-amd64:Ubuntu-21.0.10+7-Ubuntu-124.04 /usr/lib/jvm/java-17-openjdk-amd64:Ubuntu-17.0.18+8-Ubuntu-124.04.1
Test Results	https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8400/1/testReport/
Max. process+thread count	574 (vs. ulimit of 10000)
modules	C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure
Console output	https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8400/1/console
versions	git=2.43.0 maven=3.9.11 spotbugs=4.9.7
Powered by	Apache Yetus 0.14.1 https://yetus.apache.org

This message was automatically generated.

hadoop-yetus · 2026-04-02T22:32:54Z

🎊 +1 overall

Vote	Subsystem	Runtime	Logfile	Comment
+0 🆗	reexec	0m 52s		Docker mode activated.
			_ Prechecks _
+1 💚	dupname	0m 1s		No case conflicting files found.
+0 🆗	codespell	0m 0s		codespell was not available.
+0 🆗	detsecrets	0m 0s		detect-secrets was not available.
+1 💚	@author	0m 0s		The patch does not contain any @author tags.
+1 💚	test4tests	0m 0s		The patch appears to include 2 new or modified test files.
			_ trunk Compile Tests _
+1 💚	mvninstall	47m 14s		trunk passed
+1 💚	compile	1m 1s		trunk passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04
+1 💚	compile	1m 3s		trunk passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1
+1 💚	checkstyle	0m 56s		trunk passed
+1 💚	mvnsite	1m 7s		trunk passed
+1 💚	javadoc	0m 59s		trunk passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04
+1 💚	javadoc	0m 57s		trunk passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1
+1 💚	spotbugs	1m 36s		trunk passed
+1 💚	shadedclient	34m 1s		branch has no errors when building and testing our client artifacts.
-0 ⚠️	patch	34m 33s		Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
			_ Patch Compile Tests _
+1 💚	mvninstall	0m 37s		the patch passed
+1 💚	compile	0m 32s		the patch passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04
+1 💚	javac	0m 32s		the patch passed
+1 💚	compile	0m 34s		the patch passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1
+1 💚	javac	0m 34s		the patch passed
+1 💚	blanks	0m 0s		The patch has no blanks issues.
-0 ⚠️	checkstyle	0m 25s	/results-checkstyle-hadoop-tools_hadoop-azure.txt	hadoop-tools/hadoop-azure: The patch generated 1 new + 5 unchanged - 0 fixed = 6 total (was 5)
+1 💚	mvnsite	0m 39s		the patch passed
+1 💚	javadoc	0m 28s		the patch passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04
+1 💚	javadoc	0m 29s		the patch passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1
+1 💚	spotbugs	1m 16s		the patch passed
+1 💚	shadedclient	33m 33s		patch has no errors when building and testing our client artifacts.
			_ Other Tests _
+1 💚	unit	2m 8s		hadoop-azure in the patch passed.
+1 💚	asflicense	0m 35s		The patch does not generate ASF License warnings.
		133m 20s

Subsystem	Report/Notes
Docker	ClientAPI=1.54 ServerAPI=1.54 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8400/2/artifact/out/Dockerfile
GITHUB PR	#8400
Optional Tests	dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname	Linux ad47bd66149a 5.15.0-173-generic #183-Ubuntu SMP Fri Mar 6 13:29:34 UTC 2026 x86_64 x86_64 x86_64 GNU/Linux
Build tool	maven
Personality	dev-support/bin/hadoop.sh
git revision	trunk / `975bf73`
Default Java	Ubuntu-17.0.18+8-Ubuntu-124.04.1
Multi-JDK versions	/usr/lib/jvm/java-21-openjdk-amd64:Ubuntu-21.0.10+7-Ubuntu-124.04 /usr/lib/jvm/java-17-openjdk-amd64:Ubuntu-17.0.18+8-Ubuntu-124.04.1
Test Results	https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8400/2/testReport/
Max. process+thread count	586 (vs. ulimit of 10000)
modules	C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure
Console output	https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8400/2/console
versions	git=2.43.0 maven=3.9.11 spotbugs=4.9.7
Powered by	Apache Yetus 0.14.1 https://yetus.apache.org

This message was automatically generated.

manika137 · 2026-04-11T05:27:34Z

+      return System.identityHashCode(range);
+    }
+  }
+}


manika137 · 2026-04-11T05:32:30Z

@@ -1078,7 +1105,7 @@ public int minSeekForVectorReads() {
   */
  @Override
  public int maxReadSizeForVectorReads() {


the method mentions the read size but we're returning the max gap size

this is the name of the method in the superclass

But why are we returning the seek size here? Why not actual read size?

manika137 · 2026-04-11T06:28:30Z

+                buffer.getPath(), r.getOffset(), destOffset, length, left);
+
+            if (left < 0) {
+              LOG.error("fanOut: pending bytes went negative  possible duplicate write:"


nit: non-printable character

manika137 · 2026-04-11T08:52:39Z

+          if (end >= unit.getOffset() + unit.getLength()) {
+            existing.setBufferType(BufferType.VECTORED);
+            existing.addVectoredUnit(unit);
+            existing.setAllocator(allocator);


could we have multiple readvector calls with overlapping ranges that could reset the allocator here? Or isAlreadyQueued section wont go through for overlapping ranges?

Overlapping ranges are not allowed in vectored reads. validateAndSortRanges in VectoredReadUtils class takes care of this.

manika137 · 2026-04-11T08:55:38Z

    return bufferManager;
  }

+VectoredReadHandler getVectoredReadHandler() {


nit: format

manika137 · 2026-04-11T10:36:35Z

+    long bufferEnd = bufferStart + bytesRead;
+
+    /* Iterate over all combined logical units mapped to this buffer */
+    for (CombinedFileRange unit : units) {


could the following scenario be possible: while we fan-out here, we have another vectorRead call come in overlapping the ranges shared by this buffer and getting attached to the same buffer as a unit

Overlapping ranges are not allowed, List<? extends FileRange> sortedRanges = VectoredReadUtils.validateAndSortRanges(
ranges, Optional.of(fileLength)); this takes care of that

manika137 · 2026-04-11T10:53:26Z

+       */
+      if (isAlreadyQueued(stream.getETag(), unit.getOffset())) {
+        ReadBuffer existing = findQueuedBuffer(stream, unit.getOffset());
+        if (existing != null && existing.getStream().getETag() != null  && stream.getETag()


same doubt as RBMV1, do we wait for UNAVAILABLE state readbuffers too?

yes addressed above

manika137 · 2026-04-11T10:56:39Z

   * @param abfsConfiguration the configuration to set for the ReadBufferManagerV2.
   */
-  public static void setReadBufferManagerConfigs(final int readAheadBlockSize,
+  public static void setReadBufferManagerConfigs(int readAheadBlockSize,


why this change?

hadoop-yetus · 2026-04-13T08:30:27Z

💔 -1 overall

Vote	Subsystem	Runtime	Logfile	Comment
+0 🆗	reexec	0m 22s		Docker mode activated.
			_ Prechecks _
+1 💚	dupname	0m 1s		No case conflicting files found.
+0 🆗	codespell	0m 0s		codespell was not available.
+0 🆗	detsecrets	0m 0s		detect-secrets was not available.
+0 🆗	xmllint	0m 0s		xmllint was not available.
+1 💚	@author	0m 0s		The patch does not contain any @author tags.
+1 💚	test4tests	0m 0s		The patch appears to include 2 new or modified test files.
			_ trunk Compile Tests _
+1 💚	mvninstall	25m 2s		trunk passed
+1 💚	compile	0m 36s		trunk passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04
+1 💚	compile	0m 42s		trunk passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1
+1 💚	checkstyle	0m 36s		trunk passed
+1 💚	mvnsite	0m 46s		trunk passed
-1 ❌	javadoc	0m 38s	/branch-javadoc-hadoop-tools_hadoop-azure-jdkUbuntu-21.0.10+7-Ubuntu-124.04.txt	hadoop-azure in trunk failed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04.
-1 ❌	javadoc	0m 40s	/branch-javadoc-hadoop-tools_hadoop-azure-jdkUbuntu-17.0.18+8-Ubuntu-124.04.1.txt	hadoop-azure in trunk failed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1.
+1 💚	spotbugs	0m 59s		trunk passed
+1 💚	shadedclient	16m 15s		branch has no errors when building and testing our client artifacts.
-0 ⚠️	patch	16m 32s		Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
			_ Patch Compile Tests _
+1 💚	mvninstall	0m 22s		the patch passed
+1 💚	compile	0m 20s		the patch passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04
+1 💚	javac	0m 20s		the patch passed
+1 💚	compile	0m 22s		the patch passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1
+1 💚	javac	0m 22s		the patch passed
+1 💚	blanks	0m 0s		The patch has no blanks issues.
+1 💚	checkstyle	0m 16s		the patch passed
+1 💚	mvnsite	0m 24s		the patch passed
-1 ❌	javadoc	0m 21s	/patch-javadoc-hadoop-tools_hadoop-azure-jdkUbuntu-21.0.10+7-Ubuntu-124.04.txt	hadoop-azure in the patch failed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04.
-1 ❌	javadoc	0m 21s	/patch-javadoc-hadoop-tools_hadoop-azure-jdkUbuntu-17.0.18+8-Ubuntu-124.04.1.txt	hadoop-azure in the patch failed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1.
+1 💚	spotbugs	0m 47s		the patch passed
+1 💚	shadedclient	15m 0s		patch has no errors when building and testing our client artifacts.
			_ Other Tests _
+1 💚	unit	1m 50s		hadoop-azure in the patch passed.
+1 💚	asflicense	0m 21s		The patch does not generate ASF License warnings.
		68m 26s

Subsystem	Report/Notes
Docker	ClientAPI=1.54 ServerAPI=1.54 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8400/9/artifact/out/Dockerfile
GITHUB PR	#8400
Optional Tests	dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient codespell detsecrets xmllint spotbugs checkstyle
uname	Linux be7fc4980ad7 5.15.0-173-generic #183-Ubuntu SMP Fri Mar 6 13:29:34 UTC 2026 x86_64 x86_64 x86_64 GNU/Linux
Build tool	maven
Personality	dev-support/bin/hadoop.sh
git revision	trunk / `3c6ad3e`
Default Java	Ubuntu-17.0.18+8-Ubuntu-124.04.1
Multi-JDK versions	/usr/lib/jvm/java-21-openjdk-amd64:Ubuntu-21.0.10+7-Ubuntu-124.04 /usr/lib/jvm/java-17-openjdk-amd64:Ubuntu-17.0.18+8-Ubuntu-124.04.1
Test Results	https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8400/9/testReport/
Max. process+thread count	610 (vs. ulimit of 10000)
modules	C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure
Console output	https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8400/9/console
versions	git=2.43.0 maven=3.9.11 spotbugs=4.9.7
Powered by	Apache Yetus 0.14.1 https://yetus.apache.org

This message was automatically generated.

hadoop-yetus · 2026-04-13T08:42:36Z

💔 -1 overall

Vote	Subsystem	Runtime	Logfile	Comment
+0 🆗	reexec	18m 44s		Docker mode activated.
			_ Prechecks _
+1 💚	dupname	0m 0s		No case conflicting files found.
+0 🆗	codespell	0m 0s		codespell was not available.
+0 🆗	detsecrets	0m 0s		detect-secrets was not available.
+0 🆗	xmllint	0m 0s		xmllint was not available.
+1 💚	@author	0m 0s		The patch does not contain any @author tags.
+1 💚	test4tests	0m 0s		The patch appears to include 2 new or modified test files.
			_ trunk Compile Tests _
+1 💚	mvninstall	47m 6s		trunk passed
+1 💚	compile	1m 2s		trunk passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04
+1 💚	compile	1m 3s		trunk passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1
+1 💚	checkstyle	0m 56s		trunk passed
+1 💚	mvnsite	1m 10s		trunk passed
-1 ❌	javadoc	1m 0s	/branch-javadoc-hadoop-tools_hadoop-azure-jdkUbuntu-21.0.10+7-Ubuntu-124.04.txt	hadoop-azure in trunk failed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04.
-1 ❌	javadoc	0m 57s	/branch-javadoc-hadoop-tools_hadoop-azure-jdkUbuntu-17.0.18+8-Ubuntu-124.04.1.txt	hadoop-azure in trunk failed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1.
+1 💚	spotbugs	1m 36s		trunk passed
+1 💚	shadedclient	33m 53s		branch has no errors when building and testing our client artifacts.
-0 ⚠️	patch	34m 27s		Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
			_ Patch Compile Tests _
+1 💚	mvninstall	0m 36s		the patch passed
+1 💚	compile	0m 32s		the patch passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04
+1 💚	javac	0m 32s		the patch passed
+1 💚	compile	0m 34s		the patch passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1
+1 💚	javac	0m 34s		the patch passed
+1 💚	blanks	0m 0s		The patch has no blanks issues.
+1 💚	checkstyle	0m 26s		the patch passed
+1 💚	mvnsite	0m 39s		the patch passed
-1 ❌	javadoc	0m 29s	/patch-javadoc-hadoop-tools_hadoop-azure-jdkUbuntu-21.0.10+7-Ubuntu-124.04.txt	hadoop-azure in the patch failed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04.
-1 ❌	javadoc	0m 29s	/patch-javadoc-hadoop-tools_hadoop-azure-jdkUbuntu-17.0.18+8-Ubuntu-124.04.1.txt	hadoop-azure in the patch failed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1.
+1 💚	spotbugs	1m 18s		the patch passed
+1 💚	shadedclient	34m 18s		patch has no errors when building and testing our client artifacts.
			_ Other Tests _
+1 💚	unit	2m 13s		hadoop-azure in the patch passed.
+1 💚	asflicense	0m 35s		The patch does not generate ASF License warnings.
		151m 44s

Subsystem	Report/Notes
Docker	ClientAPI=1.54 ServerAPI=1.54 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8400/8/artifact/out/Dockerfile
GITHUB PR	#8400
Optional Tests	dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient codespell detsecrets xmllint spotbugs checkstyle
uname	Linux 8dbb7b62d7b3 5.15.0-173-generic #183-Ubuntu SMP Fri Mar 6 13:29:34 UTC 2026 x86_64 x86_64 x86_64 GNU/Linux
Build tool	maven
Personality	dev-support/bin/hadoop.sh
git revision	trunk / `cfcb033`
Default Java	Ubuntu-17.0.18+8-Ubuntu-124.04.1
Multi-JDK versions	/usr/lib/jvm/java-21-openjdk-amd64:Ubuntu-21.0.10+7-Ubuntu-124.04 /usr/lib/jvm/java-17-openjdk-amd64:Ubuntu-17.0.18+8-Ubuntu-124.04.1
Test Results	https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8400/8/testReport/
Max. process+thread count	590 (vs. ulimit of 10000)
modules	C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure
Console output	https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8400/8/console
versions	git=2.43.0 maven=3.9.11 spotbugs=4.9.7
Powered by	Apache Yetus 0.14.1 https://yetus.apache.org

This message was automatically generated.

bhattmanish98 · 2026-04-13T14:09:33Z

    return isFirstByteConsumed() && isLastByteConsumed();
  }
+
+  void addVectoredUnit(CombinedFileRange u) {


Java doc for all the newly created methods

bhattmanish98 · 2026-04-13T14:11:54Z

+  // Allocator used for vectored fan-out; captured at queue time */
+  private IntFunction<ByteBuffer> allocator;
+  // Tracks whether fanOut has already been executed
+  private final AtomicInteger fanOutDone = new AtomicInteger(0);


Would it be better to keep fanOutDone as AtomicBoolean instead of AtomicInteger? We don't have to compare the value in isFanOutDone() in that case.

bhattmanish98 · 2026-04-13T14:15:07Z

+  ReadBuffer findInList(final Collection<ReadBuffer> buffers,
+      final AbfsInputStream stream, long requestedOffset) {
+    for (ReadBuffer buffer : buffers) {
+      if (buffer.getStream() == stream


can buffer be null, if yes it will result into null pointer exception here.

No buffer can't be null here

anmolanmol1234 · 2026-04-14T07:59:53Z

@steveloughran @mukund-thakur requesting you to kindly review the PR. Thanks

hadoop-yetus · 2026-04-14T10:15:19Z

🎊 +1 overall

Vote	Subsystem	Runtime	Logfile	Comment
+0 🆗	reexec	0m 52s		Docker mode activated.
			_ Prechecks _
+1 💚	dupname	0m 0s		No case conflicting files found.
+0 🆗	codespell	0m 1s		codespell was not available.
+0 🆗	detsecrets	0m 1s		detect-secrets was not available.
+0 🆗	xmllint	0m 1s		xmllint was not available.
+1 💚	@author	0m 0s		The patch does not contain any @author tags.
+1 💚	test4tests	0m 0s		The patch appears to include 2 new or modified test files.
			_ trunk Compile Tests _
+1 💚	mvninstall	46m 32s		trunk passed
+1 💚	compile	1m 1s		trunk passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04
+1 💚	compile	1m 2s		trunk passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1
+1 💚	checkstyle	0m 54s		trunk passed
+1 💚	mvnsite	1m 4s		trunk passed
+1 💚	javadoc	0m 57s		trunk passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04
+1 💚	javadoc	0m 52s		trunk passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1
+1 💚	spotbugs	1m 37s		trunk passed
+1 💚	shadedclient	34m 14s		branch has no errors when building and testing our client artifacts.
-0 ⚠️	patch	34m 47s		Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
			_ Patch Compile Tests _
+1 💚	mvninstall	0m 36s		the patch passed
+1 💚	compile	0m 32s		the patch passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04
+1 💚	javac	0m 32s		the patch passed
+1 💚	compile	0m 35s		the patch passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1
+1 💚	javac	0m 35s		the patch passed
+1 💚	blanks	0m 0s		The patch has no blanks issues.
+1 💚	checkstyle	0m 26s		the patch passed
+1 💚	mvnsite	0m 38s		the patch passed
+1 💚	javadoc	0m 29s		the patch passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04
+1 💚	javadoc	0m 28s		the patch passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1
+1 💚	spotbugs	1m 15s		the patch passed
+1 💚	shadedclient	33m 26s		patch has no errors when building and testing our client artifacts.
			_ Other Tests _
+1 💚	unit	2m 11s		hadoop-azure in the patch passed.
+1 💚	asflicense	0m 35s		The patch does not generate ASF License warnings.
		132m 31s

Subsystem	Report/Notes
Docker	ClientAPI=1.54 ServerAPI=1.54 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8400/10/artifact/out/Dockerfile
GITHUB PR	#8400
Optional Tests	dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient codespell detsecrets xmllint spotbugs checkstyle
uname	Linux 3b91f8c5a9ca 5.15.0-173-generic #183-Ubuntu SMP Fri Mar 6 13:29:34 UTC 2026 x86_64 x86_64 x86_64 GNU/Linux
Build tool	maven
Personality	dev-support/bin/hadoop.sh
git revision	trunk / `339d7e3`
Default Java	Ubuntu-17.0.18+8-Ubuntu-124.04.1
Multi-JDK versions	/usr/lib/jvm/java-21-openjdk-amd64:Ubuntu-21.0.10+7-Ubuntu-124.04 /usr/lib/jvm/java-17-openjdk-amd64:Ubuntu-17.0.18+8-Ubuntu-124.04.1
Test Results	https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8400/10/testReport/
Max. process+thread count	577 (vs. ulimit of 10000)
modules	C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure
Console output	https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8400/10/console
versions	git=2.43.0 maven=3.9.11 spotbugs=4.9.7
Powered by	Apache Yetus 0.14.1 https://yetus.apache.org

This message was automatically generated.

steveloughran

have a look at #7105 / HADOOP-19105. Improve resilience in vector reads. https://issues.apache.org/jira/browse/HADOOP-19105

to see why I now think trying to merge ranges is a PITA.

do make sure that your error handling explicitly releases any allocated buffers before raising exceptions; the PositionedReadable interface was extended to support this

steveloughran · 2026-04-14T14:54:33Z

@@ -1184,7 +1209,7 @@ ReadBufferManager getReadBufferManager() {
   */
  @Override
  public int minSeekForVectorReads() {


I'm actually going to recommend not coalescing vectors unless you have a good strategy to deal with partial failures, memory releases, retries etc.

I did try to do that in s3a and gave up, even after extending the api to allow a "release" operation to be passed (which parquet passes down FWIW).

We've never seen failures with s3 in production and coalesced ranges as parquet/orc rowgroups are too far apart. So it's better to focus on retry and recovery there than range coalescing

In case of Azure, we do 4 MB buffer reads, so if don't do coalescing, we are wasting a lot of read data per range and hence we found that merging ranges which fall in one buffer i.e. 4 MB is giving better results

steveloughran · 2026-04-14T14:55:21Z

+   * Performs a vectored direct read by fetching multiple non-contiguous
+   * ranges in a single operation.
+   */
+  VECTORED_DIRECT_READ("VDR"),


azure does multirange? nice

Here this direct read is when we are not able to queue a vectored read in the read ahead queue due to read ahead buffers not being available, we do a direct readRemote call

anmolanmol1234 · 2026-04-15T10:56:09Z

have a look at #7105 / HADOOP-19105. Improve resilience in vector reads. https://issues.apache.org/jira/browse/HADOOP-19105

to see why I now think trying to merge ranges is a PITA.

do make sure that your error handling explicitly releases any allocated buffers before raising exceptions; the PositionedReadable interface was extended to support this

Have tried to take care of this in the VectoredReadHandler fanOut and directRead methods, can you please check if you see any concerns with the code there.

anujmodi2021 · 2026-04-20T02:05:10Z

@@ -1078,7 +1105,7 @@ public int minSeekForVectorReads() {
   */
  @Override
  public int maxReadSizeForVectorReads() {


But why are we returning the seek size here? Why not actual read size?

… HADOOP-15963

hadoop-yetus · 2026-04-21T13:59:42Z

🎊 +1 overall

Vote	Subsystem	Runtime	Logfile	Comment
+0 🆗	reexec	18m 35s		Docker mode activated.
			_ Prechecks _
+1 💚	dupname	0m 0s		No case conflicting files found.
+0 🆗	codespell	0m 1s		codespell was not available.
+0 🆗	detsecrets	0m 1s		detect-secrets was not available.
+0 🆗	xmllint	0m 1s		xmllint was not available.
+1 💚	@author	0m 0s		The patch does not contain any @author tags.
+1 💚	test4tests	0m 0s		The patch appears to include 2 new or modified test files.
			_ trunk Compile Tests _
+1 💚	mvninstall	48m 9s		trunk passed
+1 💚	compile	1m 2s		trunk passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04
+1 💚	compile	1m 2s		trunk passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1
+1 💚	checkstyle	0m 57s		trunk passed
+1 💚	mvnsite	1m 7s		trunk passed
+1 💚	javadoc	1m 0s		trunk passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04
+1 💚	javadoc	0m 56s		trunk passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1
+1 💚	spotbugs	1m 35s		trunk passed
+1 💚	shadedclient	35m 28s		branch has no errors when building and testing our client artifacts.
-0 ⚠️	patch	36m 1s		Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
			_ Patch Compile Tests _
+1 💚	mvninstall	0m 36s		the patch passed
+1 💚	compile	0m 32s		the patch passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04
+1 💚	javac	0m 32s		the patch passed
+1 💚	compile	0m 34s		the patch passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1
+1 💚	javac	0m 34s		the patch passed
+1 💚	blanks	0m 0s		The patch has no blanks issues.
+1 💚	checkstyle	0m 26s		the patch passed
+1 💚	mvnsite	0m 38s		the patch passed
+1 💚	javadoc	0m 28s		the patch passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04
+1 💚	javadoc	0m 29s		the patch passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1
+1 💚	spotbugs	1m 18s		the patch passed
+1 💚	shadedclient	34m 32s		patch has no errors when building and testing our client artifacts.
			_ Other Tests _
+1 💚	unit	2m 10s		hadoop-azure in the patch passed.
+1 💚	asflicense	0m 35s		The patch does not generate ASF License warnings.
		154m 9s

Subsystem	Report/Notes
Docker	ClientAPI=1.54 ServerAPI=1.54 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8400/11/artifact/out/Dockerfile
GITHUB PR	#8400
Optional Tests	dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient codespell detsecrets xmllint spotbugs checkstyle
uname	Linux b5723321dbb5 5.15.0-173-generic #183-Ubuntu SMP Fri Mar 6 13:29:34 UTC 2026 x86_64 x86_64 x86_64 GNU/Linux
Build tool	maven
Personality	dev-support/bin/hadoop.sh
git revision	trunk / `7899f50`
Default Java	Ubuntu-17.0.18+8-Ubuntu-124.04.1
Multi-JDK versions	/usr/lib/jvm/java-21-openjdk-amd64:Ubuntu-21.0.10+7-Ubuntu-124.04 /usr/lib/jvm/java-17-openjdk-amd64:Ubuntu-17.0.18+8-Ubuntu-124.04.1
Test Results	https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8400/11/testReport/
Max. process+thread count	569 (vs. ulimit of 10000)
modules	C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure
Console output	https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8400/11/console
versions	git=2.43.0 maven=3.9.11 spotbugs=4.9.7
Powered by	Apache Yetus 0.14.1 https://yetus.apache.org

This message was automatically generated.

anmolanmol1234 · 2026-04-27T06:34:53Z

Hi @mukund-thakur gentle reminder for reviewing the PR

mukund-thakur · 2026-04-27T21:11:37Z

Hi @mukund-thakur gentle reminder for reviewing the PR

Hi @anmolanmol1234 Thanks for implementing this in ABFS. I skimmed over the changes quickly and I think it is fairly complex thus would be better if someone from ABFS team review and commit this. I haven't touched this code in long time. I would be happy to answer interface related questions. Thanks.

mukund-thakur · 2026-04-27T20:59:50Z

  @Override
  public int maxReadSizeForVectorReads() {
-    return S_2M;
+    return client.getAbfsConfiguration().getMaxSeekForVectoredReads();


It should be in lines of maxReadSizeForVectorReads

mukund-thakur · 2026-04-27T21:08:08Z

+   * Configuration key that defines the maximum gap between adjacent read ranges
+   * for merging ranges during vectored reads in ABFS: {@value}.
+   */
+  public static final String FS_AZURE_MAX_SEEK_FOR_VECTORED_READS =


It would be good if you use consistent parameter names.
fs.s3a.vectored.read.max.merged.size is the name in S3A.

mukund-thakur · 2026-04-27T21:13:48Z

+import static org.apache.hadoop.fs.azurebfs.constants.FileSystemConfigurations.ZERO;
+import static org.apache.hadoop.fs.contract.ContractTestUtils.validateVectoredReadResult;
+
+public class ITestVectoredRead extends AbstractAbfsIntegrationTest {


Please make sure to run ABFS contract tests ITestAbfsFileSystemContractVectoredRead

yes all tests are passing here

… HADOOP-15963

anmolanmol1234 · 2026-05-06T07:56:18Z

Hi @steveloughran have addressed your concerns, can you take a look once. Thanks

hadoop-yetus · 2026-05-06T10:30:41Z

🎊 +1 overall

Vote	Subsystem	Runtime	Logfile	Comment
+0 🆗	reexec	18m 13s		Docker mode activated.
			_ Prechecks _
+1 💚	dupname	0m 0s		No case conflicting files found.
+0 🆗	codespell	0m 1s		codespell was not available.
+0 🆗	detsecrets	0m 1s		detect-secrets was not available.
+0 🆗	xmllint	0m 1s		xmllint was not available.
+1 💚	@author	0m 0s		The patch does not contain any @author tags.
+1 💚	test4tests	0m 0s		The patch appears to include 2 new or modified test files.
			_ trunk Compile Tests _
+1 💚	mvninstall	48m 48s		trunk passed
+1 💚	compile	1m 3s		trunk passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04
+1 💚	compile	1m 3s		trunk passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1
+1 💚	checkstyle	0m 57s		trunk passed
+1 💚	mvnsite	1m 6s		trunk passed
+1 💚	javadoc	1m 0s		trunk passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04
+1 💚	javadoc	0m 57s		trunk passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1
+1 💚	spotbugs	1m 36s		trunk passed
+1 💚	shadedclient	35m 23s		branch has no errors when building and testing our client artifacts.
-0 ⚠️	patch	35m 56s		Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
			_ Patch Compile Tests _
+1 💚	mvninstall	0m 37s		the patch passed
+1 💚	compile	0m 32s		the patch passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04
+1 💚	javac	0m 32s		the patch passed
+1 💚	compile	0m 34s		the patch passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1
+1 💚	javac	0m 34s		the patch passed
+1 💚	blanks	0m 0s		The patch has no blanks issues.
+1 💚	checkstyle	0m 26s		the patch passed
+1 💚	mvnsite	0m 39s		the patch passed
+1 💚	javadoc	0m 29s		the patch passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04
+1 💚	javadoc	0m 29s		the patch passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1
+1 💚	spotbugs	1m 17s		the patch passed
+1 💚	shadedclient	33m 43s		patch has no errors when building and testing our client artifacts.
			_ Other Tests _
+1 💚	unit	2m 11s		hadoop-azure in the patch passed.
+1 💚	asflicense	0m 35s		The patch does not generate ASF License warnings.
		153m 45s

Subsystem	Report/Notes
Docker	ClientAPI=1.54 ServerAPI=1.54 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8400/12/artifact/out/Dockerfile
GITHUB PR	#8400
Optional Tests	dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient codespell detsecrets xmllint spotbugs checkstyle
uname	Linux 2130be252f3b 5.15.0-173-generic #183-Ubuntu SMP Fri Mar 6 13:29:34 UTC 2026 x86_64 x86_64 x86_64 GNU/Linux
Build tool	maven
Personality	dev-support/bin/hadoop.sh
git revision	trunk / `b0b9224`
Default Java	Ubuntu-17.0.18+8-Ubuntu-124.04.1
Multi-JDK versions	/usr/lib/jvm/java-21-openjdk-amd64:Ubuntu-21.0.10+7-Ubuntu-124.04 /usr/lib/jvm/java-17-openjdk-amd64:Ubuntu-17.0.18+8-Ubuntu-124.04.1
Test Results	https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8400/12/testReport/
Max. process+thread count	567 (vs. ulimit of 10000)
modules	C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure
Console output	https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8400/12/console
versions	git=2.43.0 maven=3.9.11 spotbugs=4.9.7
Powered by	Apache Yetus 0.14.1 https://yetus.apache.org

This message was automatically generated.

anmolanmol1234 added 22 commits January 16, 2026 04:30

vectored read config changes

8c28b18

Vectored read code

08617b7

Fix tests

95cbb73

Made changes for inprogress list

d03d3cd

Merge branch 'trunk' of https://github.com/anmolanmol1234/hadoop into…

1e62df3

… HADOOP-15963_poc

Merge conflicts

3f36997

Checkstyle fix

c4313e9

Merge branch 'trunk' of https://github.com/anmolanmol1234/hadoop into…

5043c90

… HADOOP-15963_poc

Fix checkstyle

0106fc1

Fix checkstyle

1430b4a

Add explanations

0927ca1

Add debug log statements

08ca94a

Checkstyle fixes

02834ef

Merge branch 'trunk' of https://github.com/anmolanmol1234/hadoop into…

64e5da6

… HADOOP-15963_poc

Merge branch 'trunk' of https://github.com/anmolanmol1234/hadoop into…

eb52c29

… HADOOP-15963_poc

fix null issue

1ea571e

Fix vectored read

279e7f4

Merge branch 'trunk' of https://github.com/anmolanmol1234/hadoop into…

7de0740

… HADOOP-15963_poc

Merge branch 'trunk' of https://github.com/anmolanmol1234/hadoop into…

7c46352

… HADOOP-15963_poc

range validation fixes

25fa821

fix issues

5b2632a

Fix checkstyle

437ffc8

github-actions Bot added trunk TOOLS ABFS labels Apr 2, 2026

fix checkstyle

975bf73

mukund-thakur self-requested a review April 2, 2026 21:20

fix checkstyle failures

ba7c9ff

manika137 reviewed Apr 11, 2026

View reviewed changes

anmolanmol1234 added 2 commits April 12, 2026 23:08

Resolve merge conflicts

cfcb033

Address PR comments

3c6ad3e

bhattmanish98 reviewed Apr 13, 2026

View reviewed changes

PR comments

339d7e3

steveloughran reviewed Apr 14, 2026

View reviewed changes

anujmodi2021 reviewed Apr 20, 2026

View reviewed changes

anmolanmol1234 added 2 commits April 21, 2026 00:24

Merge branch 'trunk' of https://github.com/anmolanmol1234/hadoop into…

4bd2ec8

… HADOOP-15963

PR comments

7899f50

mukund-thakur reviewed Apr 27, 2026

View reviewed changes

manika137 approved these changes Apr 29, 2026

View reviewed changes

anmolanmol1234 added 2 commits May 5, 2026 23:08

Merge branch 'trunk' of https://github.com/anmolanmol1234/hadoop into…

e3a6094

… HADOOP-15963

PR comments

b0b9224

Conversation

anmolanmol1234 commented Apr 2, 2026

Uh oh!

hadoop-yetus commented Apr 2, 2026

Uh oh!

hadoop-yetus commented Apr 2, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

hadoop-yetus commented Apr 13, 2026

Uh oh!

hadoop-yetus commented Apr 13, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

anmolanmol1234 commented Apr 14, 2026

Uh oh!

hadoop-yetus commented Apr 14, 2026

Uh oh!

steveloughran left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

anmolanmol1234 commented Apr 15, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

mukund-thakur commented Apr 27, 2026 •

edited

Loading