HADOOP-19795: ABFS. GetPathStatus Optimization on OpenFileForRead #8212
manika137 wants to merge 13 commits into apache:trunk from
Conversation
Test Results

💔 -1 overall

This message was automatically generated.
    getClient().getEncryptionType() != EncryptionType.ENCRYPTION_CONTEXT
        || ((VersionedFileStatus) fileStatus).getEncryptionContext() != null)) {

Review comment: additional space changes can be reverted.
    contextEncryptionAdapter = new ContextProviderEncryptionAdapter(
        getClient().getEncryptionContextProvider(), getRelativePath(path),
        encryptionContext.getBytes(StandardCharsets.UTF_8));
      }
    } else {
      if (parseIsDirectory(resourceType)) {

Review comment: can be moved to the common part, as it is checked in both cases.
     * - restrictGpsOnOpenFile config is enabled with null FileStatus and encryptionType not as ENCRYPTION_CONTEXT
     *   In this case, we don't need to call GetPathStatus API.
     */
    } else {

Review comment: won't this lead to going ahead and opening the stream without checks? Do we fail later for this case?
    INVALID_APPEND_OPERATION("InvalidAppendOperation", HttpURLConnection.HTTP_CONFLICT, null),
    UNAUTHORIZED_BLOB_OVERWRITE("UnauthorizedBlobOverwrite", HttpURLConnection.HTTP_FORBIDDEN,
        "This request is not authorized to perform blob overwrites."),
    INVALID_RANGE("InvalidRange", 416,

Review comment: 416 should come from a constant defined in the HttpURLConnection class.
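One wrinkle with this suggestion: `java.net.HttpURLConnection` stops at `HTTP_UNSUPPORTED_TYPE` (415) and defines no field for 416, so a project-level constant may be needed instead. A minimal sketch (the constant name here is illustrative, not an actual ABFS identifier):

```java
import java.net.HttpURLConnection;

public class HttpRangeConstants {
    // java.net.HttpURLConnection has no field for 416 (Range Not Satisfiable),
    // so a local named constant keeps the magic number in one place.
    public static final int HTTP_RANGE_NOT_SATISFIABLE = 416;

    public static void main(String[] args) {
        // Adjacent codes that DO exist as HttpURLConnection fields, for contrast.
        assert HttpURLConnection.HTTP_UNSUPPORTED_TYPE == 415;
        assert HTTP_RANGE_NOT_SATISFIABLE == 416;
        System.out.println("ok");
    }
}
```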
    // Reset Read Type back to normal and set again based on code flow.
    getTracingContext().setReadType(ReadType.NORMAL_READ);
    if (shouldAlwaysReadBufferSize()) {
    if(shouldRestrictGpsOnOpenFile() && isFirstRead()) {

Review comment: nit: space after if.
    // Reset Read Type back to normal and set again based on code flow.
    getTracingContext().setReadType(ReadType.NORMAL_READ);
    if (shouldAlwaysReadBufferSize()) {
    if(shouldRestrictGpsOnOpenFile() && isFirstRead()) {

Review comment: add a comment for this condition as well.
    private final int footerReadSize; // default buffer size to read when reading footer
    private final int readAheadQueueDepth; // initialized in constructor
    private final String eTag; // eTag of the path when InputStream are created
    private String eTag; // eTag of the path when InputStream are created

Review comment: nit: "InputStream is created".
    String getRelativePath(final Path path) {

        tracingContext,
        contextEncryptionAdapter).getResult();
    String resourceType =

Review comment: use the client.checkIsDir method, which handles the null case as well.
    }
    contentLength = Long.parseLong(op.getResult().getResponseHeader(HttpHeaderConfigurations.CONTENT_RANGE).
        split(AbfsHttpConstants.FORWARD_SLASH)[1]);
    eTag = op.getResult().getResponseHeader("ETag");

Review comment: use the extractEtagHeader method.
    if (Objects.equals(resourceType, DIRECTORY)) {
      throw directoryReadException();
    }
    contentLength = Long.parseLong(op.getResult().getResponseHeader(HttpHeaderConfigurations.CONTENT_RANGE).
        split(AbfsHttpConstants.FORWARD_SLASH)[1]);

Review comment: same here, use the extractContentLen method.
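For context on what the suggested helper would replace: the total object length sits after the `/` in a `Content-Range` header such as `bytes 0-8/4096`. A hedged, standalone sketch of that parsing (the helper name is illustrative, not the actual `extractContentLen` implementation, which also needs the null/malformed cases the inline parsing above misses):

```java
public class ContentRangeParser {
    /**
     * Parse the total object length from a Content-Range header like
     * "bytes 0-8/4096". Returns -1 when the header is absent or malformed,
     * instead of throwing NullPointerException/NumberFormatException the way
     * a bare split-and-parse would.
     */
    static long totalLengthFrom(String contentRange) {
        if (contentRange == null) {
            return -1L;
        }
        int slash = contentRange.lastIndexOf('/');
        if (slash < 0 || slash == contentRange.length() - 1) {
            return -1L;
        }
        try {
            return Long.parseLong(contentRange.substring(slash + 1).trim());
        } catch (NumberFormatException e) {
            return -1L;
        }
    }

    public static void main(String[] args) {
        System.out.println(totalLengthFrom("bytes 0-0/4096")); // 4096
        System.out.println(totalLengthFrom(null));             // -1
    }
}
```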
    if (ere.getStatusCode() == HttpURLConnection.HTTP_NOT_FOUND) {
      throw new FileNotFoundException(ere.getMessage());
    int status = ere.getStatusCode();
    if(ere.getErrorMessage().contains(readOnDirectoryErrorMsg)){

Review comment: nit: space after if.
    } catch (AzureBlobFileSystemException gpsEx) {
      AbfsRestOperationException gpsEre = (AbfsRestOperationException) gpsEx;
      if(gpsEre.getErrorMessage().contains(readOnDirectoryErrorMsg)){

Review comment: nit: space after if.
    }

    // Default: propagate original error
    throw new IOException(ex);

Review comment: should be done once only, on line 715.
    This is required since contentLength is not available yet to determine prefetch block size.
    */
    bytesRead = readInternal(getFCursor(), getBuffer(), 0, getBufferSize(), false);
    if(shouldRestrictGpsOnOpenFile() && isFirstRead()) {

Review comment: nit: space after if.
|
Description of PR
JIRA: https://issues.apache.org/jira/browse/HADOOP-19795
We currently make a getPathStatus call when a file is opened for read. This call is primarily used to fetch the file's metadata properties before the actual read begins.
This PR introduces an optional, config-driven read flow that skips the getPathStatus call during open and instead derives the required metadata from the read response itself.
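The core idea above can be sketched in a few lines: a ranged GET already returns `Content-Range` (which carries the total file length after the `/`) and `ETag`, so the first read can supply the metadata that GetPathStatus would otherwise have fetched. This is a standalone illustration of that mechanism; the class and method names here are hypothetical, not the actual ABFS types in the patch:

```java
import java.util.Map;

/**
 * Hedged sketch of the optimization: on the first read, skip GetPathStatus
 * and derive contentLength and eTag from the read response's own headers.
 */
public class OpenWithoutGetPathStatus {

    /** Minimal stand-in for the metadata GetPathStatus would have returned. */
    record FileMetadata(long contentLength, String eTag) {}

    static FileMetadata fromReadResponseHeaders(Map<String, String> headers) {
        // A ranged GET responds with "Content-Range: bytes <start>-<end>/<total>";
        // the value after '/' is the full file length.
        String contentRange = headers.get("Content-Range");
        long length = Long.parseLong(
            contentRange.substring(contentRange.lastIndexOf('/') + 1));
        return new FileMetadata(length, headers.get("ETag"));
    }

    public static void main(String[] args) {
        FileMetadata m = fromReadResponseHeaders(Map.of(
            "Content-Range", "bytes 0-4193/1048576",
            "ETag", "0x8D9ABC"));
        System.out.println(m.contentLength() + " " + m.eTag()); // 1048576 0x8D9ABC
    }
}
```

The saving is one round trip per open: the metadata arrives piggybacked on the data the client was going to request anyway.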
How was this patch tested?
New tests were added and the test suite was run; results are added in the comments below.