Merged
Conversation
We decided to use only single quotes for strings to make it consistent. I kept docstrings in triple double quotes because of PEP 257.
They were added to give additional information but are not needed here any more in this place
…-processing-worker add bashlib processing worker, require Python 3.7
MehmedGIT
reviewed
Mar 27, 2023
Contributor
MehmedGIT
left a comment
There was a problem hiding this comment.
Should be good to be merged now!
Comment on lines
230
to
233
| try: | ||
| # Only checks if the process queue exists, if not raises ValueError | ||
| self.rmq_publisher.create_queue(processor_name, passive=True) | ||
| except ChannelClosedByBroker as error: |
Contributor
There was a problem hiding this comment.
I was expecting this to raise just a ValueError according to the doc if the queue doesn't exist, however, that seems to not be the case. The channel is closed.
Comment on lines
239
to
242
| finally: | ||
| # Reconnect publisher - not efficient, but works | ||
| # TODO: Revisit when reconnection strategy is implemented | ||
| self.connect_publisher(enable_acks=True) |
Contributor
There was a problem hiding this comment.
Hence, a reconnection is needed here. I know that it's more efficient to just open a new channel but don't want to deal with that now - will be more appropriate to invest time to implement correctly the reconnection scheme internally in the rabbitmq_utils.
Fix the silly mistake I made.
bertsky
reviewed
Mar 28, 2023
bertsky
reviewed
Mar 28, 2023
MehmedGIT
added a commit
that referenced
this pull request
Mar 28, 2023
This was referenced Mar 28, 2023
Member
Details> > ```shell > # 1. > $ ocrd processing-worker --queue= --database= > > # 2. > $ --queue= --database= > ``` > > We (me and @joschrew) are aware that in the [spec](https://github.com/OCR-D/spec/blob/master/web_api.md) there is a change that is not adapted here, yet: > > ```shell > # 1. Use ocrd CLI bundled with OCR-D/core > $ ocrd server --type=worker --queue= --database= > > # 2. Use processor name > $ --server --type=worker --queue= --database= > ``` > > Where the `--type` can be either: `worker` (processing worker) > > ```shell > ocrd server --type=worker --queue= --database= > ``` > > or `server` (the REST API wrapper based on #884). > > ```shell > ocrd server --type=server --database= > ``` > > However, it's a bad idea to extend the processing worker code to support the REST API processing server (aka standalone processing worker that has nothing to do with the bigger Processing Server in the spec and does not need the queues). > > Now, try to imagine a good way to explain to the OCR-D community without confusing them: > > 1. why the `` is a server, but is not a server when `--type=worker` and is referred to with `processing-worker` and no direct requests can be sent > 2. why the `` is a server, and actual server when `--type=server` and is referred to with `processing-server` > 3. why both are grouped together under `ocrd server...` or `... --server ...` but potentially implemented together under `processing_worker.py` > 4. why Processing Server (PS-big) and processing servers (PS-small), standalone ocrd processors, are different concepts but both referred to with `processing server`. > > Sounds confusing? It's even more when it has to be implemented in a clean way. So there are 3 main reasons to not adapt that yet: > > * The priority changed a bit and the higher priority now is to deploy working Processing Server, Workflow Server, and Workspace Server together on a live VM instance, so KITODO can use that to continue their development. > * Identify and fix problems that arise when combining the 3 servers above from 2 different repositories ([Processing-Server #974](https://github.com//pull/974) and reference WebAPI impl). > * To not complicate the current implementation without first trying to think how to separate concepts properly and avoid potential problems. Ideally: > > 1. the standalone processing servers (aka server processing workers) should be implemented as a separate class (name suggestions?), sibling of `ProcessingWorker`, and both will share the common methods. > 2. the CLI for both should be separated with improved naming conventions.continued in #1032 |
This was referenced Mar 29, 2023
tdoan2010
approved these changes
Mar 30, 2023
Co-authored-by: Robert Sachunsky <38561704+bertsky@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This pull request implements the processing server (OCR-D/spec#222). The processing server starts a rabbitmq, mongodb and processing workers via ssh and docker. It receives calls to run ocr-d processors. The jobs are enqueued and the workers read the jobs from the queue and execute them. In the mongdb the jobs are stored and status changes of the jobs are reflected in the database and can be queried through the processing workers API.
To start a worker it must be defined in the configuration file. There the host of the worker has to be set. It is expected that the worker executables (e.g. ocrd-dummy) are available in the PATH. PATH can be modified via ~/.bash_profile or ~/.profile.
Example configuration file:
Example calls for the processing-server endpoints:
Run a processor:
curl 'http://localhost:8080/processor/ocrd-dummy' -H 'accept: application/json' -H 'Content-Type: application/json' -d '{ "path": "/home/testuser/.local/share/ocrd-workspaces/data/mets.xml", "input_file_grps": ["OCR-D-IMG"], "output_file_grps": ["IMG-COPY-1"], "parameters": { "copy_files": true } }'Request processor status:
curl 'http://localhost:8080/processor/ocrd-dummy/<insert-job-id-here>'List available processors:
curl 'http://localhost:8080/processor'Get information about a single processor
curl 'http://localhost:8080/processor/ocrd-dummy'