GitHub user hansva added a comment to the discussion: Question About Database Connections in Apache Hop Server Mode
I'll try to write a full and cohesive answer. Let's start by rephrasing the original question:

> I want to run a pipeline/workflow on Hop Server using the REST API directly,
> without using Hop GUI or hop-run. How do I do this?

## Intro

### Hop Server

**What it is**

Hop Server is a stateless server. Its main purpose is to act as an extension to Hop GUI to run a pipeline or workflow in a remote environment.

**What it isn't**

Hop Server isn't the typical server you would use for scheduling/monitoring. It does not retain state or history (except in memory for a short period); after a restart, all previous information is lost.

Circling back to what it is, we can also discuss why it is poorly documented. We do **not** want it to be used in a stand-alone way; it wasn't made for this. We did add the endpoints to our [documentation](https://hop.apache.org//manual/latest/hop-server/rest-api.html) because people were asking for them, but honestly, it was never designed to be used without the GUI or Hop Run. There are better ways, e.g. short-lived containers, which provide more flexibility; in combination with Airflow ([tutorial](https://hop.apache.org//manual/latest/how-to-guides/run-hop-in-apache-airflow.html)) you can also use webhooks to start things.

Shameless plug: we (know.bi) are working on something better which we hope to showcase soon.

That covers our disclaimer; let's get back to the subject.

## Running a single pipeline

Starting something on Hop Server falls into 3 different categories:

- a single pipeline
- a single workflow
- a workflow with other pipelines/workflows

Let's discuss starting a single pipeline. There are 3 steps that need to be taken to start a pipeline on Hop Server.

### registerPipeline

The first step is to send the pipeline and all needed environment information to the server. As stated before, the server is stateless, so it knows nothing; it needs all of this information to create a successful execution.

The XML format of the request:

```
<pipeline_configuration>
  <pipeline>
  </pipeline>
  <pipeline_execution_configuration>
    <variables></variables>
    <parameters></parameters>
    <pass_export>N</pass_export>
    <log_level>Basic</log_level>
    <log_file>N</log_file>
    <log_filename/>
    <log_file_append>N</log_file_append>
    <create_parent_folder>N</create_parent_folder>
    <clear_log>Y</clear_log>
    <show_subcomponents>Y</show_subcomponents>
    <run_configuration>local</run_configuration>
  </pipeline_execution_configuration>
  <metastore_json>
  </metastore_json>
</pipeline_configuration>
```

3 blocks of information need to be included in this request:

**pipeline:** this one is simple: it's the hpl file that you wish to execute on the server.

**pipeline_execution_configuration:** this block contains an export of all Hop variables in the `<variables>` section, plus the parameters/variables you have defined in the Run Options dialog. The variables section will also contain all variables you have defined in your environment; if you have put a database username/password and so on in an environment file, they get added there. Each variable looks like `<variable><name>VARIABLE_NAME</name><value>VALUE</value></variable>`

**metastore_json:** this is the part where it gets hard. The metastore_json is a Base64-encoded gzip stream. To get a fast/simple preview of what's in it, you can take the example from our docs and throw it in [this](https://www.bugdays.com/gzip-base64) website, or do the round trip in code as sketched below.
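A minimal sketch of that round trip in Python, using only the standard library (the helper names are mine, not part of any Hop API):

```
import base64
import gzip
import json

def encode_metastore(metadata: dict) -> str:
    # Serialize the metadata objects to JSON, gzip the bytes,
    # then Base64-encode the compressed stream.
    raw = json.dumps(metadata).encode("utf-8")
    return base64.b64encode(gzip.compress(raw)).decode("ascii")

def decode_metastore(metastore_json: str) -> dict:
    # Reverse the steps to inspect what a client such as Hop GUI sent.
    raw = gzip.decompress(base64.b64decode(metastore_json))
    return json.loads(raw.decode("utf-8"))
```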
It boils down to a JSON document containing all the objects you have defined in the metadata perspective/metadata folder. The example below shows what it looks like if you only have a PostgreSQL connection; note that it also needs to contain your run configurations and all other objects that are available in your metadata folder. Another note: each database type can have different fields (just like in the UI); most of them are shared, but e.g. MS SQL Server has a few more.

```
{
  "rdbms": [
    {
      "rdbms": {
        "POSTGRESQL": {
          "databaseName": "postgres",
          "pluginId": "POSTGRESQL",
          "indexTablespace": null,
          "dataTablespace": null,
          "accessType": 0,
          "hostname": "localhost",
          "password": "",
          "pluginName": "PostgreSQL",
          "port": "5432",
          "servername": null,
          "attributes": {
            "SUPPORTS_TIMESTAMP_DATA_TYPE": "N",
            "QUOTE_ALL_FIELDS": "N",
            "SUPPORTS_BOOLEAN_DATA_TYPE": "Y",
            "FORCE_IDENTIFIERS_TO_LOWERCASE": "N",
            "PRESERVE_RESERVED_WORD_CASE": "Y",
            "SQL_CONNECT": "",
            "FORCE_IDENTIFIERS_TO_UPPERCASE": "N",
            "PREFERRED_SCHEMA_NAME": ""
          },
          "manualUrl": "",
          "username": "postgres"
        }
      },
      "name": "pg"
    }
  ]
}
```

After building and sending this request to the server (POST), you will get a response:

```
<webresult>
  <result>OK</result>
  <message>Pipeline 'variables' was added to HopServer with id 08bdff17-0d75-43a3-b890-05783376cbb2</message>
  <id>08bdff17-0d75-43a3-b890-05783376cbb2</id>
</webresult>
```

### prepareExec

After you get back the id, you have to hit prepareExec with a GET request:

`GET /hop/prepareExec/?name=variables&xml=Y&id=08bdff17-0d75-43a3-b890-05783376cbb2`

Response:

```
<webresult>
  <result>OK</result>
  <message/>
  <id/>
</webresult>
```

This prepares the pipeline for execution, and it enters a "waiting" state.

### startExec

The final step is a GET to startExec to start the actual execution:

`GET /hop/startExec/?name=variables&xml=Y&id=08bdff17-0d75-43a3-b890-05783376cbb2`

Response:

```
<webresult>
  <result>OK</result>
  <message/>
  <id/>
</webresult>
```

You can follow how everything is going with the pipelineStatus endpoint.

## Closing note

These steps should help you use the REST API directly to start a pipeline (a rough end-to-end sketch of the three calls follows in the P.S. below); running a single workflow is a similar process. Running a combination of workflows and pipelines requires more work, as it involves a specially crafted zip file that is sent to the server.

Happy coding,
Hans

GitHub link: https://github.com/apache/hop/discussions/4634#discussioncomment-11422350
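P.S. To tie the three calls together, here is a rough end-to-end sketch in Python with the `requests` library; treat it as an illustration under assumptions, not a supported client. The host/port and the `cluster`/`cluster` basic-auth credentials are placeholders for your own hop-server configuration, `variables.hpl` and `metadata.json` are hypothetical local files, the execution configuration is trimmed to a few fields from the template above, and the `/hop/registerPipeline/` path is inferred from the step name and the pattern of the other endpoints.

```
# End-to-end sketch: registerPipeline -> prepareExec -> startExec -> pipelineStatus.
import base64
import gzip
import json
import re
from pathlib import Path

import requests

SERVER = "http://localhost:8080"  # placeholder: your hop-server address
AUTH = ("cluster", "cluster")     # placeholder: your hop-server credentials
NAME = "variables"                # must match the pipeline name inside the hpl

# Step 1: registerPipeline. Embed the hpl, an execution configuration and the
# Base64-encoded gzipped metadata JSON in one <pipeline_configuration> document.
hpl = Path("variables.hpl").read_text(encoding="utf-8")
hpl = re.sub(r"<\?xml[^>]*\?>", "", hpl)  # drop the prolog so the hpl can be embedded

metadata = json.loads(Path("metadata.json").read_text(encoding="utf-8"))
metastore_json = base64.b64encode(
    gzip.compress(json.dumps(metadata).encode("utf-8"))
).decode("ascii")

body = f"""<pipeline_configuration>
{hpl}
<pipeline_execution_configuration>
  <pass_export>N</pass_export>
  <log_level>Basic</log_level>
  <clear_log>Y</clear_log>
  <run_configuration>local</run_configuration>
</pipeline_execution_configuration>
<metastore_json>{metastore_json}</metastore_json>
</pipeline_configuration>"""

resp = requests.post(f"{SERVER}/hop/registerPipeline/?xml=Y",
                     data=body.encode("utf-8"), auth=AUTH)
resp.raise_for_status()
exec_id = re.search(r"<id>([^<]+)</id>", resp.text).group(1)

# Step 2: prepareExec puts the registered pipeline in its "waiting" state.
requests.get(f"{SERVER}/hop/prepareExec/",
             params={"name": NAME, "xml": "Y", "id": exec_id},
             auth=AUTH).raise_for_status()

# Step 3: startExec starts the actual execution.
requests.get(f"{SERVER}/hop/startExec/",
             params={"name": NAME, "xml": "Y", "id": exec_id},
             auth=AUTH).raise_for_status()

# Step 4: pipelineStatus lets you follow how the execution is going.
status = requests.get(f"{SERVER}/hop/pipelineStatus/",
                      params={"name": NAME, "xml": "Y", "id": exec_id},
                      auth=AUTH)
print(status.text)
```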
