Hi Kyle,

Here is the full stack trace. I have printed the pipeline parameters for you as well.


{'runner': 'PortableRunner', 'streaming': False, 'beam_services': {},
 'type_check_strictness': 'DEFAULT_TO_ANY', 'pipeline_type_check': True,
 'runtime_type_check': False, 'direct_runner_use_stacked_bundle': True,
 'direct_runner_bundle_repeat': 0, 'direct_num_workers': 1,
 'direct_running_mode': 'in_memory',
 'dataflow_endpoint': 'https://dataflow.googleapis.com', 'project': None,
 'job_name': None, 'staging_location': None, 'temp_location': None,
 'region': None, 'service_account_email': None, 'no_auth': False,
 'template_location': None, 'labels': None, 'update': False,
 'transform_name_mapping': None, 'enable_streaming_engine': False,
 'dataflow_kms_key': None, 'flexrs_goal': None, 'hdfs_host': None,
 'hdfs_port': None, 'hdfs_user': None, 'hdfs_full_urls': False,
 'num_workers': None, 'max_num_workers': None,
 'autoscaling_algorithm': None, 'machine_type': None, 'disk_size_gb': None,
 'disk_type': None, 'worker_region': None, 'worker_zone': None, 'zone': None,
 'network': None, 'subnetwork': None, 'worker_harness_container_image': None,
 'sdk_harness_container_image_overrides': None, 'use_public_ips': None,
 'min_cpu_platform': None, 'dataflow_worker_jar': None,
 'dataflow_job_file': None, 'experiments': None,
 'number_of_worker_harness_threads': None, 'profile_cpu': False,
 'profile_memory': False, 'profile_location': None,
 'profile_sample_rate': 1.0, 'requirements_file': None,
 'requirements_cache': None, 'setup_file': None, 'beam_plugins': None,
 'save_main_session': False, 'sdk_location': 'default',
 'extra_packages': None, 'job_endpoint': 'localhost:8099',
 'artifact_endpoint': None, 'job_server_timeout': 60,
 'environment_type': 'LOOPBACK', 'environment_config': None,
 'sdk_worker_parallelism': 1, 'environment_cache_millis': 0,
 'output_executable_path': None, 'artifacts_dir': None, 'job_port': 0,
 'artifact_port': 0, 'expansion_port': 0, 'flink_master': '[auto]',
 'flink_version': '1.10', 'flink_job_server_jar': None,
 'flink_submit_uber_jar': False, 'spark_master_url': 'local[4]',
 'spark_job_server_jar': None, 'spark_submit_uber_jar': False,
 'spark_rest_url': None, 'on_success_matcher': None, 'dry_run': False,
 'wait_until_finish_duration': None, 'pubsubRootUrl': None}
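
In case it helps with reproducing: a dict like the one above can be printed
from any PipelineOptions instance via get_all_options(). A minimal sketch,
assuming the same flags my script passes:

from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions([
    "--runner=PortableRunner",
    "--job_endpoint=localhost:8099",
    "--environment_type=LOOPBACK",
])
# Prints the full options dict, like the dump above.
print(options.get_all_options())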


WARNING:root:Make sure that locally built Python SDK docker image has Python 3.6 interpreter.
Traceback (most recent call last):
  File "SaiStudy - Apache-Beam-Spark.py", line 34, in <module>
    | 'Write results' >> beam.io.WriteToText(outputs_prefix)
  File "C:\Users\rekharamesh\AppData\Local\Programs\Python\Python36-32\lib\site-packages\apache_beam\pipeline.py", line 555, in __exit__
    self.result = self.run()
  File "C:\Users\rekharamesh\AppData\Local\Programs\Python\Python36-32\lib\site-packages\apache_beam\pipeline.py", line 534, in run
    return self.runner.run_pipeline(self, self._options)
  File "C:\Users\rekharamesh\AppData\Local\Programs\Python\Python36-32\lib\site-packages\apache_beam\runners\portability\portable_runner.py", line 388, in run_pipeline
    job_service_handle = self.create_job_service(options)
  File "C:\Users\rekharamesh\AppData\Local\Programs\Python\Python36-32\lib\site-packages\apache_beam\runners\portability\portable_runner.py", line 304, in create_job_service
    return self.create_job_service_handle(server.start(), options)
  File "C:\Users\rekharamesh\AppData\Local\Programs\Python\Python36-32\lib\site-packages\apache_beam\runners\portability\job_server.py", line 56, in start
    grpc.channel_ready_future(channel).result(timeout=self._timeout)
  File "C:\Users\rekharamesh\AppData\Local\Programs\Python\Python36-32\lib\site-packages\grpc\_utilities.py", line 140, in result
    self._block(timeout)
  File "C:\Users\rekharamesh\AppData\Local\Programs\Python\Python36-32\lib\site-packages\grpc\_utilities.py", line 86, in _block
    raise grpc.FutureTimeoutError()
grpc.FutureTimeoutError
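
The failure is the gRPC channel-ready check in job_server.py giving up after
'job_server_timeout' (60 seconds, per the dump above). To rule Beam out, here
is a minimal probe of the same endpoint using plain grpcio (assuming the job
server should be listening on localhost:8099):

import grpc

# The same readiness check Beam's job_server.py performs before submitting.
channel = grpc.insecure_channel("localhost:8099")
try:
    grpc.channel_ready_future(channel).result(timeout=10)
    print("job server is reachable")
except grpc.FutureTimeoutError:
    print("nothing answering on localhost:8099")

A larger --job_server_timeout (e.g. --job_server_timeout=300) would only help
if the server does eventually come up.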


On 2020/10/29 03:04:46, Kyle Weaver <kcwea...@google.com> wrote: 
> > Is there any difference in running the Spark or Flink runners from Colab
> > vs locally?
> 
> Google Colab is hosted in a Linux virtual machine. Docker for Windows is
> missing some features, including host networking.
> 
> > 4. python "filename.py" should run, but it raises
> > grpc.FutureTimeoutError()
> 
> Can you provide the stack trace?
> 
> On Wed, Oct 28, 2020 at 4:34 PM Ramesh Mathikumar <meetr...@googlemail.com>
> wrote:
> 
> > Hi Team,
> >
> > Is there any difference in running the Spark or Flink runners from Colab
> > vs locally? The code runs with no issues in the Google Colab environment,
> > but it does not run on my local environment.
> >
> > This is on Windows.
> >
> > Steps:
> >
> > 1. Start Flink or Spark on the local machine
> > 2. Make sure Spark or Flink is running on the local machine
> > 3. If using Spark, start the job server container like this: docker run
> > -p 8099:8099 -p 8098:8098 -p 8097:8097 apache/beam_spark_job_server:latest
> > --spark-master-url=spark://localhost:7077
> > 4. python "filename.py" should run, but it raises
> > grpc.FutureTimeoutError()
> >
> >
> > Parameters as follows
> >
> > SPARK:
> > options = PipelineOptions([
> >     "--runner=PortableRunner",
> >     "--job_endpoint=localhost:8099",
> >     "--environment_type=LOOPBACK"
> > ])
> >
> > FLINK:
> > options = PipelineOptions([
> >     "--runner=FlinkRunner",
> >     "--flink_version=1.8",
> >     "--flink_master=localhost:8081",
> >     "--environment_type=LOOPBACK"
> > ])
> >
> 
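
Regarding the Docker-for-Windows note above: without host networking,
'localhost' inside the job server container refers to the container itself,
so spark://localhost:7077 cannot reach a Spark master running on the Windows
host. An untested sketch of a workaround, using the host.docker.internal name
that Docker Desktop provides for reaching the host:

docker run -p 8099:8099 -p 8098:8098 -p 8097:8097 apache/beam_spark_job_server:latest --spark-master-url=spark://host.docker.internal:7077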
