This looks like the same issue that Scio encountered with the Google API Client libraries: https://github.com/spotify/scio/issues/388

I think that if the `value` is null, BigQuery expects you to omit the key rather than include it with a null value.
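A minimal sketch of that convention, assuming toTableRow() builds the row with set(...) calls (the helper name setIfNotNull is mine, not something from this thread):

    import com.google.api.services.bigquery.model.TableRow;

    public class TableRowUtil {
        // BigQuery treats a missing key as a null field, so skip the set()
        // call entirely instead of storing an explicit null value.
        public static TableRow setIfNotNull(TableRow row, String key, Object value) {
            if (value != null) {
                row.set(key, value);
            }
            return row;
        }
    }

If toTableRow() used something like this for every field, the row would never carry an explicit null, which should sidestep both the BigQuery convention and the hashCode() failure further down.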
On Fri, Feb 17, 2017 at 11:38 PM, Dan Halperin <[email protected]> wrote:

> It looks to me like the NPE comes from the Google API client library. It
> looks like maybe you are creating an invalid tablerow (null key? null
> value?)
>
>     at com.google.api.client.util.ArrayMap$Entry.hashCode(ArrayMap.java:419)
>
> Dan
>
> On Fri, Feb 17, 2017 at 3:19 PM, Kenneth Knowles <[email protected]> wrote:
>
>> Hi Tobias,
>>
>> The specific error there looks like you have a forbidden null somewhere
>> deep inside the output of logLine.toTableRow(). Hard to say more with this
>> information.
>>
>> Kenn
>>
>> On Fri, Feb 17, 2017 at 4:46 AM, Tobias Feldhaus <[email protected]> wrote:
>>
>>> It seems like this is caused by the fact that the workaround I am using
>>> to write daily-partitioned tables in batch mode does not work.
>>>
>>> My problem is that with more than 1000 days, the date-sharded table in
>>> BQ will be too large to be converted automatically via a simple
>>> “bq partition” command into a partitioned table, as such a table cannot
>>> have more than 1000 days.
>>>
>>> So the solution will be a divide-and-conquer strategy, I guess.
>>>
>>> On 17.02.17, 11:36, "Tobias Feldhaus" <[email protected]> wrote:
>>>
>>> Hello,
>>>
>>> could it be that it's no longer possible to run pipelines with a
>>> BigQuery sink locally on the dev machine? I migrated a "Read JSON from
>>> GCS, parse and write to BQ" pipeline to Apache Beam 0.5.0 from the
>>> Dataflow SDK. All tests are green, and the pipeline runs successfully
>>> on the Dataflow service with the test files, but locally with the
>>> DirectRunner I get an NPE.
>>>
>>> It happens right after I create the TableRow element, which I even
>>> double-checked not to be null. Even when I artificially create a
>>> LogLine element in this step, without taking the one from the input,
>>> the NPE is thrown:
>>>
>>>     static class Outputter extends DoFn<LogLine, TableRow> {
>>>         (...)
>>>         LogLine logLine = c.element();
>>>
>>>         TableRow tableRow = logLine.toTableRow();
>>>         tableRow.set("ts", c.timestamp().toString());
>>>
>>>         if (c != null && tableRow != null) {
>>>             try {
>>>                 c.output(tableRow);
>>>             } catch (NullPointerException e) {
>>>                 LOG.error("caught NPE");
>>>                 e.printStackTrace();
>>>             }
>>>         }
>>>
>>> The corresponding stack trace looks like this:
>>>
>>>     ERROR: caught NPE
>>>     java.lang.NullPointerException
>>>         at com.google.api.client.util.ArrayMap$Entry.hashCode(ArrayMap.java:419)
>>>         at java.util.AbstractMap.hashCode(AbstractMap.java:530)
>>>         at com.google.api.client.util.ArrayMap$Entry.hashCode(ArrayMap.java:419)
>>>         at java.util.AbstractMap.hashCode(AbstractMap.java:530)
>>>         at java.util.Arrays.hashCode(Arrays.java:4146)
>>>         at java.util.Objects.hash(Objects.java:128)
>>>         at org.apache.beam.sdk.util.WindowedValue$TimestampedValueInGlobalWindow.hashCode(WindowedValue.java:409)
>>>         at java.util.HashMap.hash(HashMap.java:338)
>>>         at java.util.HashMap.get(HashMap.java:556)
>>>         at org.apache.beam.runners.direct.repackaged.com.google.common.collect.AbstractMapBasedMultimap.put(AbstractMapBasedMultimap.java:193)
>>>         at org.apache.beam.runners.direct.repackaged.com.google.common.collect.AbstractSetMultimap.put(AbstractSetMultimap.java:128)
>>>         at org.apache.beam.runners.direct.repackaged.com.google.common.collect.HashMultimap.put(HashMultimap.java:49)
>>>         at org.apache.beam.runners.direct.ImmutabilityCheckingBundleFactory$ImmutabilityEnforcingBundle.add(ImmutabilityCheckingBundleFactory.java:112)
>>>         at org.apache.beam.runners.direct.ParDoEvaluator$BundleOutputManager.output(ParDoEvaluator.java:198)
>>>         at org.apache.beam.runners.core.SimpleDoFnRunner$DoFnContext.outputWindowedValue(SimpleDoFnRunner.java:352)
>>>         at org.apache.beam.runners.core.SimpleDoFnRunner$DoFnProcessContext.output(SimpleDoFnRunner.java:553)
>>>         at ch.localsearch.dataintel.logfiles.FrontendPipeline$Outputter.processElement(FrontendPipeline.java:181)
>>>         at ch.localsearch.dataintel.logfiles.FrontendPipeline$Outputter$auxiliary$sxgOpc6N.invokeProcessElement(Unknown Source)
>>>         at org.apache.beam.runners.core.SimpleDoFnRunner.invokeProcessElement(SimpleDoFnRunner.java:199)
>>>         at org.apache.beam.runners.core.SimpleDoFnRunner.processElement(SimpleDoFnRunner.java:161)
>>>         at org.apache.beam.runners.core.PushbackSideInputDoFnRunner.processElement(PushbackSideInputDoFnRunner.java:111)
>>>         at org.apache.beam.runners.core.PushbackSideInputDoFnRunner.processElementInReadyWindows(PushbackSideInputDoFnRunner.java:77)
>>>         at org.apache.beam.runners.direct.ParDoEvaluator.processElement(ParDoEvaluator.java:134)
>>>         at org.apache.beam.runners.direct.DoFnLifecycleManagerRemovingTransformEvaluator.processElement(DoFnLifecycleManagerRemovingTransformEvaluator.java:51)
>>>         at org.apache.beam.runners.direct.TransformExecutor.processElements(TransformExecutor.java:139)
>>>         at org.apache.beam.runners.direct.TransformExecutor.run(TransformExecutor.java:107)
>>>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>>>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>>         at java.lang.Thread.run(Thread.java:745)
>>>
>>> Best,
>>> Tobias
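Coming back to the stack trace: it also explains why this only shows up in the DirectRunner. ImmutabilityCheckingBundleFactory stores the WindowedValue in a HashMultimap for its immutability enforcement, which calls hashCode() on the TableRow, and ArrayMap$Entry.hashCode apparently does not guard against nulls. If that reading is right, the NPE should be reproducible outside Beam entirely. A speculative sketch, under the assumption that a nested null value is the culprit (the two AbstractMap.hashCode frames in the trace suggest a nested map):

    import com.google.api.services.bigquery.model.TableRow;

    public class NpeRepro {
        public static void main(String[] args) {
            // Mirror the shape the stack trace suggests: a nested map
            // inside the row holding an explicit null value.
            TableRow inner = new TableRow();
            inner.set("value", null);

            TableRow outer = new TableRow();
            outer.set("nested", inner);

            // If ArrayMap$Entry.hashCode really dereferences the null value,
            // this should throw the same NullPointerException as the pipeline.
            outer.hashCode();
        }
    }

That would also explain why the null-checks in Outputter don't help: tableRow itself is non-null, and the NPE only fires when the runner hashes the row's contents.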
