Richard Downer created DRILL-8318:
-------------------------------------

Summary: httpd format parser throws exception on log item with malformed query string
Key: DRILL-8318
URL: https://issues.apache.org/jira/browse/DRILL-8318
Project: Apache Drill
Issue Type: Bug
Affects Versions: 1.19.0
Environment: drill-embedded
openjdk version "1.8.0_342"
OpenJDK Runtime Environment Corretto-8.342.07.1 (build 1.8.0_342-b07)
OpenJDK 64-Bit Server VM Corretto-8.342.07.1 (build 25.342-b07, mixed mode)
Ubuntu 20.04.4 LTS (Focal Fossa)
Running under WSL on Windows 11
Reporter: Richard Downer
Attachments: testcase

I am running Apache Drill over my httpd-style access logs. These logs record requests from the open Internet, so they sometimes contain questionable requests made by remote users (sometimes with hostile intent). One such style of request looks like this:

{{151.236.216.243 - - [15/Sep/2022:20:18:07 +0000] "GET /?=PHPE9568F36-D428-11d2-A769-00AA001ACF42 HTTP/1.1" 301 178 "-" "curl/7.54.0"}}

As a test case, I have put this request into a new log file containing only this line. I then run this query:

{{select request_receive_time, request_status_last, request_firstline_method, request_firstline_uri from table(dfs.`/home/richard/drill/access-logs/nginx/testcase`(type=>'httpd', logFormat=>'combined')) where request_status_last = 404;}}

This produces the following error:

{{Error: DATA_READ ERROR: Error reading HTTPD file at line number 0}}
{{Error occurred during setter call: null caused by "java.lang.StringIndexOutOfBoundsException: String index out of range: -1" when calling "public void org.apache.drill.exec.store.httpd.HttpdLogRecord.setWildcard(java.lang.String,java.lang.String)" for key = "STRING:request.firstline.uri.query.*" name = "STRING:request.firstline.uri.query" value = "Value\{filled=STRING, s='PHPE9568F36-D428-11d2-A769-00AA001ACF42', l=null, d=null}" castsTo = "[STRING]"}}
{{Format plugin: httpd}}
{{Format plugin: HttpdLogFormatPlugin}}
{{Plugin config name: null}}
{{Fragment: 0:0}}

While I appreciate that the query string in this request is probably malformed under a strict interpretation (it has a value but an empty parameter name), this is a real request seen "in the wild", and I would prefer Drill to be robust enough to deal with the kind of garbage requests frequently seen on real web servers.
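
To illustrate the kind of tolerance I mean, here is a minimal sketch of a query-string splitter that accepts a pair with an empty name, such as {{=PHPE9568F36-D428-11d2-A769-00AA001ACF42}}. This is not Drill's code and I am not claiming it reflects the actual cause inside {{setWildcard}}; the class {{LenientQueryParser}} and its {{parse}} method are hypothetical names made up for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Apache Drill.
public class LenientQueryParser {

    // Splits a raw query string into name/value pairs without throwing
    // on pairs that have an empty name ("=value") or no '=' at all ("flag").
    public static Map<String, String> parse(String query) {
        Map<String, String> params = new LinkedHashMap<>();
        if (query == null || query.isEmpty()) {
            return params;
        }
        // Tolerate a leading '?' left over from the URI.
        if (query.charAt(0) == '?') {
            query = query.substring(1);
        }
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) {
                continue; // skip empty segments such as "a=1&&b=2"
            }
            int eq = pair.indexOf('=');
            // Check the index before calling substring, so a missing '='
            // never produces a negative offset.
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(name, value); // an empty name is kept as ""
        }
        return params;
    }

    public static void main(String[] args) {
        // The query string from the request in this report.
        System.out.println(parse("?=PHPE9568F36-D428-11d2-A769-00AA001ACF42"));
    }
}
```

Whether an empty name should be kept, renamed, or dropped is a design choice for the maintainers; the point is only that such input need not raise an exception.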

Thank you for your assistance. If I can provide any more information that would help, please let me know!

--
This message was sent by Atlassian Jira
(v8.20.10#820010)