[ https://issues.apache.org/jira/browse/SPARK-10310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716264#comment-14716264 ]
Yi Zhou commented on SPARK-10310: --------------------------------- This issue found in real-world case. Hopefully it can be fixed in Spark 1.5.0 code. Thanks in advance ! > [Spark SQL] All result records will be popluated in ONE line due to missing > the correct line/filed delimeter > ------------------------------------------------------------------------------------------------------------ > > Key: SPARK-10310 > URL: https://issues.apache.org/jira/browse/SPARK-10310 > Project: Spark > Issue Type: Bug > Components: SQL > Reporter: Yi Zhou > Priority: Blocker > > There is real case using python stream script in Spark SQL query. We found > that all result records were wroten in ONE line as input from "select" > pipeline for python script and so it cause script will not identify each > record.Other, filed separator in spark sql will be '^A' or '\001' which is > inconsistent/incompatible the '\t' in Hive implementation. > #################Key Query####################: > CREATE VIEW temp1 AS > SELECT * > FROM > ( > FROM > ( > SELECT > c.wcs_user_sk, > w.wp_type, > (wcs_click_date_sk * 24 * 60 * 60 + wcs_click_time_sk) AS tstamp_inSec > FROM web_clickstreams c, web_page w > WHERE c.wcs_web_page_sk = w.wp_web_page_sk > AND c.wcs_web_page_sk IS NOT NULL > AND c.wcs_user_sk IS NOT NULL > AND c.wcs_sales_sk IS NULL --abandoned implies: no sale > DISTRIBUTE BY wcs_user_sk SORT BY wcs_user_sk, tstamp_inSec > ) clicksAnWebPageType > REDUCE > wcs_user_sk, > tstamp_inSec, > wp_type > USING 'python sessionize.py 3600' > AS ( > wp_type STRING, > tstamp BIGINT, > sessionid STRING) > ) sessionized > #############Key Python Script################# > for line in sys.stdin: > user_sk, tstamp_str, value = line.strip().split("\t") > ############Result Records example from 'select' ################## > ^V31^A3237764860^Afeedback^U31^A3237769106^Adynamic^T31^A3237779027^Areview > ############Result Records example in format###################### > 31 3237764860 feedback > 31 3237769106 dynamic > 31 3237779027 review -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org