Re: Information Needed

Munagala Ramanath Mon, 30 May 2016 11:43:14 -0700

You'll need to find which node is running YARN (Resource Manager) and look
for log files on that machine.
Usually, they are under */var/log/hadoop-yarn* or */var/log/hadoop* or
similar locations.


The files themselves will have names that vary depending on your
installation; some examples:
yarn-<user>-resourcemanager-<host>.log
hadoop-cmf-yarn-RESOURCEMANAGER-<host>.log.out

Look for lines similar to the following that reference the *failed
application id*; they will tell you what containers
were allocated on which nodes on behalf of this application. You may then
have to ssh into those nodes
and check the specific container logs for more specific information on why
it may have failed.

Ram
-----------------------------------

2016-05-24 02:53:42,636 INFO
org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=<user>
 OPERATION=AM Allocated Container        TARGET=SchedulerApp
RESULT=SUCCESS  APPID=application_1462948052533_0036
 CONTAINERID=container_1462948052533_0036_01_022468
2016-05-24 02:53:42,636 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode:
Assigned container container_1462948052533_0036_01_022468 of capacity
<memory:1536, vCores:1> on host <host:port>, which has 9 containers,
<memory:24064, vCores:9> used and <memory:180736, vCores:15> available
after allocation
2016-05-24 02:53:42,636 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
container_1462948052533_0183_01_036933 Container Transitioned from
ALLOCATED to ACQUIRED


On Mon, May 30, 2016 at 9:24 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <
[email protected]> wrote:

> Hello,
>
>
>
> I am trying to read a Vertica Database table with partitioning, tried my
> hand on partitioning by overriding the definepartition method. My launch is
> getting successful but when I see the Monitor tab , the Job is in failed
> state and I cannot see the logs for failed job on the DT Web console.
>
>
>
> Is there any way that I can view the logs on the UNIX machine for the
> failed jobs?
>
>
>
> Regards,
>
> Surya Vamshi
>
> *From:* Mukkamula, Suryavamshivardhan (CWM-NR)
> *Sent:* 2016, May, 27 9:33 AM
> *To:* [email protected]
> *Subject:* RE: Information Needed
>
>
>
> Hi Ram,
>
>
>
> Thank you so much, it worked !!
>
>
>
> I have done with single input feed reading and parsing by using
> configuration file.
>
>
>
> I would like to do this for 100 feeds and 100 configuration files by using
> partitioning. I guess I have to know how to set individual feed directory
> and configuration file per partition , If I am not wrong.
>
>
>
> While I wait for your sample code to use partitioning , I will meanwhile
> try to understand the partitioning.
>
>
>
> Your support is well appreciated.
>
>
>
> Regards,
>
> Surya Vamshi
>
> *From:* Munagala Ramanath [mailto:[email protected]
> <[email protected]>]
> *Sent:* 2016, May, 26 7:32 PM
> *To:* [email protected]
> *Subject:* Re: Information Needed
>
>
>
> You need to return null from readEntity() when br.readLine() returns null
> to signal that the EOF
>
> is reached.
>
>
>
> Ram
>
>
>
> On Thu, May 26, 2016 at 2:07 PM, Mukkamula, Suryavamshivardhan (CWM-NR) <
> [email protected]> wrote:
>
> Hi Priyanka,
>
>
>
> There is only a single file in the directory and there are no external
> updates, the same code was working for simple file read from HDFS but when
> added the parsing part it is going to infinite loop.
>
>
>
> Regards,
>
> Surya Vamshi
>
> *From:* Priyanka Gugale [mailto:[email protected]]
> *Sent:* 2016, May, 26 5:03 PM
> *To:* [email protected]
> *Subject:* Re: Information Needed
>
>
>
> There is a setting called "scanIntervalMillis" that keeps scanning the
> input directory for newly added files. In your case if new files are
> getting added to the directory? Or if the input file timestamp is getting
> updated?
>
>
>
> -Priyanka
>
>
>
> On Thu, May 26, 2016 at 12:36 PM, Mukkamula, Suryavamshivardhan (CWM-NR) <
> [email protected]> wrote:
>
> Hello,
>
>
>
> I am trying to read a file from HDFS and parse using XML configuration
> file and print on console. The issue I am facing is read file is going in
> infinite loop, I am not sure how to set file read to only once. Please help.
>
>
>
> My Operator code:
>
>
>
> package com.rbc.aml.cnscan.operator;
>
>
>
> import com.datatorrent.api.DefaultOutputPort;
>
> import com.datatorrent.lib.io.fs.AbstractFileInputOperator;
>
> import com.datatorrent.netlet.util.DTThrowable;
>
> import com.rbc.aml.cnscan.utils.ClientRecord;
>
>
>
> import net.sf.flatpack.DataSet;
>
> import net.sf.flatpack.DefaultParserFactory;
>
> import net.sf.flatpack.Parser;
>
>
>
> import org.apache.hadoop.conf.Configuration;
>
> import org.apache.hadoop.fs.FileSystem;
>
> import org.apache.hadoop.fs.Path;
>
> import org.slf4j.Logger;
>
> import org.slf4j.LoggerFactory;
>
>
>
> import java.io.BufferedReader;
>
> import java.io.IOException;
>
> import java.io.InputStream;
>
> import java.io.InputStreamReader;
>
> import java.io.StringReader;
>
>
>
> import javax.validation.constraints.NotNull;
>
>
>
> public class FeedInputOperator extends
> AbstractFileInputOperator<ClientRecord> {
>
>     private static Logger LOG =
> LoggerFactory.getLogger(FeedInputOperator.class);
>
>
>
>     protected transient BufferedReader br;
>
>     protected String fileName;
>
>     public transient DefaultOutputPort<ClientRecord> output = new
> DefaultOutputPort<>();
>
>
>
>     @NotNull
>
>     private String configFile = null;
>
>
>
>     public String getConfigFile() {
>
>         return configFile;
>
>     }
>
>
>
>     public void setConfigFile(String file) {
>
>         configFile = file;
>
>     }
>
>
>
>     @Override
>
>     protected InputStream openFile(Path path) throws IOException {
>
>         InputStream is = super.openFile(path);
>
>         fileName = path.getName();
>
>         System.out.println("input file is"+fileName);
>
>         br = new BufferedReader(new InputStreamReader(is));
>
>         return is;
>
>     }
>
>
>
>     @Override
>
>     protected void closeFile(InputStream is) throws IOException {
>
>         super.closeFile(is);
>
>         br.close();
>
>         br = null;
>
>     }
>
> // interface to the hadoop file system
>
>     private transient FileSystem fs;
>
>
>
>     private FileSystem getFS() {
>
>         if (fs == null) {
>
>             try {
>
>                 fs = FileSystem.get(new Configuration());
>
>             } catch (Exception e) {
>
>                 throw new RuntimeException("Unable to get handle to the
> filesystem");
>
>             }
>
>         }
>
>         return fs;
>
>     }
>
>     @Override
>
>     protected ClientRecord readEntity() throws IOException {
>
>                 String line = br.readLine();
>
>                 System.out.println("line is "+line);
>
>                 ClientRecord rec = new ClientRecord();
>
>                 try {
>
>             InputStream is = getFS().open(new Path(configFile));
>
>             Parser parser =
> DefaultParserFactory.getInstance().newFixedLengthParser(
>
>                     new InputStreamReader(is), new StringReader(line));
>
>             parser.setIgnoreExtraColumns(true);
>
>             final DataSet ds = parser.parse();
>
>             if (ds == null || ds.getRowCount() == 0) {
>
>                 throw new RuntimeException("Could not parse record");
>
>             }
>
>
>
>             if (ds.next()) {
>
>                 for (String col: ds.getColumns()) {
>
>                     LOG.debug("Col: " + col);
>
>                 }
>
>
>
>                 rec.sourceId = ds.getString("SYS_SRC_ID");
>
>                 rec.number = ds.getString("CLNT_NO");
>
>                 rec.divisionId = ds.getString("DIV_ID");
>
>                 rec.lastName = ds.getString("CLNT_NM");
>
>                 rec.firstName = ds.getString("CLNT_FRST_NM");
>
>                 rec.type = ds.getString("CLNT_TYP");
>
>                 rec.status = ds.getString("STS");
>
>                 rec.dob = ds.getString("DOB");
>
>                 rec.address1 = ds.getString("ST_ADDR_1_1");
>
>                 rec.address2 = ds.getString("ST_ADDR_1_2");
>
>                 rec.address3 = ds.getString("ST_ADDR_1_3");
>
>                 rec.address4 = ds.getString("ST_ADDR_1_4");
>
>                 rec.fileName = fileName;
>
>             }
>
>                 }catch(java.io.IOException e) {
>
>             DTThrowable.rethrow(e);
>
>         }
>
>
>
>         LOG.debug("Record: {}", rec);
>
>         return rec;
>
>     }
>
>
>
>     @Override
>
>     protected void emit(ClientRecord tuple) {
>
>
>
>         output.emit(tuple);
>
>     }
>
> }
>
>
>
>
>
> Regards,
>
> Surya Vamshi
>
> *From:* Mukkamula, Suryavamshivardhan (CWM-NR) [mailto:
> [email protected]]
> *Sent:* 2016, May, 24 2:57 PM
> *To:* [email protected]
> *Subject:* RE: Information Needed
>
>
>
> Thank you ram.
>
>
>
> *From:* Munagala Ramanath [mailto:[email protected]
> <[email protected]>]
> *Sent:* 2016, May, 24 2:53 PM
> *To:* [email protected]
> *Subject:* Re: Information Needed
>
>
>
> I'll make a sample available in a day or two.
>
>
>
> Ram
>
>
>
> On Tue, May 24, 2016 at 11:33 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <
> [email protected]> wrote:
>
> Hi ,
>
>
>
> Thank you ram, do you have any sample code that deals with multiple
> directories?
>
>
>
> Regards,
>
> Surya Vamshi
>
>
>
> *From:* Munagala Ramanath [mailto:[email protected]]
> *Sent:* 2016, May, 24 12:08 PM
> *To:* [email protected]
> *Subject:* Re: Information Needed
>
>
>
> For scheduling, there is no built-in support but you have a simple script
> that starts the application at a
>
> predetermined time (using, for example, dtcli commands or the REST API),
> then, when you are sure
>
> that all data for the day has been processed and the application is idle,
> you can shutdown the application
>
> (again, using either dtcli or the REST APIs). I would suggest using a
> scripting language like Ruby or
>
> Python since they make many things easier than they shell.
>
>
>
> Handling multiple directories is a little more involved: you'll need to
> override the definePartition() method
>
> of the AbstractFileInputOperator and possibly the DirectoryScanner as well.
>
>
>
> Ram
>
>
>
> On Tue, May 24, 2016 at 6:16 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <
> [email protected]> wrote:
>
> Hello,
>
>
>
> Thank you all for your valuable inputs. My use case is there will be 100
> feeds on HDFS in different locations , not from the same location and I
> have to read them using DT and load into Data base daily once , what is the
> best way to schedule the Data torrent batch job? And how would I achieve
> the parallel processing when my files are in different folders ?
>
>
>
> Regards,
>
> Surya Vamshi
>
>
>
> *From:* Munagala Ramanath [mailto:[email protected]]
> *Sent:* 2016, May, 20 2:35 PM
> *To:* [email protected]
> *Subject:* Re: Information Needed
>
>
>
> It appears that SFTP support in the Hadoop file system will not be
> available till 2.8.0:
>
> https://issues.apache.org/jira/browse/HADOOP-5732
>
>
>
> So you might have to write your own SFTP operator or write to HDFS and use
> an
>
> external script to write to SFTP.
>
>
>
> Ram
>
>
>
> On Fri, May 20, 2016 at 11:21 AM, Devendra Tagare <
> [email protected]> wrote:
>
> Hi Surya,
>
>
>
> Good to know the DB reads are working as expected.
>
>
>
> Here's a list of operators you can use/refer for the next use-case,
>
>
>
> HDFS input - for reading multiple input files in parallel you can set
> partitionCount on the AbstractFileInputOperator for parallel
> reads.LineByLineFileInputOperator is a concrete implementation for reading
> one line at a time.
>
>
>
> xml parsing - there is a XmlParser in Malhar that takes in a xml string
> and emits a POJO.
>
>
>
> Combining multiple files into one  - could you please give us a sense of
> the volume and the frequency of writes you expect so we can recommend
> something appropriate ?
>
>
>
> SFTP push - need to check on this one.Will revert.
>
>
>
> @Community, please feel free to chip in.
>
>
>
> Thanks,
>
> Dev
>
>
>
>
>
> On Fri, May 20, 2016 at 8:54 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <
> [email protected]> wrote:
>
> Hi Devendra,
>
>
>
> Thank you , It is working now and I also could read the properties from
> xml file. I could also set the batch size and time gap for next database
> hit.
>
>
>
> Now , my another requirement is to read 50 different files from HDFS ,
> parse them using xml mapping and sftp as a single file to a UNIX box. Can
> you please suggest me the best practice like using parallel processing or
> partitioning?
>
>
>
> Do you have any sample code for parallel processing or partitioning and
> also how would I run the batch Job is there any batch scheduler that data
> torrent provides?
>
>
>
> Regards,
>
> Surya Vamshi
>
>
>
> *From:* Devendra Tagare [mailto:[email protected]]
> *Sent:* 2016, May, 18 4:19 PM
>
>
> *To:* [email protected]
> *Subject:* Re: Information Needed
>
>
>
> Hi,
>
>
>
> Can you try something like this,
>
>
>
>  JdbcPOJOInputOperator opr = dag.addOperator("JdbcPojo", new
> JdbcPOJOInputOperator());
>
>     JdbcStore store = new JdbcStore();
>
>     opr.setStore(store);
>
> The properties would then be set on this store object.
>
> From the code snippet provided earlier, the store was not being set on the
> JdbcInputOperator2
>
> Thanks,
>
> Dev
>
>
>
> On Wed, May 18, 2016 at 12:50 PM, Mukkamula, Suryavamshivardhan (CWM-NR) <
> [email protected]> wrote:
>
> Hi ,
>
>
>
> Hello,
>
>
>
> When I have tried using below property as you suggested, my launch itself
> is failing. When I don’t use store and directly assign
> (‘dt.application.CountryNameScan.operator.<operatorName>.prop*.*databaseUrl’)
> launch is successful but application run is failing with null pointer
> exception since ‘store’ object is null.
>
>
>
> I see that in AbstractStoreInputOperator.java there is ‘store’ variable
> and I am not clear how the value is set to it.
>
>
>
> Regards,
>
> Surya Vamshi
>
>
>
> *From:* Devendra Tagare [mailto:[email protected]]
> *Sent:* 2016, May, 18 12:57 PM
>
>
> *To:* [email protected]
> *Subject:* Re: Information Needed
>
>
>
> Hi,
>
>
>
> The property on the store is not getting set since ".store." qualifier is
> missing.Try the below for all store level properties.
>
>
>
>
>
> <property>
>
>     <name>dt.application.CountryNameScan.operator.<operatorName>.prop
> *.store.*databaseUrl</name>
>
>     <value>{databaseUrl}</value>
>
>   </property>
>
>
>
> Thanks,
>
> Dev
>
>
>
> On Wed, May 18, 2016 at 8:38 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <
> [email protected]> wrote:
>
> Hello,
>
>
>
> Thank you Shubam.
>
>
>
> I have tried using AbstractJdbcInputOperator. Below is the Operator and
> the error that I am getting. My observation is ‘*store’* and ‘*context’*
> objects are null. Please help to solve this issue.
>
>
>
> *Error Logs:*
>
>
>
> java.lang.NullPointerException
>
>                 at
> com.datatorrent.lib.db.AbstractStoreInputOperator.setup(AbstractStoreInputOperator.java:77)
>
>                 at
> com.rbc.aml.cnscan.operator.AbstractJdbcInputOperator.setup(AbstractJdbcInputOperator.java:99)
>
>                 at
> com.rbc.aml.cnscan.operator.JdbcInputOperator2.setup(JdbcInputOperator2.java:29)
>
>                 at
> com.rbc.aml.cnscan.operator.JdbcInputOperator2.setup(JdbcInputOperator2.java:13)
>
>                 at com.datatorrent.stram.engine.Node.setup(Node.java:161)
>
>                 at
> com.datatorrent.stram.engine.StreamingContainer.setupNode(StreamingContainer.java:1287)
>
>                 at
> com.datatorrent.stram.engine.StreamingContainer.access$100(StreamingContainer.java:92)
>
>                 at
> com.datatorrent.stram.engine.StreamingContainer$2.run(StreamingContainer.java:1361)
>
>
>
>
>
> *Operator Class:*
>
>
>
> import java.sql.ResultSet;
>
> import java.sql.SQLException;
>
>
>
> import com.datatorrent.api.Context;
>
> import com.datatorrent.api.Context.OperatorContext;
>
> import com.datatorrent.api.DefaultOutputPort;
>
> import java.sql.PreparedStatement;
>
> import com.datatorrent.api.Operator;
>
>
>
>
>
> public class JdbcInputOperator2 extends AbstractJdbcInputOperator<Object>
>
>                                 implements
> Operator.ActivationListener<Context.OperatorContext> {
>
>
>
>                 private transient PreparedStatement preparedStatement;
>
>
>
>                 private String query;
>
>
>
>                 // @OutputPortFieldAnnotation(schemaRequired = true)
>
>                 public final transient DefaultOutputPort<Object>
> outputPort = new DefaultOutputPort<Object>();
>
>
>
>                 public JdbcInputOperator2() {
>
>                                 super();
>
>                 }
>
>
>
>                 @Override
>
>                 public void setup(Context.OperatorContext context) {
>
>                                 super.setup(context);
>
>
>
>                                 try {
>
>                                                 preparedStatement =
> store.connection.prepareStatement(queryToRetrieveData());
>
>                                                 System.out.println("store
> value is"+store);
>
>                                 } catch (Exception e) {
>
>
>
>                                 }
>
>
>
>                 }
>
>
>
>                 @Override
>
>                 public Object getTuple(ResultSet result) {
>
>                                 // TODO Auto-generated method stub
>
>                                 StringBuilder sb = new StringBuilder();
>
>                                 try {
>
>                                                 System.out.println("result
> set"+result);
>
>                                                 while (result.next()) {
>
>
> sb.append(result.getString("CLNT_NO"));
>
>
> sb.append(",");
>
>
> sb.append(result.getString("TR_NO"));
>
>
> System.out.println("tuple value"+sb.toString());
>
>                                                 }
>
>                                 } catch (SQLException e) {
>
>                                                 // TODO Auto-generated
> catch block
>
>                                                 e.printStackTrace();
>
>                                 }
>
>
>
>                                 return sb.toString();
>
>                 }
>
>
>
>                 @Override
>
>                 public String queryToRetrieveData() {
>
>                                 // TODO Auto-generated method stub
>
>                                 return query;
>
>                 }
>
>
>
>                 @Override
>
>                 public void activate(OperatorContext arg0) {
>
>                                 // TODO Auto-generated method stub
>
>
>
>                 }
>
>
>
>                 @Override
>
>
> ...
>
> [Message clipped]

Re: Information Needed

Reply via email to