1. How can we rename the output file?

Ans. We can rename the output file by using the MultipleOutputs class (MultipleOutputFormat in the older API).

  1. Define distributed cache?

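Ans. In Hadoop, the distributed cache is a facility of the MapReduce framework for caching read-only files (such as text files, archives and JARs) that a job needs. The framework copies these files to each worker node before any of the job's tasks run there.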

  1. Name some of the components of a MapReduce job?

Ans.

  • Mapper class
  • Main driver class
  • Reducer class
  1. Can we write a MapReduce program in any language other than Java?

Ans. Yes. With Hadoop Streaming, it can be written in many other languages, such as Python, PHP, C++ and R.
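
As a minimal sketch of the Python case, assuming the job runs through Hadoop Streaming (which hands records to any executable as lines of tab-separated text), a word-count mapper and reducer might look like this. The names `map_lines`, `reduce_lines` and `run_local` are illustrative only and are not part of any Hadoop API.

```python
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' record per word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(sorted_records):
    """Reducer: sum the counts for each word.

    Assumes the records arrive sorted by key, as they do after the
    framework's shuffle and sort.
    """
    pairs = (record.split("\t", 1) for record in sorted_records)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run_local(lines):
    """Imitate map -> sort -> reduce on one machine (no cluster needed)."""
    return list(reduce_lines(sorted(map_lines(lines))))
```

Under Hadoop Streaming the two functions would normally live in separate scripts that read `sys.stdin` and print to standard output; `run_local` only imitates the framework's sort step for local experimentation.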

  1. What is the purpose of shuffling and sorting?

Ans. Together they determine which intermediate keys and values each reducer instance receives, and in what order. Shuffling is the process of transferring the intermediate output from the mappers to the reducers, while sorting orders those intermediate key-value pairs by key, so that each reducer sees all the values for a key together.
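
The effect of the two steps can be illustrated with a small single-process sketch. It assumes one reducer and in-memory data, and the function name `shuffle_and_sort` is hypothetical; the real framework does this across the network and on disk.

```python
from collections import defaultdict


def shuffle_and_sort(map_outputs):
    """Group intermediate (key, value) pairs by key and order the keys.

    map_outputs: one list of (key, value) pairs per mapper.
    Returns a list of (key, [values]) in sorted key order, which is the
    shape of input a single reducer would see.
    """
    grouped = defaultdict(list)
    for pairs in map_outputs:          # "shuffle": collect from every mapper
        for key, value in pairs:
            grouped[key].append(value)
    return sorted(grouped.items())     # "sort": order by key
```

For example, two mappers emitting `[("b", 1), ("a", 1)]` and `[("a", 1)]` produce a reducer input in which key `a` comes first with both of its values grouped.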

  1. What are the main job control options specified by MapReduce?

Ans.

  • Job.submit()
  • Job.waitForCompletion(boolean)
  1. What is the use of MapReduce partitioner?

Ans. The partitioner ensures that all the values for a single key go to the same reducer, which in turn helps distribute the map output evenly over the reducers.
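
A hash partitioner can be sketched as "hash the key, then take the remainder modulo the number of reducers". The sketch below uses CRC32 as a stable stand-in hash, which is an assumption made for illustration (Hadoop's default partitioner uses the key's Java hashCode); `partition` and `assign` are hypothetical names.

```python
import zlib


def partition(key, num_reducers):
    """Return the reducer index for a key.

    Uses CRC32 as a stable stand-in hash (an assumption of this sketch;
    Hadoop's default partitioner uses the key's Java hashCode instead).
    """
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def assign(pairs, num_reducers):
    """Distribute (key, value) pairs into per-reducer buckets."""
    buckets = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[partition(key, num_reducers)].append((key, value))
    return buckets
```

Because the reducer index depends only on the key, every pair with the same key lands in the same bucket regardless of which mapper produced it.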

  1. Name some important parameters of a mapper?

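Ans. The important parameters of a mapper are its input and output key/value types. In a typical word-count mapper these are:

  • LongWritable and Text (input key and value)
  • Text and IntWritable (output key and value)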
  1. What happens when a node fails during the write process?

Ans. In that case, the failed node is dropped and a new pipeline made up of the remaining data nodes is opened; the write continues over it until the file is closed.

  1. How can you split 100 lines of input as a single split?

Ans. This can be done using the NLineInputFormat class, with the number of lines per split set to 100.
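
The splitting behaviour itself is easy to picture. The following is a plain-Python sketch of the idea only, not the Hadoop class: `n_line_splits` is a hypothetical helper that cuts an input into consecutive chunks of at most N lines, each of which would go to one mapper.

```python
def n_line_splits(lines, n=100):
    """Yield consecutive chunks of at most n lines, one chunk per split."""
    lines = list(lines)
    for start in range(0, len(lines), n):
        yield lines[start:start + n]
```

With 250 input lines and `n=100`, this yields three splits of 100, 100 and 50 lines.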

  1. What is InputFormat?

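Ans. It describes the input specification for a MapReduce job. The framework relies on the job's InputFormat to validate that specification, to split the input files into logical InputSplit instances (each assigned to an individual mapper), and to provide the RecordReader that reads records from each split.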

  1. What are the benefits of map side join?

Ans.

  • Reduces the cost incurred for sorting and merging in the shuffle and reduce stages
  • Improves the performance of the task by reducing the time it takes to finish
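
A map-side join avoids the reduce stage by doing the lookup inside the mapper. A minimal sketch, assuming one input is small enough to hold in memory on every mapper; the data layout and the name `map_side_join` are hypothetical.

```python
def map_side_join(orders, customers):
    """Join each order with its customer inside the map step.

    customers: small lookup table {customer_id: name}, held in memory
               by every mapper (hypothetical data for this sketch).
    orders:    iterable of (customer_id, amount) records, the large input.
    """
    for customer_id, amount in orders:
        name = customers.get(customer_id)
        if name is not None:            # inner join: skip unmatched orders
            yield (customer_id, name, amount)
```

Since each record is joined as it is read, nothing needs to be shuffled or sorted for the join itself, which is where the savings listed above come from.
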
  1. What are the primary phases of a reducer?

Ans.

  • Shuffle
  • Sort
  • Reduce
  1. How can you control reporting in Hadoop?

Ans. By using the hadoop-metrics.properties file.

  1. Is it possible to search files using wildcards?

Ans. Yes. Hadoop file paths accept glob patterns such as * and ?.
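
To illustrate how such patterns select file names, here is a small sketch using Python's standard `fnmatch` module. It is an analogy only: Hadoop's own glob syntax is richer, and `match_names` and the file names are made up for the example.

```python
import fnmatch


def match_names(names, pattern):
    """Return the names that match a shell-style wildcard pattern.

    '*' matches any run of characters and '?' matches exactly one.
    """
    return fnmatch.filter(names, pattern)
```

For instance, the pattern `part-*` selects the `part-00000` style output files of a job while skipping a marker file such as `_SUCCESS`.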

  1. What is YARN?

Ans. YARN stands for Yet Another Resource Negotiator. It is Hadoop's cluster resource management technology.

 

By bpci