What are the issues in traditional file processing?

1. Data duplication: When files are duplicated and held in a number of locations, the copies can become inconsistent.

  • Corrections or modifications made in one location may not be applied in another. For example, customer address files held by the Accounts department may be updated while those held by Sales are not. For the customer this may mean that the account arrives but the goods do not.
  • Modifications to data files may also lead to less obvious discrepancies. For example, a suburb name may be spelt differently in two locations, e.g. Allambie vs. Allamby. A report calculating sales to customers by suburb may then count the same customers twice. This may not be obvious if the report is a summary-style report.
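The suburb-spelling discrepancy above can be sketched in a few lines of Python (the customer records and amounts here are hypothetical, invented purely to illustrate the problem):

```python
from collections import defaultdict

# Hypothetical sales records duplicated across two departments' files.
# The suburb "Allambie" was mis-typed as "Allamby" in one copy.
sales_records = [
    {"customer": "I Smith", "suburb": "Allambie", "amount": 120.0},
    {"customer": "I Smith", "suburb": "Allamby",  "amount": 80.0},
]

# A summary report grouping sales by suburb silently splits the same
# suburb (and the same customer) into two separate report rows.
totals = defaultdict(float)
for record in sales_records:
    totals[record["suburb"]] += record["amount"]

for suburb, amount in sorted(totals.items()):
    print(suburb, amount)
```

The report shows two suburbs where there is really only one, and nothing in the output flags the error.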

2. Poor data control: File systems have no centralized control of data descriptions. Table and field names may be used in different locations to mean different things. For example, the Sales department’s files may hold a customer in a single Name field made up of the customer’s initial and last name, e.g. I Smith, while the Accounts department keeps the customer’s name in three separate fields: First Name, Initial, Last Name. This makes it difficult to compare the data in the two files, or at least requires additional programming time for the comparison.
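A small sketch of the extra translation code this mismatch forces (the field names and the sample customer are hypothetical):

```python
# Hypothetical records for the same customer held by two departments.
sales_customer = {"Name": "I Smith"}  # single combined Name field
accounts_customer = {"First": "Ian", "Initial": "I", "Last": "Smith"}

def sales_style_name(record):
    """Rebuild the Sales department's 'Initial Last' form from the
    Accounts department's three-field layout so the files can be matched."""
    return f"{record['Initial']} {record['Last']}"

# The two files cannot be compared directly; a conversion step is needed.
match = sales_customer["Name"] == sales_style_name(accounts_customer)
print(match)
```

With a centralized data description (as a DBMS provides), this per-program conversion code would be unnecessary.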

3. Inadequate data manipulation capabilities: Data in traditional file systems is not easily related, particularly if the files were developed for separate purposes. If the organization needs information that draws on data from several unrelated files, the task may prove difficult or require re-entry of data. For example, in a library the catalogue of books may be held in one file and books on order in another. When ordered books are received, the catalogue must be updated manually because the two files are not related.
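The library example can be sketched as follows (the file layouts and ISBNs are hypothetical; the point is that relating the two files requires ad-hoc matching code, since no relationship between them is defined anywhere):

```python
# Two hypothetical, unrelated files: the catalogue and the orders file.
catalogue = [
    {"isbn": "1111", "title": "Introductory Databases"},
]
orders = [
    {"isbn": "2222", "title": "File Processing Systems", "received": True},
]

# When an ordered book arrives, the catalogue must be updated by
# hand-written matching code (or manual re-entry), because nothing
# in the file system itself links the two files.
for order in orders:
    already_listed = any(book["isbn"] == order["isbn"] for book in catalogue)
    if order["received"] and not already_listed:
        catalogue.append({"isbn": order["isbn"], "title": order["title"]})

print(len(catalogue))
```

In a database, a shared key (here the ISBN) would let the system relate the two sets of records without custom per-program code.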

4. Program-data dependence: Descriptions of the data files are stored within each application that uses them. For example, a sales transaction program may have several files relevant to it (Customer, Stock_in_hand, Sale_Info), and the descriptions of these files are embedded in the program itself, so the program and its data cannot be changed independently.

5. Limited data sharing: Because the data is tied to the programs that use it, existing files are not necessarily suitable for a new program being developed. The new program may need its data in another form, or require additional data that is not held.

6. Lengthy development times: Each new application requires development of the program along with the relevant files for that application. Although the data may already be held elsewhere in the organization, it will need to be imported or re-entered into the new files, which takes time. As organizations grow and change, they need to change their internal applications quickly to meet new demands, so lengthy development times are a serious disadvantage.

7. Program maintenance: Maintenance can be time-consuming in traditional file processing systems, because any change to a file’s structure forces changes to every application program that uses that file.

What is big data? What tools would an organization use to manage big data?

Big data is a term that describes the large volume of data – both structured and unstructured – that inundates a business on a day-to-day basis. But it’s not the amount of data that’s important. It’s what organizations do with the data that matters. Big data can be analyzed for insights that lead to better decisions and strategic business moves.

While the term “big data” is relatively new, the act of gathering and storing large amounts of information for eventual analysis is ages old. The concept gained momentum in the early 2000s when industry analyst Doug Laney articulated the now-mainstream definition of big data as the three Vs:

Volume: Organizations collect data from a variety of sources, including business transactions, social media and information from sensor or machine-to-machine data. In the past, storing it would’ve been a problem – but new technologies (such as Hadoop) have eased the burden.

Velocity: Data streams in at an unprecedented speed and must be dealt with in a timely manner. RFID tags, sensors and smart metering are driving the need to deal with torrents of data in near-real time.

Variety: Data comes in all types of formats – from structured, numeric data in traditional databases to unstructured text documents, email, video, audio, stock ticker data and financial transactions.

The importance of big data doesn’t revolve around how much data you have, but what you do with it. You can take data from any source and analyze it to find answers that enable: 1) cost reductions, 2) time reductions, 3) new product development and optimized offerings, and 4) smart decision making.

How Big Data Analytics work: Companies start by identifying significant business opportunities that may be enhanced by superior data and then determine whether Big Data Analytics solutions are needed. If they are, the business will need to develop the hardware, software and talent required to capitalize on Big Data Analytics. That often requires the addition of data scientists who are skilled in asking the right questions, identifying cost-effective information sources, finding true patterns of causality and translating analytic insights into actionable business information.

To apply Big Data Analytics, companies should:

  • Select a pilot (a business unit or functional group) with meaningful opportunities to capitalize on Big Data Analytics
  • Establish a leadership group and team of data scientists with the skills and resources necessary to drive the effort successfully
  • Identify specific decisions and actions that can be improved
  • Determine the most appropriate hardware and software solutions for the targeted decisions
  • Decide whether to purchase or rent the system
  • Establish guiding principles such as data privacy and security policies
  • Test, learn, share and refine
  • Develop repeatable models and expand applications to additional business areas

Companies use Big Data Analytics to:

  • Improve internal processes, such as risk management, Customer Relationship Management, supply chain logistics or Web content optimization
  • Improve existing products and services
  • Develop new product and service offerings
  • Better target their offerings to their customers
  • Transform the overall business model to capitalize on real-time information and feedback

One of the main tools for managing big data is Apache Hadoop, an open-source, Java-based programming framework that supports the processing and storage of extremely large data sets in a distributed computing environment. It is part of the Apache project sponsored by the Apache Software Foundation.

Apache™ Hadoop® lets organizations store and process large amounts of data across clusters of computers, and provides the tools for extracting intelligence from that data through analysis and visualization.
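Hadoop’s core processing model, MapReduce, can be sketched in plain Python. This is not Hadoop itself and involves no cluster; it simply imitates the map, shuffle and reduce phases on a word-count job (Hadoop’s canonical introductory example), using made-up input documents:

```python
from collections import defaultdict
from itertools import chain

documents = ["big data needs big tools", "hadoop stores big data"]

# Map phase: each mapper emits (word, 1) pairs from one input record.
def map_phase(doc):
    return [(word, 1) for word in doc.split()]

# Shuffle phase: group all emitted values by key across mapper outputs.
def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

# Reduce phase: each reducer combines the grouped values for one key.
def reduce_phase(groups):
    return {word: sum(counts) for word, counts in groups.items()}

all_pairs = chain.from_iterable(map_phase(d) for d in documents)
counts = reduce_phase(shuffle(all_pairs))
print(counts["big"])  # 3
```

On a real Hadoop cluster, the map and reduce functions run in parallel on many machines, with the framework handling the shuffle, data distribution and fault tolerance.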