Written by Anonymous 22 January 2014

Laura Cox: do a data health check


Laura Cox, Chief Financial and Operating Officer at Ringgold, kicked off her talk with a focus on what constitutes healthy data. Good-quality, reliable and consistent data helps you make good decisions. It gives you insight into customers and business relationships, and supports strategic planning, decision making and ongoing business operations.

 


 

Poor data has real consequences. It makes it hard to get a true picture of relationships with institutions, and can lead to poor-quality author (and affiliation) data and an inability to see overlap between authors, members and customers. It can produce inaccurate holdings and revenue reports, which take protracted effort to untangle and cost your business time and money. Healthy records are complete, accurate, free of duplicates, current and consistent, and conform to standard identifiers.
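
As a rough illustration of what a basic health check might look like in practice, the sketch below counts incomplete and duplicated records in a list of institution records. It is not something shown in the talk: the field names (`name`, `country`, `institution_id`) and the sample records are assumptions made up for the example, and it only covers two of the qualities listed above (completeness and duplication).

```python
from collections import Counter

# Hypothetical schema: these field names are assumptions for the example.
REQUIRED_FIELDS = ("name", "country", "institution_id")


def health_report(records):
    """Count records that fail basic completeness and duplication checks."""
    # A record is incomplete if any required field is missing or blank.
    incomplete = sum(
        1
        for r in records
        if any(not str(r.get(f) or "").strip() for f in REQUIRED_FIELDS)
    )
    # Normalise case and spacing so trivial name variants share one key.
    keys = Counter(
        (" ".join(str(r.get("name") or "").lower().split()), r.get("country"))
        for r in records
    )
    # Every record beyond the first with the same key counts as a duplicate.
    duplicates = sum(n - 1 for n in keys.values() if n > 1)
    return {"total": len(records), "incomplete": incomplete, "duplicates": duplicates}


# Made-up sample data with placeholder identifiers.
records = [
    {"name": "University of Oxford", "country": "GB", "institution_id": "ID-001"},
    {"name": "university of  oxford", "country": "GB", "institution_id": ""},
    {"name": "Example College", "country": "US", "institution_id": "ID-002"},
]
report = health_report(records)
```

A real check would also need to look at currency and consistency, which depend on how each system stores dates and naming conventions.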

 


What are unique identifiers and how can they help? 

 

They are numeric or alphanumeric designations, each associated with a single entity. Entities can be institutions, persons or pieces of content. They disambiguate each entity, giving you a proper understanding of a customer, author, reader or institution, as well as proper identification of a content object, article, product or package. They can be used internally or in conjunction with external partners.

 


Why should we worry about data now? 

 

Cox cited the 2012 STM Report (Ware, M and Mabe, M. The STM Report, 2012), which stated that the number of researchers and the number of articles are both increasing by 3% per annum. The number of journals is increasing by 3.5% per annum, and growth in China has been in double digits for over 15 years. At the same time there is increased demand for anytime/anywhere access while library budgets are frozen or being cut: less money for more content.

 


 

Institutional identifiers can be used for disambiguation (e.g. which UCL?) and for consolidating different versions of a name (there are many ways of describing the University of Oxford and its institutions). They provide a hierarchy view (an institute within an institution) and reinforce uniqueness. This means you can use them for a gap analysis.
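
To make these uses concrete, here is a minimal sketch of how name consolidation, a hierarchy roll-up and a gap analysis could fit together. Everything in it is an assumption for illustration: the `INST-…` identifiers are placeholders rather than values from any real scheme, and `resolve`, `top_level` and `gap_analysis` are hypothetical helper names.

```python
# Placeholder identifiers; a real identifier scheme assigns its own values.
VARIANTS = {
    "university of oxford": "INST-100",
    "oxford university": "INST-100",
    "univ. of oxford": "INST-100",
    "bodleian libraries": "INST-101",
}
# Hierarchy view: child identifier -> parent identifier.
PARENT = {"INST-101": "INST-100"}


def resolve(name):
    """Map a name variant to its identifier, or None if it is unknown."""
    return VARIANTS.get(" ".join(name.lower().split()))


def top_level(inst_id):
    """Walk up the hierarchy to the top-level institution."""
    while inst_id in PARENT:
        inst_id = PARENT[inst_id]
    return inst_id


def gap_analysis(market_ids, customer_names):
    """Return the identifiers in the target market with no matching customer."""
    resolved = (resolve(n) for n in customer_names)
    customers = {top_level(i) for i in resolved if i}
    return sorted(set(market_ids) - customers)


gaps = gap_analysis(
    ["INST-100", "INST-200"],
    ["Oxford University", "Bodleian Libraries"],
)
```

Because both customer names roll up to the same top-level identifier, only `INST-200` is reported as a gap. Names that cannot be resolved are silently skipped here; in practice they would more likely be queued for manual review.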

 


The Kafka-esque 'Identifiers identified' slide

The main challenge is around multiple data sources. There are system data silos, geographic data silos across multiple locations, data entered by different people for different purposes, data from third parties in the supply chain and data from bought-in sources. These things aren't integrated. Typical publisher systems include financial, CRM or sales databases, authentication systems, fulfilment, usage statistics, submissions systems and so on.

 


 

Cox advised that the first thing to do is think about your data and implement a data governance plan. What data is held, where is it held and how is it accessed? How can it be used to benefit the business and work across silos? Always bear in mind two questions: where are you now, and where do you want to go?

 


 

Another recommendation was to improve data capture. If you can, use web forms, as they minimise variance in data input. Implement required fields, use date validation and, at a minimum, use naming conventions. A number of tools can help, such as address validation, postcode look-up and institution validation/look-up. Avoid free-text fields and make institutional identifiers a requirement.
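
The capture rules above could be enforced with something like the following sketch, which checks required fields, validates a date and insists on a recognised institutional identifier. The field names, the `KNOWN_INSTITUTIONS` set (a stand-in for a real look-up service) and the `validate_submission` helper are all assumptions for the example, not anything prescribed in the talk.

```python
from datetime import date

# Hypothetical form fields and a stand-in for an institution look-up service.
REQUIRED = ("contact_name", "institution_id", "start_date")
KNOWN_INSTITUTIONS = {"INST-100", "INST-200"}


def validate_submission(form):
    """Return a list of error messages; an empty list means the form passes."""
    errors = []
    # Required fields: reject missing or blank values.
    for field in REQUIRED:
        if not str(form.get(field) or "").strip():
            errors.append(f"{field} is required")
    # Date validation: the value must parse as a real ISO calendar date.
    start = str(form.get("start_date") or "").strip()
    if start:
        try:
            date.fromisoformat(start)
        except ValueError:
            errors.append("start_date must be a valid ISO date (YYYY-MM-DD)")
    # Institution validation: the identifier must be one we recognise.
    inst = str(form.get("institution_id") or "").strip()
    if inst and inst not in KNOWN_INSTITUTIONS:
        errors.append("institution_id is not recognised")
    return errors


ok = validate_submission(
    {"contact_name": "A. Librarian", "institution_id": "INST-100", "start_date": "2014-01-22"}
)
bad = validate_submission(
    {"contact_name": "", "institution_id": "UNKNOWN", "start_date": "2014-02-30"}
)
```

Returning all errors at once, rather than stopping at the first, lets a web form show the user everything that needs fixing in a single pass.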

 


 

You can use an institutional identifier as a linchpin to link internal systems for better data integration. It can prevent duplicate account creation, and help keep data up to date and systems synchronised. It also enables staff to use data more effectively, break down silos, simplify data transmission and gain more insight and power to analyse and understand the business.
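
As a hedged sketch of the linking idea, the snippet below groups rows from two separate systems by a shared institutional identifier, then lists institutions that appear in usage statistics but have no CRM account. The system names, the `institution_id` field, the `link_by_identifier` helper and all the sample figures are made up for illustration.

```python
from collections import defaultdict


def link_by_identifier(**systems):
    """Group rows from each named system under their shared institution_id."""
    linked = defaultdict(dict)
    for system_name, rows in systems.items():
        for row in rows:
            linked[row["institution_id"]].setdefault(system_name, []).append(row)
    return dict(linked)


# Made-up sample rows from two hypothetical internal systems.
crm = [{"institution_id": "INST-100", "account": "Oxford Univ."}]
usage = [
    {"institution_id": "INST-100", "downloads": 1200},
    {"institution_id": "INST-200", "downloads": 15},
]

linked = link_by_identifier(crm=crm, usage=usage)
# Institutions seen in some system but missing from the CRM.
no_crm = sorted(i for i, s in linked.items() if "crm" not in s)
```

Here `no_crm` contains only `INST-200`. The same grouped view can flag duplicate accounts: any identifier with more than one row under `crm` is a candidate for merging.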

 


 

So what can you do now?

 

  • Engage with the problems
  • Think about resources. Time? Money? Systems?
  • How do you want it to work – look at priorities
  • Have a data governance policy
  • Appoint a data champion and document everything
  • Create some basic rules for data entry
  • Use universal identifiers to clean and link your data
  • Work with suppliers and customers to use institutional identifiers to strengthen the supply chain.