Big data and analytics are taking center stage in many smart city projects all over the world. A smart city, after all, is about using information and communication technology to better manage a city’s assets and infrastructure. It is also about leveraging technology to improve the quality of life for citizens, such as by improving individual mobility, enhancing connectivity, or protecting the environment.
However, many government or public sector organizations that are building smart cities of tomorrow are still not able to fully take advantage of their data today. This can be due to a number of reasons, such as limitations of their existing IT infrastructure or constraints from working within a traditionally designed government setup where information is stored in silos.
Public sector organizations like these need an enterprise data hub (EDH) solution that can help them manage and operationalize big data across departments and within agencies, supporting both current and future needs. An improved data management architecture, such as an EDH built on and powered by Apache Hadoop, can move governments toward an environment where big data is securely shared, processed, and analyzed.
When implementing an EDH, public sector organizations have two key concerns: first, they need to comply with stringent regulatory mandates, and second, they need to ensure data security.
The open source Hadoop platform makes perfect sense for public sector organizations considering an EDH because it is flexible, scalable and secure.
The following are the five most important questions that public sector organizations need to ask before implementing an EDH with Hadoop. These essential questions can help agencies take stock of their readiness and build out enterprise data solutions.
1) How do I prepare a secure foundation for Hadoop?
Government organizations, including those in the defense, financial and healthcare sectors, are seeing a critical advantage in analyzing and using data, especially when it comes to accessing large, historical data sets.
Public sector IT leaders may have concerns about Hadoop because it is an open source software solution. That is, Hadoop is freely available, and developers are free to study, change and distribute the software framework. The spirit behind open source means that Hadoop is developed in a collaborative, public manner. This also means that Hadoop embraces innovation, which can sometimes translate into updated and new components being released every month, if not every day.
Nevertheless, while the technology may evolve and change quickly, the tried-and-true security rules and best practices for hardening system foundations still apply, and they can keep these changes under control to maintain effective security and governance. Even more important, this can be done without hampering innovation.
To keep data secure, public sector organizations that adopt Hadoop need to apply well-known security measures to the underlying infrastructure and systems (a minimal sketch follows the list). For example:
Turning off services that are not required
Restricting access to authorized users
Limiting super-user permissions
Locking down network ports and protocols
Enabling audit, logging and monitoring
Applying all the latest OS security patches
Using centralized corporate authentication
Enabling encryption in transit
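For illustration, here is a minimal Python sketch of the kind of pre-flight check an administrator might script for two items on this list: closing unneeded ports and verifying required services. The hostname and port sets are assumptions for demonstration; a real deployment would rely on configuration management tooling and dedicated scanners rather than an ad hoc script.

```python
import socket

# Assumed values for illustration only: a hypothetical NameNode host,
# example Hadoop service ports, and a few ports that should be closed.
HOST = "namenode.example.gov"
EXPECTED_OPEN = {8020, 50070}    # example NameNode RPC and web UI ports
SUSPECT_PORTS = {21, 23, 25}     # FTP, Telnet, SMTP: unneeded services

def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def check_ports(host: str) -> None:
    """Warn about open suspect ports and closed expected service ports."""
    for port in sorted(SUSPECT_PORTS):
        if port_is_open(host, port):
            print(f"WARNING: port {port} is open on {host}; lock it down")
    for port in sorted(EXPECTED_OPEN):
        if not port_is_open(host, port):
            print(f"NOTE: expected service port {port} is closed on {host}")

if __name__ == "__main__":
    check_ports(HOST)
```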
2) What type of perimeter security is in place?
With a Hadoop cluster installed on a secure platform, the next questions to address revolve around perimeter security: who can access the Hadoop cluster, from where, and how are users authenticated?
Perimeter security restricts users by requiring entry through a secure gateway over secured networks and with approved credentials. Just as agencies need multiple data sources and multiple frameworks to truly instill a data-driven workflow within their organizations, government leaders also need a network perimeter that is both secure and agile enough to handle a variety of workforce needs.
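As a sketch of what perimeter-controlled access can look like, the Python snippet below lists an HDFS directory through a gateway using Kerberos authentication and the standard WebHDFS REST API. The Apache Knox-style gateway URL and the directory path are assumptions; the point is that a client only gets through with a valid Kerberos ticket.

```python
import requests
from requests_kerberos import HTTPKerberosAuth, OPTIONAL  # pip install requests-kerberos

# Hypothetical perimeter gateway: an Apache Knox-style URL is assumed here.
GATEWAY = "https://gateway.example.gov:8443/gateway/default"

def list_directory(path: str) -> dict:
    """List an HDFS directory through the perimeter gateway via WebHDFS.

    The caller must already hold a Kerberos ticket (e.g. from `kinit`);
    unauthenticated requests are rejected at the gateway.
    """
    resp = requests.get(
        f"{GATEWAY}/webhdfs/v1{path}",
        params={"op": "LISTSTATUS"},
        auth=HTTPKerberosAuth(mutual_authentication=OPTIONAL),
        verify=True,  # validate the gateway's TLS certificate
    )
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    listing = list_directory("/data/public")
    for entry in listing["FileStatuses"]["FileStatus"]:
        print(entry["pathSuffix"])
```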
3) What security regulations must I meet?
There are two kinds of organizations when it comes to compliance: those that have to be compliant, and those that choose to follow compliance guidelines.
Those that must be compliant are usually operating under a mandate such as FISMA, which establishes the required compliance and regulations, including data encryption. Data encryption is the safety lock on the most sensitive data an organization holds.
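As a generic illustration of encryption as that safety lock, the sketch below encrypts a sensitive record with a symmetric key using Python's cryptography library. This is not Hadoop's native mechanism; HDFS offers transparent encryption backed by a key management service, and the inline key here is purely for demonstration.

```python
from cryptography.fernet import Fernet  # pip install cryptography

# Illustration only: a real deployment would use HDFS transparent
# encryption with keys held in a key management service, never a
# key generated inline like this.
key = Fernet.generate_key()
cipher = Fernet(key)

record = b"citizen-id:12345,address:10 Main St"   # hypothetical record
token = cipher.encrypt(record)                    # ciphertext, safe to store
print(cipher.decrypt(token) == record)            # True: round-trip succeeds
```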
As for those that are following compliance guidelines, they typically do so to establish differentiation, mitigate risks, and promote a culture and mindset of security.
However, public sector organizations need to keep in mind that compliance is not just about the technology. It is also about the people and processes. Organizations first need to have a security culture in place. For instance, users need to consistently adhere to simple security guidelines like encrypting sensitive data and locking devices with secure passwords.
4) Who are the ‘need to know’ users on the Hadoop platform?
It is important for a public sector organization to only share data on a need-to-know basis internally. This is, however, where many public sector agencies struggle the most. There are sub-groups and divisions built into larger agencies, and with increased organizational complexity comes increased difficulty in monitoring and controlling access to data.
The power to bring data together, like that of a Hadoop-powered EDH, also comes with a challenge: who are the ‘need-to-know’ users within a larger organization that require access to critical data?
Solutions like Apache Sentry, which enables fine-grained, role-based access control over data sets, can be useful here. Users are defined by ‘need-to-know’ roles rather than organizational structures. Essentially, Sentry is the central authorization framework that gives the Hadoop platform the ability to store sensitive data while providing secure access to the agency’s ‘need-to-know’ users.
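As a sketch of what this looks like in practice, Sentry privileges on Hive and Impala tables are managed with SQL GRANT statements, which can be issued from Python. The host, role, group and table names below are hypothetical; the statements themselves follow standard Sentry DDL.

```python
from impala.dbapi import connect  # pip install impyla

# Hypothetical host, role, group and table names, for illustration.
conn = connect(host="impala.example.gov", port=21050,
               auth_mechanism="GSSAPI")  # Kerberos-authenticated session
cur = conn.cursor()

# Define a role around the job to be done, not the org chart.
cur.execute("CREATE ROLE health_analyst")

# Map a directory-service group to that role...
cur.execute("GRANT ROLE health_analyst TO GROUP health_dept_analysts")

# ...and grant the role read access to only the data it needs to know.
cur.execute("GRANT SELECT ON TABLE records.patient_visits TO ROLE health_analyst")

cur.close()
conn.close()
```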
5) How do I monitor and audit Hadoop security after it goes live?
Auditing the Hadoop platform is the final key piece of an effective and secure data practice. Auditing lets planners see how users have been using the platform and flag anomalies that may indicate suspicious activity. Tools that inspect Hadoop logs and enable predictive tracking can help detect bad behavior and address it before it becomes a bigger threat.
Simply running a data audit and indexing data can help identify new data and the security permissions and policies that need to be applied. It is about gaining visibility into data usage and the routes data takes. Public sector teams need this visibility to know, at all times and without exception, where data is coming from and how it is being used.
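As a simple sketch of this kind of log inspection, the Python script below parses an HDFS audit log, tallies denied operations and file reads per user, and flags crude outliers. The log path, regular expression and flagging threshold are assumptions; a production deployment would use purpose-built audit tooling rather than a script like this.

```python
import re
from collections import Counter

# The key=value layout below matches the typical HDFS audit log format;
# the file path and the 10x threshold are assumptions for this sketch.
AUDIT_LOG = "/var/log/hadoop-hdfs/hdfs-audit.log"
LINE_RE = re.compile(r"allowed=(\w+)\s+ugi=(\S+).*?cmd=(\S+)\s+src=(\S+)")

def scan(path: str) -> None:
    denied = Counter()   # denied operations per user
    reads = Counter()    # successful file opens per user
    with open(path) as log:
        for line in log:
            m = LINE_RE.search(line)
            if not m:
                continue
            allowed, user, cmd, _src = m.groups()
            if allowed == "false":
                denied[user] += 1
            elif cmd == "open":
                reads[user] += 1
    # Flag users whose read volume dwarfs everyone else's: a crude
    # stand-in for the predictive tracking tools mentioned above.
    if reads:
        avg = sum(reads.values()) / len(reads)
        for user, count in reads.most_common():
            if count > 10 * avg:
                print(f"ANOMALY: {user} opened {count} files (avg {avg:.0f})")
    for user, count in denied.most_common(5):
        print(f"REVIEW: {user} had {count} denied operations")

if __name__ == "__main__":
    scan(AUDIT_LOG)
```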
Safeguarding data in a smart city
The current data explosion will continue and public sector organizations will have to manage more and more data. This is especially so in a smart city environment where citizens and the government are more connected than ever before, and where data powers everything from the public transport system to the water and waste management system.
A Hadoop-based EDH, offering flexibility, scalability and security, allows public sector organizations to be future-ready today. IT leaders in the public sector need to know that it is possible to ensure compliance and security in Hadoop. Moreover, the continued innovation in the platform will also allow public sector organizations to strengthen that security over time.