DATA PERSISTENCE
DATA, FILES, DATABASES, AND DBMSs
Data
- Data are raw facts
- Can be processed (by application components) and converted to meaningful information
Data persistence
- Working data is contained in computer memory
- Memory is volatile
- Data should be saved to non-volatile storage for persistence
Data persistence techniques
- Data can be stored in
- Files
- Databases
- Files VS Databases
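As a minimal sketch of file-based persistence, the class below writes a few in-memory records to a file and reads them back. The class name, the record format, and the temporary file are assumptions made for illustration only.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

class FilePersistenceDemo {
    // Saves in-memory lines to a file so they survive after the program exits.
    static void save(Path file, List<String> lines) throws IOException {
        Files.write(file, lines, StandardCharsets.UTF_8);
    }

    // Loads the saved lines back into memory.
    static List<String> load(Path file) throws IOException {
        return Files.readAllLines(file, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical data and file; a real application would use a fixed path.
        Path file = Files.createTempFile("students", ".txt");
        save(file, Arrays.asList("111,MyName", "112,OtherName"));
        System.out.println(load(file));
        Files.delete(file);
    }
}
```

A file like this has no schema, query language, or concurrency control, which is roughly where databases start to pay off.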
Data arrangement
- Unstructured
- Semi-structured
- Structured
Database
- Databases are created and managed in database servers
- SQL is used to process databases
- DDL – defines structure: create, alter, and drop databases and tables
- DML – CRUD data in databases
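To make the DDL/DML split concrete, here is a small sketch that labels a statement by its leading keyword. The keyword sets are a common but not exhaustive grouping (some texts put SELECT in a separate query sub-language), and the class name and sample statements are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class SqlKind {
    // Leading keywords usually grouped under each sub-language (not exhaustive).
    private static final Set<String> DDL =
            new HashSet<>(Arrays.asList("CREATE", "ALTER", "DROP", "TRUNCATE"));
    private static final Set<String> DML =
            new HashSet<>(Arrays.asList("INSERT", "SELECT", "UPDATE", "DELETE"));

    // Classifies a statement by its first word only; this is not a SQL parser.
    static String classify(String sql) {
        String keyword = sql.trim().split("\\s+")[0].toUpperCase(Locale.ROOT);
        if (DDL.contains(keyword)) return "DDL";
        if (DML.contains(keyword)) return "DML";
        return "OTHER";
    }

    public static void main(String[] args) {
        System.out.println(classify("create table STUDENT (ID int primary key, NAME varchar(50))"));
        System.out.println(classify("insert into STUDENT (ID, NAME) values (111, 'MyName')"));
        System.out.println(classify("update STUDENT set NAME = 'MyName' where ID = 111"));
        System.out.println(classify("drop table STUDENT"));
    }
}
```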
Database type
- Hierarchical databases
- Network databases
- Relational databases
- Non-relational databases (NoSQL)
- Object-oriented databases
- Graph databases
- Document databases
DBMSs
- DBMSs and their administration tools are used to connect to the DB servers and manage the DBs and the data in them
- phpMyAdmin
- MySQL Workbench
Data arrangement
- Data warehouses
- Big Data
- Volume
- Variety
- Velocity
APPLICATION TO FILES/DB
- Files and DBs are external components
- They exist outside the software system
- Software can connect to the files/DBs to perform CRUD operations on data
- File – file path, URL
- DB – connection string
- To process data in DB
- SQL statements
- Prepared statements
- Callable statements
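The two ways of locating an external data source can be sketched as follows. The MySQL-style URL format, host, port, database name, and file path are all assumptions; other vendors use different JDBC URL formats, and opening a connection needs the vendor's JDBC driver on the classpath.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

class DataSources {
    // Builds a MySQL-style JDBC connection string (the format is vendor specific).
    static String jdbcUrl(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database;
    }

    // Opens a connection; this only works when a matching JDBC driver is available.
    static Connection open(String url, String user, String password) throws SQLException {
        return DriverManager.getConnection(url, user, password);
    }

    public static void main(String[] args) {
        Path file = Paths.get("data", "students.csv");       // hypothetical file path
        String url = jdbcUrl("localhost", 3306, "school");   // hypothetical database
        System.out.println(file);
        System.out.println(url);
    }
}
```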
SQL statements
- Execute standard SQL statements from the application
Statement stmt = con.createStatement();
stmt.executeUpdate("update STUDENT set NAME = '" + name + "' where ID = " + id);
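Building SQL text by concatenation, as above, lets user-supplied values change the statement itself (SQL injection). The sketch below only builds and prints the strings, with no database involved, to show the effect; the class name and the crafted input are illustrative assumptions.

```java
class ConcatenationRisk {
    // Pastes the values directly into the SQL text, as a plain Statement would.
    static String buildUpdate(String name, int id) {
        return "update STUDENT set NAME = '" + name + "' where ID = " + id;
    }

    public static void main(String[] args) {
        // Intended use: one row is targeted.
        System.out.println(buildUpdate("MyName", 111));
        // Crafted input: closes the quote, adds its own condition,
        // and comments out the original where clause.
        System.out.println(buildUpdate("x' where 1 = 1 -- ", 111));
    }
}
```

With the crafted value the statement would update every row, which is one of the main reasons to prefer parameterized statements.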
Prepared statements
- The query only needs to be parsed (or prepared) once, but can be executed multiple times with the same or different parameters.
PreparedStatement pstmt = con.prepareStatement("update STUDENT set NAME = ? where ID = ?");
pstmt.setString(1, "MyName");
pstmt.setInt(2, 111);
pstmt.executeUpdate();
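A slightly fuller sketch of the same update wraps it in a method and closes the statement with try-with-resources. It assumes the caller supplies an open Connection; the class and method names are hypothetical.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

class StudentUpdater {
    private static final String UPDATE_NAME = "update STUDENT set NAME = ? where ID = ?";

    // Updates one student's name and returns the number of rows changed.
    static int rename(Connection con, int id, String name) throws SQLException {
        // try-with-resources closes the statement even if executeUpdate throws.
        try (PreparedStatement pstmt = con.prepareStatement(UPDATE_NAME)) {
            pstmt.setString(1, name); // first ? is NAME
            pstmt.setInt(2, id);      // second ? is ID
            return pstmt.executeUpdate();
        }
    }
}
```

Because the values are bound as parameters rather than pasted into the SQL text, a quote inside the name cannot change the structure of the statement.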
Callable statements
- Execute stored procedures
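CallableStatement cstmt = con.prepareCall("{call anyProcedure(?, ?, ?)}");
// Bind each ? first: cstmt.setXxx(...) for IN parameters, cstmt.registerOutParameter(...) for OUT parameters
cstmt.execute();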
OBJECT RELATIONAL MAPPING
- There are different structures for holding data at runtime
- Application holds data in objects
- Database uses tables (entities)
- How to map data in objects to the tables?
- Object Relational Mapping (ORM)
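What an ORM automates can be sketched by hand: copying an object's fields into a row and back. Here a row is modeled as a map from column name to value; the class names and the STUDENT columns ID and NAME are assumptions for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class StudentRecord {
    private final int id;
    private final String name;

    StudentRecord(int id, String name) {
        this.id = id;
        this.name = name;
    }

    int getId() { return id; }
    String getName() { return name; }
}

class StudentMapper {
    // Object -> row: each field becomes a column value.
    static Map<String, Object> toRow(StudentRecord s) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("ID", s.getId());
        row.put("NAME", s.getName());
        return row;
    }

    // Row -> object: each column value becomes a field.
    static StudentRecord fromRow(Map<String, Object> row) {
        return new StudentRecord((Integer) row.get("ID"), (String) row.get("NAME"));
    }
}
```

An ORM framework generates this kind of mapping from metadata, so it does not have to be written for every class.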
Mismatches between relational and object models
- Granularity: The object model is more granular than the relational model; several classes may map to a single table.
- Subtypes: Subtypes (i.e. inheritance) are not supported by all types of relational databases.
- Identity: Unlike the object model, the relational model has a single notion of identity (the primary key); Java distinguishes object identity (a == b) from equality (a.equals(b)).
- Associations: The object model expresses associations as object references, whereas the relational model expresses them as foreign keys.
- Data navigation: Objects are navigated by following references from one object to the next, whereas relational data is retrieved with queries and joins.
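The identity mismatch is easy to see in code. In this sketch two objects stand for the same database row; the class name is hypothetical, and equals is assumed to compare the primary key only.

```java
class StudentEntity {
    private final int id;      // plays the role of the primary key
    private final String name;

    StudentEntity(int id, String name) {
        this.id = id;
        this.name = name;
    }

    // Equality is defined by the primary key, mirroring the database's view.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof StudentEntity)) return false;
        return id == ((StudentEntity) o).id;
    }

    @Override
    public int hashCode() {
        return Integer.hashCode(id);
    }
}

class IdentityDemo {
    public static void main(String[] args) {
        StudentEntity a = new StudentEntity(111, "MyName");
        StudentEntity b = new StudentEntity(111, "MyName"); // same row, loaded twice
        System.out.println(a == b);      // false: two distinct objects in memory
        System.out.println(a.equals(b)); // true: same primary key
    }
}
```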
ORM implementations in Java
- JavaBeans
- JPA
A POJO should not:
- Extend pre-specified classes.
- Implement pre-specified interfaces.
- Contain pre-specified annotations.
Beans
- Beans are a special type of POJO. A POJO must satisfy some restrictions to be a bean.
- All JavaBeans are POJOs but not all POJOs are JavaBeans.
- Beans should be serializable, i.e. they should implement the Serializable interface. A class that implements Serializable is still commonly called a POJO, because Serializable is a marker interface and therefore not much of a burden.
- Fields should be private. This is to provide the complete control on fields.
- Fields should have getters or setters or both.
- A bean should have a no-arg constructor.
- Fields are accessed only through the constructor or the getters and setters.
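A class following the restrictions above might look like this minimal sketch. The class and field names are hypothetical; in a real project it would normally be a public class in its own file.

```java
import java.io.Serializable;

class StudentBean implements Serializable {
    private static final long serialVersionUID = 1L;

    // Private fields: outside code cannot touch them directly.
    private int id;
    private String name;

    // No-arg constructor, so frameworks can create instances reflectively.
    public StudentBean() {
    }

    // Getters and setters follow the getX/setX naming convention.
    public int getId() { return id; }
    public void setId(int id) { this.id = id; }

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
}
```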
Bean to DB
JPA architecture
NOSQL
- Relational DBs are good for structured data
- For semi-structured and unstructured data, other types of DBs can be used
- Key-value stores
- Document databases
- Wide-column stores
- Graph stores
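The first two data models can be illustrated with plain in-memory collections. This is a conceptual sketch only, not the API of any NoSQL product; the keys, field names, and values are made up.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class NoSqlShapes {
    // Key-value store: an opaque value is looked up by its key.
    static Map<String, String> keyValueStore() {
        Map<String, String> store = new HashMap<>();
        store.put("session:42", "user=111;theme=dark");
        return store;
    }

    // Document: a nested, self-describing structure; other documents in the
    // same collection may carry different fields.
    static Map<String, Object> studentDocument() {
        Map<String, Object> doc = new HashMap<>();
        doc.put("_id", 111);
        doc.put("name", "MyName");
        doc.put("courses", Arrays.asList("Databases", "Networks"));
        return doc;
    }

    public static void main(String[] args) {
        System.out.println(keyValueStore().get("session:42"));
        System.out.println(studentDocument().get("courses"));
    }
}
```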
Benefits of NoSQL
- Compared to relational databases, NoSQL databases are often more scalable and can provide better performance for some workloads, and their data model addresses several issues that the relational model is not designed to address:
- Large volumes of rapidly changing structured, semi-structured, and unstructured data
NoSQL DB servers
- MongoDB
- Cassandra
- Redis
- Amazon DynamoDB
- HBase
Hadoop
- The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.
- It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.
- Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.
Hadoop core concepts
- Hadoop Distributed File System (HDFS™): A distributed file system that provides high-throughput access to application data
- Hadoop YARN: A framework for job scheduling and cluster resource management.
- Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.
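The MapReduce idea can be sketched on a single machine with Java streams. This is not the Hadoop API, only an illustration of the map and reduce phases using the classic word count; the class name is hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class WordCountSketch {
    static Map<String, Integer> wordCount(List<String> lines) {
        return lines.stream()
                // Map phase: split every line into words and emit a (word, 1) pair for each.
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\W+")))
                .filter(word -> !word.isEmpty())
                .map(word -> Map.entry(word, 1))
                // Shuffle and reduce phase: group the pairs by word and sum the counts.
                .collect(Collectors.groupingBy(
                        Map.Entry::getKey,
                        TreeMap::new,
                        Collectors.summingInt(Map.Entry::getValue)));
    }

    public static void main(String[] args) {
        System.out.println(wordCount(Arrays.asList("big data", "Big clusters")));
    }
}
```

In Hadoop the same two phases run in parallel on many machines, with the input lines read from HDFS blocks stored across the cluster.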
INFORMATION RETRIEVAL (IR)
- Data in storage should be fetched, converted into information, and presented for proper use
- Information is retrieved via search queries
- Keyword search
- Full-text search
- The output can be
- Text
- Multimedia
- The information retrieval process should be
- Fast (high performance)
- Scalable
- Efficient
- Reliable/Correct
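Keyword search is commonly backed by an inverted index, which maps each term to the documents that contain it. The sketch below is a minimal in-memory version, assuming simple lower-casing and splitting on non-word characters; the class name and document IDs are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class KeywordIndex {
    // term -> IDs of the documents that contain it
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Indexes a document by recording its ID under every term it contains.
    void add(int docId, String text) {
        for (String term : text.toLowerCase().split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Looks up a single keyword; the cost does not depend on the total text size.
    Set<Integer> search(String keyword) {
        return postings.getOrDefault(keyword.toLowerCase(), Collections.emptySet());
    }

    public static void main(String[] args) {
        KeywordIndex index = new KeywordIndex();
        index.add(1, "Relational databases use tables");
        index.add(2, "Document databases store JSON");
        System.out.println(index.search("databases"));
        System.out.println(index.search("json"));
    }
}
```

Looking a term up in the index avoids scanning every document for each query, which is what makes the search fast and scalable.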