About
John Vicencio created the SMCoogle (es-em-coo-gle) Search Engine web program in Professor James Geddes Jr.'s
2014 Computer Science 80 (Internet Programming) class at Santa Monica College.
Concept
The SMCoogle Search Engine web application (app) is a program that records a URL and its content in a database and then
finds search terms within the recorded content. SMCoogle demonstrates dynamic web programming (PHP with SQL on the
Linux, Apache, MySQL, PHP stack, or LAMP) with a responsive design (CSS3 with HTML5) as the User Interface (UI), built around the
Model View Controller (MVC) programming model.
- User Interface - the SMCoogle Search Engine web app shows a search field on the home (default) page.
A user types and submits a keyword, and the matching results are retrieved from the database. The "look-and-feel" of
the UI is designed to invite the user to search for or index a URL. When the user is on a mobile device
such as a smartphone or a tablet, the UI adjusts to the size of the screen. For example, the buttons grow
larger, the font size scales in proportion to the screen, and images shrink to fit the device,
all using CSS3. For the structure of the website,
the whole web app is built with HTML5 semantic elements such as <header>, <footer>, <nav>, and <section>.
These elements give "meaning" to the structure of the web page, which generic HTML4 elements such as <div> do not.
They extend HTML4 rather than replace it, since HTML5 builds on the earlier standard.
- MVC programming model - the search engine web app is based on a programming model where
the data is stored in and retrieved from the MySQL database (model), PHP and SQL process and query each request (controller),
and the user sees the resulting information on the page (view).
- Dynamic programming - rather than simply displaying fixed information, the search engine web app generates it
dynamically, so what is displayed changes with each request. For example, PHP redisplays in the search box the keyword
that the user searched for. Another example is validation of the Search, Submit and Contact forms, where PHP
checks for empty input boxes so that users are prompted to enter a string. In some cases, HTML5 input element attributes
are used to make sure that users have entered a valid type of input, such as an email address, or that users
fill in required fields. In one instance, JavaScript makes the footer show a copyright year that updates
every year.
- Responsive design - the "look-and-feel" of the web app is based on a concept where the content
adapts to the screen size of the device, such as a smartphone. For example, the image changes using a CSS3 media query and the viewport setting,
depending on the available width in pixels. The web app, however, concentrates on the programming aspect rather than
on visual design, including color theory.
Architecture
SMCoogle's construction (its programming) follows this blueprint. It follows the MVC approach:
the scripts (controller) receive the keyword that the user submits through the page (view), query the database (model), and send the results back to the page.
The search engine web page itself is a dynamic website that changes based on the user's keyword submission. It is
a program, or web app, that runs in a browser, but it mostly uses the same technologies found on many static websites (HTML/HTML5, CSS/CSS3), in addition to
some programming languages (JavaScript, PHP) and a database on the server.
-
Server and application set-up - the SMCoogle web app is hosted on a web server. The server runs
Apache on a Linux operating system and hosts the MySQL
database. The server-side programming language is PHP. PHP and MySQL work
well together, with PHP sending queries to the MySQL database. Databases, tables and all other
database objects can be created by a web developer either directly or through a script.
All the code and files are located on the server. In other words, the SMCoogle web app is based on
the LAMP stack. A web developer can work on a project like this web app in any editor
and then transfer all the code and files to the web server using an FTP application.
Some web programmers use a local development environment with a local server, using software such as WAMP, XAMPP, or MAMP.
It is also very important to set allow_url_fopen = On in php.ini, since many web hosting services
disable this feature by default due to security risks. Since this web app opens the data it needs
from the URL entered, this feature must be enabled.
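A script can check this setting before it tries to read a remote page. The following is a minimal sketch, not code from SMCoogle: the helper name urlFopenEnabled() is hypothetical, while ini_get() and filter_var() are standard PHP functions.

```php
<?php
// Hypothetical helper (not from SMCoogle): reports whether PHP is allowed to
// open URLs with functions such as file_get_contents().
function urlFopenEnabled(): bool
{
    // ini_get() returns the setting as a string such as "1", "On" or "".
    // FILTER_VALIDATE_BOOLEAN maps "1", "on", "true" and "yes" to true,
    // and everything else to false.
    return filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN);
}

if (urlFopenEnabled()) {
    echo "allow_url_fopen is enabled; remote pages can be read.\n";
} else {
    echo "allow_url_fopen is disabled; set allow_url_fopen = On in php.ini.\n";
}
```

Because allow_url_fopen can only be changed in the server configuration, a check like this can report the problem but cannot switch the feature on from within the script.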
-
Folder and file structures - the root directory (smccs80) has seven documents and three folders
(css, image, and include), so that files are organized by purpose. Files are *.php
for the web pages, *.css for styles, and image formats such as *.jpg. A configuration file called php.ini
enables a particular PHP feature on the Linux-Apache server. You can upload all these files to the root directory of a host server,
which corresponds to a web page address like http://somehost.com.
-
Document structure (HTML and HTML5) - each page uses HTML5, starting with <!doctype html> above the familiar HTML(4)
elements <html>, <head>, <title>, and <body>. To make the document (web page) meaningful, HTML5's semantic
elements mark which part is the top section, which is the body section, and which is the bottom of the document.
- <header> - contains the title, where you see "SMCoogle Search Engine" as a label, as well
as the <nav> section, where you see the menu "Home, About, Contact, and Submit."
- <section> - contains the content inside the <body> element: the
search box with a button, plus the <aside> section, which simply shows an image that depends on the page
(and is omitted when search results are displayed).
- <footer> - the bottom section of the document, which shows the copyright and the Privacy page link.
- HTML entities - HTML displays characters and other content (like images) in the browser.
Characters such as the less-than (<) and greater-than (>) signs could be
interpreted by the browser as part of an HTML element such as <p>. To show the actual character in the browser,
you enter its HTML entity in the code. For example, to display '<' in the browser, you would enter
&lt; in the code. This is useful for showing parts of the code that contain
special characters or symbols, such as writing the equal sign '=' as '&#61;'. It is also useful for converting
special characters to avert hack inserts into scripts (see details below about PHP scripts).
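In PHP, this conversion can be done with the built-in htmlspecialchars() function. The snippet below is a small illustration with a made-up sample string, not an excerpt from the app.

```php
<?php
// htmlspecialchars() converts the characters that HTML treats as markup
// into entities, so the browser shows them as text instead of parsing them.
$code = '<p>Fish & chips</p>';
$escaped = htmlspecialchars($code, ENT_QUOTES, 'UTF-8');
echo $escaped, "\n"; // prints &lt;p&gt;Fish &amp; chips&lt;/p&gt;

// html_entity_decode() reverses the conversion, including numeric entities.
echo html_entity_decode('1 &#61; 1', ENT_QUOTES, 'UTF-8'), "\n"; // prints 1 = 1
```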
-
Style layout (CSS and CSS3) - colors, sizes, and the way blocks are positioned or displayed are
controlled by the stylesheet styles.css, which is located in the css folder and linked inside the <head> element.
- Color display - the SMCoogle Search Engine web app uses a dark header and nav against the grey background
of the body element. The different sections of the document are separated visually to show where the header, menu, content and
footer are.
- Ease of use - the whole document is centered, with some padding at the top and the bottom,
using the container class.
Input boxes are sized so that users can easily enter a search keyword, next to a big blue button.
- Mobile screen friendly - a CSS3 @media query resizes
images, padding and font size so that they are readable on small devices up to 630px wide, from most modern smartphones to
tablets.
- Other CSS3 features - hovering over the screenshots that are floated to the right
zooms them out using a transition, creating an animation without Flash.
-
Functionality, logic and control (PHP, SQL and an instance of JavaScript) - PHP is the main technology used
to program the SMCoogle web app. It uses SQL to query the database and a bit of JavaScript to display a dynamic year in the footer.
The document is separated into parts that follow the HTML5 structure. Parts that are common
to different documents in the root directory are kept in their own files, so that a change is made just once rather than
separately on each page. For example, PHP adds nav.php to the top part of each HTML5
document and footer.php to the bottom of it. Pages and scripts are compartmentalized for easy reuse
and general web app management.
-
Search query on the search page (index.php) - the search section of the web app uses the
GET method on an HTML form element, since the information provided by the user isn't confidential and
since it creates a useful URL such as host.com/index.php?keyword=John&submit=Search, in case users want to
save the query (here the keyword typed is 'John'). The form submits to the same page, which holds the PHP script, using
$_SERVER['PHP_SELF'] (see the form's action attribute in the screen shot). For security purposes, the htmlspecialchars() PHP function
is applied to stop attackers from injecting code that would break the web app.
The server-side script (PHP) checks whether the form is empty and reminds users to type a keyword if they submitted an empty
input box. Using the include("filelocation/here.php") statement, the script on the main page pulls in other scripts
from the include folder.
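The form handling described here could look roughly like the sketch below. It is an illustration under assumptions, not the code of index.php: the helper names readKeyword() and renderSearchForm() and the reminder text are hypothetical, while the field names keyword and submit come from the example URL.

```php
<?php
// Hypothetical helper: returns the trimmed keyword from the query string,
// or an empty string when nothing usable was submitted.
function readKeyword(array $query): string
{
    return isset($query['keyword']) && is_string($query['keyword'])
        ? trim($query['keyword'])
        : '';
}

// Hypothetical helper: builds the GET form. Both the action URL and the
// redisplayed keyword pass through htmlspecialchars(), so markup typed by a
// user cannot break out of the attribute values.
function renderSearchForm(string $action, string $keyword): string
{
    $safeAction  = htmlspecialchars($action, ENT_QUOTES, 'UTF-8');
    $safeKeyword = htmlspecialchars($keyword, ENT_QUOTES, 'UTF-8');

    return '<form method="get" action="' . $safeAction . '">'
        . '<input type="text" name="keyword" value="' . $safeKeyword . '">'
        . '<input type="submit" name="submit" value="Search">'
        . '</form>';
}

$keyword = readKeyword($_GET);
echo renderSearchForm($_SERVER['PHP_SELF'] ?? '', $keyword), "\n";

// Remind the user only when the form was submitted with an empty box.
if (isset($_GET['submit']) && $keyword === '') {
    echo "<p>Please type a keyword to search.</p>\n";
}
```

Escaping $_SERVER['PHP_SELF'] matters because a visitor can append extra path text to the address, and that text would otherwise be echoed into the page unchanged.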
- Connecting and authenticating from the host (Linux/Apache) to the MySQL database
(include/dbconnect.php) using $dbconnect = new mysqli($servername, $dbusername, $dbpassword, $dbname).
This authenticates a connection to the database, with servername for the MySQL host, dbusername and dbpassword for the username and password, and dbname for
the database.
- Creating a table in the database the first time
(include/dbcreatetable.php) by running a create query
(the SQL statement CREATE TABLE $tablename, where tablename is the variable used in the
create query), provided that the table isn't found (checked using the query
DESCRIBE $tablename). Ideally, the same approach would apply to creating a database
(CREATE DATABASE $databasename).
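The two set-up steps can be sketched as follows. This is not the content of dbconnect.php or dbcreatetable.php: the function names and the id and pageurl columns are assumptions made for illustration (only pagecontent appears in the description), and CREATE TABLE IF NOT EXISTS is used here in place of the separate DESCRIBE check.

```php
<?php
// Hypothetical helper: opens the connection and makes mysqli throw
// exceptions on failure instead of returning false silently.
function openDatabase(string $server, string $user, string $password, string $dbName): mysqli
{
    mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT);
    $db = new mysqli($server, $user, $password, $dbName);
    $db->set_charset('utf8mb4');
    return $db;
}

// Hypothetical helper: builds the create query. Table names cannot be bound
// as query parameters, so only letters, digits and underscores are accepted.
function buildCreateTableSql(string $tableName): string
{
    if (!preg_match('/^[A-Za-z0-9_]+$/', $tableName)) {
        throw new InvalidArgumentException('Unexpected table name');
    }
    // Column names other than pagecontent are assumptions.
    return "CREATE TABLE IF NOT EXISTS `$tableName` ("
        . "id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY, "
        . "pageurl VARCHAR(2048) NOT NULL, "
        . "pagecontent MEDIUMTEXT NOT NULL"
        . ")";
}

// Usage, with placeholder credentials:
// $dbconnect = openDatabase('localhost', 'dbuser', 'dbpass', 'smcoogle');
// $dbconnect->query(buildCreateTableSql('pages'));
```

With IF NOT EXISTS the script can run the create query on every request without first testing whether the table is there.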
-
Finally, once the database connection and set-up are complete, the search script begins
(include/dbsearchquery.php). The search script makes sure that
the user entered a keyword; if not, it displays a message that the input box is empty.
If a keyword is entered, a search query using a SELECT statement and the wildcard '%'
checks the database for rows whose pagecontent column contains
the keyword ("SELECT * FROM $tablename WHERE pagecontent LIKE '%$keyword%'").
For example, if the user typed 'John', the query matches any content that has 'John' with any text before or after
it in a sentence or a paragraph. Then a conditional statement, if ($queryresult->num_rows > 0),
checks whether the search produced at least one record. If so, the script loops through the records
with while ($row = $queryresult->fetch_assoc()), counts the occurrences in each using substr_count($row['pagecontent'], $keyword),
accumulates the counts with $querytotalcount += $queryresultcount,
and displays them.
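A sketch of the search step is shown below. It is a variation on the description rather than the code of dbsearchquery.php: the function names are hypothetical, and a prepared statement is swapped in for the interpolated '%$keyword%' string so that the keyword is never parsed as SQL. It assumes the mysqlnd driver, which get_result() requires.

```php
<?php
// Hypothetical helper: returns all rows whose pagecontent contains $keyword.
function findMatchingRows(mysqli $db, string $tableName, string $keyword): array
{
    // Table names cannot be bound as parameters, so allow a safe pattern only.
    if (!preg_match('/^[A-Za-z0-9_]+$/', $tableName)) {
        throw new InvalidArgumentException('Unexpected table name');
    }
    $stmt = $db->prepare("SELECT * FROM `$tableName` WHERE pagecontent LIKE ?");
    if ($stmt === false) {
        throw new RuntimeException($db->error);
    }
    // The wildcards are added to the bound value, not to the SQL text.
    $pattern = '%' . $keyword . '%';
    $stmt->bind_param('s', $pattern);
    $stmt->execute();
    $rows = $stmt->get_result()->fetch_all(MYSQLI_ASSOC);
    $stmt->close();
    return $rows;
}

// Hypothetical helper: adds up how often $keyword occurs across all rows,
// mirroring the substr_count() accumulation described in the text.
function countKeywordOccurrences(array $rows, string $keyword): int
{
    if ($keyword === '') {
        return 0; // substr_count() rejects an empty search string
    }
    $total = 0;
    foreach ($rows as $row) {
        $total += substr_count($row['pagecontent'], $keyword);
    }
    return $total;
}

// Usage with sample rows in place of a live database:
$sampleRows = [
    ['pagecontent' => 'John met John at the library.'],
    ['pagecontent' => 'Johnny wrote the page.'],
];
echo countKeywordOccurrences($sampleRows, 'John'), "\n"; // prints 3
```

One thing to keep in mind: substr_count() is case-sensitive, whereas LIKE ignores case under MySQL's usual case-insensitive collations, so a row can match the query and still contribute zero to the count.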
-
Data integrity (inserting or sending clean data to and through the server) - users can add
a URL on the 'Submit' page (addurl.php), which is processed by the script file inserturl.php. The script checks
whether the input box is empty. If a value was entered, the script cleans it to make sure that there are no unexpected
characters that could inject malicious code. A custom PHP function, cleandata(),
was written to do this (cleandata.php inside the include folder). The same function
also makes sure that the Contact page (contact.php in the root directory) has clean input data such as name, email and message.
Going back to the Submit page, the script uses a function called file_get_contents("some url"). The value typed into the input field on
the Submit page is assigned to a PHP url variable, and the function reads the external URL's content into a string.
That string is assigned to the pagedata variable, where any single quotes are replaced with spaces. The cleandata() function takes
care of cleaning the data by converting special characters to HTML entities (using the htmlspecialchars() function
inside the custom cleandata() function). Validation is another way to handle data
input, on the Contact page for example, but the focus of the web app is the search engine concept, so it only shows
some HTML5 validation using the required, pattern (regex) and type attributes (such as type 'url', where HTML5 forces the user to enter only an http://url.com style of
string). Ideally, validation should be done not only on the client side
(where HTML5 and JavaScript checks can be disabled) but also on the server side (in this case PHP, or another technology such as ASP.NET).
Also, each folder in the directory has an index.php so that users are not able
to list the files in these folders.
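The cleaning and fetching steps on the Submit page can be sketched as follows. The function names are hypothetical stand-ins, not the app's cleandata.php or inserturl.php: only htmlspecialchars(), file_get_contents() and the single-quote replacement come from the description, while trim() and the URL check are assumptions.

```php
<?php
// Hypothetical stand-in for the app's cleandata() function.
function cleanInput(string $value): string
{
    return htmlspecialchars(trim($value), ENT_QUOTES, 'UTF-8');
}

// Server-side check that complements the HTML5 'url' input type:
// accept absolute http:// or https:// addresses only.
function isHttpUrl(string $url): bool
{
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    return $scheme === 'http' || $scheme === 'https';
}

// Reads a remote page into a cleaned string, or returns null on failure.
// Requires allow_url_fopen = On.
function fetchPageContent(string $url): ?string
{
    if (!isHttpUrl($url)) {
        return null;
    }
    $pageData = @file_get_contents($url);
    if ($pageData === false) {
        return null;
    }
    // Mirror the described steps: replace single quotes with spaces,
    // then convert special characters to HTML entities.
    return cleanInput(str_replace("'", ' ', $pageData));
}

// Usage (performs a network request, so it is left commented out):
// $pagedata = fetchPageContent('http://example.com');
```

Checking the scheme on the server keeps file_get_contents() from being pointed at local files or other stream wrappers, even when the browser-side validation is bypassed.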