Invited Talks
- "Data Management Techniques
for Smartphone Networks"
Location: Athens, Greece (10th
Intl. ACM MobiDE'11, Athens, Greece, in conjunction with
SIGMOD/PODS 2011)
Date: June 12th, 2011.
Abstract: Smartphone devices have emerged as powerful
computational
platforms equipped with a multitude of sensors that are capable
of
generating vast amounts of data (geo-location, audio, video,
etc.).
Collections of such devices connected to the Internet yield
Smartphone
Networks, which can be utilized for opportunistic and
participatory
sensing applications in intelligent transportation systems,
social
networking applications, city planning and others. The uptake
of
applications in this domain is currently severely hampered by
the fact
that these devices have: i) a limited energy budget (i.e.,
smartphone
devices still operate on batteries), ii) limited connectivity
(i.e.,
not all regions offer unlimited Internet connectivity at the
same
cost); and iii) high privacy constraints (i.e., these devices
might
reveal the identity and habits of their custodians). In this
talk, I
will present a collection of data management techniques that
deal with
Smartphone Networks. In particular, I will start out with
SmartTrace, a
powerful framework for finding similar trajectories in a
smartphone
network without disclosing the traces of the participating
users.
SmartTrace relies on an in-situ data storage model, where
geo-location
data is recorded locally on smartphones for both performance
and
data-disclosure reasons. SmartTrace then deploys an efficient
top-K
query-processing algorithm that exploits distributed
trajectory
similarity measures, resilient to spatial and temporal noise,
in order
to derive the most relevant answers quickly and efficiently. I
will
then introduce SmartOpt, a multi-objective query optimizer
that enables
efficient content searches in smartphone networks. I will also
introduce Proximity, a spatial neighborhood computation
framework for
smartphone networks. My talk will be followed by the
presentation of
SmartNet, our in-house programming cloud for smartphone
networks.
- "Energy Efficient Data
Management in Smartphone Networks"
Location: Arlington,
Virginia, USA (US National Science Foundation Workshop on
Sustainable Energy Efficient Data Management)
Date: April 2nd, 2011
Abstract: Smartphones are computational platforms equipped with a
multitude of
sensors and capable of generating vast amounts of data
(geo-location,
audio, video, etc.). On the other hand, these devices operate
on a
strict energy budget and thus have a limited lifetime on a single
charge.
Consequently, we need to identify new energy-aware algorithms
and
techniques to provide innovative, feature-rich applications
and
services. In this white paper, we start out by providing
recent trends
in Smartphone technology and Smartphone networks. Our
description is
followed by an anatomy of the energy costs associated with
data
processing in a Smartphone Network. We conclude with prominent
research
directions in energy-aware data management for Smartphone
networks.
- "Querying Smartphone
Networks with SmartTrace"
Location: Department
of Computer Science, University of Pittsburgh, Pittsburgh,
PA, USA
Date: March 29th, 2011.
Abstract: Smartphone devices have emerged as powerful
computational
platforms equipped with a multitude of sensors that are capable
of
generating vast amounts of data (geo-location, audio, video,
etc.).
Collections of such devices connected to the Internet yield
Smartphone
Networks, which can be utilized for opportunistic and
participatory
sensing applications in intelligent transportation systems,
social
networking applications, city planning and others. The uptake
of
applications in this domain is currently severely hampered by
the fact
that these devices have: i) a limited energy budget (i.e.,
smartphone
devices still operate on batteries), ii) limited connectivity
(i.e.,
not all regions offer unlimited Internet connectivity at the
same
cost); and iii) high privacy constraints (i.e., these devices
might
reveal the identity and habits of their custodians).
In this talk I will present SmartTrace, a powerful framework
for
finding similar trajectories in a smartphone network, without
disclosing the traces of participating users. SmartTrace
quickly
answers queries of the form: “Report the
users that
move more similar to Q, where Q is some query trace.”
SmartTrace relies
on an in-situ data storage model, where geo-location data is
recorded
locally on smartphones for both performance and
data-disclosure
reasons. SmartTrace then deploys an efficient top-K
query-processing
algorithm that exploits distributed trajectory similarity
measures,
resilient to spatial and temporal noise, in order to derive
the most
relevant answers to Q quickly and efficiently. We assess our
propositions with realistic and real workloads from Microsoft
Research
Asia and other sources. Our study reveals that SmartTrace
computes the
desired results with 74% less energy consumption and 13%
faster than
its centralized and decentralized counterparts. My talk will
be
followed by a summary of related research efforts, namely
SmartNet, an
innovative programming cloud for smartphone networks; and
SmartOpt, a
multi-objective query optimizer that enables efficient content
searches
in smartphone networks.
- "Query Routing Trees for
Wireless Sensor Networks"
Location: Nicosia, Cyprus (Open University of Cyprus,
Cyprus)
Date: March 9th, 2011.
Abstract: Wireless Sensor Networks offer a non-intrusive and
non-disruptive technology that enables users to monitor the
physical
world at an extremely high fidelity. In order to collect the
data
generated by these tiny-scale devices, sensors are typically
organized
in structures coined Query Routing Trees (QRTs). Our study
reveals that
predominant data acquisition systems construct QRTs in an ad-hoc
manner,
leading to a significant waste of energy. In this talk I will
present
MicroPulse+, a framework for minimizing the consumption of
energy
during data collection in Sensor Networks. MicroPulse+
eliminates a
variety of data transmission and data reception inefficiencies
using a
collection of in-network algorithms. In particular,
MicroPulse+
introduces: i) the Workload-Aware Routing Tree (WART)
algorithm, which
is based on profiling recent data collection activity
and on
identifying the bottlenecks using an in-network execution of
the
critical path method; and ii) the Energy-driven Tree
Construction (ETC)
algorithm, which balances the workload among nodes and
minimizes data
collisions. The talk will conclude with an outlook into
current and
future research work.
-
Location: Athens, Greece (University of Athens, Greece)
Date: July 23rd, 2010.
Abstract:
In this talk I will present a family of algorithms for Top-K
ranking of
query results in a distributed environment. A Top-K query
focuses on
the subset of most relevant answers for two reasons: i) to
minimize the
cost metric that is associated with the retrieval of all
answers; and
ii) to improve the quality of the answer set such that the
user is not
overwhelmed with irrelevant results. I will start out by
providing an
overview of Top-K query processing algorithms for centralized
and
middleware systems. I will then highlight the limitations of
these
algorithms and focus on three novel algorithms we developed
specifically
for networked environments (i.e., Peer-to-Peer Networks,
Wireless
Sensor Networks and Smartphone Networks). I will also present
evaluation studies of these algorithms on: i) a Wireless
Sensor Network
testbed of 54 sensor devices; ii) a Peer-to-Peer testbed of
1000 peers
deployed on 75 Linux workstations; and iii) a smartphone
network
deployment on Android-based smartphone devices. The talk will
conclude
with an overview of related research problems that I am
currently
working on and an outlook to future work.
-
Panel Chair: Anastasia Ailamaki (EPFL, Switzerland)
Location: Ayia Napa, Cyprus (HDMS'10 Symposium)
Date: July 3rd, 2010.
-
Location: Hawthorne, NY, USA (IBM T.J. Watson Research
Center)
Date: May 27th, 2010.
Abstract:
In this talk I present the fundamental concepts behind
distributed
Top-K query processing algorithms. A Top-K query focuses on
the subset
of most relevant answers for two reasons: i) to minimize the
cost
metric that is associated with the retrieval of all answers;
and ii) to
improve the quality of the answer set such that the user is
not
overwhelmed with irrelevant results. I will start out by
providing an
overview of state-of-the-art Top-K query processing algorithms
for
centralized and middleware systems. I will then highlight the
limitations of these algorithms and focus on two novel
algorithms we
developed specifically for networked environments (e.g.,
Wireless Sensor
Networks, Peer-to-Peer Networks, Vehicular Networks, etc.). I
will also
present evaluation studies conducted on: i) a Peer-to-Peer
testbed of
1000 peers deployed on 75 workstations; ii) a Wireless Sensor
Network
testbed of 54 sensor devices; and iii) a Smartphone Network,
deployed on
a number of Android-based smartphone devices. The talk will
conclude
with an overview of related research problems that I am
currently
working on and an outlook to future applications of the
presented
ideas.
-
Location: Kansas City, MO, USA (SSPC-WAN Workshop, 11th
International Conference on Mobile Data Management (MDM
2010))
Date: May 23rd, 2010.
Abstract:
In this talk I will present a powerful and distributed
spatio-temporal query processing framework, coined HUB-K.
Our framework can be utilized to promptly answer queries of
the form:
"Report the objects (i.e., trajectories) that follow a
similar spatio-temporal
motion to Q, where Q is some query trajectory."
HUB-K relies on an in-situ data storage model, where
spatio-temporal data remains
on the smartphone that generated the given data, as well as
state-of-the-art
top-k query processing algorithms, which exploit distributed
trajectory similarity measures in order to identify the
correct answers promptly.
We present preliminary design choices, an outline of our
preliminary implementation
and an outlook to future challenges.
-
Location: Dagstuhl, Germany (Seminar 10042: Semantic
Challenges in Sensor Networks)
Date: January 26th, 2010.
Abstract:
The widespread deployment of mobile phones along with the
massive
production of sensors for every aspect of modern life provides
evidence
that Computer Science research and education will evolve
dramatically
over the next few years. The boundaries of Mobile Devices and
Sensor
Devices are nowadays blurring as the former devices are
already
equipped with a multitude of sensing capabilities, including
GPS (which
enables the derivation of geospatial coordinates),
accelerometers
(which enable the derivation of orientation, vibration and
shock) and
an exciting set of other sensors (e.g., proximity sensors,
ambient
light sensors, while more traditional sensors such as
temperature,
acoustic, magnetometers and others will be integrated in these
devices
very soon). This creates the notion of Mobile Sensor Devices
that will
become even more ubiquitous than their predecessor
"smart-phone"
devices.
In this talk, I will provide an overview and definitions of
Mobile-Sensor-Network (MSN) related platforms and
applications. In
particular, I will show how applications in environmental
monitoring,
body sensor networks, vehicular sensor networks and
intelligent
transportation systems have brought a dramatic shift in how
spatio-temporal data is nowadays generated. I will then
outline some
semantic challenges that arise in this context including:
vastness,
uncertainty, data integration, query processing and privacy. I
will
also address some more general challenges that currently
hinder the
evolution and uptake of semantic MSNs.
-
Location: Barcelona, Spain (DAMA Group, Polytechnic
University of Catalonia (UPC))
Date: December 15th, 2008.
Abstract: In this talk I will present the fundamental concepts
of
distributed Top-K query processing algorithms. A Top-K query
focuses on
a subset of most relevant answers for two reasons: i) to
minimize the
cost metric that is associated with the retrieval of all
answers; and
ii) to improve the quality of the answer set such that the
user is not
overwhelmed with irrelevant results. I will start out by
providing an
overview of state-of-the-art Top-K query processing algorithms
for
centralized DBMS systems. I will then highlight the
limitations of
these algorithms and focus on the Threshold Join Algorithm
(TJA), our
distributed top-k query processing algorithm designed for
distributed
computing networks (e.g., Wireless Sensor Networks,
Peer-to-Peer
Networks, Vehicular Networks, etc.). Finally, I will present an
evaluation study conducted with our middleware system deployed
over a
network of 1000 peers on 75 workstations.
-
Location: Zurich, Switzerland (IBM Research, Zurich)
Date: December 12th, 2008.
Abstract:
Wireless Sensor Networks offer a non-intrusive and
non-disruptive
technology that enables users to monitor the physical world at
an
extremely high fidelity. Research in this area has to this day
primarily focused on the trade-off between local computation
and
communication in order to minimize the transfer of data over
the
fundamentally expensive wireless link. In contrast, we
focus on the
challenges of storing sensor readings locally at each node.
This
In-Situ storage paradigm offers a novel perspective for
conserving
energy in Wireless Sensor Networks as the communication
channel is only
accessed for answering on-demand queries rather than for
percolating
each and every event to a centralized database. Storing large
quantities of data locally at each sensor has to be
complemented by
efficient access methods that will speed up the execution of
queries
when required.
In this talk I will present MicroHash, an external memory
index
structure that is tailored to the distinct characteristics of
the most
prevalent type of non-volatile memory used in sensor systems,
namely
flash memory. MicroHash exploits the asymmetric read/write
characteristics of flash memory in order to offer high
performance
indexing and searching capabilities in the presence of energy
and
storage media lifetime constraints.
-
Location: Zurich, Switzerland (ETH Zurich, Communication
Systems Group (CSG))
Date: December 12th, 2008.
Abstract: In this talk I will present the fundamental concepts
of
distributed Top-K query processing algorithms. A Top-K query
focuses on
a subset of most relevant answers for two reasons: i) to
minimize the
cost metric that is associated with the retrieval of all
answers; and
ii) to improve the quality of the answer set such that the
user is not
overwhelmed with irrelevant results. I will start out by
providing an
overview of state-of-the-art Top-K query processing algorithms
for
centralized DBMS systems. I will then highlight the
limitations of
these algorithms and focus on the Threshold Join Algorithm
(TJA), our
distributed top-k query processing algorithm designed for
distributed
computing networks (e.g., Wireless Sensor Networks,
Peer-to-Peer
Networks, Vehicular Networks, etc.). Finally, I will present an
evaluation study conducted with our middleware system deployed
over a
network of 1000 peers on 75 workstations.
-
Location: Beijing, China (The 9th International Conference
on Mobile Data Management (MDM'08))
Date: April 27-30, 2008.
Abstract: Wireless Sensor Networks create an innovative
technology that
enables users to monitor and study the physical world at an
extremely
high resolution. Query processing in such ad-hoc environments
is a
challenging task due to the complexities imposed by the
inherent energy
and communication constraints. To this end, the research
community has
proposed to take into account user-defined parameters in order
to
derive the K most relevant (or Top-K) answers quickly and
efficiently.
A Top-K query returns the subset of most relevant answers, in
place of
all answers, for two reasons: i) to minimize the cost metric
that is
associated with the retrieval of all answers; and ii) to
improve the
recall and the precision of the answer set, such that the user
is not
overwhelmed with irrelevant results.
This tutorial presents the fundamental concepts behind
distributed
Top-K query processing and the adaptations of these algorithms
to
distributed and wireless sensor networks. It additionally
provides a
gentle overview of rudimentary and advanced techniques
covering a
significant body of research in this domain. The tutorial will
start
out with an overview of the most influential centralized and
middleware
Top-K query processing algorithms and then proceed with an
elaborate
description of distributed Top-K ranking algorithms for
One-time Top-K
Queries, Continuous Top-K Queries and Approximate Top-K
Queries.
Finally, it will provide an outlook to compelling future
applications
that can be constructed on the foundation of these algorithms.
Although
the tutorial is specifically geared towards Wireless Sensor
Networks,
many of the presented ideas find extensions in other mobile
environments such as Ad-hoc Networks, Vehicular Networks and
the Mobile
Web.
-
Location: Cambridge, UK (Microsoft Research Cambridge,
Systems and Networking Group)
Date: January 11th, 2008.
Abstract: Wireless Sensor Networks offer a non-intrusive and
non-disruptive technology that enables users to monitor the
physical
world at an extremely high fidelity. Research in this area has
to this
day primarily focused on the trade-off between local
computation and
communication in order to minimize the transfer of data over
the
fundamentally expensive wireless link. In contrast, we
focus on the
challenges of storing sensor readings locally at each node.
This
In-Situ storage paradigm offers a novel perspective for
conserving
energy in Wireless Sensor Networks as the communication
channel is only
accessed for answering on-demand queries rather than for
percolating
each and every event to a centralized database. Storing large
quantities of data locally at each sensor has to be
complemented by
efficient access methods that will speed up the execution of
queries
when required. In this talk I will present MicroHash, an
external
memory index structure that is tailored to the distinct
characteristics
of the most prevalent type of non-volatile memory used in
sensor
systems, namely flash memory. MicroHash exploits the
asymmetric
read/write characteristics of flash memory in order to offer
high
performance indexing and searching capabilities in the
presence of
energy and storage media lifetime constraints.
-
Location:
Stockholm, Sweden (Department of Electronic, Computer and
Software
Systems (ECS), KTH - Royal Institute of Technology).
Date: December 28th, 2006.
Abstract: The emerging Peer-to-Peer (P2P) model has become a
very powerful and attractive paradigm for developing
Internet-scale services for sharing resources, including files
and documents. The distributed nature of these systems, where
nodes are typically located across different networks and
domains, inherently hinders the efficient retrieval of
information. In this talk I will present techniques to perform
content-based search over data repositories that are
geographically scattered over peers of different networks.
Data repositories in this context contain documents of text,
audio, video or other semi-structured data and the task is to
locate a certain set of keywords or multimedia features. We
present the components of the pFusion architecture, an open
source system that builds on work in unstructured P2P systems
and topologically-aware overlay construction techniques. Our
empirical results using datasets from AKAMAI, NLANR and TREC,
show that the architecture we propose is both efficient and
practical. In this talk I will also overview other related
research activities in Grid, P2P and Sensor systems that we
are currently involved in.
-
Location: Crete, Greece (Institute of Computer Science
(ICS) of the Foundation for Research and Technology - Hellas
(FORTH)).
Date: June 8th, 2006.
Abstract: Emerging applications in Sensor and Peer-to-Peer
networks make the concept of data integration without
centralization nowadays more meaningful than ever. In these
environments, data is generated continuously and potentially
automatically across geographically diverse locations.
Organizing data in centralized repositories is becoming
prohibitively expensive and on many occasions impractical.
Storing data in-situ however, complicates query processing
because data relations are fragmented over a number of remote
sites. Furthermore, accessing these fragmented relations is
only feasible by traversing a network of other nodes. This
makes the execution of a query an even more complex task. We
claim that on many occasions it might be more beneficial to find
the K highest ranked (or Top-K) answers, for some user-defined
parameter K, if this can minimize the query execution cost.
In this talk, I will present techniques to efficiently answer
Top-K queries in a distributed environment. A Top-K query
returns the K highest ranked answers to a user-defined
similarity function. At the same time it also minimizes some
cost metric, such as the utilization of the communication
medium, which is associated with the retrieval of the desired
answer set. I will provide an overview of state-of-the-art
algorithms that solve the Top-K problem in a centralized
setting and show why these are not applicable to the
distributed case. I will then focus on the Threshold Join
Algorithm (TJA), which is a novel solution for executing Top-K
queries in a distributed environment. I will also present
results from our performance study with a real middleware
testbed deployed over a network of 75 workstations.
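To make the Top-K query semantics above concrete, here is a minimal sketch of the naive centralized baseline, in which every node ships all of its (object, score) pairs to one site. It is not the Threshold Join Algorithm; the function name, the sample data and the use of a sum as the scoring function are assumptions made purely for illustration.

```python
import heapq
from collections import defaultdict

def naive_top_k(node_scores, k):
    """Hypothetical baseline: collect every (object, score) pair from every
    node, sum the partial scores per object, and keep the k best objects."""
    totals = defaultdict(int)
    transferred = 0                      # number of pairs sent over the network
    for scores in node_scores:           # one dict per (hypothetical) node
        for obj, score in scores.items():
            totals[obj] += score
            transferred += 1
    # Rank by total score; ties are broken by object id for determinism.
    best = heapq.nlargest(k, totals.items(), key=lambda kv: (kv[1], kv[0]))
    return best, transferred

# Illustrative data: three nodes, each holding a partial score per object.
nodes = [
    {"o1": 9, "o2": 4, "o3": 1},
    {"o1": 2, "o2": 8, "o3": 3},
    {"o1": 5, "o2": 6, "o3": 2},
]
top2, cost = naive_top_k(nodes, 2)
print(top2, cost)
```

The baseline always transfers every pair, regardless of K. That transfer volume is the cost metric a distributed Top-K algorithm aims to reduce.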
Talks in English
- "Disclosure-free
GPS Trace Search in Smartphone Networks"
Location: Lulea, Sweden (IEEE MDM'11)
Date: June 7th, 2011.
Abstract: In
this paper we present a powerful distributed framework for
finding
similar trajectories in a smartphone network, without
disclosing the
traces of participating users. Our framework, coined
SmartTrace,
exploits opportunistic and participatory sensing in order to
quickly
answer queries of the form: "Report the users that move more
similar to
Q, where Q is some query trace". SmartTrace relies on an
in-situ data
storage model, where geo-location data is recorded
locally on
smartphones for both performance and data-disclosure reasons.
SmartTrace then deploys an efficient top-K query processing
algorithm
that exploits distributed trajectory similarity measures,
resilient to
spatial and temporal noise, in order to derive the most
relevant
answers to Q quickly and efficiently. We assess our ideas with
realistic and real workloads from Microsoft Research Asia and
other
sources. Our study reveals that SmartTrace computes the
desired results
with 74% less energy consumption and 13% faster than its
centralized
and decentralized counterparts. Our experimental results also
confirm
our analytical study.
- "Multi-Objective
Query Optimization in Smartphone Social Networks"
Location: Lulea, Sweden (IEEE MDM'11)
Date: June 7th, 2011.
Abstract: The
bulk of social network applications for smartphones (e.g.,
Twitter,
Facebook, Foursquare, etc.) currently rely on centralized or
cloud-like
architectures in order to carry out their data sharing and
searching
tasks. Unfortunately, the given model introduces both
data-disclosure
concerns (e.g., disclosing all captured media to a central
entity) and
performance concerns (e.g., consuming precious smartphone
battery and
bandwidth during content uploads). In this paper, we present a
novel
framework, coined SmartOpt, for searching objects (e.g.,
images,
videos, etc.) captured by the users in a mobile social
community. Our
framework is founded on an in-situ data storage model, where
captured
objects remain local on their owners' smartphones and searches
then
take place over a novel lookup structure we compute
dynamically, coined
the Multi-Objective Query Routing Tree (MO-QRT). Our structure
concurrently optimizes several conflicting objectives (i.e.,
it
minimizes energy consumption, minimizes search delay and
maximizes
query recall), using a Multi-objective Evolutionary Algorithm
based on
Decomposition (MOEA/D) that calculates a diverse set of high
quality
non-dominated solutions in a single run. We assess our ideas
with
mobility patterns derived from Microsoft's Geolife project and
social
patterns derived from DBLP. Our study reveals that SmartOpt can
yield
query recall rates of 95%, with one order of magnitude less
time and
two orders of magnitude less energy than its competitors.
- "Query
Routing Trees for Wireless Sensor Networks"
Location: Nicosia, Cyprus (EPL 671 - Computer Science:
Research and Technology)
Date: February 15th, 2011.
Abstract: Wireless Sensor Networks offer a
non-intrusive and
non-disruptive technology that enables users to monitor the
physical
world at an extremely high fidelity. In order to collect the
data
generated by these tiny-scale devices, sensors are typically
organized
in structures coined Query Routing Trees (QRTs). Our study
reveals that
predominant data acquisition systems construct QRTs in an ad-hoc
manner,
leading to a significant waste of energy. In this talk I will
present
MicroPulse+, a framework for minimizing the consumption of
energy
during data collection in Sensor Networks. MicroPulse+
eliminates a
variety of data transmission and data reception inefficiencies
using a
collection of in-network algorithms. In particular,
MicroPulse+
introduces: i) the Workload-Aware Routing Tree (WART)
algorithm, which
is based on profiling recent data collection activity
and on
identifying the bottlenecks using an in-network execution of
the
critical path method; and ii) the Energy-driven Tree
Construction (ETC)
algorithm, which balances the workload among nodes and
minimizes data
collisions. The talk will conclude with an outlook into
current and
future research work.
-
Location:
Indianapolis, Indiana, USA (The 9th International ACM
Workshop on Data
Engineering for Wireless and Mobile Access (MobiDE'10), in
conjunction
with ACM SIGMOD/PODS'10)
Date: June 6th, 2010.
Abstract: We present a
novel distributed
algorithm (MHS) that constructs a query routing tree that
minimizes
collisions during query execution. It was shown in previous
work that
minimizing collisions during query execution saves a significant
amount
of energy [1]. In the same paper it is shown that balancing the
node
degrees of a query routing tree significantly reduces
collisions during
query execution.
We address the inefficiencies of the previously proposed
algorithm and
propose a simpler, purely distributed, parameter-free, cheaper
and more
efficient algorithm. Our resulting query trees are optimally
balanced,
guarantee minimum collisions and minimum latency for query
execution
and allow for opportunistic in-network processing. MHS imposes
the
minimum possible communication overhead on the network and is
parameter-free as opposed to previously proposed algorithms.
Our
proposed algorithm can be used for acquiring data from the
nodes of any
distributed system where the main objective is to minimize
the
communication cost.
-
Location:
Kansas City, USA (The 11th International Conference on
Mobile Data
Management (MDM 2010), held in Kansas City, Missouri,
from May 23rd, 2010 to May 26th, 2010).
Date: May 25th, 2010.
-
Location:
Lyon, France (The 6th Intl. Workshop on Data Management for
Sensor
Networks (DMSN'09), in conjunction with VLDB'09, Lyon,
France, 2009)
Date: August 24th, 2009.
Abstract: In long-term deployments of
Wireless Sensor Networks, it is often more efficient to store
sensor
readings locally at each device and transmit those readings to
the user
only when requested (i.e., in response to a user query). Many
of the
techniques that collect information from a sensor network
require that
the data is sorted on some attribute (e.g., range queries,
top-k
queries, join queries, etc.). Yet, the underlying storage
medium of
these devices (i.e., Flash media) presents some unique
characteristics
which render traditional disk-based sorting algorithms
inefficient in
this context. In this paper we devise the FSort algorithm, an
efficient
external sorting algorithm for flash-based sensor devices with
a small
memory footprint. FSort minimizes the expensive write/delete
operations
of flash memory, thereby minimizing the consumption of energy. In
energy. In
particular, FSort uses a top-down replacement selection
algorithm in
order to produce sorted runs on flash media in a log-based
manner.
Sorted runs are then recursively merged in order to yield the
sorted
result. Our experimentation with real traces from Intel
Research
Berkeley shows that FSort greatly outperforms the traditional
External
Mergesort Algorithm with regard to both time and energy
consumption. We
found similar advantages with regard to the wearability
constraints of
flash media.
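The two phases named in the abstract above, producing sorted runs with replacement selection and then merging them, can be sketched as follows. This is the classic heap-based textbook form kept entirely in memory, not FSort's top-down, log-based variant for flash. The function name, the memory budget and the sample readings are assumptions for illustration only.

```python
import heapq

_END = object()  # sentinel marking the end of the input stream

def replacement_selection(records, memory_size):
    """Yield sorted runs while holding at most memory_size (>= 1) records."""
    it = iter(records)
    heap = []
    for rec in it:                       # fill the in-memory buffer
        heap.append(rec)
        if len(heap) >= memory_size:
            break
    heapq.heapify(heap)
    run, frozen = [], []                 # frozen: records deferred to the next run
    while heap:
        smallest = heapq.heappop(heap)
        run.append(smallest)             # in FSort's setting this would be a flash write
        nxt = next(it, _END)
        if nxt is not _END:
            if nxt >= smallest:
                heapq.heappush(heap, nxt)    # can still join the current run
            else:
                frozen.append(nxt)           # too small, must wait for the next run
        if not heap:                     # current run is complete
            yield run
            run, heap, frozen = [], frozen, []
            heapq.heapify(heap)

# Illustrative input: nine readings sorted with room for only three in memory.
readings = [5, 1, 9, 3, 7, 2, 8, 4, 6]
runs = list(replacement_selection(readings, 3))
merged = list(heapq.merge(*runs))        # merge phase over the sorted runs
print(runs, merged)
```

In this example the first run holds six records, twice the memory budget. Longer runs mean fewer runs to merge afterwards, which is the usual reason for choosing replacement selection.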
-
Location: Taipei, Taiwan (The 10th International
Conference on Mobile Data Management (MDM'09))
Date: May 20th, 2009.
Abstract: This paper assumes a set of n mobile sensors that
move in the
Euclidean plane as a swarm. Our objectives are to explore a
given
geographic region by detecting spatio-temporal events of
interest and
to store these events in the network until the user requests
them. Such
a setting finds applications in mobile environments where the
user
(i.e., the sink) is infrequently within communication range
from the
field deployment. Our framework, coined SenseSwarm,
dynamically
partitions the sensing devices into perimeter and core nodes.
Data
acquisition is scheduled at the perimeter, in order to
minimize energy
consumption, while storage and replication take place at the
core
nodes which are physically and logically shielded from threats
and
obstacles. To efficiently identify the nodes lying on the
perimeter of
the swarm we devise the Perimeter Algorithm (PA), an efficient
distributed algorithm with a low communication complexity. For
storage
and fault-tolerance we devise the Data Replication Algorithm
(DRA), a
voting-based replication scheme that enables the exact
retrieval of
events from the network in cases of failures. Our trace-driven
experimentation shows that our framework can offer significant
energy
reductions while maintaining high data availability rates. In
particular, we found that when the failure rate is below 60%,
we can recover over 80% of generated events exactly.
-
Location: Taipei, Taiwan (SenTIE'09 workshop - collocated
with MDM'09)
Date: May 20th, 2009.
Abstract: Continuous queries in Wireless Sensor Networks
(WSNs) are
founded on the premise of Query Routing Tree structures
(denoted as T),
which provide sensors with a path to the querying node.
Predominant
data acquisition systems for WSNs construct such structures in
an
ad-hoc manner and therefore there is no guarantee that a given
query
workload will be distributed equally among all sensors. That
leads to
data collisions which represent a major source of energy
waste. In this
paper we present the Energy-driven Tree Construction (ETC)
algorithm,
which balances the workload among nodes and minimizes data
collisions,
thus reducing energy consumption, during data acquisition in
WSNs. We
show through real micro-benchmarks on the CC2420 radio chip
and
trace-driven experimentation with real datasets from Intel
Research and
UC-Berkeley that ETC can provide significant energy reductions
under a
variety of conditions, prolonging the lifetime of a wireless
sensor
network.
-
Location: HPCL, Department of Computer Science, University
of Cyprus
Date: February 14th, 2008.
Abstract: Wireless Sensor Networks offer a non-intrusive and
non-disruptive technology that enables users to monitor the
physical
world at an extremely high fidelity. Research in this area has
to this
day primarily focused on the trade-off between local
computation and
communication in order to minimize the transfer of data over
the
fundamentally expensive wireless link. In contrast, we
focus on the
challenges of storing sensor readings locally at each node.
This
In-Situ storage paradigm offers a novel perspective for
conserving
energy in Wireless Sensor Networks as the communication
channel is only
accessed for answering on-demand queries rather than for
percolating
each and every event to a centralized database. Storing large
quantities of data locally at each sensor has to be
complemented by
efficient access methods that will speed up the execution of
queries
when required. In this talk I will provide an overview of
recent
developments in Wireless Sensor Network Technology and
highlight some
important data indexing and searching challenges that arise in
this
context. In particular, I will present MicroHash, which is an
external
memory index structure that is tailored to the distinct
characteristics
of flash memory, the most prevalent type of non-volatile
memory used in
sensor systems.
-
Location: Hilton, Nicosia, Cyprus
Date: January 21st 2008.
Abstract: ICGrid (Intensive Care Grid) is a distributed
platform that
enables the seamless integration, correlation and retrieval of
clinically interesting episodes across Intensive Care Units,
which is
currently under development by our group. Such a task requires
huge
processing and data storage capabilities, which are common
attributes
of Grid infrastructures. ICGrid is based on a hybrid
architecture that
combines i) a heterogeneous set of monitors that sense the
inpatients
and ii) Grid technology that enables the storage, processing
and
information sharing tasks between Intensive Care Units.
-
Location: Paris, France (CoreGRID Network of Excellence)
Date: January 15th 2008.
Abstract: The objective of Grid computing is to make
processing power
as accessible and easy to use as electricity and water. The
last decade
has seen an unprecedented growth in Grid infrastructures which
nowadays
enables large-scale deployment of applications in the
scientific
computation domain. One of the main challenges in realizing
the full
potential of Grids is to make these systems dependable.
In this
presentation we introduce FailRank, a novel framework for
integrating and ranking information sources that characterize
failures
in a grid system. After the failing sites have been ranked,
these can
be eliminated from the job scheduling resource pool yielding
in that
way a more predictable and dependable infrastructure. We also
present
the tools we developed towards evaluating the FailRank
framework. In
particular, we present the FailBase Repository, which is
a 38GB
corpus of state information that characterizes the EGEE Grid
for one
month in 2007. Such a corpus paves the way for the community
to
systematically uncover new, previously unknown patterns and
rules
between the multitude of parameters that can contribute to
failures in
a Grid environment.
-
Location: Vienna, Austria (4th Intl. Workshop on Data
Management for Sensor Networks DMSN'07 (with VLDB'07))
Date: Sep. 24th, 2007.
Abstract: This paper assumes a set of n mobile sensors that
move in the Euclidean plane as a swarm. Our objectives are to
explore a given geographic region by detecting and aggregating
spatio-temporal events of interest and to store these events
in the network until the user requests them. Such a setting
finds applications in environments where the user (i.e., the
sink) is infrequently within communication range of the
field deployment. Our framework, coined SenseSwarm,
dynamically partitions the sensing devices into perimeter and
core nodes. Data acquisition is scheduled at the perimeter in
order to minimize energy consumption while storage and
replication take place at the core nodes, which are physically
and logically shielded from threats and obstacles. To
efficiently identify the perimeter of the swarm we devise the
Perimeter Algorithm (PA), an efficient distributed algorithm
with a message complexity of O(p + n), where p denotes the
number of nodes on the perimeter and n the overall number of
nodes. For storage and replication we devise a spatio-temporal
in-network aggregation scheme based on minimum bounding
rectangles and minimum bounding cuboids. Our trace-driven
experimentation shows that our framework can offer significant
energy reductions while maintaining high data availability
rates.
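As a rough illustration of aggregation with minimum bounding rectangles, the sketch below summarizes a set of 2-D event locations by their bounding rectangle and merges two such summaries, as a node might when combining reports. The names `mbr_of` and `merge_mbr` and the tuple layout are assumptions made for this example; they are not taken from SenseSwarm.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]
# Rectangle as (min_x, min_y, max_x, max_y); layout chosen for this sketch only.
Rect = Tuple[float, float, float, float]


def mbr_of(points: Iterable[Point]) -> Rect:
    """Return the smallest axis-aligned rectangle covering all points."""
    pts = list(points)
    if not pts:
        raise ValueError("cannot bound an empty set of points")
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    return (min(xs), min(ys), max(xs), max(ys))


def merge_mbr(a: Rect, b: Rect) -> Rect:
    """Return the smallest rectangle covering both input rectangles."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))
```

Under the same assumptions, adding a time interval as a third dimension would give a bounding cuboid.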
-
Location: Nicosia, Cyprus (Cyprus Summer School on
Intelligent Systems)
Date: July 4th 2007.
Abstract: In this talk I will introduce the distributed
spatio-temporal similarity search problem: given a query
trajectory Q, we want to find the trajectories that follow a
motion similar to Q, when each of the target trajectories is
segmented across a number of distributed nodes. We propose two
novel algorithms, UB-K and UBLB-K, which combine local
computations of lower and upper bounds on the matching between
the distributed subsequences and Q. Such an operation
generates the desired result without pulling together all the
distributed subsequences over the fundamentally expensive
communication medium. Our solutions find applications in a
wide array of domains, such as cellular networks, wildlife
monitoring and video surveillance. Our experimental evaluation
using realistic data demonstrates that our framework is both
efficient and robust to a variety of conditions. In this talk,
I will also present techniques to efficiently answer Top-K
queries in a distributed environment. A Top-K query returns
the K highest ranked answers to a user defined similarity
function. At the same time it also minimizes some cost metric,
such as the utilization of the communication medium, which is
associated with the retrieval of the desired answer set. I
will provide an overview of state-of-the-art algorithms that
solve the Top-K problem in a centralized setting and show why
these are not applicable to the distributed case. I will then
focus on the Threshold Join Algorithm (TJA), which is a novel
solution for executing Top-K queries in a distributed
environment. I will also present results from our performance
study with a real middleware testbed deployed over a network
of 75 workstations.
-
Location:
Crete, Greece (CoreGRID Workshop on Grid Programming Model,
Grid and P2P
Systems Architecture, Grid Systems, Tools and Environments)
Date: June 12th 2007.
Abstract: The objective of Grid
computing is to make processing power as accessible and easy
to use as
electricity and water. The last decade has seen an
unprecedented growth
in Grid infrastructures which nowadays enables large-scale
deployment
of applications in the scientific computation domain. One of
the main
challenges in realizing the full potential of Grids is to make
these
systems dependable. In this paper we present
FailRank, a
novel framework for integrating and ranking information
sources that
characterize failures in a grid system. After the failing
sites have
been ranked, these can be eliminated from the job scheduling
resource
pool yielding in that way a more predictable and dependable
infrastructure. We also present the tools we developed towards
evaluating the FailRank framework. In particular, we present
the
FailBase Repository, which is a 38GB corpus of state
information that
characterizes the EGEE Grid for one month in 2007. Such a
corpus paves
the way for the community to systematically uncover new,
previously
unknown patterns and rules between the multitude of parameters
that can
contribute to failures in a Grid environment.
-
Location:
Mannheim, Germany (IEEE First International Workshop on Data
Intensive
Sensor Networks 2007, in conjunction with MDM 2007)
Date: May 11th 2007.
Abstract: In this paper we present MicroPulse, a novel
framework for adapting the waking window of a sensing device S
based on the data workload incurred by a query Q. Assuming a
typical tree-based aggregation scenario, the waking window is
defined as the time interval t during which S enables its
transceiver in order to collect the results from its children.
Minimizing the length of t enables S to conserve energy that
can be used to prolong the longevity of the network and hence
the quality of results. Our method is established on profiling
recent data acquisition activity and on identifying the
bottlenecks using an in-network execution of the Critical Path
Method. We show through trace-driven experimentation with a
real dataset that MicroPulse can reduce the energy cost of the
waking window by three orders of magnitude.
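The Critical Path Method mentioned above can be sketched on a toy aggregation tree. This is only an illustration under stated assumptions: the tree is a dictionary of child lists, each node has a single hypothetical per-epoch cost, and the critical path cost of a node is its own cost plus the largest cost among its subtrees. The function name and data layout are invented for this example and are not MicroPulse's actual interfaces.

```python
from typing import Dict, List


def critical_path_costs(children: Dict[str, List[str]],
                        cost: Dict[str, float],
                        root: str) -> Dict[str, float]:
    """Longest accumulated cost from each node down to any leaf below it."""
    result: Dict[str, float] = {}

    def visit(node: str) -> float:
        # A leaf has no children, so nothing is added to its own cost.
        below = max((visit(c) for c in children.get(node, [])), default=0.0)
        result[node] = cost[node] + below
        return result[node]

    visit(root)
    return result
```

In this toy model, the child with the largest value is the bottleneck that determines how long its parent has to keep listening.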
-
Location: Mannheim, Germany (The 8th International
Conference on Mobile Data Management (MDM'07))
Date: May 10th 2007.
Abstract: In this paper we introduce MINT (Materialized
In-Network Top-k) Views, a novel framework for optimizing the
execution of continuous monitoring queries in sensor networks.
A typical materialized view V maintains the complete results
of a query Q in order to minimize the cost of future query
executions. In a sensor network context, maintaining
consistency between V and the underlying and distributed base
relation R is very expensive in terms of communication. Thus,
our approach focuses on a subset V' (⊆ V) that unveils
only the k highest-ranked answers at the sink for some user
defined parameter k. We additionally provide an elaborate
description of energy-conscious algorithms for constructing,
pruning and maintaining such recursively-defined in-network
views. Our trace-driven experimentation with real datasets
shows that MINT offers significant energy reductions compared
to other predominant data acquisition models.
-
Location: Nicosia, Cyprus (EPL 671 - Computer Science:
Research and Technology)
Date: March 20th, 2007.
Abstract: In this talk, I will present techniques to
efficiently answer Top-K queries in a distributed environment.
A Top-K query returns the K highest ranked answers to a user
defined similarity function. At the same time it also
minimizes some cost metric, such as the utilization of the
communication medium, which is associated with the retrieval
of the desired answer set. I will provide an overview of
state-of-the-art algorithms that solve the Top-K problem in a
centralized setting and show why these are not applicable to
the distributed case. I will then focus on the Threshold Join
Algorithm (TJA), which is a novel solution for executing Top-K
queries in a distributed environment. I will also present
results from our performance study with a real middleware
testbed deployed over a network of 75 workstations.
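To make the cost trade-off concrete, here is a minimal sketch of the naive baseline, not of TJA itself: every node ships all of its local scores to one site, which sums them per object and keeps the K best. The sum as scoring function, the name `naive_top_k` and the dictionary layout are assumptions made for this example.

```python
import heapq
from collections import defaultdict
from typing import Dict, List, Tuple


def naive_top_k(local_scores: List[Dict[str, float]],
                k: int) -> Tuple[List[Tuple[str, float]], int]:
    """Ship every local (object, score) pair to one site and rank there.

    Returns the k best (object, total score) pairs and the number of
    pairs that had to be transmitted.
    """
    totals: Dict[str, float] = defaultdict(float)
    transmitted = 0
    for scores in local_scores:
        for obj, score in scores.items():
            totals[obj] += score
            transmitted += 1
    # Highest total first; ties are broken by object id for determinism.
    best = heapq.nsmallest(k, totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return best, transmitted
```

The transmitted count grows with the total number of local scores regardless of K, which is the kind of communication cost a distributed Top-K algorithm tries to avoid.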
-
Location: Nicosia, Cyprus (EPL651 - Data Management for
Mobile Computing, Department of Computer Science (UCY)).
Date: April 26th 2007.
Abstract: Wireless Sensor Networks offer a non-intrusive and
non-disruptive technology that enables users to monitor and
understand
the physical world at an extremely high fidelity. Research to
this day
has primarily focused on the trade-off between local
computation and
communication, in order to offset the expensive transfer of
data over
the fundamentally unreliable wireless link. In contrast,
we focus
on the challenges of storing sensor readings locally at each
node. This
In-Situ storage paradigm offers a novel perspective for
conserving
energy, as we access the communication channel to answer
on-demand
queries rather than for percolating each and every event to a
centralized database. Storing large quantities of data locally
at each
node has to be complemented by efficient index structures that
will
enable access to data when required.
In this talk we present MicroHash, an external memory index
structure which is tailored to the distinct characteristics of
the most
prevalent type of non-volatile memory used in sensor systems,
namely
flash memory. Our index structure exploits the asymmetric
read/write
and wear characteristics of flash memory in order to offer
high
performance indexing and searching capabilities in the
presence of a
low energy budget.
-
Location: Sophia-Antipolis, France (CoreGRID Industrial
Conference).
Date: December 1st, 2006. (Best Demo Award)
Abstract:
Intensive Care Units (ICUs) at hospitals utilize cutting edge
technology in order to acquire, at an extremely high fidelity,
the physiological state of inpatients who are in a critical
(life-threatening) condition. In particular, ICUs utilize a very
large number of monitoring and sensing devices that are
continuously attached to inpatients in order to uncover their
physiological state. Such measurements can then be utilized for i)
education, ii) early diagnosis and iii) defining early warning
systems that identify when a human life is in jeopardy. A problem
with the
current
setting is that individual ICUs are limited to the locally
acquired
measurements. As a result, the number of clinically
"interesting"
episodes available to doctors is also very limited.
ICGrid (Intensive Care Grid) is a distributed platform that
enables the
seamless integration, correlation and retrieval of clinically
interesting episodes across Intensive Care Units, which is
currently
under development by our group. Such a task requires huge
processing
and data storage capabilities, which are common attributes of
Grid
infrastructures. ICGrid is based on a hybrid architecture that
combines
i) a heterogeneous set of monitors that sense the inpatients
and ii)
Grid technology that enables the storage, processing and
information
sharing tasks between Intensive Care Units. Our demonstration
aims at
presenting the first part of the hybrid architecture of ICGrid
(i.e.
the acquisition of real signals from inpatients and their
storage on
the Grid). Our demonstration platform operates on a standalone
laptop.
In a real setting, this software is able to extract the
physiological
parameters from monitoring devices installed at ICUs.
-
"Business Processes: Behavior Prediction and Capturing
Reasons for Evolution"
Location: Paphos, Cyprus (8th International Conference on
Enterprise Information Systems)
Date: May 24th, 2006
-
Location: Nicosia, Cyprus (Computer Science Colloquium
Series, University of Cyprus)
Date: 31 March 2006.
Abstract: Wireless Sensor Networks offer a non-intrusive and
non-disruptive technology that enables users to monitor and
understand
the physical world at an extremely high fidelity. Research to
this day
has primarily focused on the trade-off between local
computation and
communication, in order to offset the expensive transfer of
data over
the fundamentally unreliable wireless link. In contrast,
we focus
on the challenges of storing sensor readings locally at each
node. This
In-Situ storage paradigm offers a novel perspective for
conserving
energy, as we access the communication channel to answer
on-demand
queries rather than for percolating each and every event to a
centralized database. Storing large quantities of data locally
at each
node has to be complemented by efficient index structures that
will
enable access to data when required.
In this talk we present MicroHash, an external memory index
structure which is tailored to the distinct characteristics of
the most
prevalent type of non-volatile memory used in sensor systems,
namely
flash memory. Our index structure exploits the asymmetric
read/write
and wear characteristics of flash memory in order to offer
high
performance indexing and searching capabilities in the
presence of a
low energy budget.
-
Location: Nicosia, Cyprus (eNEXT Workshop on Sensor and
Ad-hoc Networks)
Date: March 14th, 2006.
-
"Global Internet Content Delivery"
Location:
Nicosia, Cyprus ("EPL602 - Programming of Internet Systems
and
Services", Department of Computer Science, University of
Cyprus)
Date: November 22nd, 2005.
-
Location: Nicosia, Cyprus (Computer Science Colloquium
Series, University of Cyprus)
Date: November 16th, 2005.
Abstract:
Modern Sensor and Peer-to-Peer data management systems have to
cope
with data that is generated automatically and continuously
across
distributed and potentially geographically diverse locations.
Organizing data in centralized repositories is becoming
increasingly
expensive and on many occasions impractical. Additionally,
users are
usually only interested in finding the highest ranked answers
to their
queries rather than the complete range of answers.
In this talk, I will present efficient techniques to answer
Top-K
queries in a distributed environment. A Top-K query returns
the K
highest ranked answers to a user defined similarity function.
At the
same time it also minimizes some cost metric which is
associated with
the retrieval of the desired answer set. My talk focuses on
the
Threshold Join Algorithm (TJA), which is a novel distributed
Top-K
query processing algorithm that combines local similarity
scores
available at each computing site. I will also present the LB-K
and
UBLB-K algorithms which utilize lower and upper bounds, when
exact
scores are not available. An extensive experimental evaluation
with our
distributed middleware testbed reveals that the proposed
methods are
orders of magnitude more efficient than their competitors.
-
Location: Toronto, Canada (DBISP2P'04 (VLDB
Conference))
Date: September 2004
Abstract: We initiate a study on the effect of the network
topology on
the performance of Peer-to-Peer (P2P) information retrieval
systems.
The emerging P2P model has become a very powerful and
attractive
paradigm for developing Internet-scale systems for sharing
resources,
including files or documents. We show that the performance of
Information Retrieval algorithms can be significantly improved
through
the use of fully distributed topologically aware overlay
network
construction techniques. Our empirical results, using the
Peerware
middleware infrastructure, show that the approach we propose
is both
efficient and practical.
-
Location: McLean VA, USA (The ACM Conference on
Information and Knowledge Management)
Date: November 2002.
Abstract: One important problem in peer-to-peer (P2P) networks
is
searching and retrieving the correct information. However,
existing
searching mechanisms in pure peer-to-peer networks are
inefficient due
to the decentralized nature of such networks. We propose two
mechanisms
for information retrieval in pure peer-to-peer networks. The
first, the
modified Breadth-First-Search (BFS) mechanism, is an extension
of the
current Gnutella protocol, allows searching with keywords, and
is
designed to minimize the number of messages that are needed to
search
the network. The second, the Intelligent Search mechanism,
uses the
past behavior of the P2P network to further improve the
scalability of
the search procedure. In this algorithm, each peer
autonomously decides
which of its peers are most likely to answer a given query.
The
algorithm is entirely distributed, and therefore scales well
with the
size of the network. We implemented our mechanisms as
middleware
platforms. To show the advantages of our mechanisms we present
experimental results using the middleware implementation.
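The idea of a peer choosing which neighbors to forward a query to, based on past behavior, can be sketched as follows. This is an illustrative stand-in only: the abstract does not specify the ranking measure, so keyword-set overlap (Jaccard similarity) is used here as an assumption, and `rank_neighbors`, `history` and `fanout` are hypothetical names.

```python
from typing import Dict, List, Set


def rank_neighbors(query: Set[str],
                   history: Dict[str, List[Set[str]]],
                   fanout: int) -> List[str]:
    """Pick the `fanout` neighbors whose past answers best match the query."""

    def overlap(a: Set[str], b: Set[str]) -> float:
        union = a | b
        return len(a & b) / len(union) if union else 0.0

    def score(peer: str) -> float:
        # Sum the overlap with every past query this neighbor answered.
        return sum(overlap(query, past) for past in history[peer])

    # Highest score first; ties are broken by peer id for determinism.
    ranked = sorted(history, key=lambda p: (-score(p), p))
    return ranked[:fanout]
```

Each peer needs only its own local history to make this decision, which is consistent with the fully distributed design described in the abstract.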
-
"A Quantitative Analysis of the Gnutella Network
Traffic"
Location: Nicosia, Cyprus (Computer Science Colloquium
Series, University of Cyprus)
Date: July 2002.
Talks in Greek
-
Nicosia, Cyprus ("EPL202 - Explorations in Computer
Science",
Undergraduate Course, Department of Computer Science,
University of
Cyprus)
Date: Sept 15, 2010
-
Nicosia, Cyprus (2nd Informatics Day for Lyceum
and Technical School Students)
Date: March 19, 2011.
-
Nicosia, Cyprus ("EPL202 - Explorations in Computer
Science",
Undergraduate Course, Department of Computer Science,
University of
Cyprus)
Date: Sept 23, 2010
-
Location: Ayia Napa, Cyprus (The 9th Hellenic Data
Management Symposium (HDMS'10))
Date: Jul 2, 2010.
Abstract:
In this paper we present a distributed algorithm for
constructing a balanced communication tree intended for
collecting data from a wireless sensor network. The
algorithm has minimal execution cost and the resulting
communication tree is nearly optimally balanced. During data
collection, every collision between packets causes them to be
retransmitted. Distributing node degrees evenly across the
communication tree minimizes these collisions, and
consequently saves energy and extends the lifetime of the
wireless sensor network. We compare our algorithm against an
existing algorithm and a centralized algorithm. The results
show that our algorithm outperforms the competition for the
majority of network topologies and achieves a nearly optimal
balance in the tree. It also has the lowest possible execution
cost, thereby contributing even further to energy savings in
the network.
-
Nicosia, Cyprus (Informatics Day for Lyceum and Technical
School Students, Saturday, 6/3/2010, University Campus)
Date: March 6th, 2010.
-
Nicosia, Cyprus ("EPL202 - Explorations in Computer
Science",
Undergraduate Course, Department of Computer Science,
University of
Cyprus)
Date: Sept 30th, 2009.
-
Location: Heraklion, Crete (The 7th Hellenic Data
Management Symposium (HDMS'08))
Date: July 29th, 2008.
Abstract: Continuous queries in wireless sensor networks are
established on the premise of a routing tree that provides
each sensor
with a path over which answers can be transmitted to the query
processor. The number of tuples received by a sensor S in a given epoch
e
denotes the workload of S. Since the exact workload of a node
is not
known ahead of time, a node has to enable its transceiver for
a
sub-optimal amount of time in order to collect the results
from its
children. We found that this leads to an enormous waste of
energy in
predominant data acquisition frameworks such as TAG and
Cougar. We
also found that these routing trees are sub-optimally constructed in
predominant data acquisition systems, leading to an enormous
waste of
energy. In this paper we present MicroPulse, a workload-aware
optimization algorithm for query routing trees in wireless
sensor
networks. Our algorithm is established on profiling recent
data
acquisition activity and on identifying the bottlenecks using
an
in-network execution of the critical path method. A node S
utilizes
this information in order to locally derive the time instance
during
which it should wake up, the interval during which it should
deliver
its workload and the workload increase tolerance of its parent
node. We
additionally provide an elaborate description of
energy-conscious
algorithms for disseminating and maintaining the critical path
cost in
a distributed manner. Our trace-driven experimentation with
real sensor
traces from Intel Research Berkeley shows that MicroPulse can
reduce
the data acquisition costs by many orders of magnitude.
-
Location:
Nicosia, Cyprus ("EPL601 - Distributed Systems", Graduate
Course,
Department of Computer Science, University of Cyprus)
Date: November 16th, 2007.
-
Location: Athens, Greece (The 6th Hellenic Data Management
Symposium (HDMS'07))
Date: July 5th 2007.
Abstract: In this paper we introduce the distributed
spatio-temporal similarity search problem: given a query
trajectory Q, we want to find the trajectories that follow a
motion similar to Q, when each of the target trajectories is
segmented across a number of distributed nodes. We propose two
novel algorithms, UB-K and UBLB-K, which combine local
computations of lower and upper bounds on the matching between
the distributed subsequences and Q. Such an operation
generates the desired result without pulling together all the
distributed subsequences over the fundamentally expensive
communication medium. Our solutions find applications in a
wide array of domains, such as cellular networks, wildlife
monitoring and video surveillance. Our experimental evaluation
using realistic data demonstrates that our framework is both
efficient and robust to a variety of conditions.
-
Location: Athens, Greece (The 6th Hellenic Data Management
Symposium (HDMS'07))
Date: July 5th 2007.
Abstract: In this paper we introduce MINT (Materialized
In-Network Top-k) Views, a novel framework for optimizing the
execution of continuous monitoring queries in sensor networks.
A typical materialized view V maintains the complete results
of a query Q in order to minimize the cost of future query
executions. In a sensor network context, maintaining
consistency between V and the underlying and distributed base
relation R is very expensive in terms of communication. Thus,
our approach focuses on a subset V' (⊆ V) that unveils
only the k highest-ranked answers at the sink for some user
defined parameter k. We additionally provide an elaborate
description of energy-conscious algorithms for constructing,
pruning and maintaining such recursively-defined in-network
views. Our trace-driven experimentation with real datasets
shows that MINT offers significant energy reductions compared
to other predominant data acquisition models.
-
Location:
Nicosia, Cyprus ("EPL601 - Distributed Systems", Graduate
Course,
Department of Computer Science, University of Cyprus)
Date: October 12th, 2006.