Hydrus Helps Companies Improve ESG Performance
More organizations are embracing workforce diversity, environmental sustainability, and responsible corporate governance in an effort to improve their Environmental, Social, and Governance (ESG) performance. As investors increasingly favor ESG in their portfolios, organizations are under greater pressure to capture, store, and verify ESG metrics. San Francisco-based startup Hydrus is helping companies make ESG data more usable and actionable.

The Platform

Hydrus, a MongoDB for Startups program member, is a software platform that enables enterprises to collect, store, report, and act on their environmental, social, and governance data. ESG data includes things like:

- How a company safeguards the environment
- Its energy consumption and how it impacts climate change
- How it manages relationships with employees, suppliers, and customers
- Details about the company's leadership, executive pay, audits, and internal controls

The Hydrus platform enables organizations to collect, store, and audit diversity and environmental data, and to run analytics and machine learning against that data. Hydrus offers users a first-rate UI/UX so that even non-technical users can leverage the platform. With the auditing capabilities, organizations can ensure the provenance and integrity of ESG data over time. Other solutions don't allow users to go back in time and determine who made changes to the data, why they made them, what earlier versions of the data looked like, and when the changes were made. Hydrus gives users complete visibility into these activities.

The Tech Stack

MongoDB Atlas was the preferred database for Hydrus because of the flexibility of the data model. George Lee, founder and CEO of Hydrus, says the traditional SQL database model was too limiting for the startup's needs. MongoDB's document model eliminated the need to create tables or enforce restrictions on data fields. With MongoDB, the team could simply add fields without undertaking any major schema changes.

Hydrus also tapped MongoDB for access to engineers and technical resources. This enabled the company to architect its platform for all of the different types of sustainability data that exist. MongoDB technical experts helped Hydrus model data for future scalability and flexibility so it could add data fields as the need arises.

On top of Atlas and MongoDB technical support, Hydrus leans heavily on MongoDB Charts, a data visualization tool for creating, sharing, and embedding visualizations from MongoDB Atlas. Charts enables Hydrus to derive insights from ESG data, giving its Fortune 200 clients better visibility into their operational efficiency. Charts uses a drag-and-drop interface that makes it easy to build charts and answer questions about ESG data. One Hydrus customer using MongoDB Charts was better able to understand its footprint from both a greenhouse gas and a resource usage perspective. Another customer detected a 30x increase in refrigerant usage in one of its facilities. The visual analytics generated with MongoDB Charts enabled the company to make changes that improved its ESG performance.

Figure: MongoDB Charts enabled Hydrus to visualize sustainability data.

"MongoDB Charts enables our customers to directly report their sustainability data, customize the charts, and better tell the sustainability story in a visual format," Lee says. "It's way better than the traditional format where you have data, tables, and spreadsheets everywhere."
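As an aside, here is a minimal sketch of the schema flexibility Lee describes. The collection and field names are hypothetical, not Hydrus's actual schema:

// Early documents track only energy consumption.
db.esgMetrics.insertOne({
  facility: "plant-01",
  period: "2022-Q3",
  energyKwh: 128000
})

// Later, a refrigerant-usage field is added to new documents --
// no ALTER TABLE or migration is required for existing ones.
db.esgMetrics.insertOne({
  facility: "plant-01",
  period: "2022-Q4",
  energyKwh: 131500,
  refrigerantKg: 42.5
})

// Queries still span both document shapes.
db.esgMetrics.find({ facility: "plant-01" })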
The Roadmap

Hydrus seeks to take the hassle out of managing a sustainable business by streamlining data collection, reporting, and auditing processes. Its platform is designed to eliminate manual tasks for sustainability managers so they can focus on decarbonization, resource usage optimization, and hitting their sustainability goals. Hydrus accelerates these activities by helping companies model their sustainability data around science-based targets so they can better decarbonize and meet other ESG goals.

If you're interested in learning more about how to help your organization become more sustainable, decarbonize, and succeed in your sustainability journey, visit the Hydrus website. Are you part of a startup and interested in joining the MongoDB for Startups program? Apply now. For more startup content, check out our wrap-up of the 2022 year in startups.
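As a footnote to the Charts discussion above, here is a minimal sketch of how a team might embed an Atlas Charts visualization in a web app using the Charts Embedding SDK. The base URL, chart ID, and element ID are placeholders, not Hydrus's actual configuration:

import ChartsEmbedSDK from "@mongodb-js/charts-embed-dom";

// baseUrl is the Charts instance URL shown in the Atlas UI (placeholder here).
const sdk = new ChartsEmbedSDK({
  baseUrl: "https://charts.mongodb.com/charts-example-xxxxx"
});

async function showChart() {
  // Each Charts visualization has its own ID; this one is a placeholder.
  const chart = sdk.createChart({
    chartId: "00000000-0000-0000-0000-000000000000",
    height: 380
  });

  // Render the chart into a container element on the page.
  await chart.render(document.getElementById("esg-chart"));
}

showChart();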
Predictions 2023: Modernization Efforts in the Financial Services Industry
As a global recession looms, banks face tough economic conditions in 2023. Lowering costs will be vital for many organizations to remain competitive in a data-intensive and highly regulated environment. Thus, it's important that any IT investments accelerate digital transformation with innovative technologies that break down data silos, increase operational efficiency, and build personalized customer experiences. Read on to learn about the areas in which banks are looking to modernize in 2023 to build better customer experiences at a lower cost and at scale.

Shaping a better banking future with composable designs

With banks eager to modernize and innovate, institutions must move away from the legacy systems that are restricting their ability to show progress. Placing consumers at the center of a banking experience made up of interconnected yet independent services offers technology-forward banks the chance to reshape their business models and subsequently grow market share and increase profitability. These opportunities have brought to fruition a composable architecture design that enables faster innovation and improved operational efficiency, and that creates new revenue streams by extending the portfolio of services and products. Banks are thus able to adopt the best-of-breed, perfect-fit-for-purpose software available by orchestrating strategic partnerships with relevant fintechs and software providers. This new breed of suppliers can provide everything from know your customer (KYC) services to integrated booking, loan services, or basic marketing and portfolio management functionalities. This approach is more cost efficient for institutions than building and maintaining the infrastructure themselves, and it is significantly faster in terms of time to market and time to revenue. Banks adopting such an approach see fintechs less as competitors and more as part of an ecosystem to collaborate with to accelerate innovation and reach customers.

Operational efficiency with intelligent automation

Financial institutions will continue to focus on operational efficiency and cost control by automating previously manual, paper-driven processes. Banks have made some progress digitizing and automating what were once almost exclusively paper-based, manual processes. But the primary driver of this transformation has been compliance with local regulations rather than an overarching strategy for really getting to know the client and achieving true customer delight. The market is eager for better automated and data-driven decisions, and legacy systems can't keep up. Creating the hyper-personalized experiences that customers demand, which include things like chatbots, self-service portals, and digital forensics, is difficult for institutions using outdated technology. And having data infrastructure in silos prohibits any truly integrated modern experience. Using a combination of robotic process automation (RPA), machine learning (ML), and artificial intelligence (AI), financial institutions are able to streamline processes, freeing the workforce to focus on tasks that drive a bigger impact for the customer and business. Institutions must not digitize without considering the human interaction that will be replaced, as customers prefer a hybrid approach. The ability to act on real-time data is the way forward for driving value and transforming customer experiences, and it must be accompanied by the modernization of the underlying data architecture.
The prerequisite for this goal is the de-siloing of data and sources into a holistic data landscape. Some call it a data mesh, some composable data sources, others virtualized data.

Solving ESG data challenges

Along with high inflation, the cost-of-living crisis, energy turmoil, and rising interest rates, environmental, social, and governance (ESG) concerns are also in the spotlight. There is growing pressure from regulators to provide ESG data and from investors to make sure portfolios are sustainable. The role of ESG data in conducting market analysis, supporting asset allocation and risk management, and providing insights into the long-term sustainability of investments continues to expand. The nature and variability of many ESG metrics is a major challenge facing companies today. Unlike financial datasets, which are mostly numerical, ESG metrics can include both quantitative and qualitative data to help investors and other stakeholders understand a company's actions and intentions. This complexity, coupled with the lack of a universally applicable ESG reporting standard, means institutions must consider different standards with different data requirements. To master ESG reporting, including the integration of relevant KPIs, institutions need appropriate, high-quality data that is at the right level of granularity and covers the required industries and regions. Given the data volume and complexity, financial institutions are building ESG platforms underpinned by modern data platforms that are capable of consolidating different types of data from various providers, creating customized views, modeling data, and performing operations without barriers.

Digital payments - Unlocking an enriched experience

Pushed by new technologies and global trends, the digital payments market is flourishing globally. With a valuation of more than $68 billion in 2021 and expectations of double-digit growth over the next decade, emerging markets are leading the way in terms of relative expansion. This growth has been driven by pandemic-induced cashless payments, e-commerce, government initiatives, and fintechs. Digital payments are transforming the payments experience. While it was once enough for payment service providers to supply account information and orchestrate simple transactions, consumers now expect an enriched experience in which each transaction offers new insights and value-added services. Meeting these expectations is difficult, especially for companies that rely on outdated technologies created long before transactions were carried out with a few taps on a mobile device. To meet the needs of customers, financial institutions are modernizing their payments data infrastructure to create personalized, secure, and real-time payment experiences, all while protecting consumers from fraud. This modernization allows financial institutions to ingest any type of data, launch services more quickly at a lower cost, and have the freedom to run in any environment, from on-premises to multi-cloud.

Security and risk management

Data is critical to every financial institution; it is recognized as a core asset to drive customer growth and innovation. As the need to leverage data efficiently increases, however, the legacy technology that still underpins many organizations is, according to 57% of decision makers, too expensive and unable to fulfill the requirements of modern applications. Not only is this legacy infrastructure complex, it is unable to meet current security requirements.
Given the huge amount of confidential client and customer data that the financial services industry handles daily, and the strict regulations surrounding that data, security must be the highest priority. The perceived value of this data also makes financial services organizations a primary target for data breaches. Fraud protection, risk management, and anti-money laundering are high priorities for any new data platform, according to Forrester's study What's Driving Next-Generation Data Platform Adoption in Financial Services. To meet these challenges, adoption of next-generation data platforms will continue to grow as financial institutions realize their full potential to manage costs, maximize security, and foster innovation. Download Forrester's full study, What's Driving Next-Generation Data Platform Adoption in Financial Services, to learn more.
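As a brief illustration of the ESG data point above, here is a minimal sketch of how quantitative and qualitative ESG metrics might sit side by side in a single document model. The collection and field names are hypothetical:

// One document can mix numeric metrics, free-text disclosures,
// and mappings to multiple reporting standards.
db.esgReports.insertOne({
  company: "ExampleCorp",
  period: "2022",
  environmental: {
    scope1EmissionsTonnes: 1820, // quantitative
    energyMixNarrative: "60% of electricity sourced from renewables." // qualitative
  },
  social: { boardDiversityPct: 38 },
  standards: ["GRI", "SASB"] // frameworks this report maps to
})

// Granularity can vary per document without schema migrations.
db.esgReports.find({ "environmental.scope1EmissionsTonnes": { $gt: 1000 } })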
How Startups Stepped Up in 2022
After muddling through the global pandemic in 2021, entrepreneurs emerged in 2022 ready to transform the way people live, learn, and work. Through the MongoDB for Startups program, we got a close-up view of their progress. What we observed was a good indication of how critical data is to delivering the transformative experiences users expect.

Data access vs. data governance

The increasing importance of data in the digital marketplace has created a conflict that a handful of startups are working to solve: granting access to data to extract value from it while simultaneously protecting it from unauthorized use. In 2022, we were excited to work with promising startups seeking to strike a balance between these competing interests. Data access service provider Satori enables organizations to accelerate their data use by simplifying and automating access policies while helping to ensure compliance with data security and privacy requirements. At most organizations, providing access to data is a manual process often handled by a small team that's already being pulled in multiple directions by different parts of the organization. It's a time-consuming task that takes precious developer resources away from critical initiatives and slows down innovation. Data governance is a high priority for organizations because of the financial penalties of running afoul of data privacy regulations and the high cost of data breaches. While large enterprises make attractive targets, small businesses and startups in particular need to be vigilant because they can less afford financial and reputational setbacks. San Francisco-based startup Vanta is helping companies scale security practices and automate compliance for the most prevalent data security and privacy regulatory frameworks. Its platform gives organizations the tools they need to automate up to 90% of the work required for security audits.

Futurology

The Internet of Things (IoT), artificial intelligence (AI), virtual reality (VR), and natural language processing (NLP) remain at the forefront of innovation and are only beginning to fulfill their potential as transformative technologies. Through the MongoDB for Startups program, we worked with several promising ventures that are leveraging these technologies to deliver game-changing solutions for both application developers and users. Delaware-based startup Qubitro helps companies bring IoT solutions to market faster by making the data collected from mobile and IoT devices accessible anywhere it's needed. Qubitro creates APIs and SDKs that let developers activate device data in applications. With billions of devices producing massive amounts of data, the potential payoff in enabling data-driven decision making in modern application development is huge. London-based startup Concured uses AI technology to help marketers know what to write about and what's working for themselves and their competitors. It also enables organizations to personalize experiences for website visitors. Concured uses NLP to generate semantic metadata for each document or article and to understand the relationships between articles on the same website. Another London-based startup using AI and NLP to deliver transformative experiences is Semeris. Analyzing legal documents is a tedious, time-consuming process, and Semeris enables legal professionals to reduce the time it takes to extract information from documentation.
The company's solution creates machine learning (ML) models based on publicly available documentation to analyze the less visible or more private documentation that clients hold internally.

The language we use in day-to-day communication says a lot about our state of mind. Sydney-based startup Pioneera looks at language and linguistic markers to determine whether employees are stressed out at work or at risk of burnout. When early warning signs are detected, the person gets the help they need to reduce stress, promote wellness, and improve productivity, confidentially and in real time. Technologies like AR and VR are transforming learning for students. Palo Alto-based startup Inspirit combines 3D and VR instruction to create an immersive learning experience for middle and high school students. The platform helps students who love science engage with the subject matter more deeply, and those who dislike it to experience it in a more compelling format.

No code and low code

The startup space is rich with visionary thinkers and ideas. But the truth is that you can't get far with an idea if you don't have access to developer talent, which is scarce and costly in today's job market. Through the MongoDB for Startups program, we've worked with a couple of companies that are helping entrepreneurs breathe life into their ideas with low- and no-code solutions for building applications and bringing them to market. Low- and no-code platforms enable users with little or no coding background to satisfy their own development needs. For example, Alloy Automation is a no-code integration solution that integrates with and automates ecommerce services, such as CRM, logistics, subscriptions, and databases. Alloy can automate SMS messages, automatically start a workflow after an online transaction, determine if follow-up action should be taken, and automate actions in coordination with connected apps. Another example is Thunkable, a no-code platform that makes it easy to build custom mobile apps without any advanced software engineering knowledge or certifications. Thunkable's mission is to democratize mobile app development. It uses a simple drag-and-drop design and powerful logic blocks to give innovators the tools they need to breathe life into their app designs.

The startup journey

Although startups themselves are as diverse as the people who launch them, all startup journeys begin with the identification of a need in the marketplace. The MongoDB for Startups program helps startups along the way with free MongoDB Atlas credits, one-on-one technical advice, co-marketing opportunities, and access to a vast partner network. Are you a startup looking to build faster and scale further? Join our community of pioneers by applying to the MongoDB for Startups program. Apply now.
Improving Building Sustainability with MongoDB Atlas and Bosch
Every year, developers from more than 45 countries head to Berlin to participate in the Bosch Connected Experience (BCX) hackathon, one of Europe's largest AI and Internet of Things (AIoT) hackathons. This year, developers were tasked with creating solutions to tackle a mix of important problems, from improving sustainability in commercial building operations and facility management to accelerating innovation of automotive-grade, in-car software stacks, using a variety of hardware and software solutions made available through Bosch, Eclipse, and their ecosystem partners. MongoDB also took part in this event and even helped one of the winning teams build their solution on top of MongoDB Atlas. I had the pleasure of connecting with a participant from that winning team, Jonas Bruns, to learn about his experience building an application for the first time with MongoDB Atlas.

Ashley George: Tell us a little bit about your background and why you decided to join this year's BCX hackathon.

Jonas Bruns: I am Jonas, an electrical engineering student at Friedrich Alexander University in Erlangen-Nürnberg. Before I started my master's program, I worked in the automotive industry in the Stuttgart area. I was familiar with the BCX hackathon from my time in Stuttgart and, together with two friends from my studies, decided to set off to Berlin this year to take part. The BCX hackathon is great because there are lots of partners on site to support the participants and provide knowledge on both the software and hardware solutions available to them, allowing teams to turn their ideas into a working prototype within the short time available. We like being confronted with new problems and felt this was an important challenge to take on, so participation this year was a must for us.

AG: Why did you decide to use MongoDB Atlas for your project?

JB: We started with just the idea of using augmented reality (AR) to improve the user experience (UX) of smart devices. To achieve this goal, we needed not only a smartphone app but also a backend in which all of our important data is stored. Because of the limited time, and because no one on our team had worked with databases before, we had to find a solution that would grow with our requirements and let us get started as easily as possible. Ideally, the solution would also be fully managed, so we wouldn't have to take care of security on our own. After reviewing our options, we quickly decided on MongoDB Atlas.

AG: What was it like working with MongoDB Atlas, especially having not worked with a database solution before?

JB: The setup was super easy and went pretty fast. Within a short time, we were able to upload our first set of data to Atlas using MongoDB Compass. As we started to dive in and explore Atlas a bit more, we discovered the trigger functionality (Atlas Triggers), which we were able to use to simplify our infrastructure. Originally, we planned to use a server connected to the database that would react to changed database entries and then send a request to control the desired peripherals. The ability to configure triggers directly in the database made a server superfluous and saved us a lot of time. We configured the trigger to execute a JavaScript function when a change is made to the database. This function evaluates data from the database and executes the corresponding requests, which directly control the peripherals.
Initially, we hit a minor roadblock in determining how to handle the authentication needs (creating security tokens) that the peripherals expect during a request. To solve this, we stored the security tokens on an AWS server that listens for an HTTP request. From Atlas, we then just have to call the URL, and the AWS instance handles the authentication and control of the lights. After we solved this problem, we were thrilled with how little configuration was needed and how intuitive Atlas is. The next steps, like connecting Atlas to the app, were easy. We achieved this by sending data from Flutter to Atlas over HTTPS with the Atlas Data API.

AG: How did Atlas enable you to build your winning application?

JB: By the end of the challenge, we had developed our idea into a fully functional prototype using Google ARCore, Flutter, MongoDB Atlas, and the Bosch Smart Home hardware (Figure 1). We built a smartphone application that uses AR to switch a connected light in a smart building on and off. The position and state of the light (on or off) are stored in the database. If the state of the light should change, the app updates the corresponding value in the database. The change fires a trigger function that then sets the light to the desired state. The fact that we were able to achieve this within a short time and without much prior knowledge is mainly due to how easy and intuitive Atlas is. The simple handling allowed us to quickly learn and use the available features to build the functionality our app needed.

Figure 1: Tech stack for the project's prototype.

AG: What additional features within Atlas did you find the most valuable in building your application?

JB: We created different users to easily control the access rights of the app and the smart devices. By eliminating the need for another server to communicate with the smart devices and using the trigger functionality of Atlas, we were able to save a lot of time on the prototype. In addition, the preconfigured code examples provided in various languages made integration with our frontend easy and helped us avoid errors. Anyone who is interested can find the results of our work in the GitHub repo.

AG: Do you see yourself using Atlas more in the future?

JB: We will definitely continue to use Atlas in the future. The instance from the hackathon is still online, and we want to get to know the other functionality that we haven't used yet. Given how intuitive Atlas was in this project, I am sure we will continue to use it for future projects as well.

Through this project, Jonas and team built a functional prototype that can help commercial building owners gain more control over their buildings and take steps to reduce CO₂ emissions.
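For readers curious what such a trigger might look like, here is a minimal sketch of an Atlas database trigger function along the lines Jonas describes. The endpoint URL, collection, and document fields are hypothetical, not the team's actual code:

// Atlas Trigger function: fires on updates to a "lights" collection.
exports = async function (changeEvent) {
  const light = changeEvent.fullDocument;

  // Forward the desired state to the (hypothetical) AWS endpoint that
  // holds the security tokens and talks to the smart-home peripherals.
  const response = await context.http.get({
    url: `https://example-aws-endpoint.com/lights/${light._id}?state=${light.isOn ? "on" : "off"}`
  });

  return response.statusCode;
};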
Introducing MongoDB Connector for Apache Kafka version 1.9
Today, MongoDB released version 1.9 of the MongoDB Connector for Apache Kafka! This article highlights the key features of this new release.

Pre/Post document states

In MongoDB 6.0, change streams added the ability to retrieve the before and after state of an entire document. To enable this functionality on a new collection, set it as a parameter in the createCollection command:

db.createCollection(
  "temperatureSensor",
  { changeStreamPreAndPostImages: { enabled: true } }
)

Alternatively, for existing collections, use collMod as shown below:

db.runCommand({
  collMod: <collection>,
  changeStreamPreAndPostImages: { enabled: <boolean> }
})

Once the collection is configured for pre- and post-images, you can set the change.stream.full.document.before.change source connector parameter to include this extra information in the change event. For example, consider this source definition:

{
  "name": "mongo-simple-source",
  "config": {
    "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
    "connection.uri": "<< MONGODB CONNECTION STRING >>",
    "database": "test",
    "collection": "temperatureSensor",
    "change.stream.full.document.before.change": "whenavailable"
  }
}

When the following document is inserted:

db.temperatureSensor.insertOne({ 'sensor_id': 1, 'value': 100 })

and an update is then applied:

db.temperatureSensor.updateOne({ 'sensor_id': 1 }, { $set: { 'value': 105 } })

the change stream event written to the Kafka topic is as follows:

{
  "_id": {
    "_data": "82636D39C8000000012B022C0100296E5A100444B0F5E386F04767814F28CB4AAE7FEE46645F69640064636D399B732DBB998FA8D67E0004"
  },
  "operationType": "update",
  "clusterTime": { "$timestamp": { "t": 1668102600, "i": 1 } },
  "wallTime": { "$date": 1668102600716 },
  "ns": { "db": "test", "coll": "temperatureSensor" },
  "documentKey": { "_id": { "$oid": "636d399b732dbb998fa8d67e" } },
  "updateDescription": {
    "updatedFields": { "value": 105 },
    "removedFields": [],
    "truncatedArrays": []
  },
  "fullDocumentBeforeChange": {
    "_id": { "$oid": "636d399b732dbb998fa8d67e" },
    "sensor_id": 1,
    "value": 100
  }
}

Note that the fullDocumentBeforeChange key includes the original document before the update occurred.

Starting the connector at a specific time

Prior to version 1.9, when the connector started as a source, it would open a MongoDB change stream and process any new data. To copy all the existing data in the collection before processing new data, you specified the "copy.existing" property. One frequent user request has been to start the connector from a specific timestamp rather than from whenever the connector happens to start. In 1.9, a new parameter called startup.mode was added to specify when to start writing data.

startup.mode=latest (default)

"latest" is the default behavior: the connector starts processing data when it starts, ignoring any existing data.

startup.mode=timestamp

"timestamp" allows you to start processing at a specific point in time, as defined by additional startup.mode.timestamp.* properties. For example, to start the connector from 7 AM on November 21, 2022, set the value as follows:

startup.mode.timestamp.start.at.operation.time='2022-11-21T07:00:00Z'

Supported values are an ISO-8601 format date string, as shown above, or a BSON extended string format.

startup.mode=copy.existing

This is the same behavior as the existing configuration option copy.existing=true. Note that copy.existing as a separate parameter is now deprecated.
If you defined any granular copy.existing parameters, such as copy.existing.pipeline, just prepend them with the "startup.mode.copy.existing." property name prefix.

Reporting MongoDB errors to the DLQ

Kafka supports writing errors to a dead letter queue (DLQ). In version 1.5 of the connector, you could write all exceptions to the DLQ through the mongo.error.tolerance='all' setting. One thing to note was that these errors were Kafka-generated errors, not errors that occurred within MongoDB. Thus, if the sink connector failed to write to MongoDB due to a duplicate _id error, for example, that error wouldn't be written to the DLQ. In 1.9, errors generated within MongoDB are also reported to the DLQ.

Behavior change on inferring schema

Prior to version 1.9 of the connector, if you were inferring schema and inserted a MongoDB document containing arrays with different value data types, the connector was naive and would simply set the type for the whole array to be a string. For example, consider a document that resembles:

{
  "myfoo": [
    { "key1": 1 },
    { "key1": 1, "key2": "dogs" }
  ]
}

If we set output.schema.infer.value to true on a source connector, the message in the Kafka topic would resemble the following:

…
"fullDocument": {
  …
  "myfoo": [
    "{\"key1\": 1}",
    "{\"key1\": 1, \"key2\": \"dogs\"}"
  ]
},
…

Notice that the array items have different shapes. In this example, the first item in the "myfoo" array is a subdocument with a single field, "key1", whose value is the integer 1; the next item is a subdocument with the same "key1" field plus another field, "key2", that has a string value. When this scenario occurred, the connector would wrap the entire array as a string. The same behavior could also apply when different keys contain different data type values. In version 1.9, the connector, when presented with this configuration, will not wrap the arrays; rather, it will create the appropriate schemas for the variable arrays with different data type values. The same document, when run through 1.9, will resemble:

"fullDocument": {
  …
  "myfoo": [
    { "key1": 1 },
    { "key1": 1, "key2": "DOGS" }
  ]
},

Note that this behavior is a breaking change and that inferring schemas when using arrays can cause performance degradation for very large arrays with different data type values.

Download the latest version of the MongoDB Connector for Apache Kafka from Confluent Hub! To learn more about the connector, read the MongoDB online documentation. Questions? Ask on the MongoDB Developer Community Connectors and Integrations forum!
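Putting the startup pieces together, here is a sketch of a 1.9 source configuration that begins processing from a fixed timestamp; the connection string and namespace are placeholders, reusing the earlier example's names:

{
  "name": "mongo-source-from-timestamp",
  "config": {
    "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
    "connection.uri": "<< MONGODB CONNECTION STRING >>",
    "database": "test",
    "collection": "temperatureSensor",
    "startup.mode": "timestamp",
    "startup.mode.timestamp.start.at.operation.time": "2022-11-21T07:00:00Z"
  }
}

To copy existing data first instead, set "startup.mode": "copy.existing" and move any granular options under the new prefix, for example "startup.mode.copy.existing.pipeline" in place of the deprecated "copy.existing.pipeline".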
Top 3 Wins and Wants from the Latest TDWI Modernization Report
We recently reported that analyst and research firm TDWI had released its latest report on IT modernization: Maximizing the Business Value of Data: Platforms, Integration, and Management. The report surveyed more than 300 IT executives, data analysts, data scientists, developers, and enterprise architects to find out what their priorities, objectives, and experiences have been in terms of IT modernization. In many ways, organizations have made great progress. From new data management and data integration capabilities to smarter processes for higher business efficiency and innovation, IT departments have helped organizations get more value from the data they generate. In other cases, organizations are still stuck in data silos and struggling to improve data quality as data distribution increases due to the proliferation of multi-cloud environments. In this article, we'll summarize the top three areas where organizations are winning and the top three ways organizations are left wanting when it comes to digital transformation and IT modernization. Download the complete report, Maximizing the Business Value of Data: Platforms, Integration, and Management, to find out the latest strategies, trends, and challenges for businesses seeking to modernize.

Wins

1. Cloud migration

Moving legacy applications to the cloud is essential for organizations seeking to increase operational efficiency and effectiveness, generate new business models through analytics, and support automated decision-making — the three biggest drivers of modernization efforts. And most organizations are succeeding: 72% of respondents in the TDWI survey reported being very or somewhat successful in moving legacy applications to cloud services. Migrating to the cloud is one thing, but getting data to the right people and systems at the right time is another. For organizations to get full value from their data in the cloud, they also need to ensure the flow of data into business intelligence (BI) reports, data warehouses, and embedded analytics in applications.

2. 24/7 operations

The ability to run continuous operations is a widely shared objective when organizations take on a transformation effort. Increasingly global supply chains, smaller and more dispersed office locations, and growing international customer bases are major drivers of 24/7 operations. And, according to the TDWI survey, more than two-thirds of organizations say they've successfully transitioned to continuous operations.

3. User satisfaction

Organizations are also winning the race to match users' needs when provisioning data for BI, analytics, data integration, and the data management stack. Eighty percent of respondents said their users were satisfied with these capabilities. Additionally, 72% trusted in the quality of data and how it's governed, and 68% were satisfied that role-based access controls were doing a good job of ensuring that only authorized users had access to sensitive data.

Wants

1. Artificial intelligence, machine learning, and predictive intelligence

Machine learning (ML) and artificial intelligence (AI) comprise a key area where organizations are left wanting. While 51% of respondents were somewhat or very satisfied with their use of AI and ML data, almost the same number (49%) said they were neither satisfied nor dissatisfied, somewhat dissatisfied, or very dissatisfied. Similar results were reported for data-driven predictive modeling.
The report notes that provisioning data for AI/ML is more complex and varied than for BI reporting and dashboards, but that cloud-based data integration and management platforms for analytics and AI/ML could increase satisfaction for these use cases.

2. More value from data

Perhaps related to the AI/ML point, the desire to get more value out of data was cited as the biggest challenge organizations face by almost 50% of respondents. Organizations today capture more raw, unstructured, and streaming data than ever, and they're still generating and storing structured enterprise data from a range of sources. One of the big challenges organizations reported is running analytics on so many different data types. According to TDWI, organizations need to overcome this challenge to inform data science and capitalize on modern, analytics-infused applications.

3. Easier search

A big part of extracting more value from data is making it easy to search. Traditional search functionality, however, depends on technically challenging SQL queries. According to the TDWI report, 19% of users were dissatisfied with their ability to search for data, reports, and dashboards using natural language. Unsurprisingly, frustration with legacy technologies was cited as the third biggest challenge facing organizations, according to the survey.

The way forward

"In most cases, data becomes more valuable when data owners share data," the TDWI report concludes. Additionally, the key to making data more shareable is moving toward a cloud data platform, one that makes data more available while simultaneously governing access when there's a need to protect the confidentiality of sensitive data. Not only does a cloud data platform make data more accessible and shareable for users, it also creates a pipeline for delivering data to applications that can use it for analytics, AI, and ML. Read the full TDWI report: Maximizing the Business Value of Data: Platforms, Integration, and Management.
MongoDB Is A Best Place to Work in 2023, According to Our Employees on Glassdoor
MongoDB is pleased to announce that we are among the winners of the annual Glassdoor Employees' Choice Awards, a list of the Best Places to Work in 2023. Unlike other workplace awards, there is no self-nomination or application process; instead, the award is based entirely on the feedback our employees have voluntarily and anonymously shared on Glassdoor. To determine the winners, Glassdoor evaluates company reviews shared by current and former employees over the past year. This year, we are proud to be recognized as a Best Place to Work among U.S. companies with more than 1,000 employees. A huge thank you goes out to all our employees who took the time to share their perspective on what it's like to work here. We appreciate all the valuable feedback, as it only helps us improve. Below are just a few words employees shared on Glassdoor that contributed toward the award and make us feel incredibly honored.

Senior Staff Engineer, Sydney

"I have been working on the Storage Engine for MongoDB for over ten years now. In my tenure at MongoDB I have taken on a lot of different roles and responsibilities and am now a senior individual contributor. Working with my colleagues to build the best storage engine in the world as well as carefully crafting a diverse, inclusive, pragmatic, engaged and curious engineering culture. During my time here I've been able to actively contribute to its success, and have clearly understood the vision and pathway to that success. The company is continually growing and evolving to meet changing needs - it's an exciting place to work full of opportunity and challenges."

Enterprise Account Executive, Tel Aviv

"Amazing tech and some of the most smart & experienced you'll ever have a chance to work with. Feedback is a big part of the culture and is given in an actionable, clear way that is intended to make you better in your craft and your results."

Deal Strategy Manager, Dublin

"MongoDB is very passionate about culture and ensuring everyone who walks in the door fits the existing culture. This is a culture where openness, inclusiveness and respect are really important. Management wants to try as hard as they can to maintain the small company feel while the company scales. I have worked in some large companies where the term 'family' is used a lot but here there is truth in saying that there is a family feel amongst my team and in my office. I can attest to this as within my first year I have had to deal with two quite serious changes in my personal life and the team has been so supportive and nothing has ever been an issue. The Senior Leadership here is the strongest I have ever seen in my career and I have no doubt this company will continue to grow over the next 5 years. The offices are incredible and the employee benefits are exceptional."

Director, Developer Relations, Austin

"The C-Suite management team is amazing. Dev is an amazing CEO who has surrounded himself with brilliant people who know how to execute. The market opportunity is incredible. MongoDB is the hands down leader in the NoSQL space and the "great replacement" of RDBMS is just getting started. Outstanding growth position in a turbulent market. The entire team is focused on one mission. MongoDB has one goal. We will extend our lead in the NoSQL technology sector as we disrupt the global database technology market and replace the RDBMS. Everyone here marches to the beat of the same drum."

We're hiring in 2023 and would love for you to join us. View our current career opportunities.
Build Analytics-Driven Apps with MongoDB Atlas and the Microsoft Intelligent Data Platform
Customers increasingly expect engaging applications informed by real-time operational analytics, yet meeting these expectations can be difficult. MongoDB Atlas is a popular operational data platform that makes it straightforward to manage critical business data at scale. For some applications, however, enterprises may also want to apply insights gleaned from data warehouse, business intelligence (BI), and related solutions, and many enterprises depend on the Microsoft Intelligent Data Platform to apply analytics and governance solutions to operational data stores. MongoDB and Microsoft have partnered to make it simple to use the Microsoft Intelligent Data Platform to glean and apply comprehensive analytical insights to data stored in MongoDB. This article details how enterprises can successfully use MongoDB with the Microsoft Intelligent Data Platform to build more engaging, analytics-driven applications.

Microsoft Intelligent Data Platform + MongoDB

MongoDB Atlas provides a unified interface for developers to build distributed, serverless, and mobile applications, with support for diverse workload types including operational, real-time analytics, and search. With the ability to model graph, geospatial, tabular, document, time series, and other forms of data, developers don't have to turn to multiple niche databases, which result in highly complex, polyglot architectures. The Microsoft Intelligent Data Platform offers a single platform for databases, analytics, and data governance by integrating Microsoft's database, analytics, and data governance products. In addition to all Azure database services, the Microsoft Intelligent Data Platform includes Azure Synapse Analytics for data warehousing and analytics, Power BI for BI reporting, and Microsoft Purview for enterprise data governance requirements.

Although customers have always been able to apply the Microsoft Intelligent Data Platform services to MongoDB data, doing so hasn't always been as simple as it could be. Through this new integration, customers gain a seamless way to run analytics and data warehousing operations on the operational data they store in MongoDB Atlas. Customers can also more easily use Microsoft Purview to manage and run data governance policies against their most critical MongoDB data, thereby ensuring compliance and security. Finally, through Power BI, customers can easily query and extract insights from MongoDB data using powerful built-in and custom visualizations. Let's dive into each of these integrations.

Operationalize insights with MongoDB Atlas and Azure Synapse Analytics

MongoDB Atlas is an operational data platform that can handle multiple workload types, including transactional, search, and operational analytics, and multiple application types, including distributed, serverless, and mobile. For data warehousing workloads, long-running analytics, and AI/ML, it complements Azure Synapse Analytics well. MongoDB Atlas can easily be integrated as a source or a sink resource in Azure Synapse Analytics. This connector is useful to:

- Fetch all of the MongoDB Atlas historical data into Synapse
- Retrieve incremental data for a period, based on filter criteria, in batch mode to run SQL-based or Spark-based analytics

The sink connector allows you to store analytics results back in MongoDB, which can then power applications built on top of it.
Many enterprises require real-time analytics (for example, in fraud detection, anomaly detection for IoT devices, predicting stock depletion, and machinery maintenance), where a delay in getting insights could cause serious repercussions. MongoDB and Microsoft have worked together on a best-practice architecture for these scenarios, which can be found in this article.

Figure 1: Schematic showing integration of MongoDB with Azure Synapse Analytics.

Business intelligence reporting and visualization with Power BI

Together, MongoDB Atlas and Microsoft Power BI offer a sophisticated real-time data platform, providing customers with the ability to present specialized operational and analytical query engines on the same data sets. Information on connecting from Power BI Desktop to MongoDB is available in the official documentation. MongoDB is also excited to announce the forthcoming MongoDB Atlas Power BI Connector, which will expose the richness of JSON document data to Power BI (see Figure 2). This connector allows users to unlock access to their Atlas cloud data.

Figure 2: Schematic showing integration of MongoDB and Microsoft Power BI.

Beyond providing mere access to MongoDB Atlas data, this connector will provide a SQL interface to let you interact with semi-structured JSON data in a relational way, ensuring you can take full advantage of Power BI's rich business intelligence capabilities. Support is planned for two connectivity modes: import and direct. The new MongoDB Atlas Power BI Connector will be available in the first half of 2023.

Conclusion

Together with the Microsoft Intelligent Data Platform offerings, MongoDB Atlas can help operationalize the insights derived from customers' data spread across siloed legacy databases and help build modern applications with ease. With MongoDB Atlas on Microsoft Azure, developers receive access to the most comprehensive, secure, scalable, cloud-based developer data platform in the market. Now, with the availability of Atlas on the Azure Marketplace, it's never been easier for users to start building with Atlas while streamlining procurement and billing processes. Get started today through the MongoDB Atlas on Azure Marketplace listing.
Break Down Silos with a Data Mesh Approach to Omnichannel Retail
Omnichannel experiences are increasingly important for customers, yet still hard for many retailers to deliver. In this article, we'll cover an approach to unlock data from legacy silos and make it easy to operate across the enterprise — perfect for implementing an omnichannel strategy.

Establishing an omnichannel retail strategy

An omnichannel strategy connects multiple, siloed sales channels (web, app, store, phone, etc.) into one cohesive and consistent experience. This strategy allows customers to purchase through multiple channels with a consistent experience (Figure 1). Most established retailers started with a single point of sale or "channel" — the first store — then moved to multiple stores and introduced new channels like ecommerce, mobile, and B2B. Omnichannel is the next wave in this journey, offering customers the ability to start a journey on one channel and end it on another.

Figure 1: Omnichannel experience examples.

Why are retailers taking this approach? In a super-competitive industry, an omnichannel approach lets retailers maximize great customer experience, with a subsequent effect on spend and retention. Looking at recent stats, Omnisend found that purchase frequency is 250% higher on omnichannel, and Harvard Business Review's research saw omnichannel customers spend 10% more online and 4% more in-store.

Omnichannel: What's the challenge?

So, if all retailers want to provide these capabilities to their customers, why aren't they? The answer lies in the complex, siloed data architectures that underpin their application architecture. Established retailers who have built up their business over time traditionally incorporated multiple off-the-shelf products (e.g., ERP, PIMS, CMS) running on legacy data technologies into their stack (mainframe, RDBMS, file-based). With this approach, each category of data is stored in a different technology, platform, and rigid format, making it impossible to combine this data to serve omnichannel use cases (e.g., combining in-store stock with ecommerce to offer same-day click and collect). See Figure 2.

Figure 2: Data sources for omnichannel.

The next challenge is the separation of operational and historical data: older data is moved to archives, data lakes, or warehouses. Perhaps you can see today's stock in real time, but you can't compare it to stock on the same day last year, because that data is held in a different system. Any business comparison occurs after the fact. To meet the varied volume and variety of requests, retailers must extract, transform, and load (ETL) data into different databases, creating a complex, disjointed web of duplicated data. Figure 3 shows a typical retailer architecture: a document database for key-value lookup, a cache added for speed, wide-column storage for analytics, graph databases to look up three degrees of separation, time series to track changes over time, and so on.

Figure 3: An example of typical data architecture sprawl in modern retailers.

The problem is that ETL'd data becomes stale as it moves between technologies, lagging behind real time and losing context. This sprawl of technology is complex to manage and difficult to develop against, inhibiting retailers from moving quickly and adapting to new requirements. If retailers want to create experiences that can be used by consumers in real time — operational or analytical — this architecture does not give them what they need. Additionally, if they want to use AI or machine learning models, they need access to current behavior for accuracy.
Thus, the obstacle to delivering omnichannel experiences is a data problem that requires a data solution. Let's look at a smart approach to fixing it.

Modern retailers are taking a data mesh approach

Retail architectures have gone through many iterations, starting from vendor solutions per use case, moving toward a microservices approach, and landing on domain-driven design (Figure 4).

Vendor applications:
- Each vendor decides the framework and governance of the data layer; the enterprise has no control over the app or its data
- Data is not interoperable between components

Microservices:
- Microservices pull data from the API layer
- DevOps teams control their microservices, but data is managed by a centralized enterprise team

Domain-driven design:
- Microservices and core datasets are combined into bounded contexts by business function
- DevOps teams control microservices AND data

Figure 4: Architecture evolution.

Domain-driven design has emerged through an understanding that the team with domain expertise should have control over the application layer and its associated data — this is the "bounded context" for their business function. This means they can change the data to innovate quickly, without reliance on another team. Of course, if data remains in its bounded context only, we end up with the same situation as the commercial off-the-shelf (COTS) and legacy architecture model. Where we see value is when the data in each domain can be used as a product throughout the organization. Data as a product is a core data mesh concept; it includes the data, its metadata, and the code and infrastructure to use it. Data as a product is expected to be discoverable (searchable), addressable, self-identifying, and interoperable (Figure 5). In a retail example, the product, customer, and store can be thought of as bounded contexts. The product bounded context contains the product data and the microservices/applications that are built for product use cases. But for a cross-domain use case like personalized product recommendations, the data from both the customer and product domains must be available "as a product."

Figure 5: Bounded contexts and data as a product.

What we're creating here is a data mesh: an enterprise data architecture that combines intentionally distributed data across distinctly defined, bounded contexts. It is a business-domain-oriented, decentralized data ownership and architecture, where each domain makes its data available as an interoperable "data product." The key is that the data layer must serve all real-time workloads that are required of the business — both operational and real-time analytical (Figure 6).

Figure 6: Data mesh.

Why use MongoDB for omnichannel data mesh

Let's look at the data layer requirements for a successful data mesh move and how MongoDB can meet them.

Capable of handling all operational workloads:
- An expressive query language, including joining data, ACID transactions, and IoT collections, makes it great for multiple workloads.
- MongoDB is known for its performance and speed. The ability to use secondary indexes means that several workloads can run performantly.
- Search is key for retail applications: MongoDB Atlas has the Lucene search engine built in for full-text search with no data movement.
- Omnichannel experiences often involve mobile interaction. MongoDB Realm and Flexible Device Sync can seamlessly ensure consistency between mobile and backend.
Capable of handling analytical workloads:
- MongoDB's distributed architecture means analytical workloads can run on a real-time data set, without ETL or additional technology and without disturbing operational workloads.
- For real-time analytical use cases, the aggregation framework can be used to perform powerful data transformations and run ad hoc exploratory queries (see the sketch after this section).
- For business intelligence or reporting workloads, data can be queried with Atlas SQL or piped through the BI Connector to other data tools (e.g., Tableau and Power BI).

Capable of serving data as a product, which is often done by API:
- MongoDB's BSON-based document model maps well to JSON-based API payloads for speed and ease.
- MongoDB Atlas provides both the Data API and the GraphQL API, fully hosted.
- Depending on the performance needed, direct access may also be required. MongoDB has drivers for all common programming languages, meaning that teams using different languages can easily interact with it. Rules for access must of course be defined, and one option is to use MongoDB App Services.
- Real-time data can also be published to Apache Kafka topics using the MongoDB Kafka Connector, which can act as a sink and a source for data. For example, one bounded context could publish data in real time to a named Kafka topic, allowing another context to consume it and store it locally to serve latency-sensitive use cases.
- The tunable schema allows for flexibility in non-product fields, while schema validation capabilities enforce specific fields and data types in a collection to provide consistent datasets.

Resilient, secure, and scalable:
- MongoDB Atlas has a 99.995% uptime guarantee and provides auto-healing capability, with multi-region and multi-cloud resiliency options.
- MongoDB provides the ability to scale up or down to meet your application requirements — vertically and horizontally.
- MongoDB follows best-in-class security protocols.

Choose the flexible data mesh approach

Providing customers with omnichannel experiences isn't easy, especially with legacy siloed data architectures. Omnichannel requires a way of making your data work easily across the organization in real time, giving access to those who need it while also giving the power to innovate to the domain experts in each field. A data mesh approach provides the capability and flexibility to continuously innovate. Ready to build deeper business insights with in-app analytics and real-time business visibility? Read our new white paper: Application-Driven Analytics: In-App and Real-Time Insights for Retailers.
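As referenced in the analytical workloads list above, here is a minimal sketch of the kind of real-time aggregation a retailer might run on the operational data set. The collection and field names are hypothetical:

// Real-time view of stock by store for a single SKU,
// computed on live operational data -- no ETL step.
db.inventory.aggregate([
  { $match: { sku: "JACKET-RED-M" } },
  { $group: {
      _id: "$storeId",               // one result per store
      unitsOnHand: { $sum: "$quantity" }
  } },
  { $sort: { unitsOnHand: -1 } }     // best-stocked stores first
])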
Securing Multi-Cloud Applications with MongoDB Atlas
The rise of multi-cloud applications offers more versatility and flexibility for teams and users alike. Developers can leverage the strengths of different cloud providers, such as greater availability in certain regions, improved resilience, and more diverse features for use cases such as machine learning or events. As organizations transition to a public, multi-cloud environment, however, they also need to adjust their mindset and workflows, especially where security is concerned. Using multiple cloud providers requires teams to understand different security policies and take extra steps to avoid potential breaches. In this article, we'll examine three security challenges associated with multi-cloud applications and explore how MongoDB Atlas can help you mitigate or reduce the risks they pose.

Challenge 1: More clouds, more procedures, more complexity

Security protocols, such as authentication, authorization, and encryption, vary between cloud providers. And as time goes on, cloud providers will continue to update their features to stay current with the market and remain competitive, adding more potential complications to multi-cloud environments. Although there are broad similarities between AWS, Azure, and GCP, there are also many subtle differences. AWS Identity and Access Management (IAM) is built around root accounts and identities, such as users and roles. Root accounts are essentially administrators with unlimited access to resources, services, and billing. Users represent credentials for humans or applications that interact with AWS, whereas roles serve as temporary access permissions that can be assumed by users as needed. In contrast, Azure and GCP use role-based access control (RBAC) and implement it in different ways. Azure Active Directory allows administrators to nest different groups of users within one another, forming a hierarchy of sorts and making it easier to assign permissions. GCP, meanwhile, uses roles, which include both preset and customizable permissions (e.g., editor or viewer), and scopes, or permissions that are allotted to a specific identity concerning a certain resource or project. For example, one identity could be a read-only viewer on one project but an editor on another. Given these differences, keeping track of security permissions across various cloud providers can be tricky. As a result, teams may fail to grant access to key clients in a timely manner or accidentally authorize the wrong users, causing delays or even security breaches.

Challenge 2: Contributing factors

Security doesn't exist in a vacuum, and some factors (organizational and otherwise) can complicate the work of security teams. For example, time constraints can make it harder to implement or adhere to security policies. Turnover can also create security concerns, including lost knowledge (e.g., a team may lose its AWS expert) or stolen credentials. To avoid the latter, organizations must immediately revoke access privileges for departing employees and promptly grant credentials to incoming ones. However, one study found that 50% of companies took three days or longer to revoke system access for departing employees, while 72% of companies took one week or longer to grant access to new employees.

Challenge 3: Misconfigurations and human error

According to the Verizon Data Breach Investigations Report, nearly 13% of breaches involved human error — primarily misconfigured cloud storage.
Overall, the Verizon team found that the human element (which includes phishing and stolen credentials) was responsible for 82% of security incidents. Misconfigurations are such common mistakes that they account for the majority of error-related breaches. For example, AWS governs permissions and resources through JSON files called policies. However, unless you’re an expert in AWS IAM, it’s hard to understand what a policy might really mean. Figure 1 shows a read-only policy that was accidentally altered, through the addition of a single line of code, to allow writes, inadvertently opening the resource to the public. The data behind it could be sensitive personally identifiable information (PII) or financial data, something that really shouldn’t be modifiable.

Figure 1: Two examples of read-only policies laid out side by side, demonstrating how a single line of code can impact your security.

Although the Verizon report concluded that misconfigurations have decreased during the past two years, these mistakes (often AWS S3 buckets improperly configured for public access) have resulted in high-profile leaks worldwide. In one instance, a former AWS engineer created a tool to find and download user data from misconfigured AWS accounts, gaining access to more than 100 million Capital One customer credentials and credit card applications. The penalties for these vulnerabilities and violations are heavy. For example, the General Data Protection Regulation (GDPR) imposes a penalty of up to four percent of an organization’s worldwide revenue or €20,000,000, whichever is larger. In the aftermath of the security event, Capital One was fined $80 million by regulators; other incidents have resulted in fines ranging from $35 million to $700 million.

Where does MongoDB Atlas come in?

MongoDB Atlas is secure by default, which means minimal configuration is required, and it’s verified by leading global and regional certifications and assurances. These include critical industry standards, such as ISO 27001 for information security, HIPAA for protected healthcare information, PCI-DSS for payment card transactions, and more. By abstracting away the details of policies, roles, and other protocols, Atlas centralizes and simplifies multi-cloud security controls. Atlas provides a regional selection option to control data residency, default virtual private clouds (VPCs) for resource isolation, RBAC for fine-tuning access permissions, and more. These tools support security across an entire environment, meaning you can simply configure them as needed, without worrying about the nuances of each cloud provider. Atlas is also compatible with many of the leading security technologies and key managers, including Google Cloud KMS, Azure Key Vault, and AWS KMS, enabling users to either bring their own keys or secure their clusters with the software of their choice. Additionally, data is always encrypted in transit and at rest. You can even run rich queries on fully encrypted data using Queryable Encryption, which allows you to extract insights without compromising security. Data is decrypted only when results are returned to the driver, where the key is located; otherwise, encrypted fields appear as randomized ciphertext.
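As a rough sketch of what this looks like from the application side, the PyMongo snippet below sets up automatic Queryable Encryption and runs an equality query on an encrypted field. It is illustrative only: the database, collection, and field names are hypothetical, a throwaway local master key stands in for a real KMS, and it assumes MongoDB 7.0+, the pymongo[encryption] extra, and the Automatic Encryption Shared Library.

import os
from bson.codec_options import CodecOptions
from pymongo import MongoClient
from pymongo.encryption import ClientEncryption
from pymongo.encryption_options import AutoEncryptionOpts

# Demo-only 96-byte local master key; a production deployment would use
# a cloud KMS (AWS KMS, Azure Key Vault, Google Cloud KMS) instead.
kms_providers = {"local": {"key": os.urandom(96)}}
key_vault_namespace = "encryption.__keyVault"

# Create a data encryption key in the key vault collection.
client_encryption = ClientEncryption(
    kms_providers, key_vault_namespace, MongoClient(), CodecOptions()
)
data_key_id = client_encryption.create_data_key("local")

# Declare which fields are encrypted and which query types they support.
opts = AutoEncryptionOpts(
    kms_providers,
    key_vault_namespace,
    encrypted_fields_map={
        "hr.employees": {
            "fields": [{
                "path": "salary",
                "bsonType": "long",
                "keyId": data_key_id,
                "queries": {"queryType": "equality"},
            }]
        }
    },
)
client = MongoClient(auto_encryption_opts=opts)

# The driver encrypts the predicate, the server matches ciphertext, and
# results are decrypted back on the driver, where the key material lives.
employee = client.hr.employees.find_one({"salary": 90000})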
One real-world example involves a 2013 data breach at a supermarket chain in the United Kingdom, where a disgruntled employee accessed the personal data of nearly 100,000 employees. If Queryable Encryption had been available and in use at the time, the perpetrator would have been able to download only ciphertext. With MongoDB Atlas, securing multi-cloud environments is straightforward. Teams can use a single, streamlined interface to manage their security needs, with no need to juggle each hyperscaler's security procedures or keep track of separate tools such as key management systems. Enjoy a streamlined, secure multi-cloud experience: sign up for a free MongoDB Atlas cluster today.
How to Get Mobile Data Sync Right with Mobile Backend as a Service (MBaaS)
Twenty years ago, Watts Humphrey, known today as the "Father of Software Quality," declared that every business is a software business. While his insight now seems obvious, digital technology has evolved to where we can add to it: Every business is also a mobile business. According to Gartner, 75% of enterprise data will be generated and processed away from the central data center by 2025. And according to data.ai, 84% of enterprises attribute growth in productivity to mobile apps. Today, mobile tech transforms every aspect of business. It enables the workforce through point-of-sale, inventory, service, and sales. It streamlines critical business processes like self-checkout and customer communications. And it powers essential work devices from telemetry to IoT to manufacturing. The data businesses capture on mobile and edge devices can be used to improve operational efficiency, drive process improvements, and deliver richer, real-time app experiences. But all of this requires a solution for synchronizing mobile data with backend systems, where it can be combined with other historical data, analyzed, or fed into predictive intelligence algorithms to surface new insights and trigger other value-add activities. Syncing mobile data with backend systems is hard for a number of reasons. Mobile devices are constantly going in and out of coverage. When connections break and then resume, conflicts emerge between edits made on devices while offline and other data being processed on the backend. Conflict resolution therefore becomes a crucial part of ensuring that changes on the mobile device are captured on the backend in a way that preserves data integrity.

Sync and swim

Apps that are not designed with backend sync in mind can take a long time to load, are prone to crashing, and show stale information. When apps don’t deliver positive experiences, people stop trusting them and stop using them. On the other hand, an app with robust sync between a device’s local data store and the back end lets workers see live data across users and devices, allowing for real-time collaboration and decision-making. According to Deloitte, 70% of workers don’t sit at a desk every day, so the ability to sync data will increasingly drive business outcomes. Indian startup FloBiz uses MongoDB Atlas Device Sync to handle the difficult job of keeping its mobile, desktop, and web apps in sync. This means that even when multiple users work from the same account, going offline and online, there are no sync issues, duplications, or lost data.

Why data sync is difficult

Many organizations choose to build their own sync solutions. DIY solutions tend to go one of two ways: overly complex, or oversimplified, resulting in sync that happens only a few times a day or in only one direction. Writing conflict-resolution code is complicated and time-consuming because building data sync the right way can take thousands of lines of code. Developers frequently underestimate the challenge because it seems straightforward on the surface. They assume sync consists simply of the application requesting data from the server, receiving it, and using it to update the app’s UI on the device. But when building for mobile devices, this is a massive oversimplification, as the toy conflict-resolution sketch below suggests. When developers attempt to build their own sync tool, they typically use RESTful APIs to connect the mobile app with the backend and exchange data between them.
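To make that concrete, here is a toy "last write wins" merge in Python. It is purely illustrative: the field name is hypothetical, and a production sync engine would also have to handle deletes, field-level conflicts, clock skew between devices, and retries.

def merge(local_doc: dict, remote_doc: dict) -> dict:
    # Assumes every document carries an "updated_at" timestamp set by the
    # writer. Even this naive rule raises hard questions: device clocks
    # drift, and picking a whole-document winner silently discards the
    # losing side's field-level edits.
    if local_doc["updated_at"] >= remote_doc["updated_at"]:
        return local_doc
    return remote_doc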
Mobile apps are often built more like web apps in the beginning. But once the app needs to handle offline scenarios, or some functionality requires local persistence, it becomes necessary to add a mobile database, and syncing with that mobile database then becomes a challenge. The exchange of data between the device and the back end gets complicated, requiring the developer to anticipate numerous offline use cases and write complex conflict-resolution code. It can be done, but it’s a time-consuming process that’s not guaranteed to cover all use cases. When data is requested, applications need to determine whether a network is available and, if not, whether the appropriate data is stored locally, leading to complex query, retry, and error-handling logic. The worst part about all this complexity is that it’s non-differentiating: it doesn’t set the business apart from the competition, because users expect the functionality powered by data sync and won’t tolerate anything less.

An integrated, out-of-the-box solution

MongoDB's Atlas Device Sync, combined with Realm, is a mobile backend as a service (MBaaS) solution that enables developers to build offline-first applications that automatically refresh when a connection is reestablished. Local and edge data persistence is managed by Realm, a development platform designed for modern, data-driven applications. Developers use Realm to build mobile, web, desktop, and IoT apps; it is a fast and scalable alternative to SQLite and Core Data for client-side persistence. The bidirectional data synchronization service between Realm and MongoDB Atlas allows businesses to do more with their data at the edge by tapping into some of MongoDB’s more powerful data processing capabilities in the cloud. Complex synchronization problems, such as conflict resolution, are handled automatically by MongoDB’s built-in sync. To learn more about the challenges of building real-time mobile apps that scale, with sample use cases showing how thousands of businesses are handling it today, download our white paper, Building Real-time Mobile Apps that Scale.
Introducing MongoDB Spark Connector Version 10.1
Today, MongoDB released version 10.1 of the MongoDB Spark Connector. In this post, we highlight key features of this new release.

Microbatch streaming support

Version 10 of the MongoDB Spark Connector introduced support for Apache Spark Structured Streaming. In that initial release, continuous mode was the only streaming mode supported. The 10.1 update adds microbatch mode, enabling you to stream writes to destinations that do not currently support continuous-mode streams, such as Amazon S3 storage (see the sketch at the end of this post).

Increased control of write behavior

When the Spark Connector issues a write, the default behavior is an upsert. This can cause problems in scenarios where you may not want an upsert, such as with time series collections. A new configuration parameter, upsertDocument, when set to false, issues only insert statements on write:

solar.write.format("mongodb").mode("append").option("database", "sensors").option("collection", "panels").option("upsertDocument", "false").save()

In the code snippet above, we write to the "panels" time series collection with upsertDocument set to false. Alternatively, you can set operationType to "insert"; when operationType is set, any upsertDocument option is ignored.

Support for BSON types

The data types supported in BSON are not exactly the same as those supported in a Spark DataFrame. For example, Spark has no native ObjectId type. To handle scenarios where you need to work with BSON-specific types, you can now set the new configuration values:

spark.mongodb.read.outputExtendedJson=<true/false>
spark.mongodb.write.convertJson=<true/false>

These settings enable you to effectively leverage BSON data types within your Spark application.

Call to action

Version 10.1 of the MongoDB Spark Connector enhances its streaming capabilities with support for microbatch processing. This version also adds more granular control over write behavior, supporting use cases like time series collections. And for users who wanted to upgrade from the 3.x version but could not because of the lack of BSON data type support, version 10.1 now provides options for working with BSON data types. To learn more about the MongoDB Spark Connector, check out the online documentation. You can download the latest version from the Maven repository.
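As an illustration of the microbatch support described above, here is a minimal PySpark sketch that streams changes from a MongoDB collection into Parquet files on Amazon S3. It is a sketch under assumptions: the connection URI, bucket, schema fields, and database/collection names are hypothetical, and it assumes the 10.1 connector JAR is on the Spark classpath.

from pyspark.sql import SparkSession
from pyspark.sql.types import DoubleType, StringType, StructField, StructType, TimestampType

spark = (SparkSession.builder
    .appName("mongodb-microbatch-demo")
    .config("spark.mongodb.read.connection.uri",
            "mongodb+srv://user:password@cluster0.example.net")
    .getOrCreate())

# Structured Streaming sources require an explicit schema.
panel_schema = StructType([
    StructField("_id", StringType()),
    StructField("panelId", StringType()),
    StructField("watts", DoubleType()),
    StructField("ts", TimestampType()),
])

# Stream changes from the "panels" collection as they occur.
stream = (spark.readStream
    .format("mongodb")
    .option("database", "sensors")
    .option("collection", "panels")
    .schema(panel_schema)
    .load())

# A processingTime trigger runs in microbatch mode, which lets us target
# sinks like S3 that continuous mode cannot write to.
query = (stream.writeStream
    .format("parquet")
    .option("path", "s3a://example-bucket/panels/")
    .option("checkpointLocation", "/tmp/panels-checkpoint")
    .trigger(processingTime="10 seconds")
    .start())

query.awaitTermination()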