Published on April 26, 2024
Building serverless AI applications with Amazon Bedrock
With serverless computing, you don’t have to worry about the heavy lifting of running servers or managing resources; you can focus on your app’s features. Amazon Bedrock and a few other AWS services let you use powerful AI models to do things like understanding what’s said in an audio file or responding to user queries, all without needing to be an expert in AI.
In this guide, we’ll cover everything from the basics, like setting up your project, to more advanced tasks like turning speech into text and summarizing conversations. By the end, you’ll know how to build apps that interact with users, using Amazon’s cloud services.
What is Amazon Bedrock?
Amazon Bedrock lets you easily use large language models without worrying about the underlying infrastructure. In this tutorial, we will use Amazon Bedrock to interact with these models. First, we’ll set up a simple environment by creating a new directory for our project.
Basic response for generating a paragraph
Here, you’ll learn how to generate a paragraph from a model by writing a Python script that uses Amazon Bedrock. We’ll start with a simple task: asking the model to summarize a topic in one sentence. This example will help you understand how to interact with the model and process its responses.
import boto3
import json

# Initialize the Bedrock runtime client
bedrock_runtime = boto3.client('bedrock-runtime', region_name='us-east-1')

# Define the prompt
prompt = "Write a one sentence summary of Las Vegas."

# Define the request parameters
kwargs = {
    "modelId": "amazon.titan-text-express-v1",
    "contentType": "application/json",
    "accept": "*/*",
    "body": json.dumps(
        {
            "inputText": prompt,
            "textGenerationConfig": {
                "maxTokenCount": 100,
                "temperature": 0.7,
                "topP": 0.9
            }
        }
    )
}

# Invoke the model and parse the response
response = bedrock_runtime.invoke_model(**kwargs)
response_body = json.loads(response.get('body').read())

# Extract and print the generated text
generation = response_body['results'][0]['outputText']
print(generation)
Las Vegas is a famous city known for its gambling, entertainment, and nightlife. It is located in Nevada and is the largest city within the Mojave Desert. Las Vegas is home to several iconic landmarks, including the Las Vegas Strip, the Bellagio, and the MGM Grand. The city is a popular destination for tourists from around the world, who come to enjoy the luxurious casinos, hotels, restaurants, and shows. Las Vegas is also known for its world-class entertainment, including live
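Before moving on, it helps to see the shape of the response body we just parsed. The sketch below applies the same extraction to a stubbed response dict; the stub stands in for what `invoke_model` returns, with field names matching the Titan text response above:

```python
def extract_output_text(response_body: dict) -> str:
    """Pull the generated text out of a Titan text response body."""
    return response_body["results"][0]["outputText"]


# Stubbed response body with the same structure Titan returns
stub_body = {
    "inputTextTokenCount": 9,
    "results": [
        {
            "tokenCount": 12,
            "outputText": "Las Vegas is a world-famous entertainment hub in Nevada.",
            "completionReason": "FINISH",
        }
    ],
}

print(extract_output_text(stub_body))
# → Las Vegas is a world-famous entertainment hub in Nevada.
```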
Summarizing the transcript of a call
Now, we’ll take it a step further by summarizing the transcript of a conversation. This involves reading a text file, sending it to the model, and asking it to condense the conversation into key points. This part of the tutorial will show you how to process larger pieces of text and use models to identify and summarize important information.
import boto3
import json

# Initialize the Bedrock runtime client
bedrock_runtime = boto3.client('bedrock-runtime', region_name='us-east-1')

# Read the call transcript from a text file
# (the file name here is an example; point it at your own transcript)
with open('transcript.txt', 'r') as f:
    transcript = f.read()

# Define the prompt, wrapping the transcript
prompt = f"Summarize the following conversation.\n\n{transcript}"

# Define the request parameters
# (a larger maxTokenCount leaves room for a multi-sentence summary)
kwargs = {
    "modelId": "amazon.titan-text-express-v1",
    "contentType": "application/json",
    "accept": "*/*",
    "body": json.dumps(
        {
            "inputText": prompt,
            "textGenerationConfig": {
                "maxTokenCount": 512,
                "temperature": 0.7,
                "topP": 0.9
            }
        }
    )
}

# Invoke the model and parse the response
response = bedrock_runtime.invoke_model(**kwargs)
response_body = json.loads(response.get('body').read())

# Extract and print the generated text
generation = response_body['results'][0]['outputText']
print(generation)
Alex is looking to book a room for his 10th wedding anniversary at the Crystal Heights Hotel in Singapore. The hotel offers several room types that offer stunning views of the city skyline and the fictional Sapphire Bay. The special diamond suite even comes with exclusive access to the moonlit pool and star deck. The package includes breakfast, complimentary access to the moonlit pool and star deck, a one-time spa treatment for two, and a special romantic dinner at the cloud nine restaurant. A preauthorization amount of $1000 will be held on the card, which will be released upon checkout. There is a 10% service charge and a 7% fantasy tax applied to the room rate.
Summarizing an audio file from a call
Next, we’ll work with audio files. You’ll learn how to upload an audio recording to Amazon S3, transcribe it into text using Amazon Transcribe, and then summarize the conversation using the model. This section covers the entire workflow: handling audio files, transcribing them, and summarizing the content, a common pattern when processing audio data.
For the next sample, we will use the following process:
Import packages and load the audio file.
Setup:
S3 client.
Transcribe client.
Upload the audio file to S3.
Create the unique job name.
Build the transcription response.
Access the needed parts of the transcript.
Setup Bedrock runtime.
Create the prompt template.
Configure the model response.
Generate a summary of the audio transcript.
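Most of these steps are AWS calls, but one pure-Python detail is worth sketching first: Transcribe job names must be unique within your account, so we derive one from a UUID. A minimal helper (the prefix is just this tutorial's convention):

```python
import uuid


def make_job_name(prefix: str = "transcription-job-") -> str:
    """Build a unique Amazon Transcribe job name."""
    return prefix + str(uuid.uuid4())


print(make_job_name())
```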
import os
import boto3
import uuid
import time
from IPython.display import Audio

# Initialize Bedrock runtime and S3 client
bedrock_runtime = boto3.client('bedrock-runtime', region_name='us-east-1')
audio = Audio(filename="dialog.mp3")
bucket_name = os.environ['AI_LEARN_BEDROCK_BUCKETNAME']

# Setup S3 and upload audio file
s3_client = boto3.client('s3', region_name='us-east-1')
file_name = 'dialog.mp3'
s3_client.upload_file(file_name, bucket_name, file_name)

# Setup Transcribe client and generate a unique job ID
transcribe_client = boto3.client('transcribe', region_name='us-east-1')
job_name = 'transcription-job-' + str(uuid.uuid4())

response = transcribe_client.start_transcription_job(
    TranscriptionJobName=job_name,
    Media={'MediaFileUri': f's3://{bucket_name}/{file_name}'},
    MediaFormat='mp3',
    LanguageCode='en-US',
    OutputBucketName=bucket_name,
    Settings={
        'ShowSpeakerLabels': True,
        'MaxSpeakerLabels': 2
    }
)

# Poll the transcription job until it completes or fails
while True:
    status = transcribe_client.get_transcription_job(TranscriptionJobName=job_name)
    if status['TranscriptionJob']['TranscriptionJobStatus'] in ['COMPLETED', 'FAILED']:
        break
    time.sleep(2)

# Print the transcription job status
print(status['TranscriptionJob']['TranscriptionJobStatus'])
We upload the audio file to S3 and build a unique name for the transcription job. Transcription is an asynchronous process, so the job name is how we check the job’s state and know when the result is ready. We then poll the job until it reports COMPLETED or FAILED; note that this loop returns the status, not the actual transcription.
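The polling loop can be factored into a small reusable helper. In this sketch, `get_status` is a stand-in for `transcribe_client.get_transcription_job`, so the example can run without AWS; in real code you would pass a closure over the client:

```python
import time


def wait_for_job(get_status, poll_seconds: float = 2.0) -> str:
    """Poll a transcription job until it reaches a terminal state.

    get_status() must return the same dict shape as
    transcribe_client.get_transcription_job(...).
    """
    while True:
        status = get_status()
        state = status["TranscriptionJob"]["TranscriptionJobStatus"]
        if state in ("COMPLETED", "FAILED"):
            return state
        time.sleep(poll_seconds)


# Stub that reports IN_PROGRESS twice, then COMPLETED
states = iter(["IN_PROGRESS", "IN_PROGRESS", "COMPLETED"])
fake_status = lambda: {"TranscriptionJob": {"TranscriptionJobStatus": next(states)}}
print(wait_for_job(fake_status, poll_seconds=0))
# → COMPLETED
```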
Access the needed parts of the transcript
In this next step, you will access the transcription result, extract the necessary parts of the conversation, and format it for summarization. This teaches you how to navigate and use the output from Amazon Transcribe, preparing it for further processing with Amazon Bedrock.
import json

if status['TranscriptionJob']['TranscriptionJobStatus'] == 'COMPLETED':
    # Load the transcript from S3
    transcript_key = f"{job_name}.json"
    transcript_obj = s3_client.get_object(Bucket=bucket_name, Key=transcript_key)
    transcript_text = transcript_obj['Body'].read().decode('utf-8')
    transcript_json = json.loads(transcript_text)

    output_text = ""
    current_speaker = None
    items = transcript_json['results']['items']

    # Process each item in the transcript
    for item in items:
        speaker_label = item.get('speaker_label', None)
        content = item['alternatives'][0]['content']

        # Add speaker label at the start of a new line
        if speaker_label is not None and speaker_label != current_speaker:
            current_speaker = speaker_label
            output_text += f"\n{current_speaker}: "

        # Strip the trailing space before punctuation, then add the content
        if item['type'] == 'punctuation':
            output_text = output_text.rstrip()
        output_text += f"{content} "

    # Save the transcript to a text file
    with open(f'{job_name}.txt', 'w') as f:
        f.write(output_text)
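The per-item logic is easier to reason about as a pure function. The sketch below applies the same rules (new line on speaker change, no space before punctuation) to a small hypothetical fragment of a Transcribe `items` list:

```python
def format_transcript(items) -> str:
    """Turn Transcribe result items into speaker-labelled lines."""
    output_text = ""
    current_speaker = None
    for item in items:
        speaker_label = item.get("speaker_label")
        content = item["alternatives"][0]["content"]
        # Start a new line when the speaker changes
        if speaker_label is not None and speaker_label != current_speaker:
            current_speaker = speaker_label
            output_text += f"\n{current_speaker}: "
        # No space before punctuation
        if item["type"] == "punctuation":
            output_text = output_text.rstrip()
        output_text += f"{content} "
    return output_text


# Hypothetical fragment of a Transcribe 'items' list
sample_items = [
    {"type": "pronunciation", "speaker_label": "spk_0",
     "alternatives": [{"content": "Hello"}]},
    {"type": "punctuation", "speaker_label": "spk_0",
     "alternatives": [{"content": "."}]},
    {"type": "pronunciation", "speaker_label": "spk_1",
     "alternatives": [{"content": "Hi"}]},
]

print(format_transcript(sample_items))
```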
Set up the Bedrock runtime and create the prompt template
In this step, you’ll set up the Bedrock runtime and create a template for the prompt that will be sent to the model. This part involves reading the formatted transcript, creating a structured prompt that guides the model in generating a summary, and configuring the model response. It’s a crucial step in translating the raw transcript into a format that the model can understand and respond to effectively.
from jinja2 import Template

bedrock_runtime = boto3.client('bedrock-runtime', region_name='us-east-1')

# Read the transcript from the text file
with open(f'{job_name}.txt', "r") as file:
    transcript = file.read()

# Read the prompt template from the text file
with open('prompt_template.txt', "r") as file:
    template_string = file.read()

# Prepare the data for the template
data = {
    'transcript': transcript
}

# Render the template with the data
template = Template(template_string)
prompt = template.render(data)
print(prompt)
###
I need to summarize a conversation. The transcript of the
conversation is between the XML-like tags.
spk_0: Hi, is this the Crystal Heights Hotel in Singapore?
spk_1: Yes, it is. Good afternoon. How may I assist you today?
spk_0: Fantastic, good afternoon. I was looking to book a room for my 10th wedding anniversary. I've heard your hotel offers exceptional views and services. Could you tell me more?
spk_1: Absolutely, Alex, and congratulations on your upcoming anniversary. That's a significant milestone, and we'd be honored to make it a special occasion for you. We have several room types that offer stunning views of the city skyline and the fictional Sapphire Bay. Our special diamond suite even comes with exclusive access to the moonlit pool and star deck. We also have in-house spa services, world-class dining options, and a shopping arcade.
spk_0: That sounds heavenly. I think my spouse would love the moonlit pool. Can you help me make a reservation for one of your diamond suites with a Sapphire Bay view?
spk_1: Of course. May I know the dates you're planning to visit?
spk_0: Sure. It would be from October 10th to 17th.
spk_1: Excellent. Let me check the availability. Ah, it looks like we have a diamond suite available for those dates. Would you like to proceed with the reservation?
spk_0: Definitely. What's included in the package?
spk_1: Wonderful. The package includes breakfast, complimentary access to the moonlit pool and star deck, a one-time spa treatment for two, and a special romantic dinner at our Cloud Nine restaurant.
spk_0: You're making it impossible to resist. Let's go ahead with the booking.
spk_1: Great. I'll need some personal information for the reservation. Can I get your full name, contact details, and a credit card for the preauthorization?
spk_0: Certainly. My full name is Alexander Thompson. My contact number is 12345678910. And the credit card is, wait, did you say preauthorization? How much would that be?
spk_1: Ah, I should have mentioned that earlier. My apologies. A preauthorization amount of $1000 will be held on your card, which would be released upon checkout.
spk_0: $1000. That seems a bit excessive, don't you think?
spk_1: I understand your concern, Alex. The preauthorization is a standard procedure to cover any incidental expenses you may incur during your stay. However, I assure you it's only a hold and not an actual charge.
spk_0: That's still a lot. Are there any additional charges that I should know about?
spk_1: Well, there is a 10% service charge and a 7% fantasy tax applied to the room rate.
spk_0: Mm. You know what? It's a special occasion. So let's go ahead.
spk_1: Thank you, Alex, for understanding. We'll ensure that your experience at Crystal Heights is well worth it.
The summary must contain a one-word sentiment analysis and
a list of issues, problems, or causes of friction
during the conversation. The output must be provided in
JSON format shown in the following example.
Example output:
{
"sentiment": ,
"issues": [
{
"topic": ,
"summary":
}
]
}
Write the JSON output and nothing more.
Here is the JSON output:
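The script above uses jinja2’s `Template` for rendering. If you’d rather avoid the dependency, the standard library’s `string.Template` offers similar substitution; here is a minimal sketch using a shortened, hypothetical stand-in for `prompt_template.txt`:

```python
from string import Template

# Hypothetical, shortened stand-in for prompt_template.txt
template_string = (
    "I need to summarize a conversation. The transcript is below.\n\n"
    "$transcript\n\n"
    "Write the JSON output and nothing more."
)

transcript = "spk_0: Hi, is this the Crystal Heights Hotel in Singapore?"
prompt = Template(template_string).substitute(transcript=transcript)
print(prompt)
```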
Configure the model response
Finally, you will configure the model’s response to your prompt. This includes setting parameters for the text generation and invoking the model with your prepared prompt. This last step completes the process of summarizing audio transcripts by extracting meaningful insights and presenting them in a structured format. This demonstrates the full potential of integrating various AWS services to process and understand audio data.
kwargs = {
    "modelId": "amazon.titan-text-lite-v1",
    "contentType": "application/json",
    "accept": "*/*",
    "body": json.dumps(
        {
            "inputText": prompt,
            "textGenerationConfig": {
                "maxTokenCount": 512,
                "temperature": 0,
                "topP": 0.9
            }
        }
    )
}

response = bedrock_runtime.invoke_model(**kwargs)
response_body = json.loads(response.get('body').read())
generation = response_body['results'][0]['outputText']
print(generation)
###
{
"sentiment": "Positive",
"issues": [
{
"topic": "Hotel services",
"summary": "The hotel offers exceptional views and services."
},
{
"topic": "Room booking",
"summary": "The hotel has several room types that offer stunning views of the city skyline and the fictional Sapphire Bay."
},
{
"topic": "Diamond suite",
"summary": "The diamond suite comes with exclusive access to the moonlit pool and star deck."
},
{
"topic": "Spa services",
"summary": "The hotel has in-house spa services, world-class dining options, and a shopping arcade."
},
{
"topic": "Reservation process",
"summary": "The reservation process includes breakfast, complimentary access to the moonlit pool and star deck, a one-time spa treatment for two, and a special romantic dinner at the cloud nine restaurant."
},
{
"topic": "Pre-authorization",
"summary": "A pre-authorization of $1000 is held on the credit card, which is released upon checkout."
},
{
"topic": "Additional charges",
"summary": "There is a 10% service charge and a 7% fantasy tax applied to the room rate."
}
]
}
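Because the prompt asks for JSON and nothing more, it is worth parsing and sanity-checking the model’s output before using it downstream. A minimal check, run here against a short hypothetical response rather than a live model call:

```python
import json


def parse_summary(generation: str) -> dict:
    """Parse the model's JSON summary and verify the expected keys."""
    summary = json.loads(generation)
    assert isinstance(summary.get("sentiment"), str)
    assert isinstance(summary.get("issues"), list)
    for issue in summary["issues"]:
        assert "topic" in issue and "summary" in issue
    return summary


# Hypothetical model output in the requested format
generation = (
    '{"sentiment": "Positive", "issues": '
    '[{"topic": "Pre-authorization", "summary": "A $1000 hold on the card."}]}'
)
summary = parse_summary(generation)
print(summary["sentiment"])
# → Positive
```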
Conclusion
By now, you should have a solid foundation in using Amazon Bedrock alongside services like Amazon Transcribe and Amazon S3. These skills are not limited to the examples shown; they extend to a wide range of applications, from automated customer support to data analysis and beyond. The key takeaway is how easily various AWS services can be combined to build powerful, serverless applications that understand and interact with human language, opening up a world of possibilities for developers and businesses alike.